[jira] [Commented] (HDFS-11150) [SPS]: Provide persistence when satisfying storage policy.

2017-01-10 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816508#comment-15816508
 ] 

Uma Maheswara Rao G commented on HDFS-11150:


The latest patch almost looks good to me. A few minor improvements on the tests:

# .
{quote}
+//  // test directory
{quote}
Can you remove this double comment?
# .
{code}
  @Test
{code}
Please add timeouts for the tests.
Also add a Javadoc comment to each test describing what it does.
# .
{code}
"WARM"
{code}
Can we make these strings constants in the file?
# .
typo: DataNodes' -> DataNode's
# .
I think waitExpectedStorageType is duplicated in other test files. Can we move 
it to DFSTestUtil.java and rename the method to waitForExpectedStorageType? 
I think 
TestStoragePolicySatisfyWorker#waitForLocatedBlockWithArchiveStorageType could 
also use this common method.
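For reference, such a waitForExpectedStorageType helper is essentially a poll-until-timeout loop. Below is a minimal, self-contained sketch of that pattern; the class name and signature are hypothetical, not the actual DFSTestUtil API.

```java
import java.util.function.BooleanSupplier;

public class WaitUtil {
  /**
   * Polls {@code check} every {@code intervalMillis} until it returns true.
   * Returns true if the condition was met, or false if {@code timeoutMillis}
   * elapsed first.
   */
  public static boolean waitFor(BooleanSupplier check, long intervalMillis,
      long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        return false; // timed out before the condition became true
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

A test would then call it as, e.g., waitFor(() -> storageTypeMatches(lb), 100, 30000) and assert the result.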

Finally, the remaining tasks to address are: 
# When the movement finishes successfully, we should clean up the Xattrs.
# When SPS is disabled dynamically, we should clean up the Xattrs.
I am OK to handle them in a separate JIRA. Could you raise a JIRA to track 
them?   (They can both be handled in a single JIRA, IMO)


> [SPS]: Provide persistence when satisfying storage policy.
> --
>
> Key: HDFS-11150
> URL: https://issues.apache.org/jira/browse/HDFS-11150
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-11150-HDFS-10285.001.patch, 
> HDFS-11150-HDFS-10285.002.patch, HDFS-11150-HDFS-10285.003.patch, 
> HDFS-11150-HDFS-10285.004.patch, HDFS-11150-HDFS-10285.005.patch, 
> HDFS-11150-HDFS-10285.006.patch, editsStored, editsStored.xml
>
>
> Provide persistence for SPS in case that Hadoop cluster crashes by accident. 
> Basically we need to change EditLog and FsImage here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11309) [SPS]: chooseTargetTypeInSameNode should pass accurate block size to chooseStorage4Block while choosing target

2017-01-10 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11309:
---
Summary: [SPS]: chooseTargetTypeInSameNode should pass accurate block size 
to chooseStorage4Block while choosing target  (was: chooseTargetTypeInSameNode 
should pass accurate block size to chooseStorage4Block while choosing target)

> [SPS]: chooseTargetTypeInSameNode should pass accurate block size to 
> chooseStorage4Block while choosing target
> --
>
> Key: HDFS-11309
> URL: https://issues.apache.org/jira/browse/HDFS-11309
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>
> Currently chooseTargetTypeInSameNode is not passing the accurate block size to 
> chooseStorage4Block while choosing a local target. Instead of the accurate 
> size, we pass 0, which causes the space constraint in the storage to be ignored.






[jira] [Created] (HDFS-11309) chooseTargetTypeInSameNode should pass accurate block size to chooseStorage4Block while choosing target

2017-01-09 Thread Uma Maheswara Rao G (JIRA)
Uma Maheswara Rao G created HDFS-11309:
--

 Summary: chooseTargetTypeInSameNode should pass accurate block 
size to chooseStorage4Block while choosing target
 Key: HDFS-11309
 URL: https://issues.apache.org/jira/browse/HDFS-11309
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-10285
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G


Currently chooseTargetTypeInSameNode is not passing the accurate block size to 
chooseStorage4Block while choosing a local target. Instead of the accurate 
size, we pass 0, which causes the space constraint in the storage to be ignored.






[jira] [Commented] (HDFS-11150) [SPS]: Provide persistence when satisfying storage policy.

2017-01-09 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813252#comment-15813252
 ] 

Uma Maheswara Rao G commented on HDFS-11150:


[~yuanbo], please proceed with your patch now. HDFS-11293 was committed just a 
few minutes ago. 

> [SPS]: Provide persistence when satisfying storage policy.
> --
>
> Key: HDFS-11150
> URL: https://issues.apache.org/jira/browse/HDFS-11150
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-11150-HDFS-10285.001.patch, 
> HDFS-11150-HDFS-10285.002.patch, HDFS-11150-HDFS-10285.003.patch, 
> HDFS-11150-HDFS-10285.004.patch, HDFS-11150-HDFS-10285.005.patch, 
> editsStored, editsStored.xml
>
>
> Provide persistence for SPS in case that Hadoop cluster crashes by accident. 
> Basically we need to change EditLog and FsImage here.






[jira] [Updated] (HDFS-11293) [SPS]: Local DN should be given preference as source node, when target available in same node

2017-01-09 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-10285
   Status: Resolved  (was: Patch Available)

> [SPS]: Local DN should be given preference as source node, when target 
> available in same node
> -
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Yuanbo Liu
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: HDFS-10285
>
> Attachments: HDFS-11293-HDFS-10285-00.patch, 
> HDFS-11293-HDFS-10285-01.patch, HDFS-11293-HDFS-10285-02.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException.






[jira] [Commented] (HDFS-11293) [SPS]: Local DN should be given preference as source node, when target available in same node

2017-01-09 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813069#comment-15813069
 ] 

Uma Maheswara Rao G commented on HDFS-11293:


I have just committed this to the branch.
Thanks [~yuanbo] and [~rakeshr] for the reviews!
Thanks [~yuanbo] for finding the issue and sharing the test cases.


> [SPS]: Local DN should be given preference as source node, when target 
> available in same node
> -
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Yuanbo Liu
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: HDFS-11293-HDFS-10285-00.patch, 
> HDFS-11293-HDFS-10285-01.patch, HDFS-11293-HDFS-10285-02.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException.






[jira] [Updated] (HDFS-11293) [SPS]: Local DN should be given preference as source node, when target available in same node

2017-01-09 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
Attachment: HDFS-11293-HDFS-10285-02.patch

Thank you, [~rakeshr].
I overlooked that the test was missing a try-finally; fixed it.
Also fixed the typos in the comments.
Please take a look.

> [SPS]: Local DN should be given preference as source node, when target 
> available in same node
> -
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Yuanbo Liu
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: HDFS-11293-HDFS-10285-00.patch, 
> HDFS-11293-HDFS-10285-01.patch, HDFS-11293-HDFS-10285-02.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException.






[jira] [Updated] (HDFS-11293) [SPS]: Local DN should be given preference as source node, when target available in same node

2017-01-09 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
Component/s: namenode

> [SPS]: Local DN should be given preference as source node, when target 
> available in same node
> -
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Yuanbo Liu
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: HDFS-11293-HDFS-10285-00.patch, 
> HDFS-11293-HDFS-10285-01.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException.






[jira] [Updated] (HDFS-11293) [SPS]: Local DN should be given preference as source node, when target available in same node

2017-01-09 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
Affects Version/s: HDFS-10285

> [SPS]: Local DN should be given preference as source node, when target 
> available in same node
> -
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Yuanbo Liu
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: HDFS-11293-HDFS-10285-00.patch, 
> HDFS-11293-HDFS-10285-01.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException.






[jira] [Updated] (HDFS-10285) Storage Policy Satisfier in Namenode

2017-01-09 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-10285:
---
Affects Version/s: (was: 2.7.2)
   HDFS-10285
 Target Version/s:   (was: )
   Status: Patch Available  (was: Open)

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These 
> policies can be set on a directory/file to specify the user's preference for 
> where the physical blocks should be stored. When the user sets the storage 
> policy before writing data, the blocks can take advantage of the storage policy 
> preferences and be stored accordingly. 
> If the user sets the storage policy after writing and completing the file, the 
> blocks will have been written with the default storage policy (nothing but 
> DISK). The user then has to run the 'Mover tool' explicitly, specifying all such 
> file names as a list. In some distributed-system scenarios (e.g. HBase) it 
> would be difficult to collect all the files and run the tool, as different 
> nodes can write files separately and the files can have different paths.
> Another scenario is that when the user renames a file from a directory with an 
> effective storage policy (inherited from the parent directory) to a directory 
> with another storage policy, the inherited storage policy is not copied from 
> the source; the policy of the destination file/dir's parent takes effect 
> instead. This rename operation is just a metadata change in the Namenode; the 
> physical blocks still remain with the source storage policy.
> So, tracking all such business-logic-based file names from distributed nodes 
> (e.g. region servers) and running the Mover tool could be difficult for admins. 
> The proposal here is to provide an API from the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and process them into movement commands for the DNs. 
> Will post a detailed design document soon. 






[jira] [Updated] (HDFS-11289) [SPS]: Make SPS movement monitor timeouts configurable

2017-01-09 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11289:
---
Attachment: HDFS-11289-HDFS-10285-01.patch

Thank you [~rakeshr] for the review.
Attached new patch to address the comments.

> [SPS]: Make SPS movement monitor timeouts configurable
> --
>
> Key: HDFS-11289
> URL: https://issues.apache.org/jira/browse/HDFS-11289
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-11289-HDFS-10285-00.patch, 
> HDFS-11289-HDFS-10285-01.patch
>
>
> Currently the SPS tracking monitor timeouts are hardcoded. This is the JIRA for 
> making them configurable.
> {code}
>  // TODO: below selfRetryTimeout and checkTimeout can be configurable later
> // Now, the default values of selfRetryTimeout and checkTimeout are 30mins
> // and 5mins respectively
> this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems(
> 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded);
> {code}
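A sketch of what "making it configurable" typically looks like: read each timeout from a configuration key, falling back to the current hardcoded value as the default. The key names below are hypothetical (the actual keys are defined by the HDFS-11289 patch), and plain java.util.Properties stands in for Hadoop's Configuration.

```java
import java.util.Properties;

public class SpsMonitorConf {
  // Hypothetical key names; the real keys are defined by the HDFS-11289 patch.
  static final String CHECK_TIMEOUT_KEY =
      "dfs.storage.policy.satisfier.recheck.timeout.millis";
  static final String SELF_RETRY_TIMEOUT_KEY =
      "dfs.storage.policy.satisfier.self.retry.timeout.millis";
  // Defaults preserve the previously hardcoded values: 5 mins and 30 mins.
  static final long DEFAULT_CHECK_TIMEOUT = 5L * 60 * 1000;
  static final long DEFAULT_SELF_RETRY_TIMEOUT = 30L * 60 * 1000;

  /** Reads a long from conf, falling back to defaultValue when unset. */
  static long getLong(Properties conf, String key, long defaultValue) {
    String v = conf.getProperty(key);
    return (v == null || v.trim().isEmpty()) ? defaultValue
        : Long.parseLong(v.trim());
  }
}
```

The monitor constructor would then take the two resolved values instead of the literals 5 * 60 * 1000 and 30 * 60 * 1000.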






[jira] [Updated] (HDFS-11293) [SPS]: Local DN should be given prefernce as source node, when target available in same node

2017-01-08 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
Attachment: HDFS-11293-HDFS-10285-01.patch

Thank you, [~yuanbo], for reviewing the patch.
It's a good catch. Yes, we can remove the source element, as we already selected it.
Attached a patch that incorporates your test case. It also takes the opportunity 
to remove the unused parameter 'existing' from the code, and adds a check that 
the expected targets are met instead of depending on source-list iterations, for 
more correctness.

Please review the patch!

> [SPS]: Local DN should be given prefernce as source node, when target 
> available in same node
> 
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Yuanbo Liu
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: HDFS-11293-HDFS-10285-00.patch, 
> HDFS-11293-HDFS-10285-01.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException.






[jira] [Updated] (HDFS-11293) [SPS]: Local DN should be given prefernce as source node, when target available in same node

2017-01-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
Status: Patch Available  (was: Open)

> [SPS]: Local DN should be given prefernce as source node, when target 
> available in same node
> 
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Critical
> Attachments: HDFS-11293-HDFS-10285-00.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException and it's not a 
> correct behavior.






[jira] [Commented] (HDFS-11284) [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.

2017-01-07 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15808082#comment-15808082
 ] 

Uma Maheswara Rao G commented on HDFS-11284:


BTW, do you have a test case for #3? If yes, could you attach the test case 
alone as a patch here?
As per the discussion above, after a retry the item should be removed, as it 
should already have been satisfied.

> [SPS]: Avoid running SPS under safemode and fix issues in target node 
> choosing.
> ---
>
> Key: HDFS-11284
> URL: https://issues.apache.org/jira/browse/HDFS-11284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: TestSatisfier.java
>
>
> Recently I've found that in some conditions, SPS is not stable:
> * SPS runs under safe mode.
> * There are some overlapping nodes among the chosen target nodes.
> * The real replication count of the block doesn't match the replication factor. 
> For example, the real replication is 2 while the replication factor is 3.






[jira] [Commented] (HDFS-11150) [SPS]: Provide persistence when satisfying storage policy.

2017-01-07 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15808073#comment-15808073
 ] 

Uma Maheswara Rao G commented on HDFS-11150:


Hi [~yuanbo], thank you for your efforts, and for adding more test cases.
Here is my suggestion:
I prefer to separate the StoragePolicySatisfier changes from this JIRA (except 
the safemode check). We have other JIRAs raised to fix the node-choosing issues; 
please use them to fix any issues you find there. BTW, I have attached a patch 
to HDFS-11293. Please check whether it helps fix these issues. Feel free 
to review it and leave your feedback there.
Keep only the persistence-based test cases here. If anything else fails due to 
node choosing, add a comment about it; we can fix it as part of 
HDFS-11293/HDFS-11284.

> [SPS]: Provide persistence when satisfying storage policy.
> --
>
> Key: HDFS-11150
> URL: https://issues.apache.org/jira/browse/HDFS-11150
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-11150-HDFS-10285.001.patch, 
> HDFS-11150-HDFS-10285.002.patch, HDFS-11150-HDFS-10285.003.patch, 
> HDFS-11150-HDFS-10285.004.patch, HDFS-11150-HDFS-10285.005.patch, 
> editsStored, editsStored.xml
>
>
> Provide persistence for SPS in case that Hadoop cluster crashes by accident. 
> Basically we need to change EditLog and FsImage here.






[jira] [Updated] (HDFS-11293) [SPS] Local DN should be given prefernce as source node, when target available in same node

2017-01-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
Summary: [SPS] Local DN should be given prefernce as source node, when 
target available in same node  (was: [SPS] Local node should be given prefernce 
as source node, when target available in same node)

> [SPS] Local DN should be given prefernce as source node, when target 
> available in same node
> ---
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Critical
> Attachments: HDFS-11293-HDFS-10285-00.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException and it's not a 
> correct behavior.






[jira] [Updated] (HDFS-11293) [SPS]: Local DN should be given prefernce as source node, when target available in same node

2017-01-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
Summary: [SPS]: Local DN should be given prefernce as source node, when 
target available in same node  (was: [SPS] Local DN should be given prefernce 
as source node, when target available in same node)

> [SPS]: Local DN should be given prefernce as source node, when target 
> available in same node
> 
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Critical
> Attachments: HDFS-11293-HDFS-10285-00.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException and it's not a 
> correct behavior.






[jira] [Updated] (HDFS-11293) [SPS] Local node should be given prefernce as source node, when target available in same node

2017-01-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
Summary: [SPS] Local node should be given prefernce as source node, when 
target available in same node  (was: [SPS] Local node should be given prefernce 
as source node when target available in same node)

> [SPS] Local node should be given prefernce as source node, when target 
> available in same node
> -
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Critical
> Attachments: HDFS-11293-HDFS-10285-00.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException and it's not a 
> correct behavior.






[jira] [Updated] (HDFS-11293) [SPS] Local node should be given prefernce as source node when target available in same node

2017-01-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
Summary: [SPS] Local node should be given prefernce as source node when 
target available in same node  (was: FsDatasetImpl throws 
ReplicaAlreadyExistsException in a wrong situation)

> [SPS] Local node should be given prefernce as source node when target 
> available in same node
> 
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Critical
> Attachments: HDFS-11293-HDFS-10285-00.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException and it's not a 
> correct behavior.






[jira] [Comment Edited] (HDFS-11293) FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation

2017-01-07 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807317#comment-15807317
 ] 

Uma Maheswara Rao G edited comment on HDFS-11293 at 1/7/17 11:22 AM:
-

I spent some time finding the root cause of this issue.
Here is the reason for the failure:
We currently pick source nodes from the block storages, and we pick them 
randomly.
In the current case, the existing type [DISK] is available on all 3 nodes. 
Let's assume we have DN1[DISK, ARCHIVE], DN2[DISK, SSD], and DN3[DISK, 
RAM_DISK]. When we set the storage policy to ONE_SSD and then satisfy it, first 
we need to find the overlap nodes. From the overlap report, DISK is the existing 
type and SSD is the expected type. Since we pick source nodes only from the 
existing storages, all 3 nodes have the existing storage type [DISK] and can 
qualify as source nodes. 
{code}
for (StorageType existingType : existing) {
  iterator = existingBlockStorages.iterator();
  while (iterator.hasNext()) {
    DatanodeStorageInfo datanodeStorageInfo = iterator.next();
    StorageType storageType = datanodeStorageInfo.getStorageType();
    if (storageType == existingType) {
      iterator.remove();
      sourceWithStorageMap.add(new StorageTypeNodePair(storageType,
          datanodeStorageInfo.getDatanodeDescriptor()));
      break;
    }
  }
}
{code}

But in reality, if we choose DN1 or DN3 as the source node [with DISK], the 
target node [SSD] would obviously be DN2. Since DN2 already has a replica, the 
move fails with ReplicaAlreadyExistsException. 
Sometimes it may pass if DN2 happened to be picked as the source (this is 
possible because we pick any one node among the storages): when the source and 
target node are the same, we move the block locally.
{code}
try {
  // Move the block to different storage in the same datanode
  if (proxySource.equals(datanode.getDatanodeId())) {
    ReplicaInfo oldReplica = datanode.data.moveBlockAcrossStorage(block,
        storageType);
    if (oldReplica != null) {
      LOG.info("Moved " + block + " from StorageType "
          + oldReplica.getVolume().getStorageType() + " to " + storageType);
    }
  } else {
{code}

Attached a patch that first finds the nodes which have both the source type 
and the target type; the remaining sources are identified afterwards. 

I ran the test several times; it is passing consistently now.
 
[~yuanbo], thank you for sharing the test case by email. The included test 
case is yours and is passing now. Could you please verify?



[jira] [Updated] (HDFS-11293) FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation

2017-01-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
Attachment: HDFS-11293-HDFS-10285-00.patch


> FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation
> ---
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Critical
> Attachments: HDFS-11293-HDFS-10285-00.patch
>
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException and it's not a 
> correct behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11293) FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation

2017-01-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11293:
---
Issue Type: Sub-task  (was: Bug)
Parent: HDFS-10285

> FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation
> ---
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Critical
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException and it's not a 
> correct behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11293) FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation

2017-01-05 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802413#comment-15802413
 ] 

Uma Maheswara Rao G commented on HDFS-11293:


[~yuanbo], 

{quote}
A[DISK], not A[SSD].
{quote}
This should have been selected as part of chooseTargetInSameNode. If the 
target is in the same node, the move is handled a little differently. 

The relevant code executed in this case is in DataXceiver#replaceBlock:
{code}
// Move the block to different storage in the same datanode
if (proxySource.equals(datanode.getDatanodeId())) {
  ReplicaInfo oldReplica = datanode.data.moveBlockAcrossStorage(block,
      storageType);
  if (oldReplica != null) {
    LOG.info("Moved " + block + " from StorageType "
        + oldReplica.getVolume().getStorageType() + " to " + storageType);
  }
} else {
{code}

Can you confirm the code flow goes this way? It would be great if you could 
attach the test case here. Also, does this reproduce consistently?

> FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation
> ---
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Critical
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException and it's not a 
> correct behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11193) [SPS]: Erasure coded files should be considered for satisfying storage policy

2017-01-05 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11193:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-10285
   Status: Resolved  (was: Patch Available)

Thank you [~rakeshr], I have just pushed the patch to the branch.

> [SPS]: Erasure coded files should be considered for satisfying storage policy
> -
>
> Key: HDFS-11193
> URL: https://issues.apache.org/jira/browse/HDFS-11193
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: HDFS-10285
>
> Attachments: HDFS-11193-HDFS-10285-00.patch, 
> HDFS-11193-HDFS-10285-01.patch, HDFS-11193-HDFS-10285-02.patch, 
> HDFS-11193-HDFS-10285-03.patch, HDFS-11193-HDFS-10285-04.patch
>
>
> Erasure coded striped files supports storage policies {{HOT, COLD, ALLSSD}}. 
> {{HdfsAdmin#satisfyStoragePolicy}} API call on a directory should consider 
> all immediate files under that directory and need to check that, the files 
> really matching with namespace storage policy. All the mismatched striped 
> blocks should be chosen for block movement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11293) FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation

2017-01-05 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800742#comment-15800742
 ] 

Uma Maheswara Rao G commented on HDFS-11293:


[~yuanbo], I am wondering how 'A' was chosen as a target when a replica is 
already on that node. The scheduling is wrong if that is happening, right? 
Could you explain your scenario in a little more detail?

> FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation
> ---
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Critical
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica 
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica 
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at 
> this time, datanode A throws ReplicaAlreadyExistsException and it's not a 
> correct behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11193) [SPS]: Erasure coded files should be considered for satisfying storage policy

2017-01-04 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800153#comment-15800153
 ] 

Uma Maheswara Rao G commented on HDFS-11193:


[~rakeshr] The reported checkstyle error is related, and the ASF license 
header comment is invalid.
Also, could you please check whether the test failures are related? 

> [SPS]: Erasure coded files should be considered for satisfying storage policy
> -
>
> Key: HDFS-11193
> URL: https://issues.apache.org/jira/browse/HDFS-11193
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11193-HDFS-10285-00.patch, 
> HDFS-11193-HDFS-10285-01.patch, HDFS-11193-HDFS-10285-02.patch, 
> HDFS-11193-HDFS-10285-03.patch
>
>
> Erasure coded striped files supports storage policies {{HOT, COLD, ALLSSD}}. 
> {{HdfsAdmin#satisfyStoragePolicy}} API call on a directory should consider 
> all immediate files under that directory and need to check that, the files 
> really matching with namespace storage policy. All the mismatched striped 
> blocks should be chosen for block movement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11193) [SPS]: Erasure coded files should be considered for satisfying storage policy

2017-01-04 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799327#comment-15799327
 ] 

Uma Maheswara Rao G commented on HDFS-11193:


Latest patch looks good to me. +1, pending Jenkins.

> [SPS]: Erasure coded files should be considered for satisfying storage policy
> -
>
> Key: HDFS-11193
> URL: https://issues.apache.org/jira/browse/HDFS-11193
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11193-HDFS-10285-00.patch, 
> HDFS-11193-HDFS-10285-01.patch, HDFS-11193-HDFS-10285-02.patch, 
> HDFS-11193-HDFS-10285-03.patch
>
>
> Erasure coded striped files supports storage policies {{HOT, COLD, ALLSSD}}. 
> {{HdfsAdmin#satisfyStoragePolicy}} API call on a directory should consider 
> all immediate files under that directory and need to check that, the files 
> really matching with namespace storage policy. All the mismatched striped 
> blocks should be chosen for block movement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11289) [SPS]: Make SPS movement monitor timeouts configurable

2017-01-03 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11289:
---
Status: Patch Available  (was: Open)

> [SPS]: Make SPS movement monitor timeouts configurable
> --
>
> Key: HDFS-11289
> URL: https://issues.apache.org/jira/browse/HDFS-11289
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-11289-HDFS-10285-00.patch
>
>
> Currently SPS tracking monitor timeouts were hardcoded. This is the JIRA for 
> making it configurable.
> {code}
>  // TODO: below selfRetryTimeout and checkTimeout can be configurable later
> // Now, the default values of selfRetryTimeout and checkTimeout are 30mins
> // and 5mins respectively
> this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems(
> 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11289) [SPS]: Make SPS movement monitor timeouts configurable

2017-01-03 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11289:
---
Attachment: HDFS-11289-HDFS-10285-00.patch

Attached a simple patch making the mentioned values configurable.
Please review.

> [SPS]: Make SPS movement monitor timeouts configurable
> --
>
> Key: HDFS-11289
> URL: https://issues.apache.org/jira/browse/HDFS-11289
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-11289-HDFS-10285-00.patch
>
>
> Currently SPS tracking monitor timeouts were hardcoded. This is the JIRA for 
> making it configurable.
> {code}
>  // TODO: below selfRetryTimeout and checkTimeout can be configurable later
> // Now, the default values of selfRetryTimeout and checkTimeout are 30mins
> // and 5mins respectively
> this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems(
> 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11289) [SPS] Make SPS movement monitor timeouts configurable

2017-01-03 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11289:
---
Summary: [SPS] Make SPS movement monitor timeouts configurable  (was: Make 
SPS movement monitor timeouts configurable)

> [SPS] Make SPS movement monitor timeouts configurable
> -
>
> Key: HDFS-11289
> URL: https://issues.apache.org/jira/browse/HDFS-11289
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>
> Currently SPS tracking monitor timeouts were hardcoded. This is the JIRA for 
> making it configurable.
> {code}
>  // TODO: below selfRetryTimeout and checkTimeout can be configurable later
> // Now, the default values of selfRetryTimeout and checkTimeout are 30mins
> // and 5mins respectively
> this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems(
> 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11289) [SPS]: Make SPS movement monitor timeouts configurable

2017-01-03 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11289:
---
Summary: [SPS]: Make SPS movement monitor timeouts configurable  (was: 
[SPS] Make SPS movement monitor timeouts configurable)

> [SPS]: Make SPS movement monitor timeouts configurable
> --
>
> Key: HDFS-11289
> URL: https://issues.apache.org/jira/browse/HDFS-11289
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>
> Currently SPS tracking monitor timeouts were hardcoded. This is the JIRA for 
> making it configurable.
> {code}
>  // TODO: below selfRetryTimeout and checkTimeout can be configurable later
> // Now, the default values of selfRetryTimeout and checkTimeout are 30mins
> // and 5mins respectively
> this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems(
> 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11284) [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.

2017-01-03 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796425#comment-15796425
 ] 

Uma Maheswara Rao G edited comment on HDFS-11284 at 1/3/17 10:46 PM:
-

Hi [~yuanbo], the retry will not happen in the DN itself. 
When the DN reports a movement result as failed, the NN takes care of 
retrying (HDFS-11029). At that point, if the NN finds all the existing blocks 
already satisfied, those items are ignored and not sent for movement again. If 
satisfaction is still needed, the items are sent again with newly found 
sources and targets. The default retry time is 30 minutes. (The timeout is 
high because a DN on a slow node can take a long time to send results back, in 
which case the NN would retry unnecessarily. This can be refined with more 
testing.)
Hope this helps you understand better.

{quote}
Agree, I will go back to HDFS-11150. Since #2 has been addressed, the last 
issue seems belong to retry mechanism. I'm thinking about removing/changing 
this JIRA.
{quote}
Please keep this JIRA open until you agree on the reason.
Can you confirm one point from your logs: was the block deleted due to 
over-replication while the same node was used for the movement (as the 
movement was scheduled earlier)? If that is the case, the behavior is fine. 
Also, can you confirm (by looking at the logs) that the remaining block 
movements were successful?
Anyway, please go ahead with HDFS-11150. There were some test failures related 
to it; can you please check?

Thanks a lot for putting in the effort. 

 



> [SPS]: Avoid running SPS under safemode and fix issues in target node 
> choosing.
> ---
>
> Key: HDFS-11284
> URL: https://issues.apache.org/jira/browse/HDFS-11284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: TestSatisfier.java
>
>
> Recently I've found in some conditions, SPS is not stable:
> * SPS runs under safe mode.
> * There're some overlap nodes in the chosen target nodes.
> * The real replication number of block doesn't match the replication factor. 
> For example, the real replication is 2 while the replication factor is 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11289) Make SPS movement monitor timeouts configurable

2017-01-03 Thread Uma Maheswara Rao G (JIRA)
Uma Maheswara Rao G created HDFS-11289:
--

 Summary: Make SPS movement monitor timeouts configurable
 Key: HDFS-11289
 URL: https://issues.apache.org/jira/browse/HDFS-11289
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-10285
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G


Currently the SPS tracking monitor timeouts are hardcoded. This JIRA makes 
them configurable.

{code}
 // TODO: below selfRetryTimeout and checkTimeout can be configurable later
// Now, the default values of selfRetryTimeout and checkTimeout are 30mins
// and 5mins respectively
this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems(
5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded);
{code}
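For illustration, a minimal sketch of what a configurable version could look like (the configuration key names and the `Configuration`-like stand-in below are assumptions for this sketch, not the actual patch): the hardcoded constants become defaults that an operator can override.

```java
import java.util.HashMap;
import java.util.Map;

public class MonitorConf {
  // Key names are illustrative assumptions, not necessarily the patch's names.
  static final String CHECK_TIMEOUT_KEY =
      "dfs.storage.policy.satisfier.recheck.timeout.millis";
  static final long CHECK_TIMEOUT_DEFAULT = 5 * 60 * 1000L;       // 5 mins
  static final String SELF_RETRY_TIMEOUT_KEY =
      "dfs.storage.policy.satisfier.self.retry.timeout.millis";
  static final long SELF_RETRY_TIMEOUT_DEFAULT = 30 * 60 * 1000L; // 30 mins

  // Minimal stand-in for Configuration#getLong(key, defaultValue).
  static long getLong(Map<String, Long> conf, String key, long dflt) {
    return conf.getOrDefault(key, dflt);
  }

  public static void main(String[] args) {
    Map<String, Long> conf = new HashMap<>();
    conf.put(CHECK_TIMEOUT_KEY, 60_000L); // operator override: 1 min
    long check = getLong(conf, CHECK_TIMEOUT_KEY, CHECK_TIMEOUT_DEFAULT);
    long retry = getLong(conf, SELF_RETRY_TIMEOUT_KEY,
        SELF_RETRY_TIMEOUT_DEFAULT);
    // The monitor would then be constructed with (check, retry, ...) instead
    // of the hardcoded 5 * 60 * 1000 and 30 * 60 * 1000.
    System.out.println(check + " " + retry); // prints: 60000 1800000
  }
}
```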



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11284) [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.

2017-01-03 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796425#comment-15796425
 ] 

Uma Maheswara Rao G commented on HDFS-11284:


Hi [~yuanbo], the retry will not happen in the DN itself. 
When the DN reports a movement result as failed, the NN takes care of 
retrying. At that point, if the NN finds all the existing blocks already 
satisfied, those items are ignored and not sent for movement again. If 
satisfaction is still needed, the items are sent again with newly found 
sources and targets. The default retry time is 30 minutes. (The timeout is 
high because a DN on a slow node can take a long time to send results back, in 
which case the NN would retry unnecessarily. This can be refined with more 
testing.)
Hope this helps you understand better.
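As a toy model of this NN-side retry behavior (invented names; not the actual BlockStorageMovementAttemptedItems code), an attempted item is re-queued only after its self-retry timeout has elapsed and it is still not satisfied:

```java
import java.util.*;

public class RetryMonitor {
  static final long SELF_RETRY_TIMEOUT_MS = 30 * 60 * 1000L; // default 30 mins

  static class AttemptedItem {
    final long blockCollectionId;
    final long attemptedAtMs;
    AttemptedItem(long id, long at) { blockCollectionId = id; attemptedAtMs = at; }
  }

  /** Returns the ids that should be re-queued for movement at time nowMs. */
  static List<Long> itemsToRetry(List<AttemptedItem> attempted,
                                 Set<Long> alreadySatisfied, long nowMs) {
    List<Long> retry = new ArrayList<>();
    for (AttemptedItem item : attempted) {
      if (alreadySatisfied.contains(item.blockCollectionId)) {
        continue; // all blocks already satisfied; item is dropped
      }
      if (nowMs - item.attemptedAtMs >= SELF_RETRY_TIMEOUT_MS) {
        retry.add(item.blockCollectionId); // find new src/targets again
      }
    }
    return retry;
  }

  public static void main(String[] args) {
    List<AttemptedItem> attempted = Arrays.asList(
        new AttemptedItem(1L, 0L), new AttemptedItem(2L, 0L));
    Set<Long> satisfied = Collections.singleton(2L);
    // After 31 minutes, only item 1 (still unsatisfied) is retried.
    System.out.println(itemsToRetry(attempted, satisfied, 31 * 60 * 1000L));
    // prints: [1]
  }
}
```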

{quote}
Agree, I will go back to HDFS-11150. Since #2 has been addressed, the last 
issue seems belong to retry mechanism. I'm thinking about removing/changing 
this JIRA.
{quote}
Please keep this JIRA open until you agree on the reason.
Can you confirm one point from your logs: was the block deleted due to 
over-replication while the same node was used for the movement (as the 
movement was scheduled earlier)? If that is the case, the behavior is fine. 
Also, can you confirm (by looking at the logs) that the remaining block 
movements were successful?
Anyway, please go ahead with HDFS-11150. There were some test failures related 
to it; can you please check?

Thanks a lot for putting in the effort. 

 

> [SPS]: Avoid running SPS under safemode and fix issues in target node 
> choosing.
> ---
>
> Key: HDFS-11284
> URL: https://issues.apache.org/jira/browse/HDFS-11284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: TestSatisfier.java
>
>
> Recently I've found in some conditions, SPS is not stable:
> * SPS runs under safe mode.
> * There're some overlap nodes in the chosen target nodes.
> * The real replication number of block doesn't match the replication factor. 
> For example, the real replication is 2 while the replication factor is 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11243) [SPS]: Add a protocol command from NN to DN for dropping the SPS work and queues

2017-01-01 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11243:
---
Status: Patch Available  (was: Open)

> [SPS]: Add a protocol command from NN to DN for dropping the SPS work and 
> queues 
> -
>
> Key: HDFS-11243
> URL: https://issues.apache.org/jira/browse/HDFS-11243
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-11243-HDFS-10285-00.patch
>
>
> This JIRA is for adding a protocol command from Namenode to Datanode for 
> dropping SPS work. and Also for dropping in progress queues.
> Use case is: when admin deactivated SPS at NN, then internally NN should 
> issue a command to DNs for dropping in progress queues as well. This command 
> can be packed via heartbeat. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11243) [SPS]: Add a protocol command from NN to DN for dropping the SPS work and queues

2017-01-01 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11243:
---
Attachment: HDFS-11243-HDFS-10285-00.patch

Attached an initial patch to address this. Please review.

> [SPS]: Add a protocol command from NN to DN for dropping the SPS work and 
> queues 
> -
>
> Key: HDFS-11243
> URL: https://issues.apache.org/jira/browse/HDFS-11243
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-11243-HDFS-10285-00.patch
>
>
> This JIRA is for adding a protocol command from Namenode to Datanode for 
> dropping SPS work, and also for dropping in-progress queues.
> The use case is: when an admin deactivates SPS at the NN, the NN should 
> internally issue a command to the DNs to drop their in-progress queues as 
> well. This command can be packed into the heartbeat. 






[jira] [Commented] (HDFS-11284) [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.

2016-12-31 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15790557#comment-15790557
 ] 

Uma Maheswara Rao G commented on HDFS-11284:


Thanks for the details.
So, except for the first issue, this JIRA cannot be a blocker for HDFS-11150. 
I suggest fixing #1 as part of HDFS-11150 and proceeding, as that is the more 
important task. #3 is a special condition where you made the file 
under-replicated; #3 alone you can discuss and fix here.

For #3, if I understand the issue correctly, the point is that we can only 
satisfy the policy for existing replicas. In the case of under-replicated 
blocks, the NN will try to replicate the block later, and while doing so it 
would choose the block location as per the new policy set at the NN. Can you 
confirm that this is not happening?

> [SPS]: Avoid running SPS under safemode and fix issues in target node 
> choosing.
> ---
>
> Key: HDFS-11284
> URL: https://issues.apache.org/jira/browse/HDFS-11284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>
> Recently I've found that in some conditions SPS is not stable:
> * SPS runs under safe mode.
> * There are overlapping nodes among the chosen target nodes.
> * The real replication number of the block doesn't match the replication 
> factor. For example, the real replication is 2 while the replication factor is 3.






[jira] [Commented] (HDFS-11284) [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.

2016-12-30 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15788537#comment-15788537
 ] 

Uma Maheswara Rao G commented on HDFS-11284:


[~yuanbo], thank you for raising the JIRA. Can you provide more details on #2 
and #3? 

> [SPS]: Avoid running SPS under safemode and fix issues in target node 
> choosing.
> ---
>
> Key: HDFS-11284
> URL: https://issues.apache.org/jira/browse/HDFS-11284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>
> Recently I've found that in some conditions SPS is not stable:
> * SPS runs under safe mode.
> * There are overlapping nodes among the chosen target nodes.
> * The real replication number of the block doesn't match the replication 
> factor. For example, the real replication is 2 while the replication factor is 3.






[jira] [Updated] (HDFS-11284) [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.

2016-12-30 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11284:
---
Summary: [SPS]: Avoid running SPS under safemode and fix issues in target 
node choosing.  (was: [SPS]: Avoid running SPS under safemode and improve 
target node choosing conditions in SPS.)

> [SPS]: Avoid running SPS under safemode and fix issues in target node 
> choosing.
> ---
>
> Key: HDFS-11284
> URL: https://issues.apache.org/jira/browse/HDFS-11284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>
> Recently I've found that in some conditions SPS is not stable:
> * SPS runs under safe mode.
> * There are overlapping nodes among the chosen target nodes.
> * The real replication number of the block doesn't match the replication 
> factor. For example, the real replication is 2 while the replication factor is 3.






[jira] [Updated] (HDFS-11284) [SPS]: Avoid running SPS under safemode and improve target node choosing conditions in SPS.

2016-12-30 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11284:
---
Summary: [SPS]: Avoid running SPS under safemode and improve target node 
choosing conditions in SPS.  (was: [SPS]: Improve the stability of Storage 
Policy Satisfier.)

> [SPS]: Avoid running SPS under safemode and improve target node choosing 
> conditions in SPS.
> ---
>
> Key: HDFS-11284
> URL: https://issues.apache.org/jira/browse/HDFS-11284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>
> Recently I've found that in some conditions SPS is not stable:
> * SPS runs under safe mode.
> * There are overlapping nodes among the chosen target nodes.
> * The real replication number of the block doesn't match the replication 
> factor. For example, the real replication is 2 while the replication factor is 3.






[jira] [Updated] (HDFS-11284) [SPS]: Avoid running SPS under safemode and improve target node choosing conditions in SPS.

2016-12-30 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11284:
---
Description: 
Recently I've found that in some conditions SPS is not stable:
* SPS runs under safe mode.
* There are overlapping nodes among the chosen target nodes.
* The real replication number of the block doesn't match the replication 
factor. For example, the real replication is 2 while the replication factor is 3.

  was:
Recently I've found in some conditions, SPS is not stable:
* SPS runs under safe model.
* There're some overlap nodes in the chosen target nodes.
* The real replication number of block doesn't match the replication factor. 
For example, the real replication is 2 while the replication factor is 3.


> [SPS]: Avoid running SPS under safemode and improve target node choosing 
> conditions in SPS.
> ---
>
> Key: HDFS-11284
> URL: https://issues.apache.org/jira/browse/HDFS-11284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>
> Recently I've found that in some conditions SPS is not stable:
> * SPS runs under safe mode.
> * There are overlapping nodes among the chosen target nodes.
> * The real replication number of the block doesn't match the replication 
> factor. For example, the real replication is 2 while the replication factor is 3.






[jira] [Updated] (HDFS-11248) [SPS]: Handle partial block location movements

2016-12-28 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11248:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-10285
   Status: Resolved  (was: Patch Available)

I have just committed this to the branch.

> [SPS]: Handle partial block location movements
> --
>
> Key: HDFS-11248
> URL: https://issues.apache.org/jira/browse/HDFS-11248
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: HDFS-10285
>
> Attachments: HDFS-11248-HDFS-10285-00.patch, 
> HDFS-11248-HDFS-10285-01.patch, HDFS-11248-HDFS-10285-02.patch, 
> HDFS-11248-HDFS-10285-03.patch, HDFS-11248-HDFS-10285-04.patch, 
> HDFS-11248-HDFS-10285-05.patch
>
>
> This jira is to handle partial block location movements due to the 
> unavailability of target nodes for the matching storage type. 
> For example, assume only A(disk,archive), B(disk), and C(disk,archive) are 
> live nodes, with A & C having archive storage. Say we have a block with 
> locations {{A(disk), B(disk), C(disk)}}, and the user changed the storage 
> policy to COLD. SPS internally starts preparing the src-target pairing like 
> {{src=> (A, B, C) and target=> (A, C)}} and sends BLOCK_STORAGE_MOVEMENT to 
> the coordinator. SPS skips B as it doesn't have archive media, to indicate 
> that it should retry later to satisfy all block locations. On receiving the 
> movement command, the coordinator will pair the src-target nodes to schedule 
> the actual physical movements like {{movetask=> (A, A), (B, C)}}. Ideally it 
> should do {{(C, C)}} instead of {{(B, C)}}, but it mistakenly chooses the 
> source and creates a problem.
> IMHO, the implicit assumption that a retry is needed creates confusion and 
> leads to coding mistakes. One idea to fix this problem is to add a new 
> {{retryNeeded}} flag to make it more readable. With this, SPS will prepare 
> only the matching pairs and dummy source slots will be avoided, like 
> {{src=> (A, C) and target=> (A, C)}}, marking {{retryNeeded=true}} to convey 
> that this {{trackId}} has only partial block movements.
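The pairing scheme described above can be sketched in plain Java. This is a minimal illustrative model, not the actual SPS code; the names `buildPlan`, `Plan`, and `Pair` are invented here. The key point is that pairs are emitted only for nodes that actually offer the required storage type, and `retryNeeded` records that some locations were left unsatisfied instead of emitting dummy source slots.

```java
import java.util.*;

public class PairingSketch {
    enum StorageType { DISK, ARCHIVE }

    static class Pair {
        final String src, target;
        Pair(String src, String target) { this.src = src; this.target = target; }
    }

    static class Plan {
        final List<Pair> pairs = new ArrayList<>();
        boolean retryNeeded = false;
    }

    // For each replica that must move, pair it only with a node that actually
    // offers the required storage type; otherwise flag the plan for retry.
    static Plan buildPlan(List<String> replicaNodes,
                          Map<String, Set<StorageType>> nodeStorages,
                          StorageType required) {
        Plan plan = new Plan();
        for (String node : replicaNodes) {
            if (nodeStorages.getOrDefault(node, Set.of()).contains(required)) {
                // Node has the required media locally: schedule a local move.
                plan.pairs.add(new Pair(node, node));
            } else {
                // No matching target right now; do not emit a dummy pair.
                plan.retryNeeded = true;
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Set<StorageType>> storages = Map.of(
            "A", Set.of(StorageType.DISK, StorageType.ARCHIVE),
            "B", Set.of(StorageType.DISK),
            "C", Set.of(StorageType.DISK, StorageType.ARCHIVE));
        Plan plan = buildPlan(List.of("A", "B", "C"), storages,
            StorageType.ARCHIVE);
        // Only (A,A) and (C,C) are scheduled; B is left for a later retry.
        System.out.println(plan.pairs.size() + " " + plan.retryNeeded); // 2 true
    }
}
```

In the JIRA's example this yields exactly the proposed `src=> (A, C) and target=> (A, C)` with `retryNeeded=true`.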






[jira] [Commented] (HDFS-11248) [SPS]: Handle partial block location movements

2016-12-28 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784738#comment-15784738
 ] 

Uma Maheswara Rao G commented on HDFS-11248:


+1 on the latest patch

> [SPS]: Handle partial block location movements
> --
>
> Key: HDFS-11248
> URL: https://issues.apache.org/jira/browse/HDFS-11248
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11248-HDFS-10285-00.patch, 
> HDFS-11248-HDFS-10285-01.patch, HDFS-11248-HDFS-10285-02.patch, 
> HDFS-11248-HDFS-10285-03.patch, HDFS-11248-HDFS-10285-04.patch, 
> HDFS-11248-HDFS-10285-05.patch
>
>
> This jira is to handle partial block location movements due to the 
> unavailability of target nodes for the matching storage type. 
> For example, assume only A(disk,archive), B(disk), and C(disk,archive) are 
> live nodes, with A & C having archive storage. Say we have a block with 
> locations {{A(disk), B(disk), C(disk)}}, and the user changed the storage 
> policy to COLD. SPS internally starts preparing the src-target pairing like 
> {{src=> (A, B, C) and target=> (A, C)}} and sends BLOCK_STORAGE_MOVEMENT to 
> the coordinator. SPS skips B as it doesn't have archive media, to indicate 
> that it should retry later to satisfy all block locations. On receiving the 
> movement command, the coordinator will pair the src-target nodes to schedule 
> the actual physical movements like {{movetask=> (A, A), (B, C)}}. Ideally it 
> should do {{(C, C)}} instead of {{(B, C)}}, but it mistakenly chooses the 
> source and creates a problem.
> IMHO, the implicit assumption that a retry is needed creates confusion and 
> leads to coding mistakes. One idea to fix this problem is to add a new 
> {{retryNeeded}} flag to make it more readable. With this, SPS will prepare 
> only the matching pairs and dummy source slots will be avoided, like 
> {{src=> (A, C) and target=> (A, C)}}, marking {{retryNeeded=true}} to convey 
> that this {{trackId}} has only partial block movements.






[jira] [Commented] (HDFS-11150) [SPS]: Provide persistence when satisfying storage policy.

2016-12-28 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784154#comment-15784154
 ] 

Uma Maheswara Rao G commented on HDFS-11150:


[~yuanbo] It's a good point. Since this JIRA adds persistence, the restart case 
will be supported only after this JIRA, so we should handle this case along 
with it. If we didn't support restart, all items in storageMovementNeeded 
would be dropped after a restart, and new calls would anyway be accepted only 
after safemode is left. So this issue arises only once we start supporting 
restart. 
I agree to add the safemode check. 

Another JIRA we need to consider: after a Namenode restart, datanodes will be 
reconnecting. I think they should drop all their work and start fresh, as the 
NN will start scanning all items freshly. We can discuss that JIRA further 
after this persistence task is in.


> [SPS]: Provide persistence when satisfying storage policy.
> --
>
> Key: HDFS-11150
> URL: https://issues.apache.org/jira/browse/HDFS-11150
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-11150-HDFS-10285.001.patch, 
> HDFS-11150-HDFS-10285.002.patch, HDFS-11150-HDFS-10285.003.patch, 
> HDFS-11150-HDFS-10285.004.patch, editsStored, editsStored.xml
>
>
> Provide persistence for SPS in case that Hadoop cluster crashes by accident. 
> Basically we need to change EditLog and FsImage here.






[jira] [Commented] (HDFS-11248) [SPS]: Handle partial block location movements

2016-12-28 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784094#comment-15784094
 ] 

Uma Maheswara Rao G commented on HDFS-11248:


[~rakeshr], Looks like my comment 2 still exists. Can you check and fix it?

> [SPS]: Handle partial block location movements
> --
>
> Key: HDFS-11248
> URL: https://issues.apache.org/jira/browse/HDFS-11248
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11248-HDFS-10285-00.patch, 
> HDFS-11248-HDFS-10285-01.patch, HDFS-11248-HDFS-10285-02.patch, 
> HDFS-11248-HDFS-10285-03.patch, HDFS-11248-HDFS-10285-04.patch
>
>
> This jira is to handle partial block location movements due to the 
> unavailability of target nodes for the matching storage type. 
> For example, assume only A(disk,archive), B(disk), and C(disk,archive) are 
> live nodes, with A & C having archive storage. Say we have a block with 
> locations {{A(disk), B(disk), C(disk)}}, and the user changed the storage 
> policy to COLD. SPS internally starts preparing the src-target pairing like 
> {{src=> (A, B, C) and target=> (A, C)}} and sends BLOCK_STORAGE_MOVEMENT to 
> the coordinator. SPS skips B as it doesn't have archive media, to indicate 
> that it should retry later to satisfy all block locations. On receiving the 
> movement command, the coordinator will pair the src-target nodes to schedule 
> the actual physical movements like {{movetask=> (A, A), (B, C)}}. Ideally it 
> should do {{(C, C)}} instead of {{(B, C)}}, but it mistakenly chooses the 
> source and creates a problem.
> IMHO, the implicit assumption that a retry is needed creates confusion and 
> leads to coding mistakes. One idea to fix this problem is to add a new 
> {{retryNeeded}} flag to make it more readable. With this, SPS will prepare 
> only the matching pairs and dummy source slots will be avoided, like 
> {{src=> (A, C) and target=> (A, C)}}, marking {{retryNeeded=true}} to convey 
> that this {{trackId}} has only partial block movements.






[jira] [Commented] (HDFS-11239) [SPS]: Check Mover file ID lease also to determine whether Mover is running

2016-12-27 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781485#comment-15781485
 ] 

Uma Maheswara Rao G commented on HDFS-11239:


# .
{code}
+  public boolean isFileOpenedForWrite(String path) {
{code}
Can you please add the @Override annotation?
# .
{code}
 @Test(timeout = 2)
+  @Test(timeout = 12)
   public void testClusterIdMismatchAtStartupWithHA() throws Exception {
{code}
Is this failure related to this patch? If not, could you please file a 
separate JIRA for that failure? This is to avoid unrelated changes in this patch.

> [SPS]: Check Mover file ID lease also to determine whether Mover is running
> ---
>
> Key: HDFS-11239
> URL: https://issues.apache.org/jira/browse/HDFS-11239
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Wei Zhou
>Assignee: Wei Zhou
> Attachments: HDFS-11239-HDFS-10285.00.patch, 
> HDFS-11239-HDFS-10285.01.patch, HDFS-11239-HDFS-10285.02.patch
>
>
> Currently SPS only checks the Mover ID file's existence to determine whether 
> a Mover is running. This can be an issue when the Mover exits unexpectedly 
> without deleting the ID file, which then stops SPS from functioning. This is 
> a follow-on to HDFS-10885, where we bypassed this due to some implementation 
> problems. This issue can be fixed after HDFS-11123.






[jira] [Updated] (HDFS-11032) [SPS]: Handling of block movement failure at the coordinator datanode

2016-12-22 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11032:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-10285
   Status: Resolved  (was: Patch Available)

I have just pushed this to the branch.

> [SPS]: Handling of block movement failure at the coordinator datanode
> -
>
> Key: HDFS-11032
> URL: https://issues.apache.org/jira/browse/HDFS-11032
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: HDFS-10285
>
> Attachments: HDFS-11032-HDFS-10285-00.patch, 
> HDFS-11032-HDFS-10285-01.patch
>
>
> The idea of this jira is to discuss and implement efficient failure (block 
> movement failure) handling logic at the datanode coordinator.  [Code 
> reference|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L243].
> Following are the possible errors during block movement:
> # Block pinned - no retries; marked as success/no-retry to NN. It is not 
> possible to relocate this block to another datanode.
> # Network errors (IOException) - no retries; marked as failure/retry to NN.
> # No disk space (IOException) - no retries; marked as failure/retry to NN.
> # Gen_Stamp mismatches - no retries; marked as failure/retry to NN. Could be 
> a case where the file has been re-opened.
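The error-to-status mapping listed above can be sketched as follows. The enum and method names here are hypothetical, not the real Hadoop classes; the point is simply that only the pinned-block case is terminal, while the remaining conditions are reported back to the NN as failure/retry.

```java
public class MovementFailureSketch {
    // Status reported back to the NN for an attempted block movement.
    enum Status { SUCCESS_NO_RETRY, FAILURE_RETRY }

    // The four error classes enumerated in the JIRA description.
    enum ErrorKind { BLOCK_PINNED, NETWORK_IO, NO_DISK_SPACE, GEN_STAMP_MISMATCH }

    static Status report(ErrorKind kind) {
        switch (kind) {
            case BLOCK_PINNED:
                // A pinned block can never be relocated, so retrying is pointless;
                // report success/no-retry so the NN stops tracking it.
                return Status.SUCCESS_NO_RETRY;
            default:
                // Transient conditions (network, disk space, genstamp mismatch):
                // ask the NN to retry the movement later.
                return Status.FAILURE_RETRY;
        }
    }

    public static void main(String[] args) {
        System.out.println(report(ErrorKind.BLOCK_PINNED)); // SUCCESS_NO_RETRY
        System.out.println(report(ErrorKind.NETWORK_IO));   // FAILURE_RETRY
    }
}
```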






[jira] [Commented] (HDFS-11032) [SPS]: Handling of block movement failure at the coordinator datanode

2016-12-22 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771512#comment-15771512
 ] 

Uma Maheswara Rao G commented on HDFS-11032:


+1 on the patch. Changes look good to me.

> [SPS]: Handling of block movement failure at the coordinator datanode
> -
>
> Key: HDFS-11032
> URL: https://issues.apache.org/jira/browse/HDFS-11032
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11032-HDFS-10285-00.patch, 
> HDFS-11032-HDFS-10285-01.patch
>
>
> The idea of this jira is to discuss and implement efficient failure (block 
> movement failure) handling logic at the datanode coordinator.  [Code 
> reference|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L243].
> Following are the possible errors during block movement:
> # Block pinned - no retries; marked as success/no-retry to NN. It is not 
> possible to relocate this block to another datanode.
> # Network errors (IOException) - no retries; marked as failure/retry to NN.
> # No disk space (IOException) - no retries; marked as failure/retry to NN.
> # Gen_Stamp mismatches - no retries; marked as failure/retry to NN. Could be 
> a case where the file has been re-opened.






[jira] [Updated] (HDFS-11244) [SPS]: Limit the number of satisfyStoragePolicy items at Namenode

2016-12-21 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11244:
---
Summary: [SPS]: Limit the number of satisfyStoragePolicy items at Namenode  
(was: [SPS]: Limit the number satisfyStoragePolicy items at Namenode)

> [SPS]: Limit the number of satisfyStoragePolicy items at Namenode
> -
>
> Key: HDFS-11244
> URL: https://issues.apache.org/jira/browse/HDFS-11244
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>
> This JIRA is to provide a provision to limit the number of pending 
> satisfyStoragePolicy queue items. If we don't limit this number, and users 
> keep calling more and more while the DNs are slow processing machines, then 
> the NN-side queues can grow unbounded. So it may be good to have an option to 
> limit incoming requests for satisfyStoragePolicy. Maybe a default of 10K?  
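A rough sketch of the proposed cap on pending requests. The 10K default is only floated in the description, and the class, method, and return-value conventions here are invented for illustration; the real implementation would live in the NN and likely surface the rejection as an exception to the client.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class SpsQueueLimitSketch {
    private final int maxPending;
    private final Queue<Long> pendingTrackIds = new ArrayDeque<>();

    SpsQueueLimitSketch(int maxPending) {
        this.maxPending = maxPending; // e.g. the floated default of 10_000
    }

    // Reject new requests instead of letting the NN-side queue grow unbounded.
    synchronized boolean offer(long trackId) {
        if (pendingTrackIds.size() >= maxPending) {
            return false; // caller could translate this into an IOException
        }
        return pendingTrackIds.add(trackId);
    }

    public static void main(String[] args) {
        SpsQueueLimitSketch q = new SpsQueueLimitSketch(2);
        System.out.println(q.offer(1) + " " + q.offer(2) + " " + q.offer(3));
        // prints: true true false
    }
}
```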






[jira] [Commented] (HDFS-11239) [SPS]: Check Mover file ID lease also to determine whether Mover is running

2016-12-21 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768522#comment-15768522
 ] 

Uma Maheswara Rao G commented on HDFS-11239:


Thanks [~zhouwei] for updating the patch!
{code}
 return ((FSNamesystem)namesystem).isFileOpenedForWrite(moverId);
{code}
Please don't cast here. You can add an API to the internal interface 
Namesystem.java. 
This interface is intended for communication between the namespace layer and 
the block management layer. Avoid tight coupling between the block management 
and namespace layers.

Thanks
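The suggestion can be sketched like this. This is a simplified model, not the actual Hadoop `Namesystem` interface or `FSNamesystem` class; the stub body is illustrative (the real implementation would consult the lease manager). The point is that callers in the block management layer depend only on the interface, so no cast to the concrete namesystem is needed.

```java
import java.util.HashSet;
import java.util.Set;

// Declared on the interface, so block-management code never casts.
interface Namesystem {
    boolean isFileOpenedForWrite(String path);
}

// Stub standing in for FSNamesystem; real code would check leases.
class FsNamesystemStub implements Namesystem {
    private final Set<String> openFiles = new HashSet<>();

    void open(String path) { openFiles.add(path); }

    @Override
    public boolean isFileOpenedForWrite(String path) {
        return openFiles.contains(path);
    }
}

public class NamesystemApiSketch {
    // SPS-style caller: is a Mover alive (i.e. holding the ID file open)?
    static boolean moverRunning(Namesystem ns, String moverIdPath) {
        return ns.isFileOpenedForWrite(moverIdPath);
    }

    public static void main(String[] args) {
        FsNamesystemStub ns = new FsNamesystemStub();
        ns.open("/system/mover.id");
        System.out.println(moverRunning(ns, "/system/mover.id")); // true
    }
}
```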

> [SPS]: Check Mover file ID lease also to determine whether Mover is running
> ---
>
> Key: HDFS-11239
> URL: https://issues.apache.org/jira/browse/HDFS-11239
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Wei Zhou
>Assignee: Wei Zhou
> Attachments: HDFS-11239-HDFS-10285.00.patch, 
> HDFS-11239-HDFS-10285.01.patch
>
>
> Currently SPS only checks the Mover ID file's existence to determine whether 
> a Mover is running. This can be an issue when the Mover exits unexpectedly 
> without deleting the ID file, which then stops SPS from functioning. This is 
> a follow-on to HDFS-10885, where we bypassed this due to some implementation 
> problems. This issue can be fixed after HDFS-11123.






[jira] [Commented] (HDFS-11248) [SPS]: Handle partial block location movements

2016-12-21 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768505#comment-15768505
 ] 

Uma Maheswara Rao G commented on HDFS-11248:


Hi [~rakeshr], Thanks for updating the patch!
Please find my feedback on the latest patch, 02.

# .
{code}
if (itemInfo != null
+  && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
+trackIDNeededRetries = true;
+LOG.warn("Blocks storage movement is SUCCESS for the track id : "
++ storageMovementAttemptedResult.getTrackId()
++ " reported from co-ordinating datanode. But adding trackID"
++ " back to retry queue as some of the block didn't find"
++ " matching target nodes in previous iteration.");
+  }
{code}
# . 
Looks like this part will be executed even if the result is a failure. Is an 
else missing here?
# .
{code}
LOG.info("Blocks storage movement results for the tracking id : "
++ storageMovementAttemptedResult.getTrackId()
++ " is reported from co-ordinating datanode. "
++ "The result status is SUCCESS.");
{code}
Looks like this log is emitted for all conditions, yet it says SUCCESS.
# .
{code}
 if (itemInfo != null
+  && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
{code}
Also, if itemInfo is null, it means the item would have moved to the needed 
list earlier. So, do we need an assertion to make sure? But it's tricky here, 
since it could have been picked up for processing from the needed list again. 
Otherwise, add a TODO to check this case?
# . I would like to see a simple javadoc for the ItemInfo class.
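The control flow requested in the review comments above can be modeled with a short sketch. All names here are hypothetical and simplified, not the actual SPS code: the SUCCESS branch re-queues a trackId only when some block locations were left unsatisfied, and the failure branch is an explicit else rather than a fall-through.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ResultHandlingSketch {
    enum Result { SUCCESS, FAILURE }

    static class ItemInfo {
        final boolean allBlockLocsAttemptedToSatisfy;
        ItemInfo(boolean all) { this.allBlockLocsAttemptedToSatisfy = all; }
    }

    static String handle(long trackId, Result result, ItemInfo itemInfo,
                         Deque<Long> retryQueue) {
        if (result == Result.SUCCESS) {
            if (itemInfo != null && !itemInfo.allBlockLocsAttemptedToSatisfy) {
                // Partial movement: retry the remaining block locations later.
                retryQueue.add(trackId);
                return "success-partial";
            }
            return "success";
        } else {
            // Explicit failure branch: always re-queue for retry.
            retryQueue.add(trackId);
            return "failure-retry";
        }
    }

    public static void main(String[] args) {
        Deque<Long> retry = new ArrayDeque<>();
        System.out.println(handle(1, Result.SUCCESS, new ItemInfo(true), retry));
        System.out.println(handle(2, Result.SUCCESS, new ItemInfo(false), retry));
        System.out.println(handle(3, Result.FAILURE, null, retry));
        // prints: success / success-partial / failure-retry
    }
}
```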

> [SPS]: Handle partial block location movements
> --
>
> Key: HDFS-11248
> URL: https://issues.apache.org/jira/browse/HDFS-11248
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11248-HDFS-10285-00.patch, 
> HDFS-11248-HDFS-10285-01.patch, HDFS-11248-HDFS-10285-02.patch
>
>
> This jira is to handle partial block location movements due to the 
> unavailability of target nodes for the matching storage type. 
> For example, assume only A(disk,archive), B(disk), and C(disk,archive) are 
> live nodes, with A & C having archive storage. Say we have a block with 
> locations {{A(disk), B(disk), C(disk)}}, and the user changed the storage 
> policy to COLD. SPS internally starts preparing the src-target pairing like 
> {{src=> (A, B, C) and target=> (A, C)}} and sends BLOCK_STORAGE_MOVEMENT to 
> the coordinator. SPS skips B as it doesn't have archive media, to indicate 
> that it should retry later to satisfy all block locations. On receiving the 
> movement command, the coordinator will pair the src-target nodes to schedule 
> the actual physical movements like {{movetask=> (A, A), (B, C)}}. Ideally it 
> should do {{(C, C)}} instead of {{(B, C)}}, but it mistakenly chooses the 
> source and creates a problem.
> IMHO, the implicit assumption that a retry is needed creates confusion and 
> leads to coding mistakes. One idea to fix this problem is to add a new 
> {{retryNeeded}} flag to make it more readable. With this, SPS will prepare 
> only the matching pairs and dummy source slots will be avoided, like 
> {{src=> (A, C) and target=> (A, C)}}, marking {{retryNeeded=true}} to convey 
> that this {{trackId}} has only partial block movements.






[jira] [Commented] (HDFS-11150) [SPS]: Provide persistence when satisfying storage policy.

2016-12-20 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766127#comment-15766127
 ] 

Uma Maheswara Rao G commented on HDFS-11150:


{quote}
I've tried that before. There is an issue here if we only mark the directory. 
When recovering from the FsImage, the InodeMap isn't built up, so we don't 
know the sub-inodes of a given inode; in the end, we cannot add these inodes 
to the movement queue in FSDirectory#addToInodeMap. Any thoughts?
{quote}
I got what you are saying. OK, for simplicity we can add it for all inodes 
now. To handle this 100%, we may need intermittent processing: first add them 
to some intermediate list while loading the FsImage, then once it is fully 
loaded and active services are starting, process that list and do the required 
work. But that would maybe add some additional complexity. Let's go with all 
file inodes for now and revisit later if it really creates issues. How about 
you raise a JIRA for it and think about optimizing it separately?

> [SPS]: Provide persistence when satisfying storage policy.
> --
>
> Key: HDFS-11150
> URL: https://issues.apache.org/jira/browse/HDFS-11150
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-11150-HDFS-10285.001.patch, 
> HDFS-11150-HDFS-10285.002.patch, editsStored, editsStored.xml
>
>
> Provide persistence for SPS in case that Hadoop cluster crashes by accident. 
> Basically we need to change EditLog and FsImage here.






[jira] [Commented] (HDFS-11239) [SPS]: Check Mover file ID lease also to determine whether Mover is running

2016-12-20 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765619#comment-15765619
 ] 

Uma Maheswara Rao G commented on HDFS-11239:


Hi [~zhouwei], Thanks for raising the issue and working on it.
Please find my feedback below.

# Implementing INode-related code inside the SPS class is not a good idea. All 
of these implementation parts should go to the namesystem layer. How about 
adding a small API which returns true if the INode file is opened for write? 
Let's say the API name is isINodeFileOpenedForWrite(); this can do all the 
lease checks and return a boolean.
# -
{code}
+  running = hdfsCluster.getFileSystem()
+  .getClient().isStoragePolicySatisfierRunning();
+  Assert.assertFalse("SPS should not be able to run as file "
+  + HdfsServerConstants.MOVER_ID_PATH + " is being hold.", running);
{code}
I think it would be good if you could also add a test case that calls 
satisfyStoragePolicy, to make sure the functionality still works after SPS 
restarts successfully with the Mover ID checks.
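For illustration, the small lease-check API suggested in point 1 could look like the sketch below. Everything here (NamesystemSketch, the set of files under construction) is a simplified hypothetical stand-in, not actual HDFS code; the real check would consult the LeaseManager under the namesystem lock:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for the namesystem lease bookkeeping.
class NamesystemSketch {
  // IDs of files currently open for write (i.e., holding a write lease).
  private final Set<Long> filesUnderConstruction = new HashSet<>();

  void openForWrite(long inodeId) {
    filesUnderConstruction.add(inodeId);
  }

  void closeFile(long inodeId) {
    filesUnderConstruction.remove(inodeId);
  }

  // The proposed small API: SPS asks this instead of inspecting leases itself.
  boolean isINodeFileOpenedForWrite(long inodeId) {
    return filesUnderConstruction.contains(inodeId);
  }
}
```

SPS would then only call the boolean query and never touch lease internals.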


> [SPS]: Check Mover file ID lease also to determine whether Mover is running
> ---
>
> Key: HDFS-11239
> URL: https://issues.apache.org/jira/browse/HDFS-11239
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Wei Zhou
>Assignee: Wei Zhou
> Attachments: HDFS-11239-HDFS-10285.00.patch
>
>
> Currently SPS only checks the Mover ID file's existence to determine whether 
> a Mover is running; this can be an issue when Mover exits unexpectedly 
> without deleting the ID file, which further stops SPS from functioning. This 
> is a follow-on to HDFS-10885, where we bypassed this due to some 
> implementation problems. This issue can be fixed after HDFS-11123.






[jira] [Commented] (HDFS-11150) [SPS]: Provide persistence when satisfying storage policy.

2016-12-20 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763884#comment-15763884
 ] 

Uma Maheswara Rao G commented on HDFS-11150:


Hi [~yuanbo], thank you for working on this patch. Below is my feedback on the 
patch.

# Can we rename addSatisfyMovement -> addStoragePolicySatisfierXAttr?
# A suggestion to consider (it could be tricky to handle): to save memory, when 
a user calls satisfyStoragePolicy on a directory, we take the immediate files 
under that directory. So, for the persistence part, how about keeping the XAttr 
only on that directory and building the required elements based on it?
Below are the details of what I am talking about:
{code}
for (INode node : candidateNodes) {
+  bm.satisfyStoragePolicy(node.getId());
+  List<XAttr> existingXAttrs = XAttrStorage.readINodeXAttrs(inode);
+  List<XAttr> newXAttrs = FSDirXAttrOp.setINodeXAttrs(
+  fsd, existingXAttrs, xattrs, EnumSet.of(XAttrSetFlag.CREATE));
+  XAttrStorage.updateINodeXAttrs(inode, newXAttrs, snapshotId);
+}
{code}
Can we think of adding the XAttr only to the directory here? The idea is that, 
when loading from the FsImage, we can process the directory and add its 
children (if they are files) to bm.satisfyStoragePolicy.
In that case, you need to change the part below as well, to recalculate the 
children when the inode is a directory.
{code}
 if (isFile && XATTR_SATISFY_STORAGE_POLICY.equals(xaName)) {
+  fsd.getBlockManager().satisfyStoragePolicy(inode.getId());
+  }
{code}
And also 
{code}
 private void addSatisfyMovement(INodeWithAdditionalFields inode,
+  XAttrFeature xaf) {
+if (xaf == null || inode.isDirectory()) {
+  return;
+}
+XAttr xattr = xaf.getXAttr(XATTR_SATISFY_STORAGE_POLICY);
+if (xattr == null) {
+  return;
+}
+getBlockManager().satisfyStoragePolicy(inode.getId());
+  }
{code}
Here, instead of calling getBlockManager().satisfyStoragePolicy directly, we 
can build an unprotected API which finds the children (only files) in the case 
of a directory and calls getBlockManager().satisfyStoragePolicy(inode.getId()) 
for each. Does that work?
# Can we add test cases with checkpoints and multiple restarts? Can you also 
add tests for HA cases?
# Simply handling the retryCache implementation below won't be enough; you also 
need to annotate the ClientProtocol#satisfyStoragePolicy API with @AtMostOnce.
{code}
 CacheEntry cacheEntry = RetryCache.waitForCompletion(retryCache);
+if (cacheEntry != null && cacheEntry.isSuccess()) {
+  return; // Return previous response
+}
+boolean success = false;
+try {
+  namesystem.satisfyStoragePolicy(src, cacheEntry != null);
+  success = true;
+} finally {
+  RetryCache.setState(cacheEntry, success);
+}
{code}
# Also please consider fixing the retryCache-related test case. Since one more 
API is being added to @AtMostOnce, the API count increases.
# -
{code}
throw new IOException("Failed to satisfy storage policy for "
+  + iip.getPath()
+  + " since it has been added to satisfy movement queue." );
{code}
After iip.getPath(), please put a comma, and also make the message clearer by 
saying something like: "Cannot request to call satisfy storage policy on path 
iip.getPath(), as this file/dir was already called for satisfying storage 
policy."
# Could you please add a timeout and javadoc for the test case?
# -
{code}
Assert.assertTrue(e.getMessage().contains(
+String.format("Failed to satisfy storage policy for %s since "
++ "it has been added to satisfy movement queue.", file)));
{code}
Can you use GenericTestUtils#assertExceptionContains instead?
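The directory-only XAttr idea from point 2 above can be sketched as below; INodeSketch and SpsXAttrLoader are simplified hypothetical stand-ins for the real INode/FSDirectory classes, not actual HDFS code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of expanding a directory-level satisfier XAttr into its
// immediate file children while loading the FsImage.
class INodeSketch {
  final long id;
  final boolean isDirectory;
  final List<INodeSketch> children = new ArrayList<>();

  INodeSketch(long id, boolean isDirectory) {
    this.id = id;
    this.isDirectory = isDirectory;
  }
}

class SpsXAttrLoader {
  final List<Long> satisfierQueue = new ArrayList<>();

  // On load: a directory XAttr enqueues the immediate file children;
  // a file XAttr enqueues the file itself.
  void addSatisfierFromXAttr(INodeSketch inode) {
    if (inode.isDirectory) {
      for (INodeSketch child : inode.children) {
        if (!child.isDirectory) {
          satisfierQueue.add(child.id);
        }
      }
    } else {
      satisfierQueue.add(inode.id);
    }
  }
}
```

Only one XAttr per directory is persisted, yet every immediate file still reaches the satisfier queue after a restart.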

> [SPS]: Provide persistence when satisfying storage policy.
> --
>
> Key: HDFS-11150
> URL: https://issues.apache.org/jira/browse/HDFS-11150
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-11150-HDFS-10285.001.patch, 
> HDFS-11150-HDFS-10285.002.patch, editsStored, editsStored.xml
>
>
> Provide persistence for SPS in case the Hadoop cluster crashes by accident. 
> Basically we need to change EditLog and FsImage here.






[jira] [Commented] (HDFS-11248) [SPS]: Handle partial block location movements

2016-12-19 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763326#comment-15763326
 ] 

Uma Maheswara Rao G commented on HDFS-11248:


[~rakeshr], Thank you for accepting the idea and providing the patch quickly.

Please find my feedback on the latest patch.

# -
I think there is a race condition in the following code:
{code}
 synchronized (storageMovementAttemptedResults) {
  boolean exist = isExistInResult(blockCollectionID);
  if (!exist) {
blockStorageMovementNeeded.add(blockCollectionID);
  } else {
LOG.info("Blocks storage movement results for the"
+ " tracking id : " + blockCollectionID
+ " is reported from one of the co-ordinating datanode."
+ " So, the result will be processed soon.");
  }
  iter.remove();
}
{code}
Consider the partial block movements case.
Assume this piece gets a chance to execute immediately after a SUCCESS result 
has arrived. storageMovementAttemptedResults then contains the element, but it 
has not been processed yet. The condition above sees exist as true and will 
not add the element back to blockStorageMovementNeeded for the actual retry. 
Now assume the code below gets a chance to execute later: 
itemInfo.isAllBlockLocsAttemptedToSatisfy() will be true, and it will try to 
remove the item from storageMovementAttemptedItems. But with the current patch 
the element has now disappeared from all the lists, so we cannot retry.
So here you may need to act based on the result of 
storageMovementAttemptedItems.remove(storageMovementAttemptedResult.getTrackId()):
if remove returns true, then fine; otherwise the storageMovementAttemptedItems 
item was processed earlier than the current processing. You may want to 
rethink these conditions.
{code}
 synchronized (storageMovementAttemptedItems) {
  ItemInfo itemInfo = storageMovementAttemptedItems
  .get(storageMovementAttemptedResult.getTrackId());
  if (itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
storageMovementAttemptedItems
.remove(storageMovementAttemptedResult.getTrackId());
  }
}
{code}
# -
I would like you to refine the log message to represent the exact situation here.
{code}
   LOG.warn("Blocks storage movement results for the tracking id : "
+ storageMovementAttemptedResult.getTrackId()
+ " is reported from co-ordinating datanode, but result"
+ " status is FAILURE. So, added for retry");
  } else {
synchronized (storageMovementAttemptedItems) {
  ItemInfo itemInfo = storageMovementAttemptedItems
  .get(storageMovementAttemptedResult.getTrackId());
  if (itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
storageMovementAttemptedItems
.remove(storageMovementAttemptedResult.getTrackId());
  }
}
LOG.info("Blocks storage movement results for the tracking id : "
+ storageMovementAttemptedResult.getTrackId()
+ " is reported from co-ordinating datanode. "
+ "The result status is SUCCESS.");
{code}
When the result is SUCCESS but itemInfo.isAllBlockLocsAttemptedToSatisfy() is 
false, you do not remove the item from storageMovementAttemptedItems; this 
means you are retrying. Maybe the message can say the result was success but 
we will retry due to this condition.
# -
{code}
if (!blockCollection.getLastBlock().isComplete()) {
   // Postpone, currently file is under construction
   // So, should we add back? or leave it to user
-  return;
+  return true;
{code}
Not introduced by this patch, but to improve debuggability we can add a 
message here explaining why we are postponing.
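The remove()-based fix suggested in the first point can be modeled as follows; these are simplified hypothetical types, not the actual SPS classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Models acting on the boolean returned by remove(trackId), so a result that
// another path already processed is detected instead of silently lost.
class AttemptedItemsSketch {
  // trackId -> were all block locations attempted?
  private final Map<Long, Boolean> attemptedItems = new HashMap<>();
  final List<Long> retryQueue = new ArrayList<>();

  synchronized void add(long trackId, boolean allBlockLocsAttempted) {
    attemptedItems.put(trackId, allBlockLocsAttempted);
  }

  // Process a SUCCESS result; returns true only if this call owned the item.
  synchronized boolean processSuccess(long trackId) {
    Boolean allAttempted = attemptedItems.remove(trackId);
    if (allAttempted == null) {
      return false; // already processed elsewhere, nothing to do here
    }
    if (!allAttempted) {
      retryQueue.add(trackId); // partial movement: re-queue for retry
    }
    return true;
  }
}
```

Because remove() both deletes and reports whether the item was still present, the partial-movement retry can never be dropped by two racing consumers.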


> [SPS]: Handle partial block location movements
> --
>
> Key: HDFS-11248
> URL: https://issues.apache.org/jira/browse/HDFS-11248
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11248-HDFS-10285-00.patch, 
> HDFS-11248-HDFS-10285-01.patch
>
>
> This jira is to handle partial block location movements due to unavailability 
> of target nodes for the matching storage type. 
> For example, assume only A(disk,archive), B(disk) and C(disk,archive) are 
> live nodes, with A & C having the archive storage type. Say we have a block with 
> locations {{A(disk), B(disk), C(disk)}}. Again assume the user changed the 
> storage policy to COLD. Now, SPS internally starts preparing the src-target 
> pairing like, {{src=> (A, B, C) and target=> (A, C)}} and sends 
> BLOCK_STORAGE_MOVEMENT 

[jira] [Commented] (HDFS-11248) [SPS]: Handle partial block location movements

2016-12-18 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760201#comment-15760201
 ] 

Uma Maheswara Rao G commented on HDFS-11248:


Thank you, [~rakeshr], for finding this issue.

I am still thinking that, when we find sources/targets for only a portion of 
the blocks, how about keeping this information in storageMovementsMonitor when 
adding?
For example, we could change storageMovementAttemptedItems from Map<Long, Long> 
to Map<Long, ItemInfo>.
Here ItemInfo can contain a timestamp and isAllBlocksCoveredToSatisfy (boolean).
If isAllBlocksCoveredToSatisfy is false, it means we did not send all blocks 
for movement, so when processing this item we can consider another try.

Can we think along these lines? I am a bit concerned about the amount of detail 
communicated to the DN just for this retry reason.
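A minimal sketch of the suggested ItemInfo; the field and method names are only illustrative, not code from the patch:

```java
// Hypothetical ItemInfo: a timestamp plus a flag recording whether all block
// locations were sent for movement.
class ItemInfo {
  private final long lastAttemptedTimeStamp;
  private final boolean allBlocksCoveredToSatisfy;

  ItemInfo(long lastAttemptedTimeStamp, boolean allBlocksCoveredToSatisfy) {
    this.lastAttemptedTimeStamp = lastAttemptedTimeStamp;
    this.allBlocksCoveredToSatisfy = allBlocksCoveredToSatisfy;
  }

  long getLastAttemptedTimeStamp() {
    return lastAttemptedTimeStamp;
  }

  // false => only a portion of the blocks was sent; consider another try.
  boolean isAllBlocksCoveredToSatisfy() {
    return allBlocksCoveredToSatisfy;
  }
}
```

Keeping this flag on the NN side is what avoids communicating retry details to the DN.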


> [SPS]: Handle partial block location movements
> --
>
> Key: HDFS-11248
> URL: https://issues.apache.org/jira/browse/HDFS-11248
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11248-HDFS-10285-00.patch
>
>
> This jira is to handle partial block location movements due to unavailability 
> of target nodes for the matching storage type. 
> For example, assume only A(disk,archive), B(disk) and C(disk,archive) are 
> live nodes, with A & C having the archive storage type. Say we have a block with 
> locations {{A(disk), B(disk), C(disk)}}. Again assume the user changed the 
> storage policy to COLD. Now, SPS internally starts preparing the src-target 
> pairing like, {{src=> (A, B, C) and target=> (A, C)}} and sends 
> BLOCK_STORAGE_MOVEMENT to the coordinator. SPS is skipping B as it doesn't 
> have archive media to indicate that it should do retries to satisfy all block 
> locations after some time. On receiving the movement command, coordinator 
> will pair the src-target node to schedule actual physical movements like, 
> {{movetask=> (A, A), (B, C)}}. Here ideally it should do {{(C, C)}} instead 
> of {{(B, C)}} but mistakenly choosing the source C and creates problem.
> IMHO, the implicit assumption that a retry is needed creates confusion and 
> leads to coding mistakes. One idea to fix this problem is to create a new 
> flag, {{retryNeeded}}, to make it more readable. With this, SPS will 
> prepare only the matching pair and dummy source slots will be avoided like, 
> {{src=> (A, C) and target=> (A, C)}} and mark {{retryNeeded=true}} to convey 
> the message that this {{trackId}} has only partial blocks movements.






[jira] [Commented] (HDFS-11193) [SPS]: Erasure coded files should be considered for satisfying storage policy

2016-12-14 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749800#comment-15749800
 ] 

Uma Maheswara Rao G commented on HDFS-11193:


{quote}
While testing I found an issue in when there is no target node with the 
required storage type logic. For example, I have a block with locations 
A(disk), B(disk), C(disk) and assume only A, B and C are live nodes with A & C 
have archive storage type. Again assume, user changed the storage policy to 
COLD. Now, SPS internally starts preparing the src-target pairing like, src=> 
(A, B, C) and target=> (A, C). Its skipping B as it doesn't have archive media 
and this is an indication that SPS should do retries for satisfying all of its 
block locations. On the other side, coordinator will pair the src-target node 
for actual physical movement like, movetask=> (A, A), (B, C). Here ideally it 
should do (C, C) instead of (B, C) but mistakenly choosing the source C. I 
think, the implicit assumptions of retry needed will create confusions and 
coding mistakes like this. In this patch, I've created a new flag retryNeeded 
flag to make it more readable. Now, SPS will prepare only the matching pair and 
dummy source slots will be avoided like, src=> (A, C) and target=> (A, C) and 
set retryNeeded=true to convey the message that this trackId has only partial 
blocks movements.
{quote}
This is a good test case. Since this is a bug in an existing part, can we file 
a separate JIRA for this issue? Otherwise the current JIRA will mask that 
issue. I would like to review that fix separately rather than combined with 
this patch.

{quote}
Thanks for this idea. Following is my analysis of this approach. As we know, 
presently the NN passes simple Block objects to the coordinator datanode for 
movement. In order to do the internal block construction at the DN side, it 
requires the complex BlockInfoStriped object and the blockIndices array. I 
think passing a list of simple objects is better compared to the complex 
object; this keeps all the computation complexity at the SPS side and makes 
the coordinator logic more readable. I'd prefer to keep the internal block 
construction logic at the NN side. Does this make sense to you?
{quote}
I think this is OK. My suggestion was based on EC: when we send an ECRecovery 
command, we just send the Block and the live block indices, and the actual 
block ID is constructed back at the DN. It should not be a big deal. Since 
this command combines both EC and non-EC, let's just construct the full 
details at the NN only.

> [SPS]: Erasure coded files should be considered for satisfying storage policy
> -
>
> Key: HDFS-11193
> URL: https://issues.apache.org/jira/browse/HDFS-11193
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11193-HDFS-10285-00.patch, 
> HDFS-11193-HDFS-10285-01.patch, HDFS-11193-HDFS-10285-02.patch
>
>
> Erasure coded striped files support the storage policies {{HOT, COLD, ALLSSD}}. 
> An {{HdfsAdmin#satisfyStoragePolicy}} API call on a directory should consider 
> all immediate files under that directory and needs to check whether the files 
> really match the namespace storage policy. All the mismatched striped 
> blocks should be chosen for block movement.






[jira] [Created] (HDFS-11244) Limit the number satisfyStoragePolicy items at Namenode

2016-12-13 Thread Uma Maheswara Rao G (JIRA)
Uma Maheswara Rao G created HDFS-11244:
--

 Summary: Limit the number satisfyStoragePolicy items at Namenode
 Key: HDFS-11244
 URL: https://issues.apache.org/jira/browse/HDFS-11244
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G


This JIRA is to provide a provision to limit the number of pending 
satisfyStoragePolicy items. If we don't limit this number, users keep calling 
more and more, and the DNs are slow machines, then the NN-side queues can grow 
unboundedly. So it may be good to have an option to limit incoming requests 
for satisfyStoragePolicy. Maybe a default of 10K?
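A rough sketch of such a cap; the 10K default and all names here are illustrative assumptions, not an actual HDFS config key or class:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bounded queue of pending satisfyStoragePolicy inode IDs.
class BoundedSpsQueue {
  static final int DEFAULT_MAX_PENDING = 10_000;

  private final int maxPending;
  private final Deque<Long> pending = new ArrayDeque<>();

  BoundedSpsQueue(int maxPending) {
    this.maxPending = maxPending;
  }

  // Reject new requests once the pending queue is full.
  synchronized boolean offer(long inodeId) {
    if (pending.size() >= maxPending) {
      return false;
    }
    pending.add(inodeId);
    return true;
  }

  synchronized int size() {
    return pending.size();
  }
}
```

Rejecting at admission time keeps NN memory bounded even when DNs fall behind.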






[jira] [Created] (HDFS-11243) Add a protocol command from NN to DN for dropping the SPS work and queues

2016-12-13 Thread Uma Maheswara Rao G (JIRA)
Uma Maheswara Rao G created HDFS-11243:
--

 Summary: Add a protocol command from NN to DN for dropping the SPS 
work and queues 
 Key: HDFS-11243
 URL: https://issues.apache.org/jira/browse/HDFS-11243
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G


This JIRA is for adding a protocol command from the Namenode to the Datanode 
for dropping SPS work, and also for dropping in-progress queues.

The use case is: when an admin deactivates SPS at the NN, the NN should 
internally issue a command to the DNs to drop their in-progress queues as 
well. This command can be packed via the heartbeat.






[jira] [Updated] (HDFS-11123) [SPS] Make storage policy satisfier daemon work on/off dynamically

2016-12-13 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11123:
---
Attachment: HDFS-11123-HDFS-10285-01.patch

Attached a new patch incorporating Rakesh's comments.

> [SPS] Make storage policy satisfier daemon work on/off dynamically
> --
>
> Key: HDFS-11123
> URL: https://issues.apache.org/jira/browse/HDFS-11123
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-HDFS-11123.00.patch, 
> HDFS-11123-HDFS-10285-00.patch, HDFS-11123-HDFS-10285-01.patch
>
>
> The idea of this task is to make the SPS daemon thread start/stop dynamically 
> in the Namenode process without needing to restart the complete Namenode.
> So, this will help in the case where an admin wants to switch off SPS and 
> wants to run the Mover tool externally.






[jira] [Updated] (HDFS-11164) Mover should avoid unnecessary retries if the block is pinned

2016-12-13 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11164:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha2
   Status: Resolved  (was: Patch Available)

I have just committed this to trunk. 

> Mover should avoid unnecessary retries if the block is pinned
> -
>
> Key: HDFS-11164
> URL: https://issues.apache.org/jira/browse/HDFS-11164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-11164-00.patch, HDFS-11164-01.patch, 
> HDFS-11164-02.patch, HDFS-11164-03.patch
>
>
> When the Mover tries to move a pinned block to another datanode, it 
> internally hits the following IOException and marks the block movement as 
> {{failure}}. Since the Mover has the {{dfs.mover.retry.max.attempts}} config, 
> it will continue moving this block until it reaches {{retryMaxAttempts}}. If 
> the block movement failure(s) are only due to block pinning, then the retry 
> is unnecessary. The idea of this jira is to avoid retry attempts for pinned 
> blocks, as they won't be able to move to a different node. 
> {code}
> 2016-11-22 10:56:10,537 WARN 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher: Failed to move 
> blk_1073741825_1001 with size=52 from 127.0.0.1:19501:DISK to 
> 127.0.0.1:19758:ARCHIVE through 127.0.0.1:19501
> java.io.IOException: Got error, status=ERROR, status message opReplaceBlock 
> BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 received 
> exception java.io.IOException: Got error, status=ERROR, status message Not 
> able to copy block 1073741825 to /127.0.0.1:19826 because it's pinned , copy 
> block BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 from 
> /127.0.0.1:19501, reportedBlock move is failed
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:118)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:417)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:358)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$5(Dispatcher.java:322)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:1075)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (HDFS-11123) [SPS] Make storage policy satisfier daemon work on/off dynamically

2016-12-13 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745588#comment-15745588
 ] 

Uma Maheswara Rao G commented on HDFS-11123:


Thanks [~rakeshr] for the reviews.

{quote}
Can we avoid adding blksMovementResults to the monitor thread if SPS is not 
running.
{quote}
Exactly. I will add this in the next patch.

I will add debug messages. Thanks.

> [SPS] Make storage policy satisfier daemon work on/off dynamically
> --
>
> Key: HDFS-11123
> URL: https://issues.apache.org/jira/browse/HDFS-11123
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-HDFS-11123.00.patch, 
> HDFS-11123-HDFS-10285-00.patch
>
>
> The idea of this task is to make the SPS daemon thread start/stop dynamically 
> in the Namenode process without needing to restart the complete Namenode.
> So, this will help in the case where an admin wants to switch off SPS and 
> wants to run the Mover tool externally.






[jira] [Commented] (HDFS-11164) Mover should avoid unnecessary retries if the block is pinned

2016-12-12 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743238#comment-15743238
 ] 

Uma Maheswara Rao G commented on HDFS-11164:


Thanks a lot, [~surendrasingh] for verification and confirming.

Thank you for suggesting ideas. In general it is a nice idea, but the fact we 
should consider is keeping less maintenance work at the NN. Since this is not 
critical namespace/block info, it is OK to leave this info to the DN.

I will go ahead and commit this patch!

> Mover should avoid unnecessary retries if the block is pinned
> -
>
> Key: HDFS-11164
> URL: https://issues.apache.org/jira/browse/HDFS-11164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11164-00.patch, HDFS-11164-01.patch, 
> HDFS-11164-02.patch, HDFS-11164-03.patch
>
>
> When the Mover tries to move a pinned block to another datanode, it 
> internally hits the following IOException and marks the block movement as 
> {{failure}}. Since the Mover has the {{dfs.mover.retry.max.attempts}} config, 
> it will continue moving this block until it reaches {{retryMaxAttempts}}. If 
> the block movement failure(s) are only due to block pinning, then the retry 
> is unnecessary. The idea of this jira is to avoid retry attempts for pinned 
> blocks, as they won't be able to move to a different node. 
> {code}
> 2016-11-22 10:56:10,537 WARN 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher: Failed to move 
> blk_1073741825_1001 with size=52 from 127.0.0.1:19501:DISK to 
> 127.0.0.1:19758:ARCHIVE through 127.0.0.1:19501
> java.io.IOException: Got error, status=ERROR, status message opReplaceBlock 
> BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 received 
> exception java.io.IOException: Got error, status=ERROR, status message Not 
> able to copy block 1073741825 to /127.0.0.1:19826 because it's pinned , copy 
> block BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 from 
> /127.0.0.1:19501, reportedBlock move is failed
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:118)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:417)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:358)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$5(Dispatcher.java:322)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:1075)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (HDFS-11193) [SPS]: Erasure coded files should be considered for satisfying storage policy

2016-12-10 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15738932#comment-15738932
 ] 

Uma Maheswara Rao G commented on HDFS-11193:


[~rakeshr], thanks for working on this task. Please check my feedback on this 
patch. BTW, it looks like it needs a rebase on the latest code.

# -
{code}
+if (ErasureCodingPolicyManager
+.checkStoragePolicySuitableForECStripedMode(
+existingStoragePolicyID)) {
+  expectedStorageTypes = existingStoragePolicy
+  .chooseStorageTypes((short) blockInfo.getCapacity());
+} else {
+  // Currently we support only limited policies (HOT, COLD, ALLSSD)
+  // for EC striped mode files.
+  // Mover tool will ignore to move the blocks if the storage policy
+  // is not in EC Striped mode supported policies
+  LOG.warn("The storage policy " + existingStoragePolicy.getName()
+  + " is not suitable for Striped EC files. "
+  + "So, Ignoring to move the blocks");
+  return;
 }
{code}
Since we are simply returning here, this item may be added to the attempted 
list and retried later, but retries are not needed here, right?
# -
{code}
// Currently we support only limited policies (HOT, COLD, ALLSSD)
+  // for EC striped mode files.
+  // Mover tool will ignore to move the blocks if the storage policy
+  // is not in EC Striped mode supported policies
{code}
Can you update this comment? It should say "SPS".
# -
{code}
if (blockInfo.isStriped()) {
+  // For a striped block, it needs to construct internal block at the given
+  // index of a block group. Here it is iterating over all the block indices
+  // and construct internal blocks which can be then considered for block
+  // movement.
+  BlockInfoStriped sBlockInfo = (BlockInfoStriped) blockInfo;
+  for (StorageAndBlockIndex si : sBlockInfo.getStorageAndIndexInfos()) {
+if (si.getBlockIndex() >= 0) {
+  DatanodeDescriptor dn = si.getStorage().getDatanodeDescriptor();
+  DatanodeInfo[] srcNode = new DatanodeInfo[1];
+  StorageType[] srcStorageType = new StorageType[1];
+  DatanodeInfo[] targetNode = new DatanodeInfo[1];
+  StorageType[] targetStorageType = new StorageType[1];
+  for (int i = 0; i < sourceNodes.size(); i++) {
+DatanodeInfo node = sourceNodes.get(i);
+if (node.equals(dn)) {
+  srcNode[0] = node;
+  srcStorageType[0] = sourceStorageTypes.get(i);
+  if (targetNodes.size() > i) {
+targetNode[0] = targetNodes.get(i);
+targetStorageType[0] = targetStorageTypes.get(i);
+  } else {
+// empty target
+targetNode = new DatanodeInfo[0];
+targetStorageType = new StorageType[0];
+  }
+  break; // found matching source-target nodes
+}
+  }
+  // construct internal block
+  long blockId = blockInfo.getBlockId() + si.getBlockIndex();
+  long numBytes = StripedBlockUtil.getInternalBlockLength(
+  sBlockInfo.getNumBytes(), sBlockInfo.getCellSize(),
+  sBlockInfo.getDataBlockNum(), si.getBlockIndex());
+  Block blk = new Block(blockId, numBytes,
+  blockInfo.getGenerationStamp());
+  BlockMovingInfo blkMovingInfo = new BlockMovingInfo(blk, srcNode,
+  targetNode, srcStorageType, targetStorageType);
+  blkMovingInfos.add(blkMovingInfo);
{code}
Another idea in my mind: how about just including blockIndexes in the striped 
case? That is, add a new parameter blockIndexes, which would be an empty array 
in the non-striped case, so that the DN can interpret the block IDs in the 
striped case. Then a single blockInfo could be included instead of 9 items in 
the case of 6+3 EC. But the current approach is also fine; it just needs more 
items in the list. Just a thought.
# No further comments. Thanks for adding more test cases.
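For reference, the internal-block construction discussed in point 3 can be sketched independently of HDFS. This is a simplified, assumption-laden version that handles only the data blocks of a striped group; the real StripedBlockUtil.getInternalBlockLength also covers parity blocks and other cases:

```java
// Simplified striped-block math; data blocks only, illustrative sketch.
class StripedSketch {
  // Internal block id = block group id + index within the group.
  static long internalBlockId(long groupId, int blockIndex) {
    return groupId + blockIndex;
  }

  // Length of data internal block blockIndex for a group of groupBytes,
  // laid out round-robin in cells of cellSize over dataBlkNum data blocks.
  static long internalBlockLength(long groupBytes, int cellSize,
      int dataBlkNum, int blockIndex) {
    long stripeBytes = (long) cellSize * dataBlkNum;
    long fullStripes = groupBytes / stripeBytes;
    long remaining = groupBytes - fullStripes * stripeBytes;
    long fullCellsInLastStripe = remaining / cellSize;
    long tail = remaining % cellSize;
    long lastStripePart;
    if (blockIndex < fullCellsInLastStripe) {
      lastStripePart = cellSize;   // gets a full cell in the last stripe
    } else if (blockIndex == fullCellsInLastStripe) {
      lastStripePart = tail;       // gets the partial tail cell
    } else {
      lastStripePart = 0;          // gets nothing from the last stripe
    }
    return fullStripes * cellSize + lastStripePart;
  }
}
```

For example, a 17-byte group with cellSize 4 over 3 data blocks splits into internal lengths 8, 5 and 4, which sum back to 17.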

> [SPS]: Erasure coded files should be considered for satisfying storage policy
> -
>
> Key: HDFS-11193
> URL: https://issues.apache.org/jira/browse/HDFS-11193
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11193-HDFS-10285-00.patch, 
> HDFS-11193-HDFS-10285-01.patch
>
>
> Erasure coded striped files support the storage policies {{HOT, COLD, ALLSSD}}. 
> An {{HdfsAdmin#satisfyStoragePolicy}} API call on a directory should consider 
> all immediate files under that directory and needs to check whether the files 
> really match the namespace storage policy. All the mismatched 

[jira] [Updated] (HDFS-11123) [SPS] Make storage policy satisfier daemon work on/off dynamically

2016-12-09 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11123:
---
Attachment: HDFS-11123-HDFS-10285-00.patch

Just renamed the patch so that Jenkins picks it up.

> [SPS] Make storage policy satisfier daemon work on/off dynamically
> --
>
> Key: HDFS-11123
> URL: https://issues.apache.org/jira/browse/HDFS-11123
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-HDFS-11123.00.patch, 
> HDFS-11123-HDFS-10285-00.patch
>
>
> The idea of this task is to make the SPS daemon thread start/stop 
> dynamically in the Namenode process without needing to restart the complete 
> Namenode.
> This will help when an admin wants to switch off SPS and run the Mover tool 
> externally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11123) [SPS] Make storage policy satisfier daemon work on/off dynamically

2016-12-09 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11123:
---
Status: Patch Available  (was: Open)

> [SPS] Make storage policy satisfier daemon work on/off dynamically
> --
>
> Key: HDFS-11123
> URL: https://issues.apache.org/jira/browse/HDFS-11123
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-HDFS-11123.00.patch
>
>
> The idea of this task is to make the SPS daemon thread start/stop 
> dynamically in the Namenode process without needing to restart the complete 
> Namenode.
> This will help when an admin wants to switch off SPS and run the Mover tool 
> externally.






[jira] [Updated] (HDFS-11186) [SPS]: Daemon thread of SPS should start only in Active NN

2016-12-08 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11186:
---
Summary: [SPS]: Daemon thread of SPS should start only in Active NN  (was: 
[SPS]: Daemon thread of SPS starts only in Active NN)

> [SPS]: Daemon thread of SPS should start only in Active NN
> --
>
> Key: HDFS-11186
> URL: https://issues.apache.org/jira/browse/HDFS-11186
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Wei Zhou
>Assignee: Wei Zhou
>
> As discussed in [HDFS-10885 
> |https://issues.apache.org/jira/browse/HDFS-10885], we need to ensure that 
> SPS is started only in Active NN. This JIRA is opened for discussion and 
> tracking.






[jira] [Updated] (HDFS-11123) [SPS] Make storage policy satisfier daemon work on/off dynamically

2016-12-08 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11123:
---
Attachment: HDFS-10285-HDFS-11123.00.patch

Attached the initial patch for this feature. Please review it.

> [SPS] Make storage policy satisfier daemon work on/off dynamically
> --
>
> Key: HDFS-11123
> URL: https://issues.apache.org/jira/browse/HDFS-11123
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-HDFS-11123.00.patch
>
>
> The idea of this task is to make the SPS daemon thread start/stop 
> dynamically in the Namenode process without needing to restart the complete 
> Namenode.
> This will help when an admin wants to switch off SPS and run the Mover tool 
> externally.






[jira] [Commented] (HDFS-10885) [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier is on

2016-12-07 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730144#comment-15730144
 ] 

Uma Maheswara Rao G commented on HDFS-10885:


Thanks [~zhouwei] for your hard work!

> [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier 
> is on
> --
>
> Key: HDFS-10885
> URL: https://issues.apache.org/jira/browse/HDFS-10885
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Wei Zhou
>Assignee: Wei Zhou
> Fix For: HDFS-10285
>
> Attachments: HDFS-10800-HDFS-10885-00.patch, 
> HDFS-10800-HDFS-10885-01.patch, HDFS-10800-HDFS-10885-02.patch, 
> HDFS-10885-HDFS-10285-10.patch, HDFS-10885-HDFS-10285-11.patch, 
> HDFS-10885-HDFS-10285.03.patch, HDFS-10885-HDFS-10285.04.patch, 
> HDFS-10885-HDFS-10285.05.patch, HDFS-10885-HDFS-10285.06.patch, 
> HDFS-10885-HDFS-10285.07.patch, HDFS-10885-HDFS-10285.08.patch, 
> HDFS-10885-HDFS-10285.09.patch
>
>
> The two tools cannot be allowed to run at the same time, so that they do 
> not conflict and fight with each other.






[jira] [Commented] (HDFS-10885) [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier is on

2016-12-06 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724763#comment-15724763
 ] 

Uma Maheswara Rao G commented on HDFS-10885:


Let's push this in.

Latest patch looks good to me. +1
[~rakeshr], if you are fine, then push it.

> [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier 
> is on
> --
>
> Key: HDFS-10885
> URL: https://issues.apache.org/jira/browse/HDFS-10885
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Wei Zhou
>Assignee: Wei Zhou
> Fix For: HDFS-10285
>
> Attachments: HDFS-10800-HDFS-10885-00.patch, 
> HDFS-10800-HDFS-10885-01.patch, HDFS-10800-HDFS-10885-02.patch, 
> HDFS-10885-HDFS-10285-10.patch, HDFS-10885-HDFS-10285-11.patch, 
> HDFS-10885-HDFS-10285.03.patch, HDFS-10885-HDFS-10285.04.patch, 
> HDFS-10885-HDFS-10285.05.patch, HDFS-10885-HDFS-10285.06.patch, 
> HDFS-10885-HDFS-10285.07.patch, HDFS-10885-HDFS-10285.08.patch, 
> HDFS-10885-HDFS-10285.09.patch
>
>
> The two tools cannot be allowed to run at the same time, so that they do 
> not conflict and fight with each other.






[jira] [Commented] (HDFS-11164) Mover should avoid unnecessary retries if the block is pinned

2016-12-05 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724275#comment-15724275
 ] 

Uma Maheswara Rao G commented on HDFS-11164:


[~rakeshr] Overall patch looks good to me. +1

[~surendrasingh], since you tested Mover scenarios before, do you mind 
checking this patch on your clusters to see whether it affects any of your 
scenarios?

> Mover should avoid unnecessary retries if the block is pinned
> -
>
> Key: HDFS-11164
> URL: https://issues.apache.org/jira/browse/HDFS-11164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11164-00.patch, HDFS-11164-01.patch, 
> HDFS-11164-02.patch, HDFS-11164-03.patch
>
>
> When the Mover tries to move a pinned block to another datanode, it 
> internally hits the following IOException and marks the block movement as 
> {{failure}}. Since the Mover has {{dfs.mover.retry.max.attempts}} configs, it 
> will continue moving this block until it reaches {{retryMaxAttempts}}. If the 
> block movement failure(s) are only due to block pinning, then retry is 
> unnecessary. The idea of this jira is to avoid retry attempts of pinned 
> blocks as they won't be able to move to a different node. 
> {code}
> 2016-11-22 10:56:10,537 WARN 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher: Failed to move 
> blk_1073741825_1001 with size=52 from 127.0.0.1:19501:DISK to 
> 127.0.0.1:19758:ARCHIVE through 127.0.0.1:19501
> java.io.IOException: Got error, status=ERROR, status message opReplaceBlock 
> BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 received 
> exception java.io.IOException: Got error, status=ERROR, status message Not 
> able to copy block 1073741825 to /127.0.0.1:19826 because it's pinned , copy 
> block BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 from 
> /127.0.0.1:19501, reportedBlock move is failed
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:118)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:417)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:358)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$5(Dispatcher.java:322)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:1075)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (HDFS-10885) [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier is on

2016-12-01 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713254#comment-15713254
 ] 

Uma Maheswara Rao G commented on HDFS-10885:


Overall the patch looks nice to me. Great work, [~zhouwei].

Except for the double-checking corner cases, everything is fine. Do you mind 
filing another JIRA for those double-check corner cases, since we identified 
some of them earlier? I am ok with that.

I am ok to push this JIRA in, though I have one small point to discuss: the 
configuration item name.
{code}
dfs.namenode.sps.enabled
{code}
The API we are exposing is named isStoragePolicySatisfierActive. Both items 
are visible to users, so I feel we should unify the names to avoid confusion.
How about (option 1) naming the parameter something like 
dfs.storage.policy.satisfier.activate = true, or, to avoid Active/Standby HA 
confusion, (option 2) renaming the API to isStoragePolicySatisfierEnabled?

What do you think, [~rakeshr] and others?
I feel option 2 may be more appropriate, considering the dynamic 
enable/disable option.
Other than this point, the patch can be pushed.


> [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier 
> is on
> --
>
> Key: HDFS-10885
> URL: https://issues.apache.org/jira/browse/HDFS-10885
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Wei Zhou
>Assignee: Wei Zhou
> Fix For: HDFS-10285
>
> Attachments: HDFS-10800-HDFS-10885-00.patch, 
> HDFS-10800-HDFS-10885-01.patch, HDFS-10800-HDFS-10885-02.patch, 
> HDFS-10885-HDFS-10285.03.patch, HDFS-10885-HDFS-10285.04.patch, 
> HDFS-10885-HDFS-10285.05.patch, HDFS-10885-HDFS-10285.06.patch, 
> HDFS-10885-HDFS-10285.07.patch, HDFS-10885-HDFS-10285.08.patch, 
> HDFS-10885-HDFS-10285.09.patch
>
>
> The two tools cannot be allowed to run at the same time, so that they do 
> not conflict and fight with each other.






[jira] [Commented] (HDFS-11193) [SPS]: Erasure coded files should be considered for satisfying storage policy

2016-12-01 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713180#comment-15713180
 ] 

Uma Maheswara Rao G commented on HDFS-11193:


I think the API itself need not care whether a file is erasure coded or not; 
the process is the same for all files.
But it is worth checking the block-scanning part specifically for EC files and 
adding some tests around that. Of course we can fix any issues the tests find.

> [SPS]: Erasure coded files should be considered for satisfying storage policy
> -
>
> Key: HDFS-11193
> URL: https://issues.apache.org/jira/browse/HDFS-11193
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
>
> Erasure coded striped files supports storage policies {{HOT, COLD, ALLSSD}}. 
> {{HdfsAdmin#satisfyStoragePolicy}} API call on a directory should consider 
> all immediate files under that directory and need to check that, the files 
> really matching with namespace storage policy. All the mismatched striped 
> blocks should be chosen for block movement.






[jira] [Assigned] (HDFS-11123) [SPS] Make storage policy satisfier daemon work on/off dynamically

2016-12-01 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G reassigned HDFS-11123:
--

Assignee: Uma Maheswara Rao G

> [SPS] Make storage policy satisfier daemon work on/off dynamically
> --
>
> Key: HDFS-11123
> URL: https://issues.apache.org/jira/browse/HDFS-11123
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>
> The idea of this task is to make the SPS daemon thread start/stop 
> dynamically in the Namenode process without needing to restart the complete 
> Namenode.
> This will help when an admin wants to switch off SPS and run the Mover tool 
> externally.






[jira] [Commented] (HDFS-11151) [SPS]: StoragePolicySatisfier should gracefully handle when there is no target node with the required storage type

2016-11-26 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15699072#comment-15699072
 ] 

Uma Maheswara Rao G commented on HDFS-11151:


Oh yeah, you are right; we are already adding the element.

+1 on the patch.

> [SPS]: StoragePolicySatisfier should gracefully handle when there is no 
> target node with the required storage type
> --
>
> Key: HDFS-11151
> URL: https://issues.apache.org/jira/browse/HDFS-11151
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11151-HDFS-10285.00.patch
>
>
> Presently SPS does not handle the case where it fails to choose a target 
> node with the required storage type. In general, there are two cases:
> # For the given path, no target node can be found for any of its blocks or 
> block locations (source nodes), meaning no block movement will be scheduled 
> for this path.
> # For the given path, target nodes are available for only some block 
> locations (source nodes), meaning only some of the blocks or block 
> locations under the given path will be scheduled for block movement.






[jira] [Commented] (HDFS-11151) [SPS]: StoragePolicySatisfier should gracefully handle when there is no target node with the required storage type

2016-11-26 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15697587#comment-15697587
 ] 

Uma Maheswara Rao G commented on HDFS-11151:


[~rakeshr], Thank you for the patch.

{code}
+boolean needBlockStorageMovement = false;
+for (BlockMovingInfo blkMovingInfo : blockMovingInfos) {
+  // Check that at least one block storage movement has been chosen
+  if (blkMovingInfo.getTargets().length > 0){
+needBlockStorageMovement = true;
+break;
+  }
+}
+if (!needBlockStorageMovement) {
+  // Simply return as there are no targets selected for scheduling the
+  // block movement.
+  return;
+}
{code}
Simply returning here may completely drop this element. I think we need to 
add the item to the attempted items, so that it will be retried after some 
time?

> [SPS]: StoragePolicySatisfier should gracefully handle when there is no 
> target node with the required storage type
> --
>
> Key: HDFS-11151
> URL: https://issues.apache.org/jira/browse/HDFS-11151
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11151-HDFS-10285.00.patch
>
>
> Presently SPS does not handle the case where it fails to choose a target 
> node with the required storage type. In general, there are two cases:
> # For the given path, no target node can be found for any of its blocks or 
> block locations (source nodes), meaning no block movement will be scheduled 
> for this path.
> # For the given path, target nodes are available for only some block 
> locations (source nodes), meaning only some of the blocks or block 
> locations under the given path will be scheduled for block movement.






[jira] [Commented] (HDFS-11164) Mover should avoid unnecessary retries if the block is pinned

2016-11-23 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691480#comment-15691480
 ] 

Uma Maheswara Rao G commented on HDFS-11164:


[~rakeshr], Thanks for reporting the issue.
Adding an extra error code makes sense to me.

But I think the current patch may not solve the unnecessary-retries issue.
Please check the following cases, and correct me if I am wrong.

{code}
 // Check that the block movement failure(s) are only due to block pinning.
  // If yes, just mark as failed and exit without retries.
  if(!hasFailed && hasBlockPinningFailure){
hasFailed = hasBlockPinningFailure;
result.setRetryFailed();
  } else if (hasFailed && !hasSuccess) {
if (retryCount.get() == retryMaxAttempts) {
  result.setRetryFailed();
  LOG.error("Failed to move some block's after "
  + retryMaxAttempts + " retries.");
  return result;
} else {
  retryCount.incrementAndGet();
}
  } else {
// Reset retry count if no failure.
retryCount.set(0);
  }
  result.updateHasRemaining(hasFailed);
  return result;
{code}
The !hasFailed && hasBlockPinningFailure case targets only pinned failures 
with no normal failures, right? If so, when there are normal failures and 
pinned failures together, it will still retry?
If it retries, it may scan those paths again and try to move the blocks even 
though they are pinned.
I think we need to approach this a bit differently from node-level failures.
One thought: blocks that failed because they are pinned could be stored 
separately, and when a retry happens, blocks already in that failedDueToPinned 
list would be skipped when adding to PendingMoves. Just a thought; we need to 
check the feasibility.
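A rough sketch of that last thought, with simplified stand-in types (block IDs 
stand in for full block objects, and none of these names are the real 
Dispatcher/Mover classes): pinned-failure blocks are remembered and skipped 
when pending moves are rebuilt on retry.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for the Mover's dispatch bookkeeping.
public class PinnedBlockSkipSketch {
  // Blocks whose movement failed only because the source replica is pinned.
  private final Set<Long> failedDueToPinned = new HashSet<>();

  void recordPinnedFailure(long blockId) {
    failedDueToPinned.add(blockId);
  }

  // On retry, blocks already known to be pinned are not re-added as
  // pending moves, so they no longer burn retryMaxAttempts.
  List<Long> buildPendingMoves(List<Long> candidateBlockIds) {
    List<Long> pending = new ArrayList<>();
    for (long id : candidateBlockIds) {
      if (!failedDueToPinned.contains(id)) {
        pending.add(id);
      }
    }
    return pending;
  }

  public static void main(String[] args) {
    PinnedBlockSkipSketch sketch = new PinnedBlockSkipSketch();
    sketch.recordPinnedFailure(1073741825L); // the pinned block from the log
    System.out.println(
        sketch.buildPendingMoves(List.of(1073741825L, 1073741826L)));
  }
}
```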


> Mover should avoid unnecessary retries if the block is pinned
> -
>
> Key: HDFS-11164
> URL: https://issues.apache.org/jira/browse/HDFS-11164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11164-00.patch, HDFS-11164-01.patch
>
>
> When the Mover tries to move a pinned block to another datanode, it 
> internally hits the following IOException and marks the block movement as 
> {{failure}}. Since the Mover has {{dfs.mover.retry.max.attempts}} configs, it 
> will continue moving this block until it reaches {{retryMaxAttempts}}. If the 
> block movement failure(s) are only due to block pinning, then retry is 
> unnecessary. The idea of this jira is to avoid retry attempts of pinned 
> blocks as they won't be able to move to a different node. 
> {code}
> 2016-11-22 10:56:10,537 WARN 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher: Failed to move 
> blk_1073741825_1001 with size=52 from 127.0.0.1:19501:DISK to 
> 127.0.0.1:19758:ARCHIVE through 127.0.0.1:19501
> java.io.IOException: Got error, status=ERROR, status message opReplaceBlock 
> BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 received 
> exception java.io.IOException: Got error, status=ERROR, status message Not 
> able to copy block 1073741825 to /127.0.0.1:19826 because it's pinned , copy 
> block BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 from 
> /127.0.0.1:19501, reportedBlock move is failed
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:118)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:417)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:358)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$5(Dispatcher.java:322)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:1075)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (HDFS-10802) [SPS]: Add satisfyStoragePolicy API in HdfsAdmin

2016-11-17 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15673088#comment-15673088
 ] 

Uma Maheswara Rao G commented on HDFS-10802:


Thank you. 
+1 on the latest patch.

> [SPS]: Add satisfyStoragePolicy API in HdfsAdmin
> 
>
> Key: HDFS-10802
> URL: https://issues.apache.org/jira/browse/HDFS-10802
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Uma Maheswara Rao G
>Assignee: Yuanbo Liu
> Attachments: HDFS-10802-HDFS-10285.001.patch, 
> HDFS-10802-HDFS-10285.002.patch, HDFS-10802-HDFS-10285.003.patch, 
> HDFS-10802-HDFS-10285.004.patch, HDFS-10802-HDFS-10285.005.patch, 
> HDFS-10802-HDFS-10285.006.patch, HDFS-10802.001.patch, editsStored
>
>
> This JIRA is to track the work for adding user/admin API for calling to 
> satisfyStoragePolicy






[jira] [Commented] (HDFS-10802) [SPS]: Add satisfyStoragePolicy API in HdfsAdmin

2016-11-16 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669887#comment-15669887
 ] 

Uma Maheswara Rao G commented on HDFS-10802:


[~yuanbo], thank you for all your work.
The patch looks almost ready to me. I have a few nits, though:
# -
{code}
LOG.debug("Added block collection id {} to block "
++ "storageMovementNeeded queue", id);
{code}
Can you add an isDebugEnabled() check?
# -
{code}
+  /**
+   * satisfy the storage policy for a file/directory.
{code}
Should be "Satisfy" (it should start with a capital letter).
# -
{code}
 Assert.fail(String.format(
+"Failed to satisfy storage policy for %s since %s is set to 
false.",
+file, DFS_STORAGE_POLICY_ENABLED_KEY));
{code}
I think this message would be confusing if the test fails: even when the 
assertion fires, the message still reads as if the behavior were correct. So 
we should state the expectation here, e.g.: “Should fail to satisfy…….”
 
On the API part, I had one thought. Currently a user may need to call two 
APIs: first setStoragePolicy and then satisfyStoragePolicy. Sometimes, if the 
user wants to do both immediately, one API could do both.
How about having an overloaded setStoragePolicy API that takes the path and a 
satisfy flag to trigger storage policy satisfaction? Example: 
setStoragePolicy(src, true).
This can be done in another JIRA; I am raising it here just to think it over 
and file a new JIRA if you agree.
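A tiny sketch of the proposed overload; the interface below is purely 
illustrative (not HdfsAdmin's real signatures), and it spells out the policy 
argument that the shorthand setStoragePolicy(src, true) above elides.

```java
import java.util.ArrayList;
import java.util.List;

public class PolicyApiSketch {

  // Illustrative interface only; the real HdfsAdmin API may differ.
  interface StoragePolicyAdmin {
    void setStoragePolicy(String src, String policy);
    void satisfyStoragePolicy(String src);

    // Proposed convenience overload: set the policy and, when the flag is
    // true, immediately trigger storage policy satisfaction.
    default void setStoragePolicy(String src, String policy, boolean satisfy) {
      setStoragePolicy(src, policy);
      if (satisfy) {
        satisfyStoragePolicy(src);
      }
    }
  }

  // Minimal in-memory implementation used only to exercise the overload.
  static class RecordingAdmin implements StoragePolicyAdmin {
    final List<String> calls = new ArrayList<>();

    @Override
    public void setStoragePolicy(String src, String policy) {
      calls.add("set:" + src + ":" + policy);
    }

    @Override
    public void satisfyStoragePolicy(String src) {
      calls.add("satisfy:" + src);
    }
  }

  public static void main(String[] args) {
    RecordingAdmin admin = new RecordingAdmin();
    admin.setStoragePolicy("/data", "COLD", true); // one call, both steps
    System.out.println(admin.calls);
  }
}
```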

Regarding the persistence TODO: do you want to file a new JIRA?


> [SPS]: Add satisfyStoragePolicy API in HdfsAdmin
> 
>
> Key: HDFS-10802
> URL: https://issues.apache.org/jira/browse/HDFS-10802
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Uma Maheswara Rao G
>Assignee: Yuanbo Liu
> Attachments: HDFS-10802-HDFS-10285.001.patch, 
> HDFS-10802-HDFS-10285.002.patch, HDFS-10802-HDFS-10285.003.patch, 
> HDFS-10802-HDFS-10285.004.patch, HDFS-10802-HDFS-10285.005.patch, 
> HDFS-10802.001.patch, editsStored
>
>
> This JIRA is to track the work for adding user/admin API for calling to 
> satisfyStoragePolicy






[jira] [Commented] (HDFS-10802) [SPS]: Add satisfyStoragePolicy API in HdfsAdmin

2016-11-14 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666056#comment-15666056
 ] 

Uma Maheswara Rao G commented on HDFS-10802:


Hi [~yuanbo], thank you so much for the contribution. It's great work from 
you here.
Thank you, Rakesh, for the quick reviews.

One suggestion: instead of mixing up the persistence and the API, how about 
this JIRA focuses on the API alone, with more discussion of the API signature? 
Let's file another JIRA for persistence, which could cover persisting these 
items into the FSImage as well.

> [SPS]: Add satisfyStoragePolicy API in HdfsAdmin
> 
>
> Key: HDFS-10802
> URL: https://issues.apache.org/jira/browse/HDFS-10802
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Uma Maheswara Rao G
>Assignee: Yuanbo Liu
> Attachments: HDFS-10802-HDFS-10285.001.patch, 
> HDFS-10802-HDFS-10285.002.patch, HDFS-10802-HDFS-10285.003.patch, 
> HDFS-10802.001.patch, editsStored
>
>
> This JIRA is to track the work for adding user/admin API for calling to 
> satisfyStoragePolicy






[jira] [Commented] (HDFS-10802) [SPS]: Add satisfyStoragePolicy API in HdfsAdmin

2016-11-11 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658719#comment-15658719
 ] 

Uma Maheswara Rao G commented on HDFS-10802:


Hey [~yuanbo], Are you working on this? Just wanted to check the status.

> [SPS]: Add satisfyStoragePolicy API in HdfsAdmin
> 
>
> Key: HDFS-10802
> URL: https://issues.apache.org/jira/browse/HDFS-10802
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Uma Maheswara Rao G
>Assignee: Yuanbo Liu
>
> This JIRA is to track the work for adding user/admin API for calling to 
> satisfyStoragePolicy






[jira] [Updated] (HDFS-11068) [SPS]: Provide unique trackID to track the block movement sends to coordinator

2016-11-11 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11068:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-10285
 Release Note: I have just committed this to branch
   Status: Resolved  (was: Patch Available)

> [SPS]: Provide unique trackID to track the block movement sends to coordinator
> --
>
> Key: HDFS-11068
> URL: https://issues.apache.org/jira/browse/HDFS-11068
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: HDFS-10285
>
> Attachments: HDFS-11068-HDFS-10285-01.patch, 
> HDFS-11068-HDFS-10285-02.patch, HDFS-11068-HDFS-10285-03.patch, 
> HDFS-11068-HDFS-10285.patch
>
>
> Presently DatanodeManager uses the constant value -1 as 
> [trackID|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1607],
>  which is a temporary value. As per discussion with [~umamaheswararao], one 
> proposal is to use {{BlockCollectionId/InodeFileId}}.






[jira] [Commented] (HDFS-11068) [SPS]: Provide unique trackID to track the block movement sends to coordinator

2016-11-11 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656591#comment-15656591
 ] 

Uma Maheswara Rao G commented on HDFS-11068:


+1 on the latest patch

> [SPS]: Provide unique trackID to track the block movement sends to coordinator
> --
>
> Key: HDFS-11068
> URL: https://issues.apache.org/jira/browse/HDFS-11068
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11068-HDFS-10285-01.patch, 
> HDFS-11068-HDFS-10285-02.patch, HDFS-11068-HDFS-10285-03.patch, 
> HDFS-11068-HDFS-10285.patch
>
>
> Presently DatanodeManager uses the constant value -1 as 
> [trackID|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1607],
>  which is a temporary value. As per discussion with [~umamaheswararao], one 
> proposal is to use {{BlockCollectionId/InodeFileId}}.






[jira] [Comment Edited] (HDFS-11068) [SPS]: Provide unique trackID to track the block movement sends to coordinator

2016-11-09 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652862#comment-15652862
 ] 

Uma Maheswara Rao G edited comment on HDFS-11068 at 11/10/16 3:13 AM:
--

# -
{code}
+  public Map getBlocksToMoveStorages() {
+    Map trackIdVsBlocks = new LinkedHashMap<>();
+    synchronized (storageMovementBlocks) {
+      if (storageMovementBlocks.isEmpty()) {
+        return trackIdVsBlocks;
+      }
+      trackIdVsBlocks.putAll(storageMovementBlocks);
+      storageMovementBlocks.keySet().removeAll(trackIdVsBlocks.keySet());
+    }
+    return trackIdVsBlocks;
+  }
{code}
Here, what if one trackId/blockcollection contains many blocks to move? So, how 
about keeping just one trackID per heartbeat? (Later we may need to subdivide 
them into small batches within the trackID itself if there are many blocks, 
e.g. a file containing many blocks.) 
# -
{quote}
+  // TODO: Temporarily using the results from StoragePolicySatisfier
+  // class. This has to be revisited as part of HDFS-11029.
{quote}
HDFS-11029 is almost ready; we can incorporate the required changes and remove 
this TODO. Thanks for adding the TODO.

Other than these comments, the patch looks great. Thanks
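The "one trackID per heartbeat" suggestion above could be sketched roughly as follows. This is an illustrative stand-in, not the patch's code: the class name (TrackDrainer) is assumed, and blocks are modeled as plain strings since the generic types are not shown in the quoted diff.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: instead of draining the whole storageMovementBlocks
// map per heartbeat, hand out only the first (oldest) trackId and its blocks.
public class TrackDrainer {
  private final Map<Long, List<String>> storageMovementBlocks =
      new LinkedHashMap<>();

  public synchronized void add(long trackId, List<String> blocks) {
    storageMovementBlocks.put(trackId, new ArrayList<>(blocks));
  }

  // Returns at most one trackId's blocks per call (i.e. per heartbeat).
  public synchronized Map.Entry<Long, List<String>> nextTrack() {
    Iterator<Map.Entry<Long, List<String>>> it =
        storageMovementBlocks.entrySet().iterator();
    if (!it.hasNext()) {
      return null;
    }
    Map.Entry<Long, List<String>> first = it.next();
    it.remove(); // hand it out once, keep the rest for later heartbeats
    return first;
  }
}
```

Because the backing map is a LinkedHashMap, trackIds are handed out in insertion order, which keeps older requests from starving.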


was (Author: umamaheswararao):

{code}

+  public Map getBlocksToMoveStorages() {
+    Map trackIdVsBlocks = new LinkedHashMap<>();
+    synchronized (storageMovementBlocks) {
+      if (storageMovementBlocks.isEmpty()) {
+        return trackIdVsBlocks;
+      }
+      trackIdVsBlocks.putAll(storageMovementBlocks);
+      storageMovementBlocks.keySet().removeAll(trackIdVsBlocks.keySet());
+    }
+    return trackIdVsBlocks;
+  }
{code}
Here, what if one trackId/blockcollection contains many blocks to move? So, how 
about keeping just one trackID per heartbeat? (Later we may need to subdivide 
them into small batches within the trackID itself if there are many blocks, 
e.g. a file containing many blocks.) 
{quote}
+  // TODO: Temporarily using the results from StoragePolicySatisfier
+  // class. This has to be revisited as part of HDFS-11029.
{quote}
HDFS-11029 is almost ready; we can incorporate the required changes and remove 
this TODO. Thanks for adding the TODO.

Other than these comments, the patch looks great. Thanks

> [SPS]: Provide unique trackID to track the block movement sends to coordinator
> --
>
> Key: HDFS-11068
> URL: https://issues.apache.org/jira/browse/HDFS-11068
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11068-HDFS-10285-01.patch, 
> HDFS-11068-HDFS-10285.patch
>
>
> Presently DatanodeManager uses the constant value -1 as 
> [trackID|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1607],
>  which is a temporary value. As per discussion with [~umamaheswararao], one 
> proposal is to use {{BlockCollectionId/InodeFileId}}.






[jira] [Commented] (HDFS-11068) [SPS]: Provide unique trackID to track the block movement sends to coordinator

2016-11-09 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652862#comment-15652862
 ] 

Uma Maheswara Rao G commented on HDFS-11068:



{code}

+  public Map getBlocksToMoveStorages() {
+    Map trackIdVsBlocks = new LinkedHashMap<>();
+    synchronized (storageMovementBlocks) {
+      if (storageMovementBlocks.isEmpty()) {
+        return trackIdVsBlocks;
+      }
+      trackIdVsBlocks.putAll(storageMovementBlocks);
+      storageMovementBlocks.keySet().removeAll(trackIdVsBlocks.keySet());
+    }
+    return trackIdVsBlocks;
+  }
{code}
Here, what if one trackId/blockcollection contains many blocks to move? So, how 
about keeping just one trackID per heartbeat? (Later we may need to subdivide 
them into small batches within the trackID itself if there are many blocks, 
e.g. a file containing many blocks.) 
{quote}
+  // TODO: Temporarily using the results from StoragePolicySatisfier
+  // class. This has to be revisited as part of HDFS-11029.
{quote}
HDFS-11029 is almost ready; we can incorporate the required changes and remove 
this TODO. Thanks for adding the TODO.

Other than these comments, the patch looks great. Thanks

> [SPS]: Provide unique trackID to track the block movement sends to coordinator
> --
>
> Key: HDFS-11068
> URL: https://issues.apache.org/jira/browse/HDFS-11068
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-11068-HDFS-10285-01.patch, 
> HDFS-11068-HDFS-10285.patch
>
>
> Presently DatanodeManager uses the constant value -1 as 
> [trackID|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1607],
>  which is a temporary value. As per discussion with [~umamaheswararao], one 
> proposal is to use {{BlockCollectionId/InodeFileId}}.






[jira] [Updated] (HDFS-11029) [SPS]:Provide retry mechanism for the blocks which were failed while moving its storage at DNs

2016-11-09 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11029:
---
Attachment: HDFS-11029-HDFS-10285-02.patch

Fixed minor checkstyle and javadoc

> [SPS]:Provide retry mechanism for the blocks which were failed while moving 
> its storage at DNs
> --
>
> Key: HDFS-11029
> URL: https://issues.apache.org/jira/browse/HDFS-11029
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-11029-HDFS-10285-00.patch, 
> HDFS-11029-HDFS-10285-01.patch, HDFS-11029-HDFS-10285-02.patch
>
>
> When the DN co-ordinator finds that some of the blocks associated with a 
> trackID could not be moved to their target storages due to errors, a retry 
> may work in some cases; for example, if the target node has no space, then 
> retrying with another target can work. 
> So, based on the movement result flag (SUCCESS/FAILURE) from the DN 
> co-ordinator, the NN would retry by scanning the blocks again.
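A minimal sketch of the retry idea in the description above, assuming a per-trackID SUCCESS/FAILURE result arrives at the NN; the names here (RetryQueue, MovementStatus, onResult) are illustrative, not the actual patch API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: the NN re-queues a trackId for a fresh scan when the
// co-ordinator DN reports FAILURE; SUCCESS means the trackId is finished.
public class RetryQueue {
  public enum MovementStatus { SUCCESS, FAILURE }

  private final Deque<Long> pendingRescan = new ArrayDeque<>();

  // Called when a per-trackId movement result arrives from the DN.
  public void onResult(long trackId, MovementStatus status) {
    if (status == MovementStatus.FAILURE) {
      // e.g. a target had no space: retry by scanning the blocks again,
      // which may pick a different target next time.
      pendingRescan.addLast(trackId);
    }
  }

  // Next trackId whose blocks should be re-scanned, or null if none.
  public Long nextToRescan() {
    return pendingRescan.pollFirst();
  }
}
```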






[jira] [Created] (HDFS-11123) [SPS] Make storage policy satisfier daemon work on/off dynamically

2016-11-09 Thread Uma Maheswara Rao G (JIRA)
Uma Maheswara Rao G created HDFS-11123:
--

 Summary: [SPS] Make storage policy satisfier daemon work on/off 
dynamically
 Key: HDFS-11123
 URL: https://issues.apache.org/jira/browse/HDFS-11123
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Uma Maheswara Rao G


The idea of this task is to make the SPS daemon thread start/stop dynamically 
in the Namenode process without needing to restart the complete Namenode. 
This will help in the case where an admin wants to switch off the SPS and run 
the Mover tool externally.
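The start/stop-without-restart idea can be sketched with a toggleable daemon thread; this is an assumption-laden illustration (class and method names are invented), not the SPS implementation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a daemon thread that can be started and stopped
// dynamically without restarting the enclosing process.
public class ToggleableDaemon {
  private final AtomicBoolean running = new AtomicBoolean(false);
  private Thread worker;

  public synchronized void start() {
    if (!running.compareAndSet(false, true)) {
      return; // already running
    }
    worker = new Thread(() -> {
      while (running.get()) {
        // ... scan for paths whose storage policy needs satisfying ...
        try {
          Thread.sleep(10);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }, "sps-daemon");
    worker.setDaemon(true);
    worker.start();
  }

  public synchronized void stop() {
    if (!running.compareAndSet(true, false)) {
      return; // already stopped
    }
    worker.interrupt();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public boolean isRunning() {
    return running.get();
  }
}
```

The AtomicBoolean guard makes repeated start/stop calls idempotent, which matters when the toggle is driven by an admin command.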






[jira] [Updated] (HDFS-11029) [SPS]:Provide retry mechanism for the blocks which were failed while moving its storage at DNs

2016-11-09 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11029:
---
Attachment: HDFS-11029-HDFS-10285-01.patch

Thank you so much, [~rakeshr], for the quick reviews. Here is the patch, which 
addresses the comments except #2. For #2, I will make them configurable later, 
along with other parameters, in another JIRA. As you may notice, I already 
added a TODO. Thanks

> [SPS]:Provide retry mechanism for the blocks which were failed while moving 
> its storage at DNs
> --
>
> Key: HDFS-11029
> URL: https://issues.apache.org/jira/browse/HDFS-11029
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-11029-HDFS-10285-00.patch, 
> HDFS-11029-HDFS-10285-01.patch
>
>
> When the DN co-ordinator finds that some of the blocks associated with a 
> trackID could not be moved to their target storages due to errors, a retry 
> may work in some cases; for example, if the target node has no space, then 
> retrying with another target can work. 
> So, based on the movement result flag (SUCCESS/FAILURE) from the DN 
> co-ordinator, the NN would retry by scanning the blocks again.






[jira] [Updated] (HDFS-11029) [SPS]:Provide retry mechanism for the blocks which were failed while moving its storage at DNs

2016-11-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11029:
---
Status: Patch Available  (was: Open)

> [SPS]:Provide retry mechanism for the blocks which were failed while moving 
> its storage at DNs
> --
>
> Key: HDFS-11029
> URL: https://issues.apache.org/jira/browse/HDFS-11029
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-11029-HDFS-10285-00.patch
>
>
> When the DN co-ordinator finds that some of the blocks associated with a 
> trackID could not be moved to their target storages due to errors, a retry 
> may work in some cases; for example, if the target node has no space, then 
> retrying with another target can work. 
> So, based on the movement result flag (SUCCESS/FAILURE) from the DN 
> co-ordinator, the NN would retry by scanning the blocks again.






[jira] [Issue Comment Deleted] (HDFS-10285) Storage Policy Satisfier in Namenode

2016-11-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-10285:
---
Comment: was deleted

(was: Attached the initial patch for this work. Please review.)

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.7.2
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These 
> policies can be set on a directory/file to specify the user's preference for 
> where the physical blocks should be stored. When the user sets the storage 
> policy before writing data, the blocks can take advantage of the storage 
> policy preferences and the physical blocks are stored accordingly. 
> If the user sets the storage policy after writing and completing the file, 
> then the blocks would have been written with the default storage policy 
> (nothing but DISK). The user has to run the ‘Mover tool’ explicitly, 
> specifying all such file names as a list. In some distributed system 
> scenarios (ex: HBase) it would be difficult to collect all the files and run 
> the tool, as different nodes can write files separately and files can have 
> different paths.
> Another scenario is, when the user renames a file from a directory with an 
> effective storage policy (inherited from the parent directory) to a directory 
> with a different effective storage policy, the inherited storage policy is 
> not copied from the source; the file takes its effective policy from the 
> destination file/dir parent's storage policy. This rename operation is just a 
> metadata change in the Namenode; the physical blocks still remain with the 
> source storage policy.
> So, tracking all such business-logic-based file names from distributed 
> nodes (ex: region servers) and running the Mover tool could be difficult for 
> admins. 
> Here the proposal is to provide an API from the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and send movement commands to the DNs. 
> Will post the detailed design thoughts document soon. 
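The proposed flow (an API on the Namenode side, drained later by a daemon thread into DN movement commands) could be sketched as below. All names here are illustrative assumptions, not the eventual HdfsAdmin API.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the proposal: a client asks the Namenode to satisfy
// the storage policy of a path; a daemon thread later picks the request up
// and turns it into block movement commands for the DNs.
public class SatisfierQueue {
  private final Queue<String> pendingPaths = new ConcurrentLinkedQueue<>();

  // The "API from the Namenode" side: record the request and return at once.
  public void satisfyStoragePolicy(String path) {
    pendingPaths.offer(path);
  }

  // The daemon-thread side: next path to translate into movement commands,
  // or null if nothing is pending.
  public String nextPendingPath() {
    return pendingPaths.poll();
  }
}
```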






[jira] [Updated] (HDFS-11029) [SPS]:Provide retry mechanism for the blocks which were failed while moving its storage at DNs

2016-11-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11029:
---
Attachment: HDFS-11029-HDFS-10285-00.patch

Attached the initial patch for this work. Please review.

> [SPS]:Provide retry mechanism for the blocks which were failed while moving 
> its storage at DNs
> --
>
> Key: HDFS-11029
> URL: https://issues.apache.org/jira/browse/HDFS-11029
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-11029-HDFS-10285-00.patch
>
>
> When the DN co-ordinator finds that some of the blocks associated with a 
> trackID could not be moved to their target storages due to errors, a retry 
> may work in some cases; for example, if the target node has no space, then 
> retrying with another target can work. 
> So, based on the movement result flag (SUCCESS/FAILURE) from the DN 
> co-ordinator, the NN would retry by scanning the blocks again.






[jira] [Updated] (HDFS-10285) Storage Policy Satisfier in Namenode

2016-11-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-10285:
---
Attachment: (was: HDFS-11029-HDFS-10285-00.patch)

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.7.2
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These 
> policies can be set on a directory/file to specify the user's preference for 
> where the physical blocks should be stored. When the user sets the storage 
> policy before writing data, the blocks can take advantage of the storage 
> policy preferences and the physical blocks are stored accordingly. 
> If the user sets the storage policy after writing and completing the file, 
> then the blocks would have been written with the default storage policy 
> (nothing but DISK). The user has to run the ‘Mover tool’ explicitly, 
> specifying all such file names as a list. In some distributed system 
> scenarios (ex: HBase) it would be difficult to collect all the files and run 
> the tool, as different nodes can write files separately and files can have 
> different paths.
> Another scenario is, when the user renames a file from a directory with an 
> effective storage policy (inherited from the parent directory) to a directory 
> with a different effective storage policy, the inherited storage policy is 
> not copied from the source; the file takes its effective policy from the 
> destination file/dir parent's storage policy. This rename operation is just a 
> metadata change in the Namenode; the physical blocks still remain with the 
> source storage policy.
> So, tracking all such business-logic-based file names from distributed 
> nodes (ex: region servers) and running the Mover tool could be difficult for 
> admins. 
> Here the proposal is to provide an API from the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and send movement commands to the DNs. 
> Will post the detailed design thoughts document soon. 






[jira] [Updated] (HDFS-10285) Storage Policy Satisfier in Namenode

2016-11-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-10285:
---
Attachment: HDFS-11029-HDFS-10285-00.patch

Attached the initial patch for this work. Please review.

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.7.2
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These 
> policies can be set on a directory/file to specify the user's preference for 
> where the physical blocks should be stored. When the user sets the storage 
> policy before writing data, the blocks can take advantage of the storage 
> policy preferences and the physical blocks are stored accordingly. 
> If the user sets the storage policy after writing and completing the file, 
> then the blocks would have been written with the default storage policy 
> (nothing but DISK). The user has to run the ‘Mover tool’ explicitly, 
> specifying all such file names as a list. In some distributed system 
> scenarios (ex: HBase) it would be difficult to collect all the files and run 
> the tool, as different nodes can write files separately and files can have 
> different paths.
> Another scenario is, when the user renames a file from a directory with an 
> effective storage policy (inherited from the parent directory) to a directory 
> with a different effective storage policy, the inherited storage policy is 
> not copied from the source; the file takes its effective policy from the 
> destination file/dir parent's storage policy. This rename operation is just a 
> metadata change in the Namenode; the physical blocks still remain with the 
> source storage policy.
> So, tracking all such business-logic-based file names from distributed 
> nodes (ex: region servers) and running the Mover tool could be difficult for 
> admins. 
> Here the proposal is to provide an API from the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and send movement commands to the DNs. 
> Will post the detailed design thoughts document soon. 






[jira] [Commented] (HDFS-10954) [SPS]: Provide mechanism to send blocks movement result back to NN from coordinator DN

2016-11-02 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631039#comment-15631039
 ] 

Uma Maheswara Rao G commented on HDFS-10954:


+1 on the latest patch

> [SPS]: Provide mechanism to send blocks movement result back to NN from 
> coordinator DN
> --
>
> Key: HDFS-10954
> URL: https://issues.apache.org/jira/browse/HDFS-10954
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-10954-HDFS-10285-00.patch, 
> HDFS-10954-HDFS-10285-01.patch, HDFS-10954-HDFS-10285-02.patch, 
> HDFS-10954-HDFS-10285-03.patch
>
>
> This jira is a follow-up task of HDFS-10884. As part of the HDFS-10884 jira, 
> a mechanism is provided to collect all the success/failed block movement 
> results at the {{co-ordinator datanode}} side. Now, the idea of this jira is 
> to discuss an efficient way to report these success/failed block movement 
> results to the namenode, so that the NN can take necessary action based on 
> this information.






[jira] [Commented] (HDFS-10954) [SPS]: Provide mechanism to send blocks movement result back to NN from coordinator DN

2016-11-01 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627666#comment-15627666
 ] 

Uma Maheswara Rao G commented on HDFS-10954:


Thank you, [~rakeshr], for the patch; I have reviewed it. Please check the 
comments below.

# -
{code}
+public class BlocksStorageMovementResult {
+
+  private final long trackId;
+  private final MovementStatus status;
{code}
Please change MovementStatus to Status
# -
For testPerTrackIdBlocksStorageMovementResults: can you check whether 
TestStorageReport can host the test case, instead of adding a temporary method 
for assertion? If the above test does not help, then this is fine.
# -
{code}
if (blksMovementResults != null) {
+  builder.addAllBlksMovementResults(
+      PBHelper.convertBlksMovResults(blksMovementResults));
+}
{code}
Since BlocksStorageMovementResultProto is a repeated type, you may need to 
send an empty list instead of skipping it when null. I think it's not marked 
optional, right?
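The review point above (a repeated protobuf field cannot be null, so a null result list should become an empty list rather than being skipped) can be illustrated with a hand-rolled builder; this is a stand-in, not generated protobuf code, and the names mirror the quoted diff only loosely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: normalize a null result list to an empty list so the
// wire message always carries the field as a (possibly empty) repeated list.
public class HeartbeatBuilder {
  private final List<String> blksMovementResults = new ArrayList<>();

  public HeartbeatBuilder addAllBlksMovementResults(List<String> results) {
    blksMovementResults.addAll(
        results == null ? Collections.<String>emptyList() : results);
    return this;
  }

  public List<String> build() {
    return Collections.unmodifiableList(blksMovementResults);
  }
}
```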


> [SPS]: Provide mechanism to send blocks movement result back to NN from 
> coordinator DN
> --
>
> Key: HDFS-10954
> URL: https://issues.apache.org/jira/browse/HDFS-10954
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-10954-HDFS-10285-00.patch, 
> HDFS-10954-HDFS-10285-01.patch, HDFS-10954-HDFS-10285-02.patch
>
>
> This jira is a follow-up task of HDFS-10884. As part of the HDFS-10884 jira, 
> a mechanism is provided to collect all the success/failed block movement 
> results at the {{co-ordinator datanode}} side. Now, the idea of this jira is 
> to discuss an efficient way to report these success/failed block movement 
> results to the namenode, so that the NN can take necessary action based on 
> this information.






[jira] [Assigned] (HDFS-10802) [SPS]: Add satisfyStoragePolicy API in HdfsAdmin

2016-10-31 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G reassigned HDFS-10802:
--

Assignee: Yuanbo Liu  (was: Uma Maheswara Rao G)

Hi [~yuanbo], thank you for your interest in the project. I have just assigned 
this task to you. Feel free to upload a patch when you are ready; I would be 
happy to review it.

> [SPS]: Add satisfyStoragePolicy API in HdfsAdmin
> 
>
> Key: HDFS-10802
> URL: https://issues.apache.org/jira/browse/HDFS-10802
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Uma Maheswara Rao G
>Assignee: Yuanbo Liu
>
> This JIRA is to track the work for adding user/admin API for calling to 
> satisfyStoragePolicy






[jira] [Updated] (HDFS-10884) [SPS]: Add block movement tracker to track the completion of block movement future tasks at DN

2016-10-25 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-10884:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-10285
   Status: Resolved  (was: Patch Available)

> [SPS]: Add block movement tracker to track the completion of block movement 
> future tasks at DN
> --
>
> Key: HDFS-10884
> URL: https://issues.apache.org/jira/browse/HDFS-10884
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: HDFS-10285
>
> Attachments: HDFS-10884-HDFS-10285-00.patch, 
> HDFS-10884-HDFS-10285-01.patch, HDFS-10884-HDFS-10285-02.patch, 
> HDFS-10884-HDFS-10285-03.patch, HDFS-10884-HDFS-10285-04.patch, 
> HDFS-10884-HDFS-10285-05.patch
>
>
> Presently 
> [StoragePolicySatisfyWorker#processBlockMovingTasks()|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L147]
>  function acts as a blocking call. The idea of this jira is to implement a 
> mechanism to track these movements asynchronously, allowing further movements 
> while the previous one is still being processed.
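The asynchronous tracking idea can be sketched with futures: submit each block movement and record completions without blocking the caller. The names below (MovementTracker, submit) are illustrative assumptions, not the StoragePolicySatisfyWorker API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: each movement runs on a pool; completion is recorded
// asynchronously, so new movements can be accepted while older ones run.
public class MovementTracker {
  private final ExecutorService movers = Executors.newFixedThreadPool(4);
  private final List<String> completed = new CopyOnWriteArrayList<>();

  // Returns immediately; the future completes when the movement finishes.
  public CompletableFuture<Void> submit(String blockId) {
    return CompletableFuture
        .runAsync(() -> { /* ... move the block's replica ... */ }, movers)
        .thenRun(() -> completed.add(blockId));
  }

  public List<String> completedBlocks() {
    return completed;
  }

  public void shutdown() {
    movers.shutdown();
  }
}
```

Joining the returned future is only needed by callers that must wait; the worker itself can keep accepting submissions.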






[jira] [Commented] (HDFS-10884) [SPS]: Add block movement tracker to track the completion of block movement future tasks at DN

2016-10-25 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15604526#comment-15604526
 ] 

Uma Maheswara Rao G commented on HDFS-10884:


I have just pushed this patch to the branch

> [SPS]: Add block movement tracker to track the completion of block movement 
> future tasks at DN
> --
>
> Key: HDFS-10884
> URL: https://issues.apache.org/jira/browse/HDFS-10884
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-10884-HDFS-10285-00.patch, 
> HDFS-10884-HDFS-10285-01.patch, HDFS-10884-HDFS-10285-02.patch, 
> HDFS-10884-HDFS-10285-03.patch, HDFS-10884-HDFS-10285-04.patch, 
> HDFS-10884-HDFS-10285-05.patch
>
>
> Presently 
> [StoragePolicySatisfyWorker#processBlockMovingTasks()|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L147]
>  function acts as a blocking call. The idea of this jira is to implement a 
> mechanism to track these movements asynchronously, allowing further movements 
> while the previous one is still being processed.






[jira] [Commented] (HDFS-10884) [SPS]: Add block movement tracker to track the completion of block movement future tasks at DN

2016-10-25 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15604523#comment-15604523
 ] 

Uma Maheswara Rao G commented on HDFS-10884:


+1 on the latest patch. Thanks Rakesh for incorporating feedbacks.

> [SPS]: Add block movement tracker to track the completion of block movement 
> future tasks at DN
> --
>
> Key: HDFS-10884
> URL: https://issues.apache.org/jira/browse/HDFS-10884
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-10884-HDFS-10285-00.patch, 
> HDFS-10884-HDFS-10285-01.patch, HDFS-10884-HDFS-10285-02.patch, 
> HDFS-10884-HDFS-10285-03.patch, HDFS-10884-HDFS-10285-04.patch, 
> HDFS-10884-HDFS-10285-05.patch
>
>
> Presently 
> [StoragePolicySatisfyWorker#processBlockMovingTasks()|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L147]
>  function acts as a blocking call. The idea of this jira is to implement a 
> mechanism to track these movements asynchronously, allowing further movements 
> while the previous one is still being processed.






[jira] [Commented] (HDFS-10884) [SPS]: Add block movement tracker to track the completion of block movement future tasks at DN

2016-10-21 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15597224#comment-15597224
 ] 

Uma Maheswara Rao G commented on HDFS-10884:


[~rakeshr] Thanks. Please check below.

# {quote}
How about renaming the handler class to BlockMovementsCompletionHandler 
representing collection of block movements under the respective trackId?
{quote}
You mean BlocksMovementsCompletionHandler? It makes sense to me.
# -
 {code}
+// TODO: Need to revisit this when NN is implemented to be able to send
+// block moving commands.
{code}
Can you check this TODO? The protobuf with commands is done now, right?
# TestStoragePolicySatisfyWorker.java: it would be good to add some test cases 
covering the end result per trackID under special conditions; for example, 
when one target has no space but the other block targets are fine, the result 
should be a retry, or when a target node is completely down. I am ok if you 
planned this in another patch, but please have a plan to cover it.

Once this is addressed, we can push this important patch. Thanks

> [SPS]: Add block movement tracker to track the completion of block movement 
> future tasks at DN
> --
>
> Key: HDFS-10884
> URL: https://issues.apache.org/jira/browse/HDFS-10884
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-10884-HDFS-10285-00.patch, 
> HDFS-10884-HDFS-10285-01.patch, HDFS-10884-HDFS-10285-02.patch, 
> HDFS-10884-HDFS-10285-03.patch, HDFS-10884-HDFS-10285-04.patch
>
>
> Presently 
> [StoragePolicySatisfyWorker#processBlockMovingTasks()|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L147]
>  function acts as a blocking call. The idea of this jira is to implement a 
> mechanism to track these movements asynchronously, allowing further movements 
> while the previous one is still being processed.





