[jira] [Comment Edited] (HDFS-12225) [SPS]: Optimize extended attributes for tracking SPS movements

2017-08-22 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137598#comment-16137598
 ] 

Uma Maheswara Rao G edited comment on HDFS-12225 at 8/23/17 1:21 AM:
-

Hi [~surendrasingh], thanks for updating the patch.

The latest patch looks almost good to me; I have a few more minor comments.

# .
{code}
Integer pendingWork = pendingWorkForDirectory.get(rootId) - 1;
+pendingWorkForDirectory.put(rootId, pendingWork);
+if (pendingWork != null && pendingWork <= 0) {
{code}
I think {{pendingWork}} can never be null here, so that null check is not 
needed. You probably need a null check on 
{{pendingWorkForDirectory.get(rootId)}} instead, before the subtraction, since 
unboxing a null {{Integer}} would throw a {{NullPointerException}}.
# .
{code}
pendingSPSTaskScanner = storageMovementNeeded.getPendingTaskScanner();
+pendingSPSTaskScanner.start();
{code}
Instead of obtaining the thread outside the class, keep the start/stop logic 
inside the class itself and expose methods for it. Could the 
{{BlockStorageMovementNeeded}} class have {{init}} and {{close}} methods?
# .
{code}
 BlockStorageMovementInfosBatch blkStorageMovementInfosBatch = nodeinfo
 .getBlocksToMoveStorages();
{code}
 Could you please move this variable declaration down, just before the code piece below?
{code}
if (blkStorageMovementInfosBatch != null) {
+  cmds.add(new BlockStorageMovementCommand(
+  DatanodeProtocol.DNA_BLOCK_STORAGE_MOVEMENT,
+  blkStorageMovementInfosBatch.getTrackID(), blockPoolId,
+  blkStorageMovementInfosBatch.getBlockMovingInfo()));
+}
{code}
# . It seems we have two kinds of classes for tracking this info: 1. 
{{ItemInfo}}, for tracking attempted items; 2. {{SatisfyTrackInfo}}, for 
tracking items that still need storage movement. Should we unify the class 
naming? How about something like {{SatisfyTrackInfo}} --> {{ItemInfo}} and 
{{ItemInfo}} --> {{AttemptedItemInfo}} (extends {{ItemInfo}}?). Does this make 
sense to you?
# . Could you please add clear documentation to the 
{{BlockStorageMovementNeeded}} class describing what it does and what it is 
responsible for? Right now it does more than its doc says.
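To make the unboxing pitfall from the first comment concrete, here is a minimal self-contained sketch (hypothetical class and method names, not the patch's actual code) of a null-safe decrement:

```java
import java.util.HashMap;
import java.util.Map;

class PendingWorkDecrement {
  // Hypothetical stand-in for the patch's pendingWorkForDirectory map.
  private final Map<Long, Integer> pendingWorkForDirectory = new HashMap<>();

  void add(long rootId, int count) {
    pendingWorkForDirectory.put(rootId, count);
  }

  /** Decrements the pending count; returns true once the directory is done. */
  boolean decrementAndCheckDone(long rootId) {
    Integer current = pendingWorkForDirectory.get(rootId);
    if (current == null) {
      // The null check must happen here, before unboxing: "current - 1"
      // would throw a NullPointerException for an absent key.
      return false;
    }
    int pendingWork = current - 1;
    pendingWorkForDirectory.put(rootId, pendingWork);
    return pendingWork <= 0;
  }
}
```

Checking the boxed value returned by {{get()}} first, rather than the already-unboxed result of the subtraction, is the point of the review comment above.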


> [SPS]: Optimize extended attributes for tracking SPS movements
> --
>
> Key: HDFS-12225
> URL: https://issues.apache.org/jira/browse/HDFS-12225
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-12225-HDFS-10285-01.patch, 
> HDFS-12225-HDFS-10285-02.patch, HDFS-12225-HDFS-10285-03.patch, 
> HDFS-12225-HDFS-10285-04.patch, HDFS-12225-HDFS-10285-05.patch, 
> HDFS-12225-HDFS-10285-06.patch, HDFS-12225-HDFS-10285-07.patch, 
> HDFS-12225-HDFS-10285-08.patch
>
>
> We discussed optimizing the number of extended attributes and agreed to file 
> a separate JIRA while implementing [HDFS-11150 | 
> https://issues.apache.org/jira/browse/HDFS-11150?focusedCommentId=15766127=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15766127]
> This is the JIRA to track that work.
> For context, the comments below are copied from HDFS-11150:
> {quote}
> [~yuanbo] wrote: I've tried that before. There is an issue here if we only 
> mark the directory. When recovering from FsImage, the InodeMap isn't built 
> up, so we don't know the sub-inodes of a given inode; in the end, we cannot 
> add these inodes to the movement queue in FSDirectory#addToInodeMap. Any 
> thoughts?{quote}
> {quote}
> [~umamaheswararao] wrote: I got what you are saying. OK, for simplicity we 
> can add for all inodes now. To handle this 100%, we may need intermittent 
> processing: first add them to some intermittent list while loading the 
> FsImage, then, once fully loaded and when starting active services, process 
> that list and do the required work. But that may add some additional 
> complexity. Let's go with all file inodes now, and we can revisit later if it 
> really creates issues. How about you raise a JIRA for it and think about 
> optimizing it separately?
> {quote}
> {quote}
> [~andrew.wang] wrote in the HDFS-10285 merge-time review: HDFS-10899 also 
> uses the cursor of the iterator in the EZ root xattr to track progress and 
> handle restarts. I wonder if we can do something similar here to avoid having 
> an xattr per file being moved.
> {quote}

[jira] [Comment Edited] (HDFS-12225) [SPS]: Optimize extended attributes for tracking SPS movements

2017-08-21 Thread Surendra Singh Lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136290#comment-16136290
 ] 

Surendra Singh Lilhore edited comment on HDFS-12225 at 8/22/17 5:09 AM:


Thanks [~umamaheswararao] for the review.

bq. Why is this required? We will remove Xattr only when queue really becomes 
empty right?
This is for the standby NameNode (SN). The SN loads inodes into 
{{pendingSPSxAttrInode}} from the edit logs based on the SPS xAttr. When the 
SPS work is done for that inode, a remove-xAttr edit log entry will arrive, 
and in that flow we remove the inode from the {{pendingSPSxAttrInode}} queue 
in the SN.
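A minimal sketch of that replay flow (hypothetical names; the real edit-log handling in the NameNode is far more involved):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Minimal sketch of the standby NameNode's pending-SPS tracking:
// set-xattr edits enqueue an inode, remove-xattr edits dequeue it.
class StandbySpsTracker {
  private final Set<Long> pendingSPSxAttrInode = new LinkedHashSet<>();

  /** Called while replaying a set-xattr edit carrying the SPS xAttr. */
  void onSetSpsXAttr(long inodeId) {
    pendingSPSxAttrInode.add(inodeId);
  }

  /** Called while replaying the remove-xattr edit once SPS work finished. */
  void onRemoveSpsXAttr(long inodeId) {
    pendingSPSxAttrInode.remove(inodeId);
  }

  boolean isPending(long inodeId) {
    return pendingSPSxAttrInode.contains(inodeId);
  }
}
```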



[jira] [Comment Edited] (HDFS-12225) [SPS]: Optimize extended attributes for tracking SPS movements

2017-08-19 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134189#comment-16134189
 ] 

Rakesh R edited comment on HDFS-12225 at 8/19/17 5:17 PM:
--

Thanks [~surendrasingh]. A few minor comments on the patch; also, please take 
care of the checkstyle warnings.
# Could you add wait/notify for the PendingSPSTaskScanner re-check? Otherwise 
it will occupy CPU cycles unnecessarily when there are no items in the queue. 
Sample code snippet:
{code}
if (rootINodeId == null) {
  // Waiting for an SPS path to be queued.
  synchronized (pendingSPSxAttrInode) {
    pendingSPSxAttrInode.wait(5000);
  }
} else {
  // ...
}
{code}
{code}
public void addInodeToPendingSPSList(long id) {
  pendingSPSxAttrInode.add(id);
  // Notify the waiting PendingSPSTaskScanner thread about the newly added
  // SPS path.
  synchronized (pendingSPSxAttrInode) {
    pendingSPSxAttrInode.notify();
  }
}
{code}
# Can you clean up the {{pendingSPSxAttrInode}} list in {{SPS#clearQueues()}}? 
Also, should {{SPS#clearQueues()}} be called from the {{SPS#disable()}} 
function?
# Can we move the pendingSPSTaskScanner#interrupt call to after the {{if 
(storagePolicySatisfierThread == null)}} check, like below:
{code}
StoragePolicySatisfier.java

  public synchronized void disable(boolean forceStop) {
    isRunning = false;
    if (storagePolicySatisfierThread == null) {
      return;
    }
    if (pendingSPSTaskScanner != null) {
      pendingSPSTaskScanner.interrupt();
    }
{code}
# Please modify the javadoc of {{#isDir()}} to
{code}
// Returns true if the tracking path is a directory, false otherwise.
{code}
# Typo: {{block moved in ACHIVE}} -> {{block moved in ARCHIVE}}
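The wait/notify suggestion in the first comment above can be sketched as a small self-contained class (illustrative names only, not the patch code); the scanner blocks with a timeout when the queue is empty instead of spinning:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Demonstrates the suggested wait/notify pattern: the consumer waits on the
// queue's monitor when empty, and the producer notifies after enqueueing.
class SpsPathQueue {
  private final Queue<Long> pendingSPSxAttrInode = new ArrayDeque<>();

  /** Producer side: enqueue an inode id and wake the waiting scanner. */
  public void addInodeToPendingSPSList(long id) {
    synchronized (pendingSPSxAttrInode) {
      pendingSPSxAttrInode.add(id);
      pendingSPSxAttrInode.notify();
    }
  }

  /** Consumer side: poll the next id, waiting up to timeoutMs when empty. */
  public Long pollNext(long timeoutMs) throws InterruptedException {
    synchronized (pendingSPSxAttrInode) {
      if (pendingSPSxAttrInode.isEmpty()) {
        pendingSPSxAttrInode.wait(timeoutMs);
      }
      // May still be null after a timeout or spurious wakeup.
      return pendingSPSxAttrInode.poll();
    }
  }
}
```

Note that both wait() and notify() must run while holding the queue's monitor, otherwise an IllegalMonitorStateException is thrown.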

