[jira] [Commented] (HDFS-12969) DfsAdmin listOpenFiles should report files by type

2020-07-13 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156508#comment-17156508
 ] 

hemanthboyina commented on HDFS-12969:
--

The test failures are not related to this change. Please review.

> DfsAdmin listOpenFiles should report files by type
> --
>
> Key: HDFS-12969
> URL: https://issues.apache.org/jira/browse/HDFS-12969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Manoj Govindassamy
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-12969.001.patch, HDFS-12969.002.patch
>
>
> HDFS-11847 introduced a new option, {{-blockingDecommission}}, to the existing 
> command {{dfsadmin -listOpenFiles}}. But the report produced by the command 
> doesn't differentiate the files by type (such as blocking decommission). 
> Changing the reporting style requires updating the proto format used by the 
> base command to carry additional fields, which is better done in a new jira 
> outside of HDFS-11847. This jira tracks the end-to-end enhancements needed 
> for the dfsadmin -listOpenFiles console output.






[jira] [Updated] (HDFS-12969) DfsAdmin listOpenFiles should report files by type

2020-07-12 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-12969:
-
Attachment: HDFS-12969.002.patch

> DfsAdmin listOpenFiles should report files by type
> --
>
> Key: HDFS-12969
> URL: https://issues.apache.org/jira/browse/HDFS-12969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Manoj Govindassamy
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-12969.001.patch, HDFS-12969.002.patch
>
>
> HDFS-11847 introduced a new option, {{-blockingDecommission}}, to the existing 
> command {{dfsadmin -listOpenFiles}}. But the report produced by the command 
> doesn't differentiate the files by type (such as blocking decommission). 
> Changing the reporting style requires updating the proto format used by the 
> base command to carry additional fields, which is better done in a new jira 
> outside of HDFS-11847. This jira tracks the end-to-end enhancements needed 
> for the dfsadmin -listOpenFiles console output.






[jira] [Commented] (HDFS-12969) DfsAdmin listOpenFiles should report files by type

2020-07-11 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156118#comment-17156118
 ] 

hemanthboyina commented on HDFS-12969:
--

Discussed with Manoj offline. Thanks [~manojg] for letting me take up the jira.

Thanks [~tasanuma] for the comments.
{quote}but does it need to change listOpenFiles API? I'm a bit worried about 
one more deprecated listOpenFiles API
{quote}
We are adding an additional field to the return value of an existing API, so I 
don't think we need to deprecate any API.
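
For illustration only, a minimal sketch of that idea (the class shape and the new 
field name are hypothetical; the actual patch may differ):
{code:java}
// Hypothetical sketch, not the actual patch: the existing entry type gains one
// more field, carried as an optional field in the corresponding proto message,
// so older clients keep working and no API has to be deprecated.
public class OpenFileEntry {
  private final long id;
  private final String filePath;
  private final String clientName;
  private final String clientMachine;
  private final String openType; // new, e.g. "ALL_OPEN_FILES" or "BLOCKING_DECOMMISSION"

  public OpenFileEntry(long id, String filePath, String clientName,
      String clientMachine, String openType) {
    this.id = id;
    this.filePath = filePath;
    this.clientName = clientName;
    this.clientMachine = clientMachine;
    this.openType = openType;
  }

  public String getOpenType() {
    return openType;
  }
}
{code}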

Attached a patch; please review.

> DfsAdmin listOpenFiles should report files by type
> --
>
> Key: HDFS-12969
> URL: https://issues.apache.org/jira/browse/HDFS-12969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Manoj Govindassamy
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-12969.001.patch
>
>
> HDFS-11847 introduced a new option, {{-blockingDecommission}}, to the existing 
> command {{dfsadmin -listOpenFiles}}. But the report produced by the command 
> doesn't differentiate the files by type (such as blocking decommission). 
> Changing the reporting style requires updating the proto format used by the 
> base command to carry additional fields, which is better done in a new jira 
> outside of HDFS-11847. This jira tracks the end-to-end enhancements needed 
> for the dfsadmin -listOpenFiles console output.






[jira] [Updated] (HDFS-12969) DfsAdmin listOpenFiles should report files by type

2020-07-11 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-12969:
-
Attachment: HDFS-12969.001.patch
Status: Patch Available  (was: Open)

> DfsAdmin listOpenFiles should report files by type
> --
>
> Key: HDFS-12969
> URL: https://issues.apache.org/jira/browse/HDFS-12969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Manoj Govindassamy
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-12969.001.patch
>
>
> HDFS-11847 introduced a new option, {{-blockingDecommission}}, to the existing 
> command {{dfsadmin -listOpenFiles}}. But the report produced by the command 
> doesn't differentiate the files by type (such as blocking decommission). 
> Changing the reporting style requires updating the proto format used by the 
> base command to carry additional fields, which is better done in a new jira 
> outside of HDFS-11847. This jira tracks the end-to-end enhancements needed 
> for the dfsadmin -listOpenFiles console output.






[jira] [Commented] (HDFS-15446) CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path

2020-07-03 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151150#comment-17151150
 ] 

hemanthboyina commented on HDFS-15446:
--

[~ayushtkn], we can go ahead.

> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with 
> error java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/path 
> ---
>
> Key: HDFS-15446
> URL: https://issues.apache.org/jira/browse/HDFS-15446
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Srinivasu Majeti
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: reserved-word, snapshot
> Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch, 
> HDFS-15446.003.patch
>
>
> After allowing snapshot creation for a path, say /app-logs, creating a 
> snapshot on /.reserved/raw/app-logs succeeds. But later, when the Standby 
> NameNode is restarted and tries to load the edit record OP_CREATE_SNAPSHOT, 
> it fails and the Standby NameNode shuts down with the exception 
> "java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs".
> Here are the steps to reproduce:
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt   - yarn   hadoop  0 2020-06-29 10:27 
> /.reserved/raw/app-logs
> drwxr-xr-x   - hive   hadoop  0 2020-06-29 10:29 /.reserved/raw/prod
> ++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> Exception we see in Standby namenode while loading the snapshot creation edit 
> record.
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - 
> Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
> {code}






[jira] [Commented] (HDFS-15446) CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with error java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/path

2020-07-01 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149591#comment-17149591
 ] 

hemanthboyina commented on HDFS-15446:
--

Thanks [~sodonnell] for your work. I had a small question on the latest patch.

In the existing code we use getINodesInPath to get the IIP, and getINodesInPath 
internally calls checkTraverse(pc, iip, resolveLink).

I think the latest patch loses that behaviour. Don't we still need it?

> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with 
> error java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/path 
> ---
>
> Key: HDFS-15446
> URL: https://issues.apache.org/jira/browse/HDFS-15446
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Srinivasu Majeti
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: reserved-word, snapshot
> Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch, 
> HDFS-15446.003.patch
>
>
> After allowing snapshot creation for a path, say /app-logs, creating a 
> snapshot on /.reserved/raw/app-logs succeeds. But later, when the Standby 
> NameNode is restarted and tries to load the edit record OP_CREATE_SNAPSHOT, 
> it fails and the Standby NameNode shuts down with the exception 
> "java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs".
> Here are the steps to reproduce:
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt   - yarn   hadoop  0 2020-06-29 10:27 
> /.reserved/raw/app-logs
> drwxr-xr-x   - hive   hadoop  0 2020-06-29 10:29 /.reserved/raw/prod
> ++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> Exception we see in Standby namenode while loading the snapshot creation edit 
> record.
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - 
> Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
> {code}






[jira] [Commented] (HDFS-15420) approx scheduled blocks not reseting over time

2020-06-27 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146956#comment-17146956
 ] 

hemanthboyina commented on HDFS-15420:
--

You can check timeoutReReplications (the number of timed-out block re-replications).

> approx scheduled blocks not reseting over time
> --
>
> Key: HDFS-15420
> URL: https://issues.apache.org/jira/browse/HDFS-15420
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: block placement
>Affects Versions: 2.6.0, 3.0.0
> Environment: Our 2.6.0 environment is a 3 node cluster running 
> cdh5.15.0.
> Our 3.0.0 environment is a 4 node cluster running cdh6.3.0.
>Reporter: Max Mizikar
>Priority: Minor
> Attachments: Screenshot from 2020-06-18 09-29-57.png, Screenshot from 
> 2020-06-18 09-31-15.png
>
>
> We have been experiencing large amounts of scheduled blocks that never get 
> cleared out. This is preventing blocks from being placed even when there is 
> plenty of space on the system.
> Here is an example of the block growth over 24 hours on one of our systems 
> running 2.6.0
>  !Screenshot from 2020-06-18 09-29-57.png! 
> Here is an example of the block growth over 24 hours on one of our systems 
> running 3.0.0
>  !Screenshot from 2020-06-18 09-31-15.png! 
> https://issues.apache.org/jira/browse/HDFS-1172 appears to be the main issue 
> we were having on 2.6.0 so the growth has decreased since upgrading to 3.0.0, 
> however, there appears to still be a systemic growth in scheduled blocks over 
> time and our systems will still need to restart the namenode on occasion to 
> reset this count. I have not determined what is causing the leaked blocks in 
> 3.0.0.
> Looking into the issue, I discovered that the intention is for scheduled 
> blocks to slowly go back down to 0 after errors cause blocks to be leaked.
> {code}
>   /** Increment the number of blocks scheduled. */
>   void incrementBlocksScheduled(StorageType t) {
> currApproxBlocksScheduled.add(t, 1);
>   }
>   
>   /** Decrement the number of blocks scheduled. */
>   void decrementBlocksScheduled(StorageType t) {
> if (prevApproxBlocksScheduled.get(t) > 0) {
>   prevApproxBlocksScheduled.subtract(t, 1);
> } else if (currApproxBlocksScheduled.get(t) > 0) {
>   currApproxBlocksScheduled.subtract(t, 1);
> } 
> // its ok if both counters are zero.
>   }
>   
>   /** Adjusts curr and prev number of blocks scheduled every few minutes. */
>   private void rollBlocksScheduled(long now) {
> if (now - lastBlocksScheduledRollTime > BLOCKS_SCHEDULED_ROLL_INTERVAL) {
>   prevApproxBlocksScheduled.set(currApproxBlocksScheduled);
>   currApproxBlocksScheduled.reset();
>   lastBlocksScheduledRollTime = now;
> }
>   }
> {code}
> However, this code does not do what is intended if the system has a constant 
> flow of written blocks. If blocks make it into prevApproxBlocksScheduled, the 
> next scheduled block increments currApproxBlocksScheduled, and when it 
> completes it decrements prevApproxBlocksScheduled, preventing the leaked 
> block from being removed from the approx count. So, for errors to be 
> corrected, we have to not write any data for the roll period of 10 minutes. 
> The number of blocks we write per 10 minutes is quite high. This allows the 
> error on the approx counts to grow to very large numbers.
> The comments in the ticket for the original implementation suggest this 
> issue was known: https://issues.apache.org/jira/browse/HADOOP-3707. However, 
> it's not clear to me whether the severity of it was known at the time.
> > So if there are some blocks that are not reported back by the datanode, 
> > they will eventually get adjusted (usually 10 min; bit longer if datanode 
> > is continuously receiving blocks).
> The comments suggest it will eventually get cleared out, but in our case, it 
> never gets cleared out.






[jira] [Commented] (HDFS-15434) RBF: MountTableResolver#getDestinationForPath failing with AssertionError from localCache

2020-06-24 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143851#comment-17143851
 ] 

hemanthboyina commented on HDFS-15434:
--

Though we couldn't reproduce the issue, we suspect it occurred under high 
concurrency.

Discussed with [~brahmareddy] offline; I think we can add a removalListener to 
the CacheBuilder to solve the problem.

Any suggestions?
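
As a rough illustration, a minimal sketch of that idea assuming the Guava cache 
used by the resolver (the cached value type PathLocation and the size limit are 
assumptions and may differ):
{code:java}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import org.apache.hadoop.hdfs.server.federation.resolver.PathLocation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: build the location cache with an explicit removal listener so
// evictions are observed and logged instead of being handled silently.
final class LocationCacheSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(LocationCacheSketch.class);

  static Cache<String, PathLocation> build(long maxCacheSize) {
    RemovalListener<String, PathLocation> onRemoval = notification ->
        LOG.debug("Evicted location cache entry {} ({})",
            notification.getKey(), notification.getCause());
    return CacheBuilder.newBuilder()
        .maximumSize(maxCacheSize)   // assumed existing size limit
        .removalListener(onRemoval)
        .build();
  }
}
{code}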

> RBF: MountTableResolver#getDestinationForPath failing with AssertionError 
> from localCache
> -
>
> Key: HDFS-15434
> URL: https://issues.apache.org/jira/browse/HDFS-15434
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Priority: Major
>
> {code:java}
> org.apache.hadoop.ipc.RemoteException: java.lang.AssertionError 
> at com.google.common.cache.LocalCache$Segment.evictEntries(LocalCache.java:2698) 
> at 
> com.google.common.cache.LocalCache$Segment.storeLoadedValue(LocalCache.java:3166)
>  
> at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2386)
> at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2351) 
> at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
>  
> at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) 
> at com.google.common.cache.LocalCache.get(LocalCache.java:3965) 
> at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:382)
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MultipleDestinationMountTableResolver.getDestinationForPath(MultipleDestinationMountTableResolver.java:87)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1406)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1389)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.getFileInfo(RouterClientProtocol.java:741)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:763)
>  {code}






[jira] [Created] (HDFS-15434) RBF: MountTableResolver#getDestinationForPath failing with AssertionError from localCache

2020-06-24 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15434:


 Summary: RBF: MountTableResolver#getDestinationForPath failing 
with AssertionError from localCache
 Key: HDFS-15434
 URL: https://issues.apache.org/jira/browse/HDFS-15434
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina


{code:java}
org.apache.hadoop.ipc.RemoteException: java.lang.AssertionError 
at com.google.common.cache.LocalCache$Segment.evictEntries(LocalCache.java:2698) 
at 
com.google.common.cache.LocalCache$Segment.storeLoadedValue(LocalCache.java:3166)
 
at 
com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2386)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2351) 
at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
 
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) 
at com.google.common.cache.LocalCache.get(LocalCache.java:3965) 
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:382)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MultipleDestinationMountTableResolver.getDestinationForPath(MultipleDestinationMountTableResolver.java:87)
at 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1406)
at 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1389)
at 
org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.getFileInfo(RouterClientProtocol.java:741)
at 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:763)
 {code}






[jira] [Commented] (HDFS-15415) Reduce locking in Datanode DirectoryScanner

2020-06-19 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140699#comment-17140699
 ] 

hemanthboyina commented on HDFS-15415:
--

Thanks [~sodonnell] for your analysis.

If we do not acquire the lock after taking the snapshot, have you considered the 
scenario where blocks are being converted from RBW to FINALIZED?
{quote}A finalized block could be appended. If that happens both the genstamp 
and length will change
{quote}
Agree with you. Although the replica will change from FINALIZED to RBW on append, 
we are only fetching the finalized blocks anyway, so it shouldn't be a problem.

 

> Reduce locking in Datanode DirectoryScanner
> ---
>
> Key: HDFS-15415
> URL: https://issues.apache.org/jira/browse/HDFS-15415
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15415.001.patch
>
>
> In HDFS-15406, we have a small change to greatly reduce the runtime and 
> locking time of the datanode DirectoryScanner. There may be room for further 
> improvement here:
> 1. These lines of code in DirectoryScanner#scan(), obtain a snapshot of the 
> finalized blocks from memory, and then sort them, under the DN lock. However 
> the blocks are stored in a sorted structure (FoldedTreeSet) and hence the 
> sort should be unnecessary.
> {code}
>   final List bl = dataset.getFinalizedBlocks(bpid);
>   Collections.sort(bl); // Sort based on blockId
> {code}
> 2.  From the scan step, we have captured a snapshot of what is on disk. After 
> calling `dataset.getFinalizedBlocks(bpid);` as above we have taken a snapshot 
> of in memory. The two snapshots are never 100% in sync as things are always 
> changing as the disk is scanned.
> We are only comparing finalized blocks, so they should not really change:
> * If a block is deleted after our snapshot, our snapshot will not see it and 
> that is OK.
> * A finalized block could be appended. If that happens both the genstamp and 
> length will change, but that should be handled by reconcile when it calls 
> `FSDatasetImpl.checkAndUpdate()`, and there is nothing stopping blocks being 
> appended after they have been scanned from disk, but before they have been 
> compared with memory.
> My suspicion is that we can do all the comparison work outside of the lock 
> and checkAndUpdate() re-checks any differences later under the lock on a 
> block by block basis.






[jira] [Commented] (HDFS-15420) approx scheduled blocks not reseting over time

2020-06-18 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139468#comment-17139468
 ] 

hemanthboyina commented on HDFS-15420:
--

Thanks [~maxmzkr] for providing the report. A quick question: are there any 
pending reconstruction requests that have timed out?

> approx scheduled blocks not reseting over time
> --
>
> Key: HDFS-15420
> URL: https://issues.apache.org/jira/browse/HDFS-15420
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: block placement
>Affects Versions: 2.6.0, 3.0.0
> Environment: Our 2.6.0 environment is a 3 node cluster running 
> cdh5.15.0.
> Our 3.0.0 environment is a 4 node cluster running cdh6.3.0.
>Reporter: Max Mizikar
>Priority: Minor
> Attachments: Screenshot from 2020-06-18 09-29-57.png, Screenshot from 
> 2020-06-18 09-31-15.png
>
>
> We have been experiencing large amounts of scheduled blocks that never get 
> cleared out. This is preventing blocks from being placed even when there is 
> plenty of space on the system.
> Here is an example of the block growth over 24 hours on one of our systems 
> running 2.6.0
>  !Screenshot from 2020-06-18 09-29-57.png! 
> Here is an example of the block growth over 24 hours on one of our systems 
> running 3.0.0
>  !Screenshot from 2020-06-18 09-31-15.png! 
> https://issues.apache.org/jira/browse/HDFS-1172 appears to be the main issue 
> we were having on 2.6.0 so the growth has decreased since upgrading to 3.0.0, 
> however, there appears to still be a systemic growth in scheduled blocks over 
> time and our systems will still need to restart the namenode on occasion to 
> reset this count. I have not determined what is causing the leaked blocks in 
> 3.0.0.
> Looking into the issue, I discovered that the intention is for scheduled 
> blocks to slowly go back down to 0 after errors cause blocks to be leaked.
> {code}
>   /** Increment the number of blocks scheduled. */
>   void incrementBlocksScheduled(StorageType t) {
> currApproxBlocksScheduled.add(t, 1);
>   }
>   
>   /** Decrement the number of blocks scheduled. */
>   void decrementBlocksScheduled(StorageType t) {
> if (prevApproxBlocksScheduled.get(t) > 0) {
>   prevApproxBlocksScheduled.subtract(t, 1);
> } else if (currApproxBlocksScheduled.get(t) > 0) {
>   currApproxBlocksScheduled.subtract(t, 1);
> } 
> // its ok if both counters are zero.
>   }
>   
>   /** Adjusts curr and prev number of blocks scheduled every few minutes. */
>   private void rollBlocksScheduled(long now) {
> if (now - lastBlocksScheduledRollTime > BLOCKS_SCHEDULED_ROLL_INTERVAL) {
>   prevApproxBlocksScheduled.set(currApproxBlocksScheduled);
>   currApproxBlocksScheduled.reset();
>   lastBlocksScheduledRollTime = now;
> }
>   }
> {code}
> However, this code does not do what is intended if the system has a constant 
> flow of written blocks. If blocks make it into prevApproxBlocksScheduled, the 
> next scheduled block increments currApproxBlocksScheduled, and when it 
> completes it decrements prevApproxBlocksScheduled, preventing the leaked 
> block from being removed from the approx count. So, for errors to be 
> corrected, we have to not write any data for the roll period of 10 minutes. 
> The number of blocks we write per 10 minutes is quite high. This allows the 
> error on the approx counts to grow to very large numbers.
> The comments in the ticket for the original implementation suggest this 
> issue was known: https://issues.apache.org/jira/browse/HADOOP-3707. However, 
> it's not clear to me whether the severity of it was known at the time.
> > So if there are some blocks that are not reported back by the datanode, 
> > they will eventually get adjusted (usually 10 min; bit longer if datanode 
> > is continuously receiving blocks).
> The comments suggest it will eventually get cleared out, but in our case, it 
> never gets cleared out.






[jira] [Commented] (HDFS-15416) The addStorageLocations() method in the DataStorage class is not perfect.

2020-06-17 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138634#comment-17138634
 ] 

hemanthboyina commented on HDFS-15416:
--

Thanks for filing the issue, [~jianghuazhu].

successLocations is a list, so you can check successLocations.isEmpty() and 
return early:
{code:java}
final List<StorageLocation> successLocations = loadDataStorage(
    datanode, nsInfo, dataDirs, startOpt, executor); {code}
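
A minimal sketch of the suggested early return (a fragment continuing the snippet 
above; an illustration only, not the committed change):
{code:java}
final List<StorageLocation> successLocations = loadDataStorage(
    datanode, nsInfo, dataDirs, startOpt, executor);
if (successLocations.isEmpty()) {
  // Nothing was loaded successfully, so there is no need to run
  // loadBlockPoolSliceStorage() at all.
  return Collections.emptyList();
}
return loadBlockPoolSliceStorage(
    datanode, nsInfo, successLocations, startOpt, executor);
{code}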

> The addStorageLocations() method in the DataStorage class is not perfect.
> -
>
> Key: HDFS-15416
> URL: https://issues.apache.org/jira/browse/HDFS-15416
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.1.1
>Reporter: jianghua zhu
>Priority: Major
>
> successLocations is a list; when it is empty, there is no need to execute 
> loadBlockPoolSliceStorage() again.
> Relevant code:
> {code:java}
> try {
>   final List<StorageLocation> successLocations = loadDataStorage(
>       datanode, nsInfo, dataDirs, startOpt, executor);
>   return loadBlockPoolSliceStorage(
>       datanode, nsInfo, successLocations, startOpt, executor);
> } finally {
>   executor.shutdown();
> }
> {code}
>  
>  
>  
>  






[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-17 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138627#comment-17138627
 ] 

hemanthboyina commented on HDFS-15406:
--

Thanks [~sodonnell]. The test failure was related to this change.

I have updated the patch as per your suggestion; please review.

> Improve the speed of Datanode Block Scan
> 
>
> Key: HDFS-15406
> URL: https://issues.apache.org/jira/browse/HDFS-15406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15406.001.patch, HDFS-15406.002.patch
>
>
> In our customer cluster we have approx. 10M blocks in one datanode. 
> For the Datanode to scan all the blocks, it has taken nearly 5 minutes.
> {code:java}
> 2020-06-10 12:17:06,869 | INFO  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
> 11149530, missing metadata files:472, missing block files:472, missing blocks 
> in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | Lock held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>  | InstrumentedLock.java:143 {code}






[jira] [Updated] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-17 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15406:
-
Attachment: HDFS-15406.002.patch

> Improve the speed of Datanode Block Scan
> 
>
> Key: HDFS-15406
> URL: https://issues.apache.org/jira/browse/HDFS-15406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15406.001.patch, HDFS-15406.002.patch
>
>
> In our customer cluster we have approx. 10M blocks in one datanode. 
> For the Datanode to scan all the blocks, it has taken nearly 5 minutes.
> {code:java}
> 2020-06-10 12:17:06,869 | INFO  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
> 11149530, missing metadata files:472, missing block files:472, missing blocks 
> in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | Lock held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>  | InstrumentedLock.java:143 {code}






[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-17 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138275#comment-17138275
 ] 

hemanthboyina commented on HDFS-15406:
--

Thanks [~sodonnell] for the comment.
{quote}If you don't want to investigate that as part of this Jira, we can 
create a sub-jira for it. It might be a trivial change to improve things a bit 
further, perhaps saving about another 5 seconds.
{quote}
We can create a sub-jira to track that change.

> Improve the speed of Datanode Block Scan
> 
>
> Key: HDFS-15406
> URL: https://issues.apache.org/jira/browse/HDFS-15406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15406.001.patch
>
>
> In our customer cluster we have approx. 10M blocks in one datanode. 
> For the Datanode to scan all the blocks, it has taken nearly 5 minutes.
> {code:java}
> 2020-06-10 12:17:06,869 | INFO  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
> 11149530, missing metadata files:472, missing block files:472, missing blocks 
> in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | Lock held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>  | InstrumentedLock.java:143 {code}






[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-16 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136730#comment-17136730
 ] 

hemanthboyina commented on HDFS-15406:
--

Thanks [~sodonnell] for the comment.
{quote} # After you started caching `getBaseURI()` did it improve the runtime 
of both the getDiskReport() step and compare with in-memory step?{quote}
Yes, after caching getBaseURI() the time taken improved in both places; more 
details below.
{quote}2. Looking at the code on trunk, I don't think we create any scanInfo 
objects under the lock in the compare sections unless there is a difference. If 
this change improved your runtime under the lock from 6m -> 52 seconds, is this 
because there is a large number of differences between disk and memory on your 
cluster for some reason?
{quote}
I think you are referring to the creation of ScanInfo objects here, because for 
creating a ScanInfo object we use vol.getBaseURI(). Even if we do not create 
ScanInfo objects, we internally call getBaseURI() three times inside the lock, 
via info.getBlockFile(), info.getGenStamp() and memBlock.compareWith(info). So if 
there is a large number of differences between disk and memory, the calls to 
getBaseURI() are even more frequent. By caching getBaseURI() we save at least 
three calls inside the lock and one outside the lock, so there was a huge 
decrease in the lock hold time. We tried creating 11M blocks in our independent 
cluster: the lock was held for 3 min 20 sec before caching, and after caching 
getBaseURI() the lock time was 52 sec.
{quote}3. Did you do capture any profiles (flame chart or debug log messages) 
to see how long each part of the code under the lock runs for? I am interested 
in these lines in DirectoryScanner#scan():
{quote}
I didn't capture profiles, but I could see dataset.getFinalizedBlocks(bpid) in 
the stack trace; it has taken more than 600 ms on every iteration.

 

> Improve the speed of Datanode Block Scan
> 
>
> Key: HDFS-15406
> URL: https://issues.apache.org/jira/browse/HDFS-15406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15406.001.patch
>
>
> In our customer cluster we have approx. 10M blocks in one datanode. 
> For the Datanode to scan all the blocks, it has taken nearly 5 minutes.
> {code:java}
> 2020-06-10 12:17:06,869 | INFO  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
> 11149530, missing metadata files:472, missing block files:472, missing blocks 
> in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | Lock held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>  | InstrumentedLock.java:143 {code}






[jira] [Updated] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-15 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15406:
-
Attachment: HDFS-15406.001.patch
Status: Patch Available  (was: Open)

> Improve the speed of Datanode Block Scan
> 
>
> Key: HDFS-15406
> URL: https://issues.apache.org/jira/browse/HDFS-15406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15406.001.patch
>
>
> In our customer cluster we have approx. 10M blocks in one datanode. 
> For the Datanode to scan all the blocks, it has taken nearly 5 minutes.
> {code:java}
> 2020-06-10 12:17:06,869 | INFO  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
> 11149530, missing metadata files:472, missing block files:472, missing blocks 
> in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | Lock held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>  | InstrumentedLock.java:143 {code}






[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-15 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135767#comment-17135767
 ] 

hemanthboyina commented on HDFS-15406:
--

Thanks [~brahmareddy] for the comment.
{quote}Not sure, whether HDFS-9668 will address the same.
{quote}
The locking contention was handled in HDFS-15150 and HDFS-15160 by introducing a 
read lock and a write lock, but those don't reduce the time spent under the lock, 
which is what this Jira aims to solve.

By caching getBaseURI(), the lock time was reduced to 52 sec.

> Improve the speed of Datanode Block Scan
> 
>
> Key: HDFS-15406
> URL: https://issues.apache.org/jira/browse/HDFS-15406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> In our customer cluster we have approx. 10M blocks in one datanode. 
> For the Datanode to scan all the blocks, it has taken nearly 5 minutes.
> {code:java}
> 2020-06-10 12:17:06,869 | INFO  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
> 11149530, missing metadata files:472, missing block files:472, missing blocks 
> in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | Lock held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>  | InstrumentedLock.java:143 {code}






[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-15 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135756#comment-17135756
 ] 

hemanthboyina commented on HDFS-15406:
--

Discussed with [~pilchard] offline. The major drawback in his report was the 
configuration: "dfs.datanode.directoryscan.threads" was set to 1.

Two major points here:

*) If we have more volumes, the thread count (dfs.datanode.directoryscan.threads) 
affects the time taken by getDiskReport(), aka getVolumeReports(), since each 
volume is scanned by one thread; increasing the thread count reduces the time 
taken by getDiskReport().

*) Next we acquire the lock and compare the report with the in-memory data.

For creating a ScanInfo object we use vol.getBaseURI():
{code:java}
// FsVolumeSpi.ScanInfo
public ScanInfo(long blockId, File blockFile, File metaFile,
    FsVolumeSpi vol) {
  String condensedVolPath =
      (vol == null || vol.getBaseURI() == null) ? null :
      getCondensedPath(new File(vol.getBaseURI()).getAbsolutePath()); {code}
We call addDifference() if there is any mismatch in blockId or blockLength; for 
that we call getMetaFile() and getBlockFile(), which again use vol.getBaseURI():
{code:java}
public File getMetaFile() {
  return new File(new File(volume.getBaseURI()).getAbsolutePath(),
      metaSuffix); {code}
So if a DN has more blocks, there are more calls to getBaseURI(), and on every 
call we are converting currentDir.getParent() to a URI, which takes time; we can 
cache this here:
{code:java}
public URI getBaseURI() {
  return new File(currentDir.getParent()).toURI();
} {code}
After caching this, the lock time reduced to 52 sec.
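
A minimal sketch of that caching idea (an illustration only, not the exact patch):
{code:java}
// Compute the base URI once and reuse it on subsequent calls, instead of
// converting currentDir.getParent() to a URI on every invocation.
private volatile URI cachedBaseURI;

public URI getBaseURI() {
  if (cachedBaseURI == null) {
    cachedBaseURI = new File(currentDir.getParent()).toURI();
  }
  return cachedBaseURI;
}
{code}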

> Improve the speed of Datanode Block Scan
> 
>
> Key: HDFS-15406
> URL: https://issues.apache.org/jira/browse/HDFS-15406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> In our customer cluster we have approx. 10M blocks in one datanode. 
> For the Datanode to scan all the blocks, it has taken nearly 5 minutes.
> {code:java}
> 2020-06-10 12:17:06,869 | INFO  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
> 11149530, missing metadata files:472, missing block files:472, missing blocks 
> in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | Lock held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>  | InstrumentedLock.java:143 {code}






[jira] [Commented] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-13 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134907#comment-17134907
 ] 

hemanthboyina commented on HDFS-15403:
--

{quote}Shouldn't we call onFailure() when e.getMessage() is null?
{quote}
That's a valid point, [~tasanuma].

Updated the patch; please review.

> NPE in FileIoProvider#transferToSocketFully
> ---
>
> Key: HDFS-15403
> URL: https://issues.apache.org/jira/browse/HDFS-15403
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15403.001.patch, HDFS-15403.002.patch
>
>
> {code:java}
> [DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292)
> at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-13 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15403:
-
Attachment: HDFS-15403.002.patch

> NPE in FileIoProvider#transferToSocketFully
> ---
>
> Key: HDFS-15403
> URL: https://issues.apache.org/jira/browse/HDFS-15403
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15403.001.patch, HDFS-15403.002.patch
>
>
> {code:java}
> [DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292)
> at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-13 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134700#comment-17134700
 ] 

hemanthboyina commented on HDFS-15403:
--

Thanks [~tasanuma] for the review.
{quote} we should call onFailure() when NPE
{quote}
The NPE here is because e.getMessage() is null; it is not the exception thrown by 
the try block itself.
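
For illustration, a minimal sketch of the null-safe handling being discussed (placeholder names, not the actual FileIoProvider code): the message is inspected only after a null check, and onFailure() is still invoked when the message is null.
{code:java}
// Sketch only; method and helper names are placeholders, not the real code.
public class NullSafeMessageCheck {

  void transferToSocketFully(Runnable transfer) throws Exception {
    try {
      transfer.run();                    // the actual socket transfer
    } catch (Exception e) {
      String message = e.getMessage();   // may legitimately be null
      // Skip the failure callback only for known benign cases; a null message
      // must not cause an NPE here.
      if (message == null
          || !(message.startsWith("Broken pipe")
               || message.startsWith("Connection reset"))) {
        onFailure();
      }
      throw e;
    }
  }

  private void onFailure() {
    // placeholder: failure bookkeeping for the volume
  }
}
{code}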

 

> NPE in FileIoProvider#transferToSocketFully
> ---
>
> Key: HDFS-15403
> URL: https://issues.apache.org/jira/browse/HDFS-15403
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15403.001.patch
>
>
> {code:java}
> [DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292)
> at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-12 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134177#comment-17134177
 ] 

hemanthboyina commented on HDFS-15403:
--

Thanks for the comment [~elgoiri]. It was a rare scenario; I tried to reproduce 
it, but couldn't.

> NPE in FileIoProvider#transferToSocketFully
> ---
>
> Key: HDFS-15403
> URL: https://issues.apache.org/jira/browse/HDFS-15403
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15403.001.patch
>
>
> {code:java}
> [DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292)
> at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15391) Standby NameNode due loads the corruption edit log, the service exits and cannot be restarted

2020-06-11 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133296#comment-17133296
 ] 

hemanthboyina commented on HDFS-15391:
--

{quote}The block size should be 108764672 in the first 
CloseOp(TXID=126060942290).
When truncate is used, the block size is 63154347.
The block used by CloseOp twice is the same instance, which causes the first 
CloseOp has wrong block size.
When the second CloseOp(TXID=126060943585) is executed, the file is not in the 
UnderConstruction state, and SNN down.
{quote}
HDFS-15175 has reported a similar kind of issue.

 

> Standby NameNode due loads the corruption edit log, the service exits and 
> cannot be restarted
> -
>
> Key: HDFS-15391
> URL: https://issues.apache.org/jira/browse/HDFS-15391
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: huhaiyang
>Priority: Critical
>
> In the version 3.2.0 production cluster environment,
>  we found that due to edit log corruption, the Standby NameNode could not 
> properly load the edit log, resulting in abnormal exit of the service and 
> failure to restart.
> {noformat}
> The specific scenario is that Flink writes to HDFS(replication file), and in 
> the case of an exception to the write file, the following operations are 
> performed :
> 1.close file
> 2.open file
> 3.truncate file
> 4.append file
> {noformat}
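
For illustration, the close/truncate/append sequence listed above can be driven through the public FileSystem API roughly as follows; this is only a reproduction sketch (the path and sizes are made-up values, not taken from the report):
{code:java}
// Reproduction sketch of the close/truncate/append sequence described above.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TruncateAppendSequence {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/tmp/truncate-append-demo");

    // 1. write some data and close the file
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write(new byte[1024 * 1024]);
    }

    // 2./3. truncate the last block to a shorter length; truncate() returns
    // false when block recovery is needed and the file stays open until the
    // recovery finishes
    boolean finishedImmediately = fs.truncate(file, 400_000L);
    System.out.println("truncate finished immediately: " + finishedImmediately);

    // 4. append more data (in a real run, wait for the truncate recovery to
    // finish before appending)
    try (FSDataOutputStream out = fs.append(file)) {
      out.write("more data".getBytes());
    }
  }
}
{code}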



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15408) Failed execution caused by SocketTimeoutException

2020-06-11 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133225#comment-17133225
 ] 

hemanthboyina commented on HDFS-15408:
--

Thanks for filing the issue [~echohlne].

At present the default socket timeout is 1 min; do you think 1 min is not 
sufficient for the HTTP connection timeout?

> Failed execution caused by SocketTimeoutException
> -
>
> Key: HDFS-15408
> URL: https://issues.apache.org/jira/browse/HDFS-15408
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: echohlne
>Priority: Major
>
> When I execute the command hdfs fsck / 
>  in the hadoop cluster to check the health of the cluster, it always reports 
> an execution failure like the one below:
> {code}
> Connecting to namenode via http://hadoop20:50070/fsck?ugi=hdfs=%2F
> Exception in thread "main" java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
>   at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:359)
>   at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:159)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:156)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:155)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:402)
> {code}
> We try to solve this problem by adding a new parameter, 
> {color:#de350b}*dfs.fsck.http.timeout.ms*{color}, to control the 
> connectionTimeout and the readTimeout of the HTTP connection in DFSck.java. 
> Please check whether this is the right way to solve the problem. Thanks a lot!
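
As a rough sketch of what such a knob could look like (the key name dfs.fsck.http.timeout.ms is the reporter's proposal; the default and helper shown here are assumptions, not the actual DFSck code), the client would read the value from Configuration and apply it to both timeouts of the HttpURLConnection:
{code:java}
// Sketch only: configurable connect/read timeouts for the fsck HTTP call.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.hadoop.conf.Configuration;

public class FsckTimeoutExample {
  static HttpURLConnection openWithTimeout(URL url, Configuration conf)
      throws IOException {
    // Default of 60s is an assumption for this sketch.
    int timeoutMs = conf.getInt("dfs.fsck.http.timeout.ms", 60_000);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setConnectTimeout(timeoutMs); // time allowed to establish the connection
    connection.setReadTimeout(timeoutMs);    // time allowed between reads of the response
    return connection;
  }
}
{code}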



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12969) DfsAdmin listOpenFiles should report files by type

2020-06-10 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130934#comment-17130934
 ] 

hemanthboyina commented on HDFS-12969:
--

Any suggestions for this, [~liuml07] [~elgoiri]?

> DfsAdmin listOpenFiles should report files by type
> --
>
> Key: HDFS-12969
> URL: https://issues.apache.org/jira/browse/HDFS-12969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
>
> HDFS-11847 has introduced a new option to {{-blockingDecommission}} to an 
> existing command 
> {{dfsadmin -listOpenFiles}}. But the reporting done by the command doesn't 
> differentiate the files based on the type (like blocking decommission). In 
> order to change the reporting style, the proto format used for the base 
> command has to be updated to carry additional fields and better be done in a 
> new jira outside of HDFS-11847. This jira is to track the end-to-end 
> enhancements needed for dfsadmin -listOpenFiles console output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-06-10 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130927#comment-17130927
 ] 

hemanthboyina commented on HDFS-15351:
--

Yes [~elgoiri], I think we can go ahead.

> Blocks Scheduled Count was wrong on Truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15351.001.patch, HDFS-15351.002.patch, 
> HDFS-15351.003.patch
>
>
> On truncate and append we remove the blocks from the Reconstruction Queue. 
> On removing the blocks from pending reconstruction, we need to decrement 
> Blocks Scheduled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-10 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130923#comment-17130923
 ] 

hemanthboyina commented on HDFS-15372:
--

Thanks for the patch [~sodonnell], the changes look good.

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, 
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>
>
> Given a cluster with an authorization provider configured (eg Sentry) and the 
> paths covered by the provider are snapshotable, there was a change in 
> behaviour in how the provider permissions and ACLs are applied to files in 
> snapshots between the 2.x branch and Hadoop 3.0.
> Eg, if we have the snapshotable path /data, which is Sentry managed. The ACLs 
> below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However pre-Hadoop 3.0 (when the attribute provider etc was extensively 
> refactored) snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>   throws IOException {
> INode node = FSDirectory.resolveLastINode(iip);
> int snapshot = iip.getPathSnapshotId();
> INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
> UserGroupInformation ugi = NameNode.getRemoteUser();
> INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
> if (ap != null) {
>   // permission checking sends the full components array including the
>   // first empty component for the root.  however file status
>   // related calls are expected to strip out the root component according
>   // to TestINodeAttributeProvider.
>   byte[][] components = iip.getPathComponents();
>   components = Arrays.copyOfRange(components, 1, components.length);
>   nodeAttrs = ap.getAttributes(components, nodeAttrs);
> }
> return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> Picks the last resolved Inode and if you then call node.getPathComponents, 
> for a path like '/data/.snapshot/snap1/tab1' it will return /data/tab1. It 
> resolves the snapshot path to its original location, but its still the 
> snapshot inode.
> However the logic passes 'iip.getPathComponents' which returns 
> "/user/.snapshot/snap1/tab" to the provider.
> The pre Hadoop 3.0 code passes the inode directly to the provider, and hence 
> it only ever sees the path as "/user/data/tab1".
> It is debatable which path should be passed to the provider - 
> /user/.snapshot/snap1/tab or /data/tab1 in the case of snapshots. However as 
> the behaviour has changed I feel we should ensure the old behaviour is 
> retained.
> It would also be fairly easy to provide a config switch so the provider gets 
> the full snapshot path or the resolved path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-06-10 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130521#comment-17130521
 ] 

hemanthboyina commented on HDFS-15160:
--

Though the scanner takes more time to scan all the blocks, I filed HDFS-15406 to 
improve it.

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrence. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and fsdatasetImpl where is it fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> its better to do any larger refactoring or risky changes each in their own 
> Jira.
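
For readers unfamiliar with the pattern, here is a plain-JDK sketch of the read/write split described above (this is not the FsDatasetImpl code, just the concurrency idea): read-only operations such as the scanner can share a read lock, while mutations still take the exclusive write lock.
{code:java}
// Plain-JDK sketch of the read/write lock split; names are placeholders.
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DatasetLockSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  // Many readers (directory scanner, disk balancer, block reports, ...) can
  // hold the read lock at the same time.
  public int scanReplicas(Map<Long, String> volumeMap) {
    lock.readLock().lock();
    try {
      return volumeMap.size(); // read-only view of the replica map
    } finally {
      lock.readLock().unlock();
    }
  }

  // Mutations still need the exclusive write lock.
  public void addReplica(Map<Long, String> volumeMap, long blockId, String storage) {
    lock.writeLock().lock();
    try {
      volumeMap.put(blockId, storage);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}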



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-10 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15406:
-
Description: 
In our customer cluster we have approx 10M blocks in one datanode 

For the Datanode to scan all the blocks, it has taken nearly 5 mins.
{code:java}
2020-06-10 12:17:06,869 | INFO  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
11149530, missing metadata files:472, missing block files:472, missing blocks 
in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
2020-06-10 12:17:06,869 | WARN  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | Lock held time above threshold: lock identifier: 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
 | InstrumentedLock.java:143 {code}

  was:
In our customer cluster we have approx 10M blocks in one datanode 

When Datanode scans all the blocks , it has taken more time 
{code:java}
2020-06-10 12:17:06,869 | INFO  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
11149530, missing metadata files:472, missing block files:472, missing blocks 
in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
2020-06-10 12:17:06,869 | WARN  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | Lock held time above threshold: lock identifier: 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
 | InstrumentedLock.java:143 {code}


> Improve the speed of Datanode Block Scan
> 
>
> Key: HDFS-15406
> URL: https://issues.apache.org/jira/browse/HDFS-15406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> In our customer cluster we have approx 10M blocks in one datanode 
> For the Datanode to scan all the blocks, it has 

[jira] [Updated] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-10 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15406:
-
Description: 
In our customer cluster we have approx 10M blocks in one datanode 

When Datanode scans all the blocks , it has taken more time 

> Improve the speed of Datanode Block Scan
> 
>
> Key: HDFS-15406
> URL: https://issues.apache.org/jira/browse/HDFS-15406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> In our customer cluster we have approx 10M blocks in one datanode 
> When Datanode scans all the blocks , it has taken more time 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-10 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15406:
-
Description: 
In our customer cluster we have approx 10M blocks in one datanode 

When Datanode scans all the blocks , it has taken more time 
{code:java}
2020-06-10 12:17:06,869 | INFO  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
11149530, missing metadata files:472, missing block files:472, missing blocks 
in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
2020-06-10 12:17:06,869 | WARN  | 
java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
queue] | Lock held time above threshold: lock identifier: 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
 | InstrumentedLock.java:143 {code}

  was:
In our customer cluster we have approx 10M blocks in one datanode 

When Datanode scans all the blocks , it has taken more time 


> Improve the speed of Datanode Block Scan
> 
>
> Key: HDFS-15406
> URL: https://issues.apache.org/jira/browse/HDFS-15406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> In our customer cluster we have approx 10M blocks in one datanode 
> When Datanode scans all the blocks , it has taken more time 
> {code:java}
> 2020-06-10 12:17:06,869 | INFO  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
> 11149530, missing metadata files:472, missing block files:472, missing blocks 
> in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | Lock held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 

[jira] [Created] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-10 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15406:


 Summary: Improve the speed of Datanode Block Scan
 Key: HDFS-15406
 URL: https://issues.apache.org/jira/browse/HDFS-15406
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: hemanthboyina
Assignee: hemanthboyina






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-06-10 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130497#comment-17130497
 ] 

hemanthboyina edited comment on HDFS-15160 at 6/10/20, 10:48 AM:
-

Thanks [~pilchard] for the report. HDFS-15150 has introduced read and write 
locks in the datanode.

With HDFS-15160 we acquire the read lock for the scanner, so writes won't be blocked.


was (Author: hemanthboyina):
thanks [~pilchard] for the report ,  HDFS-15150 has introduced read and write 
lock in datanode 

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrence. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and fsdatasetImpl where is it fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> its better to do any larger refactoring or risky changes each in their own 
> Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-06-10 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130497#comment-17130497
 ] 

hemanthboyina commented on HDFS-15160:
--

thanks [~pilchard] for the report ,  HDFS-15150 has introduced read and write 
lock in datanode 

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrence. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and fsdatasetImpl where is it fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> its better to do any larger refactoring or risky changes each in their own 
> Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-09 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15403:
-
Attachment: HDFS-15403.001.patch
Status: Patch Available  (was: Open)

> NPE in FileIoProvider#transferToSocketFully
> ---
>
> Key: HDFS-15403
> URL: https://issues.apache.org/jira/browse/HDFS-15403
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15403.001.patch
>
>
> {code:java}
> [DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789java.lang.NullPointerException at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) 
> at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-09 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15403:


 Summary: NPE in FileIoProvider#transferToSocketFully
 Key: HDFS-15403
 URL: https://issues.apache.org/jira/browse/HDFS-15403
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15403) NPE in FileIoProvider#transferToSocketFully

2020-06-09 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15403:
-
Description: 
{code:java}
[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
/127.0.0.1:34789java.lang.NullPointerException at 
org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
 at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
 at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) at 
java.lang.Thread.run(Thread.java:748) {code}

> NPE in FileIoProvider#transferToSocketFully
> ---
>
> Key: HDFS-15403
> URL: https://issues.apache.org/jira/browse/HDFS-15403
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> {code:java}
> [DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789[DataXceiver for client  at /127.0.0.1:41904 [Sending block 
> BP-293397713-127.0.1.1-1591535936877:blk_-9223372036854775786_1001]] ERROR 
> datanode.DataNode (DataXceiver.java:run(324)) - 127.0.0.1:34789:DataXceiver 
> error processing READ_BLOCK operation  src: /127.0.0.1:41904 dst: 
> /127.0.0.1:34789java.lang.NullPointerException at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:283)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:614)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:809)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:756)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:610)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) 
> at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12969) DfsAdmin listOpenFiles should report files by type

2020-06-09 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129660#comment-17129660
 ] 

hemanthboyina commented on HDFS-12969:
--

In the present code we can list all open files, or list only the open files 
which are blocking an ongoing decommission.

On calling dfsadmin -listOpenFiles -blockingDecommission we list only the files 
which are blocking decommission.

But on calling dfsadmin -listOpenFiles we list all open files, and some of these 
open files can be blocking an ongoing decommission. So for listOpenFiles, 
should we return the list broken down by type?
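
For context, a rough client-side sketch of how the two kinds of listing are obtained today (assuming the HdfsAdmin#listOpenFiles overload that takes an OpenFilesType set and a path; the URI and path below are placeholders). The question above is whether a single -listOpenFiles report should additionally label each entry with its type.
{code:java}
// Sketch, assuming the HdfsAdmin#listOpenFiles(EnumSet, String) overload.
import java.net.URI;
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.protocol.OpenFileEntry;
import org.apache.hadoop.hdfs.protocol.OpenFilesIterator.OpenFilesType;

public class ListOpenFilesByType {
  public static void main(String[] args) throws Exception {
    HdfsAdmin admin =
        new HdfsAdmin(new URI("hdfs://namenode:8020"), new Configuration());

    // All open files under the path (equivalent of: dfsadmin -listOpenFiles)
    RemoteIterator<OpenFileEntry> all =
        admin.listOpenFiles(EnumSet.of(OpenFilesType.ALL_OPEN_FILES), "/");
    while (all.hasNext()) {
      System.out.println("open: " + all.next().getFilePath());
    }

    // Only files blocking an ongoing decommission
    // (equivalent of: dfsadmin -listOpenFiles -blockingDecommission)
    RemoteIterator<OpenFileEntry> blocking =
        admin.listOpenFiles(EnumSet.of(OpenFilesType.BLOCKING_DECOMMISSION), "/");
    while (blocking.hasNext()) {
      System.out.println("blocking decommission: " + blocking.next().getFilePath());
    }
  }
}
{code}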

> DfsAdmin listOpenFiles should report files by type
> --
>
> Key: HDFS-12969
> URL: https://issues.apache.org/jira/browse/HDFS-12969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
>
> HDFS-11847 has introduced a new option to {{-blockingDecommission}} to an 
> existing command 
> {{dfsadmin -listOpenFiles}}. But the reporting done by the command doesn't 
> differentiate the files based on the type (like blocking decommission). In 
> order to change the reporting style, the proto format used for the base 
> command has to be updated to carry additional fields and better be done in a 
> new jira outside of HDFS-11847. This jira is to track the end-to-end 
> enhancements needed for dfsadmin -listOpenFiles console output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-08 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128566#comment-17128566
 ] 

hemanthboyina commented on HDFS-15372:
--

Thanks for the work [~sodonnell], overall the code looks good.

Some comments:

1) AFAIK only the snapshot INode will have the same id as its parent INode's id, so 
you can use something like iip.getINode(iip.getLength-1).getId() != 
iip.getINode(iip.length()-1).getParent().getId() instead of checking 
!iip.isDotSnapshotDirPrefix().

2) In FSPermissionChecker we can get the inode's path components by using 
INodesInPath#fromINode, but this method requires rootDir, which you would have to 
obtain when FSDirectory calls FSPermissionChecker#checkTraverse (or in any other 
better way). With these changes you could do the same as you have done for 
FSDirectory#getAttributes.

Kindly correct me if I am wrong, thanks.

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch
>
>
> Given a cluster with an authorization provider configured (eg Sentry) and the 
> paths covered by the provider are snapshotable, there was a change in 
> behaviour in how the provider permissions and ACLs are applied to files in 
> snapshots between the 2.x branch and Hadoop 3.0.
> Eg, if we have the snapshotable path /data, which is Sentry managed. The ACLs 
> below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However pre-Hadoop 3.0 (when the attribute provider etc was extensively 
> refactored) snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>   throws IOException {
> INode node = FSDirectory.resolveLastINode(iip);
> int snapshot = iip.getPathSnapshotId();
> INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
> UserGroupInformation ugi = NameNode.getRemoteUser();
> INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
> if (ap != null) {
>   // permission checking sends the full components array including the
>   // first empty component for the root.  however file status
>   // related calls are expected to strip out the root component according
>   // to TestINodeAttributeProvider.
>   byte[][] components = iip.getPathComponents();
>   components = Arrays.copyOfRange(components, 1, components.length);
>   nodeAttrs = ap.getAttributes(components, nodeAttrs);
> }
> return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> Picks the last resolved Inode and if you then call node.getPathComponents, 
> for a path like '/data/.snapshot/snap1/tab1' it will return /data/tab1. It 
> resolves the snapshot path to its original location, but its still the 
> snapshot inode.
> However the logic passes 'iip.getPathComponents' which returns 
> "/user/.snapshot/snap1/tab" to the provider.
> The pre Hadoop 3.0 code passes the inode directly to the provider, and hence 
> it only ever sees the path as "/user/data/tab1".
> It is debatable which path should be passed to the provider - 
> /user/.snapshot/snap1/tab or /data/tab1 in the case of snapshots. However as 
> the behaviour has changed I feel we should ensure the old behaviour is 
> retained.
> It would also be fairly easy to provide a config switch so the provider gets 
> the full snapshot path or the resolved path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-08 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128243#comment-17128243
 ] 

hemanthboyina commented on HDFS-15390:
--

[~seanlook], you can click on the More option under the Jira title; 
in More you can select Move.

> client fails forever when namenode ipaddr changed
> -
>
> Key: HDFS-15390
> URL: https://issues.apache.org/jira/browse/HDFS-15390
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.2, 3.2.1
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-15390.01.patch
>
>
> For machine replacement, I replace my standby namenode with a new ipaddr and 
> keep the same hostname, and also update the client's hosts file to make it resolve 
> correctly.
> When I try to run failover to transition to the new namenode (let's say nn2), the 
> client will fail to read or write forever until it's restarted.
> That leaves the YARN NodeManager in a sick state. Even new tasks will encounter 
> this exception too, until all NodeManagers restart.
>  
> {code:java}
> 20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
> nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
> 20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
> nn2-192-168-1-100/192.168.1.200:9000: Connection refused
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
> at org.apache.hadoop.ipc.Client.call(Client.java:1440)
> at org.apache.hadoop.ipc.Client.call(Client.java:1401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
> at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> {code}
>  
> We can see the client has {{Address change detected}}, but it still fails. I 
> found out that's because when the method {{updateAddress()}} returns true, 
> {{handleConnectionFailure()}} throws an exception that breaks the next retry 
> with the right ipaddr.
> Client.java: setupConnection()
> {code:java}
> } catch (ConnectTimeoutException toe) {
>   /* Check for an address change and update the local reference.
>* Reset the failure counter if the address was changed
>*/
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
>   handleConnectionTimeout(timeoutFailures++,
>   maxRetriesOnSocketTimeouts, toe);
> } catch (IOException ie) {
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
> // because the namenode ip changed in updateAddress(), the old namenode 
> ipaddress cannot be accessed now
> // handleConnectionFailure will throw an exception, so the next retry never has 
> a chance to use the right server updated in updateAddress()
>   handleConnectionFailure(ioFailures++, ie);
> }
> {code}
>  
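
One possible shape of the retry behaviour being discussed, sketched with placeholder names (illustrative only, not the attached patch): when updateAddress() reports an address change, reset the counters and retry against the refreshed address instead of letting the failure handler abort against the stale one.
{code:java}
// Illustrative sketch only; updateAddress/connectOnce/handleConnectionFailure
// are placeholders for the pattern, not the real Client.java code.
import java.io.IOException;

public class AddressChangeRetrySketch {
  private int ioFailures = 0;
  private final int maxRetries = 45;

  void setupConnection() throws IOException {
    while (true) {
      try {
        connectOnce();          // try the currently resolved address
        return;
      } catch (IOException ie) {
        if (updateAddress()) {
          // DNS now points at a new IP: reset counters and retry immediately
          // with the new address rather than treating this as a hard failure.
          ioFailures = 0;
          continue;
        }
        handleConnectionFailure(ioFailures++, ie);
      }
    }
  }

  void connectOnce() throws IOException { /* placeholder */ }

  boolean updateAddress() { return false; /* placeholder */ }

  void handleConnectionFailure(int failures, IOException ie) throws IOException {
    if (failures >= maxRetries) {
      throw ie;                 // give up once the retry budget is exhausted
    }
  }
}
{code}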



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-07 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127661#comment-17127661
 ] 

hemanthboyina commented on HDFS-15372:
--

Thanks for the very clear explanation, [~sodonnell].

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15372.001.patch
>
>
> Given a cluster with an authorization provider configured (eg Sentry) and the 
> paths covered by the provider are snapshotable, there was a change in 
> behaviour in how the provider permissions and ACLs are applied to files in 
> snapshots between the 2.x branch and Hadoop 3.0.
> Eg, if we have the snapshotable path /data, which is Sentry managed. The ACLs 
> below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However pre-Hadoop 3.0 (when the attribute provider etc was extensively 
> refactored) snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>   throws IOException {
> INode node = FSDirectory.resolveLastINode(iip);
> int snapshot = iip.getPathSnapshotId();
> INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
> UserGroupInformation ugi = NameNode.getRemoteUser();
> INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
> if (ap != null) {
>   // permission checking sends the full components array including the
>   // first empty component for the root.  however file status
>   // related calls are expected to strip out the root component according
>   // to TestINodeAttributeProvider.
>   byte[][] components = iip.getPathComponents();
>   components = Arrays.copyOfRange(components, 1, components.length);
>   nodeAttrs = ap.getAttributes(components, nodeAttrs);
> }
> return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> Picks the last resolved Inode and if you then call node.getPathComponents, 
> for a path like '/data/.snapshot/snap1/tab1' it will return /data/tab1. It 
> resolves the snapshot path to its original location, but its still the 
> snapshot inode.
> However the logic passes 'iip.getPathComponents' which returns 
> "/user/.snapshot/snap1/tab" to the provider.
> The pre Hadoop 3.0 code passes the inode directly to the provider, and hence 
> it only ever sees the path as "/user/data/tab1".
> It is debatable which path should be passed to the provider - 
> /user/.snapshot/snap1/tab or /data/tab1 in the case of snapshots. However as 
> the behaviour has changed I feel we should ensure the old behaviour is 
> retained.
> It would also be fairly easy to provide a config switch so the provider gets 
> the full snapshot path or the resolved path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15378) TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on trunk

2020-06-07 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127656#comment-17127656
 ] 

hemanthboyina commented on HDFS-15378:
--

thanks for the comment [~elgoiri]

I took some time to understand why exactly we are facing the issue here, and 
have attached some logs from the test and from the points where the xceiver 
count and xmits count are fetched.

Datanode (127.0.0.1:38635) has received the command for recovery
{code:java}
2020-06-07 18:55:10,052 [Command processor] INFO  datanode.DataNode 
(BPOfferService.java:processCommandFromActive(795)) - DatanodeCommand action: 
DNA_ERASURE_CODING_RECOVERY
2020-06-07 18:55:10,086 [DataXceiver for client  at /127.0.0.1:47330 [Receiving 
block BP-1804107793-127.0.1.1-1591536306390:blk_-9223372036854775787_1001]] 
INFO  datanode.DataNode (DataXceiver.java:writeBlock(747)) - Receiving 
BP-1804107793-127.0.1.1-1591536306390:blk_-9223372036854775787_1001 src: 
/127.0.0.1:47330 dest: /127.0.0.1:38635 {code}
In the test, Datanode (127.0.0.1:38635) is checked for its xceiver count and 
xmits count. As the xceiver count was already 1, the test moved on to the next 
line, where we assert the xmits count without any wait, while the 
reconstruction work is still happening in parallel:
{code:java}
##2020-06-07 18:55:10,575 [Listener at localhost/42653] WARN  
hdfs.TestReconstructStripedFile 
(TestReconstructStripedFile.java:testErasureCodingWorkerXmitsWeight(577)) - 
called test xciver count = 1 DatanodeRegistration(127.0.0.1:38635, 
datanodeUuid=d6f1f0ed-be4d-403a-bc00-32d950681c2a, infoPort=38759, 
infoSecurePort=0, ipcPort=35273, 
storageInfo=lv=-57;cid=testClusterID;nsid=1421432615;c=1591536306390)
2020-06-07 18:55:10,575 [Listener at localhost/42653] WARN  datanode.DataNode 
(DataNode.java:getXceiverCount(2252)) - called xciver count on test call 1
2020-06-07 18:55:10,576 [Listener at localhost/42653] WARN  
hdfs.TestReconstructStripedFile 
(TestReconstructStripedFile.java:testErasureCodingWorkerXmitsWeight(580)) - 
called test xmits count 
2020-06-07 18:55:10,576 [Listener at localhost/42653] WARN  datanode.DataNode 
(DataNode.java:getXmitsInProgress(2292)) - called xmits on test call3
2020-06-07 18:55:10,637 [Listener at localhost/42653] WARN  datanode.DataNode 
(DataNode.java:getXmitsInProgress(2292)) - called xmits on test call3
##2020-06-07 18:55:10,662 [StripedBlockReconstruction-0] WARN  
datanode.DataNode (StripedBlockReconstructor.java:run(74)) - Reconstruction 
happened{code}
Upon waiting, the xmits count became zero, and on completing the task the 
xceiver count also became zero:
{code:java}
2020-06-07 18:55:10,663 [Block report processor] DEBUG BlockStateChange 
(BlockManager.java:processIncrementalBlockReport(4331)) - *BLOCK* 
NameNode.processIncrementalBlockReport: from 127.0.0.1:38635 receiving: 0, 
received: 1, deleted: 0
2020-06-07 18:55:10,665 [Listener at localhost/39641] WARN  datanode.DataNode 
(DataNode.java:getXceiverCount(2252)) - called xciver count on test call 0
##2020-06-07 18:55:10,665 [Listener at localhost/42653] WARN 
hdfs.TestReconstructStripedFile 
(TestReconstructStripedFile.java:testErasureCodingWorkerXmitsWeight(577)) - 
called test xciver count = 0 DatanodeRegistration(127.0.0.1:38635, 
datanodeUuid=d6f1f0ed-be4d-403a-bc00-32d950681c2a, infoPort=38759, 
infoSecurePort=0, ipcPort=35273, 
storageInfo=lv=-57;cid=testClusterID;nsid=1421432615;c=1591536306390){code}
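A minimal sketch of the idea (illustrative only, not necessarily the exact attached patch): poll with GenericTestUtils.waitFor until the DataNode reports no xmits in progress, and only then assert, so the parallel reconstruction has a chance to finish.
{code:java}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.test.GenericTestUtils;

// inside the existing test method (curDn is the DataNode under test):
// poll every 100ms, up to 10s, for the in-flight reconstruction to drain
GenericTestUtils.waitFor(() -> curDn.getXmitsInProgress() == 0, 100, 10000);
assertEquals(0, curDn.getXmitsInProgress());
{code}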

> TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on 
> trunk
> -
>
> Key: HDFS-15378
> URL: https://issues.apache.org/jira/browse/HDFS-15378
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Priority: Major
> Attachments: HDFS-15378.001.patch
>
>
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29377/#showFailuresLink]
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29368/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-06-06 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127413#comment-17127413
 ] 

hemanthboyina commented on HDFS-15351:
--

thanks [~belugabehr] for the review 

> Blocks Scheduled Count was wrong on Truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15351.001.patch, HDFS-15351.002.patch, 
> HDFS-15351.003.patch
>
>
> On truncate and append we remove the blocks from Reconstruction Queue 
> On removing the blocks from pending reconstruction , we need to decrement 
> Blocks Scheduled 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-06 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127412#comment-17127412
 ] 

hemanthboyina commented on HDFS-15372:
--

thanks for the good analysis [~sodonnell]
{quote}With the 001 patch in place, if you try to list 
/data/.snapshot/snapshot_1, the path seen by the attribute provider is:

/user/snapshot_1

Before, it was:

/user/.snapshot/snapshot1

When checking a path like /data/.snapshot/snap1 the provider will see 
/data/snap1, but on the branch-2, it would have seen /data/.snapshot/snap1.
{quote}
Is the path seen by the attribute provider the same for branch-2 and trunk? It 
was a bit confusing; could you put it all in one comment with an example for a 
snapshot path?

If we list a path, the path will be resolved into inodes via INodesInPath, and 
the same inode components will be used by the provider, right? And INodesInPath 
handles the .snapshot part of a path.

While creating a snapshot we add the inode directory as the root of the 
snapshot:
{code:java}
DirectorySnapshottableFeature#createSnapshot
public Snapshot addSnapshot(INodeDirectory snapshotRoot, int id, String name,
 final Snapshot s = new Snapshot(id, name, snapshotRoot); {code}
While getting INodesInPath for a file in a snapshot we use the root of the 
snapshot to get the file. IMO that means that if the file has an ACL, the file 
under the snapshot root should have the ACL as well:
{code:java}
if (isDotSnapshotDir(childName) && dir.isSnapshottable()) {
  final Snapshot s = dir.getSnapshot(components[count + 1]);
  ...
  else {
    curNode = s.getRoot();
    snapshotId = s.getId();
  } {code}
Please correct me if I am missing something here.
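To double-check which path actually reaches the provider on each branch, a tiny logging provider could be plugged in via dfs.namenode.inode.attributes.provider.class (a debugging sketch under the assumption that overriding the String[] overload of getAttributes is enough; the class name is made up for illustration):
{code:java}
import org.apache.hadoop.hdfs.server.namenode.INodeAttributeProvider;
import org.apache.hadoop.hdfs.server.namenode.INodeAttributes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PathLoggingAttributeProvider extends INodeAttributeProvider {
  private static final Logger LOG =
      LoggerFactory.getLogger(PathLoggingAttributeProvider.class);

  @Override
  public void start() { }

  @Override
  public void stop() { }

  @Override
  public INodeAttributes getAttributes(String[] pathElements,
      INodeAttributes inode) {
    // log exactly what the NameNode hands to the provider, so the snapshot
    // path vs. resolved path behaviour can be compared between branches
    LOG.info("provider saw path: /{}", String.join("/", pathElements));
    return inode;
  }
}
{code}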

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15372.001.patch
>
>
> Given a cluster with an authorization provider configured (eg Sentry) and the 
> paths covered by the provider are snapshotable, there was a change in 
> behaviour in how the provider permissions and ACLs are applied to files in 
> snapshots between the 2.x branch and Hadoop 3.0.
> Eg, if we have the snapshotable path /data, which is Sentry managed. The ACLs 
> below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However pre-Hadoop 3.0 (when the attribute provider etc was extensively 
> refactored) snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>   throws IOException {
> INode node = FSDirectory.resolveLastINode(iip);
> int snapshot = iip.getPathSnapshotId();
> INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
> UserGroupInformation ugi = NameNode.getRemoteUser();
> INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
> if (ap != null) {
>   // permission checking sends the full components array including the
>   // first empty component for the root.  however file status
>   // related calls are expected to strip out the root component according
>   // to TestINodeAttributeProvider.
>   byte[][] components = iip.getPathComponents();
>   components = Arrays.copyOfRange(components, 1, components.length);
>   nodeAttrs = ap.getAttributes(components, nodeAttrs);
> }
> return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> Picks the last resolved Inode and if you then call node.getPathComponents, 
> for a path like '/data/.snapshot/snap1/tab1' it will return /data/tab1. It 
> resolves the snapshot path to its original location, but its still the 
> snapshot inode.
> However the logic passes 'iip.getPathComponents' which returns 
> "/user/.snapshot/snap1/tab" to the provider.
> 

[jira] [Commented] (HDFS-15246) ArrayIndexOfboundsException in BlockManager CreateLocatedBlock

2020-06-03 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125141#comment-17125141
 ] 

hemanthboyina commented on HDFS-15246:
--

thanks for the review [~elgoiri]!

test failures are not related to this jira

> ArrayIndexOfboundsException in BlockManager CreateLocatedBlock
> --
>
> Key: HDFS-15246
> URL: https://issues.apache.org/jira/browse/HDFS-15246
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15246-testrepro.patch, HDFS-15246.001.patch, 
> HDFS-15246.002.patch, HDFS-15246.003.patch
>
>
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>  
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:1362)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:1501)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:179)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2047)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-06-03 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124998#comment-17124998
 ] 

hemanthboyina commented on HDFS-15375:
--

test failures were not related
{quote}We can't remove {{pendingNum}} from here, it will create extra 
replication task if this count doesn't include pendingNum
{quote}
I think it does not create an extra replication task, because the pendingNum 
count is only used to select the priority level at which the block should be 
added or updated in the priority queue, as illustrated below.
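Purely as an illustration of that point (a hypothetical helper, not the real LowRedundancyBlocks code): counting the in-flight pending replicas only shifts which priority bucket is chosen, it does not by itself schedule extra work.
{code:java}
// hypothetical sketch: how an "effective" replica count (live + pending)
// could decide the priority queue for a block
static String pickQueue(int liveReplicas, int pendingReplicas,
    int expectedReplicas) {
  int effective = liveReplicas + pendingReplicas;
  if (effective == 0) {
    return "QUEUE_WITH_CORRUPT_BLOCKS";   // nothing live, nothing in flight
  } else if (effective < expectedReplicas) {
    return "QUEUE_LOW_REDUNDANCY";        // still short of expected replicas
  }
  return "not queued (enough effective replicas)";
}
{code}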

> Reconstruction Work should not happen for Corrupt Block
> ---
>
> Key: HDFS-15375
> URL: https://issues.apache.org/jira/browse/HDFS-15375
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15375-testrepro.patch, HDFS-15375.001.patch
>
>
> In BlockManager#updateNeededReconstructions , while updating the 
> NeededReconstruction we are adding Pendingreconstruction blocks to live 
> replicas
> {code:java}
>  int pendingNum = pendingReconstruction.getNumReplicas(block);
>   int curExpectedReplicas = getExpectedRedundancyNum(block);
>   if (!hasEnoughEffectiveReplicas(block, repl, pendingNum)) {
> neededReconstruction.update(block, repl.liveReplicas() + 
> pendingNum,{code}
> But if two replicas were in pending reconstruction (due to corruption) , and 
> if the third replica is corrupted the block should be in 
> QUEUE_WITH_CORRUPT_BLOCKS but because of above logic it was getting added in 
> QUEUE_LOW_REDUNDANCY , this makes the RedudancyMonitor to reconstruct a 
> corrupted block , which is wrong



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15246) ArrayIndexOfboundsException in BlockManager CreateLocatedBlock

2020-05-31 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120463#comment-17120463
 ] 

hemanthboyina commented on HDFS-15246:
--

thanks for the review [~elgoiri]

I have updated the patch, please review.

> ArrayIndexOfboundsException in BlockManager CreateLocatedBlock
> --
>
> Key: HDFS-15246
> URL: https://issues.apache.org/jira/browse/HDFS-15246
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15246-testrepro.patch, HDFS-15246.001.patch, 
> HDFS-15246.002.patch, HDFS-15246.003.patch
>
>
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>  
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:1362)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:1501)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:179)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2047)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15246) ArrayIndexOfboundsException in BlockManager CreateLocatedBlock

2020-05-31 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15246:
-
Attachment: HDFS-15246.003.patch

> ArrayIndexOfboundsException in BlockManager CreateLocatedBlock
> --
>
> Key: HDFS-15246
> URL: https://issues.apache.org/jira/browse/HDFS-15246
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15246-testrepro.patch, HDFS-15246.001.patch, 
> HDFS-15246.002.patch, HDFS-15246.003.patch
>
>
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>  
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:1362)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:1501)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:179)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2047)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-05-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119811#comment-17119811
 ] 

hemanthboyina commented on HDFS-15375:
--

thanks [~surendrasingh] for the comment

We have a configuration, dfs.namenode.reconstruction.pending.timeout-sec, which 
defaults to 5 minutes. After 5 minutes the blocks in pending reconstruction 
time out and are moved back to needed reconstruction by the redundancy monitor 
thread, and on moving to needed reconstruction the block will then be kept in 
QUEUE_WITH_CORRUPT_BLOCKS.

Fsck also uses this priority queue to report corrupt blocks via 
QUEUE_WITH_CORRUPT_BLOCKS, so a data mismatch would happen there too.
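For reference, a minimal sketch of shortening that timeout in a test (assuming a plain HdfsConfiguration handed to the mini cluster; the literal key is used here rather than a DFSConfigKeys constant):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

Configuration conf = new HdfsConfiguration();
// time out pending reconstruction after 1 second instead of the 5 minute
// default, so timed-out blocks move back to neededReconstruction quickly
conf.setInt("dfs.namenode.reconstruction.pending.timeout-sec", 1);
{code}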

> Reconstruction Work should not happen for Corrupt Block
> ---
>
> Key: HDFS-15375
> URL: https://issues.apache.org/jira/browse/HDFS-15375
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15375-testrepro.patch, HDFS-15375.001.patch
>
>
> In BlockManager#updateNeededReconstructions , while updating the 
> NeededReconstruction we are adding Pendingreconstruction blocks to live 
> replicas
> {code:java}
>  int pendingNum = pendingReconstruction.getNumReplicas(block);
>   int curExpectedReplicas = getExpectedRedundancyNum(block);
>   if (!hasEnoughEffectiveReplicas(block, repl, pendingNum)) {
> neededReconstruction.update(block, repl.liveReplicas() + 
> pendingNum,{code}
> But if two replicas were in pending reconstruction (due to corruption) , and 
> if the third replica is corrupted the block should be in 
> QUEUE_WITH_CORRUPT_BLOCKS but because of above logic it was getting added in 
> QUEUE_LOW_REDUNDANCY , this makes the RedudancyMonitor to reconstruct a 
> corrupted block , which is wrong



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14901) RBF: Add Encryption Zone related ClientProtocol APIs

2020-05-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119660#comment-17119660
 ] 

hemanthboyina commented on HDFS-14901:
--

we have hit this issue again
{quote}Should we do the federated token approach
{quote}
thanks [~elgoiri], but I think this doesn't work out, as the Router doesn't 
know for which name service the getDataEncryptionKey call has happened.

We are thinking of adding the Name Service to the RPC header (an incompatible 
change), so the Router can get the key from that NS.

Please share your opinion on this.

> RBF: Add Encryption Zone related ClientProtocol APIs
> 
>
> Key: HDFS-14901
> URL: https://issues.apache.org/jira/browse/HDFS-14901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14901.001.patch, HDFS-14901.002.patch, 
> HDFS-14901.003.patch
>
>
> Currently listEncryptionZones,reencryptEncryptionZone,listReencryptionStatus 
> these APIs are not implemented in Router.
> This JIRA is intend to implement above mentioned APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15246) ArrayIndexOfboundsException in BlockManager CreateLocatedBlock

2020-05-28 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119220#comment-17119220
 ] 

hemanthboyina commented on HDFS-15246:
--

thanks [~elgoiri] for the review

I have updated the patch, please review.

> ArrayIndexOfboundsException in BlockManager CreateLocatedBlock
> --
>
> Key: HDFS-15246
> URL: https://issues.apache.org/jira/browse/HDFS-15246
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15246-testrepro.patch, HDFS-15246.001.patch, 
> HDFS-15246.002.patch
>
>
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>  
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:1362)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:1501)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:179)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2047)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15246) ArrayIndexOfboundsException in BlockManager CreateLocatedBlock

2020-05-28 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15246:
-
Attachment: HDFS-15246.002.patch

> ArrayIndexOfboundsException in BlockManager CreateLocatedBlock
> --
>
> Key: HDFS-15246
> URL: https://issues.apache.org/jira/browse/HDFS-15246
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15246-testrepro.patch, HDFS-15246.001.patch, 
> HDFS-15246.002.patch
>
>
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>  
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:1362)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:1501)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:179)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2047)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15378) TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on trunk

2020-05-28 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15378:
-
Attachment: HDFS-15378.001.patch
Status: Patch Available  (was: Open)

> TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on 
> trunk
> -
>
> Key: HDFS-15378
> URL: https://issues.apache.org/jira/browse/HDFS-15378
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Priority: Major
> Attachments: HDFS-15378.001.patch
>
>
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29377/#showFailuresLink]
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29368/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15378) TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on trunk

2020-05-28 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118963#comment-17118963
 ] 

hemanthboyina commented on HDFS-15378:
--

There was a slight delay in the test case run; if we wait for 
curDn.getXmitsInProgress() == 0, the test case is successful.

> TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on 
> trunk
> -
>
> Key: HDFS-15378
> URL: https://issues.apache.org/jira/browse/HDFS-15378
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Priority: Major
>
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29377/#showFailuresLink]
> [https://builds.apache.org/job/PreCommit-HDFS-Build/29368/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15378) TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on trunk

2020-05-28 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15378:


 Summary: 
TestReconstructStripedFile#testErasureCodingWorkerXmitsWeight is failing on 
trunk
 Key: HDFS-15378
 URL: https://issues.apache.org/jira/browse/HDFS-15378
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina


[https://builds.apache.org/job/PreCommit-HDFS-Build/29377/#showFailuresLink]

[https://builds.apache.org/job/PreCommit-HDFS-Build/29368/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-05-27 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117993#comment-17117993
 ] 

hemanthboyina commented on HDFS-15351:
--

{quote} I've always hated this list to array interface...
[~belugabehr] you are the expert on these things; any alternative?
{quote}
hi [~belugabehr] , you have any alternatives or suggestions for this ?

> Blocks Scheduled Count was wrong on Truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15351.001.patch, HDFS-15351.002.patch, 
> HDFS-15351.003.patch
>
>
> On truncate and append we remove the blocks from Reconstruction Queue 
> On removing the blocks from pending reconstruction , we need to decrement 
> Blocks Scheduled 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15246) ArrayIndexOfboundsException in BlockManager CreateLocatedBlock

2020-05-27 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15246:
-
Attachment: HDFS-15246.001.patch
Status: Patch Available  (was: Open)

> ArrayIndexOfboundsException in BlockManager CreateLocatedBlock
> --
>
> Key: HDFS-15246
> URL: https://issues.apache.org/jira/browse/HDFS-15246
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15246-testrepro.patch, HDFS-15246.001.patch
>
>
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>  
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:1362)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:1501)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:179)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2047)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-05-27 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117980#comment-17117980
 ] 

hemanthboyina commented on HDFS-15375:
--

Ran the failed tests locally; they do not seem related:

 

org.apache.hadoop.hdfs.TestReconstructStripedFile.testErasureCodingWorkerXmitsWeight
                          
org.apache.hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy.testErasureCodingWorkerXmitsWeight

These tests were failing even without this patch; following up on them, I 
found they were failing continuously:

[https://builds.apache.org/job/PreCommit-HDFS-Build/29368/]

[https://builds.apache.org/job/PreCommit-HDFS-Build/29366/|https://builds.apache.org/job/PreCommit-HDFS-Build/29366/#showFailuresLink]

[https://builds.apache.org/job/PreCommit-HDFS-Build/29358/]

  

> Reconstruction Work should not happen for Corrupt Block
> ---
>
> Key: HDFS-15375
> URL: https://issues.apache.org/jira/browse/HDFS-15375
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15375-testrepro.patch, HDFS-15375.001.patch
>
>
> In BlockManager#updateNeededReconstructions , while updating the 
> NeededReconstruction we are adding Pendingreconstruction blocks to live 
> replicas
> {code:java}
>  int pendingNum = pendingReconstruction.getNumReplicas(block);
>   int curExpectedReplicas = getExpectedRedundancyNum(block);
>   if (!hasEnoughEffectiveReplicas(block, repl, pendingNum)) {
> neededReconstruction.update(block, repl.liveReplicas() + 
> pendingNum,{code}
> But if two replicas were in pending reconstruction (due to corruption) , and 
> if the third replica is corrupted the block should be in 
> QUEUE_WITH_CORRUPT_BLOCKS but because of above logic it was getting added in 
> QUEUE_LOW_REDUNDANCY , this makes the RedudancyMonitor to reconstruct a 
> corrupted block , which is wrong



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15376) Update the error about command line POST in httpfs documentation

2020-05-26 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116921#comment-17116921
 ] 

hemanthboyina commented on HDFS-15376:
--

No , this is fine [~elgoiri]

> Update the error about command line POST in httpfs documentation
> 
>
> Key: HDFS-15376
> URL: https://issues.apache.org/jira/browse/HDFS-15376
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.2.1
>Reporter: bianqi
>Assignee: bianqi
>Priority: Major
> Attachments: HDFS-15376.001.patch
>
>
>    In the official Hadoop documentation, there is an exception when executing 
> the following command.
> {quote} {{curl -X POST 
> 'http://httpfs-host:14000/webhdfs/v1/user/foo/bar?op=MKDIRS=foo'}} 
> creates the HDFS {{/user/foo/bar}} directory.
> {quote}
>      Command line returns results:
> {quote}     *{"RemoteException":{"message":"Invalid HTTP POST operation 
> [MKDIRS]","exception":"IOException","javaClassName":"java.io.IOException"}}*
> {quote}
>      
> I checked the source code and found that the way to create the file should 
> use PUT to submit the form.
>     I modified to execute the command in PUT mode and got the result as 
> follows
> {quote}     {{curl -X PUT 
> 'http://httpfs-host:14000/webhdfs/v1/user/foo/bar?op=MKDIRS=foo'}} 
> creates the HDFS {{/user/foo/bar}} directory.
> {quote}
>      Command line returns results:
> {"boolean":true}
> . At the same time the folder is created successfully.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-05-26 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15375:
-
Attachment: HDFS-15375.001.patch
Status: Patch Available  (was: Open)

> Reconstruction Work should not happen for Corrupt Block
> ---
>
> Key: HDFS-15375
> URL: https://issues.apache.org/jira/browse/HDFS-15375
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15375-testrepro.patch, HDFS-15375.001.patch
>
>
> In BlockManager#updateNeededReconstructions , while updating the 
> NeededReconstruction we are adding Pendingreconstruction blocks to live 
> replicas
> {code:java}
>  int pendingNum = pendingReconstruction.getNumReplicas(block);
>   int curExpectedReplicas = getExpectedRedundancyNum(block);
>   if (!hasEnoughEffectiveReplicas(block, repl, pendingNum)) {
> neededReconstruction.update(block, repl.liveReplicas() + 
> pendingNum,{code}
> But if two replicas were in pending reconstruction (due to corruption) , and 
> if the third replica is corrupted the block should be in 
> QUEUE_WITH_CORRUPT_BLOCKS but because of above logic it was getting added in 
> QUEUE_LOW_REDUNDANCY , this makes the RedudancyMonitor to reconstruct a 
> corrupted block , which is wrong



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14984) HDFS setQuota: Error message should be added for invalid input max range value to hdfs dfsadmin -setQuota command

2020-05-26 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116475#comment-17116475
 ] 

hemanthboyina commented on HDFS-14984:
--

thanks for the interest [~zhaoyim] 

You can work on this issue and assign it to yourself.

 

> HDFS setQuota: Error message should be added for invalid input max range 
> value to hdfs dfsadmin -setQuota command
> -
>
> Key: HDFS-14984
> URL: https://issues.apache.org/jira/browse/HDFS-14984
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.2
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: image-2019-11-13-14-05-19-603.png, 
> image-2019-11-13-14-07-04-536.png
>
>
> An error message should be added for invalid input max range value 
> "9223372036854775807" to hdfs dfsadmin -setQuota command
>  * Set quota for a directory with the invalid input value 
> "9223372036854775807": the command succeeds without displaying any result. 
> The quota value will not be set for the directory internally, but from a 
> usability point of view it would be better if an error message were displayed 
> for the invalid max range value "9223372036854775807", as is done when the 
> input value is "0". For example: "hdfs dfsadmin -setQuota 9223372036854775807 
> /quota"
>              !image-2019-11-13-14-05-19-603.png!
>  
>  * Try to set quota for a directory with the invalid input value "0": it 
> throws the error message "setQuota: Invalid values for quota : 0 and 
> 9223372036854775807". For example: "hdfs dfsadmin -setQuota 0 /quota"
>           !image-2019-11-13-14-07-04-536.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-05-25 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15375:
-
Description: 
In BlockManager#updateNeededReconstructions , while updating the 
NeededReconstruction we are adding Pendingreconstruction blocks to live replicas
{code:java}
 int pendingNum = pendingReconstruction.getNumReplicas(block);
  int curExpectedReplicas = getExpectedRedundancyNum(block);
  if (!hasEnoughEffectiveReplicas(block, repl, pendingNum)) {
neededReconstruction.update(block, repl.liveReplicas() + 
pendingNum,{code}
But if two replicas were in pending reconstruction (due to corruption), and the 
third replica is corrupted, the block should be in QUEUE_WITH_CORRUPT_BLOCKS. 
Because of the above logic it was getting added to QUEUE_LOW_REDUNDANCY, which 
makes the RedundancyMonitor reconstruct a corrupted block, which is wrong.

> Reconstruction Work should not happen for Corrupt Block
> ---
>
> Key: HDFS-15375
> URL: https://issues.apache.org/jira/browse/HDFS-15375
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15375-testrepro.patch
>
>
> In BlockManager#updateNeededReconstructions , while updating the 
> NeededReconstruction we are adding Pendingreconstruction blocks to live 
> replicas
> {code:java}
>  int pendingNum = pendingReconstruction.getNumReplicas(block);
>   int curExpectedReplicas = getExpectedRedundancyNum(block);
>   if (!hasEnoughEffectiveReplicas(block, repl, pendingNum)) {
> neededReconstruction.update(block, repl.liveReplicas() + 
> pendingNum,{code}
> But if two replicas were in pending reconstruction (due to corruption) , and 
> if the third replica is corrupted the block should be in 
> QUEUE_WITH_CORRUPT_BLOCKS but because of above logic it was getting added in 
> QUEUE_LOW_REDUNDANCY , this makes the RedudancyMonitor to reconstruct a 
> corrupted block , which is wrong



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-05-25 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15375:
-
Attachment: HDFS-15375-testrepro.patch

> Reconstruction Work should not happen for Corrupt Block
> ---
>
> Key: HDFS-15375
> URL: https://issues.apache.org/jira/browse/HDFS-15375
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15375-testrepro.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15375) Reconstruction Work should not happen for Corrupt Block

2020-05-25 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15375:


 Summary: Reconstruction Work should not happen for Corrupt Block
 Key: HDFS-15375
 URL: https://issues.apache.org/jira/browse/HDFS-15375
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15288) Add Available Space Rack Fault Tolerant BPP

2020-05-24 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115122#comment-17115122
 ] 

hemanthboyina edited comment on HDFS-15288 at 5/24/20, 11:07 AM:
-

Good work [~ayushtkn] , just a small query 

In AvailableSpaceRackFaultTolerantBlockPlacementPolicy#chooseDataNode, instead 
of chooseRandomWithStorageType can we use chooseRandomWithStorageTypeTwoTrial, 
as is done in AvailableSpaceBlockPlacementPolicy?


was (Author: hemanthboyina):
Good work [~ayushtkn] , just a small doubt 

> Add Available Space Rack Fault Tolerant BPP
> ---
>
> Key: HDFS-15288
> URL: https://issues.apache.org/jira/browse/HDFS-15288
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15288-01.patch, HDFS-15288-02.patch, 
> HDFS-15288-03.patch
>
>
> The Present {{AvailableSpaceBlockPlacementPolicy}} extends the Default Block 
> Placement policy, which makes it apt for Replicated files. But not very 
> efficient for EC files, which by default use. 
> {{BlockPlacementPolicyRackFaultTolerant}}. So propose a to add new BPP having 
> similar optimization as ASBPP where as keeping the spread of Blocks to max 
> racks, i.e as RackFaultTolerantBPP.
> This could extend {{BlockPlacementPolicyRackFaultTolerant}}, rather than the 
> {{BlockPlacementPOlicyDefault}} like ASBPP and keep other logics of 
> optimization same as ASBPP



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15288) Add Available Space Rack Fault Tolerant BPP

2020-05-24 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115122#comment-17115122
 ] 

hemanthboyina commented on HDFS-15288:
--

Good work [~ayushtkn] , just a small doubt 

> Add Available Space Rack Fault Tolerant BPP
> ---
>
> Key: HDFS-15288
> URL: https://issues.apache.org/jira/browse/HDFS-15288
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15288-01.patch, HDFS-15288-02.patch, 
> HDFS-15288-03.patch
>
>
> The Present {{AvailableSpaceBlockPlacementPolicy}} extends the Default Block 
> Placement policy, which makes it apt for Replicated files. But not very 
> efficient for EC files, which by default use. 
> {{BlockPlacementPolicyRackFaultTolerant}}. So propose a to add new BPP having 
> similar optimization as ASBPP where as keeping the spread of Blocks to max 
> racks, i.e as RackFaultTolerantBPP.
> This could extend {{BlockPlacementPolicyRackFaultTolerant}}, rather than the 
> {{BlockPlacementPOlicyDefault}} like ASBPP and keep other logics of 
> optimization same as ASBPP



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15362) FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all distinct blocks

2020-05-21 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113450#comment-17113450
 ] 

hemanthboyina commented on HDFS-15362:
--

thanks [~elgoiri] for the review

I have updated the patch, please review. The gist of the change is sketched 
below.
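For context, the gist of the approach (a sketch combining the two snippets already quoted in the description, not necessarily the exact patch) is to collect distinct blocks with a Set, mirroring INodeFile#storagespaceConsumedContiguous:
{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// collect all distinct blocks of the file plus its snapshot diffs, so the
// same BlockInfo is never counted twice when computing the quota delta
Set<BlockInfo> allBlocks = new HashSet<>();
if (file.getBlocks() != null) {
  allBlocks.addAll(Arrays.asList(file.getBlocks()));
}
DiffList<FileDiff> diffs = getDiffs().asList();
for (FileDiff diff : diffs) {
  BlockInfo[] diffBlocks = diff.getBlocks();
  if (diffBlocks != null) {
    allBlocks.addAll(Arrays.asList(diffBlocks));
  }
}
{code}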

> FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all 
> distinct blocks
> --
>
> Key: HDFS-15362
> URL: https://issues.apache.org/jira/browse/HDFS-15362
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15362.001.patch, HDFS-15362.002.patch
>
>
> FileWithSnapshotFeature#updateQuotaAndCollectBlocks uses list to collect 
> blocks 
> {code:java}
>  List allBlocks = new ArrayList();
>  if (file.getBlocks() != null) {
> allBlocks.addAll(Arrays.asList(file.getBlocks()));
>   }{code}
>  INodeFile#storagespaceConsumedContiguous collects all distinct blocks by set
> {code:java}
> // Collect all distinct blocks
>  Set allBlocks = new HashSet<>(Arrays.asList(getBlocks()));
>  DiffList diffs = sf.getDiffs().asList();
>  for(FileDiff diff : diffs) {
>BlockInfo[] diffBlocks = diff.getBlocks();
>if (diffBlocks != null) {
>  allBlocks.addAll(Arrays.asList(diffBlocks));
>  } {code}
> but on updating the reclaim context we subtract these both , so wrong quota 
> value can be updated
> {code:java}
> QuotaCounts current = file.storagespaceConsumed(bsp);
> reclaimContext.quotaDelta().add(oldCounts.subtract(current)); {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15362) FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all distinct blocks

2020-05-21 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15362:
-
Attachment: HDFS-15362.002.patch

> FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all 
> distinct blocks
> --
>
> Key: HDFS-15362
> URL: https://issues.apache.org/jira/browse/HDFS-15362
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15362.001.patch, HDFS-15362.002.patch
>
>
> FileWithSnapshotFeature#updateQuotaAndCollectBlocks uses list to collect 
> blocks 
> {code:java}
>  List allBlocks = new ArrayList();
>  if (file.getBlocks() != null) {
> allBlocks.addAll(Arrays.asList(file.getBlocks()));
>   }{code}
>  INodeFile#storagespaceConsumedContiguous collects all distinct blocks by set
> {code:java}
> // Collect all distinct blocks
>  Set allBlocks = new HashSet<>(Arrays.asList(getBlocks()));
>  DiffList diffs = sf.getDiffs().asList();
>  for(FileDiff diff : diffs) {
>BlockInfo[] diffBlocks = diff.getBlocks();
>if (diffBlocks != null) {
>  allBlocks.addAll(Arrays.asList(diffBlocks));
>  } {code}
> but on updating the reclaim context we subtract these both , so wrong quota 
> value can be updated
> {code:java}
> QuotaCounts current = file.storagespaceConsumed(bsp);
> reclaimContext.quotaDelta().add(oldCounts.subtract(current)); {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15362) FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all distinct blocks

2020-05-19 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111421#comment-17111421
 ] 

hemanthboyina commented on HDFS-15362:
--

thanks [~elgoiri] for the review

We call updateQuotaAndCollectBlocks only once at the end, so I have added the 
assert only at that point.

> FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all 
> distinct blocks
> --
>
> Key: HDFS-15362
> URL: https://issues.apache.org/jira/browse/HDFS-15362
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15362.001.patch
>
>
> FileWithSnapshotFeature#updateQuotaAndCollectBlocks uses list to collect 
> blocks 
> {code:java}
>  List allBlocks = new ArrayList();
>  if (file.getBlocks() != null) {
> allBlocks.addAll(Arrays.asList(file.getBlocks()));
>   }{code}
>  INodeFile#storagespaceConsumedContiguous collects all distinct blocks by set
> {code:java}
> // Collect all distinct blocks
>  Set allBlocks = new HashSet<>(Arrays.asList(getBlocks()));
>  DiffList diffs = sf.getDiffs().asList();
>  for(FileDiff diff : diffs) {
>BlockInfo[] diffBlocks = diff.getBlocks();
>if (diffBlocks != null) {
>  allBlocks.addAll(Arrays.asList(diffBlocks));
>  } {code}
> but on updating the reclaim context we subtract these both , so wrong quota 
> value can be updated
> {code:java}
> QuotaCounts current = file.storagespaceConsumed(bsp);
> reclaimContext.quotaDelta().add(oldCounts.subtract(current)); {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15363) BlockPlacementPolicyWithNodeGroup should validate initialized NetworkTopology

2020-05-19 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15363:
-
Attachment: HDFS-15363.001.patch
Status: Patch Available  (was: Open)

> BlockPlacementPolicyWithNodeGroup should validate initialized NetworkTopology 
> --
>
> Key: HDFS-15363
> URL: https://issues.apache.org/jira/browse/HDFS-15363
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15363-testrepro.patch, HDFS-15363.001.patch
>
>
> BlockPlacementPolicyWithNodeGroup  type casts the initialized  clusterMap 
> {code:java}
> NetworkTopologyWithNodeGroup clusterMapNodeGroup =
> (NetworkTopologyWithNodeGroup) clusterMap {code}
> If clusterMap is an instance of DFSNetworkTopology we get a ClassCastException
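One possible shape of the validation (an illustrative sketch only, not the attached patch): check the topology type up front instead of blindly casting.
{code:java}
// validate the initialized topology before casting, instead of letting a
// ClassCastException escape when e.g. a DFSNetworkTopology instance is passed
if (!(clusterMap instanceof NetworkTopologyWithNodeGroup)) {
  throw new IllegalArgumentException(
      "BlockPlacementPolicyWithNodeGroup requires a NetworkTopologyWithNodeGroup,"
      + " but got " + clusterMap.getClass().getSimpleName());
}
NetworkTopologyWithNodeGroup clusterMapNodeGroup =
    (NetworkTopologyWithNodeGroup) clusterMap;
{code}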



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15363) BlockPlacementPolicyWithNodeGroup should validate initialized NetworkTopology

2020-05-19 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15363:
-
Attachment: HDFS-15363-testrepro.patch

> BlockPlacementPolicyWithNodeGroup should validate initialized NetworkTopology 
> --
>
> Key: HDFS-15363
> URL: https://issues.apache.org/jira/browse/HDFS-15363
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15363-testrepro.patch
>
>
> BlockPlacementPolicyWithNodeGroup  type casts the initialized  clusterMap 
> {code:java}
> NetworkTopologyWithNodeGroup clusterMapNodeGroup =
> (NetworkTopologyWithNodeGroup) clusterMap {code}
> If clusterMap is an instance of DFSNetworkTopology we get a ClassCastException



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15363) BlockPlacementPolicyWithNodeGroup should validate initialized NetworkTopology

2020-05-19 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15363:
-
Description: 
BlockPlacementPolicyWithNodeGroup  type casts the initialized  clusterMap 
{code:java}
NetworkTopologyWithNodeGroup clusterMapNodeGroup =
(NetworkTopologyWithNodeGroup) clusterMap {code}
If clusterMap is an instance of DFSNetworkTopology we get a ClassCastException

> BlockPlacementPolicyWithNodeGroup should validate initialized NetworkTopology 
> --
>
> Key: HDFS-15363
> URL: https://issues.apache.org/jira/browse/HDFS-15363
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> BlockPlacementPolicyWithNodeGroup  type casts the initialized  clusterMap 
> {code:java}
> NetworkTopologyWithNodeGroup clusterMapNodeGroup =
> (NetworkTopologyWithNodeGroup) clusterMap {code}
> If clusterMap is an instance of DFSNetworkTopology we get a ClassCastException



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15363) BlockPlacementPolicyWithNodeGroup should validate initialized NetworkTopology

2020-05-19 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15363:


 Summary: BlockPlacementPolicyWithNodeGroup should validate 
initialized NetworkTopology 
 Key: HDFS-15363
 URL: https://issues.apache.org/jira/browse/HDFS-15363
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15362) FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all distinct blocks

2020-05-18 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15362:
-
Description: 
FileWithSnapshotFeature#updateQuotaAndCollectBlocks uses list to collect blocks 
{code:java}
 List allBlocks = new ArrayList();
 if (file.getBlocks() != null) {
allBlocks.addAll(Arrays.asList(file.getBlocks()));
  }{code}
 INodeFile#storagespaceConsumedContiguous collects all distinct blocks by set
{code:java}
// Collect all distinct blocks
 Set<BlockInfo> allBlocks = new HashSet<>(Arrays.asList(getBlocks()));
 DiffList<FileDiff> diffs = sf.getDiffs().asList();
 for (FileDiff diff : diffs) {
   BlockInfo[] diffBlocks = diff.getBlocks();
   if (diffBlocks != null) {
     allBlocks.addAll(Arrays.asList(diffBlocks));
   }
 } {code}
But on updating the reclaim context we subtract one from the other, so a wrong
quota value can be updated:
{code:java}
QuotaCounts current = file.storagespaceConsumed(bsp);
reclaimContext.quotaDelta().add(oldCounts.subtract(current)); {code}
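
The direction the summary points at, sketched with the names from the snippets above (an illustration, not the attached patch):
{code:java}
// Mirror storagespaceConsumedContiguous: collect into a Set so a block referenced
// both by the current file and by a snapshot diff is only counted once before the
// old and new quota counts are subtracted.
Set<BlockInfo> allBlocks = new HashSet<>();
if (file.getBlocks() != null) {
  allBlocks.addAll(Arrays.asList(file.getBlocks()));
}
DiffList<FileDiff> diffs = getDiffs().asList();
for (FileDiff diff : diffs) {
  BlockInfo[] diffBlocks = diff.getBlocks();
  if (diffBlocks != null) {
    allBlocks.addAll(Arrays.asList(diffBlocks));
  }
}
{code}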

> FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all 
> distinct blocks
> --
>
> Key: HDFS-15362
> URL: https://issues.apache.org/jira/browse/HDFS-15362
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15362.001.patch
>
>
> FileWithSnapshotFeature#updateQuotaAndCollectBlocks uses list to collect 
> blocks 
> {code:java}
>  List<BlockInfo> allBlocks = new ArrayList<>();
>  if (file.getBlocks() != null) {
>    allBlocks.addAll(Arrays.asList(file.getBlocks()));
>  }{code}
>  INodeFile#storagespaceConsumedContiguous collects all distinct blocks by set
> {code:java}
> // Collect all distinct blocks
>  Set<BlockInfo> allBlocks = new HashSet<>(Arrays.asList(getBlocks()));
>  DiffList<FileDiff> diffs = sf.getDiffs().asList();
>  for (FileDiff diff : diffs) {
>    BlockInfo[] diffBlocks = diff.getBlocks();
>    if (diffBlocks != null) {
>      allBlocks.addAll(Arrays.asList(diffBlocks));
>    }
>  } {code}
> but on updating the reclaim context we subtract these both , so wrong quota 
> value can be updated
> {code:java}
> QuotaCounts current = file.storagespaceConsumed(bsp);
> reclaimContext.quotaDelta().add(oldCounts.subtract(current)); {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15362) FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all distinct blocks

2020-05-18 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15362:
-
Attachment: HDFS-15362.001.patch
Status: Patch Available  (was: Open)

> FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all 
> distinct blocks
> --
>
> Key: HDFS-15362
> URL: https://issues.apache.org/jira/browse/HDFS-15362
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15362.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15362) FileWithSnapshotFeature#updateQuotaAndCollectBlocks should collect all distinct blocks

2020-05-18 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15362:


 Summary: FileWithSnapshotFeature#updateQuotaAndCollectBlocks 
should collect all distinct blocks
 Key: HDFS-15362
 URL: https://issues.apache.org/jira/browse/HDFS-15362
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"

2020-05-18 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109947#comment-17109947
 ] 

hemanthboyina commented on HDFS-14762:
--

I think this issue also relates to IPv6.
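
For reference, a minimal repro of the failure described in the quoted report (the file name here is hypothetical; Path is org.apache.hadoop.fs.Path):
{code:java}
Path parent = new Path("/user/test");
// The child name contains ':' and is parsed as a relative URI, so this fails with
// IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI
Path p = new Path(parent, "part-0:2020-05-18.csv");
{code}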

> "Path(Path/String parent, String child)" will fail when "child" contains ":"
> 
>
> Key: HDFS-14762
> URL: https://issues.apache.org/jira/browse/HDFS-14762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shixiong Zhu
>Priority: Major
> Attachments: HDFS-14762.001.patch, HDFS-14762.002.patch, 
> HDFS-14762.003.patch, HDFS-14762.004.patch
>
>
> When the "child" parameter contains ":", "Path(Path/String parent, String 
> child)" will throw the following exception:
> {code}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: ...
> {code}
> Not sure if this is a legit bug. But the following places will hit this error 
> when seeing a Path with a file name containing ":":
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-05-13 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15351:
-
Attachment: HDFS-15351.003.patch

> Blocks Scheduled Count was wrong on Truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15351.001.patch, HDFS-15351.002.patch, 
> HDFS-15351.003.patch
>
>
> On truncate and append we remove the blocks from Reconstruction Queue 
> On removing the blocks from pending reconstruction , we need to decrement 
> Blocks Scheduled 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-05-13 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106503#comment-17106503
 ] 

hemanthboyina commented on HDFS-15351:
--

Thanks for the review [~elgoiri].

I have updated the patch, please review.

> Blocks Scheduled Count was wrong on Truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15351.001.patch, HDFS-15351.002.patch, 
> HDFS-15351.003.patch
>
>
> On truncate and append we remove the blocks from Reconstruction Queue 
> On removing the blocks from pending reconstruction , we need to decrement 
> Blocks Scheduled 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-05-13 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106486#comment-17106486
 ] 

hemanthboyina commented on HDFS-15351:
--

Thanks for the review [~elgoiri].

I have updated the patch, please review.

> Blocks Scheduled Count was wrong on Truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15351.001.patch, HDFS-15351.002.patch
>
>
> On truncate and append we remove the blocks from Reconstruction Queue 
> On removing the blocks from pending reconstruction , we need to decrement 
> Blocks Scheduled 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-05-13 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15351:
-
Attachment: HDFS-15351.002.patch

> Blocks Scheduled Count was wrong on Truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15351.001.patch, HDFS-15351.002.patch
>
>
> On truncate and append we remove the blocks from Reconstruction Queue 
> On removing the blocks from pending reconstruction , we need to decrement 
> Blocks Scheduled 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15308) TestReconstructStripedFile.testNNSendsErasureCodingTasks is flaky

2020-05-11 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104747#comment-17104747
 ] 

hemanthboyina edited comment on HDFS-15308 at 5/11/20, 6:47 PM:


Thanks for the work here.

I think dfs.namenode.reconstruction.pending.timeout-sec should be lesser:
{code:java}
long period = Math.min(DEFAULT_RECHECK_INTERVAL, timeout);
try {
  pendingReconstructionCheck();
  Thread.sleep(period);
} {code}
To get zero timed-out pending reconstructions, the timeout should be more, as
[~touchida] mentioned.
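
For illustration, a test can raise that timeout along these lines (assuming the DFSConfigKeys constant for dfs.namenode.reconstruction.pending.timeout-sec; the value is arbitrary):
{code:java}
// Give DataNodes more time before pending reconstruction work is marked timed out,
// so getNumTimedOutPendingReconstructions() stays at zero during the test.
conf.setInt(
    DFSConfigKeys.DFS_NAMENODE_RECONSTRUCTION_PENDING_TIMEOUT_SEC_KEY, 60);
{code}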


was (Author: hemanthboyina):
thanks for the work here

i think dfs.namenode.reconstruction.pending.timeout-sec should be lesser 
{code:java}
 long period = Math.min(DEFAULT_RECHECK_INTERVAL, timeout);
   try {
 pendingReconstructionCheck();
 Thread.sleep(period);
} {code}

> TestReconstructStripedFile.testNNSendsErasureCodingTasks is flaky
> -
>
> Key: HDFS-15308
> URL: https://issues.apache.org/jira/browse/HDFS-15308
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.3.0
>Reporter: Toshihiko Uchida
>Priority: Minor
>  Labels: flaky-test
> Attachments: HDFS-15308.001.patch
>
>
> In HDFS-14353, TestReconstructStripedFile.testNNSendsErasureCodingTasks 
> failed once due to pending reconstruction timeout as follows.
> {code}
> java.lang.AssertionError: Found 4 timeout pending reconstruction tasks
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestReconstructStripedFile.testNNSendsErasureCodingTasks(TestReconstructStripedFile.java:502)
>   at 
> org.apache.hadoop.hdfs.TestReconstructStripedFile.testNNSendsErasureCodingTasks(TestReconstructStripedFile.java:458)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> The error occurred on the following assertion.
> {code}
> // Make sure that all pending reconstruction tasks can be processed.
> while (ns.getPendingReconstructionBlocks() > 0) {
>   long timeoutPending = ns.getNumTimedOutPendingReconstructions();
>   assertTrue(String.format("Found %d timeout pending reconstruction tasks",
>   timeoutPending), timeoutPending == 0);
>   Thread.sleep(1000);
> }
> {code}
> The failure could not be reproduced in the reporter's docker environment 
> (start-build-environment.sh).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15308) TestReconstructStripedFile.testNNSendsErasureCodingTasks is flaky

2020-05-11 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104747#comment-17104747
 ] 

hemanthboyina commented on HDFS-15308:
--

thanks for the work here

i think dfs.namenode.reconstruction.pending.timeout-sec should be lesser 
{code:java}
 long period = Math.min(DEFAULT_RECHECK_INTERVAL, timeout);
   try {
 pendingReconstructionCheck();
 Thread.sleep(period);
} {code}

> TestReconstructStripedFile.testNNSendsErasureCodingTasks is flaky
> -
>
> Key: HDFS-15308
> URL: https://issues.apache.org/jira/browse/HDFS-15308
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.3.0
>Reporter: Toshihiko Uchida
>Priority: Minor
>  Labels: flaky-test
> Attachments: HDFS-15308.001.patch
>
>
> In HDFS-14353, TestReconstructStripedFile.testNNSendsErasureCodingTasks 
> failed once due to pending reconstruction timeout as follows.
> {code}
> java.lang.AssertionError: Found 4 timeout pending reconstruction tasks
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestReconstructStripedFile.testNNSendsErasureCodingTasks(TestReconstructStripedFile.java:502)
>   at 
> org.apache.hadoop.hdfs.TestReconstructStripedFile.testNNSendsErasureCodingTasks(TestReconstructStripedFile.java:458)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> The error occurred on the following assertion.
> {code}
> // Make sure that all pending reconstruction tasks can be processed.
> while (ns.getPendingReconstructionBlocks() > 0) {
>   long timeoutPending = ns.getNumTimedOutPendingReconstructions();
>   assertTrue(String.format("Found %d timeout pending reconstruction tasks",
>   timeoutPending), timeoutPending == 0);
>   Thread.sleep(1000);
> }
> {code}
> The failure could not be reproduced in the reporter's docker environment 
> (start-build-environment.sh).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15308) TestReconstructStripedFile.testNNSendsErasureCodingTasks is flaky

2020-05-11 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15308:
-
Attachment: HDFS-15308.001.patch
Status: Patch Available  (was: Open)

> TestReconstructStripedFile.testNNSendsErasureCodingTasks is flaky
> -
>
> Key: HDFS-15308
> URL: https://issues.apache.org/jira/browse/HDFS-15308
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.3.0
>Reporter: Toshihiko Uchida
>Priority: Minor
>  Labels: flaky-test
> Attachments: HDFS-15308.001.patch
>
>
> In HDFS-14353, TestReconstructStripedFile.testNNSendsErasureCodingTasks 
> failed once due to pending reconstruction timeout as follows.
> {code}
> java.lang.AssertionError: Found 4 timeout pending reconstruction tasks
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestReconstructStripedFile.testNNSendsErasureCodingTasks(TestReconstructStripedFile.java:502)
>   at 
> org.apache.hadoop.hdfs.TestReconstructStripedFile.testNNSendsErasureCodingTasks(TestReconstructStripedFile.java:458)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> The error occurred on the following assertion.
> {code}
> // Make sure that all pending reconstruction tasks can be processed.
> while (ns.getPendingReconstructionBlocks() > 0) {
>   long timeoutPending = ns.getNumTimedOutPendingReconstructions();
>   assertTrue(String.format("Found %d timeout pending reconstruction tasks",
>   timeoutPending), timeoutPending == 0);
>   Thread.sleep(1000);
> }
> {code}
> The failure could not be reproduced in the reporter's docker environment 
> (start-build-environment.sh).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-05-11 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15351:
-
Attachment: HDFS-15351.001.patch
Status: Patch Available  (was: Open)

> Blocks Scheduled Count was wrong on Truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15351.001.patch
>
>
> On truncate and append we remove the blocks from Reconstruction Queue 
> On removing the blocks from pending reconstruction , we need to decrement 
> Blocks Scheduled 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-05-11 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15351:
-
Description: 
On truncate and append we remove the blocks from the reconstruction queue.

On removing the blocks from pending reconstruction, we need to decrement the
Blocks Scheduled count.
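
A rough sketch of the bookkeeping being described (the names below are illustrative, not the exact patch):
{code:java}
// Illustrative: when a pending reconstruction is removed (e.g. on truncate/append),
// return the "blocks scheduled" credit to the datanodes that were chosen as targets.
PendingBlockInfo removed = pendingReconstructions.remove(block);
if (removed != null) {
  for (DatanodeStorageInfo target : removed.getTargets()) {
    target.getDatanodeDescriptor()
        .decrementBlocksScheduled(target.getStorageType());
  }
}
{code}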

> Blocks Scheduled Count was wrong on Truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> On truncate and append we remove the blocks from Reconstruction Queue 
> On removing the blocks from pending reconstruction , we need to decrement 
> Blocks Scheduled 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-05-11 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15351:


 Summary: Blocks Scheduled Count was wrong on Truncate 
 Key: HDFS-15351
 URL: https://issues.apache.org/jira/browse/HDFS-15351
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15335) Report top N metrics for files in get listing ops

2020-05-05 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100399#comment-17100399
 ] 

hemanthboyina commented on HDFS-15335:
--

Hi [~csun], it's a good improvement.

Have you started working on this? If not, can I take it over?

> Report top N metrics for files in get listing ops
> -
>
> Key: HDFS-15335
> URL: https://issues.apache.org/jira/browse/HDFS-15335
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, metrics
>Reporter: Chao Sun
>Priority: Major
>
> Currently HDFS has {{filesInGetListingOps}} metrics which tells the total 
> number of files in all listing ops. However, it will be useful to report the 
> top N users who contribute most to this. This can help to identify the 
> potential bad users and stop the abusing against NameNode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15332) Quota Space consumed was wrong in truncate with Snapshots

2020-05-04 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15332:
-
Description: 
On calculating space quota usage
{code:java}
   if (file.getBlocks() != null) {
     allBlocks.addAll(Arrays.asList(file.getBlocks()));
   }
   if (removed.getBlocks() != null) {
     allBlocks.addAll(Arrays.asList(removed.getBlocks()));
   }
   for (BlockInfo b : allBlocks) { {code}
we missed the blocks from the file snapshot feature's diffs.
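
The gap can be sketched like this, reusing the variable names above (an illustration, not the attached patch):
{code:java}
// Also walk the file's snapshot diffs, so blocks that are kept alive only by a
// snapshot copy are counted when the consumed space is recomputed after truncate.
FileWithSnapshotFeature sf = file.getFileWithSnapshotFeature();
if (sf != null) {
  for (FileDiff diff : sf.getDiffs().asList()) {
    BlockInfo[] diffBlocks = diff.getBlocks();
    if (diffBlocks != null) {
      allBlocks.addAll(Arrays.asList(diffBlocks));
    }
  }
}
{code}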

> Quota Space consumed was wrong in truncate with Snapshots
> -
>
> Key: HDFS-15332
> URL: https://issues.apache.org/jira/browse/HDFS-15332
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15332.001.patch
>
>
> On calculating space quota usage
> {code:java}
>if (file.getBlocks() != null) {
> allBlocks.addAll(Arrays.asList(file.getBlocks()));
>}
>if (removed.getBlocks() != null) {
> allBlocks.addAll(Arrays.asList(removed.getBlocks()));
>}  
>for (BlockInfo b: allBlocks) { {code}
> we missed out the blocks of file snapshot feature's Diffs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15332) Quota Space consumed was wrong in truncate with Snapshots

2020-05-04 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15332:
-
Attachment: HDFS-15332.001.patch
Status: Patch Available  (was: Open)

> Quota Space consumed was wrong in truncate with Snapshots
> -
>
> Key: HDFS-15332
> URL: https://issues.apache.org/jira/browse/HDFS-15332
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15332.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15332) Quota Space consumed was wrong in truncate with Snapshots

2020-05-04 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15332:


 Summary: Quota Space consumed was wrong in truncate with Snapshots
 Key: HDFS-15332
 URL: https://issues.apache.org/jira/browse/HDFS-15332
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15316) Deletion failure should not remove directory from snapshottables

2020-05-01 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097474#comment-17097474
 ] 

hemanthboyina commented on HDFS-15316:
--

Thanks [~ayushtkn] for the review.

Updated the patch, please review.
{quote}Secondly, can you help when in actual scenario
{quote}
It happened in a very rare scenario, though it can be a safeguard condition to
check.

> Deletion failure should not remove directory from snapshottables
> 
>
> Key: HDFS-15316
> URL: https://issues.apache.org/jira/browse/HDFS-15316
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15316.001.patch, HDFS-15316.002.patch
>
>
> If deleting a directory doesn't succeed, we still remove the directory from
> snapshottables.
> This makes the system inconsistent: we will be able to create snapshots, but
> snapshot diff throws Directory is not snapshottable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15316) Deletion failure should not remove directory from snapshottables

2020-05-01 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15316:
-
Attachment: HDFS-15316.002.patch

> Deletion failure should not remove directory from snapshottables
> 
>
> Key: HDFS-15316
> URL: https://issues.apache.org/jira/browse/HDFS-15316
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15316.001.patch, HDFS-15316.002.patch
>
>
> If deleting a directory doesn't succeed, we still remove the directory from
> snapshottables.
> This makes the system inconsistent: we will be able to create snapshots, but
> snapshot diff throws Directory is not snapshottable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15316) Deletion failure should not remove directory from snapshottables

2020-04-30 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15316:
-
Attachment: HDFS-15316.001.patch
Status: Patch Available  (was: Open)

> Deletion failure should not remove directory from snapshottables
> 
>
> Key: HDFS-15316
> URL: https://issues.apache.org/jira/browse/HDFS-15316
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15316.001.patch
>
>
> If deleting a directory doesn't succeed, we still remove the directory from
> snapshottables.
> This makes the system inconsistent: we will be able to create snapshots, but
> snapshot diff throws Directory is not snapshottable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15316) Deletion failure should not remove directory from snapshottables

2020-04-30 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15316:
-
Description: 
If deleting a directory doesn't succeed, we still remove the directory from
snapshottables.

This makes the system inconsistent: we will be able to create snapshots, but
snapshot diff throws Directory is not snapshottable.
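
A minimal sketch of the safeguard (the method and variable names here are illustrative, not the attached patch):
{code:java}
// Only drop the directory from the snapshottable list once the delete has really
// succeeded; a failed delete should leave the snapshottable state untouched.
boolean deleted = unprotectedDelete(fsd, iip, reclaimContext, mtime);
if (deleted && snapshottableDirs != null) {
  fsd.getFSNamesystem().removeSnapshottableDirs(snapshottableDirs);
}
{code}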

> Deletion failure should not remove directory from snapshottables
> 
>
> Key: HDFS-15316
> URL: https://issues.apache.org/jira/browse/HDFS-15316
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> If deleting a directory doesn't succeed, we still remove the directory from
> snapshottables.
> This makes the system inconsistent: we will be able to create snapshots, but
> snapshot diff throws Directory is not snapshottable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15302) Backport HDFS-15286 to branch-2.x

2020-04-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096878#comment-17096878
 ] 

hemanthboyina commented on HDFS-15302:
--

Thanks [~aajisaka] for the review.

Updated the patch, please review. The test failures and findbugs warnings are not related.

 

> Backport HDFS-15286 to branch-2.x
> -
>
> Key: HDFS-15302
> URL: https://issues.apache.org/jira/browse/HDFS-15302
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira Ajisaka
>Assignee: hemanthboyina
>Priority: Blocker
> Attachments: HDFS-15302-branch-2.10-01.patch, 
> HDFS-15302-branch-2.10-02.patch, HDFS-15302-branch.2.10.1.patch
>
>
> Backport HDFS-15286 to branch-2.10 and branch-2.9.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15316) Deletion failure should not remove directory from snapshottables

2020-04-30 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15316:


 Summary: Deletion failure should not remove directory from 
snapshottables
 Key: HDFS-15316
 URL: https://issues.apache.org/jira/browse/HDFS-15316
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15302) Backport HDFS-15286 to branch-2.x

2020-04-30 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15302:
-
Attachment: HDFS-15302-branch-2.10-02.patch

> Backport HDFS-15286 to branch-2.x
> -
>
> Key: HDFS-15302
> URL: https://issues.apache.org/jira/browse/HDFS-15302
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira Ajisaka
>Assignee: hemanthboyina
>Priority: Blocker
> Attachments: HDFS-15302-branch-2.10-01.patch, 
> HDFS-15302-branch-2.10-02.patch, HDFS-15302-branch.2.10.1.patch
>
>
> Backport HDFS-15286 to branch-2.10 and branch-2.9.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15265) HttpFS: validate content-type in HttpFSUtils

2020-04-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095734#comment-17095734
 ] 

hemanthboyina commented on HDFS-15265:
--

Updated the patch, please review [~elgoiri].

> HttpFS: validate content-type in HttpFSUtils
> 
>
> Key: HDFS-15265
> URL: https://issues.apache.org/jira/browse/HDFS-15265
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15265.001.patch, HDFS-15265.002.patch
>
>
> Validate that the content-type in HttpFSUtils is JSON.
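
A sketch of the kind of check the summary describes (illustrative; conn is assumed to be the HttpURLConnection used by HttpFSUtils):
{code:java}
// Fail fast when the server response is not JSON, instead of handing an
// arbitrary body to the JSON parser.
String contentType = conn.getHeaderField("Content-Type");
if (contentType == null
    || !contentType.toLowerCase(Locale.ENGLISH).contains("application/json")) {
  throw new IOException(
      "Invalid Content-Type: " + contentType + ", expected application/json");
}
{code}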



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


