[jira] [Created] (HDFS-14534) NetworkTopology's fair lock has poor performance

2019-05-31 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-14534:
--

 Summary: NetworkTopology's fair lock has poor performance
 Key: HDFS-14534
 URL: https://issues.apache.org/jira/browse/HDFS-14534
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Daryn Sharp


The {{NetworkTopology#netlock}} is a heavily accessed lock.  HADOOP-15486 made 
the lock fair to avoid starvation of a re-registering datanode that holds the 
fsn write lock while waiting for the topology write lock.  If nodes start 
flapping and re-registering, the contention with all other handlers becomes 
extreme.
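A minimal sketch of the trade-off (class and method names below are illustrative, not the actual NetworkTopology code): a fair ReentrantReadWriteLock, as HADOOP-15486 configured, grants the lock in arrival order, so a writer cannot starve, but the many concurrent reader handlers are serialized behind any queued writer and throughput collapses when re-registrations arrive frequently.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only -- not the actual NetworkTopology#netlock code.
// fair=true: longest-waiting thread (e.g. a re-registering DN holding the
//   fsn write lock) acquires next; prevents starvation, hurts throughput.
// fair=false: readers barge past queued writers; fast, but can starve writers.
public class TopologyLockSketch {
  public static ReentrantReadWriteLock newNetlock(boolean fair) {
    return new ReentrantReadWriteLock(fair);
  }
}
```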



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-14533) Datanode short circuit cache can become blocked

2019-05-31 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-14533:
--

 Summary: Datanode short circuit cache can become blocked
 Key: HDFS-14533
 URL: https://issues.apache.org/jira/browse/HDFS-14533
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Daryn Sharp


Errors in the short circuit cache can leave clients indefinitely blocked in 
{{ShortCircuitCache#fetch}} on a waitable's condition that will never be 
signaled.  The condition wait should be bounded with a timeout.
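A hedged sketch of the suggested fix (names are illustrative, not the actual ShortCircuitCache API): bounding the condition wait with awaitNanos means a fetch blocked on a waitable that is never signaled eventually times out instead of hanging forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only -- not the actual ShortCircuitCache code.
public class BoundedWaitSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition ready = lock.newCondition();
  private boolean done = false;

  public void signalDone() {
    lock.lock();
    try { done = true; ready.signalAll(); } finally { lock.unlock(); }
  }

  // Returns true if signaled within the timeout, false on expiry.
  public boolean awaitFetch(long timeoutMs) {
    lock.lock();
    try {
      long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      while (!done) {
        if (nanos <= 0L) {
          return false;              // bounded: give up instead of blocking forever
        }
        nanos = ready.awaitNanos(nanos);
      }
      return true;
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      lock.unlock();
    }
  }
}
```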






[jira] [Created] (HDFS-14532) Datanode's BlockSender checksum buffer is too big

2019-05-31 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-14532:
--

 Summary: Datanode's BlockSender checksum buffer is too big
 Key: HDFS-14532
 URL: https://issues.apache.org/jira/browse/HDFS-14532
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Daryn Sharp
 Attachments: Screen Shot 2019-05-31 at 12.32.06 PM.png

The BlockSender uses an excessively large 128K buffered input stream – it 
accounts for ~99% of the instance's retained memory.






[jira] [Created] (HDFS-14531) Datanode's ScanInfo requires excessive memory

2019-05-31 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-14531:
--

 Summary: Datanode's ScanInfo requires excessive memory
 Key: HDFS-14531
 URL: https://issues.apache.org/jira/browse/HDFS-14531
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
 Attachments: Screen Shot 2019-05-31 at 12.25.54 PM.png

The DirectoryScanner's ScanInfo map consumes ~4.5X as much memory as the 
replica map.  For 1.1M replicas, the replica map is ~91M while the scan info 
map is ~405M.






[jira] [Created] (HDFS-13929) DFSClient leaks data streamer sockets

2018-09-20 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-13929:
--

 Summary: DFSClient leaks data streamer sockets
 Key: HDFS-13929
 URL: https://issues.apache.org/jira/browse/HDFS-13929
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Daryn Sharp


Sockets to DNs may linger in the CLOSE_WAIT state which means the remote peer 
(DN) closed the socket but the local client has not.  The socket does not 
transition to TIME_WAIT until the java process exits which is indicative of a 
leaked file descriptor.  Interestingly there is always 1 byte remaining to be 
read.

{noformat}
$ netstat -tnn|fgrep :1004|fgrep -v EST
tcp    1    0  THISHOST:57158   RANDOMHOST:1004   CLOSE_WAIT
tcp    1    0  THISHOST:40346   RANDOMHOST:1004   CLOSE_WAIT
tcp    1    0  THISHOST:45504   RANDOMHOST:1004   CLOSE_WAIT
tcp    1    0  THISHOST:58958   RANDOMHOST:1004   CLOSE_WAIT
tcp    1    0  THISHOST:45570   RANDOMHOST:1004   CLOSE_WAIT
tcp    1    0  THISHOST:46496   RANDOMHOST:1004   CLOSE_WAIT
tcp    1    0  THISHOST:58944   RANDOMHOST:1004   CLOSE_WAIT
tcp    1    0  THISHOST:55540   RANDOMHOST:1004   CLOSE_WAIT
{noformat}






[jira] [Created] (HDFS-13465) Overlapping lease recoveries cause NPE in NN

2018-04-17 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-13465:
--

 Summary: Overlapping lease recoveries cause NPE in NN
 Key: HDFS-13465
 URL: https://issues.apache.org/jira/browse/HDFS-13465
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.8.0
Reporter: Daryn Sharp


Overlapping lease recoveries for the same file will NPE in the DatanodeManager 
while creating LeaseRecoveryCommands, possibly losing other recovery commands.
 * client1 calls recoverLease, file is added to DN1's recovery queue
 * client2 calls recoverLease, file is added to DN2's recovery queue
 * one DN heartbeats, gets the block recovery command, and completes the 
synchronization before the other DN heartbeats; i.e. the file is closed.
 * the other DN heartbeats, takes the block from its recovery queue, assumes 
it's still under construction, and gets an NPE calling getExpectedLocations

{code:java}
// check lease recovery
BlockInfo[] blocks = nodeinfo.getLeaseRecoveryCommand(Integer.MAX_VALUE);
if (blocks != null) {
  BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length);
  for (BlockInfo b : blocks) {
    BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
    assert uc != null;
    final DatanodeStorageInfo[] storages = uc.getExpectedStorageLocations();
{code}
This is "ok" for the NN state if only one block was queued.  All recoveries are 
lost if multiple blocks were queued.  Recovery will not occur until the client 
explicitly retries or the lease monitor recovers the lease.
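A hedged sketch of one defensive fix (not the committed patch; names are illustrative): drop queue entries whose block is no longer under construction instead of NPE'ing, so a single stale entry cannot discard the other queued recovery commands.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch only -- not the actual DatanodeManager code.
public class RecoverySketch {
  // Returns only blocks still under construction; stale entries are skipped
  // rather than allowed to throw and lose the remaining commands.
  public static <B> List<B> filterRecoverable(
      B[] blocks, Predicate<B> underConstruction) {
    List<B> out = new ArrayList<>();
    if (blocks != null) {
      for (B b : blocks) {
        if (underConstruction.test(b)) {
          out.add(b);
        }
      }
    }
    return out;
  }
}
```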






[jira] [Created] (HDFS-13112) Token expiration edits may cause log corruption or deadlock

2018-02-06 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-13112:
--

 Summary: Token expiration edits may cause log corruption or 
deadlock
 Key: HDFS-13112
 URL: https://issues.apache.org/jira/browse/HDFS-13112
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 0.23.8, 2.1.0-beta
Reporter: Daryn Sharp
Assignee: Daryn Sharp


HDFS-4477 specifically did not acquire the fsn lock during token cancellation 
based on the belief that edit logs are thread-safe.  However, log rolling is 
not thread-safe.  Failure to externally synchronize on the fsn lock during a 
roll will cause problems.

For sync edit logging, it may cause corruption by interspersing edits with the 
end/start segment edits.  Async edit logging may encounter a deadlock if the 
log queue overflows.  Luckily, losing the race is extremely rare.  In ~5 years, 
we've never encountered it.  However, HDFS-13051 lost the race with async edits.






[jira] [Created] (HDFS-13111) Close recovery may incorrectly mark blocks corrupt

2018-02-06 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-13111:
--

 Summary: Close recovery may incorrectly mark blocks corrupt
 Key: HDFS-13111
 URL: https://issues.apache.org/jira/browse/HDFS-13111
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp


Close recovery can leave a block marked corrupt until the next FBR arrives from 
one of the DNs.  The reason is unclear but has happened multiple times when a 
DN has io saturated disks.







[jira] [Resolved] (HDFS-13038) User with no permission on file is able to run getfacl for that file

2018-01-24 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp resolved HDFS-13038.

Resolution: Not A Problem

> User with no permission on file is able to run getfacl for that file
> 
>
> Key: HDFS-13038
> URL: https://issues.apache.org/jira/browse/HDFS-13038
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13038.001.patch
>
>
> Currently any user with EXECUTE permission can run getfacl on a file or 
> directory. This Jira adds a check for READ access of user on the inode path. 
> {code:java}
> [root@host ~]$ hdfs dfs -copyFromLocal /etc/a.txt /tmp
> [root@host ~]$ hdfs dfs -setfacl -m user:abc:--- /tmp/a.txt
> {code}
> Since user abc does not have read permission on the file, the 'cat' command 
> throws a Permission Denied error, but getfacl executes normally.
> {code:java}
> [abc@host ~]$ hdfs dfs -cat /tmp/a.txt
> cat: Permission denied: user=abc, access=READ, 
> inode="/tmp/a.txt":abc:hdfs:-rw-r--r-- 
> [abc@host ~]$ hdfs dfs -getfacl /tmp/a.txt 
> # file: /tmp/a.txt 
> # owner:root 
> # group: hdfs 
> user::rw- 
> user:abc:--- 
> group::r-- 
> mask::r-- 
> other::r--
> {code}






[jira] [Created] (HDFS-12914) Block report leases cause missing blocks until next report

2017-12-11 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12914:
--

 Summary: Block report leases cause missing blocks until next report
 Key: HDFS-12914
 URL: https://issues.apache.org/jira/browse/HDFS-12914
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Priority: Critical


{{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for conditions 
such as "unknown datanode", "not in pending set", "lease has expired", wrong 
lease id, etc.  Lease rejection does not throw an exception.  It returns false, 
which bubbles up to {{NameNodeRpcServer#blockReport}} and is interpreted as 
{{noStaleStorages}}.

A re-registering node whose FBR is rejected due to an invalid lease becomes 
active with _no blocks_.  A replication storm ensues, possibly causing DNs to 
temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
re-registration.  The cluster will have many "missing blocks" until the DN's 
next FBR is sent and/or forced.






[jira] [Created] (HDFS-12907) Allow read-only access to reserved raw for non-superusers

2017-12-06 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12907:
--

 Summary: Allow read-only access to reserved raw for non-superusers
 Key: HDFS-12907
 URL: https://issues.apache.org/jira/browse/HDFS-12907
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Daryn Sharp


HDFS-6509 added a special /.reserved/raw path prefix to access the raw file 
contents of EZ files.  In the simplest sense it doesn't return the FE info in 
the {{LocatedBlocks}} so the dfs client doesn't try to decrypt the data.  This 
facilitates allowing tools like distcp to copy raw bytes.

Access to the raw hierarchy is restricted to superusers.  This seems like an 
overly broad restriction designed to prevent non-admins from munging the EZ 
related xattrs.  I believe we should relax the restriction to allow non-admins 
to perform read-only operations.  Allowing non-superusers to easily read the 
raw bytes will be extremely useful for regular users, esp. for enabling webhdfs 
client-side encryption.






[jira] [Created] (HDFS-12747) Lease monitor may infinitely loop on the same lease

2017-10-30 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12747:
--

 Summary: Lease monitor may infinitely loop on the same lease
 Key: HDFS-12747
 URL: https://issues.apache.org/jira/browse/HDFS-12747
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Priority: Critical


Lease recovery incorrectly handles UC files if the last block is complete but 
the penultimate block is committed.  "Incorrectly handles" is a euphemism for: 
it infinitely loops for days and leaves all abandoned streams open until 
customers complain.

The problem may manifest when:
# Block1 is committed but seemingly never completed
# Block2 is allocated
# Lease recovery is initiated for block2
# Commit block synchronization invokes {{FSNamesystem#closeFileCommitBlocks}}, 
causing:
#* {{commitOrCompleteLastBlock}} to mark block2 as complete
#* {{finalizeINodeFileUnderConstruction}}/{{INodeFile.assertAllBlocksComplete}} 
to throw {{IllegalStateException}} because the penultimate block1 is "COMMITTED 
but not COMPLETE"
# The next lease recovery results in an infinite loop.

The {{LeaseManager}} expects that {{FSNamesystem#internalReleaseLease}} will 
either init recovery and renew the lease, or remove the lease.  In the 
described state it does neither.  The switch case will break out if the last 
block is complete.  (The case statement ironically contains an assert).  Since 
nothing changed, the lease is still the “next” lease to be processed.  The 
lease monitor loops for 25ms on the same lease, sleeps for 2s, loops on it 
again.






[jira] [Resolved] (HDFS-4493) webhdfs fails after SPNEGO token expires

2017-10-25 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp resolved HDFS-4493.
---
  Resolution: Works for Me
Release Note: This issue was fixed years ago when authenticated url was 
removed from webhdfs.

> webhdfs fails after SPNEGO token expires
> 
>
> Key: HDFS-4493
> URL: https://issues.apache.org/jira/browse/HDFS-4493
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0-alpha1
>Reporter: Daryn Sharp
>
> Webhdfs assumes that SPNEGO's {{AuthenticationException}} is always fatal.  
> The exception is thrown if authentication truly fails due to bad credentials, 
> _but_ it's also thrown when the auth token expires after 10h.  The retry 
> policies are short-circuited and the fs becomes unusable which is unsuitable 
> for long running processes/daemons.






[jira] [Created] (HDFS-12705) WebHdfsFileSystem exceptions should retain the caused by exception

2017-10-24 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12705:
--

 Summary: WebHdfsFileSystem exceptions should retain the caused by 
exception
 Key: HDFS-12705
 URL: https://issues.apache.org/jira/browse/HDFS-12705
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.8.0
Reporter: Daryn Sharp


{{WebHdfsFileSystem#runWithRetry}} uses reflection to prepend the remote host 
to the exception.  While it preserves the original stacktrace, it omits the 
original cause which complicates debugging.






[jira] [Created] (HDFS-12704) FBR may corrupt block state

2017-10-24 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12704:
--

 Summary: FBR may corrupt block state
 Key: HDFS-12704
 URL: https://issues.apache.org/jira/browse/HDFS-12704
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Priority: Critical


If FBR processing generates a runtime exception, it is believed to foul the 
block state and lead to unpredictable behavior.






[jira] [Created] (HDFS-12703) Exceptions are fatal to decommissioning monitor

2017-10-24 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12703:
--

 Summary: Exceptions are fatal to decommissioning monitor
 Key: HDFS-12703
 URL: https://issues.apache.org/jira/browse/HDFS-12703
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Priority: Critical


The {{DecommissionManager.Monitor}} runs as an executor scheduled task.  If an 
exception occurs, all decommissioning ceases until the NN is restarted.  Per 
javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the task 
encounters an exception, subsequent executions are suppressed*.  The monitor 
thread is alive but blocked waiting for an executor task that will never come.  
The code currently disposes of the future so the actual exception that aborted 
the task is gone.
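An illustrative sketch of the usual mitigation (not the DecommissionManager code; names are hypothetical): since scheduleAtFixedRate suppresses all subsequent executions once a task throws, the periodic monitor must wrap itself in a catch-all so an error is recorded rather than fatal.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: wrap a periodic task so that a thrown exception is counted
// (and could be logged) instead of silently killing all future executions,
// which is what ScheduledExecutorService#scheduleAtFixedRate does otherwise.
public class MonitorSketch {
  public static Runnable guarded(Runnable task, AtomicInteger errors) {
    return () -> {
      try {
        task.run();
      } catch (Throwable t) {   // swallow so subsequent runs still happen
        errors.incrementAndGet();
      }
    };
  }
}
```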

Failover is insufficient since the task is also likely dead on the standby.  
Replication queue init after the transition to active will fix the 
under-replication of blocks on currently decommissioning nodes, but future 
nodes will never decommission.  The standby must be bounced prior to failover – 
and hopefully the error condition does not reoccur.






[jira] [Created] (HDFS-12658) Lease renewal causes connection flapping

2017-10-13 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12658:
--

 Summary: Lease renewal causes connection flapping
 Key: HDFS-12658
 URL: https://issues.apache.org/jira/browse/HDFS-12658
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp


Adding a dfsclient to the lease renewer uses the minimum of 1/2 the soft 
timeout and 1/2 the client's idle timeout (after which the client closes an 
idle connection).  Both default to 1m, so clients with open files that are 
otherwise not making calls to the NN will experience connection flapping.  
Re-authentication is unnecessarily taxing on the ipc layer.
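The interval arithmetic above can be sketched as follows (the method name is hypothetical, not the DFSClient API): with both timeouts at their 1m defaults, the renewer wakes every 30s, which is inside the idle-close window, so the connection is repeatedly torn down and re-established.

```java
// Sketch only -- mirrors the interval choice described in the report.
public class RenewerIntervalSketch {
  // Renewal period: min(soft-timeout / 2, client idle timeout / 2).
  public static long renewalIntervalMs(long softTimeoutMs, long idleTimeoutMs) {
    return Math.min(softTimeoutMs / 2, idleTimeoutMs / 2);
  }
}
```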







[jira] [Created] (HDFS-12657) Operations based on inode id must not fallback to the path

2017-10-13 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12657:
--

 Summary: Operations based on inode id must not fallback to the path
 Key: HDFS-12657
 URL: https://issues.apache.org/jira/browse/HDFS-12657
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Daryn Sharp


HDFS-6294 added the ability for some path-based operations to specify an 
optional inode id to mimic file descriptors.  If an inode id is provided and it 
exists, it replaces the provided path.  If it doesn't exist, it has the broken 
behavior of falling back to the supplied path.  A supplied inode id must be 
authoritative.  An FNF should be thrown if the inode does not exist.  
(HDFS-10745 changed from string paths to IIPs but preserved the same broken 
semantics)

This is broken since an operation specifying an inode for a deleted and 
recreated path will operate on the newer inode.  If another client recreates 
the path, the operation is likely to fail for other reasons such as lease 
checks.  However a multi-threaded client has a single lease id.  If thread1 
creates a file, it's somehow deleted, thread2 recreates the path, then further 
operations in thread1 may conflict with thread2 and corrupt the state of the 
file.







[jira] [Created] (HDFS-12648) DN should provide feedback to NN for throttling commands

2017-10-12 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12648:
--

 Summary: DN should provide feedback to NN for throttling commands
 Key: HDFS-12648
 URL: https://issues.apache.org/jira/browse/HDFS-12648
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp


The NN should avoid sending commands to a DN with a high number of outstanding 
commands.  The heartbeat could provide this feedback via perhaps a simple count 
of the commands or rate of processing.






[jira] [Created] (HDFS-12647) DN commands processing should be async

2017-10-12 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12647:
--

 Summary: DN commands processing should be async
 Key: HDFS-12647
 URL: https://issues.apache.org/jira/browse/HDFS-12647
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp


Due to dataset lock contention, service actors may encounter significant 
latency while processing DN commands.  Even the queuing of async deletions 
requires multiple lock acquisitions.  A slow disk will cause a backlog of 
xceivers instantiating block senders/receivers, which starves the actor and 
leads to the NN falsely declaring the node dead.

Async processing of all commands will free the actor to perform its primary 
purpose of heartbeating and block reporting.  Note that FBRs will be dependent 
on queued block invalidations not being included in the report.






[jira] [Created] (HDFS-12646) Avoid IO while holding the FsDataset lock

2017-10-12 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12646:
--

 Summary: Avoid IO while holding the FsDataset lock
 Key: HDFS-12646
 URL: https://issues.apache.org/jira/browse/HDFS-12646
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp


IO operations should not be allowed while holding the dataset lock.  Notable 
offenders include, but are not limited to, the instantiation of a block 
sender/receiver, constructing the path to a block, and unfinalizing a block.






[jira] [Created] (HDFS-12645) FSDatasetImpl lock will stall BP service actors and may cause missing blocks

2017-10-12 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12645:
--

 Summary: FSDatasetImpl lock will stall BP service actors and may 
cause missing blocks
 Key: HDFS-12645
 URL: https://issues.apache.org/jira/browse/HDFS-12645
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp


The DN is extremely susceptible to a slow volume due to bad locking practices.  
DN operations require the fs dataset lock.  IO while holding the dataset lock 
should not be permissible, as it leads to severe performance degradation and 
possibly (temporarily) missing blocks.

A slow disk will cause pipelines to experience significant latency and 
timeouts, increasing lock/io contention while cleaning up, leading to more 
timeouts, etc.  Meanwhile, the actor service thread is interleaving multiple 
lock acquire/releases with xceivers.  If many commands are issued, the node may 
be incorrectly declared as dead.

HDFS-12639 documents that both actors synchronize on the offer service lock 
while processing commands.  A backlogged active actor will block the standby 
actor and cause it to go dead too.






[jira] [Created] (HDFS-12639) BPOfferService lock may stall all service actors

2017-10-11 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12639:
--

 Summary: BPOfferService lock may stall all service actors
 Key: HDFS-12639
 URL: https://issues.apache.org/jira/browse/HDFS-12639
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp


{{BPOfferService}} manages {{BPServiceActor}} instances for the active and 
standby.  It uses a RW lock to primarily protect registration information while 
determining the active/standby from heartbeats.

Unfortunately the write lock is held during command processing.  If an actor is 
experiencing high latency processing commands, the other actor will neither be 
able to register (blocked in createRegistration, setNamespaceInfo, 
verifyAndSetNamespaceInfo) nor process heartbeats (blocked in 
updateActorStatesFromHeartbeat).

The worst case scenario for processing commands while holding the lock is 
re-registration.  The actor will loop, catching and logging exceptions, leaving 
the other actor blocked for a non-deterministic (possibly infinite) amount of 
time.

The lock must not be held during command processing.
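An illustrative pattern for the fix the report calls for (a sketch, not the actual BPOfferService code): snapshot the pending commands while holding the lock, release it, then process the copy, so slow command handling can never block the other actor's registration or heartbeat path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: queue commands under the lock, process them outside it.
public class CommandSnapshotSketch<C> {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<C> pending = new ArrayList<>();

  public void enqueue(C cmd) {
    lock.writeLock().lock();
    try { pending.add(cmd); } finally { lock.writeLock().unlock(); }
  }

  // Drains under the lock; the caller processes the returned list lock-free,
  // so a slow command cannot hold the lock against the other actor.
  public List<C> drain() {
    lock.writeLock().lock();
    try {
      List<C> copy = new ArrayList<>(pending);
      pending.clear();
      return copy;
    } finally { lock.writeLock().unlock(); }
  }
}
```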






[jira] [Created] (HDFS-12184) Avoid redundant ancestor metadata traversals for listStatus child entries.

2017-07-21 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12184:
--

 Summary: Avoid redundant ancestor metadata traversals for 
listStatus child entries.
 Key: HDFS-12184
 URL: https://issues.apache.org/jira/browse/HDFS-12184
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Creating a file status requires metadata for EC, EZ, storage policies, etc. 
that is computed by traversing up the ancestor inodes.  List status will incur 
the same penalties for all child entries when one traversal is enough.  The 
penalty for large directories is not trivial.  Storage policy lookups already 
short-circuit the full traversal.  The other lookups should too.






[jira] [Created] (HDFS-12173) MiniDFSCluster cannot reliably use NameNode#stop

2017-07-20 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12173:
--

 Summary: MiniDFSCluster cannot reliably use NameNode#stop
 Key: HDFS-12173
 URL: https://issues.apache.org/jira/browse/HDFS-12173
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Daryn Sharp


Sporadic test failures occur because {{NameNode#stop}} used by the mini cluster 
does not properly manage the HA context's state.  It directly calls 
{{HAState#exitState(context)}} instead of {{HAState#setState(context,state)}}.  
The latter will properly lock the namesystem and update the ha state while 
locked, while the former does not.  The result is that while the cluster is 
stopping, the lock is released and any queued rpc calls think the NN is still 
active and are processed while the NN is in an unstable half-stopped state.






[jira] [Created] (HDFS-12172) Reduce EZ lookup overhead

2017-07-20 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12172:
--

 Summary: Reduce EZ lookup overhead
 Key: HDFS-12172
 URL: https://issues.apache.org/jira/browse/HDFS-12172
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


A number of inefficiencies exist in EZ lookups.  These are amplified by 
frequent operations like list status.  Once one encryption zone exists, all 
operations take the performance penalty.

Ex. Operations should not perform redundant lookups.  EZ path reconstruction 
should be lazy since it's not required in the common case.  Renames do not need 
to reallocate new IIPs to check parent dirs for EZ.






[jira] [Created] (HDFS-12171) Reduce IIP object allocations for inode lookup

2017-07-20 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12171:
--

 Summary: Reduce IIP object allocations for inode lookup
 Key: HDFS-12171
 URL: https://issues.apache.org/jira/browse/HDFS-12171
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


{{IIP#getReadOnlyINodes}} is invoked frequently for EZ and EC lookups.  It 
allocates unnecessary objects to make the primitive array an immutable array 
list.  IIP already has a method for indexed inode retrieval that can be tweaked 
to further improve performance.






[jira] [Created] (HDFS-12143) Improve performance of getting and removing inode features

2017-07-14 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12143:
--

 Summary: Improve performance of getting and removing inode features
 Key: HDFS-12143
 URL: https://issues.apache.org/jira/browse/HDFS-12143
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Getting a feature uses an iterator which is less performant than an indexed for 
loop.  Feature lookups are becoming more prolific so cycles count.

Removing a feature requires building a string for up to 3 precondition checks.  
The worst case of 3 checks is paid on a successful removal.
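The indexed lookup the issue proposes can be sketched as follows (a sketch under assumed types, not the actual INode feature code): an indexed for loop over the backing array avoids allocating an Iterator per lookup on this hot path.

```java
// Sketch only: indexed scan instead of an Iterator, as the issue suggests.
public class FeatureLookupSketch {
  public static Object getFeature(Object[] features, Class<?> clazz) {
    for (int i = 0; i < features.length; i++) {  // no Iterator allocation
      if (clazz.isInstance(features[i])) {
        return features[i];
      }
    }
    return null;
  }
}
```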






[jira] [Created] (HDFS-12142) Files may be closed before streamer is done

2017-07-14 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12142:
--

 Summary: Files may be closed before streamer is done
 Key: HDFS-12142
 URL: https://issues.apache.org/jira/browse/HDFS-12142
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.8.0
Reporter: Daryn Sharp


We're encountering multiple cases of clients calling updateBlockForPipeline on 
completed blocks.  Initial analysis is that the client closes a file, 
completeFile succeeds, then the client immediately attempts recovery.  The 
exception is swallowed on the client and only logged on the NN by checkUCBlock.

The problem "appears" to be benign (no data loss), but it's unproven whether 
the issue is always benign for successfully closed files.  There appears to be 
very poor coordination between the dfs output stream's threads, which leads to 
races that confuse the streamer thread; the streamer probably should have been 
joined before returning from close.






[jira] [Created] (HDFS-12140) Remove BPOfferService lock contention to get block pool id

2017-07-14 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12140:
--

 Summary: Remove BPOfferService lock contention to get block pool id
 Key: HDFS-12140
 URL: https://issues.apache.org/jira/browse/HDFS-12140
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


The block pool id is protected by a lock in {{BPOfferService}}.  This creates 
excessive contention, especially for xceiver threads attempting to queue IBRs 
and for heartbeat processing.  When the latter is delayed due to excessive 
FSDataset lock contention, it causes pipelines to collapse.

Accessing the block pool id should be lockless after registration.
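One common lockless pattern for this, sketched under the assumption that the 
id is written once at (re)registration (names illustrative, not the actual 
BPOfferService code):

```java
public class BpOfferServiceSketch {
    // volatile gives safe publication: once registration writes the id,
    // xceiver and heartbeat threads can read it without taking any lock.
    private volatile String blockPoolId;

    void setBlockPoolId(String id) {
        blockPoolId = id; // written once, at (re)registration
    }

    String getBlockPoolId() {
        return blockPoolId; // lock-free read on the hot path
    }
}
```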






[jira] [Created] (HDFS-12137) DN dataset lock should be fair

2017-07-13 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12137:
--

 Summary: DN dataset lock should be fair
 Key: HDFS-12137
 URL: https://issues.apache.org/jira/browse/HDFS-12137
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


The dataset lock is very highly contended, and its unfair nature can be 
especially harmful to heartbeat handling.  Under high loads, partially exposed 
by HDFS-12136's introduction of disk i/o within the lock, the heartbeat 
handling thread may process commands so slowly due to the contention that the 
node becomes stale or is falsely declared dead.  The unfair lock is not helping 
and appears to be causing frequent starvation under load.
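In java.util.concurrent terms the proposal amounts to the fair constructor 
flag (shown standalone here; the actual DN manages its own lock instance):

```java
import java.util.concurrent.locks.ReentrantLock;

public class FairLockSketch {
    // fair=true hands the lock to waiters in roughly FIFO arrival order,
    // trading some raw throughput for freedom from starvation of critical
    // threads like the heartbeat handler.
    static ReentrantLock newDatasetLock() {
        return new ReentrantLock(true);
    }
}
```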






[jira] [Created] (HDFS-12136) BlockSender performance regression due to volume scanner edge case

2017-07-13 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12136:
--

 Summary: BlockSender performance regression due to volume scanner 
edge case
 Key: HDFS-12136
 URL: https://issues.apache.org/jira/browse/HDFS-12136
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


HDFS-11160 attempted to fix a volume scan race for a file appended mid-scan by 
reading the last checksum of finalized blocks within the {{BlockSender}} ctor.  
Unfortunately it holds the exclusive dataset lock while opening and reading the 
metafile multiple times, so block sender instantiation becomes serialized.

Performance completely collapses under heavy disk i/o utilization or high 
xceiver activity.  Ex. lost node replication, balancing, or decommissioning.  
The xceiver threads congest creating block senders and impair the heartbeat 
processing that is contending for the same lock.  Combined with other lock 
contention issues, pipelines break and nodes sporadically go dead.






[jira] [Created] (HDFS-12070) Failed block recovery leaves files open indefinitely and at risk for data loss

2017-06-29 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12070:
--

 Summary: Failed block recovery leaves files open indefinitely and 
at risk for data loss
 Key: HDFS-12070
 URL: https://issues.apache.org/jira/browse/HDFS-12070
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp


Files will remain open indefinitely if block recovery fails, which creates a 
high risk of data loss.  The replication monitor will not replicate these 
blocks.

The NN provides the primary node a list of candidate nodes for recovery, which 
involves a 2-stage process.  The primary node removes any candidates that 
cannot init replica recovery (essentially: alive and aware of the block) to 
create a sync list.  Stage 2 issues updates to the sync list, _but unlike the 
first stage, it fails if any node fails_.  The NN should be informed of the 
nodes that did succeed.

Manual recovery will also fail until the problematic node is temporarily 
stopped, so that a connection refused induces the bad node to be pruned from 
the candidates.  Recovery then succeeds, the lease is released, 
under-replication is fixed, and the block is invalidated on the bad node.






[jira] [Created] (HDFS-12049) Recommissioning live nodes stalls the NN

2017-06-27 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12049:
--

 Summary: Recommissioning live nodes stalls the NN
 Key: HDFS-12049
 URL: https://issues.apache.org/jira/browse/HDFS-12049
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Priority: Critical


A node refresh will recommission included nodes that are alive and in a 
decommissioning or decommissioned state.  The recommission will scan all blocks 
on the node, find over-replicated blocks, choose an excess replica, and queue 
an invalidation.

The process is expensive and worsened by the overhead of storage types (even 
when not in use).  It can be especially devastating because the write lock is 
held for the entire node refresh.  _Recommissioning 67 nodes with ~500k 
blocks/node stalled rpc services for over 4 mins._






[jira] [Created] (HDFS-11648) Lazy construct the IIP pathname

2017-04-12 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-11648:
--

 Summary: Lazy construct the IIP pathname 
 Key: HDFS-11648
 URL: https://issues.apache.org/jira/browse/HDFS-11648
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The IIP pathname is a string constructed from the byte[][] components.  If the 
pathname will never be accessed, ex. processing listStatus children, building 
the path is unnecessarily expensive.
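A minimal sketch of lazy, memoized construction (illustrative names; not the 
actual INodesInPath code):

```java
import java.nio.charset.StandardCharsets;

public class LazyPathSketch {
    private final byte[][] components;
    private volatile String path; // built on first access, then reused

    public LazyPathSketch(byte[][] components) {
        this.components = components;
    }

    // Building the string is deferred until someone actually asks for it;
    // listStatus-style children whose path is never read pay nothing.
    public String getPath() {
        String p = path;
        if (p == null) {
            StringBuilder sb = new StringBuilder();
            for (byte[] c : components) {
                sb.append('/').append(new String(c, StandardCharsets.UTF_8));
            }
            path = p = sb.toString();
        }
        return p;
    }
}
```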






[jira] [Created] (HDFS-11379) DFSInputStream may infinite loop requesting block locations

2017-01-27 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-11379:
--

 Summary: DFSInputStream may infinite loop requesting block 
locations
 Key: HDFS-11379
 URL: https://issues.apache.org/jira/browse/HDFS-11379
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


DFSInputStream creation caches file size and initial range of locations.  If 
the file is truncated (or replaced) and the client attempts to read outside the 
initial range, the client goes into a tight infinite loop requesting locations 
for the nonexistent range.







[jira] [Created] (HDFS-11310) Reduce the performance impact of the balancer (trunk port)

2017-01-10 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-11310:
--

 Summary: Reduce the performance impact of the balancer (trunk port)
 Key: HDFS-11310
 URL: https://issues.apache.org/jira/browse/HDFS-11310
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, namenode
Affects Versions: 3.0.0-alpha1
Reporter: Daryn Sharp
Priority: Critical


HDFS-7967 introduced a highly performant balancer getBlocks() query that scales 
to large/dense clusters.  The implementation's simple design depends on the 
triplets data structure.  HDFS-9260 removed the triplets, which fundamentally 
changes the implementation.  Either that patch must be reverted or the 
getBlocks() patch needs reimplementation.






[jira] [Created] (HDFS-10997) Reduce number of path resolving methods

2016-10-11 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10997:
--

 Summary: Reduce number of path resolving methods
 Key: HDFS-10997
 URL: https://issues.apache.org/jira/browse/HDFS-10997
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Daryn Sharp
Assignee: Daryn Sharp


FSDirectory contains many methods for resolving paths to an IIP and/or inode.  
These should be unified into a couple of methods that will consistently do the 
basics of resolving reserved paths, blocking write ops from snapshot paths, 
verifying ancestors as directories, and throwing if symlinks are encountered.






[jira] [Created] (HDFS-10980) Optimize check for existence of parent directory

2016-10-07 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10980:
--

 Summary: Optimize check for existence of parent directory
 Key: HDFS-10980
 URL: https://issues.apache.org/jira/browse/HDFS-10980
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Daryn Sharp
Assignee: Daryn Sharp


{{FSDirectory.verifyParentDir()}} uses a {{Path}} object to parse and return 
the parent path.  This is very expensive compared to using the path within the 
IIP.






[jira] [Created] (HDFS-10979) Pass IIP for FSDirDeleteOp methods

2016-10-07 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10979:
--

 Summary: Pass IIP for FSDirDeleteOp methods
 Key: HDFS-10979
 URL: https://issues.apache.org/jira/browse/HDFS-10979
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Remove path strings from method signatures and/or replace with IIP.






[jira] [Created] (HDFS-10956) Remove rename/delete performance penalty when not using snapshots

2016-10-04 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10956:
--

 Summary: Remove rename/delete performance penalty when not using 
snapshots
 Key: HDFS-10956
 URL: https://issues.apache.org/jira/browse/HDFS-10956
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Daryn Sharp
Assignee: Daryn Sharp


When deleting or renaming directories, the entire subtree is scanned for 
snapshottable directories.  The penalty may become severe for dense trees.  The 
snapshot manager knows whether snapshots are in use, so clusters not using 
snapshots should not pay the penalty. 






[jira] [Created] (HDFS-10955) Pass IIP for FSDirAttr methods

2016-10-04 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10955:
--

 Summary: Pass IIP for FSDirAttr methods
 Key: HDFS-10955
 URL: https://issues.apache.org/jira/browse/HDFS-10955
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Methods should always use the resolved IIP instead of re-solving the path.






[jira] [Created] (HDFS-10940) Reduce performance penalty of block caching when not used

2016-09-30 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10940:
--

 Summary: Reduce performance penalty of block caching when not used
 Key: HDFS-10940
 URL: https://issues.apache.org/jira/browse/HDFS-10940
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7
Reporter: Daryn Sharp
Assignee: Daryn Sharp


For every block location generated, the CacheManager will create a junk object 
for a hash lookup of cached locations.  If there are no cached blocks, none of 
this is required.
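The empty-cache fast path can be sketched like this (illustrative of the idea, 
not the actual CacheManager code or types):

```java
import java.util.HashMap;
import java.util.Map;

public class CacheLookupSketch {
    private final Map<Long, String> cachedBlocks = new HashMap<>();

    // Fast path: when nothing is cached anywhere, skip both the hash
    // lookup and the boxed-Long key allocation per block location.
    String getCachedHosts(long blockId) {
        if (cachedBlocks.isEmpty()) {
            return null; // zero allocation for the common no-cache case
        }
        return cachedBlocks.get(blockId); // autoboxes only when needed
    }

    void addCachedBlock(long blockId, String host) {
        cachedBlocks.put(blockId, host);
    }
}
```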






[jira] [Created] (HDFS-10939) Reduce performance penalty of encryption zones

2016-09-30 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10939:
--

 Summary: Reduce performance penalty of encryption zones
 Key: HDFS-10939
 URL: https://issues.apache.org/jira/browse/HDFS-10939
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.7
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The encryption zone APIs should be optimized to extensively use IIPs to 
eliminate path resolutions.  The performance penalties incurred by common 
operations like creation of file statuses may be reduced by more extensive 
short-circuiting of EZ lookups when no EZs exist.  All file creates should not 
be subjected to the multi-stage locking performance penalty required only for 
EDEK generation.






[jira] [Created] (HDFS-10851) FSDirStatAndListingOp: stop passing path as string

2016-09-09 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10851:
--

 Summary: FSDirStatAndListingOp: stop passing path as string
 Key: HDFS-10851
 URL: https://issues.apache.org/jira/browse/HDFS-10851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Path strings should be resolved once into INodesInPath.  The IIP should be used 
extensively from that point forward.  






[jira] [Created] (HDFS-10850) getEZForPath should NOT throw FNF

2016-09-09 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10850:
--

 Summary: getEZForPath should NOT throw FNF
 Key: HDFS-10850
 URL: https://issues.apache.org/jira/browse/HDFS-10850
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Priority: Blocker


HDFS-9433 made an incompatible change to the semantics of getEZForPath.  It 
used to return the EZ of the closest ancestor path.  It never threw FNF.  A 
common use of getEZForPath is determining whether a file can be renamed or must 
be copied due to mismatched EZs.  Notably, this change has broken hive.






[jira] [Created] (HDFS-10789) Route webhdfs through the RPC call queue

2016-08-23 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10789:
--

 Summary: Route webhdfs through the RPC call queue
 Key: HDFS-10789
 URL: https://issues.apache.org/jira/browse/HDFS-10789
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ipc, webhdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Webhdfs is extremely expensive under load and is not subject to the QoS 
benefits of the RPC call queue.  HADOOP-13537 provides the basis for routing 
webhdfs through the call queue to provide unified QoS.






[jira] [Created] (HDFS-10779) Rename does not need to re-solve destination

2016-08-19 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10779:
--

 Summary: Rename does not need to re-solve destination
 Key: HDFS-10779
 URL: https://issues.apache.org/jira/browse/HDFS-10779
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Rename uses {{FSDirectory.isDir(String)}} to determine if the destination is a 
directory.  This dissects the path, creates an IIP, and checks if the last 
inode is a directory.  The rename operations already have the IIP and can check 
it directly.






[jira] [Created] (HDFS-10772) Reduce byte/string conversions for get listing

2016-08-17 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10772:
--

 Summary: Reduce byte/string conversions for get listing
 Key: HDFS-10772
 URL: https://issues.apache.org/jira/browse/HDFS-10772
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Daryn Sharp
Assignee: Daryn Sharp


{{FSDirectory.getListingInt}} does a byte/string conversion for the byte[] 
startAfter just to determine if it should be resolved as an inode path.  This 
is not the common case but rather exists for NFS support, so it should be 
avoided.  When the resolution is necessary, the conversions may be reduced.






[jira] [Created] (HDFS-10768) Optimize mkdir ops

2016-08-16 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10768:
--

 Summary: Optimize mkdir ops
 Key: HDFS-10768
 URL: https://issues.apache.org/jira/browse/HDFS-10768
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Directory creation causes excessive object allocation: ex. an immutable list 
builder containing the strings of components converted from the IIP's byte[]s, 
sublist views of the string list, an iterable, followed by string to byte[] 
conversion.  This can all be eliminated by accessing the components' byte[]s in 
the IIP.






[jira] [Created] (HDFS-10762) Pass IIP for file status related methods

2016-08-15 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10762:
--

 Summary: Pass IIP for file status related methods
 Key: HDFS-10762
 URL: https://issues.apache.org/jira/browse/HDFS-10762
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The frequently called file status methods will not require path re-resolves if 
the IIP is passed down the call stack.  The code can be simplified further if 
the IIP tracks if the original path was a reserved raw path.






[jira] [Created] (HDFS-10745) Directly resolve paths into INodesInPath

2016-08-10 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10745:
--

 Summary: Directly resolve paths into INodesInPath
 Key: HDFS-10745
 URL: https://issues.apache.org/jira/browse/HDFS-10745
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The intermediate resolution to a string, only to be decomposed by 
{{INodesInPath}} back into a byte[][], can be eliminated by resolving directly 
to an IIP.  The IIP will contain the resolved path if required.






[jira] [Created] (HDFS-10744) Internally optimize path component resolution

2016-08-10 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10744:
--

 Summary: Internally optimize path component resolution
 Key: HDFS-10744
 URL: https://issues.apache.org/jira/browse/HDFS-10744
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


{{FSDirectory}}'s path resolution currently converts from string to byte[][], 
back to string, and back to byte[][] for {{INodesInPath}}.  Internally, all 
path component resolution should be byte[][]-based as the precursor to 
instantiating an {{INodesInPath}} w/o the last 2 unnecessary conversions.






[jira] [Created] (HDFS-10743) MiniDFSCluster test runtimes can be drastically reduced

2016-08-10 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10743:
--

 Summary: MiniDFSCluster test runtimes can be drastically reduced
 Key: HDFS-10743
 URL: https://issues.apache.org/jira/browse/HDFS-10743
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp


{{MiniDFSCluster}} tests have excessive runtimes.  The main problem appears to 
be the heartbeat interval.  The NN may have to wait up to 3s (the default 
value) for all DNs to heartbeat and trigger registration before the NN can go 
active.  Tests that repeatedly restart the NN are severely affected.

Example for varying heartbeat intervals for {{TestFSImageWithAcl}}:
* 3s = ~70s -- (disgusting, why I investigated)
* 1s = ~27s
* 500ms = ~17s -- (had to hack DNConf for millisecond precision)

That's a 4x improvement in runtime.

17s is still excessively long for what the test does.  Further areas to explore 
when running tests:
* Reduce the numerous sleep intervals in the DN's {{BPServiceActor}}.
* Ensure heartbeats and initial BR are sent immediately upon (re)registration.






[jira] [Created] (HDFS-10722) Fix race condition in TestEditLog#testBatchedSyncWithClosedLogs

2016-08-04 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10722:
--

 Summary: Fix race condition in 
TestEditLog#testBatchedSyncWithClosedLogs
 Key: HDFS-10722
 URL: https://issues.apache.org/jira/browse/HDFS-10722
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The test may fail the following assertion if async edit logs are enabled:
{{logging edit without syncing should do not affect txid expected:<1> but 
was:<2>}}.  The async thread is doing batched syncs in the background.  logSync 
just ensures the edit is durable, so the txid may increase prior to sync.  It's 
a race.






[jira] [Reopened] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp reopened HDFS-10301:


> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. 
> Then it sends the block report again. The NameNode, while processing these 
> two reports at the same time, can interleave processing storages from 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombie. Replicas from zombie storages 
> are immediately removed, causing missing blocks.






[jira] [Created] (HDFS-10711) Optimize FSPermissionChecker group membership check

2016-08-01 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10711:
--

 Summary: Optimize FSPermissionChecker group membership check
 Key: HDFS-10711
 URL: https://issues.apache.org/jira/browse/HDFS-10711
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


HADOOP-13442 obviates the need for multiple group related object allocations.






[jira] [Created] (HDFS-10674) Optimize creating a full path from an inode

2016-07-21 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10674:
--

 Summary: Optimize creating a full path from an inode
 Key: HDFS-10674
 URL: https://issues.apache.org/jira/browse/HDFS-10674
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


{{INode#getFullPathName}} walks up the inode tree, creates an INode[], and 
converts each component byte[] name to a String while building the path.  This 
involves many allocations, copies, and char conversions.

The path should be built with a single byte[] allocation.






[jira] [Created] (HDFS-10673) Optimize FSPermissionChecker's internal path usage

2016-07-21 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10673:
--

 Summary: Optimize FSPermissionChecker's internal path usage
 Key: HDFS-10673
 URL: https://issues.apache.org/jira/browse/HDFS-10673
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The INodeAttributeProvider and AccessControlEnforcer features degrade 
performance and generate excessive garbage even when neither is used.  Main 
issues:
# A byte[][] of components is unnecessarily created.  Each path component 
lookup converts a subrange of the byte[][] to a new String[] that is then not 
used by the default attribute provider.
# Subaccess checks are insanely expensive.  The full path of every subdir is 
created by walking up the inode tree, creating an INode[], and building a 
string by converting each inode's byte[] name to a string, etc., all of which 
is only used if there's an exception.

The expense of #1 should only be incurred when using the provider/enforcer 
feature.  For #2, paths should be created on-demand for exceptions.






[jira] [Created] (HDFS-10662) Optimize UTF8 string/byte conversions

2016-07-20 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10662:
--

 Summary: Optimize UTF8 string/byte conversions
 Key: HDFS-10662
 URL: https://issues.apache.org/jira/browse/HDFS-10662
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


String/byte conversions may take either a Charset instance or its canonical 
name.  One might think a Charset instance would be faster due to avoiding a 
lookup and instantiation of a Charset, but it's not.  The canonical string name 
variants will cache the string encoder/decoder (obtained from a Charset) 
resulting in better performance.

LOG4J2-935 describes a real-world performance boost.  I micro-benched a 
marginal runtime improvement on jdk 7/8.  However for a 16 byte path, using the 
canonical name generated 50% less garbage.  For a 64 byte path, 25% of the 
garbage.  Given the sheer number of times that paths are (re)parsed, the cost 
adds up quickly.
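The two variants look interchangeable, which is exactly why the difference is 
easy to miss (a sketch; the garbage numbers above come from the JDK 7/8 
behavior described in the report, not from this snippet):

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf8Sketch {
    // Canonical-name variant: String internally caches the decoder, which
    // is what produced measurably less garbage on JDK 7/8.
    static String decodeByName(byte[] b) {
        try {
            return new String(b, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always supported
        }
    }

    // Charset-instance variant: looks equivalent, but on those JDKs it
    // allocated a fresh decoder (and buffers) per call.
    static String decodeByCharset(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }
}
```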








[jira] [Created] (HDFS-10656) Optimize conversion of byte arrays back to path string

2016-07-19 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10656:
--

 Summary: Optimize conversion of byte arrays back to path string
 Key: HDFS-10656
 URL: https://issues.apache.org/jira/browse/HDFS-10656
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


{{DFSUtil.byteArray2PathString}} generates excessive object allocation.
# each byte array is encoded to a string (copy)
# string appended to a builder which extracts the chars from the intermediate 
string (copy) and adds to its own char array
# builder's char array is re-alloced if over 16 chars (copy)
# builder's toString creates another string (copy)

Instead of allocating all these objects and performing multiple byte/char 
encoding/decoding conversions, the byte array can be built in-place with a 
single final conversion to a string.
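The in-place build can be sketched as follows (illustrative of the idea behind 
byteArray2PathString, not the actual DFSUtil code): sum the component lengths 
first, fill one byte[], and convert once at the end.

```java
import java.nio.charset.StandardCharsets;

public class PathBuildSketch {
    // One output byte[] allocation plus one final byte->String conversion,
    // instead of a string per component and repeated builder copies.
    static String componentsToPath(byte[][] components) {
        int len = 0;
        for (byte[] c : components) {
            len += c.length + 1; // +1 for each component's leading '/'
        }
        byte[] path = new byte[len];
        int pos = 0;
        for (byte[] c : components) {
            path[pos++] = '/';
            System.arraycopy(c, 0, path, pos, c.length);
            pos += c.length;
        }
        return new String(path, StandardCharsets.UTF_8);
    }
}
```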








[jira] [Created] (HDFS-10653) Optimize conversion from path string to components

2016-07-19 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10653:
--

 Summary: Optimize conversion from path string to components
 Key: HDFS-10653
 URL: https://issues.apache.org/jira/browse/HDFS-10653
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Converting a path String to a byte[][] currently requires an unnecessary 
intermediate conversion from String to String[].  Removing this will reduce 
excessive object allocation and byte copying.
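The direct conversion can be sketched as a single scan over the path's bytes. The class and method names here are illustrative, not the actual HDFS code:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PathSplitter {
    // Slice components directly out of the UTF-8 bytes, skipping the
    // intermediate String[] that String.split("/") would create.
    public static byte[][] getPathComponents(String path) {
        byte[] bytes = path.getBytes(StandardCharsets.UTF_8);
        List<byte[]> components = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= bytes.length; i++) {
            if (i == bytes.length || bytes[i] == '/') {
                if (i > start) { // skip empty segments (e.g. leading '/')
                    byte[] component = new byte[i - start];
                    System.arraycopy(bytes, start, component, 0, i - start);
                    components.add(component);
                }
                start = i + 1;
            }
        }
        return components.toArray(new byte[0][]);
    }
}
```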






[jira] [Created] (HDFS-10619) Cache path in InodesInPath

2016-07-13 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10619:
--

 Summary: Cache path in InodesInPath
 Key: HDFS-10619
 URL: https://issues.apache.org/jira/browse/HDFS-10619
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


INodesInPath#getPath, a frequently called method, dynamically builds the path.  
IIP should cache the path upon construction.
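A minimal sketch of the memoization, using a stand-in class rather than the real INodesInPath (field and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;

public class CachedPathDemo {
    private final byte[][] components; // immutable after construction
    private volatile String path;      // lazily cached path string

    public CachedPathDemo(byte[][] components) {
        this.components = components;
    }

    // Build the path once and memoize it; subsequent calls return the
    // cached instance instead of rebuilding the string every time.
    public String getPath() {
        String p = path;
        if (p == null) {
            StringBuilder sb = new StringBuilder();
            for (byte[] component : components) {
                sb.append('/').append(new String(component, StandardCharsets.UTF_8));
            }
            p = sb.length() == 0 ? "/" : sb.toString();
            path = p; // benign data race: every thread computes the same value
        }
        return p;
    }
}
```

Caching eagerly in the constructor (as the summary suggests) trades a guaranteed allocation per IIP for the lazy variant's first-call cost; either way repeated getPath() calls become free.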






[jira] [Created] (HDFS-10616) Improve performance of path handling

2016-07-12 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10616:
--

 Summary: Improve performance of path handling
 Key: HDFS-10616
 URL: https://issues.apache.org/jira/browse/HDFS-10616
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Path handling in the namesystem and directory is very inefficient.  The path is 
repeatedly resolved, decomposed into path components, recombined into a full 
path, and parsed again throughout the system.  This is directly inefficient for 
general performance, and indirectly inefficient via unnecessary pressure on 
young gen GC.

The namesystem should operate only on paths, parse each path once into inodes, 
and the directory should operate only on inodes.






[jira] [Created] (HDFS-10343) BlockManager#createLocatedBlocks may return blocks on failed storages

2016-04-28 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10343:
--

 Summary: BlockManager#createLocatedBlocks may return blocks on 
failed storages
 Key: HDFS-10343
 URL: https://issues.apache.org/jira/browse/HDFS-10343
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.6.0
Reporter: Daryn Sharp


Storage state is ignored when building the machines list.  Failed storage 
removal is not immediate so clients may be directed to bad locations.  The 
client recovers but it's less than ideal.





[jira] [Created] (HDFS-10342) BlockManager#createLocatedBlocks should not check corrupt replicas if none are corrupt

2016-04-28 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10342:
--

 Summary: BlockManager#createLocatedBlocks should not check corrupt 
replicas if none are corrupt
 Key: HDFS-10342
 URL: https://issues.apache.org/jira/browse/HDFS-10342
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 2.7.0
Reporter: Daryn Sharp


{{corruptReplicas#isReplicaCorrupt(block, node)}} is called for every node 
while populating the machines array.  There's no need to invoke the method if 
{{corruptReplicas#numCorruptReplicas(block)}} returned 0.
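The proposed short circuit can be sketched with simplified stand-in types (String nodes and a Map-backed corrupt tracker; the real code uses DatanodeStorageInfo and CorruptReplicasMap):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MachinesListDemo {
    // Fetch the cheap corrupt-replica count once and only do the
    // per-node isReplicaCorrupt-style lookup when the count is non-zero.
    static List<String> buildMachines(List<String> nodes,
                                      Map<String, Set<String>> corruptByBlock,
                                      String block) {
        Set<String> corrupt = corruptByBlock.getOrDefault(block, Collections.emptySet());
        int numCorrupt = corrupt.size(); // stands in for numCorruptReplicas(block)
        List<String> machines = new ArrayList<>();
        for (String node : nodes) {
            // Skip the per-node check entirely when nothing is corrupt.
            boolean isCorrupt = numCorrupt > 0 && corrupt.contains(node);
            if (!isCorrupt) {
                machines.add(node);
            }
        }
        return machines;
    }
}
```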





[jira] [Created] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines

2016-04-25 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10326:
--

 Summary: Disable setting tcp socket send/receive buffers for write 
pipelines
 Key: HDFS-10326
 URL: https://issues.apache.org/jira/browse/HDFS-10326
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, datanode
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The DataStreamer and the Datanode use a hardcoded DEFAULT_DATA_SOCKET_SIZE=128K 
for the send and receive buffers of a write pipeline.  Explicitly setting tcp 
buffer sizes disables tcp stack auto-tuning.  

The hardcoded value will saturate a 1Gb link at 1ms RTT, but only 105Mbps at 
10ms RTT, and a paltry 11Mbps over a 100ms long haul.  10Gb networks are 
underutilized.

There should either be a configuration option to completely disable setting the 
buffers, or the setReceiveBuffer and setSendBuffer calls should be removed 
entirely.
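The configuration option could look like the following sketch. The method name and the convention that a non-positive configured size means "leave auto-tuning alone" are hypothetical:

```java
import java.net.Socket;
import java.net.SocketException;

public class PipelineSocketConfig {
    // A configured size <= 0 means "don't touch the buffers", leaving the
    // kernel's TCP auto-tuning in effect; a positive size preserves the
    // old explicit behavior for operators who still want it.
    static boolean shouldSetBuffers(int configuredSize) {
        return configuredSize > 0;
    }

    static void configureBuffers(Socket sock, int configuredSize) throws SocketException {
        if (shouldSetBuffers(configuredSize)) {
            sock.setSendBufferSize(configuredSize);
            sock.setReceiveBufferSize(configuredSize);
        }
        // else: no setSendBufferSize/setReceiveBufferSize calls at all,
        // so SO_SNDBUF/SO_RCVBUF stay under OS control
    }
}
```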





[jira] [Created] (HDFS-9566) Remove expensive getStorages method

2015-12-16 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-9566:
-

 Summary: Remove expensive getStorages method
 Key: HDFS-9566
 URL: https://issues.apache.org/jira/browse/HDFS-9566
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.0.0, 2.8.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


HDFS-5318 added a {{BlocksMap#getStorages(Block, State)}} which is based on 
iterables and predicates.  The method is very expensive compared to a simple 
comparison/continue.





[jira] [Created] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-9557:
-

 Summary: Reduce object allocation in PB conversion
 Key: HDFS-9557
 URL: https://issues.apache.org/jira/browse/HDFS-9557
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


PB conversions use {{ByteString.copyFrom}} to populate the builder.  
Unfortunately this creates unique instances for empty arrays instead of 
returning the singleton {{ByteString.EMPTY}}.
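The fix pattern, shown with a stand-in class rather than the real protobuf API (in the real code the singleton is {{ByteString.EMPTY}} and the copy is {{ByteString.copyFrom}}):

```java
import java.util.Arrays;

public class ByteStringFix {
    // Shared instance for all empty inputs.
    static final ByteStringFix EMPTY = new ByteStringFix(new byte[0]);
    private final byte[] data;

    private ByteStringFix(byte[] data) { this.data = data; }

    // Route zero-length input to the singleton so empty PB fields stop
    // allocating a fresh immutable wrapper per conversion.
    static ByteStringFix copyFrom(byte[] src) {
        return src.length == 0 ? EMPTY
                               : new ByteStringFix(Arrays.copyOf(src, src.length));
    }

    int size() { return data.length; }
}
```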





[jira] [Created] (HDFS-9287) Block placement completely fails if too many nodes are decommissioning

2015-10-22 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-9287:
-

 Summary: Block placement completely fails if too many nodes are 
decommissioning
 Key: HDFS-9287
 URL: https://issues.apache.org/jira/browse/HDFS-9287
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Priority: Critical


The DatanodeManager coordinates with the HeartbeatManager to update 
HeartbeatManager.Stats to track capacity and load.   This is crucial for block 
placement to consider space and load.  It's completely broken for decomm nodes.

The heartbeat manager subtracts the prior values before it adds new values.  
During registration of a decomm node, it subtracts before seeding the initial 
values.  This decrements nodesInService, flips the state to decomm, and the 
subsequent add will not increment nodesInService (correct).  There are other 
math bugs (double adding) that accidentally work due to 0 values.

The result is that every decomm node decrements the node count used for block 
placement.  When enough nodes are decomm, the replication monitor will silently 
stop working.  No logging.  It searches all nodes and just gives up.  
Eventually, all block allocation will also completely fail.  No files can be 
created.  No jobs can be submitted.





[jira] [Created] (HDFS-9258) NN should indicate which nodes are stale

2015-10-16 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-9258:
-

 Summary: NN should indicate which nodes are stale
 Key: HDFS-9258
 URL: https://issues.apache.org/jira/browse/HDFS-9258
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp


Determining why the NN is not coming out of safemode is difficult - is it a bug 
or pending block reports?  If the number of nodes appears sufficient but there 
are missing blocks, it would be nice to know which nodes haven't sent block 
reports (stale).  Instead of forcing the NN to leave safemode prematurely, the 
SE can first force block reports from the stale nodes.

The datanode report and the web ui's node list should contain this information.





[jira] [Created] (HDFS-9198) Coalesce IBR processing in the NN

2015-10-05 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-9198:
-

 Summary: Coalesce IBR processing in the NN
 Key: HDFS-9198
 URL: https://issues.apache.org/jira/browse/HDFS-9198
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp


IBRs from thousands of DNs under load will degrade NN performance due to 
excessive write-lock contention from multiple IPC handler threads.  The IBR 
processing is quick, so the lock contention may be reduced by coalescing 
multiple IBRs into a single write-lock transaction.  The handlers will also be 
freed up faster for other operations.
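The coalescing idea can be sketched as a queue drained in batches. This is an illustrative stand-in (String reports, hypothetical names); the real change batches inside the NN's block management under its namespace lock:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class IbrCoalescer {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    // IPC handlers enqueue and return immediately, freeing the handler
    // thread for other operations.
    public void enqueue(String ibr) {
        pending.add(ibr);
    }

    // A dedicated thread drains whatever has accumulated and applies the
    // whole batch under ONE write-lock acquisition instead of one per IBR.
    public int processBatch() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        if (!batch.isEmpty()) {
            // writeLock(); try { for (String ibr : batch) { /* apply */ } }
            // finally { writeUnlock(); }
        }
        return batch.size();
    }
}
```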





[jira] [Created] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC

2015-09-18 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-9107:
-

 Summary: Prevent NN's unrecoverable death spiral after full GC
 Key: HDFS-9107
 URL: https://issues.apache.org/jira/browse/HDFS-9107
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


A full GC pause in the NN that exceeds the dead node interval can lead to an 
infinite cycle of full GCs.  The most common situation that precipitates an 
unrecoverable state is a network issue that temporarily cuts off multiple racks.

The NN wakes up and falsely starts marking nodes dead. This bloats the 
replication queues which increases memory pressure. The replications create a 
flurry of incremental block reports and a glut of over-replicated blocks.

The "dead" nodes heartbeat within seconds. The NN forces a re-registration 
which requires a full block report - more memory pressure. The NN now has to 
invalidate all the over-replicated blocks. The extra blocks are added to 
invalidation queues, tracked in an excess blocks map, etc - much more memory 
pressure.

All the memory pressure can push the NN into another full GC which repeats the 
entire cycle.





[jira] [Created] (HDFS-9017) UI shows wrong last contact for dead nodes

2015-09-03 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-9017:
-

 Summary: UI shows wrong last contact for dead nodes
 Key: HDFS-9017
 URL: https://issues.apache.org/jira/browse/HDFS-9017
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Priority: Minor


It's showing the last contact as the restart time of the NN host (not the 
process, the host).  Presumably it's using monotonic time 0.  Ideally, the last 
contact for nodes that never connected would be "never" instead of the epoch or 
boot time.





[jira] [Created] (HDFS-8776) Decom manager should not be active on standby

2015-07-14 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-8776:
-

 Summary: Decom manager should not be active on standby
 Key: HDFS-8776
 URL: https://issues.apache.org/jira/browse/HDFS-8776
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The decommission manager should not be actively processing on the standby.

The decomm manager goes through the costly computation of determining that 
every block on the node requires replication, yet doesn't queue them for 
replication because it's in standby.  While the decomm manager holds the 
namesystem write lock, DNs time out on heartbeats or IBRs; the NN purges the 
call queue of timed out clients and processes some heartbeats/IBRs before the 
decomm manager locks up the namesystem again.  Nodes attempting to register 
will be sending full BRs, which are more costly to send and discard than a 
heartbeat.

If a failover is required, the standby will likely have to struggle very hard 
to not GC while catching up on its queued IBRs while DNs continue to fill the 
call queue and time out.





[jira] [Created] (HDFS-8674) Improve performance of postponed block scans

2015-06-26 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-8674:
-

 Summary: Improve performance of postponed block scans
 Key: HDFS-8674
 URL: https://issues.apache.org/jira/browse/HDFS-8674
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: HDFS
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


When a standby goes active, it marks all nodes as stale which will cause 
block invalidations for over-replicated blocks to be queued until full block 
reports are received from the nodes with the block.  The replication monitor 
scans the queue with O(N) runtime.  It picks a random offset and iterates 
through the set to randomize blocks scanned.

The result is devastating when a cluster loses multiple nodes during a rolling 
upgrade. Re-replication occurs, the nodes come back, the excess block 
invalidations are postponed. Rescanning just 2k blocks out of millions of 
postponed blocks may take multiple seconds. During the scan, the write lock is 
held which stalls all other processing.





[jira] [Created] (HDFS-8675) IBRs from dead DNs go into infinite loop

2015-06-26 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-8675:
-

 Summary: IBRs from dead DNs go into infinite loop
 Key: HDFS-8675
 URL: https://issues.apache.org/jira/browse/HDFS-8675
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Daryn Sharp


If the DN sends an IBR after the NN declares it dead, the NN returns an IOE of 
unregistered or dead.  The DN catches the IOE, ignores it, and infinitely loops 
spamming the NN with retries.





[jira] [Created] (HDFS-8616) Cherry pick HDFS-6495 for excess block leak

2015-06-17 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-8616:
-

 Summary: Cherry pick HDFS-6495 for excess block leak
 Key: HDFS-8616
 URL: https://issues.apache.org/jira/browse/HDFS-8616
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp


Busy clusters quickly leak tens or hundreds of thousands of excess blocks which 
slow BR processing.  HDFS-6495 should be cherry picked into 2.7.x.





[jira] [Created] (HDFS-8498) Blocks can be committed with wrong size

2015-05-29 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-8498:
-

 Summary: Blocks can be committed with wrong size
 Key: HDFS-8498
 URL: https://issues.apache.org/jira/browse/HDFS-8498
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


When an IBR for a UC block arrives, the NN updates the expected location's 
block and replica state _only_ if it's on an unexpected storage for an expected 
DN.  If it's for an expected storage, only the genstamp is updated.  When the 
block is committed, and the expected locations are verified, only the genstamp 
is checked.  The size is not checked but it wasn't updated in the expected 
locations anyway.

A faulty client may misreport the size when committing the block, leaving the 
block effectively corrupted.  If the NN issues replications, the received IBR 
is considered corrupt, so the NN invalidates the block and immediately issues 
another replication.  The NN eventually realizes all the original replicas are 
corrupt after full BRs are received from the original DNs.





[jira] [Created] (HDFS-8491) DN shutdown race conditions with open xceivers

2015-05-28 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-8491:
-

 Summary: DN shutdown race conditions with open xceivers
 Key: HDFS-8491
 URL: https://issues.apache.org/jira/browse/HDFS-8491
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Daryn Sharp


DN shutdowns at least for restarts have many race conditions.  Shutdown is very 
noisy with exceptions.  The DN notifies writers of the restart, waits 1s and 
then interrupts the xceiver threads but does not join.  The ipc server is 
stopped and then the bpos services are stopped.

Xceivers then encounter NPEs in closeBlock because the block no longer exists 
in the volume map when transient storage is checked.  Just before that, the DN 
notifies the NN that the block was received, which does not appear to always be 
true; rather, the thread was merely interrupted.  These notifications race with 
the bpos shutdown and, luckily, appear to lose the race to send the 
block-received message.





[jira] [Created] (HDFS-8492) DN should notify NN when client requests a missing block

2015-05-28 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-8492:
-

 Summary: DN should notify NN when client requests a missing block
 Key: HDFS-8492
 URL: https://issues.apache.org/jira/browse/HDFS-8492
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp


If the DN has a block in its volume map but not on-disk, it tells clients it's 
an invalid block id.  The NN is not informed of the missing block until either 
the bp slice scanner or the directory scanner detects the missing block.  The 
DN should remove the replica from the volume map and inform the NN.





[jira] [Created] (HDFS-8402) Fsck exit codes are not reliable

2015-05-14 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-8402:
-

 Summary: Fsck exit codes are not reliable
 Key: HDFS-8402
 URL: https://issues.apache.org/jira/browse/HDFS-8402
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


HDFS-6663 added the ability to check specific blocks.  The exit code is 
non-deterministic: it is based on the state (corrupt, healthy, etc) of the last 
displayed block's last storage location, instead of on whether any of the 
checked blocks' storages are corrupt.  Blocks with decommissioning or 
decommissioned nodes should not be flagged as an error.





[jira] [Created] (HDFS-8133) Improve readability of deleted block check

2015-04-13 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-8133:
-

 Summary: Improve readability of deleted block check
 Key: HDFS-8133
 URL: https://issues.apache.org/jira/browse/HDFS-8133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The current means of checking if a block is deleted is checking if its block 
collection is null.  A more readable approach is an isDeleted method.
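The change is small; a simplified sketch (stand-in class, not the real BlockInfo):

```java
public class BlockInfoSketch {
    // The block collection reference is nulled when the owning file is
    // deleted, so the deletion check can be given a name.
    private Object blockCollection;

    void setBlockCollection(Object bc) { this.blockCollection = bc; }

    // Proposed readable form of "getBlockCollection() == null".
    boolean isDeleted() { return blockCollection == null; }
}
```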





[jira] [Created] (HDFS-7990) IBR delete ack should not be delayed

2015-03-25 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7990:
-

 Summary: IBR delete ack should not be delayed
 Key: HDFS-7990
 URL: https://issues.apache.org/jira/browse/HDFS-7990
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp


HDFS-395 added the incremental BR feature.  A concern was avoiding a race 
condition with ack-ing block invalidates followed by the directory scanner 
re-adding the block before the async delete service removes the block, possibly 
resulting in a full BR that includes the previously delete ack-ed block.

The solution was to batch and delay block deletion acks via a hardcoded 
interval of 100 * heartbeat interval (default: 5min).  The delay isn't required:
# The {{FSDatasetSpi}} tracks blocks pending deletion precisely, so the 
scanner won't re-add the block to the volume map
# Block receiving, received, and deleted events go into the same pending 
report.  A block received event will trigger an immediate IBR which includes 
the deletion acks.  I.e., the delay is meaningless for all but a quiescent 
cluster
# Failing to promptly report deleted blocks on a quiescent cluster prevents the 
NN from updating the block maps to remove the locations





[jira] [Created] (HDFS-7967) Reduce the performance impact of the balancer

2015-03-20 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7967:
-

 Summary: Reduce the performance impact of the balancer
 Key: HDFS-7967
 URL: https://issues.apache.org/jira/browse/HDFS-7967
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The balancer needs to query for blocks to move from overly full DNs.  The block 
lookup is extremely inefficient.  An iterator of the node's blocks is created 
from the iterators of its storages' blocks.  A random number is chosen 
corresponding to how many blocks will be skipped via the iterator.  Each skip 
requires costly scanning of triplets.

The current design also only considers node imbalances while ignoring 
imbalances within a node's storages.  A more efficient and intelligent design 
may eliminate the costly skipping of blocks via round-robin selection of blocks 
from the storages based on remaining capacity.






[jira] [Created] (HDFS-7964) Add support for async edit logging

2015-03-20 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7964:
-

 Summary: Add support for async edit logging
 Key: HDFS-7964
 URL: https://issues.apache.org/jira/browse/HDFS-7964
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.0.2-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Edit logging is a major source of contention within the NN.  logEdit is called 
within the namespace write lock, while logSync is called outside of the lock to 
allow greater concurrency.  The handler thread remains busy until logSync 
returns, to provide the client with a durability guarantee for the response.

Write heavy RPC load and/or slow IO causes handlers to stall in logSync.  
Although the write lock is not held, readers are limited/starved and the call 
queue fills.  Combining an edit log thread with postponed RPC responses from 
HADOOP-9953 will provide the same durability guarantee but immediately free up 
the handlers.





[jira] [Resolved] (HDFS-7586) HFTP does not work when namenode bind on wildcard

2015-01-14 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp resolved HDFS-7586.
---
Resolution: Not a Problem

 HFTP does not work when namenode bind on wildcard
 -

 Key: HDFS-7586
 URL: https://issues.apache.org/jira/browse/HDFS-7586
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0, 2.3.0, 2.4.0
Reporter: Benoit Perroud
Priority: Minor
 Attachments: HDFS-7586-v0.1.txt


 When wildcard binding for NameNode RPC is turned on (i.e.  
 dfs.namenode.rpc-address=0.0.0.0:8020), HFTP download is failing.
 Call to http://namenode:50070/data/.. returns the header Location with 
 parameter nnaddr=0.0.0.0:8020, which is unlikely to ever succeed :)
 The idea would be, if wildcard binding is enabled, to read the IP address 
 the request actually connected to from the HttpServletRequest and return 
 that one.
 WDYT?
 How to reproduce:
 1. Turn on wildcard binding
 {code}dfs.namenode.rpc-address=0.0.0.0:8020{code}
 2. Upload a file
 {code}$ echo 123 | hdfs dfs -put - /tmp/randomFile.txt{code}
 3. Validate it's failing
 {code}
 $ hdfs dfs -cat hftp://namenode1/tmp/randomFile.txt
 {code}
 4. Get more details via curl
 {code}
 $ curl -vv http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs | grep 
 Location:
  Location: 
 http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=0.0.0.0:8020
 {code}
 We can clearly see the 0.0.0.0 returned as the NN ip.





[jira] [Created] (HDFS-7607) Use random rack-local node for webhdfs opens to avoid OOM on DNs

2015-01-13 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7607:
-

 Summary: Use random rack-local node for webhdfs opens to avoid OOM 
on DNs
 Key: HDFS-7607
 URL: https://issues.apache.org/jira/browse/HDFS-7607
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode, webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


Webhdfs currently redirects a client to the DN that physically has one of the
replicas.  Unlike the hdfs data streamer protocol which can easily handle 
hundreds or thousands of connections, jetty has poor performance under heavy 
load.  Webhdfs clients can easily overwhelm the DNs and likely cause OOMs or 
excessive GC.

The NN should redirect the client to a rack-local location to distribute the 
webhdfs load across multiple hosts.  The rack-local node can then read the 
replica via the lightweight streamer protocol.





[jira] [Created] (HDFS-7597) Clients seeking over webhdfs may crash the NN

2015-01-09 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7597:
-

 Summary: Clients seeking over webhdfs may crash the NN
 Key: HDFS-7597
 URL: https://issues.apache.org/jira/browse/HDFS-7597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


Webhdfs seeks involve closing the current connection, and reissuing a new open 
request with the new offset.  The RPC layer caches connections so the DN keeps 
a lingering connection open to the NN.  Connection caching is in part based on 
UGI.  Although the client used the same token for the new offset request, the 
UGI is different which forces the DN to open another unnecessary connection to 
the NN.

A job that performs many seeks will easily crash the NN due to fd exhaustion.





[jira] [Created] (HDFS-7457) DatanodeID generates excessive garbage

2014-12-01 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7457:
-

 Summary: DatanodeID generates excessive garbage
 Key: HDFS-7457
 URL: https://issues.apache.org/jira/browse/HDFS-7457
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


{{DatanodeID#getXferAddr}} is a dynamically generated string.  This string is 
repeatedly generated for the hash code, equality, comparisons, and 
stringification.  Every DN-NN RPC method calls {{DatanodeManager#getDatanode}} 
to validate if the node is registered, which involves a call to {{getXferAddr}}.

The dynamic computation generates unnecessary trash that puts unnecessary 
pressure on the GC.
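A sketch of the fix with a stand-in class (the real DatanodeID has more fields; names here are illustrative): precompute the transfer address once at construction.

```java
public class DatanodeIdSketch {
    private final String ipAddr;
    private final int xferPort;
    private final String xferAddr; // built exactly once

    public DatanodeIdSketch(String ipAddr, int xferPort) {
        this.ipAddr = ipAddr;
        this.xferPort = xferPort;
        // Concatenate once here instead of on every hashCode/equals/
        // comparison/toString on the hot getDatanode() lookup path.
        this.xferAddr = ipAddr + ":" + xferPort;
    }

    public String getXferAddr() { return xferAddr; }

    @Override public int hashCode() { return xferAddr.hashCode(); }
    @Override public boolean equals(Object o) {
        return o instanceof DatanodeIdSketch
            && ((DatanodeIdSketch) o).xferAddr.equals(xferAddr);
    }
}
```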





[jira] [Created] (HDFS-7433) DatanodeMap is inefficient

2014-11-24 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7433:
-

 Summary: DatanodeMap is inefficient
 Key: HDFS-7433
 URL: https://issues.apache.org/jira/browse/HDFS-7433
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


The datanode map is currently a {{TreeMap}}.  For many thousands of datanodes, 
tree lookups are ~10X more expensive than a {{HashMap}}.  Insertions and 
removals are up to 100X more expensive.
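A sketch of the proposed replacement (simplified types; the real map stores DatanodeDescriptor values keyed by UUID):

```java
import java.util.HashMap;
import java.util.Map;

public class DatanodeMapSketch {
    // HashMap gives O(1) lookup/insert/remove versus the TreeMap's
    // O(log n) plus comparison costs. Any caller that relied on the
    // TreeMap's sorted iteration can sort a snapshot at the call site
    // rather than paying tree costs on the hot registration/lookup path.
    private final Map<String, Object> map = new HashMap<>();

    Object get(String uuid) { return map.get(uuid); }
    void put(String uuid, Object dn) { map.put(uuid, dn); }
    Object remove(String uuid) { return map.remove(uuid); }
    int size() { return map.size(); }
}
```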





[jira] [Created] (HDFS-7434) DatanodeID hashCode should not be mutable

2014-11-24 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7434:
-

 Summary: DatanodeID hashCode should not be mutable
 Key: HDFS-7434
 URL: https://issues.apache.org/jira/browse/HDFS-7434
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp


Mutable hash codes may lead to orphaned instances in a collection.  Instances 
must always be removed prior to modification of hash code values, and 
re-inserted.  Although current code appears to do this, the mutable hash code 
is a landmine.
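A minimal, self-contained illustration of the landmine (toy Node class, not the real DatanodeID): mutating a field that feeds hashCode() strands the instance in the wrong hash bucket.

```java
import java.util.HashSet;
import java.util.Set;

public class MutableHashDemo {
    static class Node {
        String addr;
        Node(String addr) { this.addr = addr; }
        @Override public int hashCode() { return addr.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof Node && ((Node) o).addr.equals(addr);
        }
    }

    static boolean becomesOrphaned() {
        Set<Node> set = new HashSet<>();
        Node n = new Node("a");
        set.add(n);               // stored in the bucket for hashCode("a")
        n.addr = "b";             // hash code changes in place
        return !set.contains(n);  // lookup probes the "b" bucket and misses
    }
}
```

Making the identity fields final (and the hash code computed from them at construction) removes the possibility entirely.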





[jira] [Created] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-24 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7435:
-

 Summary: PB encoding of block reports is very inefficient
 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


Block reports are encoded as a PB repeating long.  Repeating fields use an 
{{ArrayList}} with a default capacity of 10.  A block report containing tens or 
hundreds of thousands of longs (3 for each replica) is extremely expensive 
since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
fields will box the primitive longs, which must then be unboxed.





[jira] [Created] (HDFS-7213) processIncrementalBlockReport performance degradation

2014-10-08 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7213:
-

 Summary: processIncrementalBlockReport performance degradation
 Key: HDFS-7213
 URL: https://issues.apache.org/jira/browse/HDFS-7213
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Daryn Sharp
Assignee: Eric Payne
Priority: Critical


{{BlockManager#processIncrementalBlockReport}} has a debug line that is missing 
a {{isDebugEnabled}} check.  The write lock is being held.  Coupled with the 
increase in incremental block reports from receiving blocks, under heavy load 
this log line noticeably degrades performance.
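The fix pattern, shown with a stand-in counter instead of a real logger (names are illustrative): the guard ensures the message string is never built unless debug logging is enabled, which matters here because the write lock is held at the call site.

```java
public class DebugGuardDemo {
    static int formatCalls = 0;

    // Proxy for the string concatenation/toString work the guard avoids.
    static String expensiveFormat(String block, String node) {
        formatCalls++;
        return "BLOCK* processIncrementalBlockReport: " + block + " from " + node;
    }

    static void logReceivedBlock(boolean debugEnabled, String block, String node) {
        if (debugEnabled) { // the missing isDebugEnabled()-style check
            String msg = expensiveFormat(block, node);
            // LOG.debug(msg) would go here
        }
    }
}
```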





[jira] [Created] (HDFS-7046) HA NN can NPE upon transition to active

2014-09-11 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7046:
-

 Summary: HA NN can NPE upon transition to active
 Key: HDFS-7046
 URL: https://issues.apache.org/jira/browse/HDFS-7046
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0, 3.0.0
Reporter: Daryn Sharp
Priority: Critical


While processing edits, the NN may decide after adjusting block totals to leave 
safe mode - in the middle of the edit.  Going active starts the secret manager 
which generates a new secret key, which in turn generates an edit, which NPEs 
because the edit log is not open.

# Transitions should _not_ occur in the middle of an edit.
# The edit log appears to claim it's open for write when the stream isn't even 
open





[jira] [Created] (HDFS-7005) DFS input streams do not timeout

2014-09-05 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-7005:
-

 Summary: DFS input streams do not timeout
 Key: HDFS-7005
 URL: https://issues.apache.org/jira/browse/HDFS-7005
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical


Input streams have lost their timeout.  The problem appears to be that 
{{DFSClient#newConnectedPeer}} does not set the read timeout.  During a 
temporary network interruption the server will close the socket, unbeknownst to 
the client host, which blocks on a read forever.

The results are dire.  Services such as the RM, JHS, NMs, oozie servers, etc. 
all need to be restarted to recover - unless you want to wait many hours for 
the tcp stack keepalive to detect the broken socket.
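The general fix pattern can be sketched with plain {{java.net}} sockets (this is not the actual DFSClient code): setting {{SO_TIMEOUT}} turns an indefinite block on a dead peer into a catchable {{SocketTimeoutException}}.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Illustrative sketch: a socket read with no SO_TIMEOUT blocks indefinitely
// if the peer never responds; a read timeout surfaces the problem as an
// exception the client can recover from.
public class ReadTimeoutDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            // Server accepts the connection but never sends anything,
            // standing in for a peer that silently went away.
            new Thread(() -> {
                try { server.accept(); } catch (IOException ignored) {}
            }).start();

            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                client.setSoTimeout(200);   // read timeout in milliseconds
                try {
                    client.getInputStream().read();  // would block forever without the timeout
                    System.out.println("read returned");
                } catch (SocketTimeoutException e) {
                    System.out.println("timed out");
                }
            }
        }
    }
}
```

With a timeout in place, a hung service can fail the read, reopen the stream against another replica, and continue instead of requiring a restart.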





[jira] [Created] (HDFS-6964) NN fails to fix under replication leading to data loss

2014-08-28 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-6964:
-

 Summary: NN fails to fix under replication leading to data loss
 Key: HDFS-6964
 URL: https://issues.apache.org/jira/browse/HDFS-6964
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Priority: Blocker


We've encountered lost blocks due to node failure even when there was ample 
time to fix the under-replication.

Two nodes were lost.  The third node, holding the last remaining replicas, 
averaged one block copy per heartbeat (3s) until ~7h later, when that node was 
also lost, resulting in over 50 lost blocks.  When the node was restarted and 
sent its BR, the NN immediately began fixing the replication.

In another data loss event, over 150 blocks were lost due to node failure, but 
the timing of the node loss is not known, so there may have been inadequate 
time to fix the under-replication, unlike the first case.





[jira] [Created] (HDFS-6967) DNs may OOM under high webhdfs load

2014-08-28 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-6967:
-

 Summary: DNs may OOM under high webhdfs load
 Key: HDFS-6967
 URL: https://issues.apache.org/jira/browse/HDFS-6967
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, webhdfs
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp


Webhdfs uses Jetty.  The size of the request thread pool is limited, but Jetty 
will accept and queue an unbounded number of connections.  Every queued 
connection is heavy with buffers, etc.  Unlike data streamer connections, 
thousands of webhdfs connections will quickly OOM a DN.  The number of accepted 
requests must be bounded and excess clients rejected so they retry on a new DN.





[jira] [Created] (HDFS-6948) DN rejects blocks if it has older UC block

2014-08-26 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-6948:
-

 Summary: DN rejects blocks if it has older UC block
 Key: HDFS-6948
 URL: https://issues.apache.org/jira/browse/HDFS-6948
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp


DNs appear to always reject blocks, even those with newer genstamps, if they 
already have a UC (under-construction) copy in their tmp dir.

{noformat}
ReplicaAlreadyExistsException: Block XXX already exists in state TEMPORARY and thus cannot be created
{noformat}





[jira] [Created] (HDFS-6773) MiniDFSCluster can run dramatically faster

2014-07-29 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-6773:
-

 Summary: MiniDFSCluster can run dramatically faster
 Key: HDFS-6773
 URL: https://issues.apache.org/jira/browse/HDFS-6773
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp


The mini cluster is unnecessarily running with durable edit logs.  The 
following change cut the runtime of a single test from ~30s to ~10s.

{code}
EditLogFileOutputStream.setShouldSkipFsyncForTesting(true);
{code}

The mini cluster should default to this behavior after identifying the few edit 
log tests that probably depend on durable logs.




