[jira] [Commented] (HDFS-12643) HDFS maintenance state behaviour is confusing and not well documented

2021-09-28 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421453#comment-17421453
 ] 

Kihwal Lee commented on HDFS-12643:
---

The missing piece of information is probably that the cluster nodes need to be actively 
managed using {{dfs.hosts}} in order to use the maintenance mode feature.  It 
was likely overlooked because most big organizations already use either the old or the 
new combined hosts file to manage cluster membership. For example, 
decommissioning also requires hosts-file-based cluster membership 
management.  At a minimum, the documentation needs to be updated.
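
For illustration only (hostnames are placeholders, not from the report): with hosts-file-based 
membership, the combined (JSON) hosts file keeps every cluster node listed and marks just the 
node under maintenance, and the change is applied with a refresh, e.g.:

{code}
{"hostName": "host-1.example.com", "adminState": "IN_MAINTENANCE",
 "maintenanceExpireTimeInMS": 1507663698000}
{"hostName": "host-2.example.com"}
{"hostName": "host-3.example.com"}
{code}

{noformat}
hdfs dfsadmin -refreshNodes
{noformat}

Because host-2 and host-3 remain listed, they are not excluded when host-1's state changes.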


> HDFS maintenance state behaviour is confusing and not well documented
> -
>
> Key: HDFS-12643
> URL: https://issues.apache.org/jira/browse/HDFS-12643
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation, namenode
>Reporter: Andre Araujo
>Priority: Major
>
> The current implementation of the HDFS maintenance state feature is confusing 
> and error-prone. The documentation is missing important information that's 
> required for the correct use of the feature.
> For example, if the Hadoop admin wants to put a single node in maintenance 
> state, he/she can add a single entry to the maintenance file with the 
> contents:
> {code}
> {
>"hostName": "host-1.example.com",
>"adminState": "IN_MAINTENANCE",
>"maintenanceExpireTimeInMS": 1507663698000
> }
> {code}
> Let's say now that the actual maintenance finished well before the set 
> expiration time and the Hadoop admin wants to bring the node back to NORMAL 
> state. It would be natural to simply change the state of the node, as shown 
> below, and run another refresh:
> {code}
> {
>"hostName": "host-1.example.com",
>"adminState": "NORMAL"
> }
> {code}
> The configuration file above, though, not only takes the node {{host-1}} out 
> of maintenance state but also *blacklists all the other DataNodes*. This 
> behaviour seems inconsistent to me and is due to {{emptyInServiceNodeLists}} 
> being set to {{false}} 
> [here|https://github.com/apache/hadoop/blob/230b85d5865b7e08fb7aaeab45295b5b966011ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java#L80]
>  only when there is at least one node with {{adminState = NORMAL}} listed in 
> the file.
> I believe that it would be more consistent, and less error prone, to simply 
> implement the following:
> * If the dfs.hosts file is empty, all nodes are allowed and in normal state
> * If the file is not empty, any host *not* listed in the file is 
> *blacklisted*, regardless of the state of the hosts listed in the file.
> Regardless of whether the implementation is changed, the documentation also 
> needs to be updated to ensure readers know about the caveats mentioned above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16198) Short circuit read leaks Slot objects when InvalidToken exception is thrown

2021-09-15 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-16198:
--
Fix Version/s: 2.10.2

> Short circuit read leaks Slot objects when InvalidToken exception is thrown
> ---
>
> Key: HDFS-16198
> URL: https://issues.apache.org/jira/browse/HDFS-16198
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eungsop Yoo
>Assignee: Eungsop Yoo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2
>
> Attachments: HDFS-16198.patch, screenshot-2.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In secure mode, 'dfs.block.access.token.enable' should be set to 'true'. With 
> this configuration, a SecretManager.InvalidToken exception may be thrown if the 
> access token expires during a short circuit read. The exception itself does not 
> matter, because the failed reads are retried. But it causes the leakage of 
> ShortCircuitShm.Slot objects. 
>  
> We found this problem in our secure HBase clusters. The number of open file 
> descriptors of RegionServers kept increasing while short circuit reads were in use. 
> !screenshot-2.png!
>  
> It was caused by the leakage of shared memory segments used by short circuit 
> reading.
> {code:java}
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk 
> '{print $2}') | grep /dev/shm | wc -l
> 3925
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk 
> '{print $2}') | grep /dev/shm | head -5
> java 86309 hbase DEL REG 0,19 2308279984 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_743473959
> java 86309 hbase DEL REG 0,19 2306359893 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_1594162967
> java 86309 hbase DEL REG 0,19 2305496758 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_2043027439
> java 86309 hbase DEL REG 0,19 2304784261 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_689571088
> java 86309 hbase DEL REG 0,19 2302621988 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_347008590 
> {code}
>  
> We finally found that the root cause of this is the leakage of 
> ShortCircuitShm.Slot.
>  
> The fix is trivial: just free the slot when an InvalidToken exception is thrown.
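
For illustration only, a rough sketch of the pattern described above; the identifiers here 
({{requestShortCircuitFds}}, {{cache.freeSlot}}) are placeholders rather than the exact code 
changed by the patch:

{code:java}
// Sketch: once a ShortCircuitShm.Slot has been allocated, every failure path --
// including an expired block access token -- must hand the slot back, otherwise
// the shared memory segment backing it (and its file descriptor) is never released.
try {
  return requestShortCircuitFds(peer, slot);   // may throw SecretManager.InvalidToken
} catch (SecretManager.InvalidToken e) {
  if (slot != null) {
    cache.freeSlot(slot);                      // the cleanup that was missing
  }
  throw e;                                     // the caller will retry the read
}
{code}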



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16198) Short circuit read leaks Slot objects when InvalidToken exception is thrown

2021-09-15 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415718#comment-17415718
 ] 

Kihwal Lee commented on HDFS-16198:
---

Cherry-picked to branch-2.10.

> Short circuit read leaks Slot objects when InvalidToken exception is thrown
> ---
>
> Key: HDFS-16198
> URL: https://issues.apache.org/jira/browse/HDFS-16198
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eungsop Yoo
>Assignee: Eungsop Yoo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-16198.patch, screenshot-2.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In secure mode, 'dfs.block.access.token.enable' should be set to 'true'. With 
> this configuration, a SecretManager.InvalidToken exception may be thrown if the 
> access token expires during a short circuit read. The exception itself does not 
> matter, because the failed reads are retried. But it causes the leakage of 
> ShortCircuitShm.Slot objects. 
>  
> We found this problem in our secure HBase clusters. The number of open file 
> descriptors of RegionServers kept increasing while short circuit reads were in use. 
> !screenshot-2.png!
>  
> It was caused by the leakage of shared memory segments used by short circuit 
> reading.
> {code:java}
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk 
> '{print $2}') | grep /dev/shm | wc -l
> 3925
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk 
> '{print $2}') | grep /dev/shm | head -5
> java 86309 hbase DEL REG 0,19 2308279984 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_743473959
> java 86309 hbase DEL REG 0,19 2306359893 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_1594162967
> java 86309 hbase DEL REG 0,19 2305496758 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_2043027439
> java 86309 hbase DEL REG 0,19 2304784261 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_689571088
> java 86309 hbase DEL REG 0,19 2302621988 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_347008590 
> {code}
>  
> We finally found that the root cause of this is the leakage of 
> ShortCircuitShm.Slot.
>  
> The fix is trivial: just free the slot when an InvalidToken exception is thrown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16164) Configuration to allow group with read-all privilege

2021-08-11 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397658#comment-17397658
 ] 

Kihwal Lee commented on HDFS-16164:
---

Maybe you can accomplish what you want with HDFS ACLs.
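
For example, assuming ACLs are enabled ({{dfs.namenode.acls.enabled=true}}) and using a 
hypothetical group and path, a read-only group could be granted access like this:

{noformat}
# give the group read/traverse access to the existing tree
hdfs dfs -setfacl -R -m group:dataquality:r-x /data
# add a default ACL on the directory so newly created children inherit it
hdfs dfs -setfacl -m default:group:dataquality:r-x /data
# verify
hdfs dfs -getfacl /data
{noformat}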

> Configuration to allow group with read-all privilege
> 
>
> Key: HDFS-16164
> URL: https://issues.apache.org/jira/browse/HDFS-16164
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Major
>
> We are seeing more use cases that need read-all permission to HDFS. One example is 
> a data quality service that needs to read all the data but never needs to write. 
> Currently HDFS only seems to support the supergroup, which can do anything.
> Maybe we can add a configuration like dfs.permissions.read-all.group to manage 
> this type of permission easily.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16134) Optimize RPC throughput

2021-07-21 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384959#comment-17384959
 ] 

Kihwal Lee commented on HDFS-16134:
---

[~daryn]

> Optimize RPC throughput
> ---
>
> Key: HDFS-16134
> URL: https://issues.apache.org/jira/browse/HDFS-16134
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Priority: Major
>
> The RPC server currently behaves as follows:
> 1. Each RPC server has a single Responder, which is used to return results, but many 
> client responses are written back directly once the handler finishes processing;
> 2. Call#Connection#responseQueue is accessed under synchronization every time it is 
> used.
> These points can hinder RPC performance, so we can try some changes (see the sketch 
> below the quoted description):
> 1. Use multiple responders per RPC server to return results;
> 2. After a Handler completes processing, add the result to a queue and have a 
> Responder process that queue;
> 3. Add an identifier similar to a sequenceId to preserve the order of responses.
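
For illustration only, and not Hadoop's actual {{ipc.Server}} internals: a minimal sketch of 
points 2 and 3 above, where handler threads enqueue finished responses and a responder drains 
them strictly in sequence-id order.

{code:java}
import java.util.Comparator;
import java.util.PriorityQueue;

// Handlers call enqueue(); a Responder thread calls takeNextInOrder() in a loop.
class OrderedResponseQueue {
  static final class Response {
    final long seqId;        // assigned when the call was accepted
    final byte[] payload;    // serialized RPC response
    Response(long seqId, byte[] payload) { this.seqId = seqId; this.payload = payload; }
  }

  private final PriorityQueue<Response> pending =
      new PriorityQueue<>(Comparator.comparingLong((Response r) -> r.seqId));
  private long nextSeqToSend = 0;

  synchronized void enqueue(Response r) {
    pending.add(r);
    notifyAll();                               // wake the responder
  }

  // Blocks until the response with the next expected sequence id is available,
  // so out-of-order handler completions are still written back in order.
  synchronized Response takeNextInOrder() throws InterruptedException {
    while (pending.isEmpty() || pending.peek().seqId != nextSeqToSend) {
      wait();
    }
    nextSeqToSend++;
    return pending.poll();
  }
}
{code}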



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-16127:
--
Fix Version/s: 3.3.2
   3.2.3
   3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks for the review, Daryn. I've committed this to trunk, branch-3.3 and 
branch-3.2.

> Improper pipeline close recovery causes a permanent write failure or data 
> loss.
> ---
>
> Key: HDFS-16127
> URL: https://issues.apache.org/jira/browse/HDFS-16127
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Major
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-16127.patch
>
>
> When a block is being closed, the data streamer in the client waits for the 
> final ACK to be delivered. If an exception is received during this wait, the 
> close is retried. This assumption became invalid with HDFS-15813, resulting 
> in permanent write failures in some close error cases involving slow nodes. 
> There are also less frequent cases of data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-14 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380634#comment-17380634
 ] 

Kihwal Lee commented on HDFS-16127:
---

There is no unit test in the patch. It isn't easy to reproduce this particular 
bug in a unit test without intrusive instrumentation. 

> Improper pipeline close recovery causes a permanent write failure or data 
> loss.
> ---
>
> Key: HDFS-16127
> URL: https://issues.apache.org/jira/browse/HDFS-16127
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Major
> Attachments: HDFS-16127.patch
>
>
> When a block is being closed, the data streamer in the client waits for the 
> final ACK to be delivered. If an exception is received during this wait, the 
> close is retried. This assumption became invalid with HDFS-15813, resulting 
> in permanent write failures in some close error cases involving slow nodes. 
> There are also less frequent cases of data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-13 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-16127:
--
Attachment: HDFS-16127.patch

> Improper pipeline close recovery causes a permanent write failure or data 
> loss.
> ---
>
> Key: HDFS-16127
> URL: https://issues.apache.org/jira/browse/HDFS-16127
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Major
> Attachments: HDFS-16127.patch
>
>
> When a block is being closed, the data streamer in the client waits for the 
> final ACK to be delivered. If an exception is received during this wait, the 
> close is retried. This assumption became invalid with HDFS-15813, resulting 
> in permanent write failures in some close error cases involving slow nodes. 
> There are also less frequent cases of data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-13 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-16127:
--
Status: Patch Available  (was: Open)

> Improper pipeline close recovery causes a permanent write failure or data 
> loss.
> ---
>
> Key: HDFS-16127
> URL: https://issues.apache.org/jira/browse/HDFS-16127
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Major
> Attachments: HDFS-16127.patch
>
>
> When a block is being closed, the data streamer in the client waits for the 
> final ACK to be delivered. If an exception is received during this wait, the 
> close is retried. This assumption became invalid with HDFS-15813, resulting 
> in permanent write failures in some close error cases involving slow nodes. 
> There are also less frequent cases of data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-13 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned HDFS-16127:
-

Assignee: Kihwal Lee

> Improper pipeline close recovery causes a permanent write failure or data 
> loss.
> ---
>
> Key: HDFS-16127
> URL: https://issues.apache.org/jira/browse/HDFS-16127
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Major
>
> When a block is being closed, the data streamer in the client waits for the 
> final ACK to be delivered. If an exception is received during this wait, the 
> close is retried. This assumption became invalid with HDFS-15813, resulting 
> in permanent write failures in some close error cases involving slow nodes. 
> There are also less frequent cases of data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-13 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379964#comment-17379964
 ] 

Kihwal Lee commented on HDFS-16127:
---

The proposed solution is to check the size of {{ackQueue}} when 
{{waitForAllAcks()}} for the final packet throws an {{IOException}}. If the 
queue is empty, we can assume the last ack was received and the final packet for 
the block was removed from the queue, meaning no recovery is needed.
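
A rough sketch of that idea (illustrative only, not the committed patch; {{waitForAllAcks}}, 
{{dataQueue}} and {{ackQueue}} are the DataStreamer members referenced above):

{code:java}
try {
  waitForAllAcks();                 // wait for the ack of the empty close packet
} catch (IOException ioe) {
  synchronized (dataQueue) {
    if (ackQueue.isEmpty()) {
      // The final ack already arrived and the close packet was removed from
      // ackQueue, so the block closed normally; skip close recovery.
      return;
    }
  }
  throw ioe;                        // a real failure: fall through to recovery
}
{code}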

> Improper pipeline close recovery causes a permanent write failure or data 
> loss.
> ---
>
> Key: HDFS-16127
> URL: https://issues.apache.org/jira/browse/HDFS-16127
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Major
>
> When a block is being closed, the data streamer in the client waits for the 
> final ACK to be delivered. If an exception is received during this wait, the 
> close is retried. This assumption became invalid with HDFS-15813, resulting 
> in permanent write failures in some close error cases involving slow nodes. 
> There are also less frequent cases of data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-13 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379961#comment-17379961
 ] 

Kihwal Lee commented on HDFS-16127:
---

*How the bug manifests*
In the DataStreamer thread's main loop, when it sees an empty close packet on dataQueue, 
it flushes everything and waits for all outstanding acks to be received. It then 
proceeds to send the close packet to signal the datanodes to finalize the replicas. 
It also waits for the ack to this final packet by calling waitForAllAcks(). 
Prior to HDFS-15813, this step involved no network activity, but network failures 
became possible after that change. The following is the client log entry from one of 
the failure/data loss cases.

{noformat}
org.apache.hadoop.hdfs.DataStreamer: DataStreamer Exception
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.DataStreamer.sendPacket(DataStreamer.java:857)
at 
org.apache.hadoop.hdfs.DataStreamer.sendHeartbeat(DataStreamer.java:875)
at 
org.apache.hadoop.hdfs.DataStreamer.waitForAllAcks(DataStreamer.java:845)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:798)
{noformat}

This exception resulted in a close recovery, because the pipeline stage was set 
to PIPELINE_CLOSE at this point. However, the client had seen no error from 
its ResponseProcessor, meaning that it had actually received the final ack and 
removed the final packet from ackQueue. Since the ResponseProcessor shut itself down 
cleanly, there was no sign of a read or write error from that thread.

The following is the main part of the close recovery code, after a connection 
is successfully established.
{code:java}
  DFSPacket endOfBlockPacket = dataQueue.remove();  // remove the end 
of block packet
  assert endOfBlockPacket.isLastPacketInBlock();
  assert lastAckedSeqno == endOfBlockPacket.getSeqno() - 1;
  lastAckedSeqno = endOfBlockPacket.getSeqno();
  pipelineRecoveryCount = 0;
  dataQueue.notifyAll();
{code}

The asserts would have prevented the bug from propagating if they were active 
(they are only enabled during testing). The recovery code blindly requeues the content of 
ackQueue, thinking the unACKed final packet is still there. In this failure 
case there is none, as the final packet was actually ACKed. The datanodes 
closed the connections normally, which resulted in a "connection reset" for the 
data streamer stuck sending a heartbeat. The recovery then simply dequeues one 
packet and tosses it away; after all, this packet is supposed to have no data. 
It even erroneously updates lastAckedSeqno.

At this point the first packet belonging to the next block has been thrown 
away. For the next block write, the datanodes complain that the first packet's 
offset is non-zero. This is irrecoverable, and after 5 retries the 
data write fails.

*Data loss*
These errors usually cause a permanent write failure due to the inability to write 
further with the first packet (actually the 2nd one, but the bug makes it the 
first one to be sent) of the block starting at a non-zero offset. The errors 
are thus propagated back to the user and result in a task attempt failure, etc.

However, if the remaining bytes to write to a new block fit in one packet, 
something worse happens. If the client encounters the close-recovery bug, the 
one data packet that gets dropped by the faulty recovery is the only data 
packet for the next block. The remaining packet after that is the final close 
packet for the next block. When the client continues to the next block, the 
datanode rejects the write as the close packet has a non-zero offset.

But instead of causing a permanent write failure, the client now enters a 
close-recovery phase, since the pipeline stage was set to PIPELINE_CLOSE while 
sending the packet. The new connection header tells datanodes that it is for a 
close recovery, so they simply close the zero-byte block file with the 
specified gen stamp. The recovery will appear successful, so the client gets no 
error 

[jira] [Updated] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-13 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-16127:
--
Description: When a block is being closed, the data streamer in the client 
waits for the final ACK to be delivered. If an exception is received during 
this wait, the close is retried. This assumption became invalid with 
HDFS-15813, resulting in permanent write failures in some close error cases 
involving slow nodes. There are also less frequent cases of data loss.  (was: 
While waiting for the final ack for the empty close packet, the main 
DataStreamer thread can receive an exception even when the final ack was 
received and pipelines close normally.  This leads to an unnecessary close 
recovery that results in a permanent write failure or a silent data loss.
)

> Improper pipeline close recovery causes a permanent write failure or data 
> loss.
> ---
>
> Key: HDFS-16127
> URL: https://issues.apache.org/jira/browse/HDFS-16127
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Major
>
> When a block is being closed, the data streamer in the client waits for the 
> final ACK to be delivered. If an exception is received during this wait, the 
> close is retried. This assumption became invalid with HDFS-15813, resulting 
> in permanent write failures in some close error cases involving slow nodes. 
> There are also less frequent cases of data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2021-07-13 Thread Kihwal Lee (Jira)
Kihwal Lee created HDFS-16127:
-

 Summary: Improper pipeline close recovery causes a permanent write 
failure or data loss.
 Key: HDFS-16127
 URL: https://issues.apache.org/jira/browse/HDFS-16127
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee


While waiting for the final ack for the empty close packet, the main 
DataStreamer thread can receive an exception even when the final ack was 
received and pipelines close normally.  This leads to an unnecessary close 
recovery that results in a permanent write failure or a silent data loss.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16123) NameNode standby checkpoint is abnormal, because the data disk is full, even if the disk space is restored

2021-07-12 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-16123.
---
Resolution: Invalid

>  NameNode standby checkpoint is abnormal, because the data disk is full, even 
> if the disk space is restored
> ---
>
> Key: HDFS-16123
> URL: https://issues.apache.org/jira/browse/HDFS-16123
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.3
> Environment: Hadoop 3.1.3
> When dfs.namenode.name.dir disk is full,standby nn ERROR:
> 2021-05-28 10:15:20,206 | WARN  | IPC Server handler 48 on 25000 | Space 
> available on volume '/dev/vdb' is 0, which is below the configured reserved 
> amount 104857600 | NameNodeResourceChecker.java:91
> 2021-05-28 10:15:20,206 | INFO  | IPC Server handler 48 on 25000 | IPC Server 
> handler 48 on 25000, call 
> org.apache.hadoop.ha.HAServiceProtocol.monitorHealth from 192.168.0.142:35128 
> Call#193407129 Retry#0 | Server.java
>  org.apache.hadoop.ha.HealthCheckFailedException: The NameNode has no 
> resources available 
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.monitorHealth(NameNode.java:1781)
>  
> After hard disk space recovery, but checkpoint cannot be restored; the ERROR 
> is:
> 2021-05-28 15:19:34,494 | ERROR | Standby State Checkpointer | Exception in 
> doCheckpoint | StandbyCheckpointer.java:452
>  java.io.IOException: No image directories available! 
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImageInAllDirs(FSImage.java:1219)
>Reporter: baijixue
>Priority: Major
>
> When the disk where the fsimage dir is located is full, the standby NameNode 
> reports the error: "Space available on volume '/dev/vdb' is 0"; after the disk 
> space is restored, the standby NN's checkpointing does not recover, and the 
> error is "java.io.IOException: No image directories available!"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16124) NameNode standby checkpoint is abnormal, error is No image directories available!

2021-07-12 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379191#comment-17379191
 ] 

Kihwal Lee commented on HDFS-16124:
---

Namenodes do not automatically restore failed storage directories by design. 
Did you try {{hdfs dfsadmin -restoreFailedStorage}}? 
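
For reference, the command accepts true, false, or check, e.g.:

{noformat}
hdfs dfsadmin -restoreFailedStorage check   # show whether automatic restore is enabled
hdfs dfsadmin -restoreFailedStorage true    # let the NN bring restored directories back
{noformat}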

>  NameNode standby checkpoint is abnormal, error is No image directories 
> available! 
> ---
>
> Key: HDFS-16124
> URL: https://issues.apache.org/jira/browse/HDFS-16124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.3
>Reporter: baijixue
>Priority: Major
>
> When the disk where the fsimage dir is located is full, the standby NameNode 
> reports the error: "Space available on volume '/dev/vdb' is 0"; after the disk 
> space is restored, the standby NN's checkpointing does not recover, and the 
> error is "java.io.IOException: No image directories available!".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16117) Add file count info in audit log to record the file count for delete and getListing RPC request to assist user trouble shooting when RPC cost is increasing

2021-07-09 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378127#comment-17378127
 ] 

Kihwal Lee commented on HDFS-16117:
---

Traditionally audit log format changes have been marked incompatible. I just 
set the flag.

{{getListing}} has a configured limit on the size of each response, so you will see 
multiple rpc calls for the same path in the 
audit log if the user is listing a big directory. You can guess how big the 
directory is from this: the number of calls already reflects how many directory 
entries were returned.
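
(For context, the per-response limit mentioned above is {{dfs.ls.limit}}; the snippet below 
shows its usual default and is illustrative only.)

{code:xml}
<!-- hdfs-site.xml: maximum number of entries returned per getListing response -->
<property>
  <name>dfs.ls.limit</name>
  <value>1000</value>
</property>
{code}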

More importantly, it will be beneficial to distinguish between listStatus and 
listLocatedStatus. We have an internal patch to report separate rpc metrics for 
these and it is helpful. Not sure whether it can be applied as is to trunk, as 
we have other RPC changes that are being tested.

> Add file count info in audit log to record the file count for delete and 
> getListing RPC request to assist user trouble shooting when RPC cost is 
> increasing 
> 
>
> Key: HDFS-16117
> URL: https://issues.apache.org/jira/browse/HDFS-16117
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: 0001-HDFS-16117.patch
>
>
> Currently, there is no file count in the audit log for delete and getListing RPC 
> requests; therefore, when RPC cost is increasing, it is not easy to figure 
> out whether a time-consuming RPC request is related to too many files 
> being operated on in that request.
>  
> Therefore, it is necessary to add file count info in the audit log to assist 
> maintenance. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16117) Add file count info in audit log to record the file count for delete and getListing RPC request to assist user trouble shooting when RPC cost is increasing

2021-07-09 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-16117:
--
Hadoop Flags: Incompatible change

> Add file count info in audit log to record the file count for delete and 
> getListing RPC request to assist user trouble shooting when RPC cost is 
> increasing 
> 
>
> Key: HDFS-16117
> URL: https://issues.apache.org/jira/browse/HDFS-16117
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: 0001-HDFS-16117.patch
>
>
> Currently, there is no file count in the audit log for delete and getListing RPC 
> requests; therefore, when RPC cost is increasing, it is not easy to figure 
> out whether a time-consuming RPC request is related to too many files 
> being operated on in that request.
>  
> Therefore, it is necessary to add file count info in the audit log to assist 
> maintenance. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2021-06-18 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1736#comment-1736
 ] 

Kihwal Lee edited comment on HDFS-13671 at 6/18/21, 3:30 PM:
-

EOL does not mean no more commits. You can still commit stuff.  It's just that 
there won't be any more official Apache releases.

According to their website,

{noformat}
HDP-3.1.0
This release provides Hadoop Common 3.1.1 and no additional Apache patches
{noformat}

So you should be able to apply this to 3.1.1 and do your own build.  You only 
need to update namenodes. If they are really not different from vanilla Apache 
3.1.1, replacing hadoop-hdfs jar should be sufficient.


was (Author: kihwal):
EOL does not mean no more commits. you can still commit stuff.  It's just that 
there won't be any more official Apache releases.

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png, 
> image-2021-06-18-15-47-04-037.png
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in NameNode, there are mainly two steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks  chunk by chunk in a loop.
> Actually, the first step should be the more expensive operation and should take 
> more time. However, we now always see the NN hang during the remove-block 
> operation. 
> Looking into this: we introduced a new structure, {{FoldedTreeSet}}, to get 
> better performance when handling FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, 
> since it takes additional time to rebalance tree nodes. When there are many 
> blocks to be removed/deleted, it performs badly.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.



--
This message was sent by Atlassian Jira

[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2021-06-18 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1736#comment-1736
 ] 

Kihwal Lee commented on HDFS-13671:
---

EOL does not mean no more commits. you can still commit stuff.  It's just that 
there won't be any more official Apache releases.

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png, 
> image-2021-06-18-15-47-04-037.png
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in NameNode, there are mainly two steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks  chunk by chunk in a loop.
> Actually, the first step should be the more expensive operation and should take 
> more time. However, we now always see the NN hang during the remove-block 
> operation. 
> Looking into this: we introduced a new structure, {{FoldedTreeSet}}, to get 
> better performance when handling FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, 
> since it takes additional time to rebalance tree nodes. When there are many 
> blocks to be removed/deleted, it performs badly.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency

2021-06-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15618:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

I've committed this to branch-2.10.

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.1, 3.2.2
>
> Attachments: HDFS-15618-branch-2.10.001.patch, 
> HDFS-15618-branch-2.10.002.patch, HDFS-15618-branch-2.10.003.patch, 
> HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> Datanode shutdown has a very long latency. A block scanner waits for 5 
> minutes to join on each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to ignore such conditions on shutdown of Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency

2021-06-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15618:
--
Fix Version/s: 2.10.2

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
> Attachments: HDFS-15618-branch-2.10.001.patch, 
> HDFS-15618-branch-2.10.002.patch, HDFS-15618-branch-2.10.003.patch, 
> HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> Datanode shutdown has a very long latency. A block scanner waits for 5 
> minutes to join on each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to ignore such conditions on shutdown of Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2021-06-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364369#comment-17364369
 ] 

Kihwal Lee commented on HDFS-15618:
---

+1 The 2.10 patch looks good. Thanks, Ahmed, for working on the port.

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15618-branch-2.10.001.patch, 
> HDFS-15618-branch-2.10.002.patch, HDFS-15618-branch-2.10.003.patch, 
> HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> Datanode shutdown has a very long latency. A block scanner waits for 5 
> minutes to join on each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to ignore such conditions on shutdown of Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-06-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364299#comment-17364299
 ] 

Kihwal Lee commented on HDFS-15963:
---

I've looked at heap dumps and can confirm the analysis by [~zhangshuyan]. 
One failed volume's reference was closed (2^30), but the count never went down 
to 0. As long as this volume is at the head of {{volumesBeingRemoved}}, 
additional volume failures could not be handled, as the handler threads were all 
stuck looping forever waiting for this volume to clear.

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 2.10.2
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
> FsVolumeImpl volume = it.next();
> if (!volume.checkClosed()) {
>   return false;
> }
> it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
> FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
> this, reference.getReferenceCount());
> return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread has been holding checkDirsLock when 
> removing the volume, other threads trying to acquire the same lock will be 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-06-15 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364000#comment-17364000
 ] 

Kihwal Lee commented on HDFS-15963:
---

We hit this in 2.10 recently. I've cherry-picked it to branch-2.10 with minor 
conflicts. All new test cases pass.

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
> FsVolumeImpl volume = it.next();
> if (!volume.checkClosed()) {
>   return false;
> }
> it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
> FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
> this, reference.getReferenceCount());
> return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread has been holding checkDirsLock when 
> removing the volume, other threads trying to acquire the same lock will be 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-06-15 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15963:
--
Fix Version/s: 2.10.2

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 2.10.2
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
> FsVolumeImpl volume = it.next();
> if (!volume.checkClosed()) {
>   return false;
> }
> it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
> FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
> this, reference.getReferenceCount());
> return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread has been holding checkDirsLock when 
> removing the volume, other threads trying to acquire the same lock will be 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2021-06-10 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361135#comment-17361135
 ] 

Kihwal Lee commented on HDFS-13671:
---

Thanks for sharing the test result, [~huanghaibin].  I am still reviewing the 
PR, but it looks like [~aajisaka] has already done good work.

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in NameNode, there are mainly two steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks  chunk by chunk in a loop.
> Actually the first step should be the more expensive operation and take more 
> time. However, we now always see the NN hang during the remove-block 
> operation. 
> Looking into this, we introduced a new structure {{FoldedTreeSet}} to get 
> better performance in handling FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower 
> since it takes additional time to rebalance tree nodes. When there are many 
> blocks to be removed/deleted, it looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator and no other get operation 
> for a specific block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16042) DatanodeAdminMonitor scan should be delay based

2021-06-03 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356692#comment-17356692
 ] 

Kihwal Lee commented on HDFS-16042:
---

It is not related to any incident. Since the default interval is 30 seconds, 
the impact of the change will not be great, but it is still the right thing to 
do.  If a lot of nodes entering decommissioning and/or maintenance mode are 
introduced at once, the initial scan can last for seconds, and this initial 
scan is not subject to the max-blocks-per-iteration limit.  By changing the 
schedule from a fixed interval to a fixed delay, such an impact will be 
dampened a bit in the long run. 

The patch looks good.
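
For illustration, the change boils down to which {{ScheduledExecutorService}} 
method is used. This is a minimal sketch, not the actual patch; the monitor 
task and interval variable below are made-up stand-ins:

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulingSketch {
  public static void main(String[] args) {
    ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
    Runnable monitor = () -> { /* stand-in for the DatanodeAdminMonitor scan */ };
    long intervalSecs = 30;

    // Current behavior (fixed rate): the next run is scheduled relative to the
    // *start* of the previous one, so a long scan is followed immediately by
    // the next scan.
    executor.scheduleAtFixedRate(monitor, intervalSecs, intervalSecs,
        TimeUnit.SECONDS);

    // Proposed behavior (fixed delay): the next run starts intervalSecs after
    // the previous run *finishes*, which dampens the impact of a long scan.
    executor.scheduleWithFixedDelay(monitor, intervalSecs, intervalSecs,
        TimeUnit.SECONDS);
  }
}
{code}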

> DatanodeAdminMonitor scan should be delay based
> ---
>
> Key: HDFS-16042
> URL: https://issues.apache.org/jira/browse/HDFS-16042
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In {{DatanodeAdminManager.activate()}}, the Monitor task is scheduled with a 
> fixed rate, ie. the period is from start1 -> start2.  
> {code:java}
> executor.scheduleAtFixedRate(monitor, intervalSecs, intervalSecs,
>TimeUnit.SECONDS);
> {code}
> According to Java API docs for {{scheduleAtFixedRate}},
> {quote}If any execution of this task takes longer than its period, then 
> subsequent executions may start late, but will not concurrently 
> execute.{quote}
> It should be a fixed delay so it's end1 -> start1.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16034) Disk failures exceeding the DFIP threshold do not shutdown datanode

2021-05-21 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-16034.
---
Resolution: Incomplete

> Disk failures exceeding the DFIP threshold do not shutdown datanode
> ---
>
> Key: HDFS-16034
> URL: https://issues.apache.org/jira/browse/HDFS-16034
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.10.1, 3.4.0
> Environment: HDFS-11182.
>Reporter: Kihwal Lee
>Priority: Major
>
> HDFS-11182 made DataNode use the new {{DatasetVolumeChecker}}.  But 
> {{handleVolumeFailures()}} blows up in the middle so it never reaches 
> {{handleDiskError()}}, where the datanode is potentially shutdown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16034) Disk failures exceeding the DFIP threshold do not shutdown datanode

2021-05-21 Thread Kihwal Lee (Jira)
Kihwal Lee created HDFS-16034:
-

 Summary: Disk failures exceeding the DFIP threshold do not 
shutdown datanode
 Key: HDFS-16034
 URL: https://issues.apache.org/jira/browse/HDFS-16034
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.10.1, 3.4.0
 Environment: HDFS-11182.
Reporter: Kihwal Lee


HDFS-11182 made DataNode use the new {{DatasetVolumeChecker}}.  But 
{{handleVolumeFailures()}} blows up in the middle so it never reaches 
{{handleDiskError()}}, where the datanode is potentially shutdown.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8224) Schedule a block for scanning if its metadata file is corrupt

2021-04-06 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17315771#comment-17315771
 ] 

Kihwal Lee commented on HDFS-8224:
--

If you are sure the data block is not corrupt, you can use {{hdfs debug 
computeMeta}} to regenerate the meta file. Or simply re-upload the data.
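
For reference, the command takes the block file and writes a regenerated meta 
file; the paths below are placeholders, and this should only be done if the 
block data itself is known to be good:

{noformat}
# Regenerate the checksum/meta file from a known-good block file on the datanode.
hdfs debug computeMeta -block /data/1/dfs/dn/current/.../blk_2697560713 \
  -out /tmp/blk_2697560713_1107108863999.meta
{noformat}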

> Schedule a block for scanning if its metadata file is corrupt
> -
>
> Key: HDFS-8224
> URL: https://issues.apache.org/jira/browse/HDFS-8224
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: HDFS-8224-branch-2.7.patch, HDFS-8224-branch-2.patch, 
> HDFS-8224-trunk-1.patch, HDFS-8224-trunk-2.patch, HDFS-8224-trunk-3.patch, 
> HDFS-8224-trunk.patch
>
>
> This happened in our 2.6 cluster.
> One of the block and its metadata file were corrupted.
> The disk was healthy in this case.
> Only the block was corrupt.
> Namenode tried to copy that block to another datanode but failed with the 
> following stack trace:
> 2015-04-20 01:04:04,421 
> [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@11319bc4] WARN 
> datanode.DataNode: DatanodeRegistration(a.b.c.d, 
> datanodeUuid=e8c5135c-9b9f-4d05-a59d-e5525518aca7, infoPort=1006, 
> infoSecurePort=0, ipcPort=8020, 
> storageInfo=lv=-56;cid=CID-e7f736ac-158e-446e-9091-7e66f3cddf3c;nsid=358250775;c=1428471998571):Failed
>  to transfer BP-xxx-1351096255769:blk_2697560713_1107108863999 to 
> a1.b1.c1.d1:1004 got 
> java.io.IOException: Could not create DataChecksum of type 0 with 
> bytesPerChecksum 0
> at 
> org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:125)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:175)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:140)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readDataChecksum(BlockMetadataHeader.java:102)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1989)
> at java.lang.Thread.run(Thread.java:722)
> The following catch block in the DataTransfer#run method will treat every 
> IOException as a disk error and run the disk error check:
> {noformat}
> catch (IOException ie) {
> LOG.warn(bpReg + ":Failed to transfer " + b + " to " +
> targets[0] + " got ", ie);
> // check if there are any disk problem
> checkDiskErrorAsync();
>   } 
> {noformat}
> This block was never scanned by BlockPoolSliceScanner; otherwise it would 
> have been reported as a corrupt block.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15901) Solve the problem of DN repeated block reports occupying too many RPCs during Safemode

2021-03-19 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304933#comment-17304933
 ] 

Kihwal Lee edited comment on HDFS-15901 at 3/19/21, 2:18 PM:
-

[~jianghuazhu], when it happens again, take a look at the datanodes tab of the 
NN web UI. If you sort by number of blocks, you can figure out who hasn't sent 
FBR yet. Those nodes will have very low block count or 0 for their block pool 
used percentage.  You can try manually triggering a FBR for those nodes. This 
might work, but the block report lease manager can get in the way.  In that 
case, the datanode can be restarted to force re-registration and obtain a new 
fbr lease.
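
For reference, a full block report for a specific datanode can be triggered 
manually with dfsadmin; the host and port below are placeholders for the 
datanode's IPC address:

{noformat}
# Ask one datanode to send a full (non-incremental) block report to the NameNode.
hdfs dfsadmin -triggerBlockReport datanode-host.example.com:9867
{noformat}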


was (Author: kihwal):
[~jianghuazhu], when it happens again, take a look at the datanodes tab of the 
NN web UI. If you sort by number of blocks, you can figure out who hasn't sent 
FBR yet. Those nodes will have very low block count or 0 for their block pool 
used percentage.  You can try manually triggering a FBR for those nodes. This 
might work, but the block report lease manager can get in the way.

> Solve the problem of DN repeated block reports occupying too many RPCs during 
> Safemode
> --
>
> Key: HDFS-15901
> URL: https://issues.apache.org/jira/browse/HDFS-15901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the cluster exceeds thousands of nodes, we want to restart the NameNode 
> service, and all DataNodes send a full Block action to the NameNode. During 
> SafeMode, some DataNodes may send blocks to NameNode multiple times, which 
> will take up too much RPC. In fact, this is unnecessary.
> In this case, some block report leases will fail or time out, and in extreme 
> cases, the NameNode will always stay in Safe Mode.
> 2021-03-14 08:16:25,873 [78438700] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(:port, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-14 08:16:31,521 [78444348] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-13 18:35:38,200 [29191027] - WARN  [Block report 
> processor:BlockReportLeaseManager@311] - BR lease 0x is not valid for 
> DN , because the DN is not in the pending set.
> 2021-03-13 18:36:08,143 [29220970] - WARN  [Block report 
> processor:BlockReportLeaseManager@311] - BR lease 0x is not valid for 
> DN , because the DN is not in the pending set.
> 2021-03-13 18:36:08,143 [29220970] - WARN  [Block report 
> processor:BlockReportLeaseManager@317] - BR lease 0x is not valid for 
> DN , because the lease has expired.
> 2021-03-13 18:36:08,145 [29220972] - WARN  [Block report 
> processor:BlockReportLeaseManager@317] - BR lease 0x is not valid for 
> DN , because the lease has expired.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15901) Solve the problem of DN repeated block reports occupying too many RPCs during Safemode

2021-03-19 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304933#comment-17304933
 ] 

Kihwal Lee commented on HDFS-15901:
---

[~jianghuazhu], when it happens again, take a look at the datanodes tab of the 
NN web UI. If you sort by number of blocks, you can figure out who hasn't sent 
FBR yet. Those nodes will have very low block count or 0 for their block pool 
used percentage.  You can try manually triggering a FBR for those nodes. This 
might work, but the block report lease manager can get in the way.

> Solve the problem of DN repeated block reports occupying too many RPCs during 
> Safemode
> --
>
> Key: HDFS-15901
> URL: https://issues.apache.org/jira/browse/HDFS-15901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the cluster exceeds thousands of nodes, we want to restart the NameNode 
> service, and all DataNodes send a full Block action to the NameNode. During 
> SafeMode, some DataNodes may send blocks to NameNode multiple times, which 
> will take up too much RPC. In fact, this is unnecessary.
> In this case, some block report leases will fail or time out, and in extreme 
> cases, the NameNode will always stay in Safe Mode.
> 2021-03-14 08:16:25,873 [78438700] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(:port, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-14 08:16:31,521 [78444348] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-13 18:35:38,200 [29191027] - WARN  [Block report 
> processor:BlockReportLeaseManager@311] - BR lease 0x is not valid for 
> DN , because the DN is not in the pending set.
> 2021-03-13 18:36:08,143 [29220970] - WARN  [Block report 
> processor:BlockReportLeaseManager@311] - BR lease 0x is not valid for 
> DN , because the DN is not in the pending set.
> 2021-03-13 18:36:08,143 [29220970] - WARN  [Block report 
> processor:BlockReportLeaseManager@317] - BR lease 0x is not valid for 
> DN , because the lease has expired.
> 2021-03-13 18:36:08,145 [29220972] - WARN  [Block report 
> processor:BlockReportLeaseManager@317] - BR lease 0x is not valid for 
> DN , because the lease has expired.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15901) Solve the problem of DN repeated block reports occupying too many RPCs during Safemode

2021-03-19 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304928#comment-17304928
 ] 

Kihwal Lee commented on HDFS-15901:
---

bq. look forward to your new block report flow control system, Do you have plan 
to submit to community? Thanks.
It was tested in small (to us) clusters with about 100 nodes, but hasn't hit 
the large ones where we occasionally see this issue. We don't usually see it 
during startup though. Probably the way our call queue is set up avoids this 
during normal startup for the standby NN. We do not see too many FBR 
retransmits even on the clusters with 4,000 or 6,000 nodes.  However, when a 
massive node loss and reregistration happens due to some issue, the active NN 
can easily be overwhelmed since there is also a large number of user requests 
in the queue. That's when we usually see this kind of issue.

[~daryn]'s replacement for the FBR lease is close to what he suggested to the 
author of the feature during its development. Once we verify it works at 
scale, we will submit the patch.

> Solve the problem of DN repeated block reports occupying too many RPCs during 
> Safemode
> --
>
> Key: HDFS-15901
> URL: https://issues.apache.org/jira/browse/HDFS-15901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the cluster exceeds thousands of nodes, we want to restart the NameNode 
> service, and all DataNodes send a full Block action to the NameNode. During 
> SafeMode, some DataNodes may send blocks to NameNode multiple times, which 
> will take up too much RPC. In fact, this is unnecessary.
> In this case, some block report leases will fail or time out, and in extreme 
> cases, the NameNode will always stay in Safe Mode.
> 2021-03-14 08:16:25,873 [78438700] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(:port, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-14 08:16:31,521 [78444348] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-13 18:35:38,200 [29191027] - WARN  [Block report 
> processor:BlockReportLeaseManager@311] - BR lease 0x is not valid for 
> DN , because the DN is not in the pending set.
> 2021-03-13 18:36:08,143 [29220970] - WARN  [Block report 
> processor:BlockReportLeaseManager@311] - BR lease 0x is not valid for 
> DN , because the DN is not in the pending set.
> 2021-03-13 18:36:08,143 [29220970] - WARN  [Block report 
> processor:BlockReportLeaseManager@317] - BR lease 0x is not valid for 
> DN , because the lease has expired.
> 2021-03-13 18:36:08,145 [29220972] - WARN  [Block report 
> processor:BlockReportLeaseManager@317] - BR lease 0x is not valid for 
> DN , because the lease has expired.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15868) Possible Resource Leak in EditLogFileOutputStream

2021-03-19 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304911#comment-17304911
 ] 

Kihwal Lee commented on HDFS-15868:
---

It is a good change. It was probably overlooked in the original design 
because, back then, once an exception was thrown the namenode died anyway. 
That hasn't been true for quite some time.
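
A minimal sketch of the pattern involved, assuming the two-step open in the 
constructor; this is illustrative only, not the actual patch:

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import org.apache.hadoop.io.IOUtils;

class EditLogFileOutputStreamSketch {
  private final RandomAccessFile rp;
  private final FileOutputStream fp;

  EditLogFileOutputStreamSketch(File name) throws IOException {
    rp = new RandomAccessFile(name, "rw");
    try {
      // If this throws, rp would otherwise become unreachable and leak a
      // file descriptor, since no caller can close it.
      fp = new FileOutputStream(rp.getFD());
    } catch (IOException e) {
      IOUtils.closeStream(rp);
      throw e;
    }
  }
}
{code}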

> Possible Resource Leak in EditLogFileOutputStream
> -
>
> Key: HDFS-15868
> URL: https://issues.apache.org/jira/browse/HDFS-15868
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Narges Shadab
>Assignee: Narges Shadab
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We noticed a possible resource leak 
> [here|https://github.com/apache/hadoop/blob/1f1a1ef52df896a2b66b16f5bbc17aa39b1a1dd7/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileOutputStream.java#L91].
>  If an I/O error occurs at line 91, rp remains open since the exception isn't 
> caught locally, and there is no way for any caller to close the 
> RandomAccessFile.
>  I'll submit a pull request to fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15901) Solve the problem of DN repeated block reports occupying too many RPCs during Safemode

2021-03-18 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304503#comment-17304503
 ] 

Kihwal Lee edited comment on HDFS-15901 at 3/18/21, 10:32 PM:
--

The block report lease feature is supposed to improve this, but it ended up 
causing more problems in our experience.  One of the main reasons for 
duplicate reporting is the lack of an ability to retransmit a single report on 
RPC timeout.  On startup, the NN's call queue can easily be overwhelmed since 
FBR processing is relatively slow. It is common to see the processing of a 
single storage take hundreds of milliseconds, so a half dozen storage reports 
can take a whole second. You can easily imagine more than 60 seconds' worth of 
reports waiting in the call queue, which will cause a timeout for some of the 
reports.  Unfortunately, the datanode's full block reporting does not 
retransmit only the affected report.  It regenerates the whole thing and 
starts all over again.  Even if only the last storage FBR had trouble, it will 
retransmit everything again.

The reason why it sometimes gets stuck in safe mode is likely the curse of the 
block report lease. When an FBR is retransmitted, the feature makes the NN 
drop the reports.  We have seen this happen in big clusters.  If the block 
report lease weren't there, it wouldn't have gotten stuck in safe mode.

We have recently gutted the FBR lease feature internally and implemented a new 
block report flow control system.  It was designed by [~daryn].  It hasn't 
been fully tested yet, so we haven't shared it with the community. 


was (Author: kihwal):
The block report lease feature is supposed to improve this, but it ended up 
causing more problems in our experiences.  One of the main reasons of duplicate 
reporting is lack of ability to retransmit single report on rpc timeout.  On 
startup, the NN's call queue can be easily overwhelmed since the FBR processing 
relatively slow. It is common to see a processing of a single storage taking 
100s of milliseconds. A half dozen storage reports can take up a while second. 
If you have enough in the call queue, the queue time can easily exceed the 60 
second timeout for some of the nodes. Unfortunately, datanode's full block 
reporting does not retransmit the affected report only.  It regenerates the 
whole thing and start all over again.  Even if only the last storage FBR had a 
trouble, it will retransmit everything again.

The reason why it sometimes stuck in safe mode is likely the curse of the block 
report lease. When FBR is retransmitted, the feature will make the NN to drop 
the reports.  We have seen this happening in big clusters.  If the block report 
lease wasn't there, it wouldn't have stuck in safe mode.

We have recently gut out the FBR lease feature internally and implemented a new 
block report flow control system.  It was designed by [~daryn].  It hasn't been 
tested fully yet, so we haven't shared it with the community. 

> Solve the problem of DN repeated block reports occupying too many RPCs during 
> Safemode
> --
>
> Key: HDFS-15901
> URL: https://issues.apache.org/jira/browse/HDFS-15901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the cluster exceeds thousands of nodes, we want to restart the NameNode 
> service, and all DataNodes send a full Block action to the NameNode. During 
> SafeMode, some DataNodes may send blocks to NameNode multiple times, which 
> will take up too much RPC. In fact, this is unnecessary.
> In this case, some block report leases will fail or time out, and in extreme 
> cases, the NameNode will always stay in Safe Mode.
> 2021-03-14 08:16:25,873 [78438700] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(:port, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-14 08:16:31,521 [78444348] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-13 18:35:38,200 [29191027] - WARN  [Block report 
> processor:BlockReportLeaseManager@311] - BR lease 0x is not valid for 
> DN , because the DN is not in the pending 

[jira] [Commented] (HDFS-15901) Solve the problem of DN repeated block reports occupying too many RPCs during Safemode

2021-03-18 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304503#comment-17304503
 ] 

Kihwal Lee commented on HDFS-15901:
---

The block report lease feature is supposed to improve this, but it ended up 
causing more problems in our experience.  One of the main reasons for 
duplicate reporting is the lack of an ability to retransmit a single report on 
RPC timeout.  On startup, the NN's call queue can easily be overwhelmed since 
FBR processing is relatively slow. It is common to see the processing of a 
single storage take hundreds of milliseconds, so a half dozen storage reports 
can take a whole second. If there is enough in the call queue, the queue time 
can easily exceed the 60-second timeout for some of the nodes. Unfortunately, 
the datanode's full block reporting does not retransmit only the affected 
report.  It regenerates the whole thing and starts all over again.  Even if 
only the last storage FBR had trouble, it will retransmit everything again.

The reason why it sometimes gets stuck in safe mode is likely the curse of the 
block report lease. When an FBR is retransmitted, the feature makes the NN 
drop the reports.  We have seen this happen in big clusters.  If the block 
report lease weren't there, it wouldn't have gotten stuck in safe mode.

We have recently gutted the FBR lease feature internally and implemented a new 
block report flow control system.  It was designed by [~daryn].  It hasn't 
been fully tested yet, so we haven't shared it with the community. 

> Solve the problem of DN repeated block reports occupying too many RPCs during 
> Safemode
> --
>
> Key: HDFS-15901
> URL: https://issues.apache.org/jira/browse/HDFS-15901
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the cluster exceeds thousands of nodes, we want to restart the NameNode 
> service, and all DataNodes send a full Block action to the NameNode. During 
> SafeMode, some DataNodes may send blocks to NameNode multiple times, which 
> will take up too much RPC. In fact, this is unnecessary.
> In this case, some block report leases will fail or time out, and in extreme 
> cases, the NameNode will always stay in Safe Mode.
> 2021-03-14 08:16:25,873 [78438700] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(:port, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-14 08:16:31,521 [78444348] - INFO  [Block report 
> processor:BlockManager@2158] - BLOCK* processReport 0xe: discarded 
> non-initial block report from DatanodeRegistration(, 
> datanodeUuid=, infoPort=, infoSecurePort=, 
> ipcPort=, storageInfo=lv=;nsid=;c=0) because namenode 
> still in startup phase
> 2021-03-13 18:35:38,200 [29191027] - WARN  [Block report 
> processor:BlockReportLeaseManager@311] - BR lease 0x is not valid for 
> DN , because the DN is not in the pending set.
> 2021-03-13 18:36:08,143 [29220970] - WARN  [Block report 
> processor:BlockReportLeaseManager@311] - BR lease 0x is not valid for 
> DN , because the DN is not in the pending set.
> 2021-03-13 18:36:08,143 [29220970] - WARN  [Block report 
> processor:BlockReportLeaseManager@317] - BR lease 0x is not valid for 
> DN , because the lease has expired.
> 2021-03-13 18:36:08,145 [29220972] - WARN  [Block report 
> processor:BlockReportLeaseManager@317] - BR lease 0x is not valid for 
> DN , because the lease has expired.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15837) Incorrect bytes causing block corruption after update pipeline and recovery failure

2021-02-17 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286111#comment-17286111
 ] 

Kihwal Lee commented on HDFS-15837:
---

The successful pipeline update with 3072 bytes was at 03:42:40.  The 
replication of a 3544-byte block and the corruption detection came 4 minutes 
later. The client must have written more after the recovery and then closed 
the file. Replication happens only after the file is closed, so it looks like 
the file was closed at around 03:46:40.

I don't think the size difference observed on the datanodes is an anomaly. 
Clients usually write more data after a pipeline recovery. Does the file still 
exist? What size do you see when you do "hadoop fs -ls /path_to_the_file"?  Is 
the block appearing as missing on the UI? 

If there was a corruption issue, I don't think it was because of the size.  
The fact that the replication happened means the NN agreed with the size 3544. 
If the datanode had an unexpected replica size, the transfer wouldn't have 
happened.  The corruption report came from the destination node of the 
replication, 172.21.234.181.  The replication source does not perform any 
checks and there is no feedback mechanism for reporting errors back to the 
source node. So even if a corruption was detected during the replication, only 
the target (172.21.234.181) would notice it.  Check the log of 172.21.234.181 
at around 03:46:43; it must have detected the corruption and logged it.

The replica on the source (172.21.226.26) must be the only copy, and if it is 
corrupt, it will show up as missing. If the file still exists, you can get to 
the node and run the "hdfs debug verifyMeta" command against the block file 
and the meta file to manually check the integrity.
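
The paths below are only placeholders; the block and meta file names follow 
the pattern from the logs above:

{noformat}
# Run on the source datanode (172.21.226.26) against the on-disk replica files.
hdfs debug verifyMeta -block /data/1/dfs/dn/current/.../blk_1342440165 \
  -meta /data/1/dfs/dn/current/.../blk_1342440165_272630578.meta
{noformat}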

If it is really corrupt, it could be due to a local issue on 172.21.226.26.  
Extreme load can cause write failures and pipeline updates, but we haven't 
seen such a condition cause data corruption.

> Incorrect bytes causing block corruption after update pipeline and recovery 
> failure
> ---
>
> Key: HDFS-15837
> URL: https://issues.apache.org/jira/browse/HDFS-15837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.8.5
>Reporter: Udit Mehrotra
>Priority: Major
>
> We are seeing cases on HDFS blocks being marked as *bad* after the initial 
> block receive fails during *update pipeline* followed by *HDFS* *recovery* 
> for the block failing as well. Here is the life cycle of a block 
> *{{blk_1342440165_272630578}}* that was ultimately marked as corrupt:
> 1. The block creation starts at name node as part of *update pipeline* 
> process:
> {noformat}
> 2021-01-25 03:41:17,335 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem (IPC Server handler 61 on 
> 8020): updatePipeline(blk_1342440165_272500939 => blk_1342440165_272630578) 
> success{noformat}
> 2. The block receiver on the data node fails with a socket timeout exception, 
> and so do the retries:
> {noformat}
> 2021-01-25 03:42:22,525 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (PacketResponder: 
> BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, 
> type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]): 
> PacketResponder: 
> BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, 
> type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]
> java.net.SocketTimeoutException: 65000 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/172.21.226.26:56294 remote=/172.21.246.239:50010]
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:400)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1305)
> at java.lang.Thread.run(Thread.java:748)
> 2021-01-25 03:42:22,526 WARN org.apache.hadoop.hdfs.server.datanode.DataNode 
> (PacketResponder: 
> BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, 
> type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]): 
> IOException in BlockReceiver.run(): 
> java.io.IOException: Connection 

[jira] [Commented] (HDFS-15837) Incorrect bytes causing block corruption after update pipeline and recovery failure

2021-02-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285448#comment-17285448
 ] 

Kihwal Lee commented on HDFS-15837:
---

What is the version of Hadoop you are using?  We've seen similar issues with 
varying causes.  Was this file closed successfully with the size 3072?

> Incorrect bytes causing block corruption after update pipeline and recovery 
> failure
> ---
>
> Key: HDFS-15837
> URL: https://issues.apache.org/jira/browse/HDFS-15837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.8.5
>Reporter: Udit Mehrotra
>Priority: Major
>
> We are seeing cases on HDFS blocks being marked as *bad* after the initial 
> block receive fails during *update pipeline* followed by *HDFS* *recovery* 
> for the block failing as well. Here is the life cycle of a block 
> *{{blk_1342440165_272630578}}* that was ultimately marked as corrupt:
> 1. The block creation starts at name node as part of *update pipeline* 
> process:
> {noformat}
> 2021-01-25 03:41:17,335 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem (IPC Server handler 61 on 
> 8020): updatePipeline(blk_1342440165_272500939 => blk_1342440165_272630578) 
> success{noformat}
> 2. The block receiver on the data node fails with a socket timeout exception, 
> and so do the retries:
> {noformat}
> 2021-01-25 03:42:22,525 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (PacketResponder: 
> BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, 
> type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]): 
> PacketResponder: 
> BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, 
> type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]
> java.net.SocketTimeoutException: 65000 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/172.21.226.26:56294 remote=/172.21.246.239:50010]
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:400)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1305)
> at java.lang.Thread.run(Thread.java:748)
> 2021-01-25 03:42:22,526 WARN org.apache.hadoop.hdfs.server.datanode.DataNode 
> (PacketResponder: 
> BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, 
> type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]): 
> IOException in BlockReceiver.run(): 
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
> at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1552)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1489)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1402)
> at java.lang.Thread.run(Thread.java:748)
> 2021-01-25 03:42:22,526 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (PacketResponder: 
> BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, 
> type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]): 
> PacketResponder: 
> 

[jira] [Resolved] (HDFS-15825) Using a cryptographically weak Pseudo Random Number Generator (PRNG)

2021-02-08 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-15825.
---
Resolution: Invalid

> Using a cryptographically weak Pseudo Random Number Generator (PRNG)
> 
>
> Key: HDFS-15825
> URL: https://issues.apache.org/jira/browse/HDFS-15825
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vicky Zhang
>Priority: Major
>
> We are a security research team at Virginia Tech. We are doing an empirical 
> study about the usefulness of the existing security vulnerability detection 
> tools. The following is a reported vulnerability by certain tools. We'll so 
> appreciate it if you can give any feedback on it.
> *Vulnerability Description*
> In file src/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java, use 
> java.util.Random instead of java.security.SecureRandom at Line 617.
> *Security Impact:*
> Java.util.Random is not cryptographically strong and may expose sensitive 
> information to certain types of attacks when used in a security context.
> *Useful Resources*:
> [https://cwe.mitre.org/data/definitions/338.html]
> *Solution we suggest*
> Replace it with SecureRandom
> *Please share with us your opinions/comments if there is any*
> Is the bug report helpful?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15825) Using a cryptographically weak Pseudo Random Number Generator (PRNG)

2021-02-08 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281142#comment-17281142
 ] 

Kihwal Lee commented on HDFS-15825:
---

[~Vicky Zhang], if you want to scan the code base, please do not include 
obsolete branches.  Please focus on trunk, branch-3.3 and  branch-3.2.

NNStorage currently uses {{ThreadLocalRandom}}. This is not for security, thus 
CWE-338 does not apply.

> Using a cryptographically weak Pseudo Random Number Generator (PRNG)
> 
>
> Key: HDFS-15825
> URL: https://issues.apache.org/jira/browse/HDFS-15825
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vicky Zhang
>Priority: Major
>
> We are a security research team at Virginia Tech. We are doing an empirical 
> study about the usefulness of the existing security vulnerability detection 
> tools. The following is a reported vulnerability by certain tools. We'll so 
> appreciate it if you can give any feedback on it.
> *Vulnerability Description*
> In file src/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java, use 
> java.util.Random instead of java.security.SecureRandom at Line 617.
> *Security Impact:*
> Java.util.Random is not cryptographically strong and may expose sensitive 
> information to certain types of attacks when used in a security context.
> *Useful Resources*:
> [https://cwe.mitre.org/data/definitions/338.html]
> *Solution we suggest*
> Replace it with SecureRandom
> *Please share with us your opinions/comments if there is any*
> Is the bug report helpful?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15824) Update to enable TLS >=1.2 as default secure protocols

2021-02-08 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-15824.
---
Resolution: Invalid

> Update to enable TLS >=1.2 as default secure protocols 
> ---
>
> Key: HDFS-15824
> URL: https://issues.apache.org/jira/browse/HDFS-15824
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: contrib/hdfsproxy
>Reporter: Vicky Zhang
>Priority: Major
>
> in file 
> src/contrib/hdfsproxy/src/java/org/apache/hadoop/hdfsproxy/ProxyUtil.java, 
> line 125, the SSL protocol is used in statement:  SSLContext sc = 
> SSLContext.getInstance("SSL");
> *Impact:* 
> An SSL DDoS attack targets the SSL handshake protocol either by sending 
> worthless data to the SSL server which will result in connection issues for 
> legitimate users or by abusing the SSL handshake protocol itself.
> *Suggestions:*
> Upgrade the implementation to the “TLS”, and configure https.protocols JVM 
> option to include TLSv1.2:
> *Useful links:*
> [https://blogs.oracle.com/java-platform-group/diagnosing-tls,-ssl,-and-https]
> [https://www.appmarq.com/public/tqi,1039002,CWE-319-Avoid-using-Deprecated-SSL-protocols-to-secure-connection]
> *Please share with us your opinions/comments if there is any:*
> Is the bug report helpful?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15824) Update to enable TLS >=1.2 as default secure protocols

2021-02-08 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281138#comment-17281138
 ] 

Kihwal Lee commented on HDFS-15824:
---

[~Vicky Zhang]  This is from an old branch that has been EOL'ed for a long time.

> Update to enable TLS >=1.2 as default secure protocols 
> ---
>
> Key: HDFS-15824
> URL: https://issues.apache.org/jira/browse/HDFS-15824
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: contrib/hdfsproxy
>Reporter: Vicky Zhang
>Priority: Major
>
> in file 
> src/contrib/hdfsproxy/src/java/org/apache/hadoop/hdfsproxy/ProxyUtil.java, 
> line 125, the SSL protocol is used in statement:  SSLContext sc = 
> SSLContext.getInstance("SSL");
> *Impact:* 
> An SSL DDoS attack targets the SSL handshake protocol either by sending 
> worthless data to the SSL server which will result in connection issues for 
> legitimate users or by abusing the SSL handshake protocol itself.
> *Suggestions:*
> Upgrade the implementation to the “TLS”, and configure https.protocols JVM 
> option to include TLSv1.2:
> *Useful links:*
> [https://blogs.oracle.com/java-platform-group/diagnosing-tls,-ssl,-and-https]
> [https://www.appmarq.com/public/tqi,1039002,CWE-319-Avoid-using-Deprecated-SSL-protocols-to-secure-connection]
> *Please share with us your opinions/comments if there is any:*
> Is the bug report helpful?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15822) Client retry mechanism may invalid when use hedgedRead

2021-02-05 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279746#comment-17279746
 ] 

Kihwal Lee commented on HDFS-15822:
---

Hedged read has been known to be buggy. When there is an exception in one 
datanode, it does not recover well.  Multiple jiras have been filed in the past 
regarding its flaws, e.g. HDFS-10597, HDFS-12971 and HDFS-15407. See if your 
patch addresses the issues described there. You can dupe the Jira if you think 
your change covers it.

> Client retry mechanism may invalid when use hedgedRead
> --
>
> Key: HDFS-15822
> URL: https://issues.apache.org/jira/browse/HDFS-15822
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: tianhang tang
>Assignee: tianhang tang
>Priority: Major
> Attachments: HDFS-15822.001.patch
>
>
> Hedged read uses ignoreNodes to ensure that multiple requests fall on 
> different nodes. But the ignoreNodes set is never cleared. So if all requests 
> of the first round failed and the refetched locations have not changed, the 
> HDFS client will not request the same nodes that are in ignoreNodes. It just 
> sleeps repeatedly until it reaches the retry limit, then throws an exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15813) DataStreamer: keep sending heartbeat packets while streaming

2021-02-04 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279230#comment-17279230
 ] 

Kihwal Lee commented on HDFS-15813:
---

+1. Unit test failures seem unrelated.  If you can't find an existing Jira for 
the failures, please file one for each.  I've looked at 
{{TestUnderReplicatedBlocks#testSetRepIncWithUnderReplicatedBlocks}} briefly. 
It appears to be a test issue.

The test artificially invalidated a replica on a node, but before the test 
made further progress, the NN fixed the under-replication by having another 
node send the block to the same node.  The test then went ahead and removed it 
from the NN's data structure (blocksmap) and called {{setReplication()}}. The 
NN picked two nodes, but one of them was the node that already had the block 
replica; it was only missing from the NN's data structure. Again, this 
happened because the NN fixed the under-replication between the test deleting 
the replica and modifying the NN data structure. The replication failed with 
{{ReplicaAlreadyExistsException}}.  This kind of inconsistency does not happen 
in real clusters, but even if it did, it would be fixed when the replication 
times out.  The test is set to time out before the default replication 
timeout, so it didn't have a chance to do that. 

> DataStreamer: keep sending heartbeat packets while streaming
> 
>
> Key: HDFS-15813
> URL: https://issues.apache.org/jira/browse/HDFS-15813
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: HDFS-15813.001.patch, HDFS-15813.002.patch, 
> HDFS-15813.003.patch, HDFS-15813.004.patch
>
>
> In response to [HDFS-5032], [~daryn] made a change to our internal code to 
> ensure that heartbeats continue during data streaming, even in the face of a 
> slow disk.
> As [~kihwal] noted, absence of heartbeat during flush will be fixed in a 
> separate jira.  It doesn't look like this change was ever pushed back to 
> apache, so I am providing it here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15813) DataStreamer: keep sending heartbeat packets while streaming

2021-02-03 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278224#comment-17278224
 ] 

Kihwal Lee edited comment on HDFS-15813 at 2/3/21, 5:19 PM:


As for the safety of the change, I feel confident since this has been running 
in our production for many years. But to prevent future breakages, we might 
need a unit test.

It might be possible to simulate the condition by delaying the ACK for the 
last packet using {{DataNodeFaultInjector}}. If we do 1) write data, 2) 
hflush, 3) enable the fault injector and 4) close(), the existing 
{{delaySendingAckToUpstream()}} may be utilized.  Just one datanode is needed. 
Even if the ACK is delayed beyond the value of "dfs.client.socket-timeout", 
the pipeline should not break. Because of the heartbeat packets, the 
BlockReceiver in the datanode won't get a read timeout while the client is 
waiting for the last ack.
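
A rough sketch of such a test, assuming the existing 
{{DataNodeFaultInjector#delaySendingAckToUpstream()}} hook; the test class 
name is made up and the exact signatures should be checked against the target 
branch:

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.server.datanode.DataNodeFaultInjector;
import org.junit.Test;

public class TestDataStreamerHeartbeatSketch {
  @Test
  public void testPipelineSurvivesDelayedLastAck() throws Exception {
    Configuration conf = new Configuration();
    // Short client read timeout so a delayed last ack would normally break
    // the pipeline without the heartbeat packets.
    conf.setInt("dfs.client.socket-timeout", 5000);
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    DataNodeFaultInjector oldInjector = DataNodeFaultInjector.get();
    try {
      FileSystem fs = cluster.getFileSystem();
      FSDataOutputStream out = fs.create(new Path("/heartbeat-test"));
      out.write(new byte[4096]);
      out.hflush();
      // Delay the ack of the last packet past the client socket timeout.
      DataNodeFaultInjector.set(new DataNodeFaultInjector() {
        @Override
        public void delaySendingAckToUpstream(String upstreamAddr) {
          try {
            TimeUnit.SECONDS.sleep(10);
          } catch (InterruptedException ignored) {
          }
        }
      });
      // With heartbeat packets flowing, close() should still succeed.
      out.close();
    } finally {
      DataNodeFaultInjector.set(oldInjector);
      cluster.shutdown();
    }
  }
}
{code}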


was (Author: kihwal):
As for the safety of the change, I feel confident since this has been running 
in our production for many years. But for preventing future breakages, we might 
need a unit test.

It might be possible to simulate the condition by delaying the ACK for last 
packet using {{DataNodeFaultInjector}}. If we do 1) write data 2) hflush, 3) 
enable the fault injector and 3) close(), the existing 
{{delaySendingAckToUpstream()}} may be utilized.  Just one datanode is needed. 
Even if the ACK is delayed beyond the value of 
"ipc.client.connection.maxidletime", the pipeline should not break. Because of 
the heartbeat packets, the BlockReceiver in the datanode won't get a read 
timeout while the client is waiting for the last ack.

> DataStreamer: keep sending heartbeat packets while streaming
> 
>
> Key: HDFS-15813
> URL: https://issues.apache.org/jira/browse/HDFS-15813
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: HDFS-15813.001.patch, HDFS-15813.002.patch
>
>
> In response to [HDFS-5032], [~daryn] made a change to our internal code to 
> ensure that heartbeats continue during data streaming, even in the face of a 
> slow disk.
> As [~kihwal] noted, absence of heartbeat during flush will be fixed in a 
> separate jira.  It doesn't look like this change was ever pushed back to 
> apache, so I am providing it here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12763) DataStreamer should heartbeat during flush

2021-02-03 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-12763:
--
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

> DataStreamer should heartbeat during flush
> --
>
> Key: HDFS-12763
> URL: https://issues.apache.org/jira/browse/HDFS-12763
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: HDFS-12763.001.patch
>
>
> From HDFS-5032:
> bq. Absence of heartbeat during flush will be fixed in a separate jira by 
> Daryn Sharp
> This JIRA tracks the case where absence of heartbeat can cause the pipeline 
> to fail if operations like flush take some time to complete.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15813) DataStreamer: keep sending heartbeat packets while streaming

2021-02-03 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278224#comment-17278224
 ] 

Kihwal Lee commented on HDFS-15813:
---

As for the safety of the change, I feel confident since this has been running 
in our production for many years. But for preventing future breakages, we might 
need a unit test.

It might be possible to simulate the condition by delaying the ACK for last 
packet using {{DataNodeFaultInjector}}. If we do 1) write data, 2) hflush, 3) 
enable the fault injector and 4) close(), the existing 
{{delaySendingAckToUpstream()}} may be utilized.  Just one datanode is needed. 
Even if the ACK is delayed beyond the value of 
"ipc.client.connection.maxidletime", the pipeline should not break. Because of 
the heartbeat packets, the BlockReceiver in the datanode won't get a read 
timeout while the client is waiting for the last ack.

> DataStreamer: keep sending heartbeat packets while streaming
> 
>
> Key: HDFS-15813
> URL: https://issues.apache.org/jira/browse/HDFS-15813
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: HDFS-15813.001.patch, HDFS-15813.002.patch
>
>
> In response to [HDFS-5032], [~daryn] made a change to our internal code to 
> ensure that heartbeats continue during data streaming, even in the face of a 
> slow disk.
> As [~kihwal] noted, absence of heartbeat during flush will be fixed in a 
> separate jira.  It doesn't look like this change was ever pushed back to 
> apache, so I am providing it here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15799) Make DisallowedDatanodeException terse

2021-02-03 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15799:
--
Fix Version/s: 3.2.3
   3.4.0
   3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk, branch-3.3 and branch-3.2. Thanks for working on 
this, [~richard-ross].

> Make DisallowedDatanodeException terse
> --
>
> Key: HDFS-15799
> URL: https://issues.apache.org/jira/browse/HDFS-15799
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.10.1, 3.4.0
>Reporter: Richard
>Assignee: Richard
>Priority: Minor
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15799.001.patch
>
>
> When org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException is 
> thrown back to a datanode, the namenode logs a full stack trace.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15799) Make DisallowedDatanodeException terse

2021-02-03 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278040#comment-17278040
 ] 

Kihwal Lee commented on HDFS-15799:
---

+1 lgtm. The failed tests don't seem related to the patch.

> Make DisallowedDatanodeException terse
> --
>
> Key: HDFS-15799
> URL: https://issues.apache.org/jira/browse/HDFS-15799
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.10.1, 3.4.0
>Reporter: Richard
>Assignee: Richard
>Priority: Minor
> Attachments: HDFS-15799.001.patch
>
>
> When org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException is 
> thrown back to a datanode, the namenode logs a full stack trace.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15789) Lease renewal does not require namesystem lock

2021-01-27 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273129#comment-17273129
 ] 

Kihwal Lee commented on HDFS-15789:
---

This is a safe change. 

The FSN lock only protects the NN against renewing a lease during an HA 
transition, which should be done only by the active NN.  So after this patch, 
there can be a case where a lease renewal request is received and starts being 
processed while a NN is active, but finishes processing during or after the 
transition to standby.  However, this does not affect file system 
consistency or violate the existing file system API semantics.  The important 
states are whether the file is open and who has the lease. Anything that changes 
these states is edit-logged.  The renewal does not revive expired/revoked 
leases and is thus not edit-logged.

+1 for the patch.
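
To illustrate the pattern being approved here, a minimal sketch is below. The names are stand-ins and this is not the actual FSNamesystem/LeaseManager code; the point is only that the lease manager's own synchronization is sufficient, so the namesystem lock is not needed.

{code:java}
// Illustrative sketch only; not the real implementation.
class NameNodeSketch {
  private final LeaseManagerSketch leaseManager = new LeaseManagerSketch();

  void renewLease(String holder) throws StandbyException {
    // Only the HA-state check remains; no namesystem read/write lock is taken.
    // A renewal racing with a transition to standby changes no edit-logged state.
    checkOperationAllowed();
    leaseManager.renewLease(holder);
  }

  private void checkOperationAllowed() throws StandbyException {
    // reject the call if this NN is in standby state
  }

  static class LeaseManagerSketch {
    private final java.util.Map<String, Long> lastRenewed = new java.util.HashMap<>();

    // The lease manager guards its own state; callers need no external lock.
    synchronized void renewLease(String holder) {
      lastRenewed.put(holder, System.currentTimeMillis());
    }
  }

  static class StandbyException extends Exception {}
}
{code}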

> Lease renewal does not require namesystem lock
> --
>
> Key: HDFS-15789
> URL: https://issues.apache.org/jira/browse/HDFS-15789
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: HDFS-15789.001.patch
>
>
> [~daryn] found this while testing the performance for HDFS-15704.
> The lease manager is independent of the namesystem. Acquiring the lock causes 
> unnecessary lock contention that degrades throughput.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10498) Intermittent test failure org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength.testSnapshotfileLength

2021-01-27 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273123#comment-17273123
 ] 

Kihwal Lee commented on HDFS-10498:
---

+1

> Intermittent test failure 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength.testSnapshotfileLength
> ---
>
> Key: HDFS-10498
> URL: https://issues.apache.org/jira/browse/HDFS-10498
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, snapshots
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Jim Brennan
>Priority: Major
> Attachments: HDFS-10498.001.patch, test_failure.txt
>
>
> Error Details
> Per https://builds.apache.org/job/PreCommit-HDFS-Build/15646/testReport/, we 
> had the following failure. Local rerun is successful.
> Error Details:
> {panel}
> Fail to get block MD5 for 
> LocatedBlock{BP-145245805-172.17.0.3-1464981728847:blk_1073741826_1002; 
> getBlockSize()=1; corrupt=false; offset=1024; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:55764,DS-a33d7c97-9d4a-4694-a47e-a3187a33ed5a,DISK]]}
> {panel}
> Stack Trace: 
> {panel}
> java.io.IOException: Fail to get block MD5 for 
> LocatedBlock{BP-145245805-172.17.0.3-1464981728847:blk_1073741826_1002; 
> getBlockSize()=1; corrupt=false; offset=1024; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:55764,DS-a33d7c97-9d4a-4694-a47e-a3187a33ed5a,DISK]]}
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$ReplicatedFileChecksumComputer.checksumBlocks(FileChecksumHelper.java:289)
>   at 
> org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:206)
>   at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1731)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$31.doCall(DistributedFileSystem.java:1482)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$31.doCall(DistributedFileSystem.java:1479)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1490)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength.testSnapshotfileLength(TestSnapshotFileLength.java:137)
>  Standard Output  7 sec
> {panel}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15794) IBR and FBR use different queues to load data.

2021-01-27 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272999#comment-17272999
 ] 

Kihwal Lee edited comment on HDFS-15794 at 1/27/21, 4:58 PM:
-

{quote}The problem here is that when the NameNode is blocked in processing the 
IBR, the FBR requested by the DN from the NameNode will be affected. Similarly, 
when the NameNode processing FBR is blocked.
{quote}

The serial processing of IBR and FBR is not a side-effect of the way a data 
structure is used (single queue).  In current namespace and block manager 
design, each report is processed with the fsn write lock held. The queue made 
it possible to process multiple IBRs under one lock, thus increasing 
throughput.  Having multiple queues for IBRs and FBRs won't help with 
concurrency. In fact, it will complicate things, as it needs to maintain 
certain processing order across multiple queues.

In order to make a meaningful performance improvement, we have to make NN 
perform concurrent block report processing. 


was (Author: kihwal):
{quote}The problem here is that when the NameNode is blocked in processing the 
IBR, the FBR requested by the DN from the NameNode will be affected. Similarly, 
when the NameNode processing FBR is blocked.
{quote}

The serial processing of IBR and FBR is not a side-effect of way a data 
structure is used (single queue).  In current namespace and block manager 
design, each report is processed with the fsn write lock held. The queue made 
it possible to process multiple IBRs under one lock, thus increasing 
throughput.  Having multiple queues for IBRs and FBRs won't help with 
concurrency. In fact, it will complicate things, as it needs to maintain 
certain processing order across multiple queues.

In order to make a meaningful performance improvement, we have to make NN 
perform concurrent block report processing. 

> IBR and FBR use different queues to load data.
> --
>
> Key: HDFS-15794
> URL: https://issues.apache.org/jira/browse/HDFS-15794
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>
> When DataNode reports data to NameNode, IBR and FBR are included here.
> After the NameNode receives the DataNode request, it temporarily stores the 
> data in a queue, here it refers to 
> BlockManager#BlockReportProcessingThread#queue.
> NameNodeRpcServer#blockReport()
> for (int r = 0; r < reports.length; r++) {
>  final BlockListAsLongs blocks = reports[r].getBlocks();
>  final int index = r;
>  noStaleStorages = bm.runBlockOp(() ->
>  bm.processReport(nodeReg, reports[index].getStorage(),
>  blocks, context));
>  }
> NameNodeRpcServer#blockReceivedAndDeleted()
> for (final StorageReceivedDeletedBlocks r: receivedAndDeletedBlocks) {
>  bm.enqueueBlockOp(new Runnable() {
>  @Override
>  public void run() {
>  try {
>  namesystem.processIncrementalBlockReport(nodeReg, r);
>  } catch (Exception ex) {
>  // usually because the node is unregistered/dead. next heartbeat
>  // will correct the problem
>  blockStateChangeLog.error(
>  "*BLOCK* NameNode.blockReceivedAndDeleted: "
>  + "failed from "+ nodeReg + ":" + ex.getMessage());
>  }
>  }
>  });
>  }
> The problem here is that when the NameNode is blocked in processing the IBR, 
> the FBR requested by the DN from the NameNode will be affected. Similarly, 
> when the NameNode processing FBR is blocked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15794) IBR and FBR use different queues to load data.

2021-01-27 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272999#comment-17272999
 ] 

Kihwal Lee commented on HDFS-15794:
---

{quote}The problem here is that when the NameNode is blocked in processing the 
IBR, the FBR requested by the DN from the NameNode will be affected. Similarly, 
when the NameNode processing FBR is blocked.
{quote}

The serial processing of IBR and FBR is not a side-effect of the way a data 
structure is used (single queue).  In current namespace and block manager 
design, each report is processed with the fsn write lock held. The queue made 
it possible to process multiple IBRs under one lock, thus increasing 
throughput.  Having multiple queues for IBRs and FBRs won't help with 
concurrency. In fact, it will complicate things, as it needs to maintain 
certain processing order across multiple queues.

In order to make a meaningful performance improvement, we have to make NN 
perform concurrent block report processing. 
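
To illustrate why the single queue helps (a generic sketch with made-up names and limits, not the actual BlockReportProcessingThread code), several queued report operations can be drained while the write lock is taken once:

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic illustration of batching queued ops under one lock hold.
class ReportQueueSketch {
  private final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1024);
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  private static final int MAX_OPS_PER_LOCK_HOLD = 64;

  void enqueue(Runnable op) throws InterruptedException {
    queue.put(op);
  }

  void processQueueLoop() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      Runnable op = queue.take();          // wait for the first queued IBR/FBR op
      fsnLock.writeLock().lock();
      try {
        int done = 0;
        do {
          op.run();                        // apply one report op under the lock
          done++;
        } while (done < MAX_OPS_PER_LOCK_HOLD && (op = queue.poll()) != null);
      } finally {
        fsnLock.writeLock().unlock();
      }
    }
  }
}
{code}

Splitting IBRs and FBRs into separate queues does not change this picture; both kinds of reports still have to be applied under the same write lock.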

> IBR and FBR use different queues to load data.
> --
>
> Key: HDFS-15794
> URL: https://issues.apache.org/jira/browse/HDFS-15794
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>
> When DataNode reports data to NameNode, IBR and FBR are included here.
> After the NameNode receives the DataNode request, it temporarily stores the 
> data in a queue, here it refers to 
> BlockManager#BlockReportProcessingThread#queue.
> NameNodeRpcServer#blockReport()
> for (int r = 0; r < reports.length; r++) {
>  final BlockListAsLongs blocks = reports[r].getBlocks();
>  final int index = r;
>  noStaleStorages = bm.runBlockOp(() ->
>  bm.processReport(nodeReg, reports[index].getStorage(),
>  blocks, context));
>  }
> NameNodeRpcServer#blockReceivedAndDeleted()
> for (final StorageReceivedDeletedBlocks r: receivedAndDeletedBlocks) {
>  bm.enqueueBlockOp(new Runnable() {
>  @Override
>  public void run() {
>  try {
>  namesystem.processIncrementalBlockReport(nodeReg, r);
>  } catch (Exception ex) {
>  // usually because the node is unregistered/dead. next heartbeat
>  // will correct the problem
>  blockStateChangeLog.error(
>  "*BLOCK* NameNode.blockReceivedAndDeleted: "
>  + "failed from "+ nodeReg + ":" + ex.getMessage());
>  }
>  }
>  });
>  }
> The problem here is that when the NameNode is blocked in processing the IBR, 
> the FBR requested by the DN from the NameNode will be affected. Similarly, 
> when the NameNode processing FBR is blocked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15755) Add option to set retry times of checking failure volume and Import powermock for mock static method

2021-01-05 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259042#comment-17259042
 ] 

Kihwal Lee commented on HDFS-15755:
---

I tried to add powermock a long time ago in HADOOP-7537, and someone else did in 
HADOOP-9122.  Both attempts were met with opposition. 

> Add option to set retry times of checking failure volume and Import powermock 
> for mock static method
> 
>
> Key: HDFS-15755
> URL: https://issues.apache.org/jira/browse/HDFS-15755
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, test
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15755.001.patch, HDFS-15755.002.patch, 
> HDFS-15755.003.patch
>
>
> For some mounted remote volumes, a brief network hiccup may cause the volume 
> check to fail, and the volume will then be removed from the datanode. Add an 
> option to set the retry count and interval according to StorageType.
> For mocking static methods in unit test, import PowerMock to 
> hadoop-hdfs-project.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2020-12-11 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247982#comment-17247982
 ] 

Kihwal Lee commented on HDFS-15725:
---

+1 

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15725.001.patch, HDFS-15725.002.patch, 
> HDFS-15725.003.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at the 
> time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into a 
> committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea, is that if lease recovery occurs, and the block is still waiting on 
> "minimal replication", just put the file back to UNDER_CONSTRUCTION so that 
> on the next lease recovery attempt, BLOCK RECOVERY will happen, close the 
> file and move the replicas to FINALIZED.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15727) RpcQueueTimeAvgTime of the NameNode increases after it becomes StandBy

2020-12-11 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247977#comment-17247977
 ] 

Kihwal Lee commented on HDFS-15727:
---

The standby namenode replays edits with the namesystem write lock held. Depending 
on the transaction rate and tailing frequency, this can last a long time, even 
more so with the retry cache on.  During this time, most RPC calls sit in the 
call queue unprocessed.  This can drive the queue time up quite high, but it is 
of no concern.
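
As a rough worked example with assumed numbers: if a batch of tailed edits takes 2 seconds to apply under the write lock and calls keep arriving during that window, a call landing at a random point in the window waits about 1 second on average before a handler can pick it up. RpcQueueTimeAvgTime therefore jumps by roughly that amount on the standby while RpcProcessingTimeAvgTime stays flat, which is consistent with the increase shown in the attached graph.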

> RpcQueueTimeAvgTime of the NameNode increases after it becomes StandBy
> --
>
> Key: HDFS-15727
> URL: https://issues.apache.org/jira/browse/HDFS-15727
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Aihua Xu
>Priority: Major
> Attachments: image-2020-12-10-13-30-44-288.png
>
>
> RpcQueueTimeAvgTime of the NameNode increases after it becomes StandBy. It 
> will get resolved after it gets restarted. Seems there is something incorrect 
> about this metrics.
> See the following graph, the NameNode becomes StandBy at 10:13 while 
> RpcQueueTimeAvgTime increases instead.
> !image-2020-12-10-13-30-44-288.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15726) Client should only complete a file if the last block is finalized

2020-12-10 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned HDFS-15726:
-

Assignee: Ahmed Hussein

> Client should only complete a file if the last block is finalized
> -
>
> Key: HDFS-15726
> URL: https://issues.apache.org/jira/browse/HDFS-15726
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Ahmed Hussein
>Priority: Major
>
> We have seen certain versions of DFS client manage to call {{completeFile()}} 
> before the blocks are finalized. The correct ordering is ensured only by the 
> way {{closeImpl()}} is written.  When the logic and semantics around flushing 
> or closing change, this can easily break, and it is hard to notice by just 
> looking at the code.
> We propose adding simple yet explicit logic to prevent any violation of 
> this protocol. {{completeFile()}} won't be called unless it has received an 
> ack for the last packet (the one that causes finalization on the datanode side).  
> This will fix any current bug and also likely prevent inadvertently breaking 
> it in the future.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15726) Client should only complete a file if the last block is finalized

2020-12-10 Thread Kihwal Lee (Jira)
Kihwal Lee created HDFS-15726:
-

 Summary: Client should only complete a file if the last block is 
finalized
 Key: HDFS-15726
 URL: https://issues.apache.org/jira/browse/HDFS-15726
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee


We have seen certain versions of DFS client manage to call {{completeFile()}} 
before the blocks are finalized. The correct ordering is ensured only by the way 
{{closeImpl()}} is written.  When the logic and semantics around flushing or 
closing change, this can easily break, and it is hard to notice by just looking 
at the code.

We propose adding simple yet explicit logic to prevent any violation of this 
protocol. {{completeFile()}} won't be called unless it has received an ack for 
the last packet (the one that causes finalization on the datanode side).  This will 
fix any current bug and also likely prevent inadvertently breaking it in the 
future.
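
A minimal sketch of the intended guard is below. The field and method names are illustrative stand-ins, not the real DFSOutputStream/DataStreamer members.

{code:java}
// Illustrative only: track the seqno of the finalizing packet and the highest
// acked seqno, and gate completeFile() on the comparison.
class LastPacketGuard {
  private volatile long lastPacketSeqno = Long.MAX_VALUE; // finalizing packet, unknown until sent
  private volatile long lastAckedSeqno = -1;              // highest seqno acked by the pipeline

  void markLastPacketSent(long seqno) { lastPacketSeqno = seqno; }
  void markAcked(long seqno)          { lastAckedSeqno = seqno; }

  // completeFile() must only be attempted once the finalizing packet is acked.
  boolean mayCompleteFile() {
    return lastAckedSeqno >= lastPacketSeqno;
  }
}
{code}

With such a check in the close path, a client that reaches close() without having received the last ack (for example, from a shutdown hook) will not call completeFile(), so the block is never committed and lease recovery can later run block recovery as usual.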



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2020-12-10 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247478#comment-17247478
 ] 

Kihwal Lee commented on HDFS-15725:
---

We have seen certain versions of clients closing without completely flushing. 
Usually a close is called from the shutdown hook, which will be executed even 
for a failing client. Normally internal flushing must happen before close, but 
there must be corner cases.  We have a patch for that too. The logic is very 
simple. If the client has not received an ack for the final packet that causes 
datanodes to finalize the replicas, skip calling completeFile().   We will file 
a jira and post the patch soon.  This will be a good thing to add even if there 
was no apparent bug.

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15725.001.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at the 
> time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into a 
> committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea, is that if lease recovery occurs, and the block is still waiting on 
> "minimal replication", just put the file back to UNDER_CONSTRUCTION so that 
> on the next lease recovery attempt, BLOCK RECOVERY will happen, close the 
> file and move the replicas to FINALIZED.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2020-12-10 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247455#comment-17247455
 ] 

Kihwal Lee edited comment on HDFS-15725 at 12/10/20, 7:43 PM:
--

We have seen this more with later versions of clients.  The namenode, 
regardless of its version, cannot recover lease in this case.  The condition is 
triggered by a client that commits without finalizing. We did not have that 
problem with the 2.8 client.  While we need to harden the namenode side, we can 
also fix the client side.

bq. The client sends the "complete" call to the namenode, moving the block into 
a committed state, but it dies before it can send the final packet to the 
Datanodes telling them to finalize the block.

The client should never call {{completeFile()}} if it has not received the ack 
for the last packet. Older clients do not act like that.

{code}
   NameNode.stateChangeLog.warn(message);
+  // If the block is still not minimally replicated when lease recovery
+  // happens, it means the hard limit has passed, and it is unlikely to get
+  // minimally replicated, or another client is trying to recover the lease
+  // on the file. In both cases, it makes sense to move the file back to
+  // UNDER_CONSTRUCTION so BLOCK RECOVERY can happen.
+      lastBlock.convertToBlockUnderConstruction(BlockUCState.UNDER_CONSTRUCTION,
+          lastBlock.getUnderConstructionFeature().getExpectedStorageLocations());
{code}

I am not sure whether uncommitting the block is the best way.  The NN is 
capable of doing block recovery without it. [~daryn] wrote this patch 
internally for 2.10. We were about to push it out to the community.  I am 
attaching  [^lease_recovery_2_10.patch] , please take a look at it and let us 
know what you think.


was (Author: kihwal):
We have seen this more with later versions of clients.  The namenode, 
regardless of its version, cannot recover lease in this case.  The condition is 
triggered by a client that commits without finalizing. We did not have that 
problem with the 2.8 client.  While we need to harden the namenode side, we can 
also fix the client side.

bq. The client sends the "complete" call to the namenode, moving the block into 
a committed state, but it dies before it can send the final packet to the 
Datanodes telling them to finalize the block.

The client should never call {{completeFile()}} if it has not received the ack 
for the last packet. Older clients do not act like that.

{code}
   NameNode.stateChangeLog.warn(message);
+  // If the block is still not minimally replicated when lease recovery
+  // happens, it means the hard limit has passed, and it is unlikely to get
+  // minimally replicated, or another client is trying to recover the lease
+  // on the file. In both cases, it makes sense to move the file back to
+  // UNDER_CONSTRUCTION so BLOCK RECOVERY can happen.
+      lastBlock.convertToBlockUnderConstruction(BlockUCState.UNDER_CONSTRUCTION,
+          lastBlock.getUnderConstructionFeature().getExpectedStorageLocations());
{code}

I am not sure whether uncommitting the block is the best way.  The NN is 
capable of doing block recovery without it. @daryn wrote this patch internally 
for 2.10. We were about to push it out to the community.  I am attaching  
[^lease_recovery_2_10.patch] , please take a look at it and let us know what 
you think.

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15725.001.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at the 
> time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into a 
> committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea, is that if lease recovery occurs, and the block is still waiting on 
> "minimal replication", 

[jira] [Comment Edited] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2020-12-10 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247455#comment-17247455
 ] 

Kihwal Lee edited comment on HDFS-15725 at 12/10/20, 7:42 PM:
--

We have seen this more with later versions of clients.  The namenode, 
regardless of its version, cannot recover lease in this case.  The condition is 
triggered by a client that commits without finalizing. We did not have that 
problem with the 2.8 client.  While we need to harden the namenode side, we can 
also fix the client side.

bq. The client sends the "complete" call to the namenode, moving the block into 
a committed state, but it dies before it can send the final packet to the 
Datanodes telling them to finalize the block.

The client should never call {{completeFile()}} if it has not received the ack 
for the last packet. Older clients do not act like that.

{code}
   NameNode.stateChangeLog.warn(message);
+  // If the block is still not minimally replicated when lease recovery
+  // happens, it means the hard limit has passed, and it is unlikely to get
+  // minimally replicated, or another client is trying to recover the lease
+  // on the file. In both cases, it makes sense to move the file back to
+  // UNDER_CONSTRUCTION so BLOCK RECOVERY can happen.
+      lastBlock.convertToBlockUnderConstruction(BlockUCState.UNDER_CONSTRUCTION,
+          lastBlock.getUnderConstructionFeature().getExpectedStorageLocations());
{code}

I am not sure whether uncommitting the block is the best way.  The NN is 
capable of doing block recovery without it. @daryn wrote this patch internally 
for 2.10. We were about to push it out to the community.  I am attaching  
[^lease_recovery_2_10.patch] , please take a look at it and let us know what 
you think.


was (Author: kihwal):
We have seen this more with later versions of clients.  The namenode, 
regardless of its version, cannot recover lease in this case.  The condition is 
triggered by a client that commits without finalizing. We did not have that 
problem with the 2.8 client.  While we need to harden the namenode side, we can 
also fix the client side.

bq. The client sends the "complete" call to the namenode, moving the block into 
a committed state, but it dies before it can send the final packet to the 
Datanodes telling them to finalize the block.

The client should never call {{completeFile()}} if it has not received the ack 
for the last packet. Older clients do not act like that.

{code}
   NameNode.stateChangeLog.warn(message);
+  // If the block is still not minimally replicated when lease recovery
+  // happens, it means the hard limit has passed, and it is unlikely to get
+  // minimally replicated, or another client is trying to recover the lease
+  // on the file. In both cases, it makes sense to move the file back to
+  // UNDER_CONSTRUCTION so BLOCK RECOVERY can happen.
+      lastBlock.convertToBlockUnderConstruction(BlockUCState.UNDER_CONSTRUCTION,
+          lastBlock.getUnderConstructionFeature().getExpectedStorageLocations());
{noformat}

I am not sure whether uncommitting the block is the best way.  The NN is 
capable of doing block recovery without it. @daryn wrote this patch internally 
for 2.10. We were about to push it out to the community.  I am attaching  
[^lease_recovery_2_10.patch] , please take a look at it and let us know what 
you think.

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15725.001.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at the 
> time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into a 
> committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea, is that if lease recovery occurs, and the block is still waiting on 
> "minimal 

[jira] [Commented] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2020-12-10 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247455#comment-17247455
 ] 

Kihwal Lee commented on HDFS-15725:
---

We have seen this more with later versions of clients.  The namenode, 
regardless of its version, cannot recover lease in this case.  The condition is 
triggered by a client that commits without finalizing. We did not have that 
problem with the 2.8 client.  While we need to harden the namenode side, we can 
also fix the client side.

bq. The client sends the "complete" call to the namenode, moving the block into 
a committed state, but it dies before it can send the final packet to the 
Datanodes telling them to finalize the block.

The client should never call {{completeFile()}} if it has not received the ack 
for the last packet. Older clients do not act like that.

{code}
   NameNode.stateChangeLog.warn(message);
+  // If the block is still not minimally replicated when lease recovery
+  // happens, it means the hard limit has passed, and it is unlikely to get
+  // minimally replicated, or another client is trying to recover the lease
+  // on the file. In both cases, it makes sense to move the file back to
+  // UNDER_CONSTRUCTION so BLOCK RECOVERY can happen.
+      lastBlock.convertToBlockUnderConstruction(BlockUCState.UNDER_CONSTRUCTION,
+          lastBlock.getUnderConstructionFeature().getExpectedStorageLocations());
{noformat}

I am not sure whether uncommitting the block is the best way.  The NN is 
capable of doing block recovery without it. @daryn wrote this patch internally 
for 2.10. We were about to push it out to the community.  I am attaching  
[^lease_recovery_2_10.patch] , please take a look at it and let us know what 
you think.

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15725.001.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at the 
> time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into a 
> committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea, is that if lease recovery occurs, and the block is still waiting on 
> "minimal replication", just put the file back to UNDER_CONSTRUCTION so that 
> on the next lease recovery attempt, BLOCK RECOVERY will happen, close the 
> file and move the replicas to FINALIZED.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2020-12-10 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15725:
--
Attachment: lease_recovery_2_10.patch

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15725.001.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at the 
> time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into a 
> committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea, is that if lease recovery occurs, and the block is still waiting on 
> "minimal replication", just put the file back to UNDER_CONSTRUCTION so that 
> on the next lease recovery attempt, BLOCK RECOVERY will happen, close the 
> file and move the replicas to FINALIZED.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15622) Deleted blocks linger in the replications queue

2020-10-22 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15622:
--
Fix Version/s: 3.2.3
   3.1.5
   3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk, branch-3.3, branch-3.2 and branch-3.1. Thanks for 
the patch, [~ahussein]!

> Deleted blocks linger in the replications queue
> ---
>
> Key: HDFS-15622
> URL: https://issues.apache.org/jira/browse/HDFS-15622
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15622.001.patch, HDFS-15622.002.patch
>
>
> We had an incident where, after resolving a missing-blocks incident by 
> restarting two dead nodes, there were still 8 missing blocks, but the list was 
> empty. Metasave shows the 8 blocks are "orphaned", meaning the files were 
> already deleted. It is unclear why they were left in the replication queue.
> * The containing node was flaky and was started and stopped multiple times.
> * The block allocation didn't work well due to the cluster-level storage 
> space exhaustion.
> * The NN was in safe mode.
> Triggering a full block report from the node didn't have any effect. It will 
> clear up if a failover happens as the repl queue will be reinitialized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15622) Deleted blocks linger in the replications queue

2020-10-22 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15622:
--
Fix Version/s: 3.4.0

> Deleted blocks linger in the replications queue
> ---
>
> Key: HDFS-15622
> URL: https://issues.apache.org/jira/browse/HDFS-15622
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15622.001.patch, HDFS-15622.002.patch
>
>
> We had an incident where, after resolving a missing-blocks incident by 
> restarting two dead nodes, there were still 8 missing blocks, but the list was 
> empty. Metasave shows the 8 blocks are "orphaned", meaning the files were 
> already deleted. It is unclear why they were left in the replication queue.
> * The containing node was flaky and was started and stopped multiple times.
> * The block allocation didn't work well due to the cluster-level storage 
> space exhaustion.
> * The NN was in safe mode.
> Triggering a full block report from the node didn't have any effect. It will 
> clear up if a failover happens as the repl queue will be reinitialized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15622) Deleted blocks linger in the replications queue

2020-10-22 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219360#comment-17219360
 ] 

Kihwal Lee commented on HDFS-15622:
---

+1 on version 2 of the patch.

> Deleted blocks linger in the replications queue
> ---
>
> Key: HDFS-15622
> URL: https://issues.apache.org/jira/browse/HDFS-15622
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15622.001.patch, HDFS-15622.002.patch
>
>
> We had an incident where, after resolving a missing-blocks incident by 
> restarting two dead nodes, there were still 8 missing blocks, but the list was 
> empty. Metasave shows the 8 blocks are "orphaned", meaning the files were 
> already deleted. It is unclear why they were left in the replication queue.
> * The containing node was flaky and was started and stopped multiple times.
> * The block allocation didn't work well due to the cluster-level storage 
> space exhaustion.
> * The NN was in safe mode.
> Triggering a full block report from the node didn't have any effect. It will 
> clear up if a failover happens as the repl queue will be reinitialized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency

2020-10-22 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15618:
--
Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

I've committed this to trunk, branch-3.3, branch-3.2 and branch-3.1. Thanks for 
working on this, [~ahussein].

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, 
> HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> The shutdown of the Datanode has a very long latency. A block scanner waits for 5 
> minutes to join on each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to ignore such conditions on shutdown of Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency

2020-10-22 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15618:
--
Fix Version/s: 3.2.3
   3.1.5

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, 
> HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> The shutdown of the Datanode has a very long latency. A block scanner waits for 5 
> minutes to join on each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to ignore such conditions on shutdown of Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency

2020-10-22 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15618:
--
Fix Version/s: 3.3.1

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, 
> HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> The shutdown of the Datanode has a very long latency. A block scanner waits for 5 
> minutes to join on each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to ignore such conditions on shutdown of Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2020-10-21 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218127#comment-17218127
 ] 

Kihwal Lee commented on HDFS-15618:
---

I've committed this to trunk, but the patch does not build on branch-3.3 and 
the earlier branches:
{noformat}
org.apache.hadoop.thirdparty.com.google.common.annotations does not exist
{noformat}

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> The shutdown of the Datanode has a very long latency. A block scanner waits for 5 
> minutes to join on each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to ignore such conditions on shutdown of Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency

2020-10-21 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15618:
--
Fix Version/s: 3.4.0

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> The shutdown of the Datanode has a very long latency. A block scanner waits for 5 
> minutes to join on each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to ignore such conditions on shutdown of Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2020-10-20 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218120#comment-17218120
 ] 

Kihwal Lee commented on HDFS-15618:
---

+1 the version 4 of the patch looks good.

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch, HDFS-15618.004.patch
>
>
> The shutdown of the Datanode has a very long latency. A block scanner waits for 5 
> minutes to join on each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to ignore such conditions on shutdown of Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15622) Deleted blocks linger in the replications queue

2020-10-20 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218087#comment-17218087
 ] 

Kihwal Lee commented on HDFS-15622:
---

The patch looks good. The only minor nit is the name of one of the new 
variables, "corruptFlag".  It sounds like it is saying that something is corrupt 
or that the user specified it to do something with corruption. Maybe it is 
better to make it something more descriptive like "inCorruptLevel".  The logic 
seems fine.

> Deleted blocks linger in the replications queue
> ---
>
> Key: HDFS-15622
> URL: https://issues.apache.org/jira/browse/HDFS-15622
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15622.001.patch
>
>
> We had an incident where, after resolving a missing-blocks incident by 
> restarting two dead nodes, there were still 8 missing blocks, but the list was 
> empty. Metasave shows the 8 blocks are "orphaned", meaning the files were 
> already deleted. It is unclear why they were left in the replication queue.
> * The containing node was flaky and was started and stopped multiple times.
> * The block allocation didn't work well due to the cluster-level storage 
> space exhaustion.
> * The NN was in safe mode.
> Triggering a full block report from the node didn't have any effect. It will 
> clear up if a failover happens as the repl queue will be reinitialized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14941) Potential editlog race condition can cause corrupted file

2020-10-19 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140608#comment-17140608
 ] 

Kihwal Lee edited comment on HDFS-14941 at 10/19/20, 9:17 PM:
--

Filed HDFS-15421 with more details.


was (Author: kihwal):
Filed HDFS-1542 with more details.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Fix For: 3.3.0, 3.2.2, 2.10.1, 3.1.5
>
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, 
> HDFS-14941.003.patch, HDFS-14941.004.patch, HDFS-14941.005.patch, 
> HDFS-14941.006.patch
>
>
> Recently we encountered an issue where, after a failover, the NameNode 
> complains about corrupted files/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before the 
> corresponding edit tailing has happened, in which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is this: imagine the Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different segments). Now a block report with the new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered a future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15628) HttpFS server throws NPE if a file is a symlink

2020-10-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215497#comment-17215497
 ] 

Kihwal Lee commented on HDFS-15628:
---

I've committed this to trunk, branch-3.3, branch-3.2 and branch-3.1. Thanks for 
working on this, [~ahussein].

> HttpFS server throws NPE if a file is a symlink
> ---
>
> Key: HDFS-15628
> URL: https://issues.apache.org/jira/browse/HDFS-15628
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, httpfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15628.001.patch, HDFS-15628.002.patch
>
>
> If a directory containing a symlink is listed, the client (WebHdfsFileSystem) 
> blows up with an NPE. If {{type}} is {{SYMLINK}}, there must be a {{symlink}} 
> field whose value is the link target string. HttpFS returns a response 
> without the {{symlink}} field. {{WebHdfsFileSystem}} assumes it is there for a 
> symlink and blindly tries to parse it, causing an NPE.
> This is not an issue if the destination cluster does not have symlinks 
> enabled.
>  
> {code:bash}
> java.io.IOException: localhost:55901: Response decoding failure: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:967)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:816)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:638)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:676)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:672)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1731)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testListSymLinkStatus(BaseTestHttpFSWith.java:388)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.operation(BaseTestHttpFSWith.java:1230)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testOperation(BaseTestHttpFSWith.java:1363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.test.TestHdfsHelper$HdfsStatement.evaluate(TestHdfsHelper.java:95)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at 
> org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:42)
>   at 
> org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:74)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at 

[jira] [Updated] (HDFS-15628) HttpFS server throws NPE if a file is a symlink

2020-10-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15628:
--
Fix Version/s: 3.1.5

> HttpFS server throws NPE if a file is a symlink
> ---
>
> Key: HDFS-15628
> URL: https://issues.apache.org/jira/browse/HDFS-15628
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, httpfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15628.001.patch, HDFS-15628.002.patch
>
>
> If a directory containing a symlink is listed, the client (WebHdfsFileSystem) 
> blows up with an NPE. If {{type}} is {{SYMLINK}}, there must be a {{symlink}} 
> field whose value is the link target string. HttpFS returns a response 
> without the {{symlink}} field. {{WebHdfsFileSystem}} assumes it is there for a 
> symlink and blindly tries to parse it, causing an NPE.
> This is not an issue if the destination cluster does not have symlinks 
> enabled.
>  
> {code:bash}
> java.io.IOException: localhost:55901: Response decoding failure: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:967)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:816)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:638)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:676)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:672)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1731)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testListSymLinkStatus(BaseTestHttpFSWith.java:388)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.operation(BaseTestHttpFSWith.java:1230)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testOperation(BaseTestHttpFSWith.java:1363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.test.TestHdfsHelper$HdfsStatement.evaluate(TestHdfsHelper.java:95)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at 
> org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:42)
>   at 
> org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:74)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 

[jira] [Updated] (HDFS-15627) Audit log deletes before collecting blocks

2020-10-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15627:
--
Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk, branch-3.3, branch-3.2 and branch-3.1. Thanks for 
working on this [~ahussein].

> Audit log deletes before collecting blocks
> --
>
> Key: HDFS-15627
> URL: https://issues.apache.org/jira/browse/HDFS-15627
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging, namenode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15627.001.patch
>
>
> Deletes currently collect blocks in the write lock, write the edit, 
> incrementally delete the blocks, and finally +audit log+. It should be: collect 
> blocks, edit log, +audit log+, incremental delete. Once the edit is durable it is 
> consistent to audit log the delete. There is no sense in deferring the audit 
> into the indeterminate future.
> The problem showed up when the server hung due to large deletes, but it wasn't 
> easy to identify the cause. It should have been easily identifiable as 
> the first delete logged after the hang.
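
As a rough illustration of the proposed ordering, here is a minimal sketch; the method names are hypothetical placeholders for the real FSNamesystem internals, and the locking is simplified.

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the ordering described above: collect blocks and record
// the edit under the write lock, audit log once the edit is durable, and only
// then do the incremental block deletion.
public class DeleteOrderingSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

    void delete(String path) {
        List<Long> collectedBlocks;
        fsLock.writeLock().lock();
        try {
            collectedBlocks = collectBlocks(path);   // 1. collect blocks in the write lock
            recordDeleteEdit(path);                  // 2. record the edit
        } finally {
            fsLock.writeLock().unlock();
        }
        syncEditLog();                               // 3. make the edit durable
        auditLogDelete(path);                        // 4. audit log the delete
        incrementalDelete(collectedBlocks);          // 5. incremental block deletion last
    }

    private List<Long> collectBlocks(String path) { return List.of(1L, 2L, 3L); }
    private void recordDeleteEdit(String path) { /* append the delete op to the edit log */ }
    private void syncEditLog() { /* logSync() equivalent */ }
    private void auditLogDelete(String path) {
        System.out.println("allowed=true cmd=delete src=" + path);
    }
    private void incrementalDelete(List<Long> blocks) { /* remove blocks in small batches */ }
}
{code}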



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15627) Audit log deletes before collecting blocks

2020-10-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15627:
--
Summary: Audit log deletes before collecting blocks  (was: Audit log 
deletes after edit is written)

> Audit log deletes before collecting blocks
> --
>
> Key: HDFS-15627
> URL: https://issues.apache.org/jira/browse/HDFS-15627
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging, namenode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15627.001.patch
>
>
> Deletes currently collect blocks in the write lock, write the edit, 
> incrementally delete the blocks, and finally +audit log+. It should be: collect 
> blocks, edit log, +audit log+, incremental delete. Once the edit is durable it is 
> consistent to audit log the delete. There is no sense in deferring the audit 
> into the indeterminate future.
> The problem showed up when the server hung due to large deletes, but it wasn't 
> easy to identify the cause. It should have been easily identifiable as 
> the first delete logged after the hang.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15627) Audit log deletes after edit is written

2020-10-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215476#comment-17215476
 ] 

Kihwal Lee commented on HDFS-15627:
---

+1 lgtm

> Audit log deletes after edit is written
> ---
>
> Key: HDFS-15627
> URL: https://issues.apache.org/jira/browse/HDFS-15627
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging, namenode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15627.001.patch
>
>
> Deletes currently collect blocks in the write lock, write the edit, 
> incrementally delete the blocks, and finally +audit log+. It should be: collect 
> blocks, edit log, +audit log+, incremental delete. Once the edit is durable it is 
> consistent to audit log the delete. There is no sense in deferring the audit 
> into the indeterminate future.
> The problem showed up when the server hung due to large deletes, but it wasn't 
> easy to identify the cause. It should have been easily identifiable as 
> the first delete logged after the hang.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2020-10-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215454#comment-17215454
 ] 

Kihwal Lee commented on HDFS-15618:
---

It can be as fancy as adding a builder method for setting it (for exceptional 
cases) with a default value of 30 seconds. Or simply set it to 30 in places 
like {{startDataNodes()}}.

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch
>
>
> The shutdown of a Datanode has very long latency. The block scanner waits up to 
> 5 minutes to join each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to not wait for them on Datanode shutdown.
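
As a rough illustration of the direction discussed here (a short, configurable join timeout instead of a five-minute wait per scanner thread), the following sketch uses hypothetical names and plain threads rather than the actual BlockScanner/VolumeScanner classes.

{code:java}
import java.util.List;

// Minimal sketch: interrupt daemon scanner threads and join each one with a
// bounded timeout instead of waiting up to 5 minutes per thread.
// Names are illustrative only; this is not the actual Datanode code.
public class ScannerShutdownSketch {
    private final List<Thread> volumeScanners;
    private final long joinTimeoutMs;   // e.g. a few seconds in production, 30s for tests

    ScannerShutdownSketch(List<Thread> volumeScanners, long joinTimeoutMs) {
        this.volumeScanners = volumeScanners;
        this.joinTimeoutMs = joinTimeoutMs;
    }

    void shutdown() {
        for (Thread scanner : volumeScanners) {
            scanner.interrupt();                  // ask every scanner to stop first
        }
        for (Thread scanner : volumeScanners) {
            try {
                scanner.join(joinTimeoutMs);      // bounded wait per thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            // Scanners that did not stop in time are simply left behind; they are
            // daemon threads and do not alter block content, so this is safe.
        }
    }
}
{code}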



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2020-10-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215452#comment-17215452
 ] 

Kihwal Lee commented on HDFS-15618:
---

In production, there is no harm in exiting after waiting for 5 seconds. But in 
junit, as you pointed out, it might cause more failures when the environment is 
slow.  We can set the timeout to something like 30 seconds in the mini dfs 
cluster's base config.

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, 
> HDFS-15618.003.patch
>
>
> The shutdown of a Datanode has very long latency. The block scanner waits up to 
> 5 minutes to join each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to not wait for them on Datanode shutdown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15628) HttpFS server throws NPE if a file is a symlink

2020-10-14 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15628:
--
Fix Version/s: 3.2.3

> HttpFS server throws NPE if a file is a symlink
> ---
>
> Key: HDFS-15628
> URL: https://issues.apache.org/jira/browse/HDFS-15628
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, httpfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15628.001.patch, HDFS-15628.002.patch
>
>
> If a directory containing a symlink is listed, the client (WebHdfsFileSystem) 
> blows up with an NPE. If {{type}} is {{SYMLINK}}, there must be a {{symlink}} 
> field whose value is the link target string. HttpFS returns a response 
> without the {{symlink}} field. {{WebHdfsFileSystem}} assumes it is there for a 
> symlink and blindly tries to parse it, causing an NPE.
> This is not an issue if the destination cluster does not have symlinks 
> enabled.
>  
> {code:bash}
> java.io.IOException: localhost:55901: Response decoding failure: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:967)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:816)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:638)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:676)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:672)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1731)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testListSymLinkStatus(BaseTestHttpFSWith.java:388)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.operation(BaseTestHttpFSWith.java:1230)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testOperation(BaseTestHttpFSWith.java:1363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.test.TestHdfsHelper$HdfsStatement.evaluate(TestHdfsHelper.java:95)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at 
> org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:42)
>   at 
> org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:74)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 

[jira] [Updated] (HDFS-15628) HttpFS server throws NPE if a file is a symlink

2020-10-14 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15628:
--
Fix Version/s: 3.4.0
   3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> HttpFS server throws NPE if a file is a symlink
> ---
>
> Key: HDFS-15628
> URL: https://issues.apache.org/jira/browse/HDFS-15628
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, httpfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15628.001.patch, HDFS-15628.002.patch
>
>
> If a directory containing a symlink is listed, the client (WebHdfsFileSystem) 
> blows up with an NPE. If {{type}} is {{SYMLINK}}, there must be a {{symlink}} 
> field whose value is the link target string. HttpFS returns a response 
> without the {{symlink}} field. {{WebHdfsFileSystem}} assumes it is there for a 
> symlink and blindly tries to parse it, causing an NPE.
> This is not an issue if the destination cluster does not have symlinks 
> enabled.
>  
> {code:bash}
> java.io.IOException: localhost:55901: Response decoding failure: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:967)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:816)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:638)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:676)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:672)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1731)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testListSymLinkStatus(BaseTestHttpFSWith.java:388)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.operation(BaseTestHttpFSWith.java:1230)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testOperation(BaseTestHttpFSWith.java:1363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.test.TestHdfsHelper$HdfsStatement.evaluate(TestHdfsHelper.java:95)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at 
> org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:42)
>   at 
> org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:74)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at 

[jira] [Updated] (HDFS-15628) HttpFS server throws NPE if a file is a symlink

2020-10-14 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15628:
--
Summary: HttpFS server throws NPE if a file is a symlink  (was: https 
throws NPE if a file is a symlink)

> HttpFS server throws NPE if a file is a symlink
> ---
>
> Key: HDFS-15628
> URL: https://issues.apache.org/jira/browse/HDFS-15628
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, httpfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15628.001.patch, HDFS-15628.002.patch
>
>
> If a directory containing a symlink is listed, the client (WebHdfsFileSystem) 
> blows up with an NPE. If {{type}} is {{SYMLINK}}, there must be a {{symlink}} 
> field whose value is the link target string. HttpFS returns a response 
> without the {{symlink}} field. {{WebHdfsFileSystem}} assumes it is there for a 
> symlink and blindly tries to parse it, causing an NPE.
> This is not an issue if the destination cluster does not have symlinks 
> enabled.
>  
> {code:bash}
> java.io.IOException: localhost:55901: Response decoding failure: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:967)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:816)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:638)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:676)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:672)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1731)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testListSymLinkStatus(BaseTestHttpFSWith.java:388)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.operation(BaseTestHttpFSWith.java:1230)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testOperation(BaseTestHttpFSWith.java:1363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.test.TestHdfsHelper$HdfsStatement.evaluate(TestHdfsHelper.java:95)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at 
> org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:42)
>   at 
> org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:74)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 

[jira] [Commented] (HDFS-15628) https throws NPE if a file is a symlink

2020-10-14 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214306#comment-17214306
 ] 

Kihwal Lee commented on HDFS-15628:
---

+1 LGTM. We've been running the same server-side change for over 2 years now.
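
As a rough, self-contained illustration of the failure mode in the issue quoted below, the sketch decodes one FileStatus JSON entry from a plain {{Map}} and simply skips the {{symlink}} field when it is missing; it is not the actual WebHDFS or HttpFS code.

{code:java}
import java.util.Map;

// Illustrative only: defensive decoding of a single FileStatus JSON entry.
// The field names ("type", "symlink") follow the WebHDFS JSON convention,
// but the helper itself is hypothetical.
public class SymlinkStatusSketch {
    static String decodeSymlinkTarget(Map<String, Object> fileStatusJson) {
        Object type = fileStatusJson.get("type");
        if (!"SYMLINK".equals(type)) {
            return null;                 // not a symlink, nothing to decode
        }
        Object symlink = fileStatusJson.get("symlink");
        if (symlink == null) {
            // An HttpFS response may omit the field; treating it as always
            // present is what produced the NPE described below.
            return null;
        }
        return symlink.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> statusWithoutTarget = Map.of("type", "SYMLINK");
        System.out.println(decodeSymlinkTarget(statusWithoutTarget));  // prints null, no NPE
    }
}
{code}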

> https throws NPE if a file is a symlink
> ---
>
> Key: HDFS-15628
> URL: https://issues.apache.org/jira/browse/HDFS-15628
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, httpfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15628.001.patch, HDFS-15628.002.patch
>
>
> If a directory containing a symlink is listed, the client (WebHdfsFileSystem) 
> blows up with an NPE. If {{type}} is {{SYMLINK}}, there must be a {{symlink}} 
> field whose value is the link target string. HttpFS returns a response 
> without the {{symlink}} field. {{WebHdfsFileSystem}} assumes it is there for a 
> symlink and blindly tries to parse it, causing an NPE.
> This is not an issue if the destination cluster does not have symlinks 
> enabled.
>  
> {code:bash}
> java.io.IOException: localhost:55901: Response decoding failure: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:967)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:816)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:638)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:676)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:672)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1731)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testListSymLinkStatus(BaseTestHttpFSWith.java:388)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.operation(BaseTestHttpFSWith.java:1230)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testOperation(BaseTestHttpFSWith.java:1363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.test.TestHdfsHelper$HdfsStatement.evaluate(TestHdfsHelper.java:95)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at 
> org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:42)
>   at 
> org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:74)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 

[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency

2020-10-14 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214271#comment-17214271
 ] 

Kihwal Lee commented on HDFS-15618:
---

The patch looks okay.  Although it is better than the previous value of 5 
minutes, the default timeout of 60 seconds still seems excessive. How about 
making it 5 seconds?

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch
>
>
> The shutdown of a Datanode has very long latency. The block scanner waits up to 
> 5 minutes to join each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it 
> is safe to not wait for them on Datanode shutdown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15581) Access Controlled HTTPFS Proxy

2020-09-22 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200188#comment-17200188
 ] 

Kihwal Lee commented on HDFS-15581:
---

I've committed this to trunk, branch-3.3 and branch-3.2. Thanks for working on 
this, [~richard-ross].

> Access Controlled HTTPFS Proxy
> --
>
> Key: HDFS-15581
> URL: https://issues.apache.org/jira/browse/HDFS-15581
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.4.0
>Reporter: Richard
>Assignee: Richard
>Priority: Minor
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HADOOP-17244.001.patch
>
>
> There are certain data migration patterns that require a way to limit access 
> to the HDFS via the HTTPFS proxy.  The needed access modes are read-write, 
> read-only and write-only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15581) Access Controlled HTTPFS Proxy

2020-09-22 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15581:
--
Fix Version/s: 3.4.0
   3.3.1
   3.2.2
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Access Controlled HTTPFS Proxy
> --
>
> Key: HDFS-15581
> URL: https://issues.apache.org/jira/browse/HDFS-15581
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.4.0
>Reporter: Richard
>Assignee: Richard
>Priority: Minor
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HADOOP-17244.001.patch
>
>
> There are certain data migration patterns that require a way to limit access 
> to the HDFS via the HTTPFS proxy.  The needed access modes are read-write, 
> read-only and write-only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15581) Access Controlled HTTPFS Proxy

2020-09-22 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200135#comment-17200135
 ] 

Kihwal Lee commented on HDFS-15581:
---

+1 The patch looks good. The documentation in {{httpfs-default.xml}} is also 
adequate. It will be linked from 
{{hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/markdown/ServerSetup.md.vm}} 
when the docs are generated.


> Access Controlled HTTPFS Proxy
> --
>
> Key: HDFS-15581
> URL: https://issues.apache.org/jira/browse/HDFS-15581
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.4.0
>Reporter: Richard
>Assignee: Richard
>Priority: Minor
> Attachments: HADOOP-17244.001.patch
>
>
> There are certain data migration patterns that require a way to limit access 
> to the HDFS via the HTTPFS proxy.  The needed access modes are read-write, 
> read-only and write-only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15581) Access Controlled HTTPFS Proxy

2020-09-16 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197319#comment-17197319
 ] 

Kihwal Lee commented on HDFS-15581:
---

[~sodonnell], [~richard-ross] has already posted a patch.  Are you interested 
in reviewing his patch?

> Access Controlled HTTPFS Proxy
> --
>
> Key: HDFS-15581
> URL: https://issues.apache.org/jira/browse/HDFS-15581
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.4.0
>Reporter: Richard
>Assignee: Stephen O'Donnell
>Priority: Minor
> Attachments: HADOOP-17244.001.patch
>
>
> There are certain data migration patterns that require a way to limit access 
> to the HDFS via the HTTPFS proxy.  The needed access modes are read-write, 
> read-only and write-only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15581) Access Controlled HTTPFS Proxy

2020-09-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned HDFS-15581:
-

Assignee: Richard

> Access Controlled HTTPFS Proxy
> --
>
> Key: HDFS-15581
> URL: https://issues.apache.org/jira/browse/HDFS-15581
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.4.0
>Reporter: Richard
>Assignee: Richard
>Priority: Minor
> Attachments: HADOOP-17244.001.patch
>
>
> There are certain data migration patterns that require a way to limit access 
> to the HDFS via the HTTPFS proxy.  The needed access modes are read-write, 
> read-only and write-only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Moved] (HDFS-15581) Access Controlled HTTPFS Proxy

2020-09-16 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee moved HADOOP-17264 to HDFS-15581:


  Component/s: (was: httpfs)
   httpfs
  Key: HDFS-15581  (was: HADOOP-17264)
Affects Version/s: (was: 3.4.0)
   3.4.0
   Issue Type: Improvement  (was: New Feature)
  Project: Hadoop HDFS  (was: Hadoop Common)

> Access Controlled HTTPFS Proxy
> --
>
> Key: HDFS-15581
> URL: https://issues.apache.org/jira/browse/HDFS-15581
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.4.0
>Reporter: Richard
>Priority: Minor
> Attachments: HADOOP-17244.001.patch
>
>
> There are certain data migration patterns that require a way to limit access 
> to the HDFS via the HTTPFS proxy.  The needed access modes are read-write, 
> read-only and write-only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15553) Improve NameNode RPC throughput with ReadWriteRpcCallQueue

2020-09-01 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188580#comment-17188580
 ] 

Kihwal Lee commented on HDFS-15553:
---

It is fine to reorder user requests in general. As [~suxingfate] described, 
state changes by clients are synchronous, and clients are guaranteed to see a 
state change only after the state-changing write call returns. Read or write 
calls issued while the state-changing write call is outstanding may or may not 
see the state update. Reordering write requests from users is also fine. 
However, some of the internal RPC calls from datanodes are not safe to reorder. 
Outstanding calls (IBRs, FBRs, etc.) from the same source may have implicit 
distributed dependencies. Some are also internally semi-synchronous with users' 
state-changing requests. Over the years, some of them have been made less 
critical to timing and ordering, but there still are conditions that can cause 
issues. We could call that bad design/assumptions, but it was a design decision 
made for the balance between consistency, durability and performance at that 
time. But we can always revisit and improve things when old assumptions no 
longer hold.  Also, there are write calls that initially acquire the read lock 
and then reacquire the write lock.

It should be safe to simply reorder user requests for read/write lock combining 
purposes.

The key to the success of this approach is how smart the dynamic read/write 
allocation mechanism is. This may be less critical if the workload pattern is 
easily predictable or slowly changing, or if you want to enforce a certain 
ratio or priority between reads and writes. In environments where the workload 
is highly varied, it may be difficult to utilize this to its fullest extent.

Just out of curiosity, are you using async edit logging and audit logging? Some 
write combining is done in HDFS-9198 for the incremental block reports.  Do 
you see the queue overflow message in the NN log? The fixed queue size of 1024 
may not be ideal.

> Improve NameNode RPC throughput with ReadWriteRpcCallQueue 
> ---
>
> Key: HDFS-15553
> URL: https://issues.apache.org/jira/browse/HDFS-15553
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Wang, Xinglong
>Priority: Major
>
> *Current*
>  In our production cluster, a typical traffic model has a read-to-write ratio of 
> 10:1, and sometimes the ratio goes to 30:1.
>  NameNode uses ReentrantReadWriteLock under the hood of FSNamesystemLock. The 
> read lock is a shared lock while the write lock is an exclusive lock.
> Read RPCs and write RPCs arrive at the namenode randomly. This mixes reads and 
> writes, so only a small fraction of the reads can really share their read 
> lock.
> Currently we have the default callqueue and the faircallqueue, and we can 
> refreshCallQueue on the fly. This opens room to design a new call queue.
> *Idea*
>  If we reorder the rpc calls in the callqueue to group read rpcs together and 
> write rpcs together, we gain some control to let a batch of read rpcs reach the 
> handlers together and possibly share the same read lock. Thus we can reduce the 
> fragmentation of read locks.
>  This only improves the chance of sharing the read lock among a batch of read 
> rpcs, because some namenode-internal write locks are taken outside the call 
> queue.
> Under ReentrantReadWriteLock, there is a queue to manage threads asking for 
> locks. We can give an example.
>  R: stands for a read rpc
>  W: stands for a write rpc
>  e.g.
>  RWRWRWRWRWRWRWRW
>  In this case, we need 16 lock timeslices.
> optimized
>  RRRRRRRRWWWWWWWW
>  In this case, we only need 9 lock timeslices.
> *Correctness*
>  Since the execution order of any 2 concurrent or queued rpcs in the namenode is 
> not guaranteed, we can reorder the rpcs in the callqueue into a read group and a 
> write group, and then dequeue from these 2 queues by a designed strategy: let's 
> say dequeue 100 read rpcs, then dequeue 5 write rpcs, then dequeue reads again, 
> and then writes again.
>  Since FairCallQueue also reorders rpc calls in the callqueue, for this part I 
> think they share the same logic to guarantee rpc result correctness.
> *Performance*
>  In a test environment, we see a 15% - 20% NameNode RPC throughput 
> improvement compared with the default callqueue. 
>  Test traffic is 30 read : 3 write : 1 list using NNLoadGeneratorMR.
> This performance is not a surprise. Because some write rpcs are not managed in 
> the callqueue, we can't reorder them by reordering calls in the callqueue. 
>  But we could still do a full read/write reorder if we redesign 
> ReentrantReadWriteLock to achieve this. That will be a further step after this.
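
To make the grouping idea concrete, here is a small, self-contained sketch of a call queue that keeps reads and writes in separate FIFO queues and serves them in configurable batches; it is illustrative only and not the proposed ReadWriteRpcCallQueue or Hadoop's actual call queue API.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch: group read calls and write calls so that consecutive
// reads have a better chance of sharing the FSNamesystem read lock.
public class ReadWriteGroupingQueueSketch<T> {
    private final LinkedBlockingQueue<T> readQueue = new LinkedBlockingQueue<>();
    private final LinkedBlockingQueue<T> writeQueue = new LinkedBlockingQueue<>();
    private final int readBatch;    // e.g. dequeue 100 reads ...
    private final int writeBatch;   // ... then 5 writes, then reads again
    private boolean servingReads = true;
    private int servedInBatch = 0;

    public ReadWriteGroupingQueueSketch(int readBatch, int writeBatch) {
        this.readBatch = readBatch;
        this.writeBatch = writeBatch;
    }

    public void put(T call, boolean isRead) throws InterruptedException {
        (isRead ? readQueue : writeQueue).put(call);
    }

    // Handlers poll here; a null return just means nothing is queued right now.
    public synchronized T poll() {
        if (servedInBatch >= (servingReads ? readBatch : writeBatch)) {
            servingReads = !servingReads;   // batch finished, switch sides
            servedInBatch = 0;
        }
        T call = (servingReads ? readQueue : writeQueue).poll();
        if (call == null) {
            // Preferred side is empty: fall back to the other side and restart the batch.
            servingReads = !servingReads;
            servedInBatch = 0;
            call = (servingReads ? readQueue : writeQueue).poll();
        }
        if (call != null) {
            servedInBatch++;
        }
        return call;
    }
}
{code}

A real implementation would also need blocking semantics, metrics, and the ability to classify an incoming call as read or write; the sketch only shows the batching policy itself.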



--
This message was sent by Atlassian Jira

[jira] [Commented] (HDFS-15474) HttpFS: WebHdfsFileSystem cannot renew an expired delegation token from HttpFS response

2020-07-17 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159955#comment-17159955
 ] 

Kihwal Lee commented on HDFS-15474:
---

I believe the token ops are not implemented by HttpFS. They are handled at the 
filter level, which is a common component.  One possibility is to have HttpFS 
relay token ops to the namenode. That way, you don't need to have a shared 
secret or a zookeeper instance for token ops when there are multiple servers.

> HttpFS: WebHdfsFileSystem cannot renew an expired delegation token from 
> HttpFS response
> ---
>
> Key: HDFS-15474
> URL: https://issues.apache.org/jira/browse/HDFS-15474
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>
> When clients use WebHdfsFileSystem for HttpFS, they cannot renew expired 
> delegation tokens with the following error.
> {noformat}
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
> (owner=..., renewer=..., realUser=..., issueDate=..., maxDate=..., 
> sequenceNumber=..., masterKeyId=...) is expired
> at 
> org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:89)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:509)
> ...
> {noformat}
> When using WebHdfsFileSystem against the NameNode, it succeeds. This is because 
> the response of HttpFS is different from that of the NameNode. We should fix the 
> response of HttpFS.
> This issue is reported by Masayuki Yatagawa.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-24 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143880#comment-17143880
 ] 

Kihwal Lee commented on HDFS-15421:
---

Patch 004 looks good to me. +1.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch, HDFS-15421.003.patch, HDFS-15421.004.patch
>
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain 
> situations.  This makes the last set of incremental block reports from append 
> appear "from the future", which causes them to be simply re-queued to the 
> pending DN message queue rather than processed to complete the block.  The last 
> set of IBRs will leak and never be cleaned up until the NN transitions to 
> active.  The size of {{pendingDNMessages}} constantly grows until then.
> If a leak happens while in a startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143331#comment-17143331
 ] 

Kihwal Lee commented on HDFS-15421:
---

Thanks, [~aajisaka] for the patch. I will also have a look soon.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch
>
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain 
> situations.  This makes the last set of incremental block reports from append 
> appear "from the future", which causes them to be simply re-queued to the 
> pending DN message queue rather than processed to complete the block.  The last 
> set of IBRs will leak and never be cleaned up until the NN transitions to 
> active.  The size of {{pendingDNMessages}} constantly grows until then.
> If a leak happens while in a startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.

2020-06-19 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-15422:
--
Description: 
When queueing an IBR (incremental block report) on a standby namenode, some of 
the reported information is being replaced with the existing stored 
information.  This can lead to false block corruption.

We had a namenode that, after transitioning to active, started reporting missing 
blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks that had 
been appended, and their sizes were actually correct on the datanodes. Upon 
further investigation, it was determined that the namenode was queueing IBRs 
with altered information.

Although it sounds bad, I am not making it a blocker.

  was:
When queueing an IBR (incremental block report) on a standby namenode, some of 
the reported information is being replaced with the existing stored 
information.  This can lead to false block corruption.

We had a namenode that, after transitioning to active, started reporting missing 
blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks that had 
been appended, and their sizes were actually correct on the datanodes. Upon 
further investigation, it was determined that the namenode was queueing IBRs 
with altered information.


> Reported IBR is partially replaced with stored info when queuing.
> -
>
> Key: HDFS-15422
> URL: https://issues.apache.org/jira/browse/HDFS-15422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Critical
>
> When queueing an IBR (incremental block report) on a standby namenode, some 
> of the reported information is being replaced with the existing stored 
> information.  This can lead to false block corruption.
> We had a namenode that, after transitioning to active, started reporting missing 
> blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks that had 
> been appended, and their sizes were actually correct on the datanodes. Upon 
> further investigation, it was determined that the namenode was queueing IBRs 
> with altered information.
> Although it sounds bad, I am not making it a blocker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.

2020-06-19 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140618#comment-17140618
 ] 

Kihwal Lee commented on HDFS-15422:
---

The fix is simple. 
{code}
@@ -2578,10 +2578,7 @@ private BlockInfo processReportedBlock(
 // If the block is an out-of-date generation stamp or state,
 // but we're the standby, we shouldn't treat it as corrupt,
 // but instead just queue it for later processing.
-// TODO: Pretty confident this should be s/storedBlock/block below,
-// since we should be postponing the info of the reported block, not
-// the stored block. See HDFS-6289 for more context.
-queueReportedBlock(storageInfo, storedBlock, reportedState,
+queueReportedBlock(storageInfo, block, reportedState,
 QUEUE_REASON_CORRUPT_STATE);
   } else {
 toCorrupt.add(c);
{code}

If  the old information in memory({{storedBlock}}) is used in queueing a 
report, the size may be old.  Unlike GENSTAMP_MISMATCH, this kind of corruption 
can be undone when the NN sees a correct report again. I.e. forcing a block 
report won't fix this condition. 

> Reported IBR is partially replaced with stored info when queuing.
> -
>
> Key: HDFS-15422
> URL: https://issues.apache.org/jira/browse/HDFS-15422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Critical
>
> When queueing an IBR (incremental block report) on a standby namenode, some 
> of the reported information is being replaced with the existing stored 
> information.  This can lead to false block corruption.
> We had a namenode that, after transitioning to active, started reporting missing 
> blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks that had 
> been appended, and their sizes were actually correct on the datanodes. Upon 
> further investigation, it was determined that the namenode was queueing IBRs 
> with altered information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.

2020-06-19 Thread Kihwal Lee (Jira)
Kihwal Lee created HDFS-15422:
-

 Summary: Reported IBR is partially replaced with stored info when 
queuing.
 Key: HDFS-15422
 URL: https://issues.apache.org/jira/browse/HDFS-15422
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Kihwal Lee


When queueing an IBR (incremental block report) on a standby namenode, some of 
the reported information is being replaced with the existing stored 
information.  This can lead to false block corruption.

We had a namenode that, after transitioning to active, started reporting missing 
blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks that had 
been appended, and their sizes were actually correct on the datanodes. Upon 
further investigation, it was determined that the namenode was queueing IBRs 
with altered information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


