[jira] [Assigned] (HDFS-17402) StartupSafeMode should not exit when resources are from low to available

2024-05-09 Thread Zilong Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zilong Zhu reassigned HDFS-17402:
-

Assignee: Zilong Zhu

> StartupSafeMode should not exit when resources are from low to available
> 
>
> Key: HDFS-17402
> URL: https://issues.apache.org/jira/browse/HDFS-17402
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>  Labels: pull-request-available
>
> After HDFS-17231, the NameNode can exit safe mode automatically when resources 
> recover from low to available. It uses 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem#leaveSafeMode, which also 
> changes BMSafeModeStatus. However, when the NameNode enters resource-low safe 
> mode, org.apache.hadoop.hdfs.server.namenode.FSNamesystem#enterSafeMode does 
> not change BMSafeModeStatus, so the two paths are not symmetric.
> Current behavior:
> a. NN enters StartupSafeMode
> b. NN enters ResourceLowSafeMode
> c. NN resources recover from low to available
> d. NN safe mode off
>  
> Expected behavior:
> a. NN enters StartupSafeMode
> b. NN enters ResourceLowSafeMode
> c. NN resources recover from low to available
> d. NN exits ResourceLowSafeMode but remains in StartupSafeMode






[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM

2024-04-28 Thread Zilong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841815#comment-17841815
 ] 

Zilong Zhu commented on HDFS-17503:
---

[~Keepromise] It appears to occur when creating the BlockSender object. This is 
an intermittent issue that occurs in our production environment. If I manually 
throw an OOM error while creating the BlockSender object, it can cause volume 
references not to be released.
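To make the failure mode concrete, here is a minimal, self-contained sketch (hypothetical names, not the actual DataXceiver/BlockSender/FsVolumeReference code). With catch (IOException ioe) the simulated OutOfMemoryError would fly past the handler and the reference would never be released; catch (Throwable t), or a finally block, releases it:
{code:java}
// Sketch only: a reference counter standing in for FsVolumeReference, and a
// constructor that fails with an Error the way BlockSender creation can on OOM.
public class VolumeRefDemo {
  static int refCount = 0;

  static AutoCloseable obtainReference() {
    refCount++;                       // "obtain" the volume reference
    return () -> refCount--;          // closing it releases the reference
  }

  static void createBlockSender() {
    throw new OutOfMemoryError("simulated OOM during BlockSender construction");
  }

  public static void main(String[] args) {
    AutoCloseable ref = obtainReference();
    try {
      createBlockSender();
    } catch (Throwable t) {           // was: catch (IOException ioe) -- an Error slips past that
      try { ref.close(); } catch (Exception ignored) { }
      System.out.println("reference released, refCount = " + refCount);
    }
  }
}
{code}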

> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>
> When BlockSender throws an error because of OOM, the volume reference obtained 
> by the thread is not released, which causes the thread trying to remove the 
> volume to wait forever in an infinite loop.
> I found that HDFS-15963 catches the exception and releases the volume 
> reference, but it does not handle the case where an Error is thrown. I think 
> "catch (Throwable t)" should be used instead of "catch (IOException ioe)".






[jira] [Assigned] (HDFS-17504) DN process should exit when BPServiceActor exit

2024-04-28 Thread Zilong Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zilong Zhu reassigned HDFS-17504:
-

Assignee: Zilong Zhu

> DN process should exit when BPServiceActor exit
> ---
>
> Key: HDFS-17504
> URL: https://issues.apache.org/jira/browse/HDFS-17504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>
> BPServiceActor is a very important thread. In a non-HA cluster, the exit of 
> the BPServiceActor thread will cause the DN process to exit. However, in an 
> HA cluster, this is not the case.
> I found that HDFS-15651 causes the BPServiceActor thread to exit and changes 
> the "runningState" from "RunningState.FAILED" to "RunningState.EXITED", which 
> can be confusing during troubleshooting.
> I believe the DN process should exit when the BPServiceActor's flag is set to 
> RunningState.FAILED, because at this point the DN is unable to recover and 
> re-establish a heartbeat connection with the ANN on its own.






[jira] [Created] (HDFS-17504) DN process should exit when BPServiceActor exit

2024-04-28 Thread Zilong Zhu (Jira)
Zilong Zhu created HDFS-17504:
-

 Summary: DN process should exit when BPServiceActor exit
 Key: HDFS-17504
 URL: https://issues.apache.org/jira/browse/HDFS-17504
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Zilong Zhu


BPServiceActor is a very important thread. In a non-HA cluster, the exit of the 
BPServiceActor thread will cause the DN process to exit. However, in an HA 
cluster, this is not the case.
I found that HDFS-15651 causes the BPServiceActor thread to exit and changes the 
"runningState" from "RunningState.FAILED" to "RunningState.EXITED", which can be 
confusing during troubleshooting.
I believe the DN process should exit when the BPServiceActor's flag is set to 
RunningState.FAILED, because at this point the DN is unable to recover and 
re-establish a heartbeat connection with the ANN on its own.
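A minimal sketch of the proposed behavior (hypothetical names and structure, not the actual BPServiceActor/DataNode code): a terminal FAILED state takes the whole process down, while a normal EXITED state does not:
{code:java}
public class BPServiceActorExitDemo {
  enum RunningState { CONNECTING, RUNNING, EXITED, FAILED }

  // Called when the actor thread finishes, with its final state.
  static void onActorFinished(RunningState finalState) {
    if (finalState == RunningState.FAILED) {
      // In the real DataNode this would go through something like
      // ExitUtil.terminate(...), so the process does not linger in a state
      // where it can never re-establish a heartbeat with the active NameNode.
      System.err.println("BPServiceActor failed permanently; exiting DataNode");
      System.exit(1);
    }
    System.out.println("actor finished with state " + finalState
        + "; DataNode keeps running");
  }

  public static void main(String[] args) {
    onActorFinished(RunningState.EXITED);  // normal shutdown: process keeps running
    onActorFinished(RunningState.FAILED);  // unrecoverable: process exits
  }
}
{code}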






[jira] [Updated] (HDFS-17503) Unreleased volume references because of OOM

2024-04-28 Thread Zilong Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zilong Zhu updated HDFS-17503:
--
Description: 
When BlockSender throws an error because of OOM, the volume reference obtained 
by the thread is not released, which causes the thread trying to remove the 
volume to wait forever in an infinite loop.

I found that HDFS-15963 catches the exception and releases the volume reference, 
but it does not handle the case where an Error is thrown. I think "catch 
(Throwable t)" should be used instead of "catch (IOException ioe)".

> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Priority: Major
>
> When BlockSender throws an error because of OOM, the volume reference obtained 
> by the thread is not released, which causes the thread trying to remove the 
> volume to wait forever in an infinite loop.
> I found that HDFS-15963 catches the exception and releases the volume 
> reference, but it does not handle the case where an Error is thrown. I think 
> "catch (Throwable t)" should be used instead of "catch (IOException ioe)".






[jira] [Assigned] (HDFS-17503) Unreleased volume references because of OOM

2024-04-28 Thread Zilong Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zilong Zhu reassigned HDFS-17503:
-

Assignee: Zilong Zhu

> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>
> When BlockSender throws an error because of OOM, the volume reference obtained 
> by the thread is not released, which causes the thread trying to remove the 
> volume to wait forever in an infinite loop.
> I found that HDFS-15963 catches the exception and releases the volume 
> reference, but it does not handle the case where an Error is thrown. I think 
> "catch (Throwable t)" should be used instead of "catch (IOException ioe)".






[jira] [Updated] (HDFS-17503) Unreleased volume references because of OOM

2024-04-28 Thread Zilong Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zilong Zhu updated HDFS-17503:
--
Summary: Unreleased volume references because of OOM  (was: Unreleased 
volume references because of)

> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Priority: Major
>







[jira] [Created] (HDFS-17503) Unreleased volume references because of

2024-04-28 Thread Zilong Zhu (Jira)
Zilong Zhu created HDFS-17503:
-

 Summary: Unreleased volume references because of
 Key: HDFS-17503
 URL: https://issues.apache.org/jira/browse/HDFS-17503
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Zilong Zhu









[jira] [Updated] (HDFS-17402) StartupSafeMode should not exit when resources are from low to available

2024-02-28 Thread Zilong Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zilong Zhu updated HDFS-17402:
--
Description: 
After HDFS-17231, the NameNode can exit safe mode automatically when resources 
recover from low to available. It uses 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem#leaveSafeMode, which also 
changes BMSafeModeStatus. However, when the NameNode enters resource-low safe 
mode, org.apache.hadoop.hdfs.server.namenode.FSNamesystem#enterSafeMode does not 
change BMSafeModeStatus, so the two paths are not symmetric.

Current behavior:

a. NN enters StartupSafeMode

b. NN enters ResourceLowSafeMode

c. NN resources recover from low to available

d. NN safe mode off

 
Expected behavior:

a. NN enters StartupSafeMode

b. NN enters ResourceLowSafeMode

c. NN resources recover from low to available

d. NN exits ResourceLowSafeMode but remains in StartupSafeMode

  was:
After HDFS-17231, NameNode can exit safemode automatically when resources are 
from low to available. It used 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem#leaveSafeMode, this 
function will change BMSafeModeStatus. However, NameNode entering resource low 
safe mode doesn't change BMSafeModeStatus in 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem#enterSafeMode. This is not 
equal

So, I think StartupSafeMode should not exit when resources are from low to 
available


> StartupSafeMode should not exit when resources are from low to available
> 
>
> Key: HDFS-17402
> URL: https://issues.apache.org/jira/browse/HDFS-17402
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Priority: Major
>
> After HDFS-17231, the NameNode can exit safe mode automatically when resources 
> recover from low to available. It uses 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem#leaveSafeMode, which also 
> changes BMSafeModeStatus. However, when the NameNode enters resource-low safe 
> mode, org.apache.hadoop.hdfs.server.namenode.FSNamesystem#enterSafeMode does 
> not change BMSafeModeStatus, so the two paths are not symmetric.
> Current behavior:
> a. NN enters StartupSafeMode
> b. NN enters ResourceLowSafeMode
> c. NN resources recover from low to available
> d. NN safe mode off
>  
> Expected behavior:
> a. NN enters StartupSafeMode
> b. NN enters ResourceLowSafeMode
> c. NN resources recover from low to available
> d. NN exits ResourceLowSafeMode but remains in StartupSafeMode






[jira] [Created] (HDFS-17402) StartupSafeMode should not exit when resources are from low to available

2024-02-28 Thread Zilong Zhu (Jira)
Zilong Zhu created HDFS-17402:
-

 Summary: StartupSafeMode should not exit when resources are from 
low to available
 Key: HDFS-17402
 URL: https://issues.apache.org/jira/browse/HDFS-17402
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Zilong Zhu


After HDFS-17231, the NameNode can exit safe mode automatically when resources 
recover from low to available. It uses 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem#leaveSafeMode, which also 
changes BMSafeModeStatus. However, when the NameNode enters resource-low safe 
mode, org.apache.hadoop.hdfs.server.namenode.FSNamesystem#enterSafeMode does not 
change BMSafeModeStatus, so the two paths are not symmetric.

So I think StartupSafeMode should not be exited when resources recover from low 
to available.
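A minimal sketch of the expected behavior (hypothetical flags, not the actual FSNamesystem/BlockManagerSafeMode code): startup safe mode and resource-low safe mode are tracked independently, so recovering disk space clears only the resource-low reason and the NameNode stays in safe mode until the startup criteria are met:
{code:java}
public class SafeModeDemo {
  private boolean inStartupSafeMode = true;      // set until block reports catch up
  private boolean inResourceLowSafeMode = false; // set by the resource monitor

  void onResourcesLow()       { inResourceLowSafeMode = true; }

  // Resources recovered: clear only the resource-low reason instead of calling
  // the equivalent of leaveSafeMode(), which would also drop startup safe mode.
  void onResourcesAvailable() { inResourceLowSafeMode = false; }

  boolean isInSafeMode()      { return inStartupSafeMode || inResourceLowSafeMode; }

  public static void main(String[] args) {
    SafeModeDemo nn = new SafeModeDemo();
    nn.onResourcesLow();
    nn.onResourcesAvailable();
    // Still true: startup safe mode is not exited just because resources recovered.
    System.out.println("in safe mode: " + nn.isInSafeMode());
  }
}
{code}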






[jira] [Created] (HDFS-17368) HA: Standby should exit safemode when resources are from low to available

2024-02-01 Thread Zilong Zhu (Jira)
Zilong Zhu created HDFS-17368:
-

 Summary: HA: Standby should exit safemode when resources are from 
low to available
 Key: HDFS-17368
 URL: https://issues.apache.org/jira/browse/HDFS-17368
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Zilong Zhu


The NameNodeResourceMonitor (NNRM) automatically puts the NameNode into safe 
mode when it detects that resources are not sufficient. The NNRM runs only on 
the active NameNode (ANN). If both the ANN and the standby NameNode (SNN) run 
low on resources, and later the SNN's disk space is restored, the SNN will 
become the ANN and the ANN will become the SNN. However, at this point the new 
SNN will not exit safe mode, even after its disk recovers.

Consider the following scenario:
 * Initially, nn-1 is active and nn-2 is standby. Resources under 
dfs.namenode.name.dir are insufficient on both nn-1 and nn-2; the 
NameNodeResourceMonitor detects the resource issue and puts nn-1 into safe mode.
 * At this point, nn-1 is in safe mode (ON) and active, while nn-2 is in safe 
mode (OFF) and standby.
 * After a period of time, the resources in nn-2's dfs.namenode.name.dir 
recover, triggering a failover.
 * Now, nn-1 is in safe mode (ON) and standby, while nn-2 is in safe mode (OFF) 
and active.
 * Afterward, the resources in nn-1's dfs.namenode.name.dir recover.
 * However, since nn-1 is standby but in safe mode (ON), it is unable to exit 
safe mode automatically.

There are two possible ways to fix this issue:
 # If the SNN is detected to be in safe mode because of low resources, make it 
exit once resources recover.
 # Or, since we already have HDFS-17231, we can revert HDFS-2914 and bring the 
NNRM back to the SNN.






[jira] [Commented] (HDFS-16644) java.io.IOException Invalid token in javax.security.sasl.qop

2023-10-07 Thread Zilong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17772769#comment-17772769
 ] 

Zilong Zhu commented on HDFS-16644:
---

[~nishtha11shah] You are right. The 
[https://github.com/apache/hadoop/pull/5962/files] change can only help the 
DataNode avoid crashes; it does not enable the 2.10 hadoop-client to read and 
write successfully. I believe we should change the client code on branch-2.10. 
As mentioned above, HDFS-13541 was merged into both branch-2.10 and branch-3.2 
and added the "handshakeMsg" field. But HDFS-6708 and HDFS-9807 were merged 
into branch-3.2 only, and they added the "storageTypes" and "storageIds" fields 
before "handshakeMsg". The 2.10 client mistakenly treats the "storageTypes" 
data as the "handshakeMsg", and in turn passes a wrong "handshakeMsg". This is 
where the real issue lies.

> java.io.IOException Invalid token in javax.security.sasl.qop
> 
>
> Key: HDFS-16644
> URL: https://issues.apache.org/jira/browse/HDFS-16644
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Walter Su
>Priority: Major
>  Labels: pull-request-available
>
> deployment:
> server side: kerberos enabled cluster with jdk 1.8 and hdfs-server 3.2.1
> client side:
> I run command hadoop fs -put a test file, with kerberos ticket inited first, 
> and use identical core-site.xml & hdfs-site.xml configuration.
>  using client ver 3.2.1, it succeeds.
>  using client ver 2.8.5, it succeeds.
>  using client ver 2.10.1, it fails. The client side error info is:
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: 
> SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = 
> false
> 2022-06-27 01:06:15,781 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataNode{data=FSDataset{dirpath='[/mnt/disk1/hdfs, /mnt/***/hdfs, 
> /mnt/***/hdfs, /mnt/***/hdfs]'}, localName='emr-worker-***.***:9866', 
> datanodeUuid='b1c7f64a-6389-4739-bddf-***', xmitsInProgress=0}:Exception 
> transfering block BP-1187699012-10.-***:blk_1119803380_46080919 to mirror 
> 10.*:9866
> java.io.IOException: Invalid token in javax.security.sasl.qop: D
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessage(DataTransferSaslUtil.java:220)
> Once any client ver 2.10.1 connect to hdfs server, the DataNode no longer 
> accepts any client connection, even client ver 3.2.1 cannot connects to hdfs 
> server. The DataNode rejects any client connection. For a short time, all 
> DataNodes rejects client connections. 
> The problem exists even if I replace DataNode with ver 3.3.0 or replace java 
> with jdk 11.
> The problem is fixed if I replace DataNode with ver 3.2.0. I guess the 
> problem is related to HDFS-13541






[jira] [Comment Edited] (HDFS-16644) java.io.IOException Invalid token in javax.security.sasl.qop

2023-08-15 Thread Zilong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754437#comment-17754437
 ] 

Zilong Zhu edited comment on HDFS-16644 at 8/15/23 12:54 PM:
-

At the same time, we also identified another issue related to this. It looks 
like the META-INF services file for the BlockTokenIdentifier is in 
hadoop-hdfs.jar rather than hadoop-hdfs-client.jar. This prevents a client from 
decoding the identifier, because the service loader doesn't find the 
BlockTokenIdentifier class.

This will result in HDFS-13541 not functioning properly on branch-2.10 as well. 
I created HDFS-17159 to track it.


was (Author: JIRAUSER287487):
At the same time, we also identified another issus related to this. It looks 
like the meta inf file for the BlockTokenIdentifier is in the hadoop-hdfs.jar 
rather then the hadoop-hdfs-client.jar. This prevents a client from decode 
identifier because the service loader doesn't find the BlockTokenIdentifier 
class. 

This will result in HDFS-13541 not functioning properly on branch-2.10 as well. 
I created HDFS-17159 to track it.

> java.io.IOException Invalid token in javax.security.sasl.qop
> 
>
> Key: HDFS-16644
> URL: https://issues.apache.org/jira/browse/HDFS-16644
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Walter Su
>Priority: Major
>
> deployment:
> server side: kerberos enabled cluster with jdk 1.8 and hdfs-server 3.2.1
> client side:
> I run command hadoop fs -put a test file, with kerberos ticket inited first, 
> and use identical core-site.xml & hdfs-site.xml configuration.
>  using client ver 3.2.1, it succeeds.
>  using client ver 2.8.5, it succeeds.
>  using client ver 2.10.1, it fails. The client side error info is:
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: 
> SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = 
> false
> 2022-06-27 01:06:15,781 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataNode{data=FSDataset{dirpath='[/mnt/disk1/hdfs, /mnt/***/hdfs, 
> /mnt/***/hdfs, /mnt/***/hdfs]'}, localName='emr-worker-***.***:9866', 
> datanodeUuid='b1c7f64a-6389-4739-bddf-***', xmitsInProgress=0}:Exception 
> transfering block BP-1187699012-10.-***:blk_1119803380_46080919 to mirror 
> 10.*:9866
> java.io.IOException: Invalid token in javax.security.sasl.qop: D
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessage(DataTransferSaslUtil.java:220)
> Once any client ver 2.10.1 connect to hdfs server, the DataNode no longer 
> accepts any client connection, even client ver 3.2.1 cannot connects to hdfs 
> server. The DataNode rejects any client connection. For a short time, all 
> DataNodes rejects client connections. 
> The problem exists even if I replace DataNode with ver 3.3.0 or replace java 
> with jdk 11.
> The problem is fixed if I replace DataNode with ver 3.2.0. I guess the 
> problem is related to HDFS-13541






[jira] [Comment Edited] (HDFS-16644) java.io.IOException Invalid token in javax.security.sasl.qop

2023-08-15 Thread Zilong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754420#comment-17754420
 ] 

Zilong Zhu edited comment on HDFS-16644 at 8/15/23 12:54 PM:
-

We've also encountered this issue. Our NN and DN are Hadoop 3.2.4, and the 
client version is 2.10.1. For the same code segment, if only "hadoop-client" is 
included in the pom.xml, it works fine. However, if both "hadoop-client" and 
"hadoop-hdfs" are included, issues arise.

We believe this issue is related to class loading and the wire protocol. It 
leads to the generation of an abnormal QOP value (e.g. "D"). The key to this 
issue lies in the handling of the accessToken's BlockTokenIdentifier.

The NN (3.2.4) serializes the accessToken and sends it to the client (2.10.1). 
When the client (2.10.1) deserializes that accessToken, some fields are misread.

For BlockTokenIdentifier(3.2.4) 
org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier#writeLegacy
{code:java}
void writeLegacy(DataOutput out) throws IOException {
  WritableUtils.writeVLong(out, expiryDate);
  WritableUtils.writeVInt(out, keyId);
  WritableUtils.writeString(out, userId);
  WritableUtils.writeString(out, blockPoolId);
  WritableUtils.writeVLong(out, blockId);
  WritableUtils.writeVInt(out, modes.size());
  for (AccessMode aMode : modes) {
WritableUtils.writeEnum(out, aMode);
  }
  if (storageTypes != null) { // <-- new field (unknown to the 2.10 reader)
WritableUtils.writeVInt(out, storageTypes.length);
for (StorageType type : storageTypes) {
  WritableUtils.writeEnum(out, type);
}
  }
  if (storageIds != null) {   // <-- new field (unknown to the 2.10 reader)
WritableUtils.writeVInt(out, storageIds.length);
for (String id : storageIds) {
  WritableUtils.writeString(out, id);
}
  }
  if (handshakeMsg != null && handshakeMsg.length > 0) {
WritableUtils.writeVInt(out, handshakeMsg.length);
out.write(handshakeMsg);
  }
}{code}
For BlockTokenIdentifier(2.10.1) 
org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier#readFields

 
{code:java}
public void readFields(DataInput in) throws IOException {
  this.cache = null;
  if (in instanceof DataInputStream) {
final DataInputStream dis = (DataInputStream) in;
// this.cache should be assigned the raw bytes from the input data for
// upgrading compatibility. If we won't mutate fields and call getBytes()
// for something (e.g retrieve password), we should return the raw bytes
// instead of serializing the instance self fields to bytes, because we
// may lose newly added fields which we can't recognize.
this.cache = IOUtils.readFullyToByteArray(dis);
dis.reset();
  }
  expiryDate = WritableUtils.readVLong(in);
  keyId = WritableUtils.readVInt(in);
  userId = WritableUtils.readString(in);
  blockPoolId = WritableUtils.readString(in);
  blockId = WritableUtils.readVLong(in);
  int length = WritableUtils.readVIntInRange(in, 0,
  AccessMode.class.getEnumConstants().length);
  for (int i = 0; i < length; i++) {
modes.add(WritableUtils.readEnum(in, AccessMode.class));
  }
  try {
int handshakeMsgLen = WritableUtils.readVInt(in);
if (handshakeMsgLen != 0) {
  handshakeMsg = new byte[handshakeMsgLen];
  in.readFully(handshakeMsg);
}
  } catch (EOFException eof) {

  }
} {code}
So when the client (2.10.1) deserializes the handshakeMsg, things go wrong: it 
mistakenly deserializes the storageTypes data instead of the handshakeMsg.

HDFS-13541 was merged into both branch-2.10 and branch-3.2 and added the 
"handshakeMsg" field. But HDFS-6708 and HDFS-9807 were merged into branch-3.2 
only, and they added the "storageTypes" and "storageIds" fields before 
"handshakeMsg". This is where the real issue lies.

I want to fix this issue. Any comments and suggestions would be appreciated.
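To make the framing mismatch easier to see, here is a Hadoop-free sketch (plain DataOutput/DataInput calls instead of WritableUtils VInts, and a simplified field set; the names are illustrative): the new-layout writer inserts storageTypes/storageIds before handshakeMsg, so an old-layout reader consumes the storageTypes count as the handshakeMsg length:
{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TokenSkewDemo {

  // "3.2-style" writer: new fields appear before handshakeMsg.
  static byte[] writeNewLayout() throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeUTF("userA");                  // field both versions agree on
    out.writeInt(2);                        // storageTypes.length (new field)
    out.writeUTF("DISK");
    out.writeUTF("SSD");
    out.writeInt(1);                        // storageIds.length (new field)
    out.writeUTF("storage-id-1");
    byte[] handshake = "real-handshake".getBytes("UTF-8");
    out.writeInt(handshake.length);         // handshakeMsg length (HDFS-13541)
    out.write(handshake);
    return bos.toByteArray();
  }

  // "2.10-style" reader: knows nothing about the new fields, so the next
  // length-prefixed chunk after the common fields is taken as handshakeMsg.
  static void readOldLayout(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    in.readUTF();                           // userId
    int handshakeMsgLen = in.readInt();     // actually storageTypes.length!
    byte[] msg = new byte[handshakeMsgLen];
    in.readFully(msg);                      // actually bytes of the "DISK" entry
    System.out.println("old reader thinks handshakeMsg length = " + handshakeMsgLen
        + ", and its bytes are garbage taken from the storageTypes data");
  }

  public static void main(String[] args) throws IOException {
    readOldLayout(writeNewLayout());
  }
}
{code}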

 


was (Author: JIRAUSER287487):
We‘ve also encountered this issue. Our NN and DN is Hadoop 3.2.4 version, and 
client version is 2.10.1.  For the same code segment, if only "hadoop-client" 
included in the pom.xml, it works fine. However, if both "hadoop-client" and 
"hadoop-hdfs" are included, issues arise.

We believe this issue is related to class loading and protocols. It leads to 
the generation of abnormal QOP value(e.g.D).  The key  to this issue lies in 
the handing of the accessToken's BlockTokenIdentifier. 

NN(3.2.4) serialized and sent the accessToken to the  client(2.10.1).  The 
client(2.10.1) deserialized the accessToken(3.2.4). At this point, some fields 
changed.

For BlockTokenIdentifier(3.2.4) 
org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier#writeLegacy
{code:java}
void writeLegacy(DataOutput out) throws IOException {
  WritableUtils.writeVLong(out, expiryDate);
  WritableUtils.writeVInt(out, keyId);
  WritableUtils.writeString(out, userId);
  WritableUtils.writeString(out, blockPoolId);
  WritableUtils.writeVLong(out, blockId);
  WritableUtils.writeVInt(out, 

[jira] [Comment Edited] (HDFS-16644) java.io.IOException Invalid token in javax.security.sasl.qop

2023-08-15 Thread Zilong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754437#comment-17754437
 ] 

Zilong Zhu edited comment on HDFS-16644 at 8/15/23 7:50 AM:


At the same time, we also identified another issue related to this. It looks 
like the META-INF services file for the BlockTokenIdentifier is in 
hadoop-hdfs.jar rather than hadoop-hdfs-client.jar. This prevents a client from 
decoding the identifier, because the service loader doesn't find the 
BlockTokenIdentifier class.

This will result in HDFS-13541 not functioning properly on branch-2.10 as well. 
I created HDFS-17159 to track it.


was (Author: JIRAUSER287487):
At the same time, we also identified another issus related to this. It looks 
like the meta inf file for the BlockTokenIdentifier is in the hadoop-hdfs.jar 
rather then the hadoop-hdfs-client.jar. This prevents a client from decode 
identifier because the service loader doesn't find the BlockTokenIdentifier 
class. 

This will result in HDFS-13541 not functioning properly on branch-2.10 as well.

> java.io.IOException Invalid token in javax.security.sasl.qop
> 
>
> Key: HDFS-16644
> URL: https://issues.apache.org/jira/browse/HDFS-16644
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Walter Su
>Priority: Major
>
> deployment:
> server side: kerberos enabled cluster with jdk 1.8 and hdfs-server 3.2.1
> client side:
> I run command hadoop fs -put a test file, with kerberos ticket inited first, 
> and use identical core-site.xml & hdfs-site.xml configuration.
>  using client ver 3.2.1, it succeeds.
>  using client ver 2.8.5, it succeeds.
>  using client ver 2.10.1, it fails. The client side error info is:
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: 
> SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = 
> false
> 2022-06-27 01:06:15,781 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataNode{data=FSDataset{dirpath='[/mnt/disk1/hdfs, /mnt/***/hdfs, 
> /mnt/***/hdfs, /mnt/***/hdfs]'}, localName='emr-worker-***.***:9866', 
> datanodeUuid='b1c7f64a-6389-4739-bddf-***', xmitsInProgress=0}:Exception 
> transfering block BP-1187699012-10.-***:blk_1119803380_46080919 to mirror 
> 10.*:9866
> java.io.IOException: Invalid token in javax.security.sasl.qop: D
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessage(DataTransferSaslUtil.java:220)
> Once any client ver 2.10.1 connect to hdfs server, the DataNode no longer 
> accepts any client connection, even client ver 3.2.1 cannot connects to hdfs 
> server. The DataNode rejects any client connection. For a short time, all 
> DataNodes rejects client connections. 
> The problem exists even if I replace DataNode with ver 3.3.0 or replace java 
> with jdk 11.
> The problem is fixed if I replace DataNode with ver 3.2.0. I guess the 
> problem is related to HDFS-13541






[jira] [Created] (HDFS-17159) Can't decode Identifier HDFS tokens with only the hdfs client jar

2023-08-15 Thread Zilong Zhu (Jira)
Zilong Zhu created HDFS-17159:
-

 Summary: Can't decode Identifier HDFS tokens with only the hdfs 
client jar
 Key: HDFS-17159
 URL: https://issues.apache.org/jira/browse/HDFS-17159
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.10.1
Reporter: Zilong Zhu


It looks like the META-INF services file for the BlockTokenIdentifier is in 
hadoop-hdfs.jar rather than hadoop-hdfs-client.jar. This prevents a client from 
decoding the identifier, because the service loader doesn't find the 
BlockTokenIdentifier class.

This will result in HDFS-13541 not functioning properly on branch-2.10 as well.
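A small diagnostic sketch (assuming hadoop-common plus either hadoop-hdfs or hadoop-hdfs-client is on the classpath, and the usual registration under META-INF/services/org.apache.hadoop.security.token.TokenIdentifier): listing the token identifiers the ServiceLoader can see shows whether the block token kind is resolvable with only the client jar:
{code:java}
import java.util.ServiceLoader;

import org.apache.hadoop.security.token.TokenIdentifier;

// Prints every TokenIdentifier implementation the ServiceLoader can find.
// Token#decodeIdentifier relies on this mechanism, so if the block token's
// service entry ships only in hadoop-hdfs.jar, its kind will be missing from
// this output when only hadoop-hdfs-client.jar is present.
public class ListTokenIdentifiers {
  public static void main(String[] args) {
    for (TokenIdentifier id : ServiceLoader.load(TokenIdentifier.class)) {
      System.out.println(id.getKind() + " -> " + id.getClass().getName());
    }
  }
}
{code}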






[jira] [Commented] (HDFS-16644) java.io.IOException Invalid token in javax.security.sasl.qop

2023-08-15 Thread Zilong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754437#comment-17754437
 ] 

Zilong Zhu commented on HDFS-16644:
---

At the same time, we also identified another issue related to this. It looks 
like the META-INF services file for the BlockTokenIdentifier is in 
hadoop-hdfs.jar rather than hadoop-hdfs-client.jar. This prevents a client from 
decoding the identifier, because the service loader doesn't find the 
BlockTokenIdentifier class.

This will result in HDFS-13541 not functioning properly on branch-2.10 as well.

> java.io.IOException Invalid token in javax.security.sasl.qop
> 
>
> Key: HDFS-16644
> URL: https://issues.apache.org/jira/browse/HDFS-16644
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Walter Su
>Priority: Major
>
> deployment:
> server side: kerberos enabled cluster with jdk 1.8 and hdfs-server 3.2.1
> client side:
> I run command hadoop fs -put a test file, with kerberos ticket inited first, 
> and use identical core-site.xml & hdfs-site.xml configuration.
>  using client ver 3.2.1, it succeeds.
>  using client ver 2.8.5, it succeeds.
>  using client ver 2.10.1, it fails. The client side error info is:
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: 
> SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = 
> false
> 2022-06-27 01:06:15,781 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataNode{data=FSDataset{dirpath='[/mnt/disk1/hdfs, /mnt/***/hdfs, 
> /mnt/***/hdfs, /mnt/***/hdfs]'}, localName='emr-worker-***.***:9866', 
> datanodeUuid='b1c7f64a-6389-4739-bddf-***', xmitsInProgress=0}:Exception 
> transfering block BP-1187699012-10.-***:blk_1119803380_46080919 to mirror 
> 10.*:9866
> java.io.IOException: Invalid token in javax.security.sasl.qop: D
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessage(DataTransferSaslUtil.java:220)
> Once any client ver 2.10.1 connect to hdfs server, the DataNode no longer 
> accepts any client connection, even client ver 3.2.1 cannot connects to hdfs 
> server. The DataNode rejects any client connection. For a short time, all 
> DataNodes rejects client connections. 
> The problem exists even if I replace DataNode with ver 3.3.0 or replace java 
> with jdk 11.
> The problem is fixed if I replace DataNode with ver 3.2.0. I guess the 
> problem is related to HDFS-13541






[jira] [Comment Edited] (HDFS-16644) java.io.IOException Invalid token in javax.security.sasl.qop

2023-08-15 Thread Zilong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754420#comment-17754420
 ] 

Zilong Zhu edited comment on HDFS-16644 at 8/15/23 6:03 AM:


We've also encountered this issue. Our NN and DN are Hadoop 3.2.4, and the 
client version is 2.10.1. For the same code segment, if only "hadoop-client" is 
included in the pom.xml, it works fine. However, if both "hadoop-client" and 
"hadoop-hdfs" are included, issues arise.

We believe this issue is related to class loading and the wire protocol. It 
leads to the generation of an abnormal QOP value (e.g. "D"). The key to this 
issue lies in the handling of the accessToken's BlockTokenIdentifier.

The NN (3.2.4) serializes the accessToken and sends it to the client (2.10.1). 
When the client (2.10.1) deserializes that accessToken, some fields are misread.

For BlockTokenIdentifier(3.2.4) 
org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier#writeLegacy
{code:java}
void writeLegacy(DataOutput out) throws IOException {
  WritableUtils.writeVLong(out, expiryDate);
  WritableUtils.writeVInt(out, keyId);
  WritableUtils.writeString(out, userId);
  WritableUtils.writeString(out, blockPoolId);
  WritableUtils.writeVLong(out, blockId);
  WritableUtils.writeVInt(out, modes.size());
  for (AccessMode aMode : modes) {
WritableUtils.writeEnum(out, aMode);
  }
  if (storageTypes != null) { // <-- new field (unknown to the 2.10 reader)
WritableUtils.writeVInt(out, storageTypes.length);
for (StorageType type : storageTypes) {
  WritableUtils.writeEnum(out, type);
}
  }
  if (storageIds != null) {   // <-- new field (unknown to the 2.10 reader)
WritableUtils.writeVInt(out, storageIds.length);
for (String id : storageIds) {
  WritableUtils.writeString(out, id);
}
  }
  if (handshakeMsg != null && handshakeMsg.length > 0) {
WritableUtils.writeVInt(out, handshakeMsg.length);
out.write(handshakeMsg);
  }
}{code}
For BlockTokenIdentifier(2.10.1) 
org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier#readFields

 
{code:java}
public void readFields(DataInput in) throws IOException {
  this.cache = null;
  if (in instanceof DataInputStream) {
final DataInputStream dis = (DataInputStream) in;
// this.cache should be assigned the raw bytes from the input data for
// upgrading compatibility. If we won't mutate fields and call getBytes()
// for something (e.g retrieve password), we should return the raw bytes
// instead of serializing the instance self fields to bytes, because we
// may lose newly added fields which we can't recognize.
this.cache = IOUtils.readFullyToByteArray(dis);
dis.reset();
  }
  expiryDate = WritableUtils.readVLong(in);
  keyId = WritableUtils.readVInt(in);
  userId = WritableUtils.readString(in);
  blockPoolId = WritableUtils.readString(in);
  blockId = WritableUtils.readVLong(in);
  int length = WritableUtils.readVIntInRange(in, 0,
  AccessMode.class.getEnumConstants().length);
  for (int i = 0; i < length; i++) {
modes.add(WritableUtils.readEnum(in, AccessMode.class));
  }
  try {
int handshakeMsgLen = WritableUtils.readVInt(in);
if (handshakeMsgLen != 0) {
  handshakeMsg = new byte[handshakeMsgLen];
  in.readFully(handshakeMsg);
}
  } catch (EOFException eof) {

  }
} {code}
So when the client (2.10.1) deserializes the handshakeMsg, things go wrong: it 
mistakenly deserializes the storageTypes data instead of the handshakeMsg.

HDFS-13541 was merged into both branch-2.10 and branch-3.2 and added the 
"handshakeMsg" field. But HDFS-6708 and HDFS-9807 were merged into branch-3.2 
only, and they added the "storageTypes" and "storageIds" fields before 
"handshakeMsg". This is where the real issue lies.

I want to fix this issue. Any comments and suggestions would be appreciated.

 


was (Author: JIRAUSER287487):
We‘ve also encountered this issue. Our NN and DN is Hadoop 3.2.4 version, and 
client version is 2.10.1.  For the same code segment, if only "hadoop-client" 
included in the pom.xml, it works fine. However, if both "hadoop-client" and 
"hadoop-hdfs" are included, issues arise.

We believe this issue is related to class loading and protocols. It leads to 
the generation of abnormal QOP value(e.g.D).  The key  to this issue lies in 
the handing of the accessToken's BlockTokenIdentifier. 

NN(3.2.4) serialized and sent the accessToken to the  client(2.10.1).  The 
client(2.10.1) deserialized the accessToken(3.2.4). At this point, some fields 
changed.

For BlockTokenIdentifier(3.2.4) 
org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier#writeLegacy
{code:java}
void writeLegacy(DataOutput out) throws IOException {
  WritableUtils.writeVLong(out, expiryDate);
  WritableUtils.writeVInt(out, keyId);
  WritableUtils.writeString(out, userId);
  WritableUtils.writeString(out, blockPoolId);
  WritableUtils.writeVLong(out, blockId);
  WritableUtils.writeVInt(out, 

[jira] [Comment Edited] (HDFS-16644) java.io.IOException Invalid token in javax.security.sasl.qop

2023-08-15 Thread Zilong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754420#comment-17754420
 ] 

Zilong Zhu edited comment on HDFS-16644 at 8/15/23 6:01 AM:


We've also encountered this issue. Our NN and DN are Hadoop 3.2.4, and the 
client version is 2.10.1. For the same code segment, if only "hadoop-client" is 
included in the pom.xml, it works fine. However, if both "hadoop-client" and 
"hadoop-hdfs" are included, issues arise.

We believe this issue is related to class loading and the wire protocol. It 
leads to the generation of an abnormal QOP value (e.g. "D"). The key to this 
issue lies in the handling of the accessToken's BlockTokenIdentifier.

The NN (3.2.4) serializes the accessToken and sends it to the client (2.10.1). 
When the client (2.10.1) deserializes that accessToken, some fields are misread.

For BlockTokenIdentifier(3.2.4) 
org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier#writeLegacy
{code:java}
void writeLegacy(DataOutput out) throws IOException {
  WritableUtils.writeVLong(out, expiryDate);
  WritableUtils.writeVInt(out, keyId);
  WritableUtils.writeString(out, userId);
  WritableUtils.writeString(out, blockPoolId);
  WritableUtils.writeVLong(out, blockId);
  WritableUtils.writeVInt(out, modes.size());
  for (AccessMode aMode : modes) {
WritableUtils.writeEnum(out, aMode);
  }
  if (storageTypes != null) { // <-- new field (unknown to the 2.10 reader)
WritableUtils.writeVInt(out, storageTypes.length);
for (StorageType type : storageTypes) {
  WritableUtils.writeEnum(out, type);
}
  }
  if (storageIds != null) {   // <-- new field (unknown to the 2.10 reader)
WritableUtils.writeVInt(out, storageIds.length);
for (String id : storageIds) {
  WritableUtils.writeString(out, id);
}
  }
  if (handshakeMsg != null && handshakeMsg.length > 0) {
WritableUtils.writeVInt(out, handshakeMsg.length);
out.write(handshakeMsg);
  }
}{code}
For BlockTokenIdentifier(2.10.1) 
org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier#readFields

 
{code:java}
public void readFields(DataInput in) throws IOException {
  this.cache = null;
  if (in instanceof DataInputStream) {
final DataInputStream dis = (DataInputStream) in;
// this.cache should be assigned the raw bytes from the input data for
// upgrading compatibility. If we won't mutate fields and call getBytes()
// for something (e.g retrieve password), we should return the raw bytes
// instead of serializing the instance self fields to bytes, because we
// may lose newly added fields which we can't recognize.
this.cache = IOUtils.readFullyToByteArray(dis);
dis.reset();
  }
  expiryDate = WritableUtils.readVLong(in);
  keyId = WritableUtils.readVInt(in);
  userId = WritableUtils.readString(in);
  blockPoolId = WritableUtils.readString(in);
  blockId = WritableUtils.readVLong(in);
  int length = WritableUtils.readVIntInRange(in, 0,
  AccessMode.class.getEnumConstants().length);
  for (int i = 0; i < length; i++) {
modes.add(WritableUtils.readEnum(in, AccessMode.class));
  }
  try {
int handshakeMsgLen = WritableUtils.readVInt(in);
if (handshakeMsgLen != 0) {
  handshakeMsg = new byte[handshakeMsgLen];
  in.readFully(handshakeMsg);
}
  } catch (EOFException eof) {

  }
} {code}
So when the client (2.10.1) deserializes the handshakeMsg, things go wrong: it 
mistakenly deserializes the storageTypes data instead of the handshakeMsg.

HDFS-13541 was merged into both branch-2.10 and branch-3.2 and added the 
"handshakeMsg" field. But HDFS-6708 and HDFS-9807 were merged into branch-3.2 
only, and they added the "storageTypes" and "storageIds" fields before 
"handshakeMsg". This is where the real issue lies.

I want to fix this issue. Any comments and suggestions would be appreciated.

 


was (Author: JIRAUSER287487):
We‘ve also encountered this issue. Our NN and DN is Hadoop 3.2.4 version, and 
client version is 2.10.1.  For the same code segment, if only "hadoop-client" 
included in the pom.xml, it works fin. However, if both "hadoop-client" and 
"hadoop-hdfs" are included, issues arise.

We believe this issue is related to class loading and protocols. It leads to 
the generation of abnormal QOP value(e.g.D).  The key  to this issue lies in 
the handing of the accessToken's BlockTokenIdentifier. 

NN(3.2.4) serialized and sent the accessToken to the  client(2.10.1).  The 
client(2.10.1) deserialized the accessToken(3.2.4). At this point, some fields 
changed.

For BlockTokenIdentifier(3.2.4) 
org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier#writeLegacy
{code:java}
void writeLegacy(DataOutput out) throws IOException {
  WritableUtils.writeVLong(out, expiryDate);
  WritableUtils.writeVInt(out, keyId);
  WritableUtils.writeString(out, userId);
  WritableUtils.writeString(out, blockPoolId);
  WritableUtils.writeVLong(out, blockId);
  WritableUtils.writeVInt(out, 

[jira] [Commented] (HDFS-16644) java.io.IOException Invalid token in javax.security.sasl.qop

2023-08-14 Thread Zilong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754420#comment-17754420
 ] 

Zilong Zhu commented on HDFS-16644:
---

We've also encountered this issue. Our NN and DN are Hadoop 3.2.4, and the 
client version is 2.10.1. For the same code segment, if only "hadoop-client" is 
included in the pom.xml, it works fine. However, if both "hadoop-client" and 
"hadoop-hdfs" are included, issues arise.

We believe this issue is related to class loading and the wire protocol. It 
leads to the generation of an abnormal QOP value (e.g. "D"). The key to this 
issue lies in the handling of the accessToken's BlockTokenIdentifier.

The NN (3.2.4) serializes the accessToken and sends it to the client (2.10.1). 
When the client (2.10.1) deserializes that accessToken, some fields are misread.

For BlockTokenIdentifier(3.2.4) 
org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier#writeLegacy
{code:java}
void writeLegacy(DataOutput out) throws IOException {
  WritableUtils.writeVLong(out, expiryDate);
  WritableUtils.writeVInt(out, keyId);
  WritableUtils.writeString(out, userId);
  WritableUtils.writeString(out, blockPoolId);
  WritableUtils.writeVLong(out, blockId);
  WritableUtils.writeVInt(out, modes.size());
  for (AccessMode aMode : modes) {
WritableUtils.writeEnum(out, aMode);
  }
  if (storageTypes != null) { // <-- new field (unknown to the 2.10 reader)
WritableUtils.writeVInt(out, storageTypes.length);
for (StorageType type : storageTypes) {
  WritableUtils.writeEnum(out, type);
}
  }
  if (storageIds != null) {   // <-- new field (unknown to the 2.10 reader)
WritableUtils.writeVInt(out, storageIds.length);
for (String id : storageIds) {
  WritableUtils.writeString(out, id);
}
  }
  if (handshakeMsg != null && handshakeMsg.length > 0) {
WritableUtils.writeVInt(out, handshakeMsg.length);
out.write(handshakeMsg);
  }
}{code}
For BlockTokenIdentifier(2.10.1) 
org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier#readFields

 
{code:java}
public void readFields(DataInput in) throws IOException {
  this.cache = null;
  if (in instanceof DataInputStream) {
final DataInputStream dis = (DataInputStream) in;
// this.cache should be assigned the raw bytes from the input data for
// upgrading compatibility. If we won't mutate fields and call getBytes()
// for something (e.g retrieve password), we should return the raw bytes
// instead of serializing the instance self fields to bytes, because we
// may lose newly added fields which we can't recognize.
this.cache = IOUtils.readFullyToByteArray(dis);
dis.reset();
  }
  expiryDate = WritableUtils.readVLong(in);
  keyId = WritableUtils.readVInt(in);
  userId = WritableUtils.readString(in);
  blockPoolId = WritableUtils.readString(in);
  blockId = WritableUtils.readVLong(in);
  int length = WritableUtils.readVIntInRange(in, 0,
  AccessMode.class.getEnumConstants().length);
  for (int i = 0; i < length; i++) {
modes.add(WritableUtils.readEnum(in, AccessMode.class));
  }
  try {
int handshakeMsgLen = WritableUtils.readVInt(in);
if (handshakeMsgLen != 0) {
  handshakeMsg = new byte[handshakeMsgLen];
  in.readFully(handshakeMsg);
}
  } catch (EOFException eof) {

  }
} {code}
So when the client (2.10.1) deserializes the handshakeMsg, things go wrong: it 
mistakenly deserializes the storageTypes data instead of the handshakeMsg.

HDFS-13541 was merged into both branch-2.10 and branch-3.2 and added the 
"handshakeMsg" field. But HDFS-6708 and HDFS-9807 were merged into branch-3.2 
only, and they added the "storageTypes" and "storageIds" fields before 
"handshakeMsg". This is where the real issue lies.

I want to fix this issue. Any comments and suggestions would be appreciated.

 

> java.io.IOException Invalid token in javax.security.sasl.qop
> 
>
> Key: HDFS-16644
> URL: https://issues.apache.org/jira/browse/HDFS-16644
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Walter Su
>Priority: Major
>
> deployment:
> server side: kerberos enabled cluster with jdk 1.8 and hdfs-server 3.2.1
> client side:
> I run command hadoop fs -put a test file, with kerberos ticket inited first, 
> and use identical core-site.xml & hdfs-site.xml configuration.
>  using client ver 3.2.1, it succeeds.
>  using client ver 2.8.5, it succeeds.
>  using client ver 2.10.1, it fails. The client side error info is:
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: 
> SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = 
> false
> 2022-06-27 01:06:15,781 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataNode{data=FSDataset{dirpath='[/mnt/disk1/hdfs, /mnt/***/hdfs, 
> /mnt/***/hdfs, /mnt/***/hdfs]'},