[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601101#comment-17601101
 ] 

ASF GitHub Bot commented on HDFS-13522:
---

simbadzina commented on code in PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#discussion_r964314207


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientGSIContext.java:
##
@@ -66,15 +75,24 @@ public void updateResponseState(RpcResponseHeaderProto.Builder header) {
    */
   @Override
   public void receiveResponseState(RpcResponseHeaderProto header) {

Review Comment:
   Fixed.



##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientGSIContext.java:
##
@@ -66,15 +75,24 @@ public void updateResponseState(RpcResponseHeaderProto.Builder header) {
    */
   @Override
   public void receiveResponseState(RpcResponseHeaderProto header) {
-    lastSeenStateId.accumulate(header.getStateId());
+    if (header.hasRouterFederatedState()) {
+      routerFederatedState = header.getRouterFederatedState();
+    } else {
+      lastSeenStateId.update(header.getStateId());
+    }
   }
 
   /**
    * Client side implementation for providing state alignment info in requests.
    */
   @Override
   public void updateRequestState(RpcRequestHeaderProto.Builder header) {

Review Comment:
   Fixed.
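For context, the request-side counterpart implied by the hunks above would echo any previously received routerFederatedState bytes back to the Router. A minimal sketch, assuming a `routerFederatedState` field and a `lastSeenStateId` with a `get()` accessor as suggested by the diff; this is an illustration, not the exact patch code:

{code:java}
@Override
public synchronized void updateRequestState(RpcRequestHeaderProto.Builder header) {
  // If a Router previously attached opaque federated state, echo it back so
  // the Router can route the read to a sufficiently caught-up Observer.
  if (routerFederatedState != null) {
    header.setRouterFederatedState(routerFederatedState);
  } else {
    // Plain (non-RBF) path: send the single last-seen state id.
    header.setStateId(lastSeenStateId.get());
  }
}
{code}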





> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.
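For illustration, making the router "understand the observer state" amounts to adding an OBSERVER value to the service-state enum and mapping it from the HA service state. A hedged sketch of that shape (the exact value set and mapping are assumptions, not necessarily the committed enum):

{code:java}
public enum FederationNamenodeServiceState {
  ACTIVE,      // HAServiceState.ACTIVE or operational
  OBSERVER,    // HAServiceState.OBSERVER (the new state this subtask implies)
  STANDBY,     // HAServiceState.STANDBY
  UNAVAILABLE, // Namenode cannot be reached
  EXPIRED;     // Last contact is too old

  public static FederationNamenodeServiceState getState(HAServiceState state) {
    switch (state) {
      case ACTIVE:
        return ACTIVE;
      case OBSERVER:
        return OBSERVER;
      case STANDBY:
        return STANDBY;
      default:
        return UNAVAILABLE;
    }
  }
}
{code}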



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601093#comment-17601093
 ] 

ASF GitHub Bot commented on HDFS-13522:
---

simbadzina commented on code in PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#discussion_r964308328


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/FederatedNamespaceIds.java:
##
@@ -0,0 +1,113 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.federation.router;
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.locks.ReentrantLock;
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.hdfs.NamespaceStateId;
+import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos;
+import org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos.RouterFederatedStateProto;
+import org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException;
+import org.apache.hadoop.thirdparty.protobuf.ByteString;
+
+
+/**
+ * Collection of last-seen namespace state Ids for a set of namespaces.
+ * A single NamespaceStateId is shared by all outgoing connections to a particular namespace.
+ * Router clients share and query the entire collection.
+ */
+public class FederatedNamespaceIds {
+  private final Map<String, NamespaceStateId> namespaceIdMap = new ConcurrentHashMap<>();
+  private final ReentrantLock lock = new ReentrantLock();

Review Comment:
   You are right. I was concerned about the iterator (foreach) I have on the 
map.
   However, the weak consistency provided is sufficient for our needs.
   
   
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html#Weakly:~:text=Most%20concurrent%20Collection%20implementations%20(including%20most%20Queues)%20also%20differ%20from%20the%20usual%20java.util%20conventions%20in%20that%20their%20Iterators%20and%20Spliterators%20provide%20weakly%20consistent%20rather%20than%20fast%2Dfail%20traversal%3A
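The guarantee being relied on here, in a self-contained form: iterating a ConcurrentHashMap while another thread mutates it never throws ConcurrentModificationException, but the traversal may or may not observe the concurrent update. A minimal demo (illustrative, not from the patch):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentIterationDemo {
  public static void main(String[] args) {
    Map<String, Long> stateIds = new ConcurrentHashMap<>();
    stateIds.put("ns0", 1L);
    stateIds.put("ns1", 2L);

    stateIds.forEach((ns, id) -> {
      // Insertion during traversal is safe: no ConcurrentModificationException
      // is thrown, but "ns2" may or may not be visited by this very traversal
      // (weak consistency). A missed namespace is simply picked up when the
      // next RPC header is built.
      stateIds.putIfAbsent("ns2", 3L);
      System.out.println(ns + " -> " + id);
    });
  }
}
{code}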





> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601088#comment-17601088
 ] 

ASF GitHub Bot commented on HDFS-13522:
---

simbadzina commented on code in PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#discussion_r964306464


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/FederatedNamespaceIds.java:
##
@@ -0,0 +1,113 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.federation.router;
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.locks.ReentrantLock;
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.hdfs.NamespaceStateId;
+import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos;
+import org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos.RouterFederatedStateProto;
+import org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException;
+import org.apache.hadoop.thirdparty.protobuf.ByteString;
+
+
+/**
+ * Collection of last-seen namespace state Ids for a set of namespaces.
+ * A single NamespaceStateId is shared by all outgoing connections to a particular namespace.
+ * Router clients share and query the entire collection.
+ */
+public class FederatedNamespaceIds {
+  private final Map<String, NamespaceStateId> namespaceIdMap = new ConcurrentHashMap<>();
+  private final ReentrantLock lock = new ReentrantLock();
+
+  public void updateStateUsingRequestHeader(RpcHeaderProtos.RpcRequestHeaderProto header) {
+    if (header.hasRouterFederatedState()) {
+      RouterFederatedStateProto federatedState;
+      try {
+        federatedState = RouterFederatedStateProto.parseFrom(header.getRouterFederatedState());
+      } catch (InvalidProtocolBufferException e) {
+        throw new RuntimeException(e);
+      }
+      lock.lock();
+      try {
+        federatedState.getNamespaceStateIdsMap().forEach((nsId, stateId) -> {
+          if (!namespaceIdMap.containsKey(nsId)) {
+            namespaceIdMap.putIfAbsent(nsId, new NamespaceStateId());
+          }
+          namespaceIdMap.get(nsId).update(stateId);

Review Comment:
   Changed.
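A minimal sketch of the combined idiom, assuming the change referred to is collapsing the containsKey/putIfAbsent/get sequence into one atomic call (a common review suggestion, not confirmed by this thread):

{code:java}
// computeIfAbsent atomically returns the existing NamespaceStateId for nsId,
// or creates and inserts a fresh one, in a single map operation.
namespaceIdMap.computeIfAbsent(nsId, key -> new NamespaceStateId()).update(stateId);
{code}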





> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16659) JournalNode should throw NewerTxnIdException if SinceTxId is bigger than HighestWrittenTxId

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601082#comment-17601082
 ] 

ASF GitHub Bot commented on HDFS-16659:
---

ZanderXu commented on PR #4560:
URL: https://github.com/apache/hadoop/pull/4560#issuecomment-1238805756

   @xkrogen Sir, thank you for your patient answers and reviews, as well as 
nice suggestions.




> JournalNode should throw NewerTxnIdException if SinceTxId is bigger than 
> HighestWrittenTxId
> ---
>
> Key: HDFS-16659
> URL: https://issues.apache.org/jira/browse/HDFS-16659
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> JournalNode should throw `CacheMissException` if `sinceTxId` is bigger than 
> `highestWrittenTxId` while handling the `getJournaledEdits` RPC from NameNodes. 
> The current logic may leave the in-progress EditLogTailer unable to replay any 
> edits from the JournalNodes in some corner cases, so the Observer NameNode 
> cannot handle requests from clients.
> Suppose there are 3 JournalNodes, JN0 ~ JN2:
> * JN0 hits an abnormal case while the Active NameNode is syncing 10 edits 
> with first txid 11
> * The NameNode just ignores the abnormal JN0 and continues to sync edits to 
> JN1 and JN2
> * JN0 comes back to health
> * The NameNode continues, syncing 10 edits with first txid 21
> * At this point, edits 11 ~ 30 are missing from the cache of JN0
> * The Observer NameNode tries to select an EditLogInputStream through 
> `getJournaledEdits` with since txId 21
> * JN2 hits an abnormal case and responds slowly
> The expected result is that the response should contain 20 edits from txId 21 
> to txId 30 from JN1 and JN2, because the Active NameNode successfully wrote 
> these edits to JN1 and JN2 but failed to write them to JN0.
> But in the current implementation, the responses are [Response(0) from JN0, 
> Response(10) from JN1], because an abnormal case in JN2, such as GC or a bad 
> network, causes a slow response. So `maxAllowedTxns` will be 0 and the 
> NameNode will not replay any edits.
> As above, the root cause is that the JournalNode should throw a cache-miss 
> exception when `sinceTxId` is greater than `highestWrittenTxId`.
> The buggy code is below:
> {code:java}
> if (sinceTxId > getHighestWrittenTxId()) {
>   // Requested edits that don't exist yet; short-circuit the cache here
>   metrics.rpcEmptyResponses.incr();
>   return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
> }
> {code}
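Per the issue title, the fix is to throw instead of short-circuiting with an empty response, so the caller can tell "edits not written yet" apart from an empty cache. A hedged sketch of that shape (the exception message and constructor are assumptions, not the committed patch):

{code:java}
if (sinceTxId > getHighestWrittenTxId()) {
  // Requested edits that don't exist yet: surface it as an exception so the
  // tailer falls back to streaming instead of capping maxAllowedTxns at 0.
  metrics.rpcEmptyResponses.incr();
  throw new NewerTxnIdException(
      "Highest txn ID available in the journal is " + getHighestWrittenTxId()
          + ", but requested txns starting at " + sinceTxId + ".");
}
{code}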



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600977#comment-17600977
 ] 

ASF GitHub Bot commented on HDFS-13522:
---

simbadzina commented on code in PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#discussion_r964156460


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterStateIdContext.java:
##
@@ -0,0 +1,94 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.federation.router;
+
+import java.lang.reflect.Method;
+import java.util.HashSet;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.hdfs.protocol.ClientProtocol;
+import org.apache.hadoop.hdfs.server.namenode.ha.ReadOnly;
+import org.apache.hadoop.ipc.AlignmentContext;
+import org.apache.hadoop.ipc.RetriableException;
+import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos.RpcRequestHeaderProto;
+import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos.RpcResponseHeaderProto;
+
+import static org.apache.hadoop.ipc.RpcConstants.REQUEST_HEADER_NAMESPACE_STATEIDS_SET;
+
+/**
+ * This is the router implementation responsible for passing
+ * client state id to next level.

Review Comment:
   It holds data for all namespaces. I'll add more info to the comment.





> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600966#comment-17600966
 ] 

ASF GitHub Bot commented on HDFS-13522:
---

simbadzina commented on code in PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#discussion_r964131419


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/FederatedNamespaceIds.java:
##
@@ -0,0 +1,113 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.federation.router;
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.locks.ReentrantLock;
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.hdfs.NamespaceStateId;
+import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos;
+import org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos.RouterFederatedStateProto;
+import org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException;
+import org.apache.hadoop.thirdparty.protobuf.ByteString;
+
+
+/**
+ * Collection of last-seen namespace state Ids for a set of namespaces.
+ * A single NamespaceStateId is shared by all outgoing connections to a particular namespace.
+ * Router clients share and query the entire collection.
+ */
+public class FederatedNamespaceIds {
+  private final Map<String, NamespaceStateId> namespaceIdMap = new ConcurrentHashMap<>();
+  private final ReentrantLock lock = new ReentrantLock();
+
+  public void updateStateUsingRequestHeader(RpcHeaderProtos.RpcRequestHeaderProto header) {
+    if (header.hasRouterFederatedState()) {
+      RouterFederatedStateProto federatedState;
+      try {
+        federatedState = RouterFederatedStateProto.parseFrom(header.getRouterFederatedState());
+      } catch (InvalidProtocolBufferException e) {
+        throw new RuntimeException(e);
+      }
+      lock.lock();
+      try {
+        federatedState.getNamespaceStateIdsMap().forEach((nsId, stateId) -> {
+          if (!namespaceIdMap.containsKey(nsId)) {
+            namespaceIdMap.putIfAbsent(nsId, new NamespaceStateId());
+          }
+          namespaceIdMap.get(nsId).update(stateId);
+        });
+      } finally {
+        lock.unlock();
+      }
+    }
+  }
+
+  public void setResponseHeaderState(RpcHeaderProtos.RpcResponseHeaderProto.Builder headerBuilder) {
+    RouterFederatedStateProto.Builder federatedStateBuilder =
+        RouterFederatedStateProto.newBuilder();
+    lock.lock();
+    try {
+      namespaceIdMap.forEach((k, v) -> federatedStateBuilder.putNamespaceStateIds(k, v.get()));
+    } finally {
+      lock.unlock();
+    }
+    headerBuilder.setRouterFederatedState(federatedStateBuilder.build().toByteString());
+  }
+
+  public NamespaceStateId getNamespaceId(String nsId) {
+    lock.lock();
+    try {
+      namespaceIdMap.putIfAbsent(nsId, new NamespaceStateId());
+    } finally {
+      lock.unlock();
+    }
+    return namespaceIdMap.get(nsId);
+  }
+
+  public void removeNamespaceId(String nsId) {
+    lock.lock();
+    try {
+      namespaceIdMap.remove(nsId);
+    } finally {
+      lock.unlock();
+    }
+  }
+
+  /**
+   * Utility function to view the state of the routerFederatedState field in RPC headers.
+   */
+  @VisibleForTesting
+  public static Map<String, Long> getRouterFederatedStateMap(ByteString byteString) {

Review Comment:
   I now use this class in the RouterRPCClient.
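A hedged sketch of how a test (or the RouterRpcClient mentioned above) might exercise this helper end to end; the names and values are illustrative, and the Map<String, Long> signature is inferred from the proto map:

{code:java}
// Illustrative round-trip: record a state id, serialize it into a response
// header, then decode it with the @VisibleForTesting helper.
FederatedNamespaceIds ids = new FederatedNamespaceIds();
ids.getNamespaceId("ns0").update(42L);

RpcHeaderProtos.RpcResponseHeaderProto.Builder builder =
    RpcHeaderProtos.RpcResponseHeaderProto.newBuilder();
ids.setResponseHeaderState(builder);

Map<String, Long> decoded =
    FederatedNamespaceIds.getRouterFederatedStateMap(builder.getRouterFederatedState());
assert decoded.get("ns0") == 42L;
{code}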





> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 

[jira] [Resolved] (HDFS-16659) JournalNode should throw NewerTxnIdException if SinceTxId is bigger than HighestWrittenTxId

2022-09-06 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen resolved HDFS-16659.

Fix Version/s: 3.4.0
   Resolution: Fixed

> JournalNode should throw NewerTxnIdException if SinceTxId is bigger than 
> HighestWrittenTxId
> ---
>
> Key: HDFS-16659
> URL: https://issues.apache.org/jira/browse/HDFS-16659
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> JournalNode should throw `CacheMissException` if `sinceTxId` is bigger than 
> `highestWrittenTxId` while handling the `getJournaledEdits` RPC from NameNodes. 
> The current logic may leave the in-progress EditLogTailer unable to replay any 
> edits from the JournalNodes in some corner cases, so the Observer NameNode 
> cannot handle requests from clients.
> Suppose there are 3 JournalNodes, JN0 ~ JN2:
> * JN0 hits an abnormal case while the Active NameNode is syncing 10 edits 
> with first txid 11
> * The NameNode just ignores the abnormal JN0 and continues to sync edits to 
> JN1 and JN2
> * JN0 comes back to health
> * The NameNode continues, syncing 10 edits with first txid 21
> * At this point, edits 11 ~ 30 are missing from the cache of JN0
> * The Observer NameNode tries to select an EditLogInputStream through 
> `getJournaledEdits` with since txId 21
> * JN2 hits an abnormal case and responds slowly
> The expected result is that the response should contain 20 edits from txId 21 
> to txId 30 from JN1 and JN2, because the Active NameNode successfully wrote 
> these edits to JN1 and JN2 but failed to write them to JN0.
> But in the current implementation, the responses are [Response(0) from JN0, 
> Response(10) from JN1], because an abnormal case in JN2, such as GC or a bad 
> network, causes a slow response. So `maxAllowedTxns` will be 0 and the 
> NameNode will not replay any edits.
> As above, the root cause is that the JournalNode should throw a cache-miss 
> exception when `sinceTxId` is greater than `highestWrittenTxId`.
> The buggy code is below:
> {code:java}
> if (sinceTxId > getHighestWrittenTxId()) {
>   // Requested edits that don't exist yet; short-circuit the cache here
>   metrics.rpcEmptyResponses.incr();
>   return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16659) JournalNode should throw NewerTxnIdException if SinceTxId is bigger than HighestWrittenTxId

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600904#comment-17600904
 ] 

ASF GitHub Bot commented on HDFS-16659:
---

xkrogen commented on PR #4560:
URL: https://github.com/apache/hadoop/pull/4560#issuecomment-1238439633

   Merged to trunk. Thanks for the contribution @ZanderXu !




> JournalNode should throw NewerTxnIdException if SinceTxId is bigger than 
> HighestWrittenTxId
> ---
>
> Key: HDFS-16659
> URL: https://issues.apache.org/jira/browse/HDFS-16659
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> JournalNode should throw `CacheMissException` if `sinceTxId` is bigger than 
> `highestWrittenTxId` while handling the `getJournaledEdits` RPC from NameNodes. 
> The current logic may leave the in-progress EditLogTailer unable to replay any 
> edits from the JournalNodes in some corner cases, so the Observer NameNode 
> cannot handle requests from clients.
> Suppose there are 3 JournalNodes, JN0 ~ JN2:
> * JN0 hits an abnormal case while the Active NameNode is syncing 10 edits 
> with first txid 11
> * The NameNode just ignores the abnormal JN0 and continues to sync edits to 
> JN1 and JN2
> * JN0 comes back to health
> * The NameNode continues, syncing 10 edits with first txid 21
> * At this point, edits 11 ~ 30 are missing from the cache of JN0
> * The Observer NameNode tries to select an EditLogInputStream through 
> `getJournaledEdits` with since txId 21
> * JN2 hits an abnormal case and responds slowly
> The expected result is that the response should contain 20 edits from txId 21 
> to txId 30 from JN1 and JN2, because the Active NameNode successfully wrote 
> these edits to JN1 and JN2 but failed to write them to JN0.
> But in the current implementation, the responses are [Response(0) from JN0, 
> Response(10) from JN1], because an abnormal case in JN2, such as GC or a bad 
> network, causes a slow response. So `maxAllowedTxns` will be 0 and the 
> NameNode will not replay any edits.
> As above, the root cause is that the JournalNode should throw a cache-miss 
> exception when `sinceTxId` is greater than `highestWrittenTxId`.
> The buggy code is below:
> {code:java}
> if (sinceTxId > getHighestWrittenTxId()) {
>   // Requested edits that don't exist yet; short-circuit the cache here
>   metrics.rpcEmptyResponses.incr();
>   return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16659) JournalNode should throw NewerTxnIdException if SinceTxId is bigger than HighestWrittenTxId

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600903#comment-17600903
 ] 

ASF GitHub Bot commented on HDFS-16659:
---

xkrogen merged PR #4560:
URL: https://github.com/apache/hadoop/pull/4560




> JournalNode should throw NewerTxnIdException if SinceTxId is bigger than 
> HighestWrittenTxId
> ---
>
> Key: HDFS-16659
> URL: https://issues.apache.org/jira/browse/HDFS-16659
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> JournalNode should throw `CacheMissException` if `sinceTxId` is bigger than 
> `highestWrittenTxId` while handling the `getJournaledEdits` RPC from NameNodes. 
> The current logic may leave the in-progress EditLogTailer unable to replay any 
> edits from the JournalNodes in some corner cases, so the Observer NameNode 
> cannot handle requests from clients.
> Suppose there are 3 JournalNodes, JN0 ~ JN2:
> * JN0 hits an abnormal case while the Active NameNode is syncing 10 edits 
> with first txid 11
> * The NameNode just ignores the abnormal JN0 and continues to sync edits to 
> JN1 and JN2
> * JN0 comes back to health
> * The NameNode continues, syncing 10 edits with first txid 21
> * At this point, edits 11 ~ 30 are missing from the cache of JN0
> * The Observer NameNode tries to select an EditLogInputStream through 
> `getJournaledEdits` with since txId 21
> * JN2 hits an abnormal case and responds slowly
> The expected result is that the response should contain 20 edits from txId 21 
> to txId 30 from JN1 and JN2, because the Active NameNode successfully wrote 
> these edits to JN1 and JN2 but failed to write them to JN0.
> But in the current implementation, the responses are [Response(0) from JN0, 
> Response(10) from JN1], because an abnormal case in JN2, such as GC or a bad 
> network, causes a slow response. So `maxAllowedTxns` will be 0 and the 
> NameNode will not replay any edits.
> As above, the root cause is that the JournalNode should throw a cache-miss 
> exception when `sinceTxId` is greater than `highestWrittenTxId`.
> The buggy code is below:
> {code:java}
> if (sinceTxId > getHighestWrittenTxId()) {
>   // Requested edits that don't exist yet; short-circuit the cache here
>   metrics.rpcEmptyResponses.incr();
>   return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16761) Namenode UI for Datanodes page not loading if any data node is down

2022-09-06 Thread Hemanth Boyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600889#comment-17600889
 ] 

Hemanth Boyina commented on HDFS-16761:
---

Thank you [~mkris.reddy] for reporting the issue. We are facing it due to a 
recent upgrade of the jQuery DataTables library.

The table headers and table columns do not match for the dead datanodes, which 
is why the page fails to load.

> Namenode UI for Datanodes page not loading if any data node is down
> ---
>
> Key: HDFS-16761
> URL: https://issues.apache.org/jira/browse/HDFS-16761
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Krishna Reddy
>Priority: Major
> Fix For: 3.2.2
>
>
> Steps to reproduce:
> - Install the hadoop components and add 3 datanodes
> - Enable namenode HA
> - Open the Namenode UI and check the Datanodes page
> - Check that all datanodes are displayed
> - Now bring one datanode down
> - Wait 10 minutes for the heartbeat to expire
> - Refresh the namenode page and check
>
> Actual Result: It shows the error message "NameNode is still loading. 
> Redirecting to the Startup Progress page."



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16761) Namenode UI for Datanodes page not loading if any data node is down

2022-09-06 Thread Krishna Reddy (Jira)
Krishna Reddy created HDFS-16761:


 Summary: Namenode UI for Datanodes page not loading if any data 
node is down
 Key: HDFS-16761
 URL: https://issues.apache.org/jira/browse/HDFS-16761
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Krishna Reddy
 Fix For: 3.2.2


Steps to reproduce:

- Install the hadoop components and add 3 datanodes
- Enable namenode HA
- Open the Namenode UI and check the Datanodes page
- Check that all datanodes are displayed
- Now bring one datanode down
- Wait 10 minutes for the heartbeat to expire
- Refresh the namenode page and check

Actual Result: It shows the error message "NameNode is still loading. 
Redirecting to the Startup Progress page."



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600875#comment-17600875
 ] 

ASF GitHub Bot commented on HDFS-16689:
---

hadoop-yetus commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1238381684

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 41s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m  0s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 18s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 21s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 47s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 39s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  3s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/9/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 285 unchanged 
- 1 fixed = 286 total (was 286)  |
   | +1 :green_heart: |  mvnsite  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 33s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 29s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 265m 32s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/9/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 15s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 377m 14s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestReconstructStripedFile |
   |   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/9/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4744 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux b1d08172cce1 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 58002d0e78874a22190a43904eefae665aa1d983 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 

[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600872#comment-17600872
 ] 

ASF GitHub Bot commented on HDFS-16689:
---

hadoop-yetus commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1238374829

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 59s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 47s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 21s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  1s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  1s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/10/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 285 unchanged 
- 1 fixed = 286 total (was 286)  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 30s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 22s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 262m  4s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 19s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 372m 35s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestReconstructStripedFile |
   |   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/10/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4744 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 4c4ce04c083f 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 58002d0e78874a22190a43904eefae665aa1d983 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 

[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-09-06 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600719#comment-17600719
 ] 

Steve Loughran commented on HDFS-2139:
--

As you are going to make changes in the public/stable filesystem APIs, I'd 
like to keep an eye on that.

Anything that goes in:
* should have its own HADOOP- JIRA, even if all the work is in the hdfs branch.
* needs to work well in cloud infrastructures which implement similar 
capabilities but with different latencies etc.
* shouldn't be a copy(src, dest) -> boolean call; whatever copy command goes 
in should instead return a builder subclass of FSDataOutputStreamBuilder which 
can allow for extra options, and return a Future which the caller can block 
on; etc.
* needs to have something in the filesystem markdown spec and a matching 
contract test
* and a PathCapabilities probe which can check for the API being available 
under a path.
* and fail by throwing exceptions, not returning true/false. A return value is 
needed for the future; something which implements IOStatisticsSource is useful.

Any new API should work identically with azure storage as/when it adds the 
needed operation; S3's file-by-file COPY call would also be supported. It is 
not going to be as fast as anything in HDFS, but as it doesn't use any network 
IO outside the S3 store it is higher bandwidth and scales better than this CP 
would normally do. (The Hive team have asked for S3 copying before, but it 
gets complex once you start to think about encryption; s3a support might need 
to add extra source files.)


{code}
Future r = fs.copy(src, dest)
    .withFileStatus(srcStatus)             // as with openFile
    .withProgress(progressable)
    .must("fs.option.copy.atomic", true)   // example of a builder option,
                                           // here one requiring atomic file/dir copy
    .build();

r.get();   // block for result
{code}

I'd also propose it as a new interface which both FileContext and FileSystem 
implement.
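Under the constraints above, a hedged sketch of what such an interface could look like; every name here is illustrative, not an agreed Hadoop API:

{code:java}
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.statistics.IOStatisticsSource;
import org.apache.hadoop.util.Progressable;

// Illustrative shape only ("PathCopy"/"CopyBuilder" are placeholder names).
public interface PathCopy {

  // Initiate a copy; options go through the builder, failures surface as
  // exceptions, and build() returns a future the caller can block on.
  CopyBuilder copy(Path src, Path dest) throws IOException;

  interface CopyBuilder {
    CopyBuilder withFileStatus(FileStatus srcStatus);
    CopyBuilder withProgress(Progressable progress);
    CopyBuilder must(String key, boolean value);      // e.g. fs.option.copy.atomic
    CompletableFuture<IOStatisticsSource> build() throws IOException;
  }
}
{code}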

Also, fs shell could be a good simple place for this to be used too... easier 
to get it working/stabilised there.



> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism 
> for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied over its respective blocks, they
> report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing 
> top-of-the-rack data transfers.
> Note: An extra improvement would be to instruct the datanode to create a
> hardlink of the block file if we are copying a block on the same datanode.
> [~xuzq_zander] provided a design doc: 
> https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing
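For orientation, the six steps above translate into roughly the following per-block loop; a pseudocode-level sketch with placeholder helpers, not the attached patch:

{code:java}
// Placeholder helpers throughout (addEmptyBlockFor, requestLocalBlockCopy,
// waitForBlocksReported); only getBlockLocations is a real ClientProtocol call.
void fastCopy(String src, String dst) throws IOException {
  // Steps 1-2: query metadata and datanode locations for every source block.
  LocatedBlocks blocks = namenode.getBlockLocations(src, 0, Long.MAX_VALUE);
  for (LocatedBlock srcBlk : blocks.getLocatedBlocks()) {
    // Step 3: add an empty block to the namesystem for the destination file.
    LocatedBlock dstBlk = addEmptyBlockFor(dst, srcBlk);
    // Step 4: ask each replica's datanode to copy (or hardlink) locally.
    for (DatanodeInfo dn : srcBlk.getLocations()) {
      requestLocalBlockCopy(dn, srcBlk.getBlock(), dstBlk.getBlock());
    }
  }
  // Steps 5-6: datanodes report the new replicas; wait until all are complete.
  waitForBlocksReported(dst);
}
{code}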



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600710#comment-17600710
 ] 

ASF GitHub Bot commented on HDFS-16689:
---

ZanderXu commented on code in PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#discussion_r963522811


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestHAWithInProgressTail.java:
##
@@ -0,0 +1,142 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.HAUtil;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.qjournal.MiniJournalCluster;
+import org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster;
+import org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager;
+import org.apache.hadoop.hdfs.qjournal.client.SpyQJournalUtil;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerFaultInjector;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+
+import static org.apache.hadoop.hdfs.server.namenode.NameNodeAdapter.getFileInfo;
+import static org.junit.Assert.assertNotNull;
+
+public class TestHAWithInProgressTail {
+  private MiniQJMHACluster qjmhaCluster;
+  private MiniDFSCluster cluster;
+  private MiniJournalCluster jnCluster;
+  private NameNode nn0;
+  private NameNode nn1;
+
+  @Before
+  public void startUp() throws IOException {
+    Configuration conf = new Configuration();
+    conf.setBoolean(DFSConfigKeys.DFS_HA_TAILEDITS_INPROGRESS_KEY, true);
+    conf.setInt(DFSConfigKeys.DFS_QJOURNAL_SELECT_INPUT_STREAMS_TIMEOUT_KEY, 500);
+    HAUtil.setAllowStandbyReads(conf, true);
+    qjmhaCluster = new MiniQJMHACluster.Builder(conf).build();
+    cluster = qjmhaCluster.getDfsCluster();
+    jnCluster = qjmhaCluster.getJournalCluster();
+
+    // Get NameNodes from the cluster for later manual control
+    nn0 = cluster.getNameNode(0);
+    nn1 = cluster.getNameNode(1);
+  }
+
+  @After
+  public void tearDown() throws IOException {
+    if (qjmhaCluster != null) {
+      qjmhaCluster.shutdown();
+    }
+  }
+
+  /**
+   * Test that the Standby NameNode tails multiple segments while catching up
+   * during the transition to Active.
+   */
+  @Test
+  public void testFailoverWithAbnormalJN() throws Exception {
+    cluster.transitionToActive(0);
+    cluster.waitActive(0);
+
+    BlockManagerFaultInjector.instance = new BlockManagerFaultInjector() {
+      @Override
+      public void mockJNStreams() throws IOException {
+        spyOnJASjournal();
+      }
+    };

Review Comment:
   @xkrogen Sir, I have updated this patch, please help me review it again. 
Thanks





> Standby NameNode crashes when transitioning to Active with in-progress tailer
> -
>
> Key: HDFS-16689
> URL: https://issues.apache.org/jira/browse/HDFS-16689
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standby NameNode crashes when transitioning to Active with an in-progress 
> tailer. The error message is like below:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X 
> when there is a stream available for read: ByteStringEditLog[X, Y], 
> ByteStringEditLog[X, 0]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
>   at 
> 

[jira] [Commented] (HDFS-16737) Fix number of threads in FsDatasetAsyncDiskService#addExecutorForVolume

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600699#comment-17600699
 ] 

ASF GitHub Bot commented on HDFS-16737:
---

hadoop-yetus commented on PR #4784:
URL: https://github.com/apache/hadoop/pull/4784#issuecomment-1237915530

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m  8s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m  7s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 35s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 43s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 347m 45s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4784/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  0s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 466m  2s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.TestFileTruncate |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4784/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4784 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 0e0ad1a30cba 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / b13903175a636d20c0a993cdbb61608c9d706bb2 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4784/3/testReport/ |
   | Max. process+thread count | 2245 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Commented] (HDFS-16703) Enable RPC Timeout for some protocols of NameNode.

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600698#comment-17600698
 ] 

ASF GitHub Bot commented on HDFS-16703:
---

hadoop-yetus commented on PR #4660:
URL: https://github.com/apache/hadoop/pull/4660#issuecomment-1237912143

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  16m 24s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  25m 46s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m  6s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   5m 55s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 37s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 59s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 14s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 27s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  4s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 32s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m  1s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   6m  1s |  |  
hadoop-hdfs-project-jdkUbuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 generated 0 new + 1134 unchanged - 1 
fixed = 1134 total (was 1135)  |
   | +1 :green_heart: |  compile  |   5m 41s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   5m 41s |  |  
hadoop-hdfs-project-jdkPrivateBuild-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 with 
JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 generated 0 new + 1103 
unchanged - 1 fixed = 1103 total (was 1104)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 23s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m  8s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m  1s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 37s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 38s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 239m  8s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  8s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 385m 59s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4660/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4660 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux ae65e76df820 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 9967ad85b45410dd431391942dd7faad1eedbad0 |
   | Default Java | Private 


[jira] [Commented] (HDFS-14117) RBF: We can only delete the files or dirs of one subcluster in a cluster with multiple subclusters when trash is enabled

2022-09-06 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600659#comment-17600659
 ] 

zhengchenyu commented on HDFS-14117:


In our cluster, we must mount every nameservice for /user/${user}/.Trash, which 
means the router issues a rename to every nameservice on each move to trash. 
Although this has worked for a long time, it leads to bad performance whenever 
one namenode degrades.

I want to contact only one nameservice, so I have a new proposal.
Condition:
(1) /test is mounted on ns0
(2) /user/hdfs is mounted on ns1
Suppose we move /test/hello to /user/hdfs/.Trash/Current/test/hello.
When we process a location with the trash prefix, we strip the prefix and use 
the remainder to find the mounted nameservice. For 
/user/hdfs/.Trash/Current/test/hello, we remove the prefix 
'/user/hdfs/.Trash/Current' to get '/test/hello', and use '/test/hello' to find 
the mounted nameservice. That yields the location 
ns0->/user/hdfs/.Trash/Current/test/hello, and the rename to trash will work, 
as shown in the sketch below.
The drawback is that we must check the location against the trash pattern on 
every call, but I think the cost is low.
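
For illustration, a minimal self-contained sketch of the proposed resolution 
step; MountTable and lookupNs are stand-in names for this sketch, not the 
actual Router classes:

{code:java}
// Sketch only: MountTable/lookupNs are stand-ins for the Router's real
// mount table resolution.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrashPathResolver {

  // Matches /user/<user>/.Trash/<checkpoint><original-path> and captures
  // the original path, e.g. "/test/hello".
  private static final Pattern TRASH_PATTERN =
      Pattern.compile("^/user/[^/]+/\\.Trash/[^/]+(/.*)$");

  /** Stand-in for the router's mount table. */
  public interface MountTable {
    String lookupNs(String path); // e.g. "/test/hello" -> "ns0"
  }

  /**
   * Resolve the nameservice for a path. For a trash path, resolve using the
   * original (pre-trash) suffix so the rename stays within one nameservice:
   * /user/hdfs/.Trash/Current/test/hello -> lookupNs("/test/hello") -> ns0.
   */
  public static String resolveNs(String path, MountTable mountTable) {
    Matcher m = TRASH_PATTERN.matcher(path);
    if (m.matches()) {
      return mountTable.lookupNs(m.group(1));
    }
    return mountTable.lookupNs(path);
  }
}
{code}

Note the target path itself keeps the trash prefix; only the nameservice is 
resolved from the stripped suffix, which is why the pattern check must run on 
every call.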

[~elgoiri] [~ayushtkn] [~hexiaoqiao] [~ramkumar]  [~xuzq_zander] How about my 
proposal?

> RBF: We can only delete the files or dirs of one subcluster in a cluster with 
> multiple subclusters when trash is enabled
> 
>
> Key: HDFS-14117
> URL: https://issues.apache.org/jira/browse/HDFS-14117
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ramkumar Ramalingam
>Assignee: Ramkumar Ramalingam
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14117-HDFS-13891.001.patch, 
> HDFS-14117-HDFS-13891.002.patch, HDFS-14117-HDFS-13891.003.patch, 
> HDFS-14117-HDFS-13891.004.patch, HDFS-14117-HDFS-13891.005.patch, 
> HDFS-14117-HDFS-13891.006.patch, HDFS-14117-HDFS-13891.007.patch, 
> HDFS-14117-HDFS-13891.008.patch, HDFS-14117-HDFS-13891.009.patch, 
> HDFS-14117-HDFS-13891.010.patch, HDFS-14117-HDFS-13891.011.patch, 
> HDFS-14117-HDFS-13891.012.patch, HDFS-14117-HDFS-13891.013.patch, 
> HDFS-14117-HDFS-13891.014.patch, HDFS-14117-HDFS-13891.015.patch, 
> HDFS-14117-HDFS-13891.016.patch, HDFS-14117-HDFS-13891.017.patch, 
> HDFS-14117-HDFS-13891.018.patch, HDFS-14117-HDFS-13891.019.patch, 
> HDFS-14117-HDFS-13891.020.patch, HDFS-14117.001.patch, HDFS-14117.002.patch, 
> HDFS-14117.003.patch, HDFS-14117.004.patch, HDFS-14117.005.patch
>
>
> When we delete files or dirs in HDFS, the deleted files or dirs are moved 
> to trash by default.
> But in the global path we can only mount one trash dir, /user. So we mount 
> the trash dir /user of the subcluster ns1 to the global path /user. Then we 
> can delete files or dirs of ns1, but deleting the files or dirs of another 
> subcluster, such as hacluster, will fail.
> h1. Mount Table
> ||Global path||Target nameservice||Target path||Order||Read 
> only||Owner||Group||Permission||Quota/Usage||Date Modified||Date Created||
> |/test|hacluster2|/test| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: 
> -/-]|2018/11/29 14:37:42|2018/11/29 14:37:42|
> |/tmp|hacluster1|/tmp| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: 
> -/-]|2018/11/29 14:37:05|2018/11/29 14:37:05|
> |/user|hacluster2,hacluster1|/user|HASH| |securedn|users|rwxr-xr-x|[NsQuota: 
> -/-, SsQuota: -/-]|2018/11/29 14:42:37|2018/11/29 14:38:20|
> commands: 
> {noformat}
> 1./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /test/.
> 18/11/30 11:00:47 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 3 securedn supergroup 8081 2018-11-30 10:56 /test/hdfs.cmd
> 2./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /tmp/.
> 18/11/30 11:00:40 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r--   3 securedn supergroup   6311 2018-11-30 10:57 /tmp/mapred.cmd
> 3../opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm 
> /tmp/mapred.cmd
> 18/11/30 11:01:02 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> rm: Failed to move to trash: hdfs://router/tmp/mapred.cmd: rename destination 
> parent /user/securedn/.Trash/Current/tmp/mapred.cmd not found.
> 4./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm /test/hdfs.cmd
> 18/11/30 11:01:20 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/11/30 11:01:22 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://router/test/hdfs.cmd' to trash at: 
> hdfs://router/user/securedn/.Trash/Current/test/hdfs.cmd

[jira] [Created] (HDFS-16760) Support a block level inputFormat for RetriableFileFastCopyCommand

2022-09-06 Thread ZanderXu (Jira)
ZanderXu created HDFS-16760:
---

 Summary: Support a block level inputFormat for 
RetriableFileFastCopyCommand
 Key: HDFS-16760
 URL: https://issues.apache.org/jira/browse/HDFS-16760
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


RetriableFileFastCopyCommand needs a block-level InputFormat to deal with the 
data skew problem at the block level; a minimal sketch of the splitting idea 
is below.

If an existing InputFormat already provides similar functionality, it can be 
modified to meet this need first.
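
For illustration only, a self-contained sketch of block-level splitting; 
BlockLevelSplitter and BlockSplit are hypothetical names, not an existing 
DistCp API:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch: emit one work item per block instead of per file, so a few very
// large files no longer skew a single mapper. All names are illustrative.
public class BlockLevelSplitter {

  /** One unit of work: a single block-sized range of a source file. */
  public static class BlockSplit {
    public final String file;
    public final long offset;
    public final long length;

    BlockSplit(String file, long offset, long length) {
      this.file = file;
      this.offset = offset;
      this.length = length;
    }
  }

  /** Split a file into block-sized work items. */
  public static List<BlockSplit> split(String file, long fileLen,
      long blockSize) {
    List<BlockSplit> splits = new ArrayList<>();
    for (long off = 0; off < fileLen; off += blockSize) {
      splits.add(new BlockSplit(file, off,
          Math.min(blockSize, fileLen - off)));
    }
    return splits;
  }
}
{code}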



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16759) RetriableFileFastCopyCommand supports migrating EC files

2022-09-06 Thread ZanderXu (Jira)
ZanderXu created HDFS-16759:
---

 Summary: RetriableFileFastCopyCommand supports migrating EC files
 Key: HDFS-16759
 URL: https://issues.apache.org/jira/browse/HDFS-16759
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


RetriableFileFastCopyCommand should support migrating EC files via FastCopy. 
The main places that need to be modified are as follows:
 * Preparing the favoured-nodes list
 * Aligning the DNs of one block



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16758) Add an optional RetriableFileFastCopyCommand to Distcp

2022-09-06 Thread ZanderXu (Jira)
ZanderXu created HDFS-16758:
---

 Summary: Add an optional RetriableFileFastCopyCommand to Distcp
 Key: HDFS-16758
 URL: https://issues.apache.org/jira/browse/HDFS-16758
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


Add an optional RetriableFileFastCopyCommand to DistCp to support copying a 
contiguous file from one namespace to another via FastCopy.

RetriableFileFastCopyCommand will be a subclass of RetriableFileCopyCommand. If 
a file to be copied does not meet the conditions for FastCopy, the command will 
automatically fall back to the super method to copy it; a minimal sketch of 
this fallback pattern is below.

Note: the current task only supports copying contiguous files.
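
For illustration, a self-contained sketch of the fallback pattern; the parent 
class here is a stand-in for the real DistCp RetriableFileCopyCommand, and 
canFastCopy/doFastCopy are hypothetical helpers:

{code:java}
// Stand-in for org.apache.hadoop.tools.mapred.RetriableFileCopyCommand;
// the real class performs a retriable byte-level copy.
class RetriableFileCopyCommand {
  protected Object doExecute(Object... arguments) throws Exception {
    return null; // placeholder for the existing byte-level copy
  }
}

public class RetriableFileFastCopyCommand extends RetriableFileCopyCommand {

  @Override
  protected Object doExecute(Object... arguments) throws Exception {
    if (canFastCopy(arguments)) {
      // Eligible: copy block replicas directly via FastCopy.
      return doFastCopy(arguments);
    }
    // Not eligible (e.g. not a contiguous file): fall back to the
    // ordinary byte-level copy in the superclass.
    return super.doExecute(arguments);
  }

  private boolean canFastCopy(Object... arguments) {
    return false; // placeholder eligibility check (hypothetical)
  }

  private Object doFastCopy(Object... arguments) {
    throw new UnsupportedOperationException("sketch only");
  }
}
{code}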



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16757) Add a new method copyBlockCrossNamespace to DataNode

2022-09-06 Thread ZanderXu (Jira)
ZanderXu created HDFS-16757:
---

 Summary: Add a new method copyBlockCrossNamespace to DataNode
 Key: HDFS-16757
 URL: https://issues.apache.org/jira/browse/HDFS-16757
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


Add a new method copyBlockCrossNamespace to DataTransferProtocol on the 
DataNode side.

This method will copy a source block from one namespace to a target block in a 
different namespace. If the target DN is the same as the current DN, the method 
copies the block via HardLink; if the target DN is different from the current 
DN, it copies the block via TransferBlock.

The method will take the following parameters (a sketch of the signature is 
below):
 * ExtendedBlock sourceBlock
 * Token sourceBlockToken
 * ExtendedBlock targetBlock
 * Token targetBlockToken
 * DatanodeInfo targetDN
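
Based on the parameters above, a sketch of what the new method's signature 
could look like; the Token generic parameter (BlockTokenIdentifier) is an 
assumption drawn from other DataTransferProtocol methods, and the interface 
name is illustrative:

{code:java}
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier;
import org.apache.hadoop.security.token.Token;

// Illustrative interface; the real change would extend DataTransferProtocol.
public interface CopyBlockCrossNamespace {

  /**
   * Copy sourceBlock (one namespace) into targetBlock (another namespace).
   * If targetDN is the current DataNode, the copy can be a local hard link;
   * otherwise the block is transferred to targetDN.
   */
  void copyBlockCrossNamespace(
      ExtendedBlock sourceBlock,
      Token<BlockTokenIdentifier> sourceBlockToken,
      ExtendedBlock targetBlock,
      Token<BlockTokenIdentifier> targetBlockToken,
      DatanodeInfo targetDN) throws IOException;
}
{code}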



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org