[jira] [Resolved] (HDFS-16923) The getListing RPC will throw NPE if the path does not exist
[ https://issues.apache.org/jira/browse/HDFS-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-16923.
Resolution: Fixed

Key: HDFS-16923
URL: https://issues.apache.org/jira/browse/HDFS-16923
Project: Hadoop HDFS
Issue Type: Bug
Reporter: ZanderXu
Assignee: ZanderXu
Priority: Critical
Labels: pull-request-available
Fix For: 3.4.0, 3.3.6

The getListing RPC throws an NPE if the path does not exist. The stack trace is as below:

{code:java}
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RemoteException): org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4195)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:1421)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:783)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:622)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:590)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:574)
{code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
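The shape of the guard this issue calls for can be sketched as follows. This is an illustrative model, not the actual Hadoop patch; the class `ListingSketch` and its methods are invented for the example. The point is that a lookup of a non-existent path should surface as a null that callers check for, rather than being dereferenced inside the NameNode:

```java
import java.util.Arrays;
import java.util.List;

public class ListingSketch {
    // Stand-in for the NameNode's directory table (hypothetical data).
    static final List<String> KNOWN_PATHS = Arrays.asList("/user", "/tmp");

    /** Returns the listing, or null when the path does not exist. */
    static String[] getListing(String src) {
        if (!KNOWN_PATHS.contains(src)) {
            return null; // dereferencing this null is what produced the NPE
        }
        return new String[] {src + "/part-0"};
    }

    /** Caller-side guard mirroring the fix: check for null before use. */
    static int entryCount(String src) {
        String[] listing = getListing(src);
        return listing == null ? 0 : listing.length;
    }
}
```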
[jira] [Resolved] (HDFS-16764) ObserverNamenode handles addBlock rpc and throws a FileNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-16764.
Resolution: Fixed

Key: HDFS-16764
URL: https://issues.apache.org/jira/browse/HDFS-16764
Project: Hadoop HDFS
Issue Type: Bug
Reporter: ZanderXu
Assignee: ZanderXu
Priority: Critical
Labels: pull-request-available
Fix For: 3.4.0, 3.3.9

The Observer NameNode can currently handle the addBlock RPC, but it may throw a FileNotFoundException when its state is stale:
* addBlock is not a coordinated method, so the Observer does not check the state id.
* addBlock does its validation with checkOperation(OperationCategory.READ).

So the Observer handles the addBlock RPC; if it has not yet replayed the edit that created the file, it throws a FileNotFoundException during validation. The related code is as follows:

{code:java}
checkOperation(OperationCategory.READ);
final FSPermissionChecker pc = getPermissionChecker();
FSPermissionChecker.setOperationType(operationName);
readLock();
try {
  checkOperation(OperationCategory.READ);
  r = FSDirWriteFileOp.validateAddBlock(this, pc, src, fileId, clientName,
      previous, onRetryBlock);
} finally {
  readUnlock(operationName);
}
{code}
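The staleness described in this issue can be modeled in a few lines. This is an illustrative sketch with invented names, not Hadoop's API: an Observer whose last applied transaction id lags the edit that created a file simply cannot see the file yet, so any validation on that path fails with "file not found":

```java
import java.util.HashMap;
import java.util.Map;

public class ObserverSketch {
    long lastAppliedTxId;                          // how far edits are replayed
    final Map<String, Long> createTxIdByPath = new HashMap<>();

    /** The file is visible only once its create edit has been replayed. */
    boolean fileVisible(String src) {
        Long createTxId = createTxIdByPath.get(src);
        return createTxId != null && createTxId <= lastAppliedTxId;
    }
}
```

A non-coordinated call like addBlock skips the state-id check that would otherwise delay the call until `lastAppliedTxId` catches up, which is why the exception surfaces to the client.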
[jira] [Resolved] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members
[ https://issues.apache.org/jira/browse/HDFS-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-16872.
Resolution: Fixed

Key: HDFS-16872
URL: https://issues.apache.org/jira/browse/HDFS-16872
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.3.4
Reporter: Chengbing Liu
Priority: Major
Labels: pull-request-available
Fix For: 3.4.0, 3.2.5, 3.3.6

In our production cluster with Observer NameNode enabled, we have plenty of logs printed by {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. The {{LogThrottlingHelper}} doesn't seem to work:

{noformat}
2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Start loading edits file ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688] maxTxnsToRead = 9223372036854775807
2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688]' to transaction ID 17686250688
2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688]' to transaction ID 17686250688
2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688]) of total size 527.0, total edits 1.0, total load time 0.0 ms
2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Start loading edits file ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693] maxTxnsToRead = 9223372036854775807
2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693]' to transaction ID 17686250689
2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693]' to transaction ID 17686250689
2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693]) of total size 890.0, total edits 5.0, total load time 1.0 ms
{noformat}

After some digging, I found the cause: the {{LogThrottlingHelper}}s are declared as instance variables of all the enclosing classes, including {{FSImage}}, {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. Therefore the logging frequency is not limited across different instances. For classes with only a limited number of instances, such as {{FSImage}}, this is fine. For others whose instances are created frequently, such as {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}, it results in plenty of logs.

This can be fixed by declaring the {{LogThrottlingHelper}}s as static members.
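The instance-versus-static distinction above can be demonstrated with a minimal throttle. This is a sketch with invented names, not Hadoop's LogThrottlingHelper API: when the throttle state lives in a static field, it is shared by every instance, so frequently recreated objects no longer defeat the rate limit.

```java
public class ThrottleSketch {
    static class Throttle {
        private final long periodMs;
        private long lastLogTimeMs;

        Throttle(long periodMs) {
            this.periodMs = periodMs;
            this.lastLogTimeMs = -periodMs; // so the very first call logs
        }

        synchronized boolean shouldLog(long nowMs) {
            if (nowMs - lastLogTimeMs >= periodMs) {
                lastLogTimeMs = nowMs;
                return true;
            }
            return false;
        }
    }

    // Static: one throttle shared by ALL ThrottleSketch instances. Were this
    // an instance field, every `new ThrottleSketch()` would log immediately,
    // which is exactly the bug described above.
    private static final Throttle SHARED = new Throttle(5000);

    boolean logMaybe(long nowMs) {
        return SHARED.shouldLog(nowMs);
    }
}
```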
[jira] [Resolved] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer
[ https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-16689.
Resolution: Fixed

Key: HDFS-16689
URL: https://issues.apache.org/jira/browse/HDFS-16689
Project: Hadoop HDFS
Issue Type: Bug
Reporter: ZanderXu
Assignee: ZanderXu
Priority: Critical
Labels: pull-request-available
Fix For: 3.4.0
Time Spent: 50m
Remaining Estimate: 0h

The Standby NameNode crashes when transitioning to Active with an in-progress tailer. The error message is like below:

{code:java}
Caused by: java.lang.IllegalStateException: Cannot start writing at txid X when there is a stream available for read: ByteStringEditLog[X, Y], ByteStringEditLog[X, 0]
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
    ... 36 more
{code}

After tracing, I found a critical bug in *EditlogTailer#catchupDuringFailover()* when *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true: *catchupDuringFailover()* tries to replay all missed edits from the JournalNodes with *onlyDurableTxns=true*, so it may be unable to replay any edits when some JournalNodes are abnormal.

To reproduce, suppose:
- There are 2 NameNodes, NN0 and NN1, in Active and Standby state respectively, and 3 JournalNodes, JN0, JN1 and JN2.
- NN0 tries to sync 3 edits with first txid 3 to the JNs, but only succeeds on JN1 and JN2. JN0 is abnormal (e.g. GC, bad network, or restarted).
- NN1's lastAppliedTxId is 2, and at this moment we try to fail over from NN0 to NN1.
- NN1 gets only two responses, from JN0 and JN1, when selecting input streams with *fromTxnId=3* and *onlyDurableTxns=true*; the reported txn counts are 0 and 3 respectively. JN2 is abnormal (e.g. GC, bad network, or restarted).
- NN1 cannot replay any edits from *fromTxnId=3* because *maxAllowedTxns* is 0.

So I think the Standby NameNode should run *catchupDuringFailover()* with *onlyDurableTxns=false*, so that it can replay all missed edits from the JournalNodes.
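The quorum arithmetic behind the scenario above can be sketched as follows. This is an illustrative model with invented names, not Hadoop's QuorumJournalManager API: with onlyDurableTxns=true, a transaction only counts once a majority of all JournalNodes report having it, so the lowest count inside the majority bounds the result; with onlyDurableTxns=false the best single response suffices.

```java
import java.util.Arrays;

public class DurableTxnSketch {
    /**
     * Durable bound: the majority-th highest reported txn count.
     * With responses {0, 3} out of 3 JNs, the majority is 2, so the
     * second-highest response (0) wins and nothing can be replayed.
     */
    static long maxDurableTxns(long[] txnCounts, int totalJournalNodes) {
        int majority = totalJournalNodes / 2 + 1;
        long[] sorted = txnCounts.clone();
        Arrays.sort(sorted);                  // ascending
        int idx = sorted.length - majority;   // majority-th highest
        return idx < 0 ? 0 : sorted[idx];
    }

    /** Without the durability requirement, the best response suffices. */
    static long maxAvailableTxns(long[] txnCounts) {
        return Arrays.stream(txnCounts).max().orElse(0);
    }
}
```

This is why relaxing onlyDurableTxns during failover lets NN1 replay the 3 edits that JN1 confirmed.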
[jira] [Resolved] (HDFS-16852) Register the shutdown hook only when not in shutdown for KeyProviderCache constructor
[ https://issues.apache.org/jira/browse/HDFS-16852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-16852.
Resolution: Fixed

Key: HDFS-16852
URL: https://issues.apache.org/jira/browse/HDFS-16852
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Reporter: Xing Lin
Assignee: Xing Lin
Priority: Minor
Labels: pull-request-available
Fix For: 3.4.0, 2.10.3, 3.3.6

When an HDFS client is created, it registers a shutdown hook with the ShutdownHookManager. ShutdownHookManager does not allow adding a new shutdown hook when the process is already in shutdown, and throws an IllegalStateException instead.

This behavior is not ideal when a Spark program fails during pre-launch. In that case, during shutdown, Spark calls cleanupStagingDir() to clean the staging dir. In cleanupStagingDir(), it creates a FileSystem object to talk to HDFS. Since this is the first use of a FileSystem object in that process, it has to create an HDFS client and register the shutdown hook, and we hit the IllegalStateException. This IllegalStateException masks the actual exception that caused the Spark program to fail during pre-launch.

We propose to swallow the IllegalStateException in KeyProviderCache and log a warning. The TCP connection between the client and the NameNode is closed by the OS when the process shuts down.

Example stack trace:

{code:java}
13-09-2022 14:39:42 PDT INFO - 22/09/13 21:39:41 ERROR util.Utils: Uncaught exception in thread shutdown-hook-0
13-09-2022 14:39:42 PDT INFO - java.lang.IllegalStateException: Shutdown in progress, cannot add a shutdownHook
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:299)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.KeyProviderCache.<init>(KeyProviderCache.java:71)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.ClientContext.<init>(ClientContext.java:130)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.ClientContext.get(ClientContext.java:167)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:383)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:287)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:159)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3261)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3310)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3278)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.deploy.yarn.ApplicationMaster.cleanupStagingDir(ApplicationMaster.scala:675)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.deploy.yarn.ApplicationMaster.$anonfun$run$2(ApplicationMaster.scala:259)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
13-09-2022 14:39:42 PDT INFO - at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2023)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
13-09-2022 14:39:42 PDT INFO - at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
13-09-2022 14:39:42 PDT INFO - at scala.util.Try$.apply(Try.scala:213)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
{code}
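The proposed behavior can be sketched as follows. This is an illustrative model, not the merged KeyProviderCache patch; the class and method names are invented. Registration failure during shutdown becomes a logged warning instead of a propagated exception, so it no longer masks the original failure:

```java
public class HookSketch {
    /**
     * Try to register a cleanup hook; swallow the "already in shutdown"
     * error and warn instead. Returns whether registration succeeded.
     */
    static boolean registerHook(Runnable hook, boolean inShutdown) {
        try {
            if (inShutdown) {
                // Mirrors ShutdownHookManager.addShutdownHook's behavior.
                throw new IllegalStateException(
                    "Shutdown in progress, cannot add a shutdownHook");
            }
            // ... a real implementation would register `hook` here ...
            return true;
        } catch (IllegalStateException e) {
            // Swallow and warn; OS teardown will close the client's sockets.
            System.err.println("WARN: " + e.getMessage());
            return false;
        }
    }
}
```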
[jira] [Resolved] (HDFS-16550) [SBN read] Improper cache-size for journal node may cause cluster crash
[ https://issues.apache.org/jira/browse/HDFS-16550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-16550.
Fix Version/s: 3.4.0
Resolution: Fixed

Key: HDFS-16550
URL: https://issues.apache.org/jira/browse/HDFS-16550
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Tao Li
Assignee: Tao Li
Priority: Major
Labels: pull-request-available
Fix For: 3.4.0
Attachments: image-2022-04-21-09-54-29-751.png, image-2022-04-21-09-54-57-111.png, image-2022-04-21-12-32-56-170.png
Time Spent: 1h
Remaining Estimate: 0h

When we introduced *SBN Read*, we encountered a problem while upgrading the JournalNodes.

Cluster info:
Active: nn0
Standby: nn1

1. Rolling restart of the JournalNodes (related config: {{fs.journalnode.edit-cache-size.bytes=1G}}, -Xms1G, -Xmx1G).
2. The cluster runs for a while; edits cache usage keeps increasing until memory is used up.
3. The Active NameNode (nn0) shuts down because of "Timed out waiting 12ms for a quorum of nodes to respond".
4. nn1 is transitioned to the Active state.
5. The new Active NameNode (nn1) also shuts down because of "Timed out waiting 12ms for a quorum of nodes to respond".
6. The cluster crashes.

Related code:

{code:java}
JournaledEditsCache(Configuration conf) {
  capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY,
      DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT);
  if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) {
    Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " +
        "maximum JVM memory is only %d bytes. It is recommended that you " +
        "decrease the cache size or increase the heap size.",
        capacity, Runtime.getRuntime().maxMemory()));
  }
  Journal.LOG.info("Enabling the journaled edits cache with a capacity " +
      "of bytes: " + capacity);
  ReadWriteLock lock = new ReentrantReadWriteLock(true);
  readLock = new AutoCloseableLock(lock.readLock());
  writeLock = new AutoCloseableLock(lock.writeLock());
  initialize(INVALID_TXN_ID);
}
{code}

Currently, {{fs.journalnode.edit-cache-size.bytes}} can be set larger than the memory requested by the process. If {{fs.journalnode.edit-cache-size.bytes}} > 0.9 * Runtime.getRuntime().maxMemory(), only a warning is logged during JournalNode startup. This is easily overlooked by users, but after the cluster has been running for some time, it is likely to crash.

NN log:
!image-2022-04-21-09-54-57-111.png|width=1012,height=47!
!image-2022-04-21-12-32-56-170.png|width=809,height=218!

IMO, we should not set the {{cache size}} to a fixed value, but to a ratio of the maximum memory, 0.2 by default. This avoids an oversized cache, and users can actively adjust the heap size when they need a larger cache.
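The proposed sizing rule can be sketched in a few lines. This is an illustrative sketch of the idea described in the issue, not the merged patch; method names are invented. Deriving the capacity from a fraction of the JVM max heap makes it impossible to configure a cache larger than the heap:

```java
public class CacheSizeSketch {
    /** Capacity as a fraction of the max heap (0.2 proposed as default). */
    static long capacityFor(long maxHeapBytes, double fraction) {
        return (long) (maxHeapBytes * fraction);
    }

    /** The existing startup check, which only warns when oversized. */
    static boolean isOversized(long capacityBytes, long maxHeapBytes) {
        return capacityBytes > 0.9 * maxHeapBytes;
    }
}
```

A ratio-derived capacity always passes the oversize check, whereas a fixed byte count can silently exceed it.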
[jira] [Resolved] (HDFS-16547) [SBN read] Namenode in safe mode should not be transferred to observer state
[ https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-16547.
Resolution: Fixed

Key: HDFS-16547
URL: https://issues.apache.org/jira/browse/HDFS-16547
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Tao Li
Assignee: Tao Li
Priority: Major
Labels: pull-request-available
Fix For: 3.4.0
Time Spent: 1h 50m
Remaining Estimate: 0h

Currently, when a NameNode is in safe mode (while starting up, or after entering safe mode manually), we can transfer it to Observer state by command. Such an Observer node may receive many requests and then throw a SafemodeException, causing unnecessary failover on the client. So a NameNode in safe mode should not be transferred to Observer state.
[jira] [Resolved] (HDFS-16832) [SBN READ] Fix NPE when checking the block location of an empty directory
[ https://issues.apache.org/jira/browse/HDFS-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-16832.
Resolution: Fixed

Key: HDFS-16832
URL: https://issues.apache.org/jira/browse/HDFS-16832
Project: Hadoop HDFS
Issue Type: Bug
Reporter: zhengchenyu
Assignee: zhengchenyu
Priority: Major
Labels: pull-request-available
Fix For: 3.4.0, 3.3.5

HDFS-16732 introduced a block-location check for getListing and getFileInfo, but checking the block location of an empty directory throws an NPE. The exception stack on the Tez client is below:

{code:java}
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
    at org.apache.hadoop.ipc.Client.call(Client.java:1492)
    at org.apache.hadoop.ipc.Client.call(Client.java:1389)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy12.getListing(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:678)
    at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy13.getListing(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1671)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1212)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1195)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1140)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1136)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1154)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2054)
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:278)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
{code}
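The defensive check the fix calls for can be sketched as follows. This is an illustrative model with invented names, not the actual Hadoop patch: an empty directory has no blocks at all, so a location check must tolerate a null or empty block array instead of assuming at least one entry.

```java
public class LocationCheckSketch {
    /**
     * True when every block has at least one reported location.
     * Blocks are modeled as an array of location arrays (hypothetical
     * stand-in for LocatedBlocks).
     */
    static boolean allBlocksHaveLocations(String[][] blockLocations) {
        if (blockLocations == null || blockLocations.length == 0) {
            return true; // empty directory/file: nothing to verify, no NPE
        }
        for (String[] locs : blockLocations) {
            if (locs == null || locs.length == 0) {
                return false; // e.g. block report not yet replayed on observer
            }
        }
        return true;
    }
}
```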
[jira] [Resolved] (HDFS-16659) JournalNode should throw NewerTxnIdException if SinceTxId is bigger than HighestWrittenTxId
[ https://issues.apache.org/jira/browse/HDFS-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-16659.
Fix Version/s: 3.4.0
Resolution: Fixed

Key: HDFS-16659
URL: https://issues.apache.org/jira/browse/HDFS-16659
Project: Hadoop HDFS
Issue Type: Bug
Reporter: ZanderXu
Assignee: ZanderXu
Priority: Critical
Labels: pull-request-available
Fix For: 3.4.0
Time Spent: 1h
Remaining Estimate: 0h

The JournalNode should throw a {{NewerTxnIdException}} if {{sinceTxId}} is bigger than {{highestWrittenTxId}} while handling the {{getJournaledEdits}} RPC from the NameNodes. The current logic can leave the in-progress edit log tailer unable to replay any edits from the JournalNodes in some corner cases, so the Observer NameNode cannot serve requests from clients.

Suppose there are 3 JournalNodes, JN0 ~ JN2:
* JN0 hits an abnormal condition while the Active NameNode is syncing 10 edits with first txid 11.
* The NameNode ignores the abnormal JN0 and continues to sync the edits to JN1 and JN2.
* JN0 comes back to health.
* The NameNode syncs another 10 edits with first txid 21.
* At this point, edits 11 ~ 30 are not in JN0's cache.
* The Observer NameNode tries to select an EditLogInputStream through {{getJournaledEdits}} with since txid 21.
* JN2 hits an abnormal condition (GC, bad network, etc.) and responds slowly.

The expected result: the response should contain the 10 edits from txid 21 to txid 30, because the Active NameNode successfully wrote these edits to JN1 and JN2 and only failed to write them to JN0.

But in the current implementation, the response set is [Response(0) from JN0, Response(10) from JN1], because JN2's slow response is not counted. So {{maxAllowedTxns}} is 0 and the NameNode does not replay any edits.

As above, the root cause is that the JournalNode should throw a {{NewerTxnIdException}} when {{sinceTxId}} is greater than {{highestWrittenTxId}}. The buggy code is below:

{code:java}
if (sinceTxId > getHighestWrittenTxId()) {
  // Requested edits that don't exist yet; short-circuit the cache here
  metrics.rpcEmptyResponses.incr();
  return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
}
{code}
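The fix described above can be sketched as follows. This is an illustrative model with simplified signatures, not the Hadoop patch: instead of returning an empty response when the requested txid is ahead of what this JournalNode has written, the node signals the caller explicitly, so the NameNode can exclude it from the quorum calculation rather than treating "nothing here" as a valid count of 0.

```java
public class JournalSketch {
    static class NewerTxnIdException extends RuntimeException {
        NewerTxnIdException(String msg) { super(msg); }
    }

    /** Returns the number of txns served from sinceTxId, inclusive. */
    static long journaledEdits(long sinceTxId, long highestWrittenTxId) {
        if (sinceTxId > highestWrittenTxId) {
            // Old behavior returned txnCount=0 here, poisoning the quorum math.
            throw new NewerTxnIdException("sinceTxId " + sinceTxId
                + " > highestWrittenTxId " + highestWrittenTxId);
        }
        return highestWrittenTxId - sinceTxId + 1;
    }
}
```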
[jira] [Resolved] (HDFS-16732) [SBN READ] Avoid getting locations from the observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-16732.
Fix Version/s: 3.4.0, 3.3.9
Resolution: Fixed

Merged PR 4756 to trunk and branch-3.3. Thanks [~zhengchenyu]!

Key: HDFS-16732
URL: https://issues.apache.org/jira/browse/HDFS-16732
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Affects Versions: 3.2.1
Reporter: zhengchenyu
Assignee: zhengchenyu
Priority: Critical
Labels: pull-request-available
Fix For: 3.4.0, 3.3.9

Hive-on-Tez applications fail occasionally after the Observer is enabled; the log is shown below:

{code:java}
2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, vertex=vertex_1660618571916_4839_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
    at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748)
    at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    ... 4 more
{code}

As described in MAPREDUCE-7082, this exception is thrown when a block is missing, but my cluster had no missing blocks.

In this example, I found that getListing returns location information. When the Observer's block report is delayed, it returns the block without locations. HDFS-13924 was introduced to solve this problem, but it only considers getBlockLocations. On the Observer node, every method that may return locations should check whether the locations are empty.
[jira] [Resolved] (HDFS-16181) [SBN Read] Fix metric of RpcRequestCacheMissAmount can't display when tailEditLog from JN
[ https://issues.apache.org/jira/browse/HDFS-16181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-16181.
Fix Version/s: 3.4.0, 2.10.2, 3.3.2, 3.2.4, 3.1.5
Hadoop Flags: Reviewed
Resolution: Fixed

Thank you [~jianghuazhu]! This is my mistake; I just updated the JIRA status.

Key: HDFS-16181
URL: https://issues.apache.org/jira/browse/HDFS-16181
Project: Hadoop HDFS
Issue Type: Bug
Reporter: wangzhaohui
Assignee: wangzhaohui
Priority: Critical
Labels: pull-request-available
Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4, 3.1.5
Attachments: after.jpg, before.jpg
Time Spent: 2h 10m
Remaining Estimate: 0h

I found that the JN has the edit cache turned on, but the metric rpcRequestCacheMissAmount does not display.
[jira] [Resolved] (HDFS-16233) Do not use exception handler to implement copy-on-write for EnumCounters
[ https://issues.apache.org/jira/browse/HDFS-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-16233. Resolution: Fixed > Do not use exception handler to implement copy-on-write for EnumCounters > > > Key: HDFS-16233 > URL: https://issues.apache.org/jira/browse/HDFS-16233 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2, 3.1.5 > > Attachments: Screen Shot 2021-09-22 at 1.59.59 PM.png, > profile_c7_delete_asyncaudit.html > > Time Spent: 1h 10m > Remaining Estimate: 0h > > HDFS-14547 saves the NameNode heap space occupied by EnumCounters by > essentially implementing a copy-on-write strategy. > At the beginning, all EnumCounters refer to the same ConstEnumCounters to save > heap space. When one is modified, an exception is thrown and the exception > handler converts the ConstEnumCounters to an EnumCounters object and updates it. > Using an exception handler for anything more than occasional control flow is bad for > performance. > Proposal: use the instanceof keyword to detect the type of the object and do COW > accordingly.
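The proposed instanceof-based copy-on-write can be sketched like this. Class and field names are simplified stand-ins for Hadoop's EnumCounters/ConstEnumCounters, not the actual implementation: all holders start out sharing one immutable instance, and the first write detects the shared type with instanceof and swaps in a private mutable copy, rather than letting the immutable instance throw and converting inside a catch block.

```java
/**
 * Sketch of instanceof-based copy-on-write (names are hypothetical
 * simplifications of EnumCounters/ConstEnumCounters). The shared
 * immutable instance is never mutated; the first write makes a copy.
 */
public class CowCounters {
    static class Counters {
        long[] values;
        Counters(long[] values) { this.values = values; }
    }
    /** Immutable shared instance; writes must never reach it. */
    static class ConstCounters extends Counters {
        ConstCounters(long[] values) { super(values); }
    }
    static final ConstCounters SHARED = new ConstCounters(new long[]{0, 0});

    /** Adds delta to slot i, copying first if the counters are still shared. */
    static Counters add(Counters c, int i, long delta) {
        if (c instanceof ConstCounters) {        // COW: detect the shared type
            c = new Counters(c.values.clone()); // make a private mutable copy
        }
        c.values[i] += delta;
        return c;
    }

    public static void main(String[] args) {
        Counters c = add(SHARED, 0, 5);
        System.out.println(c != SHARED);       // true: a copy was made
        System.out.println(SHARED.values[0]);  // 0: shared instance untouched
        System.out.println(c.values[0]);       // 5
    }
}
```

An instanceof test on the hot path is a cheap, predictable branch, whereas filling in an exception's stack trace on every first write is comparatively expensive, which is the performance point the proposal makes.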
[jira] [Created] (HDFS-15032) Balancer crashes when it fails to contact an NN via ObserverReadProxyProvider
Erik Krogen created HDFS-15032: -- Summary: Balancer crashes when it fails to contact an NN via ObserverReadProxyProvider Key: HDFS-15032 URL: https://issues.apache.org/jira/browse/HDFS-15032 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.10.0 Reporter: Erik Krogen Assignee: Erik Krogen When trying to run the Balancer using ObserverReadProxyProvider (to allow it to read from the Observer Node as described in HDFS-14979), if one of the NNs isn't running, the Balancer will crash.
[jira] [Created] (HDFS-14979) [Observer Node] Balancer should submit getBlocks to Observer Node when possible
Erik Krogen created HDFS-14979: -- Summary: [Observer Node] Balancer should submit getBlocks to Observer Node when possible Key: HDFS-14979 URL: https://issues.apache.org/jira/browse/HDFS-14979 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover, hdfs Reporter: Erik Krogen Assignee: Erik Krogen In HDFS-14162, we made it so that the Balancer could function when {{ObserverReadProxyProvider}} was in use. However, the Balancer would still read from the active NameNode, because {{getBlocks}} wasn't annotated as {{@ReadOnly}}. This task is to enable the Balancer to actually read from the Observer Node to alleviate load from the active NameNode.
[jira] [Created] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly
Erik Krogen created HDFS-14973: -- Summary: Balancer getBlocks RPC dispersal does not function properly Key: HDFS-14973 URL: https://issues.apache.org/jira/browse/HDFS-14973 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 3.0.0, 2.8.2, 2.7.4, 2.9.0 Reporter: Erik Krogen Assignee: Erik Krogen In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls issued by the Balancer/Mover more dispersed, to alleviate load on the NameNode, since {{getBlocks}} can be very expensive and the Balancer should not impact normal cluster operation. Unfortunately, this functionality does not function as expected, especially when the dispatcher thread count is low. The primary issue is that the delay is applied only to the first N threads that are submitted to the dispatcher's executor, where N is the size of the dispatcher's threadpool, but *not* to the first R threads, where R is the number of allowed {{getBlocks}} QPS (currently hardcoded to 20). For example, if the threadpool size is 100 (the default), threads 0-19 have no delay, 20-99 have increased levels of delay, and 100+ have no delay. As I understand it, the intent of the logic was that the delay applied to the first 100 threads would force the dispatcher executor's threads to all be consumed, thus blocking subsequent (non-delayed) threads until the delay period has expired. However, threads 0-19 can finish very quickly (their work can often be fulfilled in the time it takes to execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), thus opening up 20 new slots in the executor, which are then consumed by non-delayed threads 100-119, and so on. So, although 80 threads have had a delay applied, the non-delay threads rush through in the 20 non-delay slots. This problem gets even worse when the dispatcher threadpool size is less than the max {{getBlocks}} QPS. 
For example, if the threadpool size is 10, _no threads ever have a delay applied_, and the feature is not enabled at all.
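The delay schedule described above can be modeled in a few lines. This is a hypothetical simplification, not the actual Balancer code: with threadpool size N and a getBlocks QPS cap of R, a delay is applied only to submitted calls with index in [R, N), so the first R calls and every call past N run immediately.

```java
/**
 * Minimal model of the getBlocks delay schedule described above
 * (hypothetical simplification; not the actual Balancer/Dispatcher code).
 * poolSize = dispatcher threadpool size (N), maxQps = allowed getBlocks
 * QPS (R, hardcoded to 20 in the real code).
 */
public class DelayModel {
    /** Delay, in QPS periods, applied to the k-th submitted getBlocks call. */
    static int delayPeriods(int k, int poolSize, int maxQps) {
        if (k < maxQps || k >= poolSize) {
            return 0;                // no delay outside [R, N)
        }
        return k / maxQps;           // increasing delay bands within the pool
    }

    public static void main(String[] args) {
        // Default pool of 100, QPS cap of 20: calls 0-19 and 100+ get no delay,
        // so non-delayed work keeps flowing through the 20 fast slots.
        System.out.println(delayPeriods(5, 100, 20));    // 0
        System.out.println(delayPeriods(45, 100, 20));   // 2
        System.out.println(delayPeriods(120, 100, 20));  // 0
        // Pool of 10 (< QPS cap): every k satisfies k < 20 or k >= 10,
        // so no call is ever delayed and the feature is effectively off.
        boolean anyDelay = false;
        for (int k = 0; k < 1000; k++) {
            anyDelay |= delayPeriods(k, 10, 20) > 0;
        }
        System.out.println(anyDelay);                    // false
    }
}
```

The model makes the two failure modes concrete: non-delayed calls leapfrog through the R fast slots when N > R, and the delayed band [R, N) is empty whenever N <= R.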
[jira] [Resolved] (HDFS-14245) Class cast error in GetGroups with ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-14245. Resolution: Fixed > Class cast error in GetGroups with ObserverReadProxyProvider > > > Key: HDFS-14245 > URL: https://issues.apache.org/jira/browse/HDFS-14245 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: HDFS-12943 >Reporter: Shen Yinjie >Assignee: Erik Krogen >Priority: Major > Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-14245.000.patch, HDFS-14245.001.patch, > HDFS-14245.002.patch, HDFS-14245.003.patch, HDFS-14245.004.patch, > HDFS-14245.005.patch, HDFS-14245.006.patch, HDFS-14245.007.patch, > HDFS-14245.patch > > > Run "hdfs groups" with ObserverReadProxyProvider, Exception throws as : > {code:java} > Exception in thread "main" java.io.IOException: Couldn't create proxy > provider class > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:261) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:119) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95) > at org.apache.hadoop.hdfs.tools.GetGroups.getUgmProtocol(GetGroups.java:87) > at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:96) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > 
org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:245) > ... 7 more > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hdfs.server.namenode.ha.NameNodeHAProxyFactory cannot be > cast to org.apache.hadoop.hdfs.server.namenode.ha.ClientHAProxyFactory > at > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.(ObserverReadProxyProvider.java:123) > at > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.(ObserverReadProxyProvider.java:112) > ... 12 more > {code} > similar to HDFS-14116, we did a simple fix.
[jira] [Reopened] (HDFS-14162) Balancer should work with ObserverNode
[ https://issues.apache.org/jira/browse/HDFS-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen reopened HDFS-14162: Re-opening for backport to older branches, which should have been done from the start. > Balancer should work with ObserverNode > -- > > Key: HDFS-14162 > URL: https://issues.apache.org/jira/browse/HDFS-14162 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Konstantin Shvachko >Assignee: Erik Krogen >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14162-HDFS-12943.wip0.patch, HDFS-14162.000.patch, > HDFS-14162.001.patch, HDFS-14162.002.patch, HDFS-14162.003.patch, > HDFS-14162.004.patch, ReflectionBenchmark.java, > testBalancerWithObserver-3.patch, testBalancerWithObserver.patch > > > Balancer provides a substantial RPC load on NameNode. It would be good to > divert Balancer RPCs {{getBlocks()}}, etc. to ObserverNode. The main problem > is that Balancer uses {{NamenodeProtocol}}, while ORPP currently supports > only {{ClientProtocol}}.
[jira] [Reopened] (HDFS-14245) Class cast error in GetGroups with ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen reopened HDFS-14245: Re-opening for backport to other branches, which should have been done from the start. > Class cast error in GetGroups with ObserverReadProxyProvider > > > Key: HDFS-14245 > URL: https://issues.apache.org/jira/browse/HDFS-14245 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: HDFS-12943 >Reporter: Shen Yinjie >Assignee: Erik Krogen >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14245.000.patch, HDFS-14245.001.patch, > HDFS-14245.002.patch, HDFS-14245.003.patch, HDFS-14245.004.patch, > HDFS-14245.005.patch, HDFS-14245.006.patch, HDFS-14245.007.patch, > HDFS-14245.patch > > > Run "hdfs groups" with ObserverReadProxyProvider, Exception throws as : > {code:java} > Exception in thread "main" java.io.IOException: Couldn't create proxy > provider class > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:261) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:119) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95) > at org.apache.hadoop.hdfs.tools.GetGroups.getUgmProtocol(GetGroups.java:87) > at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:96) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at 
java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:245) > ... 7 more > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hdfs.server.namenode.ha.NameNodeHAProxyFactory cannot be > cast to org.apache.hadoop.hdfs.server.namenode.ha.ClientHAProxyFactory > at > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.(ObserverReadProxyProvider.java:123) > at > org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.(ObserverReadProxyProvider.java:112) > ... 12 more > {code} > similar to HDFS-14116, we did a simple fix.
[jira] [Created] (HDFS-14829) [Dynamometer] Update TestDynamometerInfra to be Hadoop 3.2+ compatible
Erik Krogen created HDFS-14829: -- Summary: [Dynamometer] Update TestDynamometerInfra to be Hadoop 3.2+ compatible Key: HDFS-14829 URL: https://issues.apache.org/jira/browse/HDFS-14829 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen Currently the integration test included with Dynamometer, {{TestDynamometerInfra}}, is executing against version 3.1.2 of Hadoop. We should update it to run against a more recent version by default (3.2.x) and add support for 3.3 in anticipation of HDFS-14412.
[jira] [Created] (HDFS-14667) Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2
Erik Krogen created HDFS-14667: -- Summary: Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2 Key: HDFS-14667 URL: https://issues.apache.org/jira/browse/HDFS-14667 Project: Hadoop HDFS Issue Type: Improvement Reporter: Erik Krogen Assignee: Erik Krogen We would like to target pulling HDFS-14403, an important operability enhancement, into branch-2.
[jira] [Created] (HDFS-14643) [Dynamometer] Merge extra commits from GitHub to Hadoop
Erik Krogen created HDFS-14643: -- Summary: [Dynamometer] Merge extra commits from GitHub to Hadoop Key: HDFS-14643 URL: https://issues.apache.org/jira/browse/HDFS-14643 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen Assignee: Erik Krogen While Dynamometer was in the process of being committed to Hadoop, a few patches went in to the GitHub version that haven't yet made it into the version committed here. Some of them are related to TravisCI and Bintray deployment, which can safely be ignored in a Hadoop context, but a few are relevant: {code} * 2d2591e 2019-05-24 Make XML parsing error message more explicit (PR #97) [lfengnan ] * 755a298 2019-04-04 Fix misimplemented CountTimeWritable setter and update the README docs regarding the output file (PR #96) [Christopher Gregorian ] * 66d3e19 2019-03-14 Modify AuditReplay workflow to output count and latency of operations (PR #92) [Christopher Gregorian ] * 5c1d8cd 2019-02-28 Fix issues with the start-workload.sh script (PR #84) [Erik Krogen ] {code} I will use this ticket to track porting these 4 commits into Hadoop's Dynamometer.
[jira] [Created] (HDFS-14640) [Dynamometer] Fix TestDynamometerInfra failures
Erik Krogen created HDFS-14640: -- Summary: [Dynamometer] Fix TestDynamometerInfra failures Key: HDFS-14640 URL: https://issues.apache.org/jira/browse/HDFS-14640 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen Assignee: Erik Krogen I've been seeing Jenkins reporting some failures of the {{TestDynamometerInfra}} test (basically a big integration test). It seems like it's timing out after 15 minutes.
[jira] [Created] (HDFS-14639) [Dynamometer] Unnecessary duplicate bin directory appears in dist layout
Erik Krogen created HDFS-14639: -- Summary: [Dynamometer] Unnecessary duplicate bin directory appears in dist layout Key: HDFS-14639 URL: https://issues.apache.org/jira/browse/HDFS-14639 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, test Reporter: Erik Krogen The bin files get put into the {{share/hadoop/tools/dynamometer/dynamometer-*/bin}} locations as expected: {code} ekrogen at ekrogen-mn6 in ~/dev/hadoop/trunk/hadoop-dist/target/hadoop-3.3.0-SNAPSHOT on ekrogen-HDFS-14410-dyno-docs! ± ls share/hadoop/tools/dynamometer/dynamometer-*/bin share/hadoop/tools/dynamometer/dynamometer-blockgen/bin: generate-block-lists.sh share/hadoop/tools/dynamometer/dynamometer-infra/bin: create-slim-hadoop-tar.sh parse-metrics.sh start-dynamometer-cluster.sh upload-fsimage.sh share/hadoop/tools/dynamometer/dynamometer-workload/bin: parse-start-timestamp.sh start-workload.sh {code} But for blockgen specifically, it also ends up in another folder: {code} ekrogen at ekrogen-mn6 in ~/dev/hadoop/trunk/hadoop-dist/target/hadoop-3.3.0-SNAPSHOT on ekrogen-HDFS-14410-dyno-docs! ± ls share/hadoop/tools/dynamometer-blockgen/bin generate-block-lists.sh {code}
[jira] [Created] (HDFS-14638) [Dynamometer] Fix scripts to refer to current build structure
Erik Krogen created HDFS-14638: -- Summary: [Dynamometer] Fix scripts to refer to current build structure Key: HDFS-14638 URL: https://issues.apache.org/jira/browse/HDFS-14638 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, test Reporter: Erik Krogen The scripts within the Dynamometer build dirs all refer to the old distribution structure with a single {{bin}} directory and a single {{lib}} directory. We need to update them to refer to the Hadoop-standard layout.
[jira] [Created] (HDFS-14539) Remove Dynamometer's reliance on the tar utility
Erik Krogen created HDFS-14539: -- Summary: Remove Dynamometer's reliance on the tar utility Key: HDFS-14539 URL: https://issues.apache.org/jira/browse/HDFS-14539 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen Dynamometer currently relies on the tar utility, which is cumbersome and means that it won't work on Windows. We should remove this dependency.
[jira] [Resolved] (HDFS-14500) NameNode StartupProgress continues to report edit log segments after the LOADING_EDITS phase is finished
[ https://issues.apache.org/jira/browse/HDFS-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-14500. Resolution: Fixed > NameNode StartupProgress continues to report edit log segments after the > LOADING_EDITS phase is finished > > > Key: HDFS-14500 > URL: https://issues.apache.org/jira/browse/HDFS-14500 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-14500-branch-2.001.patch, HDFS-14500.000.patch, > HDFS-14500.001.patch > > > When testing out a cluster with the edit log tailing fast path feature > enabled (HDFS-13150), an unrelated issue caused the NameNode to remain in > safe mode for an extended period of time, preventing the NameNode from fully > completing its startup sequence. We noticed that the Startup Progress web UI > displayed many edit log segments (millions of them). > I traced this problem back to {{StartupProgress}}. Within > {{FSEditLogLoader}}, the loader continually tries to update the startup > progress with a new {{Step}} any time that it loads edits. Per the Javadoc > for {{StartupProgress}}, this should be a no-op once startup is completed: > {code:title=StartupProgress.java} > * After startup completes, the tracked data is frozen. Any subsequent > updates > * or counter increments are no-ops. > {code} > However, {{StartupProgress}} only implements that logic once the _entire_ > startup sequence has been completed. When {{FSEditLogLoader}} calls > {{addStep()}}, it adds it into the {{LOADING_EDITS}} phase: > {code:title=FSEditLogLoader.java} > StartupProgress prog = NameNode.getStartupProgress(); > Step step = createStartupProgressStep(edits); > prog.beginStep(Phase.LOADING_EDITS, step); > {code} > This phase, in our case, ended long before, so it is nonsensical to continue > to add steps to it. 
I believe it is a bug that {{StartupProgress}} accepts > such steps instead of ignoring them; once a phase is complete, it should no > longer change.
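The per-phase freeze argued for above can be sketched as follows. The class and method names here are simplified stand-ins for Hadoop's StartupProgress, not its real implementation: instead of freezing only when the entire startup sequence ends, beginStep() becomes a no-op as soon as its own phase has been marked complete.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of per-phase freezing (hypothetical simplification of Hadoop's
 * StartupProgress). Steps added after endPhase() are silently ignored,
 * so a long-running LOADING_EDITS tail cannot inflate the UI.
 */
public class PhaseProgress {
    enum Phase { LOADING_FSIMAGE, LOADING_EDITS, SAFEMODE }

    private final Map<Phase, List<String>> steps = new EnumMap<>(Phase.class);
    private final Map<Phase, Boolean> complete = new EnumMap<>(Phase.class);

    void beginStep(Phase phase, String step) {
        if (Boolean.TRUE.equals(complete.get(phase))) {
            return; // phase already finished: ignore late steps
        }
        steps.computeIfAbsent(phase, p -> new ArrayList<>()).add(step);
    }

    void endPhase(Phase phase) { complete.put(phase, true); }

    int stepCount(Phase phase) {
        return steps.getOrDefault(phase, Collections.emptyList()).size();
    }

    public static void main(String[] args) {
        PhaseProgress prog = new PhaseProgress();
        prog.beginStep(Phase.LOADING_EDITS, "segment-1");
        prog.endPhase(Phase.LOADING_EDITS);
        prog.beginStep(Phase.LOADING_EDITS, "segment-2"); // ignored
        System.out.println(prog.stepCount(Phase.LOADING_EDITS)); // 1
    }
}
```

With this shape, callers like FSEditLogLoader need no changes: their beginStep() calls simply stop having any effect once LOADING_EDITS has ended.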
[jira] [Reopened] (HDFS-14500) NameNode StartupProgress continues to report edit log segments after the LOADING_EDITS phase is finished
[ https://issues.apache.org/jira/browse/HDFS-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen reopened HDFS-14500: > NameNode StartupProgress continues to report edit log segments after the > LOADING_EDITS phase is finished > > > Key: HDFS-14500 > URL: https://issues.apache.org/jira/browse/HDFS-14500 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14500.000.patch, HDFS-14500.001.patch > > > When testing out a cluster with the edit log tailing fast path feature > enabled (HDFS-13150), an unrelated issue caused the NameNode to remain in > safe mode for an extended period of time, preventing the NameNode from fully > completing its startup sequence. We noticed that the Startup Progress web UI > displayed many edit log segments (millions of them). > I traced this problem back to {{StartupProgress}}. Within > {{FSEditLogLoader}}, the loader continually tries to update the startup > progress with a new {{Step}} any time that it loads edits. Per the Javadoc > for {{StartupProgress}}, this should be a no-op once startup is completed: > {code:title=StartupProgress.java} > * After startup completes, the tracked data is frozen. Any subsequent > updates > * or counter increments are no-ops. > {code} > However, {{StartupProgress}} only implements that logic once the _entire_ > startup sequence has been completed. When {{FSEditLogLoader}} calls > {{addStep()}}, it adds it into the {{LOADING_EDITS}} phase: > {code:title=FSEditLogLoader.java} > StartupProgress prog = NameNode.getStartupProgress(); > Step step = createStartupProgressStep(edits); > prog.beginStep(Phase.LOADING_EDITS, step); > {code} > This phase, in our case, ended long before, so it is nonsensical to continue > to add steps to it. 
I believe it is a bug that {{StartupProgress}} accepts > such steps instead of ignoring them; once a phase is complete, it should no > longer change.
[jira] [Created] (HDFS-14500) NameNode StartupProgress continues to report edit log segments after the LOADING_EDITS phase is finished
Erik Krogen created HDFS-14500: -- Summary: NameNode StartupProgress continues to report edit log segments after the LOADING_EDITS phase is finished Key: HDFS-14500 URL: https://issues.apache.org/jira/browse/HDFS-14500 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.1.2, 2.8.5, 3.0.3, 2.9.2, 3.2.0 Reporter: Erik Krogen Assignee: Erik Krogen When testing out a cluster with the edit log tailing fast path feature enabled (HDFS-13150), an unrelated issue caused the NameNode to remain in safe mode for an extended period of time, preventing the NameNode from fully completing its startup sequence. We noticed that the Startup Progress web UI displayed many edit log segments (millions of them). I traced this problem back to {{StartupProgress}}. Within {{FSEditLogLoader}}, the loader continually tries to update the startup progress with a new {{Step}} any time that it loads edits. Per the Javadoc for {{StartupProgress}}, this should be a no-op once startup is completed: {code:title=StartupProgress.java} * After startup completes, the tracked data is frozen. Any subsequent updates * or counter increments are no-ops. {code} However, {{StartupProgress}} only implements that logic once the _entire_ startup sequence has been completed. When {{FSEditLogLoader}} calls {{addStep()}}, it adds it into the {{LOADING_EDITS}} phase: {code:title=FSEditLogLoader.java} StartupProgress prog = NameNode.getStartupProgress(); Step step = createStartupProgressStep(edits); prog.beginStep(Phase.LOADING_EDITS, step); {code} This phase, in our case, ended long before, so it is nonsensical to continue to add steps to it. I believe it is a bug that {{StartupProgress}} accepts such steps instead of ignoring them; once a phase is complete, it should no longer change.
[jira] [Created] (HDFS-14462) WebHDFS throws "Error writing request body to server" instead of NSQuotaExceededException
Erik Krogen created HDFS-14462: -- Summary: WebHDFS throws "Error writing request body to server" instead of NSQuotaExceededException Key: HDFS-14462 URL: https://issues.apache.org/jira/browse/HDFS-14462 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.1.2, 2.7.7, 2.8.5, 3.0.3, 2.9.2, 3.2.0 Reporter: Erik Krogen We noticed recently in our environment that, when writing data to HDFS via WebHDFS, a quota exception is returned to the client as: {code} java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3536) ~[?:1.8.0_172] at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3519) ~[?:1.8.0_172] at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_172] at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.8.0_172] at java.io.FilterOutputStream.flush(FilterOutputStream.java:140) ~[?:1.8.0_172] at java.io.DataOutputStream.flush(DataOutputStream.java:123) ~[?:1.8.0_172] {code} It is entirely opaque to the user that this exception was caused because they exceeded their quota. Yet in the DataNode logs: {code} 2019-04-24 02:13:09,639 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /foo/path/here is exceeded: quota = B = X TB but diskspace consumed = B = X TB at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyStoragespaceQuota(DirectoryWithQuotaFeature.java:211) at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:239) {code} This was on a 2.7.x cluster, but I verified that the same logic exists on trunk. I believe we need to fix some of the logic within the {{ExceptionHandler}} to add special handling for the quota exception. 
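The kind of special handling proposed for the quota case can be sketched like this. Everything here is hypothetical: the real fix would live in WebHDFS's {{ExceptionHandler}}, and the class names below are stand-ins, not Hadoop types. The point is only that a quota violation should be mapped to an explicit, self-describing HTTP error instead of surfacing to the client as a dropped request body.

```java
/**
 * Hypothetical sketch of quota-aware error mapping (stand-in types; the
 * real change would go in WebHDFS's ExceptionHandler). A quota failure
 * is turned into an explicit 403 response carrying the quota message,
 * rather than the client seeing only a broken write stream.
 */
public class QuotaAwareHandler {
    /** Stand-in for HDFS's quota-exceeded exception type. */
    static class QuotaExceededException extends RuntimeException {
        QuotaExceededException(String msg) { super(msg); }
    }

    /** Maps an exception raised during a write to an HTTP status line. */
    static String toResponse(Exception e) {
        if (e instanceof QuotaExceededException) {
            // Surface the quota failure to the client instead of the opaque
            // "Error writing request body to server".
            return "403 " + e.getMessage();
        }
        return "500 Internal Server Error";
    }

    public static void main(String[] args) {
        System.out.println(toResponse(
            new QuotaExceededException("DiskSpace quota of /foo exceeded")));
        System.out.println(toResponse(new IllegalStateException()));
    }
}
```

The same instanceof-dispatch pattern extends to other server-side exceptions that currently reach WebHDFS clients as generic connection errors.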
[jira] [Created] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
Erik Krogen created HDFS-14442: -- Summary: Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId Key: HDFS-14442 URL: https://issues.apache.org/jira/browse/HDFS-14442 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.3.0 Reporter: Erik Krogen While working on HDFS-14245, we noticed a discrepancy in some proxy-handling code. The description of {{RpcInvocationHandler.getConnectionId()}} states: {code} /** * Returns the connection id associated with the InvocationHandler instance. * @return ConnectionId */ ConnectionId getConnectionId(); {code} It does not make any claims about whether this connection ID will be an active proxy or not. Yet in {{HAUtil}} we have: {code} /** * Get the internet address of the currently-active NN. This should rarely be * used, since callers of this method who connect directly to the NN using the * resulting InetSocketAddress will not be able to connect to the active NN if * a failover were to occur after this method has been called. * * @param fs the file system to get the active address of. * @return the internet address of the currently-active NN. * @throws IOException if an error occurs while resolving the active NN. */ public static InetSocketAddress getAddressOfActive(FileSystem fs) throws IOException { if (!(fs instanceof DistributedFileSystem)) { throw new IllegalArgumentException("FileSystem " + fs + " is not a DFS."); } // force client address resolution. fs.exists(new Path("/")); DistributedFileSystem dfs = (DistributedFileSystem) fs; DFSClient dfsClient = dfs.getClient(); return RPC.getServerAddress(dfsClient.getNamenode()); } {code} Where the call {{RPC.getServerAddress()}} eventually terminates into {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> {{RPC.getConnectionIdForProxy()}} -> {{RpcInvocationHandler#getConnectionId()}}. 
{{HAUtil}} appears to be making an incorrect assumption that {{RpcInvocationHandler}} will necessarily return an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a counter-example to this, since the current connection ID may be pointing at, for example, an Observer NameNode.
[jira] [Created] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
Erik Krogen created HDFS-14435: -- Summary: ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs Key: HDFS-14435 URL: https://issues.apache.org/jira/browse/HDFS-14435 Project: Hadoop HDFS Issue Type: Bug Components: ha, nn Affects Versions: 3.3.0 Reporter: Erik Krogen Assignee: Erik Krogen We have been seeing issues during testing of the Consistent Read from Standby feature that indicate that ORPP is unable to call {{getHAServiceState}} on Standby NNs, as they are rejected with a {{StandbyException}}. Upon further investigation, we realized that although the Standby allows the {{getHAServiceState()}} call, reading a delegation token is not allowed in Standby state, thus the call will fail when using DT-based authentication. This hasn't caused issues in practice, since ORPP assumes that the state is Standby if it is unable to fetch the state, but we should fix the logic to properly handle this scenario.
[jira] [Created] (HDFS-14413) HA Support for Dynamometer
Erik Krogen created HDFS-14413: -- Summary: HA Support for Dynamometer Key: HDFS-14413 URL: https://issues.apache.org/jira/browse/HDFS-14413 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen It would be nice if Dynamometer could handle spinning up a full 2 NN + 3 QJM cluster instead of just a single NN. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14412) Enable Dynamometer to use the local build of Hadoop by default
Erik Krogen created HDFS-14412: -- Summary: Enable Dynamometer to use the local build of Hadoop by default Key: HDFS-14412 URL: https://issues.apache.org/jira/browse/HDFS-14412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen Currently, by default, Dynamometer will download a Hadoop tarball from the internet to use as the Hadoop version-under-test. Since it is bundled inside of Hadoop now, it would make more sense for it to use the current version of Hadoop by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14411) Combine Dynamometer's SimulatedDataNodes into DataNodeCluster
Erik Krogen created HDFS-14411: -- Summary: Combine Dynamometer's SimulatedDataNodes into DataNodeCluster Key: HDFS-14411 URL: https://issues.apache.org/jira/browse/HDFS-14411 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen Dynamometer has a {{SimulatedDataNodes}} class, which is very similar to {{DataNodeCluster}} but with some different functionality. It would be better to combine the two to keep maintenance changes in a single place. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14410) Make Dynamometer documentation properly compile onto the Hadoop site
Erik Krogen created HDFS-14410: -- Summary: Make Dynamometer documentation properly compile onto the Hadoop site Key: HDFS-14410 URL: https://issues.apache.org/jira/browse/HDFS-14410 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen The documentation included with Dynamometer doesn't properly appear on the site; we need to twiddle with this a bit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14409) Improve Dynamometer test suite
Erik Krogen created HDFS-14409: -- Summary: Improve Dynamometer test suite Key: HDFS-14409 URL: https://issues.apache.org/jira/browse/HDFS-14409 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen The testing within Dynamometer now is mostly one big integration test. It could really use better testing throughout. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14370) Edit log tailing fast-path should allow for backoff
Erik Krogen created HDFS-14370: -- Summary: Edit log tailing fast-path should allow for backoff Key: HDFS-14370 URL: https://issues.apache.org/jira/browse/HDFS-14370 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, qjm Affects Versions: 3.3.0 Reporter: Erik Krogen Assignee: Erik Krogen As part of HDFS-13150, in-progress edit log tailing was changed to use an RPC-based mechanism, thus allowing the edit log tailing frequency to be turned way down, and allowing standby/observer NameNodes to be only a few milliseconds stale as compared to the Active NameNode. When there is a high volume of transactions on the system, each RPC fetches transactions and takes some time to process them, self-rate-limiting how frequently an RPC is submitted. In a lightly loaded cluster, however, most of these RPCs return an empty set of transactions, consuming a high (de)serialization overhead for very little benefit. This was reported by [~jojochuang] in HDFS-14276 and I have also reported it on a test cluster where the SbNN was submitting 8000 RPCs per second that returned empty. I propose we add some sort of backoff to the tailing, so that if an empty response is received, it will wait a longer period of time before submitting a new RPC. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
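As a rough illustration of the proposed backoff (the class and method names below are invented for the sketch, not actual HDFS code): an idle tailer could double its sleep interval on every empty response, bounded by a cap, and reset to the base tailing period as soon as transactions arrive.

```java
// Illustrative sketch of exponential backoff for edit log tailing.
// EditTailBackoff is a hypothetical name; the real fix would live in
// the tailer thread and make both periods configurable.
class EditTailBackoff {
    private final long basePeriodMs;
    private final long maxPeriodMs;
    private long currentPeriodMs;

    EditTailBackoff(long basePeriodMs, long maxPeriodMs) {
        this.basePeriodMs = basePeriodMs;
        this.maxPeriodMs = maxPeriodMs;
        this.currentPeriodMs = basePeriodMs;
    }

    /** Returns how long to sleep before the next tailing RPC. */
    long nextSleepMs(int txnsReceived) {
        if (txnsReceived > 0) {
            currentPeriodMs = basePeriodMs; // busy: tail at full speed
        } else {
            // idle: back off exponentially, bounded by the cap
            currentPeriodMs = Math.min(currentPeriodMs * 2, maxPeriodMs);
        }
        return currentPeriodMs;
    }
}
```

With a 5ms base and a 1s cap, a quiet cluster quickly settles at one RPC per second instead of thousands, while a busy cluster keeps the low-latency fast path.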
[jira] [Created] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes
Erik Krogen created HDFS-14349: -- Summary: Edit log may be rolled more frequently than necessary with multiple Standby nodes Key: HDFS-14349 URL: https://issues.apache.org/jira/browse/HDFS-14349 Project: Hadoop HDFS Issue Type: Bug Components: ha, hdfs, qjm Reporter: Erik Krogen Assignee: Ekanth Sethuramalingam When HDFS-14317 was fixed, we tackled the problem that in a cluster with in-progress edit log tailing enabled, a Standby NameNode may _never_ roll the edit logs, which can eventually cause data loss. Unfortunately, in the process, it was made so that if there are multiple Standby NameNodes, they will all roll the edit logs at their specified frequency, so the edit log will be rolled X times more frequently than it should be (where X is the number of Standby NNs). This is not as bad as the original bug since rolling frequently does not affect correctness or data availability, but may degrade performance by creating more edit log segments than necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
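One possible shape for such a fix (purely illustrative; the actual patch may differ): have each Standby decide whether to trigger a roll based on the observed age of the active's in-progress edit log segment, a fact all Standbys share, rather than on its own last-trigger time, so X Standbys no longer multiply the roll frequency by X.

```java
// Hypothetical sketch: a Standby triggers a roll only when the
// in-progress segment (as observed from tailed edits) is older than
// the roll period. Because every Standby evaluates the same shared
// fact (segment age), a fresh roll by one Standby suppresses the rest.
class RollDecision {
    private final long rollPeriodMs;

    RollDecision(long rollPeriodMs) {
        this.rollPeriodMs = rollPeriodMs;
    }

    /**
     * @param segmentStartTimeMs when the current in-progress segment
     *        began, as observed by this Standby from tailing
     * @param nowMs current time
     */
    boolean shouldTriggerRoll(long segmentStartTimeMs, long nowMs) {
        return nowMs - segmentStartTimeMs >= rollPeriodMs;
    }
}
```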
[jira] [Created] (HDFS-14279) [SBN Read] Race condition in ObserverReadProxyProvider
Erik Krogen created HDFS-14279: -- Summary: [SBN Read] Race condition in ObserverReadProxyProvider Key: HDFS-14279 URL: https://issues.apache.org/jira/browse/HDFS-14279 Project: Hadoop HDFS Issue Type: Bug Components: hdfs, namenode Reporter: Erik Krogen Assignee: Erik Krogen

There is a race condition in {{ObserverReadProxyProvider#getCurrentProxy()}}:
{code}
private NNProxyInfo getCurrentProxy() {
  if (currentProxy == null) {
    changeProxy(null);
  }
  return currentProxy;
}
{code}
{{currentProxy}} is {{volatile}}. Another {{changeProxy()}} could occur after the {{changeProxy(null)}} call and before the {{return}}, making the return value incorrect. I have seen this result in an NPE. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
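The usual fix for this pattern is to read the volatile field into a local variable so the null check and the return see the same value. A stand-alone sketch of both versions, with {{Object}} standing in for {{NNProxyInfo}} and a trivial {{changeProxy()}}:

```java
// Minimal model of the volatile-read race and its standard fix.
// ProxyCache is a simplified stand-in for ObserverReadProxyProvider.
class ProxyCache {
    private volatile Object currentProxy;

    /** Racy version: the field is read twice, so a concurrent
     *  changeProxy() between the check and the return can make the
     *  returned value stale -- or, in the worst case, null. */
    Object getCurrentProxyRacy() {
        if (currentProxy == null) {
            changeProxy();
        }
        return currentProxy; // second volatile read
    }

    /** Fixed version: read the volatile once per decision and return
     *  the captured local, never a fresh re-read. */
    Object getCurrentProxySafe() {
        Object proxy = currentProxy;
        if (proxy == null) {
            changeProxy();
            proxy = currentProxy; // capture the freshly installed proxy
        }
        return proxy;
    }

    private void changeProxy() {
        currentProxy = new Object(); // simplified: always installs a proxy
    }
}
```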
[jira] [Created] (HDFS-14211) [Consistent Observer Reads] Allow for configurable "always msync" mode
Erik Krogen created HDFS-14211: -- Summary: [Consistent Observer Reads] Allow for configurable "always msync" mode Key: HDFS-14211 URL: https://issues.apache.org/jira/browse/HDFS-14211 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Erik Krogen To allow for reads to be serviced from an ObserverNode (see HDFS-12943) in a consistent way, an {{msync}} API was introduced (HDFS-13688) to allow for a client to fetch the latest transaction ID from the Active NN, thereby ensuring that subsequent reads from the ObserverNode will be up-to-date with the current state of the Active. Using this properly, however, requires application-side changes: for example, a NodeManager should call {{msync}} before localizing the resources for a client, since it received notification of the existence of those resources via communication that is out-of-band to HDFS, and thus could potentially attempt to localize them prior to the availability of those resources on the ObserverNode. Until such application-side changes can be made, which will be a longer-term effort, we need to provide a mechanism for unchanged clients to utilize the ObserverNode without exposing such a client to inconsistencies. This is essentially phase 3 of the roadmap outlined in the [design document|https://issues.apache.org/jira/secure/attachment/12915990/ConsistentReadsFromStandbyNode.pdf] for HDFS-12943. The design document proposes some heuristics based on understanding of how common applications (e.g. MR) use HDFS for resources. As an initial pass, we can simply have a flag which tells a client to call {{msync}} before _every single_ read operation. This may seem counterintuitive, as it turns every read operation into two RPCs: an {{msync}} to the Active followed by an actual read operation to the Observer.
However, the {{msync}} operation is extremely lightweight, as it does not acquire the {{FSNamesystemLock}}, and in experiments we have found that this approach can easily scale to well over 100,000 {{msync}} operations per second on the Active (while still servicing approx. 10,000 write op/s). Combined with the fast-path edit log tailing for standby/observer nodes (HDFS-13150), this "always msync" approach should introduce only a few ms of extra latency to each read call. Below are results from experiments which convert a normal RPC workload into one in which all read operations are turned into an {{msync}}. The baseline is a workload of 1.5k write op/s and 25k read op/s.
||Rate Multiplier|2|4|6|8||
||RPC Queue Avg Time (ms)|14.2|53.2|110.4|125.3||
||RPC Queue NumOps Avg (k)|51.4|102.3|147.8|177.9||
||RPC Queue NumOps Max (k)|148.8|269.5|306.3|312.4||
Results are promising up to between 4x and 6x of the baseline workload, which is approx. 100-150k read op/s. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
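In client terms the flag is simple: every read is prefixed with an {{msync}} to the Active. A toy model (the interfaces below are illustrative stand-ins, not Hadoop's actual {{ClientProtocol}}):

```java
// Toy model of "always msync": each read becomes two RPCs, an msync
// to the Active NN followed by the read against the Observer.
// Active and Observer are invented stand-in interfaces.
interface Active { long msync(); }                    // returns latest txid
interface Observer { String read(long minTxid, String path); }

class AlwaysMsyncClient {
    private final Active active;
    private final Observer observer;
    private final boolean alwaysMsync;
    private long lastSeenTxid;

    AlwaysMsyncClient(Active active, Observer observer, boolean alwaysMsync) {
        this.active = active;
        this.observer = observer;
        this.alwaysMsync = alwaysMsync;
    }

    String read(String path) {
        if (alwaysMsync) {
            // extra RPC, but cheap: msync does not take FSNamesystemLock
            lastSeenTxid = active.msync();
        }
        // the Observer serves the read only once caught up to lastSeenTxid
        return observer.read(lastSeenTxid, path);
    }
}
```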
[jira] [Resolved] (HDFS-14155) Update "Consistent Read from Observer" User Guide with Edit Tailing Frequency
[ https://issues.apache.org/jira/browse/HDFS-14155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-14155. Resolution: Duplicate > Update "Consistent Read from Observer" User Guide with Edit Tailing Frequency > - > > Key: HDFS-14155 > URL: https://issues.apache.org/jira/browse/HDFS-14155 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation, hdfs >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > Currently the user guide created in HDFS-14131 does not make any mention of > the recommendation for {{dfs.ha.tail-edits.period}}, but the default works > very poorly in combination with this feature. We should update the > documentation to reflect this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14155) Update "Consistent Read from Observer" User Guide with Edit Tailing Frequency
Erik Krogen created HDFS-14155: -- Summary: Update "Consistent Read from Observer" User Guide with Edit Tailing Frequency Key: HDFS-14155 URL: https://issues.apache.org/jira/browse/HDFS-14155 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation, hdfs Reporter: Erik Krogen Assignee: Erik Krogen Currently the user guide created in HDFS-14131 does not make any mention of the recommendation for {{dfs.ha.tail-edits.period}}, but the default works very poorly in combination with this feature. We should update the documentation to reflect this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-13873) ObserverNode should reject read requests when it is too far behind.
[ https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-13873. Resolution: Fixed > ObserverNode should reject read requests when it is too far behind. > --- > > Key: HDFS-13873 > URL: https://issues.apache.org/jira/browse/HDFS-13873 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode >Affects Versions: HDFS-12943 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: HDFS-12943 > > Attachments: HDFS-13873-HDFS-12943.001.patch, > HDFS-13873-HDFS-12943.002.patch, HDFS-13873-HDFS-12943.003.patch, > HDFS-13873-HDFS-12943.004.patch, HDFS-13873-HDFS-12943.005.patch > > > Add a server-side threshold for ObserverNode to reject read requests when it > is too far behind. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-14048) DFSOutputStream close() throws exception on subsequent call after DataNode restart
[ https://issues.apache.org/jira/browse/HDFS-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen reopened HDFS-14048: Re-opening for branch-2 commit. Sorry for the trouble [~elgoiri], I have just attached the branch-2 patch. Since I'm not sure if Jenkins will run properly given the branch-2 build issues, I also executed all of the following tests locally without any failures: {{TestClientProtocolForPipelineRecovery,TestDFSOutputStream,TestClientBlockVerification,TestDatanodeRestart}} > DFSOutputStream close() throws exception on subsequent call after DataNode > restart > -- > > Key: HDFS-14048 > URL: https://issues.apache.org/jira/browse/HDFS-14048 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.3.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1 > > Attachments: HDFS-14048.000.patch > > > We recently discovered an issue in which, during a rolling upgrade, some jobs > were failing with exceptions like (sadly this is the whole stack trace): > {code} > java.io.IOException: A datanode is restarting: > DatanodeInfoWithStorage[1.1.1.1:71,BP-,DISK] > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:877) > {code} > with an earlier statement in the log like: > {code} > INFO [main] org.apache.hadoop.hdfs.DFSClient: A datanode is restarting: > DatanodeInfoWithStorage[1.1.1.1:71,BP-,DISK] > {code} > Strangely we did not see any other logs about the {{DFSOutputStream}} failing > after waiting for the DataNode restart. We eventually realized that in some > cases {{DFSOutputStream#close()}} may be called more than once, and that if > so, the {{IOException}} above is thrown on the _second_ call to {{close()}} > (this is even with HDFS-5335; prior to this it would have been thrown on all > calls to {{close()}} besides the first). 
> The problem is that in {{DataStreamer#createBlockOutputStream()}}, after the
> new output stream is created, it resets the error states:
> {code}
> errorState.resetInternalError();
> // remove all restarting nodes from failed nodes list
> failed.removeAll(restartingNodes);
> restartingNodes.clear();
> {code}
> But it forgets to clear {{lastException}}. When
> {{DFSOutputStream#closeImpl()}} is called a second time, this block is
> triggered:
> {code}
> if (isClosed()) {
>   LOG.debug("Closing an already closed stream. [Stream:{}, streamer:{}]",
>       closed, getStreamer().streamerClosed());
>   try {
>     getStreamer().getLastException().check(true);
> {code}
> The second time, {{isClosed()}} is true, so the exception checking occurs and
> the "Datanode is restarting" exception is thrown even though the stream has
> already been successfully closed.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14048) DFSOutputStream close() throws exception on subsequent call after DataNode restart
Erik Krogen created HDFS-14048: -- Summary: DFSOutputStream close() throws exception on subsequent call after DataNode restart Key: HDFS-14048 URL: https://issues.apache.org/jira/browse/HDFS-14048 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Erik Krogen Assignee: Erik Krogen

We recently discovered an issue in which, during a rolling upgrade, some jobs were failing with exceptions like (sadly this is the whole stack trace):
{code}
java.io.IOException: A datanode is restarting: DatanodeInfoWithStorage[1.1.1.1:71,BP-,DISK]
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:877)
{code}
with an earlier statement in the log like:
{code}
INFO [main] org.apache.hadoop.hdfs.DFSClient: A datanode is restarting: DatanodeInfoWithStorage[1.1.1.1:71,BP-,DISK]
{code}
Strangely we did not see any other logs about the {{DFSOutputStream}} failing after waiting for the DataNode restart. We eventually realized that in some cases {{DFSOutputStream#close()}} may be called more than once, and that if so, the {{IOException}} above is thrown on the _second_ call to {{close()}} (this is even with HDFS-5335; prior to this it would have been thrown on all calls to {{close()}} besides the first). The problem is that in {{DataStreamer#createBlockOutputStream()}}, after the new output stream is created, it resets the error states:
{code}
errorState.resetInternalError();
// remove all restarting nodes from failed nodes list
failed.removeAll(restartingNodes);
restartingNodes.clear();
{code}
But it forgets to clear {{lastException}}. When {{DFSOutputStream#closeImpl()}} is called a second time, this block is triggered:
{code}
if (isClosed()) {
  LOG.debug("Closing an already closed stream. [Stream:{}, streamer:{}]",
      closed, getStreamer().streamerClosed());
  try {
    getStreamer().getLastException().check(true);
{code}
The second time, {{isClosed()}} is true, so the exception checking occurs and the "Datanode is restarting" exception is thrown even though the stream has already been successfully closed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
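A simplified model of the bug described above (names are invented for the sketch; the real fix touches {{DataStreamer}}): once restart recovery succeeds, the remembered exception must also be cleared so that a redundant {{close()}} on a successfully closed stream stays a no-op.

```java
// Simplified model of the double-close bug: a streamer that remembers
// a transient "DataNode restarting" exception. Clearing the remembered
// exception on successful recovery (the recoverFromRestart(true) path)
// mirrors what createBlockOutputStream() should do with lastException.
class MiniStream {
    private Exception lastException;
    private boolean closed;

    void recordRestart() {
        lastException = new java.io.IOException("A datanode is restarting");
    }

    /** Called when a new block output stream is successfully created. */
    void recoverFromRestart(boolean clearLastException) {
        if (clearLastException) {
            lastException = null; // the step missing in the original bug
        }
    }

    void close() throws Exception {
        if (closed) {
            // second close: re-checks the remembered exception, which is
            // where the stale "restarting" exception escapes in the bug
            if (lastException != null) {
                throw lastException;
            }
            return;
        }
        closed = true;
    }
}
```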
[jira] [Created] (HDFS-14034) Support getQuotaUsage API in WebHDFS
Erik Krogen created HDFS-14034: -- Summary: Support getQuotaUsage API in WebHDFS Key: HDFS-14034 URL: https://issues.apache.org/jira/browse/HDFS-14034 Project: Hadoop HDFS Issue Type: Improvement Components: fs, webhdfs Reporter: Erik Krogen Assignee: Erik Krogen HDFS-8898 added support for a new API, {{getQuotaUsage}} which can fetch quota usage on a directory with significantly lower impact than the similar {{getContentSummary}}. This JIRA is to track adding support for this API to WebHDFS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13977) NameNode can kill itself if it tries to send too many txns to a QJM simultaneously
Erik Krogen created HDFS-13977: -- Summary: NameNode can kill itself if it tries to send too many txns to a QJM simultaneously Key: HDFS-13977 URL: https://issues.apache.org/jira/browse/HDFS-13977 Project: Hadoop HDFS Issue Type: Bug Components: namenode, qjm Affects Versions: 2.7.7 Reporter: Erik Krogen Assignee: Erik Krogen

h3. Problem & Logs

We recently encountered an issue on a large cluster (running 2.7.4) in which the NameNode killed itself because it was unable to communicate with the JNs via QJM. We discovered that it was the result of the NameNode trying to send a huge batch of over 1 million transactions to the JNs in a single RPC:
{code:title=NameNode Logs}
WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal X.X.X.X: failed to write txns 1000-11153636. Will try to write to this JN again after the next log roll.
...
WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 1098ms to send a batch of 1153637 edits (335886611 bytes) to remote journal X.X.X.X:
{code}
{code:title=JournalNode Logs}
INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client X.X.X.X threw exception [java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864. RPC came from X.X.X.X]
java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864. RPC came from X.X.X.X
  at org.apache.hadoop.ipc.Server$Connection.checkDataLength(Server.java:1610)
  at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1672)
  at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:897)
  at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:753)
  at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:724)
{code}
The JournalNodes rejected the RPC because it had a size well over the 64MB default {{ipc.maximum.data.length}}.
This was triggered by a huge number of files all hitting a hard lease timeout simultaneously, causing the NN to force-close them all at once. This can be a particularly nasty bug as the NN will attempt to re-send this same huge RPC on restart, as it loads an fsimage which still has all of these open files that need to be force-closed.

h3. Proposed Solution

To solve this we propose to modify {{EditsDoubleBuffer}} to add a "hard limit" based on the value of {{ipc.maximum.data.length}}. When {{writeOp()}} or {{writeRaw()}} is called, first check the size of {{bufCurrent}}. If it exceeds the hard limit, block the writer until the buffer is flipped and {{bufCurrent}} becomes {{bufReady}}. This gives some self-throttling to prevent the NameNode from killing itself in this way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
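A minimal sketch of the proposed throttle, assuming a toy string-based buffer in place of the real serialized edit buffers (all names are hypothetical): the writer is refused once {{bufCurrent}} would exceed the hard limit, until the buffers are flipped. The real proposal would block the writer at that point rather than return a flag.

```java
// Sketch of self-throttling for a double buffer. In the real proposal
// the hard limit derives from ipc.maximum.data.length and writeOp()
// blocks until the flip; here writeOp() just reports refusal so the
// idea stays single-threaded and testable.
class ThrottledDoubleBuffer {
    private StringBuilder bufCurrent = new StringBuilder();
    private StringBuilder bufReady = new StringBuilder();
    private final int hardLimitBytes;

    ThrottledDoubleBuffer(int hardLimitBytes) {
        this.hardLimitBytes = hardLimitBytes;
    }

    /** Returns false when the write must wait for setReadyToFlush(). */
    boolean writeOp(String op) {
        if (bufCurrent.length() + op.length() > hardLimitBytes) {
            return false; // at the hard limit: caller must wait for a flip
        }
        bufCurrent.append(op);
        return true;
    }

    /** Swap buffers: bufCurrent becomes bufReady for sending to JNs. */
    void setReadyToFlush() {
        StringBuilder tmp = bufReady;
        bufReady = bufCurrent;
        bufCurrent = tmp;
        bufCurrent.setLength(0);
    }
}
```

The key property is that no single flush batch can grow past the limit, so the RPC to the JournalNodes can never exceed what they will accept.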
[jira] [Resolved] (HDFS-13930) Fix crlf line endings in HDFS-12943 branch
[ https://issues.apache.org/jira/browse/HDFS-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-13930. Resolution: Fixed > Fix crlf line endings in HDFS-12943 branch > -- > > Key: HDFS-13930 > URL: https://issues.apache.org/jira/browse/HDFS-13930 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-13930-HDFS-12943.000.patch, > branch-HDFS-12943-before.pdf > > > One of the merge commits introduced the wrong line endings to some {{*.cmd}} > files. Looks like it was commit {{1363eff69c3}} that broke it. > The tree is: > {code} > * | 1363eff69c3 2018-09-17 Merge commit > '9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4' into HDFS-12943 [Konstantin V > Shvachko ] > |\ \ > | |/ > | * 9af96d4ed4b 2018-09-05 HADOOP-15707. Add IsActiveServlet to be used for > Load Balancers. Contributed by Lukas Majercak. [Giovanni Matteo Fumarola > ] > * | 94d7f90e93b 2018-09-17 Merge commit > 'e780556ae9229fe7a90817eb4e5449d7eed35dd8' into HDFS-12943 [Konstantin V > Shvachko ] > {code} > So that merge commit should have only introduced a single new commit > {{9af96d4ed4b}}. But: > {code} > ± git show --stat 9af96d4ed4b | cat > commit 9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4 > Author: Giovanni Matteo Fumarola > Date: Wed Sep 5 10:50:25 2018 -0700 > HADOOP-15707. Add IsActiveServlet to be used for Load Balancers. > Contributed by Lukas Majercak. 
> .../org/apache/hadoop/http/IsActiveServlet.java| 71 > .../apache/hadoop/http/TestIsActiveServlet.java| 95 > ++ > .../federation/router/IsRouterActiveServlet.java | 37 + > .../server/federation/router/RouterHttpServer.java | 9 ++ > .../src/site/markdown/HDFSRouterFederation.md | 2 +- > .../server/namenode/IsNameNodeActiveServlet.java | 33 > .../hdfs/server/namenode/NameNodeHttpServer.java | 3 + > .../site/markdown/HDFSHighAvailabilityWithQJM.md | 8 ++ > .../IsResourceManagerActiveServlet.java| 38 + > .../server/resourcemanager/ResourceManager.java| 5 ++ > .../resourcemanager/webapp/RMWebAppFilter.java | 3 +- > .../src/site/markdown/ResourceManagerHA.md | 5 ++ > 12 files changed, 307 insertions(+), 2 deletions(-) > {code} > that commit has no changes to the cmd, whereas the merge commit does: > {code} > ± git show --stat 1363eff69c3 | cat > commit 1363eff69c36c4f2085194b59a86370505cc00cd > Merge: 94d7f90e93b 9af96d4ed4b > Author: Konstantin V Shvachko > Date: Mon Sep 17 17:39:11 2018 -0700 > Merge commit '9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4' into HDFS-12943 > # Conflicts: > # > hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md > .../hadoop-common/src/main/bin/start-all.cmd | 104 > ++--- > .../hadoop-common/src/main/bin/stop-all.cmd| 104 > ++--- > .../org/apache/hadoop/http/IsActiveServlet.java| 71 ++ > .../apache/hadoop/http/TestIsActiveServlet.java| 95 +++ > .../federation/router/IsRouterActiveServlet.java | 37 > .../server/federation/router/RouterHttpServer.java | 9 ++ > .../src/site/markdown/HDFSRouterFederation.md | 2 +- > .../hadoop-hdfs/src/main/bin/hdfs-config.cmd | 86 - > .../hadoop-hdfs/src/main/bin/start-dfs.cmd | 82 > .../hadoop-hdfs/src/main/bin/stop-dfs.cmd | 82 > .../server/namenode/IsNameNodeActiveServlet.java | 33 +++ > .../hdfs/server/namenode/NameNodeHttpServer.java | 3 + > .../site/markdown/HDFSHighAvailabilityWithQJM.md | 8 ++ > hadoop-mapreduce-project/bin/mapred-config.cmd | 86 - > 
hadoop-tools/hadoop-streaming/src/test/bin/cat.cmd | 36 +++ > .../hadoop-streaming/src/test/bin/xargs_cat.cmd| 36 +++ > hadoop-yarn-project/hadoop-yarn/bin/start-yarn.cmd | 94 +-- > hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.cmd | 94 +-- > .../IsResourceManagerActiveServlet.java| 38 > .../server/resourcemanager/ResourceManager.java| 5 + > .../resourcemanager/webapp/RMWebAppFilter.java | 3 +- > .../src/site/markdown/ResourceManagerHA.md | 5 + > 22 files changed, 709 insertions(+), 404 deletions(-) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13930) Fix crlf line endings in HDFS-12943 branch
Erik Krogen created HDFS-13930: -- Summary: Fix crlf line endings in HDFS-12943 branch Key: HDFS-13930 URL: https://issues.apache.org/jira/browse/HDFS-13930 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen Assignee: Erik Krogen One of the merge commits introduced the wrong line endings to some {{*.cmd}} files. Looks like it was commit {{1363eff69c3}} that broke it. The tree is: {code} * | 1363eff69c3 2018-09-17 Merge commit '9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4' into HDFS-12943 [Konstantin V Shvachko ] |\ \ | |/ | * 9af96d4ed4b 2018-09-05 HADOOP-15707. Add IsActiveServlet to be used for Load Balancers. Contributed by Lukas Majercak. [Giovanni Matteo Fumarola ] * | 94d7f90e93b 2018-09-17 Merge commit 'e780556ae9229fe7a90817eb4e5449d7eed35dd8' into HDFS-12943 [Konstantin V Shvachko ] {code} So that merge commit should have only introduced a single new commit {{9af96d4ed4b}}. But: {code} ± git show --stat 9af96d4ed4b | cat commit 9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4 Author: Giovanni Matteo Fumarola Date: Wed Sep 5 10:50:25 2018 -0700 HADOOP-15707. Add IsActiveServlet to be used for Load Balancers. Contributed by Lukas Majercak. 
.../org/apache/hadoop/http/IsActiveServlet.java| 71 .../apache/hadoop/http/TestIsActiveServlet.java| 95 ++ .../federation/router/IsRouterActiveServlet.java | 37 + .../server/federation/router/RouterHttpServer.java | 9 ++ .../src/site/markdown/HDFSRouterFederation.md | 2 +- .../server/namenode/IsNameNodeActiveServlet.java | 33 .../hdfs/server/namenode/NameNodeHttpServer.java | 3 + .../site/markdown/HDFSHighAvailabilityWithQJM.md | 8 ++ .../IsResourceManagerActiveServlet.java| 38 + .../server/resourcemanager/ResourceManager.java| 5 ++ .../resourcemanager/webapp/RMWebAppFilter.java | 3 +- .../src/site/markdown/ResourceManagerHA.md | 5 ++ 12 files changed, 307 insertions(+), 2 deletions(-) {code} that commit has no changes to the cmd, whereas the merge commit does: {code} ± git show --stat 1363eff69c3 | cat commit 1363eff69c36c4f2085194b59a86370505cc00cd Merge: 94d7f90e93b 9af96d4ed4b Author: Konstantin V Shvachko Date: Mon Sep 17 17:39:11 2018 -0700 Merge commit '9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4' into HDFS-12943 # Conflicts: # hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md .../hadoop-common/src/main/bin/start-all.cmd | 104 ++--- .../hadoop-common/src/main/bin/stop-all.cmd| 104 ++--- .../org/apache/hadoop/http/IsActiveServlet.java| 71 ++ .../apache/hadoop/http/TestIsActiveServlet.java| 95 +++ .../federation/router/IsRouterActiveServlet.java | 37 .../server/federation/router/RouterHttpServer.java | 9 ++ .../src/site/markdown/HDFSRouterFederation.md | 2 +- .../hadoop-hdfs/src/main/bin/hdfs-config.cmd | 86 - .../hadoop-hdfs/src/main/bin/start-dfs.cmd | 82 .../hadoop-hdfs/src/main/bin/stop-dfs.cmd | 82 .../server/namenode/IsNameNodeActiveServlet.java | 33 +++ .../hdfs/server/namenode/NameNodeHttpServer.java | 3 + .../site/markdown/HDFSHighAvailabilityWithQJM.md | 8 ++ hadoop-mapreduce-project/bin/mapred-config.cmd | 86 - hadoop-tools/hadoop-streaming/src/test/bin/cat.cmd | 36 +++ .../hadoop-streaming/src/test/bin/xargs_cat.cmd| 36 
+++ hadoop-yarn-project/hadoop-yarn/bin/start-yarn.cmd | 94 +-- hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.cmd | 94 +-- .../IsResourceManagerActiveServlet.java| 38 .../server/resourcemanager/ResourceManager.java| 5 + .../resourcemanager/webapp/RMWebAppFilter.java | 3 +- .../src/site/markdown/ResourceManagerHA.md | 5 + 22 files changed, 709 insertions(+), 404 deletions(-) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13904) ContentSummary does not always respect processing limit, resulting in long lock acquisitions
Erik Krogen created HDFS-13904: -- Summary: ContentSummary does not always respect processing limit, resulting in long lock acquisitions Key: HDFS-13904 URL: https://issues.apache.org/jira/browse/HDFS-13904 Project: Hadoop HDFS Issue Type: Bug Components: hdfs, namenode Reporter: Erik Krogen Assignee: Erik Krogen HDFS-4995 added a config {{dfs.content-summary.limit}} which allows for an administrator to set a limit on the number of entries processed during a single acquisition of the {{FSNamesystemLock}} during the creation of a content summary. This is useful to prevent very long (multiple seconds) pauses on the NameNode when {{getContentSummary}} is called on large directories. However, even on versions with HDFS-4995, we have seen warnings like:
{code}
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem read lock held for 9398 ms via
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:950)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:188)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1486)
org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.yield(ContentSummaryComputationContext.java:109)
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:679)
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeContentSummary(INodeDirectory.java:642)
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:656)
{code}
happen quite consistently when {{getContentSummary}} was called on a large directory on a heavily-loaded NameNode. Such long pauses completely destroy the performance of the NameNode. We have the limit set to its default of 5000; if it was respected, clearly there would not be a 10-second pause.
The current {{yield()}} code within {{ContentSummaryComputationContext}} looks like:
{code}
public boolean yield() {
  // Are we set up to do this?
  if (limitPerRun <= 0 || dir == null || fsn == null) {
    return false;
  }

  // Have we reached the limit?
  long currentCount = counts.getFileCount() +
      counts.getSymlinkCount() +
      counts.getDirectoryCount() +
      counts.getSnapshotableDirectoryCount();
  if (currentCount <= nextCountLimit) {
    return false;
  }

  // Update the next limit
  nextCountLimit = currentCount + limitPerRun;

  boolean hadDirReadLock = dir.hasReadLock();
  boolean hadDirWriteLock = dir.hasWriteLock();
  boolean hadFsnReadLock = fsn.hasReadLock();
  boolean hadFsnWriteLock = fsn.hasWriteLock();

  // sanity check.
  if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
      hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
      fsn.getReadHoldCount() != 1) {
    // cannot relinquish
    return false;
  }

  // unlock
  dir.readUnlock();
  fsn.readUnlock("contentSummary");

  try {
    Thread.sleep(sleepMilliSec, sleepNanoSec);
  } catch (InterruptedException ie) {
  } finally {
    // reacquire
    fsn.readLock();
    dir.readLock();
  }

  yieldCount++;
  return true;
}
{code}
We believe that this check in particular is the culprit:
{code}
if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
    hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
    fsn.getReadHoldCount() != 1) {
  // cannot relinquish
  return false;
}
{code}
The content summary computation will only relinquish the lock if it is currently the _only_ holder of the lock. Given the high volume of read requests on a heavily loaded NameNode, especially when unfair locking is enabled, it is likely that another holder of the read lock is performing some short-lived operation. By refusing to give up the lock in this case, the content summary computation ends up never relinquishing the lock. We propose to simply remove the readHoldCount checks from this {{yield()}}. 
This should alleviate the case described above by giving up the read lock and allowing other short-lived operations to complete (while the content summary thread sleeps) so that the lock can finally be given up completely. This has the drawback that the content summary may sometimes give up its hold unnecessarily, if the read lock is still held by other operations by the time the thread continues again. The only negative impact from this is to make some large content summary operations slightly slower, with the tradeoff of reducing NameNode-wide performance impact. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail:
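To make the proposal concrete, here is a minimal, self-contained sketch (hypothetical class and method names; a plain {{ReentrantReadWriteLock}} stands in for the real {{FSNamesystemLock}} and {{FSDirectory}} locks) of a yield that relinquishes the lock without the readHoldCount sanity checks:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Simplified, hypothetical stand-in for ContentSummaryComputationContext#yield(). */
class YieldSketch {
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  private final long limitPerRun = 5000;   // dfs.content-summary.limit default
  private long nextCountLimit = limitPerRun;
  private long currentCount = 0;
  long yieldCount = 0;

  /** Caller must hold the read lock; returns true if it was released and reacquired. */
  private boolean yieldIfNeeded() {
    if (currentCount <= nextCountLimit) {
      return false;                        // limit not reached yet
    }
    nextCountLimit = currentCount + limitPerRun;
    // Proposed change: no getReadHoldCount() != 1 sanity check here, so the
    // lock is relinquished even when other short-lived readers also hold it.
    fsnLock.readLock().unlock();
    try {
      Thread.sleep(1);                     // let queued writers/readers proceed
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();  // preserve interrupt status
    } finally {
      fsnLock.readLock().lock();           // reacquire before continuing the walk
    }
    yieldCount++;
    return true;
  }

  /** Simulate processing a batch of namespace entries under the read lock. */
  void process(long entries) {
    fsnLock.readLock().lock();
    try {
      currentCount += entries;
      yieldIfNeeded();
    } finally {
      fsnLock.readLock().unlock();
    }
  }
}
```

With the check removed, a walk that crosses the per-run limit always yields, regardless of how many other readers are active at that moment.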
[jira] [Resolved] (HDFS-13872) Only some protocol methods should perform msync wait
[ https://issues.apache.org/jira/browse/HDFS-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-13872. Resolution: Duplicate Closing in favor of HDFS-13880 > Only some protocol methods should perform msync wait > > > Key: HDFS-13872 > URL: https://issues.apache.org/jira/browse/HDFS-13872 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-13872-HDFS-12943.000.patch > > > Currently the implementation of msync added in HDFS-13767 waits until the > server has caught up to the client-specified transaction ID regardless of > what the inbound RPC is. This particularly causes problems for > ObserverReadProxyProvider (see HDFS-13779) when we try to fetch the state > from an observer/standby; this should be a quick operation, but it has to > wait for the node to catch up to the most current state. I initially thought > all {{HAServiceProtocol}} methods should thus be excluded from the wait > period, but actually I think the right approach is that _only_ > {{ClientProtocol}} methods should be subjected to the wait period. I propose > that we can do this via an annotation on client protocol which can then be > checked within {{ipc.Server}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13872) Only ClientProtocol should perform msync wait
Erik Krogen created HDFS-13872: -- Summary: Only ClientProtocol should perform msync wait Key: HDFS-13872 URL: https://issues.apache.org/jira/browse/HDFS-13872 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen Currently the implementation of msync added in HDFS-13767 waits until the server has caught up to the client-specified transaction ID regardless of what the inbound RPC is. This particularly causes problems for ObserverReadProxyProvider (see HDFS-13779) when we try to fetch the state from an observer/standby; this should be a quick operation, but it has to wait for the node to catch up to the most current state. I initially thought all {{HAServiceProtocol}} methods should thus be excluded from the wait period, but actually I think the right approach is that _only_ {{ClientProtocol}} methods should be subjected to the wait period. I propose that we can do this via an annotation on client protocol which can then be checked within {{ipc.Server}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
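The annotation-plus-check approach described above can be sketched as follows. All names here ({{RequiresMsyncWait}}, the Fake* interfaces) are invented for illustration; the real annotation and where it lives were to be settled in the patch itself:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical marker: protocols carrying it are subject to the msync wait. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RequiresMsyncWait {}

/** ClientProtocol-like interface: annotated, so its RPCs wait for the txid. */
@RequiresMsyncWait
interface FakeClientProtocol { void getListing(); }

/** HAServiceProtocol-like interface: not annotated, so never held back. */
interface FakeHAServiceProtocol { void monitorHealth(); }

class MsyncGate {
  /** The kind of check ipc.Server could perform before applying the wait period. */
  static boolean shouldWait(Class<?> protocol) {
    return protocol.isAnnotationPresent(RequiresMsyncWait.class);
  }
}
```

The server-side dispatcher would consult {{shouldWait}} on the target protocol class and skip the catch-up wait entirely for unannotated protocols such as the HA health checks.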
[jira] [Resolved] (HDFS-12421) Balancer to emit standard metrics
[ https://issues.apache.org/jira/browse/HDFS-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-12421. Resolution: Duplicate Just noticed this is a dup of HDFS-10648 > Balancer to emit standard metrics > - > > Key: HDFS-12421 > URL: https://issues.apache.org/jira/browse/HDFS-12421 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover, metrics >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Minor > > The Balancer currently prints some statistics about its operation to stdout > while it is running. This is fine if the balancer is manually run via CLI by > an operator, but for the more common case of it being a scheduled execution, > it is cumbersome to have to track down the logs to be able to monitor its > progress. > We already have a standard metrics system in place; I propose that we have > the Balancer emit metrics while it is running so that they can be tracked via > standard metrics infrastructure. We can start with just the things that the > balancer already prints to stdout: bytes already moved, bytes left to move, > bytes currently being moved, and iteration number. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13791) Limit logging frequency of edit tail related statements
Erik Krogen created HDFS-13791: -- Summary: Limit logging frequency of edit tail related statements Key: HDFS-13791 URL: https://issues.apache.org/jira/browse/HDFS-13791 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs, qjm Reporter: Erik Krogen Assignee: Erik Krogen There are a number of log statements that occur every time new edits are tailed by a Standby NameNode. When edits are tailing only on the order of every tens of seconds, this is fine. With the work in HDFS-13150, however, edits may be tailed every few milliseconds, which can flood the logs with tailing-related statements. We should throttle it to limit it to printing at most, say, once per 5 seconds. We can implement logic similar to that used in HDFS-10713. This may be slightly more tricky since the log statements are distributed across a few classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
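The throttling logic described above can be sketched as a small helper (a hypothetical class, not the actual HDFS-10713 implementation) that lets a statement through at most once per interval, safely across threads:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: allow a log statement through at most once per interval. */
class LogThrottle {
  private final long intervalMs;
  private final AtomicLong lastLoggedMs = new AtomicLong(Long.MIN_VALUE);

  LogThrottle(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  /** Returns true if the caller should emit the statement now. */
  boolean shouldLog(long nowMs) {
    long last = lastLoggedMs.get();
    if (last != Long.MIN_VALUE && nowMs - last < intervalMs) {
      return false;                              // still inside the quiet window
    }
    // Only one thread wins the CAS per window, so at most one line is printed.
    return lastLoggedMs.compareAndSet(last, nowMs);
  }
}
```

Each edit-tail log site would hold one such throttle (or a shared one, since the statements span a few classes) and guard its INFO statement with {{shouldLog(System.currentTimeMillis())}}.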
[jira] [Created] (HDFS-13789) Reduce logging frequency of QuorumJournalManager#selectInputStreams
Erik Krogen created HDFS-13789: -- Summary: Reduce logging frequency of QuorumJournalManager#selectInputStreams Key: HDFS-13789 URL: https://issues.apache.org/jira/browse/HDFS-13789 Project: Hadoop HDFS Issue Type: Improvement Components: namenode, qjm Affects Versions: HDFS-12943 Reporter: Erik Krogen Assignee: Erik Krogen As part of HDFS-13150, a logging statement was added to indicate whenever an edit tail is performed via the RPC mechanism. To enable low latency tailing, the tail frequency must be set very low, so this log statement gets printed much too frequently at an INFO level. We should decrease to DEBUG. Note that if there are actually edits available to tail, other log messages will get printed; this is just targeting the case when it attempts to tail and there are no new edits. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-13150) [Edit Tail Fast Path] Allow SbNN to tail in-progress edits from JN via RPC
[ https://issues.apache.org/jira/browse/HDFS-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-13150. Resolution: Fixed Fix Version/s: HDFS-12943 Closing this as all sub-issues (HDFS-13607, HDFS-13608, HDFS-13609, HDFS-13610) have been completed. Thanks to all who helped with this new feature! > [Edit Tail Fast Path] Allow SbNN to tail in-progress edits from JN via RPC > -- > > Key: HDFS-13150 > URL: https://issues.apache.org/jira/browse/HDFS-13150 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, hdfs, journal-node, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: HDFS-12943 > > Attachments: edit-tailing-fast-path-design-v0.pdf, > edit-tailing-fast-path-design-v1.pdf, edit-tailing-fast-path-design-v2.pdf > > > In the interest of making coordinated/consistent reads easier to complete > with low latency, it is advantageous to reduce the time between when a > transaction is applied on the ANN and when it is applied on the SbNN. We > propose adding a new "fast path" which can be used to tail edits when low > latency is desired. We leave the existing tailing logic in place, and fall > back to this path on startup, recovery, and when the fast path encounters > unrecoverable errors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13689) NameNodeRpcServer getEditsFromTxid assumes it is run on active NameNode
Erik Krogen created HDFS-13689: -- Summary: NameNodeRpcServer getEditsFromTxid assumes it is run on active NameNode Key: HDFS-13689 URL: https://issues.apache.org/jira/browse/HDFS-13689 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs, namenode Reporter: Erik Krogen Assignee: Erik Krogen {{NameNodeRpcServer#getEditsFromTxid}} currently decides which transactions are able to be served, i.e. which transactions are durable, using the following logic:
{code}
long syncTxid = log.getSyncTxId();
// If we haven't synced anything yet, we can only read finalized
// segments since we can't reliably determine which txns in in-progress
// segments have actually been committed (e.g. written to a quorum of JNs).
// If we have synced txns, we can definitely read up to syncTxid since
// syncTxid is only updated after a transaction is committed to all
// journals. (In-progress segments written by old writers are already
// discarded for us, so if we read any in-progress segments they are
// guaranteed to have been written by this NameNode.)
boolean readInProgress = syncTxid > 0;
{code}
This assumes that the NameNode serving this request is the current writer/active NameNode, which may not be true in the ObserverNode situation. Since {{selectInputStreams}} now has an {{onlyDurableTxns}} flag, which, if enabled, will only return durable/committed transactions, we can instead leverage this to provide the same functionality. We should utilize this to avoid consistency issues when serving this request from the ObserverNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
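For illustration only (this is not the actual {{selectInputStreams}} implementation): under quorum journaling, a transaction counts as durable/committed once a majority of JournalNodes have acknowledged it, which can be computed from the per-JN highest-acked txids like so:

```java
import java.util.Arrays;

/** Sketch: highest transaction durable on a quorum of JournalNodes. */
class QuorumCommit {
  /** ackedTxIds[i] = highest txid acknowledged by JournalNode i. */
  static long committedTxId(long[] ackedTxIds) {
    long[] sorted = ackedTxIds.clone();
    Arrays.sort(sorted);
    // A txid is durable once a majority of JNs have it; with the acks sorted
    // ascending, that is the value at index (n - majority).
    int majority = sorted.length / 2 + 1;
    return sorted[sorted.length - majority];
  }
}
```

Serving only edits up to this committed txid is safe on any node, active or observer, because such edits can never be lost in a failover.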
[jira] [Created] (HDFS-13610) [Edit Tail Fast Path Pt 4] Cleanup: integration test, documentation, remove unnecessary dummy sync
Erik Krogen created HDFS-13610: -- Summary: [Edit Tail Fast Path Pt 4] Cleanup: integration test, documentation, remove unnecessary dummy sync Key: HDFS-13610 URL: https://issues.apache.org/jira/browse/HDFS-13610 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, journal-node, namenode Reporter: Erik Krogen Assignee: Erik Krogen See HDFS-13150 for full design. This JIRA is targeted at cleanup tasks:
* Add in integration testing. We can expand {{TestStandbyInProgressTail}}
* Documentation in HDFSHighAvailabilityWithQJM
* Remove the dummy sync added as part of HDFS-10519; it is unnecessary since now in-progress tailing does not rely as heavily on the JN committedTxnId
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
Erik Krogen created HDFS-13609: -- Summary: [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC Key: HDFS-13609 URL: https://issues.apache.org/jira/browse/HDFS-13609 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, namenode Reporter: Erik Krogen Assignee: Erik Krogen See HDFS-13150 for the full design. This JIRA is targeted at the NameNode-side changes to enable tailing in-progress edits via the RPC mechanism added in HDFS-13608. Most changes are in the QuorumJournalManager. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13608) [Edit Tail Fast Path Pt 2] Add ability for JournalNode to serve edits via RPC
Erik Krogen created HDFS-13608: -- Summary: [Edit Tail Fast Path Pt 2] Add ability for JournalNode to serve edits via RPC Key: HDFS-13608 URL: https://issues.apache.org/jira/browse/HDFS-13608 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Erik Krogen Assignee: Erik Krogen See HDFS-13150 for full design. This JIRA is to make the JournalNode-side changes necessary to support serving edits via RPC. This includes interacting with the cache added in HDFS-13607. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-13125) Improve efficiency of JN -> Standby Pipeline Under Frequent Edit Tailing
[ https://issues.apache.org/jira/browse/HDFS-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-13125. Resolution: Duplicate This was subsumed by HDFS-13150 > Improve efficiency of JN -> Standby Pipeline Under Frequent Edit Tailing > > > Key: HDFS-13125 > URL: https://issues.apache.org/jira/browse/HDFS-13125 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > The current edit tailing pipeline is designed for > * High resiliency > * High throughput > and was _not_ designed for low latency. > It was designed under the assumption that each edit log segment would > typically be read all at once, e.g. on startup or the SbNN tailing the entire > thing after it is finalized. The ObserverNode should be reading constantly > from the JournalNodes' in-progress edit logs with low latency, to reduce the > lag time from when a transaction is committed on the ANN and when it is > visible on the ObserverNode. > Due to the critical nature of this pipeline to the health of HDFS, it would > be better not to redesign it altogether. Based on some experiments it seems > if we mitigate the following issues, lag times are reduced to low levels (low > hundreds of milliseconds even under very high write load): > * The overhead of creating a new HTTP connection for each time new edits are > fetched. This makes sense when you're expecting to tail an entire segment; it > does not when you may only be fetching a small number of edits. We can > mitigate this by allowing edits to be tailed via an RPC call, or by adding a > connection pool for the existing connections to the journal. > * The overhead of transmitting a whole file at once. Right now when an edit > segment is requested, the JN sends the entire segment, and on the SbNN it > will ignore edits up to the ones it wants. 
How to solve this one may be more > tricky, but one suggestion would be to keep recently logged edits in memory, > avoiding the need to serve them from file at all, allowing the JN to quickly > serve only the required edits. > We can implement these as optimizations on top of the existing logic, with > fallbacks to the current slow-but-resilient pipeline. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13602) Optimize checkOperation(WRITE) check in FSNamesystem getBlockLocations
Erik Krogen created HDFS-13602: -- Summary: Optimize checkOperation(WRITE) check in FSNamesystem getBlockLocations Key: HDFS-13602 URL: https://issues.apache.org/jira/browse/HDFS-13602 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Reporter: Erik Krogen Assignee: Chao Sun Similar to the work done in HDFS-4591 to avoid having to take a write lock before checking if an operation category is allowed, we can do the same for the write lock that is taken sometimes (when updating access time) within getBlockLocations. This is particularly useful when using the standby read feature (HDFS-12943), as it will be the case on an observer node that the operationCategory(READ) check succeeds but the operationCategory(WRITE) check fails. It would be ideal to fail this check _before_ acquiring the write lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
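A minimal sketch of the proposed ordering, with hypothetical names (the real code lives in {{FSNamesystem#getBlockLocations}} and throws a standby-specific exception), showing the WRITE-category check failing before the write lock is ever taken:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: fail the WRITE-category check before acquiring the write lock. */
class AccessTimeUpdateSketch {
  enum Category { READ, WRITE }

  private final boolean observer;
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  boolean lockWasTaken = false;

  AccessTimeUpdateSketch(boolean observer) {
    this.observer = observer;
  }

  /** Stand-in for checkOperation: an observer cannot serve WRITE operations. */
  private void checkOperation(Category c) {
    if (observer && c == Category.WRITE) {
      throw new IllegalStateException("Operation category WRITE is not supported");
    }
  }

  /** Stand-in for the access-time update inside getBlockLocations. */
  void maybeUpdateAccessTime() {
    checkOperation(Category.WRITE);   // proposed: fail here, before locking
    lock.writeLock().lock();
    lockWasTaken = true;
    try {
      // ... the setTimes / access-time update would go here ...
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

On an observer the check throws before the write lock is touched, so the expensive lock acquisition is skipped entirely for an operation that was doomed to fail anyway.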
[jira] [Resolved] (HDFS-13595) Edit tailing period configuration should accept time units
[ https://issues.apache.org/jira/browse/HDFS-13595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-13595. Resolution: Invalid This is already done. Looked at the wrong branch, my mistake. > Edit tailing period configuration should accept time units > -- > > Key: HDFS-13595 > URL: https://issues.apache.org/jira/browse/HDFS-13595 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > The {{dfs.ha.tail-edits.period}} config should accept time units so that it > can be more easily specified across a wide range; in particular, for > HDFS-13150 it is useful to have a period shorter than 1 second, which is not > currently possible. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13595) Edit tailing period configuration should accept time units
Erik Krogen created HDFS-13595: -- Summary: Edit tailing period configuration should accept time units Key: HDFS-13595 URL: https://issues.apache.org/jira/browse/HDFS-13595 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Reporter: Erik Krogen Assignee: Erik Krogen The {{dfs.ha.tail-edits.period}} config should accept time units so that it can be more easily specified across a wide range; in particular, for HDFS-13150 it is useful to have a period shorter than 1 second, which is not currently possible. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13523) Support observer nodes in MiniDFSCluster
Erik Krogen created HDFS-13523: -- Summary: Support observer nodes in MiniDFSCluster Key: HDFS-13523 URL: https://issues.apache.org/jira/browse/HDFS-13523 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, test Reporter: Erik Krogen MiniDFSCluster should support Observer nodes so that we can write decent integration tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13522) Support observer node from Router-Based Federation
Erik Krogen created HDFS-13522: -- Summary: Support observer node from Router-Based Federation Key: HDFS-13522 URL: https://issues.apache.org/jira/browse/HDFS-13522 Project: Hadoop HDFS Issue Type: Sub-task Components: federation, namenode Reporter: Erik Krogen Changes will need to occur to the router to support the new observer node. One such change will be to make the router understand the observer state, e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13493) Reduce the HttpServer2 thread count on DataNodes
Erik Krogen created HDFS-13493: -- Summary: Reduce the HttpServer2 thread count on DataNodes Key: HDFS-13493 URL: https://issues.apache.org/jira/browse/HDFS-13493 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Erik Krogen Assignee: Erik Krogen Given that HFTP was removed in Hadoop 3 and WebHDFS is handled via Netty, the HttpServer2 instance within the DataNode is only used for very basic tasks such as the web UI. Thus we can safely reduce the thread count used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13400) WebHDFS append returned stream has incorrectly set position
Erik Krogen created HDFS-13400: -- Summary: WebHDFS append returned stream has incorrectly set position Key: HDFS-13400 URL: https://issues.apache.org/jira/browse/HDFS-13400 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.1, 2.7.5, 2.8.3, 2.9.0 Reporter: Erik Krogen -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13272) DataNodeHttpServer hard-codes HttpServer2 threads at 10
Erik Krogen created HDFS-13272: -- Summary: DataNodeHttpServer hard-codes HttpServer2 threads at 10 Key: HDFS-13272 URL: https://issues.apache.org/jira/browse/HDFS-13272 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Erik Krogen Assignee: Erik Krogen In HDFS-7279, the Jetty server on the DataNode was hard-coded to use 10 threads. In addition to the possibility of this being too few threads, it is much higher than necessary in resource constrained environments such as MiniDFSCluster. To avoid compatibility issues, rather than using {{HttpServer2#HTTP_MAX_THREADS}} directly, we can introduce a new configuration for the DataNode's thread pool size. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13265) MiniDFSCluster should set reasonable defaults to reduce resource consumption
Erik Krogen created HDFS-13265: -- Summary: MiniDFSCluster should set reasonable defaults to reduce resource consumption Key: HDFS-13265 URL: https://issues.apache.org/jira/browse/HDFS-13265 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode, test Reporter: Erik Krogen MiniDFSCluster takes its defaults from {{DFSConfigKeys}} defaults, but many of these are not suitable for a unit test environment. For example, the default handler thread count of 10 is definitely more than necessary for (almost?) any unit test. We should set reasonable, lower defaults unless a test specifically requires more. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13264) CacheReplicationMonitor should be able to be disabled completely
Erik Krogen created HDFS-13264: -- Summary: CacheReplicationMonitor should be able to be disabled completely Key: HDFS-13264 URL: https://issues.apache.org/jira/browse/HDFS-13264 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Erik Krogen Currently there is no way to completely disable the CacheReplicationMonitor, even if the feature is not being used at all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13263) DiskBalancer should not start a thread if it is disabled
Erik Krogen created HDFS-13263: -- Summary: DiskBalancer should not start a thread if it is disabled Key: HDFS-13263 URL: https://issues.apache.org/jira/browse/HDFS-13263 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Erik Krogen Assignee: Erik Krogen -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13262) Services should not start threads unnecessarily
Erik Krogen created HDFS-13262: -- Summary: Services should not start threads unnecessarily Key: HDFS-13262 URL: https://issues.apache.org/jira/browse/HDFS-13262 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode, test Reporter: Erik Krogen There are a number of services in HDFS that start a thread even if they are disabled. Some services which may not be strictly necessary do not have a way to be disabled. This is particularly bad for the unit tests, in which the number of threads spawned by concurrent MiniDFSCluster-based tests can grow to be very large (e.g. see HDFS-12711) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13150) Create fast path for SbNN tailing edits from JNs
Erik Krogen created HDFS-13150: -- Summary: Create fast path for SbNN tailing edits from JNs Key: HDFS-13150 URL: https://issues.apache.org/jira/browse/HDFS-13150 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs, journal-node, namenode Reporter: Erik Krogen Assignee: Erik Krogen In the interest of making coordinated/consistent reads easier to complete with low latency, it is advantageous to reduce the time between when a transaction is applied on the ANN and when it is applied on the SbNN. We propose adding a new "fast path" which can be used to tail edits when low latency is desired. We leave the existing tailing logic in place, and fall back to this path on startup, recovery, and when the fast path encounters unrecoverable errors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-13122) Tailing edits should not update quota counts on ObserverNode
[ https://issues.apache.org/jira/browse/HDFS-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-13122. Resolution: Duplicate > Tailing edits should not update quota counts on ObserverNode > > > Key: HDFS-13122 > URL: https://issues.apache.org/jira/browse/HDFS-13122 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > Currently in {{FSImage#loadEdits()}}, after applying a set of edits, we call > {code} > updateCountForQuota(target.getBlockManager().getStoragePolicySuite(), > target.dir.rootDir); > {code} > to update the quota counts for the entire namespace, which can be very > expensive. This makes sense if we are about to become the ANN, since we need > valid quotas, but not on an ObserverNode which does not need to enforce > quotas. > This is related to increasing the frequency with which the SbNN can tail > edits from the ANN to decrease the lag time for transactions to appear on the > Observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13126) Re-enable HTTP Request Logging for WebHDFS
Erik Krogen created HDFS-13126: -- Summary: Re-enable HTTP Request Logging for WebHDFS Key: HDFS-13126 URL: https://issues.apache.org/jira/browse/HDFS-13126 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Affects Versions: 2.7.0 Reporter: Erik Krogen Due to HDFS-7279, starting in 2.7.0, the DataNode HTTP Request logs no longer include WebHDFS requests because the HTTP Request handling is done internal to {{HttpServer2}}, which is no longer used. If the request logging is enabled, we should add a Netty [LoggingHandler|https://netty.io/4.0/api/io/netty/handler/logging/LoggingHandler.html] to the ChannelPipeline for the http(s) servers used by the DataNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13125) Improve efficiency of JN -> Standby Pipeline Under Frequent Edit Tailing
Erik Krogen created HDFS-13125: -- Summary: Improve efficiency of JN -> Standby Pipeline Under Frequent Edit Tailing Key: HDFS-13125 URL: https://issues.apache.org/jira/browse/HDFS-13125 Project: Hadoop HDFS Issue Type: Improvement Components: journal-node, namenode Reporter: Erik Krogen Assignee: Erik Krogen The current edit tailing pipeline is designed for
* High resiliency
* High throughput
and was _not_ designed for low latency. It was designed under the assumption that each edit log segment would typically be read all at once, e.g. on startup or the SbNN tailing the entire thing after it is finalized. The ObserverNode should be reading constantly from the JournalNodes' in-progress edit logs with low latency, to reduce the lag time from when a transaction is committed on the ANN and when it is visible on the ObserverNode. Due to the critical nature of this pipeline to the health of HDFS, it would be better not to redesign it altogether. Based on some experiments, it seems that if we mitigate the following issues, lag times are reduced to low levels (low hundreds of milliseconds even under very high write load):
* The overhead of creating a new HTTP connection each time new edits are fetched. This makes sense when you're expecting to tail an entire segment; it does not when you may only be fetching a small number of edits. We can mitigate this by allowing edits to be tailed via an RPC call, or by adding a connection pool for the existing connections to the journal.
* The overhead of transmitting a whole file at once. Right now when an edit segment is requested, the JN sends the entire segment, and on the SbNN it will ignore edits up to the ones it wants. How to solve this one may be more tricky, but one suggestion would be to keep recently logged edits in memory, avoiding the need to serve them from file at all, allowing the JN to quickly serve only the required edits. 
We can implement these as optimizations on top of the existing logic, with fallbacks to the current slow-but-resilient pipeline. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
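The "keep recently logged edits in memory" idea above can be sketched as a bounded in-memory buffer keyed by transaction ID, with a fallback signal when a reader is too far behind. This is a minimal illustration with hypothetical names, not the actual JournalNode implementation:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: the JN retains the last N serialized edits so a
 * low-latency tailer can be served from memory instead of re-reading the
 * segment file. Readers that have fallen behind the buffer get null and
 * fall back to the existing slow-but-resilient file-based path.
 */
class RecentEditsCache {
    private final int capacity;
    private final Deque<long[]> cache = new ArrayDeque<>(); // {txid, payload stub}

    RecentEditsCache(int capacity) { this.capacity = capacity; }

    synchronized void logEdit(long txid, long payload) {
        cache.addLast(new long[] {txid, payload});
        if (cache.size() > capacity) {
            cache.removeFirst(); // evict oldest; very-lagged readers use the file path
        }
    }

    /** Edits with txid > fromTxId, or null if the cache no longer covers them. */
    synchronized List<long[]> getEditsAfter(long fromTxId) {
        if (cache.isEmpty() || cache.peekFirst()[0] > fromTxId + 1) {
            return null; // gap between caller position and oldest cached edit
        }
        List<long[]> out = new ArrayList<>();
        for (long[] e : cache) {
            if (e[0] > fromTxId) out.add(e);
        }
        return out;
    }
}
```

The null return is the fallback hook: the caller would then fetch via the current HTTP segment transfer.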
[jira] [Created] (HDFS-13122) FSImage should not update quota counts on ObserverNode
Erik Krogen created HDFS-13122: -- Summary: FSImage should not update quota counts on ObserverNode Key: HDFS-13122 URL: https://issues.apache.org/jira/browse/HDFS-13122 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs, namenode Reporter: Erik Krogen Assignee: Erik Krogen Currently in {{FSImage#loadEdits()}}, after applying a set of edits, we call
{code}
updateCountForQuota(target.getBlockManager().getStoragePolicySuite(), target.dir.rootDir);
{code}
to update the quota counts for the entire namespace, which can be very expensive. This makes sense if we are about to become the ANN, since we need valid quotas, but not on an ObserverNode, which does not need to enforce quotas. This is related to increasing the frequency with which the SbNN can tail edits from the ANN to decrease the lag time for transactions to appear on the Observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
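The proposed fix amounts to gating the expensive full-namespace quota recomputation on the node's role. A minimal sketch with hypothetical names (not the actual FSImage code):

```java
/**
 * Illustrative sketch of skipping the O(namespace) quota walk on an
 * ObserverNode: only a node that may become the ANN needs valid quota
 * counts after applying edits. Class and method names are hypothetical.
 */
class EditLoader {
    enum Role { ACTIVE_CANDIDATE, OBSERVER }

    private final Role role;
    int quotaRecomputations = 0; // exposed so the behavior is observable

    EditLoader(Role role) { this.role = role; }

    void loadEdits(int numEdits) {
        // ... apply numEdits edits to the in-memory namespace (elided) ...
        if (role != Role.OBSERVER) {
            updateCountForQuota(); // expensive full-namespace recount
        }
    }

    private void updateCountForQuota() { quotaRecomputations++; }
}
```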
[jira] [Created] (HDFS-12828) OIV ReverseXML Processor Fails With Escaped Characters
Erik Krogen created HDFS-12828: -- Summary: OIV ReverseXML Processor Fails With Escaped Characters Key: HDFS-12828 URL: https://issues.apache.org/jira/browse/HDFS-12828 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 2.8.0 Reporter: Erik Krogen The HDFS OIV ReverseXML processor fails if the XML file contains escaped characters:
{code}
ekrogen at ekrogen-ld1 in ~/dev/hadoop/trunk/hadoop-dist/target/hadoop-3.0.0-beta1-SNAPSHOT on trunk!
± $HADOOP_HOME/bin/hdfs dfs -fs hdfs://localhost:9000/ -ls /
Found 4 items
drwxr-xr-x - ekrogen supergroup 0 2017-11-16 14:48 /foo
drwxr-xr-x - ekrogen supergroup 0 2017-11-16 14:49 /foo"
drwxr-xr-x - ekrogen supergroup 0 2017-11-16 14:50 /foo`
drwxr-xr-x - ekrogen supergroup 0 2017-11-16 14:49 /foo&
{code}
Then after doing {{saveNamespace}} on that NameNode...
{code}
ekrogen at ekrogen-ld1 in ~/dev/hadoop/trunk/hadoop-dist/target/hadoop-3.0.0-beta1-SNAPSHOT on trunk!
± $HADOOP_HOME/bin/hdfs oiv -i /tmp/hadoop-ekrogen/dfs/name/current/fsimage_008 -o /tmp/hadoop-ekrogen/dfs/name/current/fsimage_008.xml -p XML
ekrogen at ekrogen-ld1 in ~/dev/hadoop/trunk/hadoop-dist/target/hadoop-3.0.0-beta1-SNAPSHOT on trunk!
± $HADOOP_HOME/bin/hdfs oiv -i /tmp/hadoop-ekrogen/dfs/name/current/fsimage_008.xml -o /tmp/hadoop-ekrogen/dfs/name/current/fsimage_008.xml.rev -p ReverseXML
OfflineImageReconstructor failed: unterminated entity ref starting with &
org.apache.hadoop.hdfs.util.XMLUtils$UnmanglingError: unterminated entity ref starting with &
	at org.apache.hadoop.hdfs.util.XMLUtils.unmangleXmlString(XMLUtils.java:232)
	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.loadNodeChildrenHelper(OfflineImageReconstructor.java:383)
	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.loadNodeChildrenHelper(OfflineImageReconstructor.java:379)
	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.loadNodeChildren(OfflineImageReconstructor.java:418)
	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.access$1000(OfflineImageReconstructor.java:95)
	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor$INodeSectionProcessor.process(OfflineImageReconstructor.java:524)
	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.processXml(OfflineImageReconstructor.java:1710)
	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.run(OfflineImageReconstructor.java:1765)
	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:191)
	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:134)
{code}
See attachments for relevant fsimage XML file. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
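The round-trip that breaks here is the standard XML entity escape/unescape pair for path names. A self-contained sketch of what a correct round-trip looks like (illustrative only; Hadoop's actual handling lives in {{XMLUtils}}):

```java
/**
 * Path names containing XML-significant characters must be escaped on the
 * way out (XML processor) and unescaped on the way back (ReverseXML).
 * Hypothetical helper, not the Hadoop implementation.
 */
class PathXmlEscaper {
    static String escape(String path) {
        StringBuilder sb = new StringBuilder();
        for (char c : path.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    static String unescape(String xml) {
        // &amp; must be handled last so "&amp;lt;" round-trips to "&lt;"
        return xml.replace("&quot;", "\"").replace("&lt;", "<")
                  .replace("&gt;", ">").replace("&amp;", "&");
    }
}
```

The bug manifests exactly when {{unescape(escape(path))}} fails to equal the original path (or the parser rejects a bare `&`).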
[jira] [Created] (HDFS-12823) Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to branch-2.7
Erik Krogen created HDFS-12823: -- Summary: Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to branch-2.7 Key: HDFS-12823 URL: https://issues.apache.org/jira/browse/HDFS-12823 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs, hdfs-client Reporter: Erik Krogen Assignee: Erik Krogen Given the pretty significant performance implications of HDFS-9259 (see discussion in HDFS-10326) when doing transfers across high latency links, it would be helpful to have this configurability exist in the 2.7 series. Opening a new JIRA since the original HDFS-9259 has been closed for a while and there are conflicts due to a few classes moving. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12818) Support multiple storages in DataNodeCluster / SimulatedFSDataset
Erik Krogen created HDFS-12818: -- Summary: Support multiple storages in DataNodeCluster / SimulatedFSDataset Key: HDFS-12818 URL: https://issues.apache.org/jira/browse/HDFS-12818 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, test Reporter: Erik Krogen Assignee: Erik Krogen Priority: Minor Currently {{SimulatedFSDataset}} (and thus, {{DataNodeCluster}} with {{-simulated}}) only supports a single storage per {{DataNode}}. Given that the number of storages can have important implications on the performance of block report processing, it would be useful for these classes to support a multiple storage configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-12803) We should not lock FsNamesystem even we operate a sub directory, we should refinement the lock
[ https://issues.apache.org/jira/browse/HDFS-12803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-12803. Resolution: Duplicate This is a longstanding request tracked in HDFS-5453 > We should not lock FsNamesystem even we operate a sub directory, we should > refinement the lock > -- > > Key: HDFS-12803 > URL: https://issues.apache.org/jira/browse/HDFS-12803 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.1, 3.0.0-alpha3 >Reporter: maobaolong > > An example: > If a client is doing mkdir or delete a file, other client will wait for the > FSNamesystem's lock to do some operation. > I think we have to refinement the lock. we can lock the parent inode only. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12746) DataNode Audit Logger
Erik Krogen created HDFS-12746: -- Summary: DataNode Audit Logger Key: HDFS-12746 URL: https://issues.apache.org/jira/browse/HDFS-12746 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, logging Reporter: Erik Krogen I would like to discuss adding in an audit logger for the Datanodes. We have audit logging on pretty much all other components: Namenode, ResourceManager, NodeManager. It seems the DN should have a similar concept to log, at minimum, all block reads/writes. I think all of the interesting information does already appear in the DN logs at INFO level but it would be nice to have a specific audit class that this gets logged through, a la {{RMAuditLogger}} and {{NMAuditLogger}}, to enable special handling. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-11707) TestDirectoryScanner#testThrottling fails on OSX
[ https://issues.apache.org/jira/browse/HDFS-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-11707. Resolution: Duplicate > TestDirectoryScanner#testThrottling fails on OSX > > > Key: HDFS-11707 > URL: https://issues.apache.org/jira/browse/HDFS-11707 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 2.8.0 >Reporter: Erik Krogen >Priority: Minor > > In branch-2 and trunk, {{TestDirectoryScanner#testThrottling}} consistently > fails on OS X (I'm running 10.11 specifically) with: > {code} > java.lang.AssertionError: Throttle is too permissive > {code} > It seems to work alright on Unix systems. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12533) NNThroughputBenchmark threads get stuck on UGI.getCurrentUser()
Erik Krogen created HDFS-12533: -- Summary: NNThroughputBenchmark threads get stuck on UGI.getCurrentUser() Key: HDFS-12533 URL: https://issues.apache.org/jira/browse/HDFS-12533 Project: Hadoop HDFS Issue Type: Improvement Reporter: Erik Krogen In {{NameNode#getRemoteUser()}}, it first attempts to fetch from the RPC user (not a synchronized operation), and if there is no RPC call, it will call {{UserGroupInformation#getCurrentUser()}} (which is {{synchronized}}). This makes it efficient for RPC operations (the bulk) so that there is not too much contention. In NNThroughputBenchmark, however, there is no RPC call since we bypass that layer, so with a high thread count many of the threads get stuck. At one point I attached a profiler and found that quite a few threads had been waiting on {{#getCurrentUser()}} for 2 minutes (!). Removing this call yielded a noticeable improvement in the throughput numbers I was seeing. To more closely emulate a real NN, we should address this issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
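The contention pattern, and one way to sidestep it, can be shown in miniature: a {{synchronized}} user lookup becomes a bottleneck under many threads, while a per-thread cache pays the synchronized cost only once per thread. Hypothetical names; this is not the UGI implementation:

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of the benchmark bottleneck described above. getCurrentUserSlow()
 * stands in for the synchronized UserGroupInformation.getCurrentUser();
 * the ThreadLocal cache means each thread takes the lock exactly once.
 */
class UserLookup {
    static final AtomicInteger slowCalls = new AtomicInteger();

    static synchronized String getCurrentUserSlow() {
        slowCalls.incrementAndGet();
        return "benchmark-user";
    }

    // Per-thread cache of the (stable) current user.
    private static final ThreadLocal<String> CACHED =
        ThreadLocal.withInitial(UserLookup::getCurrentUserSlow);

    static String getCurrentUserCached() { return CACHED.get(); }
}
```

Whether caching is safe depends on the user being stable for the thread's lifetime, which holds in the benchmark setting.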
[jira] [Created] (HDFS-12421) Balancer to emit standard metrics
Erik Krogen created HDFS-12421: -- Summary: Balancer to emit standard metrics Key: HDFS-12421 URL: https://issues.apache.org/jira/browse/HDFS-12421 Project: Hadoop HDFS Issue Type: Improvement Components: balancer & mover Reporter: Erik Krogen Assignee: Erik Krogen Priority: Minor The Balancer currently prints some statistics about its operation to stdout while it is running. This is fine if the balancer is manually run via CLI by an operator, but for the more common case of it being a scheduled execution, it is cumbersome to have to track down the logs to be able to monitor its progress. We already have a standard metrics system in place; I propose that we have the Balancer emit metrics while it is running so that they can be tracked via standard metrics infrastructure. We can start with just the things that the balancer already prints to stdout: bytes already moved, bytes left to move, bytes currently being moved, and iteration number. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-12131) Add some of the FSNamesystem JMX values as metrics
[ https://issues.apache.org/jira/browse/HDFS-12131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen reopened HDFS-12131: > Add some of the FSNamesystem JMX values as metrics > -- > > Key: HDFS-12131 > URL: https://issues.apache.org/jira/browse/HDFS-12131 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Minor > Fix For: 2.9.0, 3.0.0-beta1, 2.8.3 > > Attachments: HDFS-12131.000.patch, HDFS-12131.001.patch, > HDFS-12131.002.patch, HDFS-12131.002.patch, HDFS-12131.003.patch, > HDFS-12131.004.patch, HDFS-12131.005.patch, HDFS-12131.006.patch, > HDFS-12131-branch-2.006.patch, HDFS-12131-branch-2.8.006.patch > > > A number of useful numbers are emitted via the FSNamesystem JMX, but not > through the metrics system. These would be useful to be able to track over > time, e.g. to alert on via standard metrics systems or to view trends and > rate changes: > * NumLiveDataNodes > * NumDeadDataNodes > * NumDecomLiveDataNodes > * NumDecomDeadDataNodes > * NumDecommissioningDataNodes > * NumStaleStorages > * VolumeFailuresTotal > * EstimatedCapacityLostTotal > * NumInMaintenanceLiveDataNodes > * NumInMaintenanceDeadDataNodes > * NumEnteringMaintenanceDataNodes > This is a simple change that just requires annotating the JMX methods with > {{@Metric}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12342) Differentiate webhdfs vs. swebhdfs calls in audit log
Erik Krogen created HDFS-12342: -- Summary: Differentiate webhdfs vs. swebhdfs calls in audit log Key: HDFS-12342 URL: https://issues.apache.org/jira/browse/HDFS-12342 Project: Hadoop HDFS Issue Type: Improvement Components: logging Reporter: Erik Krogen Assignee: Erik Krogen Currently the audit log only logs {{webhdfs}} vs {{rpc}} as the {{proto}}. It is useful to be able to audit whether certain commands were carried out via webhdfs or swebhdfs as this has different security and potentially performance implications. We have been running this internally for a while and have found it useful for looking at usage patterns. Proposal is just to continue logging {{webhdfs}} as the proto for {{http}} WebHDFS commands, but log {{swebhdfs}} for SWebHDFS (over {{https}}). This will be incompatible. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect
Erik Krogen created HDFS-12219: -- Summary: Javadoc for FSNamesystem#getMaxObjects is incorrect Key: HDFS-12219 URL: https://issues.apache.org/jira/browse/HDFS-12219 Project: Hadoop HDFS Issue Type: Bug Reporter: Erik Krogen Assignee: Erik Krogen Priority: Trivial The Javadoc states that this represents the total number of objects in the system, but it really represents the maximum allowed number of objects (as correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12160) Fix broken NameNode metrics documentation
Erik Krogen created HDFS-12160: -- Summary: Fix broken NameNode metrics documentation Key: HDFS-12160 URL: https://issues.apache.org/jira/browse/HDFS-12160 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0-alpha4, 2.8.0 Reporter: Erik Krogen Assignee: Erik Krogen Priority: Trivial HDFS-11261 introduced documentation for the metrics added in HDFS-10872. The metrics have a pipe ({{|}}) in them which breaks the markdown table. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12131) Add some of the FSNamesystem JMX values as metrics
Erik Krogen created HDFS-12131: -- Summary: Add some of the FSNamesystem JMX values as metrics Key: HDFS-12131 URL: https://issues.apache.org/jira/browse/HDFS-12131 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs, namenode Reporter: Erik Krogen Assignee: Erik Krogen Priority: Minor A number of useful numbers are emitted via the FSNamesystem JMX, but not through the metrics system. These would be useful to be able to track over time, e.g. to alert on via standard metrics systems or to view trends and rate changes: * NumLiveDataNodes * NumDeadDataNodes * NumDecomLiveDataNodes * NumDecomDeadDataNodes * NumDecommissioningDataNodes * NumStaleStorages This is a simple change that just requires annotating the JMX methods with {{@Metric}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-12004) Namenode UI continues to list DNs that have been removed from include and exclude
[ https://issues.apache.org/jira/browse/HDFS-12004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen resolved HDFS-12004. Resolution: Duplicate > Namenode UI continues to list DNs that have been removed from include and > exclude > - > > Key: HDFS-12004 > URL: https://issues.apache.org/jira/browse/HDFS-12004 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Erik Krogen >Priority: Minor > > Initially in HDFS, after a DN was decommissioned and subsequently removed from > the exclude file (thus removing all references to it), it would still appear > in the NN UI as a "dead" node until the NN was restarted. In HDFS-1773, > this was discussed, and it was decided that the web UI should not > show these nodes. However, when HDFS-5334 went through and the NN web UI was > reimplemented client-side, the behavior reverted back to pre-HDFS-1773, and > dead+decommissioned nodes once again showed in the dead list. This can be > operationally confusing for the same reasons as discussed in HDFS-1773. > I would like to open this discussion to determine if the regression was > intentional or if we should carry forward the logic implemented in HDFS-1773 > into the new UI. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12004) Namenode UI continues to list DNs that have been removed from include and exclude
Erik Krogen created HDFS-12004: -- Summary: Namenode UI continues to list DNs that have been removed from include and exclude Key: HDFS-12004 URL: https://issues.apache.org/jira/browse/HDFS-12004 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Erik Krogen Priority: Minor Initially in HDFS, after a DN was decommissioned and subsequently removed from the exclude file (thus removing all references to it), it would still appear in the NN UI as a "dead" node until the NN was restarted. In HDFS-1773, this was discussed, and it was decided that the web UI should not show these nodes. However, when HDFS-5334 went through and the NN web UI was reimplemented client-side, the behavior reverted back to pre-HDFS-1773, and dead+decommissioned nodes once again showed in the dead list. This can be operationally confusing for the same reasons as discussed in HDFS-1773. I would like to open this discussion to determine if the regression was intentional or if we should carry forward the logic implemented in HDFS-1773 into the new UI. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11717) Add unit test for HDFS-11709
Erik Krogen created HDFS-11717: -- Summary: Add unit test for HDFS-11709 Key: HDFS-11717 URL: https://issues.apache.org/jira/browse/HDFS-11717 Project: Hadoop HDFS Issue Type: Task Components: ha, namenode Affects Versions: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.1 Reporter: Erik Krogen Assignee: Erik Krogen Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11707) TestDirectoryScanner#testThrottling fails on OSX
Erik Krogen created HDFS-11707: -- Summary: TestDirectoryScanner#testThrottling fails on OSX Key: HDFS-11707 URL: https://issues.apache.org/jira/browse/HDFS-11707 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.8.0 Reporter: Erik Krogen Priority: Minor In branch-2 and trunk, {{TestDirectoryScanner#testThrottling}} consistently fails on OS X (I'm running 10.11 specifically) with: {code} java.lang.AssertionError: Throttle is too permissive {code} It seems to work alright on Unix systems. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11615) FSNamesystemLock metrics can be inaccurate due to millisecond precision
Erik Krogen created HDFS-11615: -- Summary: FSNamesystemLock metrics can be inaccurate due to millisecond precision Key: HDFS-11615 URL: https://issues.apache.org/jira/browse/HDFS-11615 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 2.7.4 Reporter: Erik Krogen Assignee: Erik Krogen Currently the {{FSNamesystemLock}} metrics created in HDFS-10872 track the lock hold time using {{Timer.monotonicNow()}}, which has millisecond-level precision. However, many of these operations hold the lock for less than a millisecond, making these metrics inaccurate. We should instead use {{System.nanoTime()}} for higher accuracy. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
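The precision gap is easy to demonstrate standalone: timing a sub-millisecond critical section with a millisecond clock typically reports 0 ms held, while {{System.nanoTime()}} still resolves it. This is an illustration of the problem, not the FSNamesystemLock code:

```java
/**
 * Times a Runnable with each of the two clocks. For work that completes in
 * well under a millisecond, timeMillis() usually returns 0 while
 * timeNanos() returns a meaningful positive duration.
 */
class LockTiming {
    static volatile long sink; // prevents the JIT from eliding the workload

    static long timeMillis(Runnable r) {
        long start = System.currentTimeMillis();
        r.run();
        return System.currentTimeMillis() - start;
    }

    static long timeNanos(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return System.nanoTime() - start;
    }
}
```

Note that {{nanoTime()}} is also the right choice for interval measurement because it is monotonic, whereas {{currentTimeMillis()}} can jump with wall-clock adjustments.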
[jira] [Created] (HDFS-11352) Potential deadlock in NN when failing over
Erik Krogen created HDFS-11352: -- Summary: Potential deadlock in NN when failing over Key: HDFS-11352 URL: https://issues.apache.org/jira/browse/HDFS-11352 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.4, 2.6.6 Reporter: Erik Krogen Assignee: Erik Krogen HDFS-11180 fixed a general class of deadlock that can occur when failing over between the MetricsSystemImpl and FSEditLog (see comments on that JIRA for more details). In trunk and branch-2/branch-2.8 this fix was successful by making the metrics calls not synchronize on FSEditLog. In branch-2.6 and branch-2.7 there is one more method, {{FSNamesystem#getTransactionsSinceLastCheckpoint}}, which still requires the lock on FSEditLog and thus can result in the same deadlock scenario. This can be seen by running {{TestFSNamesystemMBean#testWithFSEditLogLock}} _with the patch in HDFS-11290_ on either of these branches (it fails currently). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11208) Deadlock in WebHDFS on shutdown
Erik Krogen created HDFS-11208: -- Summary: Deadlock in WebHDFS on shutdown Key: HDFS-11208 URL: https://issues.apache.org/jira/browse/HDFS-11208 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0-alpha1, 2.6.5, 2.7.3, 2.8.0 Reporter: Erik Krogen Assignee: Erik Krogen Currently on the client side if the {{DelegationTokenRenewer}} attempts to renew a WebHdfs delegation token while the client system is shutting down (i.e. {{FileSystem.Cache.ClientFinalizer}} is running) a deadlock may occur. This happens because {{ClientFinalizer}} calls {{FileSystem.Cache.closeAll()}} which first takes a lock on the {{FileSystem.Cache}} object and then locks each file system in the cache as it iterates over them. {{DelegationTokenRenewer}} takes a lock on a filesystem object while it is renewing that filesystem's token, but within {{TokenAspect.TokenManager.renew()}} (used for renewal of WebHdfs tokens) {{FileSystem.get}} is called, which in turn takes a lock on the FileSystem cache object, potentially causing deadlock if {{ClientFinalizer}} is currently running. 
See below for example deadlock output:
{code}
Found one Java-level deadlock:
==============================
"Thread-8572":
  waiting to lock monitor 0x7eff401f9878 (object 0x00051ec3f930, a dali.hdfs.web.WebHdfsFileSystem),
  which is held by "FileSystem-DelegationTokenRenewer"
"FileSystem-DelegationTokenRenewer":
  waiting to lock monitor 0x7f005c08f5c8 (object 0x00050389c8b8, a dali.fs.FileSystem$Cache),
  which is held by "Thread-8572"

Java stack information for the threads listed above:
===================================================
"Thread-8572":
	at dali.hdfs.web.WebHdfsFileSystem.close(WebHdfsFileSystem.java:864)
	- waiting to lock <0x00051ec3f930> (a dali.hdfs.web.WebHdfsFileSystem)
	at dali.fs.FilterFileSystem.close(FilterFileSystem.java:449)
	at dali.fs.FileSystem$Cache.closeAll(FileSystem.java:2407)
	- locked <0x00050389c8b8> (a dali.fs.FileSystem$Cache)
	at dali.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2424)
	- locked <0x00050389c8d0> (a dali.fs.FileSystem$Cache$ClientFinalizer)
	at dali.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
"FileSystem-DelegationTokenRenewer":
	at dali.fs.FileSystem$Cache.getInternal(FileSystem.java:2343)
	- waiting to lock <0x00050389c8b8> (a dali.fs.FileSystem$Cache)
	at dali.fs.FileSystem$Cache.get(FileSystem.java:2332)
	at dali.fs.FileSystem.get(FileSystem.java:369)
	at dali.hdfs.web.TokenAspect$TokenManager.getInstance(TokenAspect.java:92)
	at dali.hdfs.web.TokenAspect$TokenManager.renew(TokenAspect.java:72)
	at dali.security.token.Token.renew(Token.java:373)
	at dali.fs.DelegationTokenRenewer$RenewAction.renew(DelegationTokenRenewer.java:127)
	- locked <0x00051ec3f930> (a dali.hdfs.web.WebHdfsFileSystem)
	at dali.fs.DelegationTokenRenewer$RenewAction.access$300(DelegationTokenRenewer.java:57)
	at dali.fs.DelegationTokenRenewer.run(DelegationTokenRenewer.java:258)

Found 1 deadlock.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
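The cycle in that trace is the classic inverted-lock-order shape: one thread takes the cache lock then a filesystem lock, the other takes a filesystem lock then the cache lock. One general remedy is to make both paths acquire the two locks in a single global order. A minimal sketch with hypothetical stand-in locks (not the actual FileSystem/Cache classes, and not necessarily the fix applied in this JIRA):

```java
/**
 * Both code paths below acquire CACHE_LOCK before FS_LOCK, so no cycle of
 * lock dependencies can form. In the deadlocked version, renewToken()
 * would have taken FS_LOCK first.
 */
class LockOrdering {
    static final Object CACHE_LOCK = new Object();
    static final Object FS_LOCK = new Object();

    static int closeAll() {          // shutdown hook path
        synchronized (CACHE_LOCK) {
            synchronized (FS_LOCK) {
                return 1;            // close each cached filesystem
            }
        }
    }

    static int renewToken() {        // token renewer path
        synchronized (CACHE_LOCK) {  // look up the FS cache first...
            synchronized (FS_LOCK) { // ...then lock the filesystem to renew
                return 2;
            }
        }
    }
}
```

The alternative remedy, also common, is to avoid holding either lock while calling into code that may take the other.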
[jira] [Created] (HDFS-11021) Add FSNamesystemLock metrics for BlockManager operations
Erik Krogen created HDFS-11021: -- Summary: Add FSNamesystemLock metrics for BlockManager operations Key: HDFS-11021 URL: https://issues.apache.org/jira/browse/HDFS-11021 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Erik Krogen Assignee: Erik Krogen Right now the operations which the {{BlockManager}} issues to the {{Namesystem}} will not emit metrics about which operation caused the {{FSNamesystemLock}} to be held; they are all grouped under "OTHER". We should fix this since the {{BlockManager}} creates many acquisitions of both the read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
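The per-operation accounting this asks for can be sketched as a concurrent map from operation name to accumulated hold time, with unknown callers falling into "OTHER". Illustrative names only, not the FSNamesystemLock implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical per-operation lock-hold metrics. BlockManager call sites
 * would pass an operation name instead of defaulting to "OTHER", making
 * their (frequent) read/write lock acquisitions attributable.
 */
class LockMetrics {
    private final Map<String, LongAdder> holdNanosByOp = new ConcurrentHashMap<>();

    void recordHold(String opName, long nanos) {
        holdNanosByOp.computeIfAbsent(opName == null ? "OTHER" : opName,
            k -> new LongAdder()).add(nanos);
    }

    long totalFor(String opName) {
        LongAdder a = holdNanosByOp.get(opName);
        return a == null ? 0 : a.sum();
    }
}
```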
[jira] [Created] (HDFS-10896) Move lock logging logic from FSNamesystem into FSNamesystemLock
Erik Krogen created HDFS-10896: -- Summary: Move lock logging logic from FSNamesystem into FSNamesystemLock Key: HDFS-10896 URL: https://issues.apache.org/jira/browse/HDFS-10896 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Erik Krogen Assignee: Erik Krogen There are a number of tickets (HDFS-10742, HDFS-10817, HDFS-10713, this subtask's story HDFS-10475) which are adding/improving logging/metrics around the {{FSNamesystemLock}}. All of this is done in {{FSNamesystem}} right now, which is polluting the namesystem with ThreadLocal variables, timing counters, etc. which are only relevant to the lock itself and the number of these increases as the logging/metrics become more sophisticated. It would be best to move these all into FSNamesystemLock to keep the metrics/logging tied directly to the item of interest. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org