[jira] [Created] (HDFS-15404) ShellCommandFencer should expose info about source
Chen Liang created HDFS-15404: - Summary: ShellCommandFencer should expose info about source Key: HDFS-15404 URL: https://issues.apache.org/jira/browse/HDFS-15404 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Assignee: Chen Liang Currently the HA fencing logic in ShellCommandFencer exposes environment variables about only the fencing target, i.e. the $target_* variables mentioned in this [document page|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html]. Only the fencing target variables are exposed; sometimes it is also useful to expose info about the fencing source node. One use case is that it would allow the source and target nodes to identify themselves separately and run different commands/scripts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
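A minimal sketch of what exposing source-side information could look like, mirroring how the existing $target_* variables reach the fencing script's environment; the source_* variable names and the helper below are hypothetical placeholders, not the actual ShellCommandFencer code.

{code:java}
import java.util.Map;

// Hypothetical sketch: inject source_* variables into the fencing script's
// environment alongside the existing target_* ones. Variable names and the
// addSourceInfo helper are illustrative only.
public class FencerEnvSketch {
  static void addSourceInfo(Map<String, String> env, String sourceHost,
                            int sourcePort, String sourceNameServiceId) {
    env.put("source_host", sourceHost);
    env.put("source_port", String.valueOf(sourcePort));
    env.put("source_nameserviceid", sourceNameServiceId);
  }

  public static void main(String[] args) {
    ProcessBuilder builder = new ProcessBuilder("/bin/true");
    addSourceInfo(builder.environment(), "nn1.example.com", 8020, "ns1");
    System.out.println(builder.environment().get("source_host"));
  }
}
{code}

A fencing script could then branch on whether its own hostname matches $source_host or $target_host and run different commands for each role.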
[jira] [Created] (HDFS-15293) Relax FSImage upload time delta check restriction
Chen Liang created HDFS-15293: - Summary: Relax FSImage upload time delta check restriction Key: HDFS-15293 URL: https://issues.apache.org/jira/browse/HDFS-15293 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Chen Liang Assignee: Chen Liang HDFS-12979 introduced the logic that, if the ANN sees a consecutive fsImage upload from a Standby with only a small delta compared to the previous fsImage, the ANN would reject this image. This is to avoid overly frequent fsImage uploads when there are multiple Standby nodes. However this check could be too stringent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
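A rough sketch of the kind of delta check being discussed, under the assumption that the ANN compares both the time and the transaction deltas since the last accepted image; the field and threshold names are placeholders, not the actual upload-check code.

{code:java}
// Illustrative only: reject an uploaded fsImage when both deltas since the
// last accepted image are small. Relaxing the restriction could mean lowering
// the factor, or requiring only one of the two conditions.
class ImageUploadCheckSketch {
  long lastUploadTimeMs;
  long lastUploadTxId;

  boolean shouldRejectUpload(long nowMs, long imageTxId,
                             long checkpointPeriodMs, long checkpointTxns,
                             double relaxFactor) {
    long timeDelta = nowMs - lastUploadTimeMs;
    long txnDelta = imageTxId - lastUploadTxId;
    return timeDelta < checkpointPeriodMs * relaxFactor
        && txnDelta < checkpointTxns * relaxFactor;
  }
}
{code}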
[jira] [Created] (HDFS-15197) Change ObserverRetryOnActiveException log to debug
Chen Liang created HDFS-15197: - Summary: Change ObserverRetryOnActiveException log to debug Key: HDFS-15197 URL: https://issues.apache.org/jira/browse/HDFS-15197 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Reporter: Chen Liang Assignee: Chen Liang Currently in ObserverReadProxyProvider, when an ObserverRetryOnActiveException happens, ObserverReadProxyProvider logs a message at INFO level. This can be a large volume of logs in some scenarios. For example, when some job tries to access lots of files that haven't been accessed for a long time, all these accesses may trigger atime updates, which lead to ObserverRetryOnActiveException. We should change this log to DEBUG. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15153) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang resolved HDFS-15153. --- Resolution: Duplicate > TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT fails > intermittently > --- > > Key: HDFS-15153 > URL: https://issues.apache.org/jira/browse/HDFS-15153 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > > The unit TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT is > failing consistently. Seems this is due to a log message change. We should > fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15153) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT fails intermittently
Chen Liang created HDFS-15153: - Summary: TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT fails intermittently Key: HDFS-15153 URL: https://issues.apache.org/jira/browse/HDFS-15153 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chen Liang Assignee: Chen Liang The unit test TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT is failing consistently. It seems this is due to a log message change. We should fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-15148) dfs.namenode.send.qop.enabled should not apply to primary NN port
Chen Liang created HDFS-15148: - Summary: dfs.namenode.send.qop.enabled should not apply to primary NN port Key: HDFS-15148 URL: https://issues.apache.org/jira/browse/HDFS-15148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.10.1, 3.3.1 Reporter: Chen Liang Assignee: Chen Liang In HDFS-13617, the NameNode can be configured to wrap its established QOP into the block access token as an encrypted message. Later on the DataNode will use this message to create the SASL connection. But this new behavior should only apply to the new auxiliary NameNode ports, not the primary port (the one configured in fs.defaultFS), as it may conflict with other existing SASL related configuration (e.g. dfs.data.transfer.protection). Since this configuration is introduced for auxiliary ports only, we should restrict this new behavior so it does not apply to the primary port. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
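A minimal sketch of the proposed restriction, assuming the NameNode can tell the ingress port of the current call apart from its primary RPC port; the helper and the way those ports are obtained are hypothetical.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: wrap the established QOP into the block access token for
// auxiliary ports, never for the primary port from fs.defaultFS.
class QopWrapCheckSketch {
  static boolean shouldWrapQop(Configuration conf, int ingressPort, int primaryRpcPort) {
    boolean sendQop = conf.getBoolean("dfs.namenode.send.qop.enabled", false);
    return sendQop && ingressPort != primaryRpcPort;
  }
}
{code}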
[jira] [Created] (HDFS-14991) Backport HDFS-14346 Better time precision in getTimeDuration to branch-2
Chen Liang created HDFS-14991: - Summary: Backport HDFS-14346 Better time precision in getTimeDuration to branch-2 Key: HDFS-14991 URL: https://issues.apache.org/jira/browse/HDFS-14991 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Reporter: Chen Liang Assignee: Chen Liang This is to backport HDFS-14346 to branch-2, as Standby reads in branch-2 require being able to properly specify millisecond time granularity for edit log tailing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14941) Potential editlog race condition can cause corrupted file
Chen Liang created HDFS-14941: - Summary: Potential editlog race condition can cause corrupted file Key: HDFS-14941 URL: https://issues.apache.org/jira/browse/HDFS-14941 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Chen Liang Recently we encountered an issue that, after a failover, the NameNode complains about corrupted files/missing blocks. The blocks did recover after full block reports, so the blocks are not actually missing. After further investigation, we believe this is what happened: First of all, on the SbN, it is possible that it receives block reports before the corresponding edit tailing has happened, in which case the SbN postpones processing the DN block report, handled by the guarding logic below: {code:java} if (shouldPostponeBlocksFromFuture && namesystem.isGenStampInFuture(iblk)) { queueReportedBlock(storageInfo, iblk, reportedState, QUEUE_REASON_FUTURE_GENSTAMP); continue; } {code} Basically, if a reported block has a future generation stamp, the DN report gets requeued. However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code: {code:java} // allocate new block, record block locations in INode. newBlock = createNewBlock(); INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile); saveAllocatedBlock(src, inodesInPath, newBlock, targets); persistNewBlock(src, pendingFile); offset = pendingFile.computeFileSize(); {code} The line {{newBlock = createNewBlock();}} would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump the generation stamp on the Standby, while the following line {{persistNewBlock(src, pendingFile);}} would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on the Standby. Then the race condition is that, imagine the Standby has just processed {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to be in different segments). Now a block report with the new generation stamp comes in. Since the genstamp bump has already been processed, the reported block may not be considered a future block, so the guarding logic passes. But actually, the block hasn't been added to the block map, because the second edit is yet to be tailed. So the block then gets added to the invalidate block list and we saw messages like: {code:java} BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file {code} Even worse, since this IBR is effectively lost, the NameNode has no information about this block until the next full block report. So after a failover, the NN marks it as corrupt. This issue won't happen though, if both of the edit entries get tailed together, so no IBR processing can happen in between. But in our case, we set the edit tailing interval to a very low value (to allow Standby reads), so under high workload there is a much higher chance that the two entries are tailed separately, causing the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-12979) StandbyNode should upload FsImage to ObserverNode after checkpointing.
[ https://issues.apache.org/jira/browse/HDFS-12979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang reopened HDFS-12979: --- Thanks for the catch [~shv]. I've committed to branch-3.2 and branch-3.1 as there were only some import differences. But the branch-2 patch is quite different, so re-opening to post the patch for a Jenkins run. > StandbyNode should upload FsImage to ObserverNode after checkpointing. > -- > > Key: HDFS-12979 > URL: https://issues.apache.org/jira/browse/HDFS-12979 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-12979-branch-2.001.patch, HDFS-12979.001.patch, > HDFS-12979.002.patch, HDFS-12979.003.patch, HDFS-12979.004.patch, > HDFS-12979.005.patch, HDFS-12979.006.patch, HDFS-12979.007.patch, > HDFS-12979.008.patch, HDFS-12979.009.patch, HDFS-12979.010.patch, > HDFS-12979.011.patch, HDFS-12979.012.patch, HDFS-12979.013.patch, > HDFS-12979.014.patch, HDFS-12979.015.patch > > > ObserverNode does not create checkpoints. So it's fsimage file can get very > old making bootstrap of ObserverNode too long. A StandbyNode should copy > latest fsimage to ObserverNode(s) along with ANN. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14858) [SBN read] Allow configurably enable/disable AlignmentContext on NameNode
Chen Liang created HDFS-14858: - Summary: [SBN read] Allow configurably enable/disable AlignmentContext on NameNode Key: HDFS-14858 URL: https://issues.apache.org/jira/browse/HDFS-14858 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs Reporter: Chen Liang Assignee: Chen Liang As brought up under HDFS-14277, we should make sure SBN read has no performance impact when it is not enabled. One potential overhead of SBN read is maintaining and updating additional state status on the NameNode. Specifically, this is done by creating/updating/checking a {{GlobalStateIdContext}} instance. Currently, even without enabling SBN read, this logic is still executed. We can make this configurable so that when SBN read is not enabled, there is no such overhead and everything works as-is. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
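A sketch of the proposed gating, with an assumed config key name: create the GlobalStateIdContext (and pay its per-call bookkeeping cost) only when the flag is enabled, so clusters without SBN reads keep today's behavior.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only; the key name is illustrative and may differ from the patch.
class StateContextGateSketch {
  static boolean stateContextEnabled(Configuration conf) {
    return conf.getBoolean("dfs.namenode.state.context.enabled", false);
  }
  // When this returns false, the RPC server would be started without a
  // GlobalStateIdContext, skipping state-id maintenance entirely.
}
{code}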
[jira] [Created] (HDFS-14822) Revisit GlobalStateIdContext locking when getting server state id
Chen Liang created HDFS-14822: - Summary: Revisit GlobalStateIdContext locking when getting server state id Key: HDFS-14822 URL: https://issues.apache.org/jira/browse/HDFS-14822 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs Reporter: Chen Liang Assignee: Chen Liang As mentioned under HDFS-14277, one potential performance issue of Observer read is that {{GlobalStateIdContext#getLastSeenStateId}} calls getCorrectLastAppliedOrWrittenTxId, which ends up acquiring a lock on the txn id. We internally had some discussion and analysis, and we believe this lock can be avoided by calling the non-locking version, {{getLastAppliedOrWrittenTxId}}. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
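A minimal sketch of the change being proposed, using a stand-in interface since the real FSEditLog/FSNamesystem types are not reproduced here.

{code:java}
// Stand-in for the object that exposes both txn-id accessors.
interface TxIdSource {
  long getCorrectLastAppliedOrWrittenTxId(); // locking variant (current behavior)
  long getLastAppliedOrWrittenTxId();        // non-locking variant (proposed)
}

class GetLastSeenStateIdSketch {
  static long getLastSeenStateId(TxIdSource txIds) {
    // Proposed: use the non-locking read; a slightly stale value is acceptable
    // for telling an Observer how far it needs to catch up.
    return txIds.getLastAppliedOrWrittenTxId();
  }
}
{code}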
[jira] [Created] (HDFS-14806) Bootstrap standby may fail if used in-progress tailing
Chen Liang created HDFS-14806: - Summary: Bootstrap standby may fail if used in-progress tailing Key: HDFS-14806 URL: https://issues.apache.org/jira/browse/HDFS-14806 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.3.0 Reporter: Chen Liang Assignee: Chen Liang One issue we ran across was that if in-progress tailing is enabled, bootstrap standby could fail. When in-progress tailing is enabled, bootstrap uses the RPC mechanism to get edits. There is a config {{dfs.ha.tail-edits.qjm.rpc.max-txns}} that sets an upper bound on how many transactions can be included in one RPC call. The default is 5000, meaning the bootstrapping NN (say NN1) can only pull at most 5000 edits from the JNs. However, as part of bootstrap, NN1 queries another NN (say NN2) for NN2's current transaction ID, and NN2 may return a state that is more than 5000 transactions ahead of NN1's current image. But NN1 can only see 5000 more transactions from the JNs. At this point NN1 panics, because the txn id returned by the JNs is behind NN2's returned state, and bootstrap then fails. Increasing the value of {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to some very large value allowed bootstrap to continue. But this is hardly the ideal solution. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14785) [SBN read] Change client logging to be less aggressive
Chen Liang created HDFS-14785: - Summary: [SBN read] Change client logging to be less aggressive Key: HDFS-14785 URL: https://issues.apache.org/jira/browse/HDFS-14785 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs Affects Versions: 3.1.2, 3.2.0, 2.10.0, 3.3.0 Reporter: Chen Liang Assignee: Chen Liang Currently {{ObserverReadProxyProvider}} logs a lot of information. There are states that are acceptable, but {{ObserverReadProxyProvider}} still logs an overwhelmingly large number of messages for them. One example is that, if some NN runs an older version, the lack of the {{getHAServiceState}} method in the older-version NN will lead to an exception being printed on every single call. We can change these to debug logging. This should be minimal risk: this is only on the client side, and we can always get the messages back by changing the client-side log level to DEBUG. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14726) Fix JN incompatibility issue in branch-2 due to backport of HDFS-10519
Chen Liang created HDFS-14726: - Summary: Fix JN incompatibility issue in branch-2 due to backport of HDFS-10519 Key: HDFS-14726 URL: https://issues.apache.org/jira/browse/HDFS-14726 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 2.10.0 Reporter: Chen Liang Assignee: Chen Liang HDFS-10519 has been backported to branch-2. However HDFS-10519 introduced an incompatibility issue between the NN and JN due to the new protobuf field {{committedTxnId}} in {{HdfsServer.proto}}. This field was introduced as a required field, so if the JN and NN are not on the same version, it will run into a missing-field exception. Although currently we can work around this by making sure the JN always gets upgraded properly before the NN, we can potentially fix this incompatibility by changing the field to optional. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14611) Move handshake secret field from Token to BlockAccessToken
Chen Liang created HDFS-14611: - Summary: Move handshake secret field from Token to BlockAccessToken Key: HDFS-14611 URL: https://issues.apache.org/jira/browse/HDFS-14611 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs Reporter: Chen Liang Assignee: Chen Liang Currently the handshake secret is included in Token, but conceptually this should belong to Block Access Token only. In fact, having this field in Token could potentially break compatibility. Moreover, having this field as part of Block Access Token also means we may not need to encrypt this field anymore, because block access token is already encrypted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14573) Backport Standby Read to branch-3
Chen Liang created HDFS-14573: - Summary: Backport Standby Read to branch-3 Key: HDFS-14573 URL: https://issues.apache.org/jira/browse/HDFS-14573 Project: Hadoop HDFS Issue Type: Task Components: hdfs Reporter: Chen Liang Assignee: Chen Liang This Jira tracks backporting the consistent read from standby feature (HDFS-12943) to branch-3.x, including 3.0, 3.1, 3.2. This is required for backporting to branch-2. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14205) Backport HDFS-6440 to branch-2
Chen Liang created HDFS-14205: - Summary: Backport HDFS-6440 to branch-2 Key: HDFS-14205 URL: https://issues.apache.org/jira/browse/HDFS-14205 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. This JIRA aims to backport it to branch-2, as this is required by HDFS-12943 (consistent read from standby) backport to branch-2. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14204) Backport HDFS-12943 to branch-2
Chen Liang created HDFS-14204: - Summary: Backport HDFS-12943 to branch-2 Key: HDFS-14204 URL: https://issues.apache.org/jira/browse/HDFS-14204 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Currently, consistent read from standby feature (HDFS-12943) is only in trunk (branch-3). This JIRA aims to backport the feature to branch-2. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys
Chen Liang created HDFS-14142: - Summary: Move ipfailover config key out of HdfsClientConfigKeys Key: HDFS-14142 URL: https://issues.apache.org/jira/browse/HDFS-14142 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Running TestHdfsConfigFields throws an error complaining about the missing key dfs.client.failover.ipfailover.virtual-address. Since this config key is specific to ORFPP with IP failover only, this Jira moves this config prefix to ObserverReadProxyProviderWithIPFailover. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14120) ORFPP should also clone DT for the virtual IP
Chen Liang created HDFS-14120: - Summary: ORFPP should also clone DT for the virtual IP Key: HDFS-14120 URL: https://issues.apache.org/jira/browse/HDFS-14120 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-12943 Reporter: Chen Liang Assignee: Chen Liang Currently with HDFS-14017, ORFPP behaves similarly to ConfiguredFailoverProxyProvider in handling delegation tokens. Specifically, given the delegation token associated with the name service ID, it clones the DTs for all the corresponding physical addresses. But ORFPP requires more work than CFPP in the sense that it also leverages a VIP address for failover, meaning that in addition to cloning the DT for the physical addresses, ORFPP also needs to clone the DT for the VIP address, which was missed in HDFS-14017. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-13547) Add ingress port based sasl resolver
[ https://issues.apache.org/jira/browse/HDFS-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang resolved HDFS-13547. --- Resolution: Fixed Fix Version/s: 3.1.1 > Add ingress port based sasl resolver > > > Key: HDFS-13547 > URL: https://issues.apache.org/jira/browse/HDFS-13547 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: HDFS-13547.001.patch, HDFS-13547.002.patch, > HDFS-13547.003.patch, HDFS-13547.004.patch > > > This Jira extends the SASL properties resolver interface to take an ingress > port parameter, and also adds an implementation based on this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider
Chen Liang created HDFS-14116: - Summary: Fix a potential class cast error in ObserverReadProxyProvider Key: HDFS-14116 URL: https://issues.apache.org/jira/browse/HDFS-14116 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Chen Liang Currently in the {{ObserverReadProxyProvider}} constructor there is this line {code} ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); {code} This could potentially cause failure, because it is possible that factory cannot be cast here. Specifically, {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the constructor will be called, and there are two paths that could call into this: (1).{{NameNodeProxies.createProxy}} (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} (2) works fine because it always uses {{ClientHAProxyFactory}}, but (1) uses {{NameNodeHAProxyFactory}}, which cannot be cast to {{ClientHAProxyFactory}}; this happens when, for example, running NNThroughputBenchmark. To fix this we can at least: 1. introduce setAlignmentContext to HAProxyFactory, which is the parent of both ClientHAProxyFactory and NameNodeHAProxyFactory, OR 2. only call setAlignmentContext when it is a ClientHAProxyFactory by, say, having an if check or using reflection, depending on whether it makes sense to have an alignment context for the code paths under case (1). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
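A sketch of option 2 above, mirroring the constructor line quoted in the description; whatever mechanism is chosen, the idea is simply to skip the call when the factory is not a ClientHAProxyFactory.

{code:java}
// Sketch only: set the alignment context only when the factory really is a
// ClientHAProxyFactory, so code paths passing a NameNodeHAProxyFactory
// (e.g. NNThroughputBenchmark) no longer hit a ClassCastException.
if (factory instanceof ClientHAProxyFactory) {
  ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
}
{code}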
[jira] [Created] (HDFS-14035) NN status discovery does not leverage delegation token
Chen Liang created HDFS-14035: - Summary: NN status discovery does not leverage delegation token Key: HDFS-14035 URL: https://issues.apache.org/jira/browse/HDFS-14035 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Currently ObserverReadProxyProvider uses {{HAServiceProtocol#getServiceStatus}} to get the status of each NN. {{HAServiceProtocol}} does not leverage delegation token. So when YARN node manager makes this call, token authentication will fail, causing the application to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14017) ObserverReadProxyProviderWithIPFailover does not quite work
Chen Liang created HDFS-14017: - Summary: ObserverReadProxyProviderWithIPFailover does not quite work Key: HDFS-14017 URL: https://issues.apache.org/jira/browse/HDFS-14017 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Currently {{ObserverReadProxyProviderWithIPFailover}} extends {{ObserverReadProxyProvider}}, and the only difference is changing the proxy factory to use {{IPFailoverProxyProvider}}. However this is not enough because when calling the constructor of {{ObserverReadProxyProvider}} in super(...), the following line: {code} nameNodeProxies = getProxyAddresses(uri, HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY); {code} will try to resolve all the configured NN addresses to do configured failover. But in the case of IPFailover, this does not really apply. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14016) ObserverReadProxyProvider can never enable observer read except in tests
Chen Liang created HDFS-14016: - Summary: ObserverReadProxyProvider can never enable observer read except in tests Key: HDFS-14016 URL: https://issues.apache.org/jira/browse/HDFS-14016 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Currently in {{ObserverReadProxyProvider#invoke}}, only when {{observerReadEnabled && isRead(method)}} is true will the code check whether to talk to an Observer; otherwise it always talks to the active. The issue here is that currently observerReadEnabled can only be set through {{setObserverReadEnabled}}, which is used by tests only. So observer read is always disabled in deployment and there is no way to enable it. We may want to either expose a configuration key, or hard code it to true so it can only be changed for testing purposes, or simply remove this variable. This is closely related to HDFS-13923. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
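A sketch of the "expose a configuration key" option; the key name below is purely illustrative, not an existing HDFS key.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: read the flag from configuration in the proxy provider's
// constructor instead of relying on the test-only setter.
class ObserverReadFlagSketch {
  static boolean observerReadEnabled(Configuration conf) {
    return conf.getBoolean("dfs.client.observer.reads.enabled", true);
  }
}
{code}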
[jira] [Created] (HDFS-13880) Add mechanism to allow certain RPC calls to bypass sync
Chen Liang created HDFS-13880: - Summary: Add mechanism to allow certain RPC calls to bypass sync Key: HDFS-13880 URL: https://issues.apache.org/jira/browse/HDFS-13880 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Chen Liang Assignee: Chen Liang Currently, every single call to the NameNode will be synced, in the sense that the NameNode will not process it until its state id catches up. But in certain cases, we would like to bypass this check and allow the call to return immediately, even when the server state id is not up to date. One case could be the new API to be added in HDFS-13749 that requests the current state id. Others may include calls that do not promise real-time responses, such as {{getContentSummary}}. This Jira is to add the mechanism to allow certain calls to bypass sync. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13767) Add msync server implementation.
Chen Liang created HDFS-13767: - Summary: Add msync server implementation. Key: HDFS-13767 URL: https://issues.apache.org/jira/browse/HDFS-13767 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Chen Liang Assignee: Chen Liang This is a followup on HDFS-13688, where the msync API was introduced to {{ClientProtocol}} but the server side implementation was missing. This Jira is to implement the server side logic. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP
Chen Liang created HDFS-13699: - Summary: Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP Key: HDFS-13699 URL: https://issues.apache.org/jira/browse/HDFS-13699 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Given the other Jiras under HDFS-13541, this Jira is to allow DFSClient to redirect the encrypted secret to the DataNode. The encrypted message is the QOP that the client and NameNode have used. The DataNode decrypts the message and enforces that QOP for the client connection. This Jira will also include overwriting the downstream QOP, as mentioned in the HDFS-13541 design doc. Namely, this is to allow an inter-DN QOP that is different from the client-DN QOP. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13688) Introduce msync API call
Chen Liang created HDFS-13688: - Summary: Introduce msync API call Key: HDFS-13688 URL: https://issues.apache.org/jira/browse/HDFS-13688 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang As mentioned in the design doc in HDFS-12943, to ensure consistent reads, we need to introduce an RPC call {{msync}}. Specifically, a client can issue an msync call to an Observer node along with a transaction ID. The msync call will only return when the Observer's transaction ID has caught up to the given ID. This JIRA is to add this API. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
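A hedged usage sketch of what the call could look like from a client, assuming msync is eventually surfaced on DistributedFileSystem (this JIRA itself only introduces it on ClientProtocol).

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class MsyncUsageSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    if (fs instanceof DistributedFileSystem) {
      // Force the client's state id up to the latest transaction id first...
      ((DistributedFileSystem) fs).msync();
    }
    // ...so that a subsequent read served by an Observer is guaranteed to see
    // at least that state.
    fs.getFileStatus(new Path("/tmp"));
  }
}
{code}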
[jira] [Created] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message
Chen Liang created HDFS-13617: - Summary: Allow wrapping NN QOP into token in encrypted message Key: HDFS-13617 URL: https://issues.apache.org/jira/browse/HDFS-13617 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Attachments: HDFS-13617.001.patch This Jira allows NN to configurably wrap the QOP it has established with the client into the token message sent back to the client. The QOP is sent back in encrypted message, using BlockAccessToken encryption key as the key. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13566) Add configurable additional RPC listener to NameNode
Chen Liang created HDFS-13566: - Summary: Add configurable additional RPC listener to NameNode Key: HDFS-13566 URL: https://issues.apache.org/jira/browse/HDFS-13566 Project: Hadoop HDFS Issue Type: Sub-task Components: ipc Reporter: Chen Liang Assignee: Chen Liang This Jira aims to add the capability for the NameNode to run additional listener(s), such that the NameNode can be accessed from multiple ports. Fundamentally, this Jira tries to extend ipc.Server so that it can be configured with more listeners, binding to different ports but sharing the same call queue and handlers. This is useful when different clients are only allowed to access certain ports. Combined with HDFS-13547, this also allows different ports to have different SASL security levels. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13547) Add ingress port based sasl resolver
Chen Liang created HDFS-13547: - Summary: Add ingress port based sasl resolver Key: HDFS-13547 URL: https://issues.apache.org/jira/browse/HDFS-13547 Project: Hadoop HDFS Issue Type: Sub-task Components: security Reporter: Chen Liang Assignee: Chen Liang This Jira extends the SASL properties resolver interface to take an ingress port parameter, and also adds an implementation based on this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
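A hedged sketch of what an ingress-port-aware resolver could look like; the exact method signature the patch adds may differ, this only illustrates picking SASL properties (e.g. the QOP) based on the port the client connected to.

{code:java}
import java.net.InetAddress;
import java.util.HashMap;
import java.util.Map;

// Sketch only: per-port SASL property overrides with a default fallback.
class IngressPortBasedResolverSketch {
  private final Map<Integer, Map<String, String>> propsByPort = new HashMap<>();

  Map<String, String> getServerProperties(InetAddress clientAddr, int ingressPort) {
    return propsByPort.getOrDefault(ingressPort, defaultProps());
  }

  private Map<String, String> defaultProps() {
    Map<String, String> props = new HashMap<>();
    props.put("javax.security.sasl.qop", "auth"); // standard SASL QOP property
    return props;
  }
}
{code}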
[jira] [Created] (HDFS-13541) NameNode Port based selective encryption
Chen Liang created HDFS-13541: - Summary: NameNode Port based selective encryption Key: HDFS-13541 URL: https://issues.apache.org/jira/browse/HDFS-13541 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode, security Reporter: Chen Liang Assignee: Chen Liang Attachments: NameNode Port based selective encryption-v1.pdf Here at LinkedIn, one issue we face is that we need to enforce different security requirements based on the location of the client relative to the cluster. Specifically, for clients from outside of the data center, it is required by regulation that all traffic must be encrypted. But for clients within the same data center, unencrypted connections are preferred to avoid the high encryption overhead. HADOOP-10221 introduced a pluggable SASL resolver, based on which HADOOP-10335 introduced WhitelistBasedResolver, which solves the same problem. However we found it difficult to fit into our environment for several reasons. In this JIRA, on top of the pluggable SASL resolver, *we propose a different approach of running RPC on two ports on the NameNode, where the two ports enforce encrypted and unencrypted connections respectively, and the subsequent DataNode access simply follows the same encrypted/unencrypted behaviour*. Then by blocking the unencrypted port on the datacenter firewall, we can completely block unencrypted external access. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12989) Ozone: Container : Add key versioning support-4
Chen Liang created HDFS-12989: - Summary: Ozone: Container : Add key versioning support-4 Key: HDFS-12989 URL: https://issues.apache.org/jira/browse/HDFS-12989 Project: Hadoop HDFS Issue Type: Improvement Components: ozone Reporter: Chen Liang Assignee: Chen Liang After HDFS-12925 and HDFS-12954 are added, every key write call will generate a new version, and we will be able to read any specific version of a key. This JIRA adds a new key reader API, {{newKeyReaderWithVersion}}, to {{StorageHandler}}. This method takes an extra version field so that the caller can read any older version. This JIRA also adds all the other changes needed. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12958) Ozone: remove setAllocatedBytes method in ContainerInfo
Chen Liang created HDFS-12958: - Summary: Ozone: remove setAllocatedBytes method in ContainerInfo Key: HDFS-12958 URL: https://issues.apache.org/jira/browse/HDFS-12958 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Priority: Minor We may want to remove the {{setAllocatedBytes}} method from {{ContainerInfo}} and keep all fields of {{ContainerInfo}} immutable, such that clients won't accidentally change a {{ContainerInfo}} and rely on the changed instance. An alternative to having {{setAllocatedBytes}} is to always create a new {{ContainerInfo}} instance whenever it needs to be changed. This is based on [this comment|https://issues.apache.org/jira/browse/HDFS-12751?focusedCommentId=16299750=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16299750] from HDFS-12751. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12954) Ozone: Container : Add key versioning support-3
Chen Liang created HDFS-12954: - Summary: Ozone: Container : Add key versioning support-3 Key: HDFS-12954 URL: https://issues.apache.org/jira/browse/HDFS-12954 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Chen Liang Assignee: Chen Liang A new version of a key effectively overwrites some consecutive range of bytes in the entire key offset range. For each version, we need to keep exactly what the range is in order for the IO vector to work. Currently, since we only write from the start (offset = 0), the offset range of a version is simply from zero up to the key data size at the time the version gets committed. But currently we only keep one single key data size variable (see {{KeyManagerImpl#commitKey}}). We need to know the corresponding key data size for each version. This JIRA is to add tracking of the offset range for each version. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12925) Ozone: Container : Add key versioning support-2
Chen Liang created HDFS-12925: - Summary: Ozone: Container : Add key versioning support-2 Key: HDFS-12925 URL: https://issues.apache.org/jira/browse/HDFS-12925 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Chen Liang Assignee: Chen Liang One component for versioning is assembling the read IO vector (please see section 4.2 of the versioning design doc in HDFS-12000 for the details). This JIRA adds the util functions that take a list of blocks from different versions and properly generate the read vector for the requested version. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-12265) Ozone : better handling of operation fail due to chill mode
[ https://issues.apache.org/jira/browse/HDFS-12265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang resolved HDFS-12265. --- Resolution: Fixed Release Note: Looks like this has been handled as part of HDFS-12387, close this JIRA. > Ozone : better handling of operation fail due to chill mode > --- > > Key: HDFS-12265 > URL: https://issues.apache.org/jira/browse/HDFS-12265 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Chen Liang >Priority: Minor > Labels: OzonePostMerge > > Currently if someone tries to create a container while SCM is in chill mode, > there will be exception of INTERNAL_ERROR, which is not very informative and > can be confusing for debugging. > We should make it easier to identify problems caused by chill mode. For > example, we may detect if SCM is in chill mode and report back to client in > some way, such that the client can backup and try again later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12879) Ozone : add scm init command to document.
Chen Liang created HDFS-12879: - Summary: Ozone : add scm init command to document. Key: HDFS-12879 URL: https://issues.apache.org/jira/browse/HDFS-12879 Project: Hadoop HDFS Issue Type: Improvement Components: ozone Reporter: Chen Liang Priority: Minor When an Ozone cluster is initialized, before starting SCM through {{hdfs --daemon start scm}}, the command {{hdfs scm -init}} needs to be called first. But it seems this command is not documented. We should add a note about it to the documentation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12793) Ozone : TestSCMCli is failing consistently
Chen Liang created HDFS-12793: - Summary: Ozone : TestSCMCli is failing consistently Key: HDFS-12793 URL: https://issues.apache.org/jira/browse/HDFS-12793 Project: Hadoop HDFS Issue Type: Bug Components: ozone Reporter: Chen Liang Assignee: Chen Liang In the Jenkins builds of HDFS-12787 and HDFS-12758, the same three tests in {{TestSCMCli}} failed: {{testCloseContainer}}, {{testDeleteContainer}} and {{testInfoContainer}}. I tested locally, and these three tests have been failing consistently. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12626) Ozone : delete open key entries that will no longer be closed
Chen Liang created HDFS-12626: - Summary: Ozone : delete open key entries that will no longer be closed Key: HDFS-12626 URL: https://issues.apache.org/jira/browse/HDFS-12626 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang HDFS-12543 introduced the notion of "open key": when a key is opened, an open key entry gets persisted, and only after the client calls close will this entry be made visible. One issue is that if the client never calls close (e.g. it failed), then that open key entry will never be deleted from the metadata. This JIRA tracks this issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
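One possible cleanup strategy, sketched with hypothetical names: periodically scan the open-key entries and drop those older than an expiry window, on the assumption that a client which has not committed within that window will never call close.

{code:java}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: the real open-key table lives in the KSM metadata store, not in
// an in-memory map; this just illustrates the expiry-based purge.
class OpenKeyCleanerSketch {
  private final Map<String, Long> openKeyCreationTimeMs = new ConcurrentHashMap<>();

  void purgeExpired(long nowMs, long expiryMs) {
    for (Iterator<Map.Entry<String, Long>> it =
             openKeyCreationTimeMs.entrySet().iterator(); it.hasNext();) {
      if (nowMs - it.next().getValue() > expiryMs) {
        it.remove(); // opened but never committed within the expiry window
      }
    }
  }
}
{code}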
[jira] [Created] (HDFS-12543) Ozone : allow create key without specifying size
Chen Liang created HDFS-12543: - Summary: Ozone : allow create key without specifying size Key: HDFS-12543 URL: https://issues.apache.org/jira/browse/HDFS-12543 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Assignee: Chen Liang Currently when creating a key, it is required to specify the total size of the key. This makes it inconvenient for the case where a key is created and data keeps coming and being appended. This JIRA is to remove the requirement of specifying the size on key creation, and to allow appending to the key indefinitely. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12475) Ozone : add document for port sharing with WebHDFS
Chen Liang created HDFS-12475: - Summary: Ozone : add document for port sharing with WebHDFS Key: HDFS-12475 URL: https://issues.apache.org/jira/browse/HDFS-12475 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Chen Liang Currently Ozone's REST API uses port 9864, and all commands mentioned in OzoneCommandShell.md use the address localhost:9864. This port was used by WebHDFS and is now shared by Ozone. The value is controlled by the config key {{dfs.datanode.http.address}}. We should document this information in {{OzoneCommandShell.md}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-12268) Ozone: Add metrics for pending storage container requests
[ https://issues.apache.org/jira/browse/HDFS-12268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang reopened HDFS-12268: --- > Ozone: Add metrics for pending storage container requests > - > > Key: HDFS-12268 > URL: https://issues.apache.org/jira/browse/HDFS-12268 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Labels: ozoneMerge > Attachments: HDFS-12268-HDFS-7240.001.patch, > HDFS-12268-HDFS-7240.002.patch, HDFS-12268-HDFS-7240.003.patch, > HDFS-12268-HDFS-7240.004.patch, HDFS-12268-HDFS-7240.005.patch, > HDFS-12268-HDFS-7240.006.patch, HDFS-12268-HDFS-7240.007.patch > > > As storage container async interface has been supported after HDFS-11580, we > need to keep an eye on the queue depth of pending container requests. It can > help us better found if there are some performance problems. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12454) Ozone : the sample ozone-site.xml in OzoneGettingStarted does not work
Chen Liang created HDFS-12454: - Summary: Ozone : the sample ozone-site.xml in OzoneGettingStarted does not work Key: HDFS-12454 URL: https://issues.apache.org/jira/browse/HDFS-12454 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang In OzoneGettingStarted.md there is a sample ozone-site.xml file, but there are a few issues with it. 1. {code} ozone.scm.block.client.address scm.hadoop.apache.org ozone.ksm.address ksm.hadoop.apache.org {code} The value should be an address instead. 2. {{datanode.ObjectStoreHandler.(ObjectStoreHandler.java:103)}} requires {{ozone.scm.client.address}} to be set, which is missing from this sample file. Missing this config seems to cause a failure when starting the datanode. 3. {code} ozone.scm.names scm.hadoop.apache.org {code} This value did not make much sense to me; I found the comment in {{ScmConfigKeys}} that says {code} // ozone.scm.names key is a set of DNS | DNS:PORT | IP Address | IP:PORT. // Written as a comma separated string. e.g. scm1, scm2:8020, 7.7.7.7: {code} So maybe we should write something like scm1 as the value here. 4. I'm not entirely sure about this, but [here|https://wiki.apache.org/hadoop/Ozone#Configuration] it says {code} ozone.handler.type local {code} is also part of the minimum settings; do we need to add this [~anu]? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12368) [branch-2] Enable DFSNetworkTopology as default
Chen Liang created HDFS-12368: - Summary: [branch-2] Enable DFSNetworkTopology as default Key: HDFS-12368 URL: https://issues.apache.org/jira/browse/HDFS-12368 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Assignee: Chen Liang This JIRA is to backport HDFS-11998 to branch-2. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12346) [branch-2] Combine the old and the new chooseRandom for better performance
Chen Liang created HDFS-12346: - Summary: [branch-2] Combine the old and the new chooseRandom for better performance Key: HDFS-12346 URL: https://issues.apache.org/jira/browse/HDFS-12346 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Assignee: Chen Liang This JIRA is to backport HDFS-11577 back to branch-2. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12334) [branch-2] Add storage type demand to into DFSNetworkTopology#chooseRandom
Chen Liang created HDFS-12334: - Summary: [branch-2] Add storage type demand to into DFSNetworkTopology#chooseRandom Key: HDFS-12334 URL: https://issues.apache.org/jira/browse/HDFS-12334 Project: Hadoop HDFS Issue Type: Bug Reporter: Chen Liang Assignee: Chen Liang This JIRA is to backport HDFS-11514 to branch-2. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12325) SFTPFileSystem operations should restore cwd
Chen Liang created HDFS-12325: - Summary: SFTPFileSystem operations should restore cwd Key: HDFS-12325 URL: https://issues.apache.org/jira/browse/HDFS-12325 Project: Hadoop HDFS Issue Type: Bug Reporter: Chen Liang Assignee: Chen Liang We've seen a case where writing to {{SFTPFileSystem}} led to unexpected behaviour. Given a directory ./data with more than one file in it, the steps it took to hit this error were simply: {code} hdfs dfs -fs sftp://x.y.z -mkdir dir0 hdfs dfs -fs sftp://x.y.z -copyFromLocal data dir0 hdfs dfs -fs sftp://x.y.z -ls -R dir0 {code} But not all files show up in the ls output; in fact, more often just one single file shows up in that path... Digging deeper, we found that the rename, mkdirs and create operations in {{SFTPFileSystem}} change the current working directory during their execution. For example, in create there is: {code} client.cd(parent.toUri().getPath()); os = client.put(f.getName()); {code} The issue here is that {{SFTPConnectionPool}} caches SFTP sessions (in {{idleConnections}}), which carry their current working directory. So after these operations, the sessions are put back into the cache with a changed working directory. This accumulates on each call and ends up causing unexpected behaviour. Basically this error happens when processing multiple file system objects in one operation while relative paths are being used. The fix here is to restore the current working directory of the SFTP sessions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
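A minimal sketch of the proposed fix for create(), mirroring the fragment quoted above: remember the session's working directory and restore it before the channel goes back into the pool (pwd() and cd() are existing JSch ChannelSftp calls; exception handling is omitted here).

{code:java}
// Sketch only: restore the cached session's working directory so pooled
// connections are returned in the same state they were borrowed in.
String previousWorkingDir = client.pwd();
try {
  client.cd(parent.toUri().getPath());
  os = client.put(f.getName());
} finally {
  client.cd(previousWorkingDir);
}
{code}

The same save-and-restore pattern would apply to rename and mkdirs, since they change the working directory in the same way.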
[jira] [Created] (HDFS-12322) [branch-2] Add storage type demand to into DFSNetworkTopology#chooseRandom.
Chen Liang created HDFS-12322: - Summary: [branch-2] Add storage type demand to into DFSNetworkTopology#chooseRandom. Key: HDFS-12322 URL: https://issues.apache.org/jira/browse/HDFS-12322 Project: Hadoop HDFS Issue Type: Bug Reporter: Chen Liang Assignee: Chen Liang This JIRA is to backport HDFS-11482 to branch-2. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12321) Ozone : debug cli: add support to load user-provided SQL query
Chen Liang created HDFS-12321: - Summary: Ozone : debug cli: add support to load user-provided SQL query Key: HDFS-12321 URL: https://issues.apache.org/jira/browse/HDFS-12321 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Fix For: ozone This JIRA extends SQL CLI to support loading a user-provided file that includes any sql query the user wants to run on the SQLite db obtained by converting Ozone metadata db. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12311) [branch-2] HDFS specific network topology classes with storage type info included
Chen Liang created HDFS-12311: - Summary: [branch-2] HDFS specific network topology classes with storage type info included Key: HDFS-12311 URL: https://issues.apache.org/jira/browse/HDFS-12311 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Chen Liang Assignee: Chen Liang This JIRA is to backport HDFS-11450 to branch 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12306) [branch-2]Separate class InnerNode from class NetworkTopology and make it extendable
Chen Liang created HDFS-12306: - Summary: [branch-2]Separate class InnerNode from class NetworkTopology and make it extendable Key: HDFS-12306 URL: https://issues.apache.org/jira/browse/HDFS-12306 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Assignee: Chen Liang This JIRA is to backport HDFS-11430 to branch-2. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12304) Remove unused parameter from FsDatasetImpl#addVolume
Chen Liang created HDFS-12304: - Summary: Remove unused parameter from FsDatasetImpl#addVolume Key: HDFS-12304 URL: https://issues.apache.org/jira/browse/HDFS-12304 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Assignee: Chen Liang Priority: Minor FsDatasetImpl has this method {code} private void addVolume(Collection<StorageLocation> dataLocations, Storage.StorageDirectory sd) throws IOException {code} The parameter {{dataLocations}} was introduced in HDFS-6740 and was used to get storage type info. But HDFS-10637 changed the way the storage type is obtained in this method, so dataLocations is no longer used here at all. We should probably remove dataLocations for a cleaner interface. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12287) Remove a no-longer applicable TODO comment in DatanodeManager
Chen Liang created HDFS-12287: - Summary: Remove a no-longer applicable TODO comment in DatanodeManager Key: HDFS-12287 URL: https://issues.apache.org/jira/browse/HDFS-12287 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Chen Liang Assignee: Chen Liang Priority: Trivial {{DatanodeManager}} has this TODO comment {code} // TODO: Enables DFSNetworkTopology by default after more stress // testings/validations. {code} This has been resolved in HDFS-11998, but that change missed removing this comment. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12266) Ozone : add debug cli to hdfs script
Chen Liang created HDFS-12266: - Summary: Ozone : add debug cli to hdfs script Key: HDFS-12266 URL: https://issues.apache.org/jira/browse/HDFS-12266 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Chen Liang Assignee: Chen Liang Priority: Minor The debug CLI (which converts the metadata LevelDB/RocksDB file to a SQLite file) is still missing from the hdfs script; this JIRA adds it as one of the hdfs subcommands. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12265) Ozone : better handling of operation fail due to chill mode
Chen Liang created HDFS-12265: - Summary: Ozone : better handling of operation fail due to chill mode Key: HDFS-12265 URL: https://issues.apache.org/jira/browse/HDFS-12265 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Chen Liang Priority: Minor Currently if someone tries to create a container while SCM is in chill mode, there will be exception of INTERNAL_ERROR, which is not very informative and can be confusing for debugging. We should make it easier to identify problems caused by chill mode. For example, we may detect if SCM is in chill mode and report back to client in some way, such that the client can backup and try again later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12256) Ozone : handle inactive containers on DataNode side
Chen Liang created HDFS-12256: - Summary: Ozone : handle inactive containers on DataNode side Key: HDFS-12256 URL: https://issues.apache.org/jira/browse/HDFS-12256 Project: Hadoop HDFS Issue Type: Bug Reporter: Chen Liang When a container gets created, the corresponding metadata gets added to {{ContainerManagerImpl#containerMap}}. What {{containerMap}} stores is a containerName to {{ContainerStatus}} instance map. When the datanode starts, it also loads this map from the on-disk metadata. As long as the containerName is found in this map, it is considered an existing container. An issue we saw was that, occasionally, when the container creation on the datanode fails, the metadata of the failed container may still get added to {{containerMap}}, with the active flag set to false. But currently such containers are not handled specially; containers with active=false are just treated as normal containers. Then when someone tries to write to such a container, failures can happen. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
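An illustrative sketch of the missing handling, with simplified placeholder types: consult the active flag recorded in the containerMap entry before serving a write, and reject the request instead of treating a failed (inactive) container as a normal one.

{code:java}
import java.io.IOException;

// Sketch only: the real check would live in the DataNode's container dispatch
// path and use ContainerStatus rather than a bare boolean.
class InactiveContainerCheckSketch {
  static void checkWritable(String containerName, boolean active) throws IOException {
    if (!active) {
      throw new IOException("Container " + containerName
          + " exists in metadata but is inactive (its creation previously failed)");
    }
  }
}
{code}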
[jira] [Created] (HDFS-12187) Ozone : add support to DEBUG CLI for ksm.db
Chen Liang created HDFS-12187: - Summary: Ozone : add support to DEBUG CLI for ksm.db Key: HDFS-12187 URL: https://issues.apache.org/jira/browse/HDFS-12187 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang This JIRA adds the ability to convert the ksm metadata file (ksm.db) into a sqlite db. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12138) Remove redundant 'public' modifiers from BlockCollection
Chen Liang created HDFS-12138: - Summary: Remove redundant 'public' modifiers from BlockCollection Key: HDFS-12138 URL: https://issues.apache.org/jira/browse/HDFS-12138 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Assignee: Chen Liang Priority: Trivial The 'public' modifiers of the methods in {{BlockCollection}} are redundant, since this is a public interface. Running checkstyle against it also flags this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
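To illustrate the point, a minimal Java sketch (a hypothetical interface, not the actual {{BlockCollection}} API): members of a public interface are implicitly public, so the modifier can simply be dropped.
{code}
// Hypothetical interface, for illustration only.
public interface ExampleCollection {
  // Redundant: methods of a public interface are implicitly public.
  public int getBlockCount();

  // Equivalent and cleaner; checkstyle's RedundantModifier check prefers this form.
  int getCapacity();
}
{code}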
[jira] [Created] (HDFS-12130) Optimizing permission check for getContentSummary
Chen Liang created HDFS-12130: - Summary: Optimizing permission check for getContentSummary Key: HDFS-12130 URL: https://issues.apache.org/jira/browse/HDFS-12130 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Chen Liang Assignee: Chen Liang Currently, {{getContentSummary}} takes two phases to complete: - phase 1. Check the permission of the entire subtree. If any subdirectory does not have {{READ_EXECUTE}}, an access control exception is thrown and {{getContentSummary}} terminates here (unless the caller is the super user). - phase 2. If phase 1 passed, it then traverses the entire tree recursively to get the actual content summary. An issue is that both phases currently hold the fs lock. Phase 2 is already written so that it yields the fs lock over time, so that it does not block other operations for too long. However, phase 1 does not yield, meaning the permission check phase can still block things for a long time. One fix is to add lock yielding to phase 1, but a simpler fix is to merge phase 1 into phase 2. Namely, instead of doing a full traversal for the permission check first, we start with phase 2 directly, but for each directory, before obtaining its summary, we check its permission first. This way we take advantage of the existing lock yielding in the phase 2 code and are still able to check permissions and terminate on an access exception. Thanks [~szetszwo] for the offline discussions! -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
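As a rough illustration of the proposed single-pass approach for HDFS-12130, here is a self-contained sketch (simplified types, not the actual FSDirectory/ContentSummary code): each directory's permission is checked right before it is summarized, so the existing phase-2 lock yielding can cover the whole operation.
{code}
import java.util.ArrayList;
import java.util.List;

class ContentSummarySketch {
  static class Node {
    long length;                        // file length; 0 for directories
    boolean canReadExecute = true;      // stand-in for the READ_EXECUTE check
    final List<Node> children = new ArrayList<>();
  }

  // Single pass: check permission per directory just before summarizing it,
  // instead of doing a separate full-tree permission pass first.
  static long summarize(Node dir) {
    if (!dir.canReadExecute) {
      throw new SecurityException("access denied");  // terminate on access exception
    }
    long total = 0;
    for (Node child : dir.children) {
      total += child.children.isEmpty() ? child.length : summarize(child);
      // In the real code, this is where the fs lock would be yielded periodically.
    }
    return total;
  }
}
{code}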
[jira] [Resolved] (HDFS-12041) Block Storage : make the server address config more concise
[ https://issues.apache.org/jira/browse/HDFS-12041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang resolved HDFS-12041. --- Resolution: Won't Fix > Block Storage : make the server address config more concise > --- > > Key: HDFS-12041 > URL: https://issues.apache.org/jira/browse/HDFS-12041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Minor > > Currently there are a few places where the address are read from config like > such > {code} > String cbmIPAddress = ozoneConf.get( > DFS_CBLOCK_JSCSI_CBLOCK_SERVER_ADDRESS_KEY, > DFS_CBLOCK_JSCSI_CBLOCK_SERVER_ADDRESS_DEFAULT > ); > int cbmPort = ozoneConf.getInt( > DFS_CBLOCK_JSCSI_PORT_KEY, > DFS_CBLOCK_JSCSI_PORT_DEFAULT > ); > {code} > Similarly for jscsi address config. Maybe we should consider merge these to > one single key config in form of host:port. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12043) Add counters for block re-replication
Chen Liang created HDFS-12043: - Summary: Add counters for block re-replication Key: HDFS-12043 URL: https://issues.apache.org/jira/browse/HDFS-12043 Project: Hadoop HDFS Issue Type: Bug Reporter: Chen Liang Assignee: Chen Liang We occasionally see that the under-replicated block count is not going down quickly enough. We've made at least one fix to speed up block replications (HDFS-9205) but we need better insight into the current state and activity of the block re-replication logic. For example, we need to understand whether it is because re-replication is not making forward progress at all, or because new under-replicated blocks are being added faster. We should include additional metrics: # Cumulative number of blocks that were successfully replicated. # Cumulative number of re-replications that timed out. # Cumulative number of blocks that were dequeued for re-replication but not scheduled, e.g. because they were invalid or under construction, or because replication was postponed. The growth rate of the above metrics will make it clear whether block replication is making forward progress and, if not, provide potential clues about why it is stalled. Thanks [~arpitagarwal] for the offline discussions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
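A minimal sketch of the three counters proposed in HDFS-12043 (plain {{AtomicLong}} fields for illustration, not the actual NameNode metrics wiring):
{code}
import java.util.concurrent.atomic.AtomicLong;

class ReReplicationCountersSketch {
  final AtomicLong blocksReplicated    = new AtomicLong(); // successfully re-replicated
  final AtomicLong replicationTimeouts = new AtomicLong(); // re-replications that timed out
  final AtomicLong blocksSkipped       = new AtomicLong(); // dequeued but not scheduled

  void onReplicationSuccess() { blocksReplicated.incrementAndGet(); }
  void onReplicationTimeout() { replicationTimeouts.incrementAndGet(); }
  void onBlockSkipped()       { blocksSkipped.incrementAndGet(); }
}
{code}
Comparing the growth rates of these counters over time would show whether re-replication is keeping up with newly added under-replicated blocks.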
[jira] [Created] (HDFS-12041) Block Storage : make the server address config more concise
Chen Liang created HDFS-12041: - Summary: Block Storage : make the server address config more concise Key: HDFS-12041 URL: https://issues.apache.org/jira/browse/HDFS-12041 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Priority: Minor Currently there are a few places where the addresses are read from the config like this {code} String cbmIPAddress = ozoneConf.get( DFS_CBLOCK_JSCSI_CBLOCK_SERVER_ADDRESS_KEY, DFS_CBLOCK_JSCSI_CBLOCK_SERVER_ADDRESS_DEFAULT ); int cbmPort = ozoneConf.getInt( DFS_CBLOCK_JSCSI_PORT_KEY, DFS_CBLOCK_JSCSI_PORT_DEFAULT ); {code} Similarly for the jscsi address config. Maybe we should consider merging these into a single config key in the form host:port. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
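A hedged sketch of what a merged host:port key could look like for HDFS-12041 (the key name and default below are hypothetical, not existing config keys):
{code}
// Read a single "host:port" value instead of separate address and port keys.
String addr = ozoneConf.get(
    "dfs.cblock.jscsi.server.address",   // hypothetical merged key
    "127.0.0.1:9810");                   // hypothetical default
int idx = addr.lastIndexOf(':');
java.net.InetSocketAddress cbmAddress = new java.net.InetSocketAddress(
    addr.substring(0, idx), Integer.parseInt(addr.substring(idx + 1)));
// Hadoop's NetUtils.createSocketAddr(addr) could likely be used here as well.
{code}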
[jira] [Created] (HDFS-12002) Ozone : SCM cli misc fixes/improvements
Chen Liang created HDFS-12002: - Summary: Ozone : SCM cli misc fixes/improvements Key: HDFS-12002 URL: https://issues.apache.org/jira/browse/HDFS-12002 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Fix For: ozone Currently there are a few minor issues with the SCM CLI: 1. Some commands do not use the -c option to take the container name. An issue with this is that arguments need to be in a certain order to be parsed correctly, e.g.: {{./bin/hdfs scm -container -del c0 -f}} works, but {{./bin/hdfs scm -container -del -f c0}} will not. 2. Some subcommands are not displaying errors in the best way they could, e.g.: {{./bin/hdfs scm -container -del}} is wrong because it is missing the container name, so the CLI complains {code} Missing argument for option: del Unrecognized options:[-container, -del] usage: hdfs scm [] where can be one of the following -container Container related options {code} but this does not really show that it is the container name that is missing. 3. It is probably better to rename -del to -delete to be consistent with other commands like -create and -info. 4. When passing in an invalid argument, e.g. -info on a non-existing container, an exception is displayed. We probably should not scare users; we should display just one error message, and move the exception output to a debug mode or similar. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11998) Enable DFSNetworkTopology as default
Chen Liang created HDFS-11998: - Summary: Enable DFSNetworkTopology as default Key: HDFS-11998 URL: https://issues.apache.org/jira/browse/HDFS-11998 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang HDFS-11530 made it configurable to use {{DFSNetworkTopology}}, while still using {{NetworkTopology}} as the default. Given the stress testing in HDFS-11923, which shows the correctness of DFSNetworkTopology, and the performance testing in HDFS-11535, which shows how DFSNetworkTopology can outperform NetworkTopology, I think we are at the point where we can and should enable DFSNetworkTopology as the default. Any comments/thoughts are more than welcome! -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11997) ChunkManager functions do not use the argument keyName
Chen Liang created HDFS-11997: - Summary: ChunkManager functions do not use the argument keyName Key: HDFS-11997 URL: https://issues.apache.org/jira/browse/HDFS-11997 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang {{ChunkManagerImpl}}'s functions, i.e. {{writeChunk}}, {{readChunk}} and {{deleteChunk}}, all take a {{keyName}} argument, which is not used by any of them. I think this makes sense because conceptually {{ChunkManager}} should not have to know the keyName to do anything, except perhaps for some sort of sanity check or logging, which is not there either. We should revisit whether we need it here. I think we should remove it to make the chunk abstraction and the function signatures cleaner. Any comments? [~anu] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
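Purely for illustration, the trimmed signatures for HDFS-11997 might look like the following (parameter types are placeholders, not the actual Ozone classes):
{code}
// Hypothetical interface sketch: keyName is gone, the chunk layer identifies
// data only by the container pipeline plus the chunk info.
public interface ChunkManagerSketch {
  void writeChunk(Object pipeline, Object chunkInfo, byte[] data) throws java.io.IOException;
  byte[] readChunk(Object pipeline, Object chunkInfo) throws java.io.IOException;
  void deleteChunk(Object pipeline, Object chunkInfo) throws java.io.IOException;
}
{code}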
[jira] [Created] (HDFS-11996) Ozone : add partial read of chunks
Chen Liang created HDFS-11996: - Summary: Ozone : add partial read of chunks Key: HDFS-11996 URL: https://issues.apache.org/jira/browse/HDFS-11996 Project: Hadoop HDFS Issue Type: Sub-task Environment: Currently when reading a chunk, it is always the whole chunk that gets returned. However it is possible the reader may only need to read a subset of the chunk. This JIRA adds the partial read of chunks. Reporter: Chen Liang Assignee: Chen Liang -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11939) Ozone : add read/write random access to Chunks of a key
Chen Liang created HDFS-11939: - Summary: Ozone : add read/write random access to Chunks of a key Key: HDFS-11939 URL: https://issues.apache.org/jira/browse/HDFS-11939 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang In Ozone, the value of a key is a sequence of container chunks. Currently, the only way to read/write the chunks is by using ChunkInputStream and ChunkOutputStream. However, by the nature of streams, these classes are currently implemented to only allow sequential read/write. Ideally we would like to support random access to the chunks. For example, we want to be able to seek to a specific offset and read/write some data. This will be critical for the key range read/write feature, and potentially important for supporting parallel read/write. This JIRA tracks adding support by implementing a FileChannel class on top of Chunks. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11932) BPServiceActor thread name is not correctly set
Chen Liang created HDFS-11932: - Summary: BPServiceActor thread name is not correctly set Key: HDFS-11932 URL: https://issues.apache.org/jira/browse/HDFS-11932 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Reporter: Chen Liang Assignee: Chen Liang When running unit tests (e.g. TestJMXGet), we often get the following exception, although the tests still pass: {code} WARN datanode.DataNode (BPOfferService.java:getBlockPoolId(192)) - Block pool ID needed, but service not yet registered with NN java.lang.Exception: trace at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:192) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.formatThreadName(BPServiceActor.java:556) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.start(BPServiceActor.java:544) at ... {code} It seems that, although this does not affect normal operations, it causes the thread name of BPServiceActor not to be set as desired. More specifically: {code} bpThread = new Thread(this, formatThreadName("heartbeating", nnAddr)); bpThread.setDaemon(true); // needed for JUnit testing bpThread.start(); {code} The first line calls formatThreadName to format a thread name, and formatThreadName reads the value of BPOfferService#bpNSInfo. However, this value is set only after the thread has started (the third line above), so we get an exception on the first line for reading a value that does not exist yet. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
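One possible shape of a fix for HDFS-11932, sketched here only as an idea (not necessarily the committed change): give the thread a provisional name when it is created, and set the final name from inside {{run()}} once registration has completed and the block pool information is available.
{code}
// At start(): the block pool ID is not known yet, so use a provisional name.
bpThread = new Thread(this);
bpThread.setName("BPServiceActor for " + nnAddr);  // provisional, no bp ID yet
bpThread.setDaemon(true); // needed for JUnit testing
bpThread.start();

// Inside run(), after the handshake/registration with the NN succeeds,
// bpNSInfo is available and the full name can be set:
Thread.currentThread().setName(formatThreadName("heartbeating", nnAddr));
{code}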
[jira] [Created] (HDFS-11923) Stress test of DFSNetworkTopology
Chen Liang created HDFS-11923: - Summary: Stress test of DFSNetworkTopology Key: HDFS-11923 URL: https://issues.apache.org/jira/browse/HDFS-11923 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang I wrote a stress test for {{DFSNetworkTopology}} to verify its correctness under a huge number of datanode changes, e.g. datanode insert/delete, storage addition/removal, etc. The goal is to show that the topology maintains the correct counters at all times. The test is written so that, unless manually terminated, it keeps randomly performing the operations nonstop (and because of this, the test is marked ignored in the patch). My local run lasted 40 minutes before I stopped it; it involved more than one million datanode changes, and no error happened. We believe this should be sufficient to show the correctness of {{DFSNetworkTopology}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
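The general pattern of such a stress test, as a self-contained illustration (not the actual HDFS-11923 test code): keep a simple model of the expected state, apply random mutations in a loop, and verify the invariant after every step.
{code}
import java.util.Random;

public class StressLoopSketch {
  public static void main(String[] args) {
    Random rand = new Random();
    int expected = 0;            // model of what the counter should be
    int actual = 0;              // stand-in for the counter kept by the topology
    for (long i = 0; ; i++) {    // runs until manually terminated
      if (rand.nextBoolean()) { actual++; expected++; }   // e.g. add a datanode/storage
      else if (expected > 0)  { actual--; expected--; }   // e.g. remove a datanode/storage
      if (actual != expected) {
        throw new AssertionError("counter diverged at iteration " + i);
      }
    }
  }
}
{code}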
[jira] [Created] (HDFS-11920) Ozone : add key partition
Chen Liang created HDFS-11920: - Summary: Ozone : add key partition Key: HDFS-11920 URL: https://issues.apache.org/jira/browse/HDFS-11920 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Currently, each key corresponds to one single SCM block, and putKey/getKey writes/reads this single SCM block. This works fine for keys with a reasonably small data size. However, if the data is too large (e.g. it does not even fit into a single container), then we need to be able to partition the key data into multiple blocks, each in its own container. This JIRA changes the key-related classes to support this. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
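For illustration, partitioning a key's data range into block-sized pieces for HDFS-11920 could look roughly like this (the block size and variable names are hypothetical; the real key metadata would track per-block offsets and lengths):
{code}
long blockSize = 256L * 1024 * 1024;              // hypothetical max block size
long keyDataLength = 1_000_000_000L;              // example key size
java.util.List<long[]> partitions = new java.util.ArrayList<>();
for (long offset = 0; offset < keyDataLength; offset += blockSize) {
  long len = Math.min(blockSize, keyDataLength - offset);
  partitions.add(new long[] { offset, len });     // each entry becomes one SCM block
}
{code}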
[jira] [Created] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
Chen Liang created HDFS-11907: - Summary: NameNodeResourceChecker should avoid calling df.getAvailable too frequently Key: HDFS-11907 URL: https://issues.apache.org/jira/browse/HDFS-11907 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Assignee: Chen Liang Currently, {{HealthMonitor#doHealthChecks}} invokes {{NameNode#monitorHealth}}, which ends up invoking {{NameNodeResourceChecker#isResourceAvailable}}, at a frequency of once per second by default. NameNodeResourceChecker#isResourceAvailable in turn invokes {{df.getAvailable();}} every time it is called, which can be a very expensive operation. Since the available space should rarely change dramatically on a per-second basis, a cached value should be sufficient: only fetch an updated value when the cached value is too old, otherwise simply return the cached value. This way df.getAvailable() gets invoked less often. Thanks [~arpitagarwal] for the offline discussion. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
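A minimal sketch of the caching proposed in HDFS-11907 (field and method names are illustrative, not the actual NameNodeResourceChecker code):
{code}
private long cachedAvailable;
private long lastRefreshMs;
private static final long REFRESH_INTERVAL_MS = 60_000; // hypothetical refresh interval

synchronized long getAvailableSpace(DF df) {
  long now = System.currentTimeMillis();
  if (now - lastRefreshMs > REFRESH_INTERVAL_MS) {
    cachedAvailable = df.getAvailable();  // the expensive call, now made rarely
    lastRefreshMs = now;
  }
  return cachedAvailable;
}
{code}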
[jira] [Created] (HDFS-11906) Add log for NameNode#monitorHealth
Chen Liang created HDFS-11906: - Summary: Add log for NameNode#monitorHealth Key: HDFS-11906 URL: https://issues.apache.org/jira/browse/HDFS-11906 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Assignee: Chen Liang Priority: Minor We've seen cases where the NN had long delays that we suspect were due to {{NameNode#monitorHealth}} spending too much time on {{getNamesystem().checkAvailableResources();}}. However, due to the lack of logging, this can be hard to verify. This JIRA adds some logging to this function to display the actual time spent. Thanks [~arpitagarwal] for the offline discussion. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
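The kind of logging being proposed in HDFS-11906, sketched (not the committed patch; {{Time.monotonicNow()}} is assumed to be the usual Hadoop clock helper):
{code}
long start = Time.monotonicNow();
getNamesystem().checkAvailableResources();
long elapsedMs = Time.monotonicNow() - start;
LOG.info("checkAvailableResources took " + elapsedMs + " ms");
{code}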
[jira] [Created] (HDFS-11891) DU#refresh should print the path of the directory when an exception is caught
Chen Liang created HDFS-11891: - Summary: DU#refresh should print the path of the directory when an exception is caught Key: HDFS-11891 URL: https://issues.apache.org/jira/browse/HDFS-11891 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Assignee: Chen Liang Priority: Minor The refresh() method of DU is as follows: {code} @Override protected synchronized void refresh() { try { duShell.startRefresh(); } catch (IOException ioe) { LOG.warn("Could not get disk usage information", ioe); } } {code} The warning message should also print out the directory that failed. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
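A possible shape of the HDFS-11891 fix (this assumes a {{getDirPath()}}-style accessor for the monitored directory; the exact accessor in DU may differ):
{code}
@Override
protected synchronized void refresh() {
  try {
    duShell.startRefresh();
  } catch (IOException ioe) {
    LOG.warn("Could not get disk usage information for path "
        + getDirPath(), ioe);
  }
}
{code}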
[jira] [Created] (HDFS-11886) Ozone : improving error handling for putkey operation
Chen Liang created HDFS-11886: - Summary: Ozone : improving error handling for putkey operation Key: HDFS-11886 URL: https://issues.apache.org/jira/browse/HDFS-11886 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Chen Liang Ozone's putKey operation involves a couple of steps: 1. KSM calls allocateBlock on SCM and writes this info to KSM's local metastore. 2. The allocated block gets returned to the client; the client checks whether the container needs to be created on the datanode and, if so, creates it. 3. The client writes the data to the container. It is possible that step 1 succeeded but step 2 or 3 failed; in this case there will be an entry in KSM's local metastore, but the key is actually nowhere to be found. We need to revert step 1 if step 2 or 3 fails. This can be done with a deleteKey() call to KSM. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
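A rough sketch of the rollback proposed in HDFS-11886 (the method names here are illustrative, not the exact KSM/SCM client APIs):
{code}
AllocatedBlock block = ksm.allocateBlock(volume, bucket, key);   // step 1
try {
  ensureContainerExists(block);                                   // step 2
  writeDataToContainer(block, data);                              // step 3
} catch (IOException e) {
  ksm.deleteKey(volume, bucket, key);                             // revert step 1
  throw e;
}
{code}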
[jira] [Resolved] (HDFS-11872) Ozone : implement StorageContainerManager#getStorageContainerLocations
[ https://issues.apache.org/jira/browse/HDFS-11872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang resolved HDFS-11872. --- Resolution: Won't Fix I misread {{getStorageContainerLocations}} as the lookup of a container given the container's name. But it turns out this is the lookup of a container given a specific key. In that case it should probably indeed move to KSM. We may need to revisit this later, but will not 'fix' it for the time being. > Ozone : implement StorageContainerManager#getStorageContainerLocations > -- > > Key: HDFS-11872 > URL: https://issues.apache.org/jira/browse/HDFS-11872 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Chen Liang >Assignee: Chen Liang > > We should implement {{StorageContainerManager#getStorageContainerLocations}} > . > Although the comment says it will be moved to KSM, the functionality of > container lookup by name should actually be part of SCM functionality. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11872) Ozone : implement StorageContainerManager#getStorageContainerLocations
Chen Liang created HDFS-11872: - Summary: Ozone : implement StorageContainerManager#getStorageContainerLocations Key: HDFS-11872 URL: https://issues.apache.org/jira/browse/HDFS-11872 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Chen Liang Assignee: Chen Liang We should implement {{StorageContainerManager#getStorageContainerLocations}}. Although the comment says it will be moved to KSM, the functionality of container lookup by name should actually be part of SCM functionality. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11859) Ozone : separate blockLocationProtocol out of containerLocationProtocol
Chen Liang created HDFS-11859: - Summary: Ozone : separate blockLocationProtocol out of containerLocationProtocol Key: HDFS-11859 URL: https://issues.apache.org/jira/browse/HDFS-11859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Currently, StorageContainerLocationProtocol contains two types of operations: container-related operations and block-related operations. Although there is {{ScmBlockLocationProtocol}} for block operations, only {{StorageContainerLocationProtocolServerSideTranslatorPB}} makes the distinction. This JIRA tries to make the separation complete and thorough in all places. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11857) Ozone : need to refactor StorageContainerLocationProtocolServerSideTranslatorPB
Chen Liang created HDFS-11857: - Summary: Ozone : need to refactor StorageContainerLocationProtocolServerSideTranslatorPB Key: HDFS-11857 URL: https://issues.apache.org/jira/browse/HDFS-11857 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Currently, StorageContainerLocationProtocolServerSideTranslatorPB has two protocol impls: {{StorageContainerLocationProtocol impl}} and {{ScmBlockLocationProtocol blockImpl}}. The class provides container-related services by invoking {{impl}}, and block-related services by invoking {{blockImpl}}. Namely, on the server side, the implementation makes a distinction between "container protocol" and "block protocol". An issue is that, currently, nowhere except the server side views "container protocol" and "block protocol" as different. More specifically, StorageContainerLocationProtocol.proto still includes both container operations and block operations by itself. As a result of this difference, it is difficult to implement certain APIs (e.g. putKey) properly from the client side. This JIRA merges "block protocol" back into "container protocol" in StorageContainerLocationProtocolServerSideTranslatorPB, to unblock the implementation of other APIs on the client side. Please note that, in the long run, separating these two protocols does seem to be the right way. This JIRA is only a temporary solution to unblock developing other APIs; we will need to revisit these protocols in the future. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11836) Ozone : add sql debug CLI to hdfs script
Chen Liang created HDFS-11836: - Summary: Ozone : add sql debug CLI to hdfs script Key: HDFS-11836 URL: https://issues.apache.org/jira/browse/HDFS-11836 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Chen Liang Assignee: Chen Liang HDFS-11698 was missing one change: {{SQLCLI}} should be exposed on the command line via the hdfs script. This JIRA addresses that. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11802) Ozone : add DEBUG CLI support for open container db file
Chen Liang created HDFS-11802: - Summary: Ozone : add DEBUG CLI support for open container db file Key: HDFS-11802 URL: https://issues.apache.org/jira/browse/HDFS-11802 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Chen Liang Assignee: Chen Liang This is a follow-up to HDFS-11698. This JIRA adds the conversion of the openContainer.db levelDB file. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11788) Ozone : add DEBUG CLI support for nodepool db file
Chen Liang created HDFS-11788: - Summary: Ozone : add DEBUG CLI support for nodepool db file Key: HDFS-11788 URL: https://issues.apache.org/jira/browse/HDFS-11788 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang This is a follow-up to HDFS-11698. This JIRA adds the conversion of the nodepool.db levelDB file. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11759) Ozone : SCMNodeManager#close() should also close node pool manager object
Chen Liang created HDFS-11759: - Summary: Ozone : SCMNodeManager#close() should also close node pool manager object Key: HDFS-11759 URL: https://issues.apache.org/jira/browse/HDFS-11759 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang {{SCMNodeManager#close()}} should also call {{nodePoolManager.close();}} to close its {{SCMNodePoolManager}} instance. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11756) Ozone : add DEBUG CLI support of blockDB file
Chen Liang created HDFS-11756: - Summary: Ozone : add DEBUG CLI support of blockDB file Key: HDFS-11756 URL: https://issues.apache.org/jira/browse/HDFS-11756 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang This is a follow-up to HDFS-11698. This JIRA adds the conversion of the block.db levelDB file. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11747) Ozone : need to fix OZONE_SCM_DEFAULT_PORT
Chen Liang created HDFS-11747: - Summary: Ozone : need to fix OZONE_SCM_DEFAULT_PORT Key: HDFS-11747 URL: https://issues.apache.org/jira/browse/HDFS-11747 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang We were deploying things in a physical cluster, and found an issue that {{OZONE_SCM_DEFAULT_PORT}} should be set to {{OZONE_SCM_DATANODE_PORT_DEFAULT}} instead of 9862 in the config keys. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11728) Ozone: add the DB names to OzoneConsts
Chen Liang created HDFS-11728: - Summary: Ozone: add the DB names to OzoneConsts Key: HDFS-11728 URL: https://issues.apache.org/jira/browse/HDFS-11728 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Currently there are several places that use levelDB, and the names of the levelDB files are hard-coded in the classes that use them. We should extract these names into OzoneConsts instead. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11723) Should log a warning message when users try to make certain directories encryption zone
Chen Liang created HDFS-11723: - Summary: Should log a warning message when users try to make certain directories encryption zone Key: HDFS-11723 URL: https://issues.apache.org/jira/browse/HDFS-11723 Project: Hadoop HDFS Issue Type: Improvement Components: encryption, hdfs-client Reporter: Chen Liang Assignee: Chen Liang If a user tries to make the entire /user directory an encryption zone, and trash is enabled, there will be a problem when the user tries to delete an unencrypted file, because it has to move from /user to the trash directory. The problem happens even with the fix in HDFS-8831. So we should log a WARN message when users try to make such directories encryption zones. Such directories include {{/user}}, {{/user/$user}} and {{/user/$user/.Trash}}. Thanks [~xyao] for the offline discussion. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
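The check proposed in HDFS-11723 could look roughly like this (a hedged sketch; the actual matching logic and log wording in the encryption zone creation path may differ):
{code}
String p = srcPath;  // path being made an encryption zone
boolean riskyForTrash = p.equals("/user")
    || p.matches("/user/[^/]+")
    || p.matches("/user/[^/]+/\\.Trash");
if (riskyForTrash) {
  LOG.warn("Creating an encryption zone on " + p
      + " may break trash moves of unencrypted files under /user.");
}
{code}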
[jira] [Created] (HDFS-11650) Ozone: fix the consistently timeout test testUpgradeFromRel22Image
Chen Liang created HDFS-11650: - Summary: Ozone: fix the consistently timeout test testUpgradeFromRel22Image Key: HDFS-11650 URL: https://issues.apache.org/jira/browse/HDFS-11650 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Recently, the test TestDFSUpgradeFromImage.testUpgradeFromRel22Image has been consistently failing due to timeout, while the same test passes in trunk. JIRAs that encountered this include (but are not limited to) HDFS-11642, HDFS-11635, HDFS-11062 and HDFS-11618. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11649) Ozone : add SCM CLI shell code placeholder classes
Chen Liang created HDFS-11649: - Summary: Ozone : add SCM CLI shell code placeholder classes Key: HDFS-11649 URL: https://issues.apache.org/jira/browse/HDFS-11649 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang HDFS-11470 has outlined what the SCM CLI would look like. Based on that design, this JIRA adds the basic placeholder classes for all commands to be filled in. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11645) DataXceiver thread should log the actual error when getting InvalidMagicNumberException
Chen Liang created HDFS-11645: - Summary: DataXceiver thread should log the actual error when getting InvalidMagicNumberException Key: HDFS-11645 URL: https://issues.apache.org/jira/browse/HDFS-11645 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.0.0-alpha1, 2.8.1 Reporter: Chen Liang Assignee: Chen Liang Priority: Minor Currently, the {{DataXceiver#run}} method only logs an error message when getting an {{InvalidMagicNumberException}}. It should also log the actual exception. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11631) Block Storage : allow cblock server to be started from hdfs command
Chen Liang created HDFS-11631: - Summary: Block Storage : allow cblock server to be started from hdfs command Key: HDFS-11631 URL: https://issues.apache.org/jira/browse/HDFS-11631 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang This JIRA adds a CBlock main() method and an entry to the hdfs script, so that the cblock server can be started by the hdfs script and run as a daemon process. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-11535) Performance analysis of new DFSNetworkTopology#chooseRandom
[ https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang reopened HDFS-11535: --- > Performance analysis of new DFSNetworkTopology#chooseRandom > --- > > Key: HDFS-11535 > URL: https://issues.apache.org/jira/browse/HDFS-11535 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-11535.001.patch, HDFS-11535.002.patch, PerfTest.pdf > > > This JIRA is created to post the results of some performance experiments we > did. For those who are interested, please the attached .pdf file for more > detail. The attached patch file includes the experiment code we ran. > The key insights we got from these tests is that: although *the new method > outperforms the current one in most cases*. There is still *one case where > the current one is better*. Which is when there is only one storage type in > the cluster, and we also always look for this storage type. In this case, it > is simply a waste of time to perform storage-type-based pruning, blindly > picking up a random node (current methods) would suffice. > Therefore, based on the analysis, we propose to use a *combination of both > the old and the new methods*: > say, we search for a node of type X, since now inner node all keep storage > type info, we can *just check root node to see if X is the only type it has*. > If yes, blindly picking a random leaf will work, so we simply call the old > method, otherwise we call the new method. > There is still at least one missing piece in this performance test, which is > garbage collection. The new method does a few more object creation when doing > the search, which adds overhead to GC. I'm still thinking of any potential > optimization but this seems tricky, also I'm not sure whether this > optimization worth doing at all. Please feel free to leave any > comments/suggestions. > Thanks [~arpitagarwal] and [~szetszwo] for the offline discussion. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-11535) Performance analysis of new DFSNetworkTopology#chooseRandom
[ https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang resolved HDFS-11535. --- Resolution: Information Provided > Performance analysis of new DFSNetworkTopology#chooseRandom > --- > > Key: HDFS-11535 > URL: https://issues.apache.org/jira/browse/HDFS-11535 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-11535.001.patch, HDFS-11535.002.patch, PerfTest.pdf > > > This JIRA is created to post the results of some performance experiments we > did. For those who are interested, please the attached .pdf file for more > detail. The attached patch file includes the experiment code we ran. > The key insights we got from these tests is that: although *the new method > outperforms the current one in most cases*. There is still *one case where > the current one is better*. Which is when there is only one storage type in > the cluster, and we also always look for this storage type. In this case, it > is simply a waste of time to perform storage-type-based pruning, blindly > picking up a random node (current methods) would suffice. > Therefore, based on the analysis, we propose to use a *combination of both > the old and the new methods*: > say, we search for a node of type X, since now inner node all keep storage > type info, we can *just check root node to see if X is the only type it has*. > If yes, blindly picking a random leaf will work, so we simply call the old > method, otherwise we call the new method. > There is still at least one missing piece in this performance test, which is > garbage collection. The new method does a few more object creation when doing > the search, which adds overhead to GC. I'm still thinking of any potential > optimization but this seems tricky, also I'm not sure whether this > optimization worth doing at all. Please feel free to leave any > comments/suggestions. > Thanks [~arpitagarwal] and [~szetszwo] for the offline discussion. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11577) Combine the old and the new chooseRandom for better performance
Chen Liang created HDFS-11577: - Summary: Combine the old and the new chooseRandom for better performance Key: HDFS-11577 URL: https://issues.apache.org/jira/browse/HDFS-11577 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang As discussed in HDFS-11535, this JIRA adds a new function combining both the new and the old chooseRandom methods for better performance. More specifically, when choosing a random node with a storage type requirement, the combined method first tries the old method of blindly picking a random node. If this node satisfies the requirement, it is returned. Otherwise, the new chooseRandom is called, which is guaranteed to find an eligible node in one call (if there is one at all). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
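Illustratively, the combined method described in HDFS-11577 could be structured like this (helper names other than {{chooseRandomWithStorageType}} are hypothetical):
{code}
Node chooseRandomCombined(String scope, StorageType type) {
  Node candidate = chooseRandomBlind(scope);         // old method: cheap, no pruning
  if (candidate != null && hasStorageType(candidate, type)) {
    return candidate;                                // blind pick already satisfies
  }
  return chooseRandomWithStorageType(scope, type);   // new method: guaranteed eligible node
}
{code}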
[jira] [Created] (HDFS-11539) Block Storage : configurable max cache size
Chen Liang created HDFS-11539: - Summary: Block Storage : configurable max cache size Key: HDFS-11539 URL: https://issues.apache.org/jira/browse/HDFS-11539 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Currently, there is no max size limit for CBlock's local cache. In theory, this means the cache can grow unbounded. We should make the max size configurable. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
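For example, the knob proposed in HDFS-11539 could be read like this (the key name and default are hypothetical):
{code}
long maxCacheSizeBytes = conf.getLong(
    "dfs.cblock.cache.max.size.bytes",   // hypothetical config key
    16L * 1024 * 1024 * 1024);           // hypothetical 16 GB default
{code}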
[jira] [Created] (HDFS-11537) Block Storage : add cache layer
Chen Liang created HDFS-11537: - Summary: Block Storage : add cache layer Key: HDFS-11537 URL: https://issues.apache.org/jira/browse/HDFS-11537 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang This JIRA adds the cache layer. Specifically, it implements the cache interface from HDFS-11361 and adds the code that actually talks to containers. The upper layer can simply view the storage as a cache with a simple put and get interface, while in the backend the get and put actually talk to containers. This is critical to CBlock performance. [~anu] is the author who contributed most of this part. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11535) Performance analysis of new DFSNetworkTopology#chooseRandom
Chen Liang created HDFS-11535: - Summary: Performance analysis of new DFSNetworkTopology#chooseRandom Key: HDFS-11535 URL: https://issues.apache.org/jira/browse/HDFS-11535 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Chen Liang Assignee: Chen Liang Attachments: PerfTest.pdf This JIRA is created to post the results of some performance experiments we did. For those who are interested, please see the attached .pdf file for more detail. The attached patch file includes the experiment code we ran. The key insight we got from these tests is that although *the new method outperforms the current one in most cases*, there is still *one case where the current one is better*: when there is only one storage type in the cluster, and we always look for this storage type. In this case, it is simply a waste of time to perform storage-type-based pruning; blindly picking a random node (the current method) would suffice. Therefore, based on the analysis, we propose to use a *combination of both the old and the new methods*: say we search for a node of type X; since inner nodes now all keep storage type info, we can *just check the root node to see if X is the only type it has*. If yes, blindly picking a random leaf will work, so we simply call the old method; otherwise we call the new method. There is still at least one missing piece in this performance test, which is garbage collection. The new method does a few more object creations when doing the search, which adds overhead to GC. I'm still thinking of potential optimizations, but this seems tricky, and I'm not sure whether this optimization is worth doing at all. Please feel free to leave any comments/suggestions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11514) ChooseRandom can potentially be optimized
Chen Liang created HDFS-11514: - Summary: ChooseRandom can potentially be optimized Key: HDFS-11514 URL: https://issues.apache.org/jira/browse/HDFS-11514 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Based on the offline discussion, one potential improvement to the {{chooseRandomWithStorageType}} method added in HDFS-11482 is that, currently, given a node, the method iterates over all its children to sum up the number of candidate datanodes. Since datanode status changes are much less frequent than block placement requests, it would be more efficient to get rid of this iteration, probably by maintaining another disk type counter map. This JIRA tracks (but is not limited to) this optimization. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
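The bookkeeping suggested in HDFS-11514, as a self-contained sketch (illustrative only; the real implementation would hang off the topology's inner nodes): keep per-storage-type counts that are updated on membership changes, so the candidate count becomes a map lookup rather than an iteration over children.
{code}
import java.util.EnumMap;
import java.util.Map;

class StorageTypeCountSketch {
  enum StorageType { DISK, SSD, ARCHIVE, RAM_DISK }  // mirrors the HDFS storage types

  private final Map<StorageType, Integer> counts = new EnumMap<>(StorageType.class);

  void onDatanodeAdded(StorageType t)   { counts.merge(t, 1, Integer::sum); }
  void onDatanodeRemoved(StorageType t) { counts.merge(t, -1, Integer::sum); }
  int candidateCount(StorageType t)     { return counts.getOrDefault(t, 0); }
}
{code}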
[jira] [Resolved] (HDFS-11507) NetworkTopology#chooseRandom may run into a dead loop due to race condition
[ https://issues.apache.org/jira/browse/HDFS-11507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang resolved HDFS-11507. --- Resolution: Not A Problem > NetworkTopology#chooseRandom may run into a dead loop due to race condition > --- > > Key: HDFS-11507 > URL: https://issues.apache.org/jira/browse/HDFS-11507 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Chen Liang >Assignee: Chen Liang > > {{NetworkTopology#chooseRandom()}} works as: > 1. counts the number of available nodes as {{availableNodes}}, > 2. checks how many nodes are excluded, deduct from {{availableNodes}} > 3. if {{availableNodes}} still > 0, then there are nodes available. > 4. keep looping to find that node > But now imagine, in the meantime, the actually available nodes got removed in > step 3 or step 4, and all remaining nodes are excluded nodes. Then, although > there are no more nodes actually available, the code would still run as > {{availableNodes}} > 0, and then it would keep getting excluded node and loop > forever, as > {{if (excludedNodes == null || !excludedNodes.contains(ret))}} > will always be false. > We may fix this by expanding the while loop to also include the > {{availableNodes}} calculation. Such that we re-calculate {{availableNodes}} > every time it fails to find an available node. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org