[jira] [Resolved] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-14305.
       Resolution: Fixed

> Serial number in BlockTokenSecretManager could overlap between different namenodes
>
>                 Key: HDFS-14305
>                 URL: https://issues.apache.org/jira/browse/HDFS-14305
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, security
>            Reporter: Chao Sun
>            Assignee: Konstantin Shvachko
>            Priority: Major
>              Labels: multi-sbnn
>         Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and {{nnIndex}} is the index of the current NameNode specified in the configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNodes could have overlapping ranges for the serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number can be negative, and the remainder of a negative number is itself negative.
> Moreover, when the keys are updated, the serial number will again be updated with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could fall into a range that belongs to a different NameNode, thus increasing the chance of collision again.
> When a collision happens, DataNodes could overwrite an existing key, which will cause clients to fail with an {{InvalidToken}} error.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
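Editorial illustration of the overlap described in HDFS-14305: a minimal sketch that applies the quoted formula with the toy maximum of 100 used in the description. The class name `SerialRangeDemo` and the method `rotate` are hypothetical and are not part of `BlockTokenSecretManager`.

```java
/** Sketch of the serial-number arithmetic quoted above; not the Hadoop code. */
final class SerialRangeDemo {

  /**
   * Applies the quoted formula, with {@code max} standing in for
   * Integer.MAX_VALUE so that the toy numbers from the description can be used.
   */
  static int rotate(int serialNo, int max, int numNNs, int nnIndex) {
    int intRange = max / numNNs;
    int nnRangeStart = intRange * nnIndex;
    // Java's % keeps the sign of the dividend, so a negative serialNo
    // yields a negative remainder and the result can fall below nnRangeStart.
    return (serialNo % intRange) + nnRangeStart;
  }

  public static void main(String[] args) {
    // Two NameNodes, pretend maximum of 100: intRange is 50.
    System.out.println("nn1 low:  " + rotate(-49, 100, 2, 0));
    System.out.println("nn1 high: " + rotate(49, 100, 2, 0));
    System.out.println("nn2 low:  " + rotate(-49, 100, 2, 1));
    System.out.println("nn2 high: " + rotate(49, 100, 2, 1));
    // Different starting values on different NameNodes can map to the same number.
    System.out.println("collision: " + (rotate(30, 100, 2, 0) == rotate(-20, 100, 2, 1)));
  }
}
```

With these inputs both NameNodes can produce any value from 1 to 49, which is the shared part of the two ranges listed in the description.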
[jira] [Resolved] (HDFS-7612) TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir
[ https://issues.apache.org/jira/browse/HDFS-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-7612.
    Fix Version/s: 3.2.4
                   3.3.2
                   2.10.2
                   3.4.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

I just committed this to the four active branches. Congratulations [~mkuchenbecker]!

> TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir
>
>                 Key: HDFS-7612
>                 URL: https://issues.apache.org/jira/browse/HDFS-7612
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.6.0
>            Reporter: Konstantin Shvachko
>            Assignee: Michael Kuchenbecker
>            Priority: Major
>              Labels: newbie, pull-request-available
>             Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> final String cacheDir = System.getProperty("test.cache.data", "build/test/cache");
> {code}
> results in
> {{FileNotFoundException: build/test/cache/editsStoredParsed.xml (No such file or directory)}}
> when {{test.cache.data}} is not set.
> I can see this failing while running in Eclipse.
[jira] [Resolved] (HDFS-16141) [FGL] Address permission related issues with File / Directory
[ https://issues.apache.org/jira/browse/HDFS-16141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-16141.
    Fix Version/s: Fine-Grained Locking
     Hadoop Flags: Reviewed
       Resolution: Fixed

I just committed this to the fgl branch. Thank you [~prasad-acit].

> [FGL] Address permission related issues with File / Directory
>
>                 Key: HDFS-16141
>                 URL: https://issues.apache.org/jira/browse/HDFS-16141
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Renukaprasad C
>            Assignee: Renukaprasad C
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: Fine-Grained Locking
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> After the FGL implementation (MKDIR & Create File), some existing unit tests are impacted and need to be addressed.
> Failed Tests:
> TestDFSPermission
> TestPermission
> TestFileCreation
> TestDFSMkdirs (Added tests)
[jira] [Resolved] (HDFS-16130) [FGL] Implement Create File with FGL
[ https://issues.apache.org/jira/browse/HDFS-16130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-16130.
     Hadoop Flags: Reviewed
       Resolution: Fixed

I just committed this. Fixed a few checkstyle warnings. Thank you [~prasad-acit].

> [FGL] Implement Create File with FGL
>
>                 Key: HDFS-16130
>                 URL: https://issues.apache.org/jira/browse/HDFS-16130
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: Fine-Grained Locking
>            Reporter: Renukaprasad C
>            Assignee: Renukaprasad C
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Implement FGL for Create File.
> The Create API acquires the global lock at multiple stages. Acquire the respective partitioned lock and continue the create operation.
[jira] [Resolved] (HDFS-16128) [FGL] Add support for saving/loading an FS Image for PartitionedGSet
[ https://issues.apache.org/jira/browse/HDFS-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-16128.
    Fix Version/s: Fine-Grained Locking
     Hadoop Flags: Reviewed
       Resolution: Fixed

I just committed this. Thank you [~xinglin].

> [FGL] Add support for saving/loading an FS Image for PartitionedGSet
>
>                 Key: HDFS-16128
>                 URL: https://issues.apache.org/jira/browse/HDFS-16128
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs, namenode
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: Fine-Grained Locking
>
> Add support for saving Inodes stored in PartitionedGSet when saving an FS image, and for loading Inodes into PartitionedGSet from a saved FS image.
> h1. Saving FSImage
> *Original HDFS design*: iterate over every inode in inodeMap and save it into the FSImage file.
> *FGL*: no change is needed here, since PartitionedGSet also provides an iterator interface to iterate over the inodes stored in its partitions.
> h1. Loading an HDFS
> *Original HDFS design*: it first loads the FSImage files and then loads edit logs for recent changes. FSImage files contain different sections, including INodeSections and INodeDirectorySections. An InodeSection contains serialized Inode objects and the INodeDirectorySection contains the parent inode for an Inode. When loading an FSImage, the system first loads the INodeSections and then loads the INodeDirectorySections, to set the parent inode for each inode. After the FSImage files are loaded, the edit logs are loaded. An edit log contains recent changes to the filesystem, including Inode creation/deletion. For a newly created INode, the parent inode is set before it is added to the inodeMap.
> *FGL*: when adding an Inode into the partitionedGSet, we need the parent inode of that inode in order to determine which partition to store it in, when NAMESPACE_KEY_DEPTH = 2. Thus, in FGL, when loading FSImage files, we use a temporary LightweightGSet (inodeMapTemp) to store inodes. When LoadFSImage is done, the parent inode for all existing inodes in the FSImage files is set. We can now move the inodes into a partitionedGSet. Loading edit logs can work as usual, as the parent inode for an inode is set before it is added to the inodeMap.
> In theory, PartitionedGSet can store inodes whose parent inodes are not set. All such inodes would be stored in the 0th partition. However, we decided to use a temporary LightweightGSet (inodeMapTemp) to store these inodes, to make this case more transparent.
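Editorial illustration of the two-phase load described in HDFS-16128: a minimal sketch, assuming a simplified model in which the partition is derived from the parent's id. The names `TwoPhaseLoadSketch`, `Node`, `partitionOf` and `load` are hypothetical, plain `HashMap`s stand in for LightweightGSet and PartitionedGSet, and the partition rule is invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: stage inodes in a temporary map, then partition once parents are known. */
final class TwoPhaseLoadSketch {

  /** Hypothetical stand-in for an inode; only the id and the parent matter here. */
  static final class Node {
    final long id;
    Node parent; // stays null until the directory section has been processed
    Node(long id) { this.id = id; }
  }

  /** Hypothetical partition rule: keyed by parent id, partition 0 when no parent is set. */
  static int partitionOf(Node n, int numPartitions) {
    return n.parent == null ? 0 : (int) (n.parent.id % numPartitions);
  }

  /**
   * Phase 1 fills a temporary map and wires up parents (childToParent maps
   * child id to parent id). Phase 2 moves every node into its partition.
   */
  static List<Map<Long, Node>> load(long[] ids, Map<Long, Long> childToParent,
                                    int numPartitions) {
    Map<Long, Node> temp = new HashMap<>(); // plays the role of inodeMapTemp
    for (long id : ids) {
      temp.put(id, new Node(id));
    }
    for (Map.Entry<Long, Long> e : childToParent.entrySet()) {
      temp.get(e.getKey()).parent = temp.get(e.getValue());
    }
    List<Map<Long, Node>> partitions = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      partitions.add(new HashMap<>());
    }
    for (Node n : temp.values()) {
      partitions.get(partitionOf(n, numPartitions)).put(n.id, n);
    }
    return partitions;
  }
}
```

Partitioning during phase 1 would have placed every node in partition 0, because no parent is known at that point; deferring the move avoids having to relocate nodes afterwards.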
[jira] [Resolved] (HDFS-16125) [FGL] Fix the iterator for PartitionedGSet
[ https://issues.apache.org/jira/browse/HDFS-16125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-16125.
    Fix Version/s: Fine-Grained Locking
     Hadoop Flags: Reviewed
       Resolution: Fixed

+1 on the latest patch. I just committed this to branch fgl, also re-based fgl to current trunk. Thank you [~xinglin].

> [FGL] Fix the iterator for PartitionedGSet
>
>                 Key: HDFS-16125
>                 URL: https://issues.apache.org/jira/browse/HDFS-16125
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs, namenode
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: Fine-Grained Locking
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The iterator in PartitionedGSet would visit the first partition twice, since we did not set the keyIterator to move to the first key during initialization.
>
> This is related to fgl: https://issues.apache.org/jira/browse/HDFS-14703
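Editorial illustration of the iterator issue in HDFS-16125: a minimal sketch of a nested iterator over partitions, assuming a `TreeMap` of lists as a stand-in for the partitions. The class `PartitionIteratorSketch` is hypothetical and is not the PartitionedGSet code. A double visit of the kind described can arise when the element iterator is seeded from the first partition without advancing the key iterator; the constructor below takes the first partition from the key iterator itself.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Sketch only: iterates over all values of all partitions, in key order. */
final class PartitionIteratorSketch<K extends Comparable<K>, V> implements Iterator<V> {
  private final TreeMap<K, List<V>> partitions;
  private final Iterator<K> keyIterator;
  private Iterator<V> current;

  PartitionIteratorSketch(TreeMap<K, List<V>> partitions) {
    this.partitions = partitions;
    this.keyIterator = partitions.keySet().iterator();
    // Consume the first key here, so that keyIterator is already positioned
    // past the first partition when hasNext() later moves to the next one.
    this.current = keyIterator.hasNext()
        ? partitions.get(keyIterator.next()).iterator()
        : Collections.<V>emptyIterator();
  }

  @Override
  public boolean hasNext() {
    // Skip over exhausted or empty partitions.
    while (!current.hasNext() && keyIterator.hasNext()) {
      current = partitions.get(keyIterator.next()).iterator();
    }
    return current.hasNext();
  }

  @Override
  public V next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }
}
```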
[jira] [Created] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes
Konstantin Shvachko created HDFS-16001:

             Summary: TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes
                 Key: HDFS-16001
                 URL: https://issues.apache.org/jira/browse/HDFS-16001
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
            Reporter: Konstantin Shvachko
[jira] [Created] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
Konstantin Shvachko created HDFS-15915:

             Summary: Race condition with async edits logging due to updating txId outside of the namesystem log
                 Key: HDFS-15915
                 URL: https://issues.apache.org/jira/browse/HDFS-15915
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs, namenode
            Reporter: Konstantin Shvachko

{{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the edits op, remains unset until the operation is scheduled for syncing. At that time {{beginTransaction()}} will set the {{FSEditLogOp.txid}} and increment the global transaction count. On a busy NameNode this event can fall outside the write lock. This causes problems for Observer reads. It can also potentially reshuffle transactions, so that the Standby applies them in the wrong order.
[jira] [Created] (HDFS-15849) ExpiredHeartbeats metric should be of Type.COUNTER
Konstantin Shvachko created HDFS-15849:

             Summary: ExpiredHeartbeats metric should be of Type.COUNTER
                 Key: HDFS-15849
                 URL: https://issues.apache.org/jira/browse/HDFS-15849
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: metrics
            Reporter: Konstantin Shvachko

Currently the {{ExpiredHeartbeats}} metric has the default type, which makes it {{Type.GAUGE}}. It should be {{Type.COUNTER}} for proper graphing. See discussion in HDFS-15808.
[jira] [Resolved] (HDFS-15632) AbstractContractDeleteTest should set recursive parameter to true for recursive test cases.
[ https://issues.apache.org/jira/browse/HDFS-15632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-15632.
    Fix Version/s: 3.2.3
                   2.10.2
                   3.1.5
                   3.4.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

I just committed this. Thank you [~antn.kutuzov] for contributing.

> AbstractContractDeleteTest should set recursive parameter to true for recursive test cases.
>
>                 Key: HDFS-15632
>                 URL: https://issues.apache.org/jira/browse/HDFS-15632
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.10.0
>            Reporter: Konstantin Shvachko
>            Assignee: Anton Kutuzov
>            Priority: Major
>              Labels: newbie, pull-request-available
>             Fix For: 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{AbstractContractDeleteTest.testDeleteNonexistentPathRecursive()}} should call {{delete(path, true)}} rather than {{false}}.
> Also, {{AbstractContractDeleteTest.testDeleteNonexistentPathNonRecursive()}} has a wrong assert message. It should be {{"... attempting to non-recursively delete ..."}}
[jira] [Resolved] (HDFS-954) There are two security packages in hdfs, should be one
[ https://issues.apache.org/jira/browse/HDFS-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-954.
       Resolution: Won't Fix

Hey [~antn.kutuzov], this is a rather old jira. I don't think it is a good idea to do repackaging at this point since it will make things harder to backport to older versions. Closing as won't fix.

> There are two security packages in hdfs, should be one
>
>                 Key: HDFS-954
>                 URL: https://issues.apache.org/jira/browse/HDFS-954
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Jakob Homan
>            Priority: Major
>              Labels: newbie
>
> Currently the test source tree has both
> src/test/hdfs/org/apache/hadoop/hdfs/security with:
> SecurityTestUtil.java
> TestAccessToken.java
> TestClientProtocolWithDelegationToken.java
> and
> src/test/hdfs/org/apache/hadoop/security with:
> TestDelegationToken.java
> TestGroupMappingServiceRefresh.java
> TestPermission.java
> These should be combined into one package and possibly some things moved to common.
[jira] [Created] (HDFS-15751) Add documentation for msync() API to filesystem.md
Konstantin Shvachko created HDFS-15751:

             Summary: Add documentation for msync() API to filesystem.md
                 Key: HDFS-15751
                 URL: https://issues.apache.org/jira/browse/HDFS-15751
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: documentation
            Reporter: Konstantin Shvachko
            Assignee: Konstantin Shvachko

HDFS-15567 introduced a new {{FileSystem}} call, {{msync()}}. We should add it to the API definitions.
[jira] [Resolved] (HDFS-15623) Respect configured values of rpc.engine
[ https://issues.apache.org/jira/browse/HDFS-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-15623.
    Fix Version/s: 3.2.3
                   2.10.2
                   3.1.5
                   3.4.0
                   3.3.1
     Hadoop Flags: Reviewed
       Resolution: Fixed

I just committed to trunk and branches 3.3, 3.2, 3.1, 2.10. Thank you [~hchaverri]

> Respect configured values of rpc.engine
>
>                 Key: HDFS-15623
>                 URL: https://issues.apache.org/jira/browse/HDFS-15623
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Hector Sandoval Chaverri
>            Assignee: Hector Sandoval Chaverri
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The HDFS Configuration allows users to specify the RPCEngine implementation to use when communicating with Datanodes and Namenodes. However, the value is overwritten to ProtobufRpcEngine.class in different classes. As an example, in NameNodeRpcServer:
> {{RPC.setProtocolEngine(conf, ClientNamenodeProtocolPB.class, ProtobufRpcEngine.class);}}
> The configured value of {{rpc.engine.[protocolName]}} should be respected to allow for other implementations of RPCEngine to be used.
[jira] [Created] (HDFS-15665) Balancer logging improvement
Konstantin Shvachko created HDFS-15665:

             Summary: Balancer logging improvement
                 Key: HDFS-15665
                 URL: https://issues.apache.org/jira/browse/HDFS-15665
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: balancer mover
            Reporter: Konstantin Shvachko
            Assignee: Konstantin Shvachko

It would be good to have Balancer log all relevant configuration parameters on each iteration, along with some data that reflects its progress and the amount of resources it involves.
[jira] [Created] (HDFS-15632) AbstractContractDeleteTest should set recursive parameter to true for recursive test cases.
Konstantin Shvachko created HDFS-15632:

             Summary: AbstractContractDeleteTest should set recursive parameter to true for recursive test cases.
                 Key: HDFS-15632
                 URL: https://issues.apache.org/jira/browse/HDFS-15632
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.10.0
            Reporter: Konstantin Shvachko

{{AbstractContractDeleteTest.testDeleteNonexistentPathRecursive()}} should call {{delete(path, true)}} rather than {{false}}.
Also, {{AbstractContractDeleteTest.testDeleteNonexistentPathNonRecursive()}} has a wrong assert message. It should be {{"... attempting to non-recursively delete ..."}}
[jira] [Created] (HDFS-15567) [SBN Read] HDFS should expose msync() API to allow downstream applications to call it explicitly.
Konstantin Shvachko created HDFS-15567:

             Summary: [SBN Read] HDFS should expose msync() API to allow downstream applications to call it explicitly.
                 Key: HDFS-15567
                 URL: https://issues.apache.org/jira/browse/HDFS-15567
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: ha, hdfs-client
            Reporter: Konstantin Shvachko

Consistent reads from Standby introduced the {{msync()}} API (HDFS-13688), which updates the client's state ID with the current state of the Active NameNode to guarantee consistency of subsequent calls to an ObserverNode. Currently this API is exposed via {{DFSClient}} only, which makes it hard for applications to access {{msync()}}. One way is to use something like this:
{code}
if (fs instanceof DistributedFileSystem) {
  ((DistributedFileSystem) fs).getClient().msync();
}
{code}
This should be exposed both for {{FileSystem}} and {{FileContext}}.
[jira] [Created] (HDFS-15323) StandbyNode fails transition to active due to insufficient transaction tailing
Konstantin Shvachko created HDFS-15323:

             Summary: StandbyNode fails transition to active due to insufficient transaction tailing
                 Key: HDFS-15323
                 URL: https://issues.apache.org/jira/browse/HDFS-15323
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode, qjm
    Affects Versions: 2.7.7
            Reporter: Konstantin Shvachko

StandbyNode is asked to {{transitionToActive()}}. If it has fallen too far behind in tailing journal transactions (from QJM), it can crash with {{IllegalStateException}}.
[jira] [Created] (HDFS-15291) [SBN read] Implement CyclicalBlockingQueue to avoid requeuing RPC calls on Observer.
Konstantin Shvachko created HDFS-15291:

             Summary: [SBN read] Implement CyclicalBlockingQueue to avoid requeuing RPC calls on Observer.
                 Key: HDFS-15291
                 URL: https://issues.apache.org/jira/browse/HDFS-15291
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode
            Reporter: Konstantin Shvachko

The RPC queue is currently based on {{LinkedBlockingQueue}}, which is FIFO. For Observer we delay execution of a call if its lastSeenStateId is larger than the stateId of the Observer. The delay is implemented as re-queuing the call to the end of the queue. Re-queuing is not atomic. We can avoid moving elements in the queue by replacing {{LinkedBlockingQueue}} with a {{CyclicalBlockingQueue}}, so that instead of re-queuing we just move the head of the queue and the call automatically becomes the last.
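Editorial illustration of the idea proposed in HDFS-15291: a minimal single-threaded sketch of a cyclic queue in which postponing the head is a pointer move rather than a remove followed by an add. The class `CyclicQueueSketch` and its methods are hypothetical; the issue does not show an implementation, and a real `CyclicalBlockingQueue` would also need blocking behavior and thread safety, which are omitted here.

```java
/** Sketch only: circular singly linked list where tail.next is the head. */
final class CyclicQueueSketch<E> {

  private static final class Node<E> {
    final E item;
    Node<E> next;
    Node(E item) { this.item = item; }
  }

  private Node<E> tail; // null when the queue is empty
  private int size;

  /** Appends an element at the end of the queue. */
  void add(E item) {
    Node<E> n = new Node<>(item);
    if (tail == null) {
      n.next = n; // a single node points to itself
    } else {
      n.next = tail.next; // new node points to the current head
      tail.next = n;
    }
    tail = n;
    size++;
  }

  /** Removes and returns the head, or null if the queue is empty. */
  E poll() {
    if (tail == null) {
      return null;
    }
    Node<E> head = tail.next;
    if (head == tail) {
      tail = null;
    } else {
      tail.next = head.next;
    }
    size--;
    return head.item;
  }

  /** Postpones the head: it becomes the last element without being moved. */
  void rotate() {
    if (tail != null) {
      tail = tail.next;
    }
  }

  int size() {
    return size;
  }
}
```

For example, after adding `a`, `b`, `c` and calling `rotate()` once, successive `poll()` calls return `b`, `c`, `a`. The rotation is a single reference assignment, so there is no intermediate state in which the postponed call is absent from the queue.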
[jira] [Created] (HDFS-15290) NPE in HttpServer during NameNode startup
Konstantin Shvachko created HDFS-15290:

             Summary: NPE in HttpServer during NameNode startup
                 Key: HDFS-15290
                 URL: https://issues.apache.org/jira/browse/HDFS-15290
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.7.8
            Reporter: Konstantin Shvachko

When NameNode starts, it first starts HttpServer, then starts loading fsImage and edits. While loading, the namesystem field in NameNode is null. I saw that a StandbyNode sends a checkpoint request, which fails with NPE because NNStorage is not instantiated yet. We should check the NameNode startup status before accepting checkpoint requests.
[jira] [Created] (HDFS-15185) StartupProgress reports edits segments until the entire startup completes
Konstantin Shvachko created HDFS-15185:

             Summary: StartupProgress reports edits segments until the entire startup completes
                 Key: HDFS-15185
                 URL: https://issues.apache.org/jira/browse/HDFS-15185
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.10.0
            Reporter: Konstantin Shvachko
            Assignee: Konstantin Shvachko

The Startup Progress page keeps reporting edits segments after the {{LOAD_EDITS}} stage is complete. New steps are added to StartupProgress during journal tailing until all startup phases are completed. This adds a lot of edits steps, since the {{SAFEMODE}} phase can take a long time on a large cluster. With fast tailing the segments are small, but the number of them is large - 160K. This makes the page load forever.
[jira] [Created] (HDFS-15166) Remove redundant field fStream in ByteStringLog
Konstantin Shvachko created HDFS-15166:

             Summary: Remove redundant field fStream in ByteStringLog
                 Key: HDFS-15166
                 URL: https://issues.apache.org/jira/browse/HDFS-15166
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.10.0
            Reporter: Konstantin Shvachko

{{ByteStringLog.fStream}} is only used in the {{init()}} method and can be replaced by a local variable.
[jira] [Created] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.
Konstantin Shvachko created HDFS-15118:

             Summary: [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.
                 Key: HDFS-15118
                 URL: https://issues.apache.org/jira/browse/HDFS-15118
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.10.0
            Reporter: Konstantin Shvachko

We see substantial degradation in the performance of HDFS clients when Observer reads are enabled via {{ObserverReadProxyProvider}} but there are no ObserverNodes on the cluster.
[jira] [Created] (HDFS-15111) start / stopStandbyServices() should log which service it is transitioning to/from.
Konstantin Shvachko created HDFS-15111:

             Summary: start / stopStandbyServices() should log which service it is transitioning to/from.
                 Key: HDFS-15111
                 URL: https://issues.apache.org/jira/browse/HDFS-15111
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs, logging
    Affects Versions: 2.10.0
            Reporter: Konstantin Shvachko

Trying to transition Observer to Standby state. Both {{stopStandbyServices()}} and {{startStandbyServices()}} log that they are stopping/starting Standby services.
# {{startStandbyServices()}} should log which state it is transitioning TO.
# {{stopStandbyServices()}} should log which state it is transitioning FROM.
[jira] [Created] (HDFS-15099) [SBN Read] getBlockLocations() should throw ObserverRetryOnActiveException on an attempt to change aTime on ObserverNode
Konstantin Shvachko created HDFS-15099:

             Summary: [SBN Read] getBlockLocations() should throw ObserverRetryOnActiveException on an attempt to change aTime on ObserverNode
                 Key: HDFS-15099
                 URL: https://issues.apache.org/jira/browse/HDFS-15099
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.10.0
            Reporter: Konstantin Shvachko

The precision of updating an INode's aTime while executing {{getBlockLocations()}} is 1 hour by default. Updates cannot be handled by ObserverNode, so the call should be redirected to the Active NameNode. In order to redirect to Active, the ObserverNode should throw {{ObserverRetryOnActiveException}}.
[jira] [Created] (HDFS-15076) Fix tests that hold FSDirectory lock, without holding FSNamesystem lock.
Konstantin Shvachko created HDFS-15076:

             Summary: Fix tests that hold FSDirectory lock, without holding FSNamesystem lock.
                 Key: HDFS-15076
                 URL: https://issues.apache.org/jira/browse/HDFS-15076
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: test
            Reporter: Konstantin Shvachko

Three tests, {{TestGetBlockLocations}}, {{TestFSNamesystem}}, and {{TestDiskspaceQuotaUpdate}}, use {{FSDirectory}} methods, which hold the FSDirectory lock. They should also hold the global Namesystem lock.
[jira] [Created] (HDFS-15037) Encryption Zone operations should not block other RPC calls while retrieving encryption keys.
Konstantin Shvachko created HDFS-15037:

             Summary: Encryption Zone operations should not block other RPC calls while retrieving encryption keys.
                 Key: HDFS-15037
                 URL: https://issues.apache.org/jira/browse/HDFS-15037
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: encryption, namenode
    Affects Versions: 2.10.0
            Reporter: Konstantin Shvachko

I believe the intention was to avoid blocking other operations while keys are retrieved under {{FSDirectory.dirLock}}. But in reality all other operations first enter {{FSNamesystemLock}}, then {{dirLock}}. So they are all blocked waiting for the key. We see a substantial increase in RPC wait time ({{RpcQueueTimeAvgTime}}) on NameNode when encryption operations are intermixed with regular workloads.
[jira] [Created] (HDFS-15036) Active NameNode should not silently fail the image transfer
Konstantin Shvachko created HDFS-15036:

             Summary: Active NameNode should not silently fail the image transfer
                 Key: HDFS-15036
                 URL: https://issues.apache.org/jira/browse/HDFS-15036
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.10.0
            Reporter: Konstantin Shvachko

Image transfer from Standby NameNode to Active silently fails on Active, without any logging and without notifying the receiver side.
[jira] [Created] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
Konstantin Shvachko created HDFS-15017:

             Summary: Remove redundant import of AtomicBoolean in NameNodeConnector.
                 Key: HDFS-15017
                 URL: https://issues.apache.org/jira/browse/HDFS-15017
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: balancer mover, hdfs
    Affects Versions: 2.10.0
            Reporter: Konstantin Shvachko

Should remove redundant import. Looks like it is specific to branch 2.10. Trunk and 3x branches don't have it.
[jira] [Created] (HDFS-15004) Refactor TestBalancer for faster execution.
Konstantin Shvachko created HDFS-15004:

             Summary: Refactor TestBalancer for faster execution.
                 Key: HDFS-15004
                 URL: https://issues.apache.org/jira/browse/HDFS-15004
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs, test
    Affects Versions: 2.10.0
            Reporter: Konstantin Shvachko

{{TestBalancer}} is a big test by itself, and it is also a part of many other tests. Running these tests involves spinning up a {{MiniDFSCluster}} and shutting it down for every test case, which is inefficient. Many of the test cases can run using the same instance of {{MiniDFSCluster}}, but not all of them. It would be good to refactor the tests to optimize their running time.
[jira] [Resolved] (HDFS-14792) [SBN read] StandbyNode does not come out of safemode while adding new blocks.
[ https://issues.apache.org/jira/browse/HDFS-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-14792.
    Fix Version/s: 2.10.1
       Resolution: Fixed

This turned out to be related to the same race condition between edits {{OP_ADD_BLOCK}} and IBRs of HDFS-14941. We do not see any delays in leaving safemode on StandbyNode after the HDFS-14941 fix. Closing this as fixed.

> [SBN read] StandbyNode does not come out of safemode while adding new blocks.
>
>                 Key: HDFS-14792
>                 URL: https://issues.apache.org/jira/browse/HDFS-14792
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.10.0
>            Reporter: Konstantin Shvachko
>            Priority: Major
>             Fix For: 2.10.1
>
> During startup StandbyNode reports that it needs additional X blocks to reach the threshold 1.. Where X is changing up and down.
> This is because with fast tailing SBN adds new blocks from edits while DNs have not reported replicas yet. Being in SafeMode, SBN counts new blocks towards the threshold and can stay in SafeMode for a long time.
> By design, the purpose of startup SafeMode is to disallow modifications of the namespace and blocks map until all DN replicas are reported.
[jira] [Resolved] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-12943. Fix Version/s: 2.10.0 3.2.2 3.1.4 3.3.0 Hadoop Flags: Reviewed Release Note: Observer is a new type of NameNode in addition to Active and Standby in HA settings. An Observer Node maintains a replica of the namespace, the same as a Standby Node. It additionally allows execution of client read requests. To ensure read-after-write consistency within a single client, a state ID is introduced in RPC headers. The Observer responds to the client request only after its own state has caught up with the client’s state ID, which it previously received from the Active NameNode. Clients can explicitly invoke a new client protocol call msync(), which ensures that subsequent reads by this client from an Observer are consistent. A new client-side ObserverReadProxyProvider is introduced to provide automatic switching between Active and Observer NameNodes for submitting write and read requests, respectively. Resolution: Fixed Closing this as Fixed. The feature has been tested, back-ported down to 2.10 and released. A few remaining subtasks are being addressed as usual issues. Added release notes. Please review if I missed anything. _Thank you everybody for contributing to this effort._ > Consistent Reads from Standby Node > -- > > Key: HDFS-12943 > URL: https://issues.apache.org/jira/browse/HDFS-12943 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.0 > > Attachments: ConsistentReadsFromStandbyNode.pdf, > ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, > HDFS-12943-002.patch, HDFS-12943-003.patch, HDFS-12943-004.patch, > TestPlan-ConsistentReadsFromStandbyNode.pdf > > > StandbyNode in HDFS is a replica of the active NameNode. The states of the > NameNodes are coordinated via the journal. 
It is natural to consider > StandbyNode as a read-only replica. As with any replicated distributed system > the problem of stale reads should be resolved. Our main goal is to provide > reads from standby in a consistent way in order to enable a wide range of > existing applications running on top of HDFS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
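The state ID rule in the HDFS-12943 release note can be pictured with a small stand-alone sketch. This is not the HDFS RPC code: `StateIdSketch`, `Node`, `Client` and `canServeRead` are hypothetical names, and the sketch only models the comparison of state IDs, not how a real Observer holds a call until it has caught up.

```java
// Illustrative sketch only; hypothetical classes, not the HDFS implementation.
final class StateIdSketch {
  /** Minimal node model: a transaction ID that advances as edits are applied. */
  static final class Node {
    long lastAppliedTxId;

    void applyEditsUpTo(long txId) {
      lastAppliedTxId = Math.max(lastAppliedTxId, txId);
    }
  }

  /** The client remembers the highest state ID it has seen in a response. */
  static final class Client {
    long lastSeenStateId;

    void write(Node active) {
      active.lastAppliedTxId++;                  // the write creates a new transaction
      lastSeenStateId = active.lastAppliedTxId;  // state ID returned with the response
    }
  }

  /** An Observer may answer a read only once it has caught up with the client. */
  static boolean canServeRead(Node observer, Client client) {
    return observer.lastAppliedTxId >= client.lastSeenStateId;
  }

  public static void main(String[] args) {
    Node active = new Node();
    Node observer = new Node();
    Client client = new Client();

    client.write(active);
    System.out.println("before tailing: " + canServeRead(observer, client));
    observer.applyEditsUpTo(active.lastAppliedTxId);
    System.out.println("after tailing: " + canServeRead(observer, client));
  }
}
```

In this model a read that arrives before the Observer has tailed the client's own write is not served yet, which is the read-after-write guarantee the release note describes for a single client.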
[jira] [Resolved] (HDFS-14443) Throwing RemoteException in the time of Read Operation
[ https://issues.apache.org/jira/browse/HDFS-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-14443. Resolution: Not A Problem Resolving as not a problem. Please reopen if it is. > Throwing RemoteException in the time of Read Operation > -- > > Key: HDFS-14443 > URL: https://issues.apache.org/jira/browse/HDFS-14443 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ranith Sardar >Priority: Major > > 2019-04-19 20:54:59,178 DEBUG > org.apache.hadoop.io.retry.RetryInvocationHandler: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): > Operation category WRITE is not supported in state observer. Visit > [https://s.apache.org/sbnn-error] > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1990) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1443) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1372) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1929) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at 
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2791) > , while invoking $Proxy5.getFileInfo over > [host-*-*-*-*/*.*.*.*:6*5,host-*-*-*-*/*.*.*.*:**,host-*-*-*-*/*.*.*.*:6**5]. > Trying to failover immediately. > > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): > Operation category WRITE is not supported in state observer. Visit > [https://s.apache.org/sbnn-error] > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-14020) Emulate Observer node falling far behind the Active
[ https://issues.apache.org/jira/browse/HDFS-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-14020. Resolution: Duplicate Resolving as duplicate since HDFS-13873 introduced {{testObserverFallBehind()}} in {{TestMultiObserverNode}}, which serves the purpose. This has also been already tested on live clusters. > Emulate Observer node falling far behind the Active > --- > > Key: HDFS-14020 > URL: https://issues.apache.org/jira/browse/HDFS-14020 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Sherwood Zheng >Assignee: Sherwood Zheng >Priority: Major > > Emulate Observer node falling far behind the Active. Ensure readers switch > over > to another Observer instead of waiting for the lagging Observer to catch up. > If > there is only a single Observer, it should fall back to the Active. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reopened HDFS-14305: Reopening this. I think we should revert it before it gets into a release and becomes a liability by causing an incompatible change. > Serial number in BlockTokenSecretManager could overlap between different > namenodes > -- > > Key: HDFS-14305 > URL: https://issues.apache.org/jira/browse/HDFS-14305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, security >Reporter: Chao Sun >Assignee: Xiaoqiao He >Priority: Major > Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-14305.001.patch, HDFS-14305.002.patch, > HDFS-14305.003.patch, HDFS-14305.004.patch, HDFS-14305.005.patch, > HDFS-14305.006.patch > > > Currently, a {{BlockTokenSecretManager}} starts with a random integer as the > initial serial number, and then uses this formula to rotate it: > {code:java} > this.intRange = Integer.MAX_VALUE / numNNs; > this.nnRangeStart = intRange * nnIndex; > this.serialNo = (this.serialNo % intRange) + (nnRangeStart); > {code} > where {{numNNs}} is the total number of NameNodes in the cluster, and > {{nnIndex}} is the index of the current NameNode specified in the > configuration {{dfs.ha.namenodes.}}. > However, with this approach, different NameNodes could have overlapping ranges > for the serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, > and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges > for these two are: > {code} > nn1 -> [-49, 49] > nn2 -> [1, 99] > {code} > This is because the initial serial number could be any negative integer. > Moreover, when the keys are updated, the serial number will again be updated > with the formula: > {code} > this.serialNo = (this.serialNo % intRange) + (nnRangeStart); > {code} > which means the new serial number could be updated to a range that belongs to > a different NameNode, thus increasing the chance of collision again. 
> When the collision happens, DataNodes could overwrite an existing key which > will cause clients to fail because of {{InvalidToken}} error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
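The overlap described in HDFS-14305 can be reproduced with a short stand-alone sketch. It is not the {{BlockTokenSecretManager}} code: `SerialRangeDemo` is a hypothetical class, `maxValue` replaces {{Integer.MAX_VALUE}} with 100 as in the example above, and the `Math.floorMod` variant is only an assumption about one way to keep ranges disjoint, not a statement of what any patch does.

```java
// Illustrative sketch only; not the actual BlockTokenSecretManager code.
final class SerialRangeDemo {
  /** The formula quoted in the issue, with Integer.MAX_VALUE replaced by maxValue. */
  static int rotate(int serialNo, int nnIndex, int numNNs, int maxValue) {
    int intRange = maxValue / numNNs;
    int nnRangeStart = intRange * nnIndex;
    // Java's % keeps the sign of the dividend, so a negative serialNo
    // produces a negative remainder and the result drops below nnRangeStart.
    return (serialNo % intRange) + nnRangeStart;
  }

  /** Hypothetical variant: Math.floorMod always yields a value in [0, intRange). */
  static int rotateNonNegative(int serialNo, int nnIndex, int numNNs, int maxValue) {
    int intRange = maxValue / numNNs;
    int nnRangeStart = intRange * nnIndex;
    return Math.floorMod(serialNo, intRange) + nnRangeStart;
  }

  public static void main(String[] args) {
    // nn1 has index 0 and starts at 7; nn2 has index 1 and starts at -43.
    System.out.println("nn1: " + rotate(7, 0, 2, 100));
    System.out.println("nn2: " + rotate(-43, 1, 2, 100));
    System.out.println("nn2 with floorMod: " + rotateNonNegative(-43, 1, 2, 100));
  }
}
```

With these inputs both NameNodes arrive at serial number 7 under the original formula, which is the kind of collision the issue warns about, while the non-negative variant places nn2 at 57, inside [50, 99].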
[jira] [Created] (HDFS-14794) reportBadBlock is rejected by Observer.
Konstantin Shvachko created HDFS-14794: -- Summary: reportBadBlock is rejected by Observer. Key: HDFS-14794 URL: https://issues.apache.org/jira/browse/HDFS-14794 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.10.0 Reporter: Konstantin Shvachko {{reportBadBlock}} is rejected by Observer via StandbyException {code}StandbyException: Operation category WRITE is not supported in state {code} We should investigate what are the consequences of this and if we should treat {{reportBadBlock}} as IBRs. Note that {{reportBadBlock}} is a part of both {{ClientProtocol}} and {{DatanodeProtocol}} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14793) BlockTokenSecretManager should LOG block tokaen range it operates on.
Konstantin Shvachko created HDFS-14793: -- Summary: BlockTokenSecretManager should LOG block tokaen range it operates on. Key: HDFS-14793 URL: https://issues.apache.org/jira/browse/HDFS-14793 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.10.0 Reporter: Konstantin Shvachko At startup, log enough information to identify the range of block token keys for the NameNode. This should make it easier to debug issues with block tokens.
[jira] [Created] (HDFS-14792) [SBN read] StanbyNode does not come out of safemode while adding new blocks.
Konstantin Shvachko created HDFS-14792: -- Summary: [SBN read] StanbyNode does not come out of safemode while adding new blocks. Key: HDFS-14792 URL: https://issues.apache.org/jira/browse/HDFS-14792 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.10.0 Reporter: Konstantin Shvachko During startup StandbyNode reports that it needs additional X blocks to reach the threshold 1.. Where X is changing up and down. This is because with fast tailing SBN adds new blocks from edits while DNs have not reported replicas yet. Being in SafeMode, SBN counts new blocks towards the threshold and can stay in SafeMode for a long time. By design, the purpose of startup SafeMode is to disallow modifications of the namespace and blocks map until all DN replicas are reported.
[jira] [Created] (HDFS-14734) [FGL] Introduce Latch Lock to replace Namesystem global lock.
Konstantin Shvachko created HDFS-14734: -- Summary: [FGL] Introduce Latch Lock to replace Namesystem global lock. Key: HDFS-14734 URL: https://issues.apache.org/jira/browse/HDFS-14734 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Konstantin Shvachko The concept of Latch Lock associates a separate lock with each partition of PartitionedGSet. Define the order of acquiring locks on the partitions. Some operations will require holding locks on multiple partitions. It is preferable to retain the global lock for some operations, such as rename. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
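The requirement in HDFS-14734 to define an order for acquiring partition locks can be sketched as follows. This is only an illustration of ordered lock acquisition under stated assumptions: `PartitionLocksSketch`, `runLocked` and `isHeld` are hypothetical names, ascending partition index is an assumed ordering, and nothing here reflects the actual Latch Lock or PartitionedGSet design.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only; hypothetical class, not the FGL implementation.
final class PartitionLocksSketch {
  private final ReentrantLock[] locks;

  PartitionLocksSketch(int numPartitions) {
    locks = new ReentrantLock[numPartitions];
    for (int i = 0; i < numPartitions; i++) {
      locks[i] = new ReentrantLock();
    }
  }

  /**
   * Runs the action while holding the locks of both partitions. Locks are
   * always taken in ascending index order, so two threads that need the same
   * pair of partitions cannot deadlock on each other.
   */
  void runLocked(int p1, int p2, Runnable action) {
    int first = Math.min(p1, p2);
    int second = Math.max(p1, p2);
    locks[first].lock();
    try {
      if (second != first) {
        locks[second].lock();
      }
      try {
        action.run();
      } finally {
        if (second != first) {
          locks[second].unlock();
        }
      }
    } finally {
      locks[first].unlock();
    }
  }

  boolean isHeld(int partition) {
    return locks[partition].isHeldByCurrentThread();
  }

  public static void main(String[] args) throws InterruptedException {
    PartitionLocksSketch sketch = new PartitionLocksSketch(4);
    int[] counter = {0};
    Runnable increment = () -> counter[0]++;
    // The two threads name the partitions in opposite order on purpose.
    Thread t1 = new Thread(() -> {
      for (int i = 0; i < 10_000; i++) sketch.runLocked(1, 3, increment);
    });
    Thread t2 = new Thread(() -> {
      for (int i = 0; i < 10_000; i++) sketch.runLocked(3, 1, increment);
    });
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println("increments: " + counter[0]);
  }
}
```

An operation such as rename, which the issue suggests may keep the global lock, would sit outside this scheme; the sketch covers only operations that touch a known, small set of partitions.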
[jira] [Created] (HDFS-14733) [FGL] Introduce INode key.
Konstantin Shvachko created HDFS-14733: -- Summary: [FGL] Introduce INode key. Key: HDFS-14733 URL: https://issues.apache.org/jira/browse/HDFS-14733 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Konstantin Shvachko INode keys should satisfy the locality requirement. Keys should be pluggable via a configuration parameter.
[jira] [Created] (HDFS-14732) [FGL] Introduce PartitionedGSet a new implementation of GSet.
Konstantin Shvachko created HDFS-14732: -- Summary: [FGL] Introduce PartitionedGSet a new implementation of GSet. Key: HDFS-14732 URL: https://issues.apache.org/jira/browse/HDFS-14732 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Konstantin Shvachko INodeMap and BlocksMap are currently represented by a hash table implemented as LightWeightGSet. For fine-grained locking it should be replaced by PartitionedGSet - a new implementation of GSet interface, which partitions INodes into ranges based on a key. We should target static partitioning into a configurable number of ranges. This should allow avoiding the high level lock for RangeMap. It should not be a compromise on efficiency, because parallelism on a single node is bounded by the number of CPU cores. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14731) [FGL] Remove redundant locking on NameNode.
Konstantin Shvachko created HDFS-14731: -- Summary: [FGL] Remove redundant locking on NameNode. Key: HDFS-14731 URL: https://issues.apache.org/jira/browse/HDFS-14731 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Konstantin Shvachko Currently NameNode has two global locks: FSNamesystemLock and FSDirectoryLock. An analysis shows that a single FSNamesystemLock is sufficient to guarantee consistency of the NameNode state. FSDirectoryLock can be removed.
[jira] [Reopened] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log
[ https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reopened HDFS-14303: Reopening for committing the addendum patch to other versions. > check block directory logic not correct when there is only meta file, print > no meaning warn log > --- > > Key: HDFS-14303 > URL: https://issues.apache.org/jira/browse/HDFS-14303 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.7.3, 3.2.0, 2.9.2, 2.8.5 > Environment: env free >Reporter: qiang Liu >Assignee: qiang Liu >Priority: Minor > Labels: easy-fix > Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3 > > Attachments: HDFS-14303-addendum-01.patch, > HDFS-14303-addendum-02.patch, HDFS-14303-branch-2.005.patch, > HDFS-14303-branch-2.009.patch, HDFS-14303-branch-2.010.patch, > HDFS-14303-branch-2.015.patch, HDFS-14303-branch-2.017.patch, > HDFS-14303-branch-2.7.001.patch, HDFS-14303-branch-2.7.004.patch, > HDFS-14303-branch-2.7.006.patch, HDFS-14303-branch-2.9.011.patch, > HDFS-14303-branch-2.9.012.patch, HDFS-14303-branch-2.9.013.patch, > HDFS-14303-trunk.014.patch, HDFS-14303-trunk.015.patch, > HDFS-14303-trunk.016.patch, HDFS-14303-trunk.016.path, > HDFS-14303.branch-3.2.017.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > The block directory check is incorrect when only a meta file is present, and it prints a > meaningless warning, e.g.: > WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block > ID-based layout. Actual block file path: > /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68, > expected block file path: > /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68
[jira] [Created] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
Konstantin Shvachko created HDFS-14703: -- Summary: NameNode Fine-Grained Locking via Metadata Partitioning Key: HDFS-14703 URL: https://issues.apache.org/jira/browse/HDFS-14703 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs, namenode Reporter: Konstantin Shvachko We aim to enable fine-grained locking by splitting the in-memory namespace into multiple partitions, each with a separate lock. This is intended to improve the performance of NameNode write operations.
[jira] [Created] (HDFS-14502) keepResults option in NNThroughputBenchmark should call saveNamespace()
Konstantin Shvachko created HDFS-14502: -- Summary: keepResults option in NNThroughputBenchmark should call saveNamespace() Key: HDFS-14502 URL: https://issues.apache.org/jira/browse/HDFS-14502 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.6 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko {{-keepResults}} option is usually used to make it possible to rerun NNThroughputBenchmark with existing namespace state. E.g. first generate files with {{create}} command and then run {{fileStatus}} command on generated files. NNThroughputBenchmark should call {{saveNamespace()}} when {{-keepResults}} option is specified. Otherwise NN startup takes a while since it needs to digest large edits file starting from the empty image. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14494) Move Server logging of StatedId inside receiveRequestState()
Konstantin Shvachko created HDFS-14494: -- Summary: Move Server logging of StatedId inside receiveRequestState() Key: HDFS-14494 URL: https://issues.apache.org/jira/browse/HDFS-14494 Project: Hadoop HDFS Issue Type: Bug Reporter: Konstantin Shvachko HDFS-14270 introduced logging of the client and server StateIds at trace level. Unfortunately, one of the arguments, {{alignmentContext.getLastSeenStateId()}}, takes a lock on FSEdits and is called even if the trace logging level is disabled. I propose moving the logging message inside {{GlobalStateIdContext.receiveRequestState()}}, where {{clientStateId}} and {{serverStateId}} are already calculated and can be easily printed.
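The cost described in HDFS-14494 comes from Java evaluating method arguments before the call is made, whether or not the message is ever printed. A minimal sketch of that effect follows. Everything in it is hypothetical: `TraceLogDemo` is a stub rather than a real logging API, and `expensiveStateId()` merely stands in for a getter that takes a lock, such as {{alignmentContext.getLastSeenStateId()}}.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; hypothetical stand-ins, not Hadoop or logging-library code.
final class TraceLogDemo {
  static final AtomicInteger lockAcquisitions = new AtomicInteger();
  static boolean traceEnabled = false;

  /** Stand-in for a getter that has to take a lock to compute its result. */
  static long expensiveStateId() {
    lockAcquisitions.incrementAndGet();
    return 42L;
  }

  /** Stub trace method: prints only when tracing is enabled. */
  static void trace(String message, long stateId) {
    if (traceEnabled) {
      System.out.println(message + stateId);
    }
  }

  /** The argument is computed before trace() is entered, even with tracing off. */
  static void logUnguarded() {
    trace("server state id: ", expensiveStateId());
  }

  /** The guard skips the expensive call entirely when tracing is off. */
  static void logGuarded() {
    if (traceEnabled) {
      trace("server state id: ", expensiveStateId());
    }
  }

  public static void main(String[] args) {
    logUnguarded();
    System.out.println("calls after unguarded log: " + lockAcquisitions.get());
    logGuarded();
    System.out.println("calls after guarded log: " + lockAcquisitions.get());
  }
}
```

The issue itself proposes a different remedy, namely logging where the two state IDs are already available; the explicit guard shown here is just another common way to avoid the eager call.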
[jira] [Created] (HDFS-14347) Restore a comment line mistakenly removed in ProtobufRpcEngine
Konstantin Shvachko created HDFS-14347: -- Summary: Restore a comment line mistakenly removed in ProtobufRpcEngine Key: HDFS-14347 URL: https://issues.apache.org/jira/browse/HDFS-14347 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.3.0 Reporter: Konstantin Shvachko HDFS-12977 mistakenly removed the following comment line in {{ProtobufRpcEngine.Server.Server()}} {code} - * the range of ports used when port is 0 (an ephemeral port) + * @param alignmentContext provides server state info on client responses {code} Let's put it back. Otherwise the comment doesn't make sense. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-13781) Unit tests for standby reads.
[ https://issues.apache.org/jira/browse/HDFS-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-13781. Resolution: Duplicate Fix Version/s: HDFS-12943 This was handled by HDFS-13523, HDFS-13961, HDFS-13925, and a few other issues. Resolving as duplicate. > Unit tests for standby reads. > - > > Key: HDFS-13781 > URL: https://issues.apache.org/jira/browse/HDFS-13781 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Konstantin Shvachko >Priority: Major > Fix For: HDFS-12943 > > > Create more unit tests supporting standby reads feature. Let's come up with a > list of tests that provide sufficient test coverage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14170) Fix white spaces related to SBN reads.
Konstantin Shvachko created HDFS-14170: -- Summary: Fix white spaces related to SBN reads. Key: HDFS-14170 URL: https://issues.apache.org/jira/browse/HDFS-14170 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko This is to fix some checkstyle warnings, mostly white spaces before merging HDFS-12943 branch to trunk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14162) Balancer should work with ObserverNode
Konstantin Shvachko created HDFS-14162: -- Summary: Balancer should work with ObserverNode Key: HDFS-14162 URL: https://issues.apache.org/jira/browse/HDFS-14162 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Konstantin Shvachko Balancer puts a substantial RPC load on the NameNode. It would be good to divert Balancer RPCs, such as {{getBlocks()}}, to ObserverNode. The main problem is that Balancer uses {{NamenodeProtocol}}, while ORPP currently supports only {{ClientProtocol}}.
[jira] [Resolved] (HDFS-14160) ObserverReadInvocationHandler should implement RpcInvocationHandler
[ https://issues.apache.org/jira/browse/HDFS-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-14160. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-12943 I just committed this. Thanks for the review Chao. > ObserverReadInvocationHandler should implement RpcInvocationHandler > --- > > Key: HDFS-14160 > URL: https://issues.apache.org/jira/browse/HDFS-14160 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: HDFS-12943 > > Attachments: HDFS-14160-HDFS-12943.001.patch, > HDFS-14160-HDFS-12943.002.patch > > > Currently ObserverReadInvocationHandler implements InvocationHandler. > [As > mentioned|https://issues.apache.org/jira/browse/HDFS-14116?focusedCommentId=16710596=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16710596] > in HDFS-14116 this is the cause of Fsck failing with Observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14160) ObserverReadInvocationHandler should implement RpcInvocationHandler
Konstantin Shvachko created HDFS-14160: -- Summary: ObserverReadInvocationHandler should implement RpcInvocationHandler Key: HDFS-14160 URL: https://issues.apache.org/jira/browse/HDFS-14160 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Currently ObserverReadInvocationHandler implements InvocationHandler. [As mentioned|https://issues.apache.org/jira/browse/HDFS-14116?focusedCommentId=16710596=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16710596] in HDFS-14116 this is the cause of Fsck failing with Observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-14116. Resolution: Fixed Hadoop Flags: Reviewed Pushed to HDFS-12943 branch. Thanks everybody. > ObserverReadProxyProvider should work with protocols other than ClientProtocol > -- > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > Fix For: HDFS-12943 > > Attachments: HDFS-14116-HDFS-12943.000.patch, > HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, > HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch, > HDFS-14116-HDFS-12943.005.patch > > > Currently in {{ObserverReadProxyProvider}} constructor there is this line > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause failure, because it is possible that factory > cannot be cast here. Specifically, > {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the > constructor will be called, and there are two paths that could call into this: > (1).{{NameNodeProxies.createProxy}} > (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses > {{NameNodeHAProxyFactory}} which cannot be cast to > {{ClientHAProxyFactory}}. This happens when, for example, running > NNThroughputBenchmark. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory which is the parent of > both ClientHAProxyFactory and NameNodeHAProxyFactory OR > 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having an > if check with reflection. > Depending on whether it makes sense to have alignment context for the case (1) > calling code paths. 
[jira] [Reopened] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reopened HDFS-14116: Reopening. Yes, I was looking at this patch and it is as [~vagarychen] said. The {{AlignmentContext}} is set in {{ClientHAProxyFactory}}. It is sort of the source of truth there. ORPP needs that to function properly. NNThroughputBenchmark ends up using {{NameNodeHAProxyFactory}} to create what is essentially a non-HA proxy for a non-ClientProtocol interface. So it should use {{createNonHAProxy()}} rather than building ORPP. So I propose to revert the patch. And let's think about how we should fix it. We should make {{NameNodeProxiesClient.createFailoverProxyProvider()}} return null if somebody tries to instantiate ORPP with a non-ClientProtocol interface. Then things should fall into place. > ObserverReadProxyProvider should work with protocols other than ClientProtocol > -- > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > Fix For: HDFS-12943 > > Attachments: HDFS-14116-HDFS-12943.000.patch, > HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, > HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch > > > Currently in {{ObserverReadProxyProvider}} constructor there is this line > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause failure, because it is possible that factory > cannot be cast here. 
Specifically, > {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the > constructor will be called, and there are two paths that could call into this: > (1).{{NameNodeProxies.createProxy}} > (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses > {{NameNodeHAProxyFactory}} which cannot be cast to > {{ClientHAProxyFactory}}. This happens when, for example, running > NNThroughputBenchmark. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory which is the parent of > both ClientHAProxyFactory and NameNodeHAProxyFactory OR > 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having an > if check with reflection. > Depending on whether it makes sense to have alignment context for the case (1) > calling code paths.
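Option 2 from the HDFS-14116 description (set the alignment context only when the factory is the client one) can be sketched with simplified stand-ins. None of the types below are the real Hadoop classes: `AlignmentGuardDemo`, `ProxyFactorySketch`, `ClientFactorySketch` and `NameNodeFactorySketch` are hypothetical, and the sketch does not claim to be the fix that was eventually committed.

```java
// Illustrative sketch only: simplified stand-ins, not the real Hadoop classes.
final class AlignmentGuardDemo {
  interface ProxyFactorySketch {}

  /** Stand-in for the client-side factory, which accepts an alignment context. */
  static final class ClientFactorySketch implements ProxyFactorySketch {
    Object alignmentContext;

    void setAlignmentContext(Object context) {
      this.alignmentContext = context;
    }
  }

  /** Stand-in for the NameNode-side factory, which has no such setter. */
  static final class NameNodeFactorySketch implements ProxyFactorySketch {}

  /** Unconditional cast, shaped like the line quoted in the issue. */
  static void configureUnchecked(ProxyFactorySketch factory, Object context) {
    ((ClientFactorySketch) factory).setAlignmentContext(context);
  }

  /** Guarded version: returns true only if the context was actually set. */
  static boolean configureChecked(ProxyFactorySketch factory, Object context) {
    if (factory instanceof ClientFactorySketch) {
      ((ClientFactorySketch) factory).setAlignmentContext(context);
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    Object context = new Object();
    System.out.println("client factory: " + configureChecked(new ClientFactorySketch(), context));
    System.out.println("namenode factory: " + configureChecked(new NameNodeFactorySketch(), context));
    try {
      configureUnchecked(new NameNodeFactorySketch(), context);
    } catch (ClassCastException e) {
      System.out.println("unchecked cast failed with ClassCastException");
    }
  }
}
```

Silently skipping the setter means the caller proceeds without an alignment context, which is the concern raised in the reopening comment above, where the context is described as something ORPP needs to function properly.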
[jira] [Created] (HDFS-14149) Adjust annotations on new interfaces/classes for SBN reads.
Konstantin Shvachko created HDFS-14149: -- Summary: Adjust annotations on new interfaces/classes for SBN reads. Key: HDFS-14149 URL: https://issues.apache.org/jira/browse/HDFS-14149 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-12943 Reporter: Konstantin Shvachko Let's make sure that all new classes and interfaces # do have annotations, as some of them don't, like {{ObserverReadProxyProvider}} # that they are annotated as {{Private}} and {{Evolving}}, to allow room for changes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14131) Create user guide for "Consistent reads from Observer" feature.
Konstantin Shvachko created HDFS-14131: -- Summary: Create user guide for "Consistent reads from Observer" feature. Key: HDFS-14131 URL: https://issues.apache.org/jira/browse/HDFS-14131 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: HDFS-12943 Reporter: Konstantin Shvachko The documentation should give an overview of the feature, explain configuration parameters, give an example of recommended deployment. It should include the description of Fast Edits Tailing HDFS-13150, as this is required for efficient reads from Observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14130) Make ZKFC ObserverNode aware
Konstantin Shvachko created HDFS-14130: -- Summary: Make ZKFC ObserverNode aware Key: HDFS-14130 URL: https://issues.apache.org/jira/browse/HDFS-14130 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HDFS-12943 Reporter: Konstantin Shvachko Need to fix automatic failover with ZKFC. Currently it does not know about ObserverNodes and tries to convert them to SBNs.
[jira] [Resolved] (HDFS-14059) Test reads from standby on a secure cluster with Configured failover
[ https://issues.apache.org/jira/browse/HDFS-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-14059. Resolution: Done Thanks [~zero45]! Some cool tests there with multiple observers. Great that there were no problems with DTs and failover. We still have an outstanding issue to support automatic failover with ZKFC. Will create a jira for that. Closing this one as done. > Test reads from standby on a secure cluster with Configured failover > > > Key: HDFS-14059 > URL: https://issues.apache.org/jira/browse/HDFS-14059 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Konstantin Shvachko >Assignee: Plamen Jeliazkov >Priority: Major > > Run standard HDFS tests to verify reading from ObserverNode on a secure HA > cluster with {{ConfiguredFailoverProxyProvider}}.
[jira] [Resolved] (HDFS-14058) Test reads from standby on a secure cluster with IP failover
[ https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-14058. Resolution: Done Thanks, [~vagarychen]! This was a lot of testing. With all related issues resolved and retested I think we can close this one. Load testing and performance tuning should go into the next step. > Test reads from standby on a secure cluster with IP failover > > > Key: HDFS-14058 > URL: https://issues.apache.org/jira/browse/HDFS-14058 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Attachments: dfsio_crs.no-crs.txt, dfsio_crs.with-crs.txt > > > Run standard HDFS tests to verify reading from ObserverNode on a secure HA > cluster with {{IPFailoverProxyProvider}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14094) Fix the order of logging arguments in ObserverReadProxyProvider.
Konstantin Shvachko created HDFS-14094: -- Summary: Fix the order of logging arguments in ObserverReadProxyProvider. Key: HDFS-14094 URL: https://issues.apache.org/jira/browse/HDFS-14094 Project: Hadoop HDFS Issue Type: Sub-task Components: logging Affects Versions: HDFS-12943 Reporter: Konstantin Shvachko [~zero45] finding from HDFS-14067 In ObserverReadProxyProvider there is a warn message: {code:java} LOG.warn("{} observers have failed for read request {}; also found " + "{} standby and {} active. Falling back to active.", failedObserverCount, standbyCount, activeCount, method.getName()); {code} Seems the arguments are out of order, should probably be {{failedObserverCount, method.getName(), standbyCount, activeCount}}.
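To make the ordering problem concrete, here is a minimal sketch of the proposed argument order. The `format` helper is a hypothetical stand-in that only mimics `{}` placeholder substitution so the example runs without a logging library; the class and method names are made up for illustration and are not part of ObserverReadProxyProvider.

```java
// Hypothetical stand-in for "{}" placeholder substitution, used only to
// illustrate argument ordering. It is not the real logging implementation.
class LogArgOrderSketch {
  static final String TEMPLATE =
      "{} observers have failed for read request {}; also found "
      + "{} standby and {} active. Falling back to active.";

  // Replaces each "{}" in the template with the next argument, left to right.
  static String format(String template, Object... args) {
    StringBuilder sb = new StringBuilder();
    int from = 0;
    for (Object arg : args) {
      int at = template.indexOf("{}", from);
      if (at < 0) {
        break; // more arguments than placeholders: ignore the rest
      }
      sb.append(template, from, at).append(arg);
      from = at + 2;
    }
    return sb.append(template.substring(from)).toString();
  }

  // Arguments are passed in the same order as the placeholders appear.
  static String corrected(int failedObserverCount, String methodName,
      int standbyCount, int activeCount) {
    return format(TEMPLATE, failedObserverCount, methodName,
        standbyCount, activeCount);
  }
}
```

With the original order, the standby count would land in the "read request" slot and the method name in the "active" slot, which is the symptom reported above.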
[jira] [Created] (HDFS-14059) Test reads from standby on a secure cluster with Configured failover
Konstantin Shvachko created HDFS-14059: -- Summary: Test reads from standby on a secure cluster with Configured failover Key: HDFS-14059 URL: https://issues.apache.org/jira/browse/HDFS-14059 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Run standard HDFS tests to verify reading from ObserverNode on a secure HA cluster with {{ConfiguredFailoverProxyProvider}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-14058) Test reads from standby on a secure cluster with IP failover
Konstantin Shvachko created HDFS-14058: -- Summary: Test reads from standby on a secure cluster with IP failover Key: HDFS-14058 URL: https://issues.apache.org/jira/browse/HDFS-14058 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Konstantin Shvachko Assignee: Chen Liang Run standard HDFS tests to verify reading from ObserverNode on a secure HA cluster with {{IPFailoverProxyProvider}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-12026) libhdfs++: Fix compilation errors and warnings when compiling with Clang
[ https://issues.apache.org/jira/browse/HDFS-12026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reopened HDFS-12026: Reopening. I think it is a blocker for release 3.2. If there is no progress on this, I would recommend reverting. Potentially the entire branch HDFS-8707, I didn't check how much of it relies on this change. > libhdfs++: Fix compilation errors and warnings when compiling with Clang > - > > Key: HDFS-12026 > URL: https://issues.apache.org/jira/browse/HDFS-12026 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein >Priority: Major > Attachments: HDFS-12026.HDFS-8707.000.patch, > HDFS-12026.HDFS-8707.001.patch, HDFS-12026.HDFS-8707.002.patch, > HDFS-12026.HDFS-8707.003.patch, HDFS-12026.HDFS-8707.004.patch, > HDFS-12026.HDFS-8707.005.patch, HDFS-12026.HDFS-8707.006.patch, > HDFS-12026.HDFS-8707.007.patch, HDFS-12026.HDFS-8707.008.patch, > HDFS-12026.HDFS-8707.009.patch, HDFS-12026.HDFS-8707.010.patch > > > Currently multiple errors and warnings prevent libhdfspp from being compiled > with clang. It should compile cleanly using flag: > -std=c++11 > and also warning flags: > -Weverything -Wno-c++98-compat -Wno-missing-prototypes > -Wno-c++98-compat-pedantic -Wno-padded -Wno-covered-switch-default > -Wno-missing-noreturn -Wno-unknown-pragmas -Wconversion -Werror
[jira] [Reopened] (HDFS-13974) Introduce the single Observer failure
[ https://issues.apache.org/jira/browse/HDFS-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reopened HDFS-13974: > Introduce the single Observer failure > - > > Key: HDFS-13974 > URL: https://issues.apache.org/jira/browse/HDFS-13974 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Sherwood Zheng >Assignee: Sherwood Zheng >Priority: Major > > Introduce the single Observer failure. Reads should be automatically > redirected > to Active NameNode -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-13974) Introduce the single Observer failure
[ https://issues.apache.org/jira/browse/HDFS-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-13974. Resolution: Duplicate Re-resolving as Duplicate > Introduce the single Observer failure > - > > Key: HDFS-13974 > URL: https://issues.apache.org/jira/browse/HDFS-13974 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Sherwood Zheng >Assignee: Sherwood Zheng >Priority: Major > > Introduce the single Observer failure. Reads should be automatically > redirected > to Active NameNode -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13961) TestObserverNode refactoring
Konstantin Shvachko created HDFS-13961: -- Summary: TestObserverNode refactoring Key: HDFS-13961 URL: https://issues.apache.org/jira/browse/HDFS-13961 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: HDFS-12943 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko TestObserverNode combines unit tests for ObserverNode. The tests are of different types. I propose to split them into separate modules, factor out common methods, and optimize it so that it starts and shuts down MiniDFSCluster once for the entire test rather than for individual test cases.
[jira] [Resolved] (HDFS-13780) Postpone NameNode state discovery in ObserverReadProxyProvider until the first real RPC call.
[ https://issues.apache.org/jira/browse/HDFS-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-13780. Resolution: Duplicate Fix Version/s: HDFS-12943 I think it was incorporated, indeed. > Postpone NameNode state discovery in ObserverReadProxyProvider until the > first real RPC call. > - > > Key: HDFS-13780 > URL: https://issues.apache.org/jira/browse/HDFS-13780 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Fix For: HDFS-12943 > > > Currently {{ObserverReadProxyProvider}} during instantiation discovers > Observers by poking known NameNodes and checking their states. This rather > expensive process can be postponed until the first actual RPC call. > This is an optimization. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13873) ObserverNode should reject read requests when it is too far behind.
Konstantin Shvachko created HDFS-13873: -- Summary: ObserverNode should reject read requests when it is too far behind. Key: HDFS-13873 URL: https://issues.apache.org/jira/browse/HDFS-13873 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: HDFS-12943 Reporter: Konstantin Shvachko Add a server-side threshold for ObserverNode to reject read requests when it is too far behind. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
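A minimal sketch of what such a server-side check could look like. The issue does not say how "too far behind" is measured; this sketch assumes it is the gap between the state id the client has already seen and the last transaction the observer has applied. The class, method, and parameter names are hypothetical.

```java
// Sketch only: assumes lag is measured in transactions between the client's
// last seen state id and the observer's last applied transaction id.
class ObserverLagSketch {
  static boolean shouldRejectRead(long clientLastSeenTxId,
      long observerLastAppliedTxId, long maxLagTxns) {
    // A non-positive lag means the observer has already caught up.
    long lag = clientLastSeenTxId - observerLastAppliedTxId;
    return lag > maxLagTxns;
  }
}
```

A rejected client would then be expected to retry elsewhere, for example on another observer or on the active NameNode.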
[jira] [Resolved] (HDFS-13782) ObserverReadProxyProvider should work with IPFailoverProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-13782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-13782. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-12943 Release Note: I just committed this. > ObserverReadProxyProvider should work with IPFailoverProxyProvider > -- > > Key: HDFS-13782 > URL: https://issues.apache.org/jira/browse/HDFS-13782 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: HDFS-12943 > > Attachments: HDFS-13782-HDFS-12943.001.patch, > HDFS-13782-HDFS-12943.002.patch > > > Currently {{ObserverReadProxyProvider}} is based on > {{ConfiguredFailoverProxyProvider}}. We should also be able to perform SBN reads > in case of {{IPFailoverProxyProvider}}.
[jira] [Created] (HDFS-13851) Remove AlignmentContext from AbstractNNFailoverProxyProvider
Konstantin Shvachko created HDFS-13851: -- Summary: Remove AlignmentContext from AbstractNNFailoverProxyProvider Key: HDFS-13851 URL: https://issues.apache.org/jira/browse/HDFS-13851 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-12943 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko {{AlignmentContext}} is now a part of {{ObserverReadProxyProvider}}, we can remove it from the base class. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13848) Refactor NameNode failover proxy providers
Konstantin Shvachko created HDFS-13848: -- Summary: Refactor NameNode failover proxy providers Key: HDFS-13848 URL: https://issues.apache.org/jira/browse/HDFS-13848 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, hdfs-client Affects Versions: 2.7.5 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Looking at NN failover proxy providers in the context of HDFS-13782 I noticed that {{ConfiguredFailoverProxyProvider}} and {{IPFailoverProxyProvider}} have a lot of common logic. We can move this common logic into {{AbstractNNFailoverProxyProvider}}, which simplifies things a lot. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13782) IPFailoverProxyProvider should work with SBN
Konstantin Shvachko created HDFS-13782: -- Summary: IPFailoverProxyProvider should work with SBN Key: HDFS-13782 URL: https://issues.apache.org/jira/browse/HDFS-13782 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Konstantin Shvachko Currently {{ObserverReadProxyProvider}} is based on {{ConfiguredFailoverProxyProvider}}. We should also be able to perform SBN reads in case of {{IPFailoverProxyProvider}}.
[jira] [Created] (HDFS-13781) Unit test for standby reads.
Konstantin Shvachko created HDFS-13781: -- Summary: Unit test for standby reads. Key: HDFS-13781 URL: https://issues.apache.org/jira/browse/HDFS-13781 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Konstantin Shvachko Create more unit tests supporting standby reads feature. Let's come up with a list of tests that provide sufficient test coverage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13780) Postpone NameNode state discovery in ObserverReadProxyProvider until the first real RPC call.
Konstantin Shvachko created HDFS-13780: -- Summary: Postpone NameNode state discovery in ObserverReadProxyProvider until the first real RPC call. Key: HDFS-13780 URL: https://issues.apache.org/jira/browse/HDFS-13780 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Konstantin Shvachko Currently {{ObserverReadProxyProvider}} during instantiation discovers Observers by poking known NameNodes and checking their states. This rather expensive process can be postponed until the first actual RPC call. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.
Konstantin Shvachko created HDFS-13779: -- Summary: Implement performFailover logic for ObserverReadProxyProvider. Key: HDFS-13779 URL: https://issues.apache.org/jira/browse/HDFS-13779 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Konstantin Shvachko Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method from {{ConfiguredFailoverProxyProvider}}, which simply increments the index and switches over to another NameNode. The logic for ORPP should be smart enough to choose another observer, otherwise it can switch to a SBN, where reads are disallowed, or to an ANN, which defeats the purpose of reads from standby. This was discussed in HDFS-12976. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
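The selection rule described above (prefer another observer instead of blindly moving to the next index) might be sketched as follows. This is an illustration under assumptions, not the ORPP implementation: the `State` enum, the list of known states, and the `-1` return value are all hypothetical.

```java
import java.util.List;

// Sketch only: picks the next NameNode that is known to be an observer,
// skipping active and standby nodes.
class ObserverFailoverSketch {
  enum State { ACTIVE, STANDBY, OBSERVER }

  // Returns the index of the next observer after `current` (wrapping around),
  // or -1 if no other observer is known. Assumes 0 <= current < states.size().
  static int nextObserver(List<State> states, int current) {
    int n = states.size();
    for (int step = 1; step < n; step++) {
      int candidate = (current + step) % n;
      if (states.get(candidate) == State.OBSERVER) {
        return candidate;
      }
    }
    return -1;
  }
}
```

A caller that gets `-1` would have to decide separately whether falling back to the active NameNode is acceptable.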
[jira] [Created] (HDFS-13778) In TestStateAlignmentContextWithHA replace artificial AlignmentContextProxyProvider with real ObserverReadProxyProvider.
Konstantin Shvachko created HDFS-13778: -- Summary: In TestStateAlignmentContextWithHA replace artificial AlignmentContextProxyProvider with real ObserverReadProxyProvider. Key: HDFS-13778 URL: https://issues.apache.org/jira/browse/HDFS-13778 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Konstantin Shvachko TestStateAlignmentContextWithHA uses an artificial AlignmentContextProxyProvider, which was temporarily needed for testing. Now that we have real ObserverReadProxyProvider it can take over ACPP. This is also useful for testing the ORPP.
[jira] [Created] (HDFS-13706) ClientGCIContext should be correctly named ClientGSIContext
Konstantin Shvachko created HDFS-13706: -- Summary: ClientGCIContext should be correctly named ClientGSIContext Key: HDFS-13706 URL: https://issues.apache.org/jira/browse/HDFS-13706 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Konstantin Shvachko GSI stands for Global State Id. It's a client-side counterpart of NN's {{GlobalStateIdContext}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-8675) IBRs from dead DNs go into infinite loop
[ https://issues.apache.org/jira/browse/HDFS-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-8675. --- Resolution: Not A Problem Looks like not a problem. > IBRs from dead DNs go into infinite loop > > > Key: HDFS-8675 > URL: https://issues.apache.org/jira/browse/HDFS-8675 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Daryn Sharp >Priority: Major > > If the DN sends an IBR after the NN declares it dead, the NN returns an IOE > of unregistered or dead. The DN catches the IOE, ignores it, and infinitely > loops spamming the NN with retries. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-11985) Intermittent unit test failures on 2.7.4 branch.
[ https://issues.apache.org/jira/browse/HDFS-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-11985. Resolution: Fixed Fix Version/s: 2.7.6 These have been fixed elsewhere. I don't see those failures in 2.7.6 nightly builds anymore. > Intermittent unit test failures on 2.7.4 branch. > > > Key: HDFS-11985 > URL: https://issues.apache.org/jira/browse/HDFS-11985 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.7.4 >Reporter: Konstantin Shvachko >Priority: Major > Fix For: 2.7.6 > > > Some unit tests are failing intermittently on Jenkins nightly builds for > branch-2.7. > Here is the list of tests that failed more than once within the last week: > * > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure > * > org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten > * > org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount > > * org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
[jira] [Created] (HDFS-12979) StandbyNode should upload FsImage to ObserverNode after checkpointing.
Konstantin Shvachko created HDFS-12979: -- Summary: StandbyNode should upload FsImage to ObserverNode after checkpointing. Key: HDFS-12979 URL: https://issues.apache.org/jira/browse/HDFS-12979 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Konstantin Shvachko ObserverNode does not create checkpoints. So its fsimage file can get very old, making the bootstrap of an ObserverNode take too long. A StandbyNode should copy the latest fsimage to ObserverNode(s) along with ANN. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12978) Fine-grained locking while consuming journal stream.
Konstantin Shvachko created HDFS-12978: -- Summary: Fine-grained locking while consuming journal stream. Key: HDFS-12978 URL: https://issues.apache.org/jira/browse/HDFS-12978 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Konstantin Shvachko In the current implementation the SBN consumes the entire segment of transactions under a single namesystem lock, which does not allow reads for a long period of time, until the segment is processed. We should break the locking into fine-grained chunks. In the extreme case, the lock should be released after each transaction is applied.
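As a rough illustration of the idea, the sketch below applies a segment in small batches and releases the write lock between batches so that readers can get in. It is not the NameNode code: the class, the `batchSize` parameter, and the use of a plain `ReentrantReadWriteLock` in place of the namesystem lock are assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: a ReentrantReadWriteLock stands in for the namesystem lock,
// and "applying" a transaction just records its id.
class BatchedEditApplySketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long lastAppliedTxId = 0;
  private int lockAcquisitions = 0;

  // Applies the segment in batches, taking the write lock once per batch
  // instead of once for the whole segment.
  void applySegment(List<Long> txIds, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    for (int start = 0; start < txIds.size(); start += batchSize) {
      int end = Math.min(start + batchSize, txIds.size());
      lock.writeLock().lock();
      try {
        lockAcquisitions++;
        for (long txId : txIds.subList(start, end)) {
          lastAppliedTxId = txId; // placeholder for applying the edit
        }
      } finally {
        lock.writeLock().unlock();
      }
      // Between batches the lock is free, so waiting readers can proceed.
    }
  }

  long getLastAppliedTxId() {
    lock.readLock().lock();
    try {
      return lastAppliedTxId;
    } finally {
      lock.readLock().unlock();
    }
  }

  int getLockAcquisitions() {
    return lockAcquisitions;
  }
}
```

Setting `batchSize` to 1 corresponds to the extreme case mentioned above, where the lock is released after every transaction.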
[jira] [Created] (HDFS-12977) Add stateId to RPC headers.
Konstantin Shvachko created HDFS-12977: -- Summary: Add stateId to RPC headers. Key: HDFS-12977 URL: https://issues.apache.org/jira/browse/HDFS-12977 Project: Hadoop HDFS Issue Type: Sub-task Components: ipc, namenode Reporter: Konstantin Shvachko stateId is a new field in the RPC headers of NameNode proto calls. stateId is the journal transaction Id, which represents LastSeenId for the clients and LastWrittenId for NameNodes. See more in [reads from Standby design doc|https://issues.apache.org/jira/secure/attachment/12902925/ConsistentReadsFromStandbyNode.pdf]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
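A minimal client-side sketch of how a LastSeenId could be tracked, assuming the client simply remembers the largest stateId it has received in any response header and sends that value with its next request. The class and method names are hypothetical, and the "only move forward" rule is an assumption of this sketch rather than something stated in the issue.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: tracks the highest stateId seen in response headers.
class ClientStateIdSketch {
  private final AtomicLong lastSeenStateId = new AtomicLong(Long.MIN_VALUE);

  // Called with the stateId carried by a response header; never moves backwards.
  void receiveResponseState(long serverStateId) {
    lastSeenStateId.accumulateAndGet(serverStateId, Math::max);
  }

  // Value the client would put into the header of its next request.
  long stateIdForRequest() {
    return lastSeenStateId.get();
  }
}
```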
[jira] [Created] (HDFS-12976) Introduce StandbyReadProxyProvider
Konstantin Shvachko created HDFS-12976: -- Summary: Introduce StandbyReadProxyProvider Key: HDFS-12976 URL: https://issues.apache.org/jira/browse/HDFS-12976 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Konstantin Shvachko {{StandbyReadProxyProvider}} should implement {{FailoverProxyProvider}} interface and be able to submit read requests to ANN and SBN(s). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12975) Changes to the NameNode to support reads from standby
Konstantin Shvachko created HDFS-12975: -- Summary: Changes to the NameNode to support reads from standby Key: HDFS-12975 URL: https://issues.apache.org/jira/browse/HDFS-12975 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Konstantin Shvachko In order to support reads from standby, the NameNode needs changes to add an ObserverNode role, which turns off checkpointing and such.
[jira] [Created] (HDFS-12943) Consistent Reads from Standby Node
Konstantin Shvachko created HDFS-12943: -- Summary: Consistent Reads from Standby Node Key: HDFS-12943 URL: https://issues.apache.org/jira/browse/HDFS-12943 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs Reporter: Konstantin Shvachko StandbyNode in HDFS is a replica of the active NameNode. The states of the NameNodes are coordinated via the journal. It is natural to consider StandbyNode as a read-only replica. As with any replicated distributed system the problem of stale reads should be resolved. Our main goal is to provide reads from standby in a consistent way in order to enable a wide range of existing applications running on top of HDFS. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12884) BlockUnderConstructionFeature.truncateBlock should be of type BlockInfo
Konstantin Shvachko created HDFS-12884: -- Summary: BlockUnderConstructionFeature.truncateBlock should be of type BlockInfo Key: HDFS-12884 URL: https://issues.apache.org/jira/browse/HDFS-12884 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.4 Reporter: Konstantin Shvachko {{BlockUnderConstructionFeature.truncateBlock}} type should be changed to {{BlockInfo}} from {{Block}}. {{truncateBlock}} is always assigned as {{BlockInfo}}, so this will avoid unnecessary casts. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-12638) Delete copy-on-truncate block along with the original block, when deleting a file being truncated
[ https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-12638. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.4 3.0.1 2.9.1 2.10.0 3.1.0 2.7.5 Just committed this into the following branches: {code} 3c57def..7998077 branch-2 -> branch-2 7252e18..85eb32b branch-2.7 -> branch-2.7 eacccf1..19c18f7 branch-2.8 -> branch-2.8 5a8a1e6..0f5ec01 branch-2.9 -> branch-2.9 58d849b..def87db branch-3.0 -> branch-3.0 a63d19d..60fd0d7 trunk -> trunk {code} Thank you everybody for contributing. > Delete copy-on-truncate block along with the original block, when deleting a > file being truncated > - > > Key: HDFS-12638 > URL: https://issues.apache.org/jira/browse/HDFS-12638 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.2 >Reporter: Jiandan Yang >Assignee: Konstantin Shvachko >Priority: Blocker > Fix For: 2.7.5, 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4 > > Attachments: HDFS-12638-branch-2.8.2.001.patch, HDFS-12638.002.patch, > HDFS-12638.003.patch, HDFS-12638.004.patch, OphanBlocksAfterTruncateDelete.jpg > > > The Active NameNode exited due to an NPE. I can confirm that the BlockCollection passed > in when creating ReplicationWork is null, but I do not know why > it is null. Looking at the history, I found that > [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] removed the check for > whether BlockCollection is null. > NN logs are as follows: > {code:java} > 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > ReplicationMonitor thread received Runtime exception. 
> java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744) > at java.lang.Thread.run(Thread.java:834) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-9754) Avoid unnecessary getBlockCollection calls in BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-9754. --- Resolution: Fixed Resolving this, based on the discussion in HDFS-12638. Filed HDFS-12880 instead. > Avoid unnecessary getBlockCollection calls in BlockManager > -- > > Key: HDFS-9754 > URL: https://issues.apache.org/jira/browse/HDFS-9754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.8.2, 3.0.0-alpha1, 2.9.0 > > Attachments: HDFS-9754.000.patch, HDFS-9754.001.patch, > HDFS-9754.002.patch > > > Currently BlockManager calls {{Namesystem#getBlockCollection}} in order to: > 1. check if the block has already been abandoned > 2. identify the storage policy of the block > 3. meta save > For #1 we can use BlockInfo's internal state instead of checking if the > corresponding file still exists. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12880) Disallow abandoned blocks in the BlocksMap
Konstantin Shvachko created HDFS-12880: -- Summary: Disallow abandoned blocks in the BlocksMap Key: HDFS-12880 URL: https://issues.apache.org/jira/browse/HDFS-12880 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.4 Reporter: Konstantin Shvachko BlocksMap used to contain only valid blocks, that is, blocks belonging to a file. The issue is intended to restore this invariant. This was discussed in detail while fixing HDFS-12638.
[jira] [Reopened] (HDFS-9754) Avoid unnecessary getBlockCollection calls in BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reopened HDFS-9754: --- There is clearly value in the work done here. I would rather revert the entire thing in order to unblock 2.8 release, and then let people modify the patch. Reopening this for now. > Avoid unnecessary getBlockCollection calls in BlockManager > -- > > Key: HDFS-9754 > URL: https://issues.apache.org/jira/browse/HDFS-9754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.9.0, 3.0.0-alpha1, 2.8.2 > > Attachments: HDFS-9754.000.patch, HDFS-9754.001.patch, > HDFS-9754.002.patch > > > Currently BlockManager calls {{Namesystem#getBlockCollection}} in order to: > 1. check if the block has already been abandoned > 2. identify the storage policy of the block > 3. meta save > For #1 we can use BlockInfo's internal state instead of checking if the > corresponding file still exists. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12856) BlockReconstructionWork.chooseTargets() violates namesystem locking
Konstantin Shvachko created HDFS-12856: -- Summary: BlockReconstructionWork.chooseTargets() violates namesystem locking Key: HDFS-12856 URL: https://issues.apache.org/jira/browse/HDFS-12856 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.4 Reporter: Konstantin Shvachko {{BlockReconstructionWork.chooseTargets()}} is called outside namesystem lock, although it works with {{DatanodeDescriptor}} and {{DatanodeStorageInfo}}, which can change. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12855) Fsck violates namesystem locking
Konstantin Shvachko created HDFS-12855:
--

Summary: Fsck violates namesystem locking
Key: HDFS-12855
URL: https://issues.apache.org/jira/browse/HDFS-12855
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.7.4
Reporter: Konstantin Shvachko

{{NamenodeFsck}} accesses {{FSNamesystem}} structures, such as INodes and BlockInfo, without holding a lock. See e.g. {{NamenodeFsck.blockIdCK()}}.
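The locking rule behind this report (and behind HDFS-12856) can be illustrated with a minimal sketch: any reader of structures that writers may change takes the read lock first and works on what it observed under that lock. Everything below is hypothetical and for illustration only; {{LockedNamesystemSketch}}, {{addStorage}} and {{snapshotStorages}} are made-up names, not the {{FSNamesystem}} API, and this is not the patch for either issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a namesystem; NOT the Hadoop FSNamesystem API. */
final class LockedNamesystemSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Mutable state that writers may change at any time (assumed for illustration).
  private final List<String> storages = new ArrayList<>();

  /** Writers mutate the shared state only while holding the write lock. */
  void addStorage(String id) {
    lock.writeLock().lock();
    try {
      storages.add(id);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Readers copy what they need while holding the read lock. */
  List<String> snapshotStorages() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(storages);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

A caller that needs a consistent view would work on the list returned by {{snapshotStorages()}} rather than touching the shared list directly. Whether copying under the lock or holding the lock for the whole operation is the better trade-off depends on how long the operation runs.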
[jira] [Created] (HDFS-12458) TestReencryptionWithKMS fails regularly
Konstantin Shvachko created HDFS-12458:
--

Summary: TestReencryptionWithKMS fails regularly
Key: HDFS-12458
URL: https://issues.apache.org/jira/browse/HDFS-12458
Project: Hadoop HDFS
Issue Type: Bug
Components: kms, test
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko

{{TestReencryptionWithKMS}} fails frequently on Jenkins and should be fixed.
[jira] [Resolved] (HDFS-11896) Non-dfsUsed will be doubled on dead node re-registration
[ https://issues.apache.org/jira/browse/HDFS-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-11896.

Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 2.8.3, 3.0.0-beta1, 2.7.4, 2.9.0

I just committed this to trunk, and branches 2, 2.8, and 2.7. Thank you [~brahmareddy] and [~zhz].

> Non-dfsUsed will be doubled on dead node re-registration
>
>
> Key: HDFS-11896
> URL: https://issues.apache.org/jira/browse/HDFS-11896
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
> Priority: Blocker
> Fix For: 2.9.0, 2.7.4, 3.0.0-beta1, 2.8.3
>
> Attachments: HDFS-11896-002.patch, HDFS-11896-003.patch, HDFS-11896-004.patch, HDFS-11896-005.patch, HDFS-11896-006.patch, HDFS-11896-007.patch, HDFS-11896-008.patch, HDFS-11896-branch-2.7-001.patch, HDFS-11896-branch-2.7-002.patch, HDFS-11896-branch-2.7-003.patch, HDFS-11896-branch-2.7-004.patch, HDFS-11896-branch-2.7-005.patch, HDFS-11896-branch-2.7-006.patch, HDFS-11896-branch-2.7-008.patch, HDFS-11896.patch
>
>
> *Scenario:*
> i) Make sure you have non-DFS data.
> ii) Stop the Datanode.
> iii) Wait until it becomes dead.
> iv) Now restart it and check the non-DFS data.
[jira] [Created] (HDFS-11893) Fix TestDFSShell.testMoveWithTargetPortEmpty failure on branch-2.7
Konstantin Shvachko created HDFS-11893:
--

Summary: Fix TestDFSShell.testMoveWithTargetPortEmpty failure on branch-2.7
Key: HDFS-11893
URL: https://issues.apache.org/jira/browse/HDFS-11893
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.7.4
Reporter: Konstantin Shvachko

{{TestDFSShell.testMoveWithTargetPortEmpty()}} is consistently failing on branch-2.7.
[jira] [Resolved] (HDFS-11078) NPE in LazyPersistFileScrubber
[ https://issues.apache.org/jira/browse/HDFS-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-11078.

Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 2.8.2, 3.0.0-alpha3, 2.7.4, 2.9.0

I just committed this. Thank you [~elgoiri].

> NPE in LazyPersistFileScrubber
> --
>
> Key: HDFS-11078
> URL: https://issues.apache.org/jira/browse/HDFS-11078
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Inigo Goiri
> Assignee: Inigo Goiri
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.2
>
> Attachments: HDFS-11078.000.patch, HDFS-11078.001.patch, HDFS-11078-branch-2.7.patch
>
>
> When a block is removed, it is also removed from the block map. When
> clearCorruptLazyPersistFiles() then tries to delete the block, the block may
> already be gone, which results in a null pointer exception:
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:3820)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:3851)
> at java.lang.Thread.run(Thread.java:745)
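A common way to guard against this kind of failure is to treat a missing map entry as "already deleted" and skip it. The sketch below only illustrates that pattern; it is not the committed patch, and {{ScrubberNullGuardSketch}}, {{BlockRecord}}, {{blockMap}} and {{fileToDelete}} are hypothetical names rather than Hadoop classes or methods.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a null guard; NOT the HDFS-11078 patch. */
final class ScrubberNullGuardSketch {

  /** Illustrative record tying a block to the file that owns it. */
  static final class BlockRecord {
    final String owningFile;

    BlockRecord(String owningFile) {
      this.owningFile = owningFile;
    }
  }

  // Stand-in for a block map that another thread may shrink between scans.
  private final Map<Long, BlockRecord> blockMap = new HashMap<>();

  void put(long blockId, String file) {
    blockMap.put(blockId, new BlockRecord(file));
  }

  void remove(long blockId) {
    blockMap.remove(blockId);
  }

  /** Returns the file to delete for a corrupt block, or null if the block is gone. */
  String fileToDelete(long blockId) {
    BlockRecord record = blockMap.get(blockId);
    if (record == null) {
      // The block was removed after it was listed; skip it instead of dereferencing null.
      return null;
    }
    return record.owningFile;
  }
}
```

Callers would skip any block for which {{fileToDelete}} returns null and continue with the next one.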
[jira] [Resolved] (HDFS-11867) Backport HDFS-6291 to branch 2.7
[ https://issues.apache.org/jira/browse/HDFS-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-11867.

Resolution: Fixed
Hadoop Flags: Reviewed

Just committed this to branch-2.7. Thank you [~elgoiri].

> Backport HDFS-6291 to branch 2.7
>
>
> Key: HDFS-11867
> URL: https://issues.apache.org/jira/browse/HDFS-11867
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Inigo Goiri
> Assignee: Inigo Goiri
> Attachments: HDFS-6291-branch-2.7.patch
[jira] [Resolved] (HDFS-11731) Balancer.run() prints redundant included, excluded, source nodes.
[ https://issues.apache.org/jira/browse/HDFS-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-11731.

Resolution: Invalid

Yes, I agree there is no redundancy. Not sure now where I saw it. [~vrushalic] thank you for verifying. Closing.

> Balancer.run() prints redundant included, excluded, source nodes.
> -
>
> Key: HDFS-11731
> URL: https://issues.apache.org/jira/browse/HDFS-11731
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 2.8.0
> Reporter: Konstantin Shvachko
> Labels: newbie
>
> Included, excluded, and source nodes are printed twice by the Balancer. First
> as part of {{BalancerParameters.toString()}} in
> {code}
> LOG.info("parameters = " + p);
> {code}
> And then separately
> {code}
> LOG.info("included nodes = " + p.getIncludedNodes());
> LOG.info("excluded nodes = " + p.getExcludedNodes());
> LOG.info("source nodes = " + p.getSourceNodes());
> {code}
> The latter can be removed.
[jira] [Resolved] (HDFS-2538) option to disable fsck dots
[ https://issues.apache.org/jira/browse/HDFS-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko resolved HDFS-2538.
---

Resolution: Fixed

Resolving it back. The incompatibility concern is valid. I am still thinking about whether we can / should include it. Sorry for the confusion. Thanks for the patch [~elgoiri].

> option to disable fsck dots
>
>
> Key: HDFS-2538
> URL: https://issues.apache.org/jira/browse/HDFS-2538
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.2.0
> Reporter: Allen Wittenauer
> Assignee: Mohammad Kamrul Islam
> Priority: Minor
> Labels: newbie, release-blocker
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-2538.1.patch, HDFS-2538.2.patch, HDFS-2538.3.patch, HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-1.0.patch, HDFS-2538-branch-2.7.patch
>
>
> this patch turns the dots during fsck off by default and provides an option
> to turn them back on if you have a fetish for millions and millions of dots
> on your terminal. i haven't done any benchmarks, but i suspect fsck is now
> 300% faster to boot.
[jira] [Created] (HDFS-11736) OIV tests should not write outside 'target' directory.
Konstantin Shvachko created HDFS-11736:
--

Summary: OIV tests should not write outside 'target' directory.
Key: HDFS-11736
URL: https://issues.apache.org/jira/browse/HDFS-11736
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Konstantin Shvachko

A few tests use {{Files.createTempDir()}} from the Guava package, but do not set the {{java.io.tmpdir}} system property. Thus the temp directory is created in unpredictable places and is not cleaned up by {{mvn clean}}. This was probably introduced in {{TestOfflineImageViewer}} and then replicated in {{TestCheckpoint}} and {{TestStandbyCheckpoints}}.
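Guava's {{Files.createTempDir()}} places the new directory under whatever {{java.io.tmpdir}} points to, which is why the location depends on the environment. One possible way to keep test output under the build directory is to pass the parent directory explicitly, here with the JDK's {{java.nio.file.Files.createTempDirectory(Path, String)}} instead of Guava. This is a sketch of an alternative technique, not the fix adopted for this issue; {{TargetTempDirSketch}}, {{createTempDirUnder}} and the {{target/test/data}} base path are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical test helper; NOT part of Hadoop. */
final class TargetTempDirSketch {

  /**
   * Creates a uniquely named directory under the given base directory,
   * creating the base first if needed, so that cleaning the base directory
   * also removes the temp directory.
   */
  static Path createTempDirUnder(Path base, String prefix) {
    try {
      Files.createDirectories(base);
      return Files.createTempDirectory(base, prefix);
    } catch (IOException e) {
      // Rethrow unchecked so test setup code does not need a throws clause.
      throw new UncheckedIOException(e);
    }
  }
}
```

A test could call, for example, {{TargetTempDirSketch.createTempDirUnder(Paths.get("target", "test", "data"), "oiv")}}, where the base path is again only an assumed location. Because the directory then lives under {{target}}, {{mvn clean}} removes it along with the rest of the build output.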
[jira] [Created] (HDFS-11733) TestGetBlocks.getBlocksWithException() ignores datanode and size parameters.
Konstantin Shvachko created HDFS-11733:
--

Summary: TestGetBlocks.getBlocksWithException() ignores datanode and size parameters.
Key: HDFS-11733
URL: https://issues.apache.org/jira/browse/HDFS-11733
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer & mover, test
Affects Versions: 2.6.1
Reporter: Konstantin Shvachko

{{TestGetBlocks.getBlocksWithException()}} has 3 parameters, but uses only one, so whatever the callers think they pass in is ignored. It looks like we should change it to use the parameters, but I am not sure how this will affect the test.