[jira] [Resolved] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-14305. Resolution: Fixed > Serial number in BlockTokenSecretManager could overlap between different > namenodes > -- > > Key: HDFS-14305 > URL: https://issues.apache.org/jira/browse/HDFS-14305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, security >Reporter: Chao Sun >Assignee: Konstantin Shvachko >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, > HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, > HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch > > > Currently, a {{BlockTokenSecretManager}} starts with a random integer as the > initial serial number, and then uses this formula to rotate it: > {code:java} > this.intRange = Integer.MAX_VALUE / numNNs; > this.nnRangeStart = intRange * nnIndex; > this.serialNo = (this.serialNo % intRange) + (nnRangeStart); > {code} > where {{numNNs}} is the total number of NameNodes in the cluster, and > {{nnIndex}} is the index of the current NameNode specified in the > configuration {{dfs.ha.namenodes.}}. > However, with this approach, different NameNodes could have overlapping ranges > for the serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, > and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges > for these two are: > {code} > nn1 -> [-49, 49] > nn2 -> [1, 99] > {code} > This is because the initial serial number could be any negative integer. > Moreover, when the keys are updated, the serial number will again be updated > with the formula: > {code} > this.serialNo = (this.serialNo % intRange) + (nnRangeStart); > {code} > which means the new serial number could be updated to a range that belongs to > a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key, which > will cause clients to fail with an {{InvalidToken}} error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
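The overlap described in the issue can be reproduced with plain arithmetic. A minimal sketch (plain Java, not the BlockTokenSecretManager code), using 100 as a stand-in for {{Integer.MAX_VALUE}} exactly as in the example above:

```java
public class SerialNoOverlapDemo {
    // Stand-in for Integer.MAX_VALUE, matching the simplified example.
    static final int MAX = 100;

    // The rotation formula quoted from the issue description.
    static int rotate(int serialNo, int numNNs, int nnIndex) {
        int intRange = MAX / numNNs;           // 50 for two NameNodes
        int nnRangeStart = intRange * nnIndex; // 0 for nn1, 50 for nn2
        return (serialNo % intRange) + nnRangeStart;
    }

    public static void main(String[] args) {
        // nn1 (index 0): a negative initial serial can land anywhere in [-49, 49]
        System.out.println(rotate(-49, 2, 0)); // -49
        System.out.println(rotate(49, 2, 0));  // 49
        // nn2 (index 1): [-49, 49] shifted by 50 gives [1, 99]
        System.out.println(rotate(-49, 2, 1)); // 1
        System.out.println(rotate(49, 2, 1));  // 99
        // Collision: both NameNodes can end up with serial number 30.
        System.out.println(rotate(30, 2, 0) == rotate(-20, 2, 1)); // true
    }
}
```

Because Java's `%` keeps the sign of the dividend, the negative initial serial number bleeds nn1's range below zero while nn2's range overlaps nn1's positive half.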
[jira] [Commented] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines
[ https://issues.apache.org/jira/browse/HDFS-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511011#comment-17511011 ] Konstantin Shvachko commented on HDFS-16517: Looks like the same issue as HADOOP-16161, as [~xinglin] found out. The fix is equivalent. I did not compare the tests. Should we just backport, [~omalley]? > In 2.10 the distance metric is wrong for non-DN machines > > > Key: HDFS-16517 > URL: https://issues.apache.org/jira/browse/HDFS-16517 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In 2.10, the metric for distance between the client and the data node is > wrong for machines that aren't running data nodes (i.e. > getWeightUsingNetworkLocation). The code works correctly in 3.3+. > Currently > > ||Client||DataNode||getWeight||getWeightUsingNetworkLocation|| > |/rack1/node1|/rack1/node1|0|0| > |/rack1/node1|/rack1/node2|2|2| > |/rack1/node1|/rack2/node2|4|2| > |/pod1/rack1/node1|/pod1/rack1/node2|2|2| > |/pod1/rack1/node1|/pod1/rack2/node2|4|2| > |/pod1/rack1/node1|/pod2/rack2/node2|6|4| > > This bug will destroy data locality on clusters where the clients share racks > with DataNodes, but are running on machines that aren't running DataNodes, > such as striping federated HDFS clusters across racks.
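As a reference for the expected {{getWeight}} column in the table above: network distance in a rack topology is the number of hops from each node up to their nearest common ancestor. A minimal sketch (plain Java, not Hadoop's actual NetworkTopology implementation) that reproduces the correct column:

```java
public class TopologyDistanceDemo {
    // Distance = hops from each topology path up to the deepest common ancestor.
    static int distance(String a, String b) {
        String[] pa = a.split("/");
        String[] pb = b.split("/");
        int i = 0;
        int n = Math.min(pa.length, pb.length);
        while (i < n && pa[i].equals(pb[i])) {
            i++; // walk down while the two paths agree
        }
        return (pa.length - i) + (pb.length - i);
    }

    public static void main(String[] args) {
        System.out.println(distance("/rack1/node1", "/rack1/node1"));           // 0
        System.out.println(distance("/rack1/node1", "/rack1/node2"));           // 2
        System.out.println(distance("/rack1/node1", "/rack2/node2"));           // 4
        System.out.println(distance("/pod1/rack1/node1", "/pod2/rack2/node2")); // 6
    }
}
```

The buggy getWeightUsingNetworkLocation column effectively loses one level of the hierarchy, which is why /rack1/node1 vs /rack2/node2 comes out as 2 instead of 4.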
[jira] [Updated] (HDFS-10650) DFSClient#mkdirs and DFSClient#primitiveMkdir should use default directory permission
[ https://issues.apache.org/jira/browse/HDFS-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-10650: --- Fix Version/s: 2.10.2 Merged this into branch-2.10. Updated fix version. > DFSClient#mkdirs and DFSClient#primitiveMkdir should use default directory > permission > - > > Key: HDFS-10650 > URL: https://issues.apache.org/jira/browse/HDFS-10650 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Minor > Fix For: 3.0.0-alpha1, 2.10.2 > > Attachments: HDFS-10650.001.patch, HDFS-10650.002.patch > > > These 2 DFSClient methods should use default directory permission to create a > directory. > {code:java} > public boolean mkdirs(String src, FsPermission permission, > boolean createParent) throws IOException { > if (permission == null) { > permission = FsPermission.getDefault(); > } > {code} > {code:java} > public boolean primitiveMkdir(String src, FsPermission absPermission, > boolean createParent) > throws IOException { > checkOpen(); > if (absPermission == null) { > absPermission = > FsPermission.getDefault().applyUMask(dfsClientConf.uMask); > } > {code}
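For context on what applying a umask does in the second snippet above: the effective mode is the default permission with the umask bits cleared. A minimal arithmetic sketch (plain Java, not the FsPermission API itself):

```java
public class UmaskDemo {
    // Effective mode = default permission & ~umask, the classic POSIX rule
    // that applyUMask implements for HDFS permissions.
    static int applyUMask(int perm, int umask) {
        return perm & ~umask;
    }

    public static void main(String[] args) {
        // Default directory permission 0777 with the common umask 022 -> 0755.
        System.out.println(Integer.toOctalString(applyUMask(0777, 022))); // 755
        // A stricter umask 077 leaves the directory owner-only -> 0700.
        System.out.println(Integer.toOctalString(applyUMask(0777, 077))); // 700
    }
}
```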
[jira] [Commented] (HDFS-16322) The NameNode implementation of ClientProtocol.truncate(...) can cause data loss.
[ https://issues.apache.org/jira/browse/HDFS-16322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444033#comment-17444033 ] Konstantin Shvachko commented on HDFS-16322: Is it specific to truncate? Same thing should happen with {{mkdir()}}. Client A creates a directory, client B deletes it, then client A retries the create. Same with {{setPermission()}}? From the NN perspective the two calls from client A are different calls. Since NN responded to the first call from client A, it treats the retry as the second call. > The NameNode implementation of ClientProtocol.truncate(...) can cause data > loss. > > > Key: HDFS-16322 > URL: https://issues.apache.org/jira/browse/HDFS-16322 > Project: Hadoop HDFS > Issue Type: Bug > Environment: The runtime environment is Ubuntu 18.04, Java 1.8.0_222 > and Apache Maven 3.6.0. > The bug can be reproduced by the testMultipleTruncate() in the > attachment. First, replace the file TestFileTruncate.java under the directory > "hadoop-3.3.1-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/" > with the attachment. Then run "mvn test > -Dtest=org.apache.hadoop.hdfs.server.namenode.TestFileTruncate#testMultipleTruncate" > to run the testcase. Finally the "assertFileLength(p, n+newLength)" at line 199 > of TestFileTruncate.java will abort, because the retry of truncate() > changes the file size and causes data loss. >Reporter: nhaorand >Priority: Major > Attachments: TestFileTruncate.java > > > The NameNode implementation of ClientProtocol.truncate(...) can cause data > loss. If the dfsclient drops the first response of a truncate RPC call, the retry > by the retry cache will truncate the file again and cause data loss. > HDFS-7926 avoids repeated execution of truncate(...) by checking if the file > is already being truncated with the same length.
However, under concurrency, > after the first execution of truncate(...), concurrent requests from other > clients may append new data and change the file length. When truncate(...) is > retried after that, it will find the file has not been truncated with the > same length and truncate it again, which causes data loss.
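The race described above can be modeled in a few lines. A toy sketch (plain Java with hypothetical names, not the NameNode code) of how the HDFS-7926 same-length guard is defeated by a concurrent append:

```java
public class TruncateRetryDemo {
    static long fileLength = 100;

    // Toy truncate with an HDFS-7926-style guard: a retry that finds the
    // file already at the requested length is treated as a no-op.
    static void truncate(long newLength) {
        if (fileLength == newLength) {
            return; // looks like a retry of an already-applied truncate
        }
        fileLength = newLength;
    }

    public static void main(String[] args) {
        truncate(50);      // client A truncates; the RPC response is lost
        fileLength += 30;  // client B appends 30 bytes -> length is now 80
        truncate(50);      // A's retry: guard sees 80 != 50, truncates again
        System.out.println(fileLength); // 50 -- client B's appended data is gone
    }
}
```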
[jira] [Resolved] (HDFS-7612) TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir
[ https://issues.apache.org/jira/browse/HDFS-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-7612. --- Fix Version/s: 3.2.4 3.3.2 2.10.2 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed I just committed this to the four active branches. Congratulations [~mkuchenbecker]! > TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir > - > > Key: HDFS-7612 > URL: https://issues.apache.org/jira/browse/HDFS-7612 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.6.0 >Reporter: Konstantin Shvachko >Assignee: Michael Kuchenbecker >Priority: Major > Labels: newbie, pull-request-available > Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4 > > Time Spent: 0.5h > Remaining Estimate: 0h > > {code} > final String cacheDir = System.getProperty("test.cache.data", > "build/test/cache"); > {code} > results in > {{FileNotFoundException: build/test/cache/editsStoredParsed.xml (No such file > or directory)}} > when {{test.cache.data}} is not set. > I can see this failing while running in Eclipse.
[jira] [Assigned] (HDFS-7612) TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir
[ https://issues.apache.org/jira/browse/HDFS-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reassigned HDFS-7612: - Assignee: Michael Kuchenbecker > TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir > - > > Key: HDFS-7612 > URL: https://issues.apache.org/jira/browse/HDFS-7612 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.6.0 >Reporter: Konstantin Shvachko >Assignee: Michael Kuchenbecker >Priority: Major > Labels: newbie, pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > {code} > final String cacheDir = System.getProperty("test.cache.data", > "build/test/cache"); > {code} > results in > {{FileNotFoundException: build/test/cache/editsStoredParsed.xml (No such file > or directory)}} > when {{test.cache.data}} is not set. > I can see this failing while running in Eclipse.
[jira] [Commented] (HDFS-13150) [Edit Tail Fast Path] Allow SbNN to tail in-progress edits from JN via RPC
[ https://issues.apache.org/jira/browse/HDFS-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423048#comment-17423048 ] Konstantin Shvachko commented on HDFS-13150: We ended up implementing a quorum read from JNs for the Observer fast path. You should check the code, [~liutongwei]. > [Edit Tail Fast Path] Allow SbNN to tail in-progress edits from JN via RPC > -- > > Key: HDFS-13150 > URL: https://issues.apache.org/jira/browse/HDFS-13150 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, hdfs, journal-node, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: HDFS-12943, 3.3.0 > > Attachments: edit-tailing-fast-path-design-v0.pdf, > edit-tailing-fast-path-design-v1.pdf, edit-tailing-fast-path-design-v2.pdf > > > In the interest of making coordinated/consistent reads easier to complete > with low latency, it is advantageous to reduce the time between when a > transaction is applied on the ANN and when it is applied on the SbNN. We > propose adding a new "fast path" which can be used to tail edits when low > latency is desired. We leave the existing tailing logic in place, and fall > back to this path on startup, recovery, and when the fast path encounters > unrecoverable errors.
[jira] [Comment Edited] (HDFS-16220) [FGL]Configurable INodeMap#NAMESPACE_KEY_DEPTH_RANGES_STATIC
[ https://issues.apache.org/jira/browse/HDFS-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422417#comment-17422417 ] Konstantin Shvachko edited comment on HDFS-16220 at 9/29/21, 11:22 PM: --- # I don't think we should make Key depth configurable. We should generalize the long[] into a Key class, then we will be able to configure any Key class and load it using a Key factory. Different Key classes then can have different depths. The problem here is that one should be able to construct the keys while loading INodes from the image, so that they could be placed into the right partitions. # Number of partitions should in the end be configurable. It should depend on the number of cores on your server. Increasing the number of partitions does not necessarily increase the parallelism because at any moment the CPU cannot support more threads than the number of cores. So this change is useful, but not critical. And the main problem here is to be able to rebuild new partitions while reloading the fsimage. If you upgraded your NameNode to a server with more cores you should be able to adjust the number of partitions. was (Author: shv): # I don't think we should make Key depth configurable. We should generalize the long[] into a Key class, then we will be able to configure any Key class and load it using a Key factory. Different Key classes then can have different depths. The problem here is that one should be able to construct the keys while loading INodes from the image, so that they could be placed into the right partitions. # Number of partitions should in the end be configurable. It should depend on the number of cores on your server. Increasing the number of partitions does not necessarily increase the parallelism because at any moment the CPU cannot support more threads than the number of cores. So this change is useful, but not critical.
> [FGL]Configurable INodeMap#NAMESPACE_KEY_DEPTH_RANGES_STATIC > > > Key: HDFS-16220 > URL: https://issues.apache.org/jira/browse/HDFS-16220 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Attachments: debug1.jpg, debug2.jpg > > Time Spent: 3h 10m > Remaining Estimate: 0h > > In INodeMap, NAMESPACE_KEY_DEPTH and NUM_RANGES_STATIC are fixed values; we > should make them configurable.
[jira] [Commented] (HDFS-16228) [FGL]Improve safer PartitionedGSet#size
[ https://issues.apache.org/jira/browse/HDFS-16228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422419#comment-17422419 ] Konstantin Shvachko commented on HDFS-16228: Atomic variables have the same semantics as volatile, except that in addition they provide atomic set-and-get methods. In your patch you do call {{incrementAndGet()}}, but only for the purpose of incrementing since you never use the returned result. So you might as well keep it volatile. Besides, all GSet methods should be called under a higher level lock, so they should be safe. > [FGL]Improve safer PartitionedGSet#size > --- > > Key: HDFS-16228 > URL: https://issues.apache.org/jira/browse/HDFS-16228 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > When multiple PartitionedEntries are working at the same time, there may be > inconsistencies in the operation PartitionedGSet#size. > For example, there are some size++ or size-- operations in PartitionedGSet.
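The comment's point can be illustrated outside the GSet code. A minimal sketch (plain Java with hypothetical counter classes, not PartitionedGSet itself): when the value returned by incrementAndGet() is discarded and every call already runs under an outer lock, a volatile field carries the same visibility guarantee.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CounterDemo {
    // Variant resembling the patch under discussion: atomic increment, but
    // the value returned by incrementAndGet() is never used.
    static class AtomicSize {
        private final AtomicLong size = new AtomicLong();
        void inc() { size.incrementAndGet(); }
        long get() { return size.get(); }
    }

    // Equivalent under a higher-level lock: volatile gives readers visibility,
    // and the external lock (synchronized here stands in for it) makes ++ safe.
    static class VolatileSize {
        private volatile long size;
        synchronized void inc() { size++; }
        long get() { return size; }
    }

    public static void main(String[] args) {
        AtomicSize a = new AtomicSize();
        VolatileSize v = new VolatileSize();
        for (int i = 0; i < 1000; i++) { a.inc(); v.inc(); }
        System.out.println(a.get() + " " + v.get()); // 1000 1000
    }
}
```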
[jira] [Commented] (HDFS-16220) [FGL]Configurable INodeMap#NAMESPACE_KEY_DEPTH_RANGES_STATIC
[ https://issues.apache.org/jira/browse/HDFS-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422417#comment-17422417 ] Konstantin Shvachko commented on HDFS-16220: # I don't think we should make Key depth configurable. We should generalize the long[] into a Key class, then we will be able to configure any Key class and load it using a Key factory. Different Key classes then can have different depths. The problem here is that one should be able to construct the keys while loading INodes from the image, so that they could be placed into the right partitions. # Number of partitions should in the end be configurable. It should depend on the number of cores on your server. Increasing the number of partitions does not necessarily increase the parallelism because at any moment the CPU cannot support more threads than the number of cores. So this change is useful, but not critical. > [FGL]Configurable INodeMap#NAMESPACE_KEY_DEPTH_RANGES_STATIC > > > Key: HDFS-16220 > URL: https://issues.apache.org/jira/browse/HDFS-16220 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Attachments: debug1.jpg, debug2.jpg > > Time Spent: 3h 10m > Remaining Estimate: 0h > > In INodeMap, NAMESPACE_KEY_DEPTH and NUM_RANGES_STATIC are fixed values; we > should make them configurable.
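The Key-class generalization suggested in point 1 could look roughly like the following. A hypothetical sketch (plain Java; NamespaceKey, DepthTwoKey, and KeyFactory are assumed names for illustration, not existing Hadoop classes):

```java
public class KeyFactoryDemo {
    // Generalizes the raw long[] key: each implementation fixes its own depth.
    interface NamespaceKey {
        long[] fields(); // per-level components used to pick a partition
        int depth();     // replaces the fixed NAMESPACE_KEY_DEPTH constant
    }

    // A depth-2 key (parent inode id, inode id), constructible while loading
    // INodes from the image so each inode lands in the right partition.
    static class DepthTwoKey implements NamespaceKey {
        final long parentId, inodeId;
        DepthTwoKey(long parentId, long inodeId) {
            this.parentId = parentId;
            this.inodeId = inodeId;
        }
        public long[] fields() { return new long[] {parentId, inodeId}; }
        public int depth() { return 2; }
    }

    // The factory the comment proposes loading via configuration.
    interface KeyFactory {
        NamespaceKey keyFor(long parentId, long inodeId);
    }

    // Illustrative routing only: hash the key fields into a partition index.
    static int partitionOf(NamespaceKey key, int numPartitions) {
        return Math.floorMod(java.util.Arrays.hashCode(key.fields()), numPartitions);
    }

    public static void main(String[] args) {
        KeyFactory factory = DepthTwoKey::new;
        NamespaceKey k = factory.keyFor(16385, 16386);
        System.out.println(k.depth());             // 2
        System.out.println(partitionOf(k, 8) < 8); // true
    }
}
```

A different KeyFactory bound through configuration could then yield keys of another depth without touching the partitioning code.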
[jira] [Updated] (HDFS-14216) NullPointerException happens in NamenodeWebHdfs
[ https://issues.apache.org/jira/browse/HDFS-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-14216: --- Fix Version/s: 2.10.2 Also saw this NPE on branch-2.10. Back-port is clean. Adding Fix version. > NullPointerException happens in NamenodeWebHdfs > --- > > Key: HDFS-14216 > URL: https://issues.apache.org/jira/browse/HDFS-14216 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: lujie >Assignee: lujie >Priority: Critical > Fix For: 3.3.0, 3.2.1, 3.1.4, 2.10.2 > > Attachments: HDFS-14216.branch-3.1.patch, HDFS-14216_1.patch, > HDFS-14216_2.patch, HDFS-14216_3.patch, HDFS-14216_4.patch, > HDFS-14216_5.patch, HDFS-14216_6.patch, hadoop-hires-namenode-hadoop11.log > > > workload > {code:java} > curl -i -X PUT -T $HOMEPARH/test.txt > "http://hadoop1:9870/webhdfs/v1/input?op=CREATE&excludedatanodes=hadoop2" > {code} > the method > {code:java} > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(String > excludeDatanodes){ > HashSet excludes = new HashSet(); > if (excludeDatanodes != null) { > for (String host : StringUtils > .getTrimmedStringCollection(excludeDatanodes)) { > int idx = host.indexOf(":"); > if (idx != -1) { > excludes.add(bm.getDatanodeManager().getDatanodeByXferAddr( > host.substring(0, idx), Integer.parseInt(host.substring(idx + 1)))); > } else { > excludes.add(bm.getDatanodeManager().getDatanodeByHost(host)); //line280 > } > } > } > } > {code} > when the datanode (e.g. hadoop2) is just wiped before line 280, or we give the wrong DN name, then bm.getDatanodeManager().getDatanodeByHost(host) will > return null, and *_excludes_* *contains null*.
while *_excludes_* are used > later, NPE happens: > {code:java} > java.lang.NullPointerException > at org.apache.hadoop.net.NodeBase.getPath(NodeBase.java:113) > at > org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:672) > at > org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:533) > at > org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:491) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:323) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:384) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.put(NamenodeWebHdfsMethods.java:652) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:600) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:597) > at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:73) > at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:30) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830) > {code} >
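The fix direction can be sketched with plain collections. A minimal sketch (plain Java; the lookup map is a hypothetical stand-in for the DatanodeManager): skip hosts whose lookup returns null instead of adding null to the exclude set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ExcludeNullGuardDemo {
    // Builds the exclude set, skipping hosts the manager does not know about
    // instead of adding the null lookup result that triggers the NPE above.
    static Set<String> buildExcludes(Map<String, String> dnByHost, String hosts) {
        Set<String> excludes = new HashSet<>();
        for (String host : hosts.split(",")) {
            String dn = dnByHost.get(host); // null for an unknown / wiped host
            if (dn != null) {               // the guard that avoids the NPE
                excludes.add(dn);
            }
        }
        return excludes;
    }

    public static void main(String[] args) {
        Map<String, String> dnByHost = new HashMap<>();
        dnByHost.put("hadoop1", "dn-1"); // stand-in for the DatanodeManager
        Set<String> excludes = buildExcludes(dnByHost, "hadoop2,hadoop1");
        System.out.println(excludes);                // [dn-1]
        System.out.println(excludes.contains(null)); // false
    }
}
```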
[jira] [Commented] (HDFS-16211) Complete some descriptions related to AuthToken
[ https://issues.apache.org/jira/browse/HDFS-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412096#comment-17412096 ] Konstantin Shvachko commented on HDFS-16211: Hi [~jianghuazhu]. Thanks for contributing. Generally changing documentation is a good thing. But with this particular change I do not see how it clarifies anything about the {{AuthToken}} class. Besides, since you commit your changes only into trunk, it increases the divergence between supported versions of Hadoop (3.3, 3.2, 2.10) and makes backports more complex. If you are looking for some simpler tasks to get you started with Hadoop, I suggest searching for issues labeled "newbie" or "newbie++". > Complete some descriptions related to AuthToken > --- > > Key: HDFS-16211 > URL: https://issues.apache.org/jira/browse/HDFS-16211 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In AuthToken, some description information is missing. > The purpose of this jira is to complete some descriptions related to > AuthToken. > /** > */ > public class AuthToken implements Principal { > .. > }
[jira] [Resolved] (HDFS-16141) [FGL] Address permission related issues with File / Directory
[ https://issues.apache.org/jira/browse/HDFS-16141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-16141. Fix Version/s: Fine-Grained Locking Hadoop Flags: Reviewed Resolution: Fixed I just committed this to fgl branch. Thank you [~prasad-acit]. > [FGL] Address permission related issues with File / Directory > - > > Key: HDFS-16141 > URL: https://issues.apache.org/jira/browse/HDFS-16141 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Fix For: Fine-Grained Locking > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Post FGL implementation (MKDIR & Create File), there are existing UTs got > impacted which needs to be addressed. > Failed Tests: > TestDFSPermission > TestPermission > TestFileCreation > TestDFSMkdirs (Added tests)
[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390818#comment-17390818 ] Konstantin Shvachko commented on HDFS-14703: Some thoughts on [~daryn]'s comment: * For small clusters/namespaces you don't need to do anything at all, performance should be great. * 1 billion object namespaces can be effectively handled with Observers (HDFS-12943), as described in our [Exabyte Club blog|https://engineering.linkedin.com/blog/2021/the-exabyte-club--linkedin-s-journey-of-scaling-the-hadoop-distr]. * This namespace partitioning idea should help if you want to grow the workloads and cluster size further. And sure, it's a big "if" there. * There is plenty of benchmark data above. I built the POC exactly with the purpose to obtain some preliminary synthetic numbers. For me 30% is a threshold separating worthy improvements. * We won't know the real performance numbers until the feature is done. As with "Consistent Reads from Standby", our initial synthetic benchmarks showed ~50% improvement. The real numbers in production were 3x better in both average throughput and latency. * You bring up good design concerns. But conceptually multiple partitions cannot be worse than the single one. When an operation spans all partitions, it's like taking a global lock as we do today. So in this case the performance of multiple partitions degenerates to the current level, but in all other cases multiple namespace operations can go in parallel. * Let us know if you have concrete suggestions: you don't want it to sound like FUD.
> NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. Intended to improve > performance of NameNode write operations.
[jira] [Assigned] (HDFS-14540) Block deletion failure causes an infinite polling in TestDeleteBlockPool
[ https://issues.apache.org/jira/browse/HDFS-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reassigned HDFS-14540: -- Assignee: Anton Kutuzov > Block deletion failure causes an infinite polling in TestDeleteBlockPool > > > Key: HDFS-14540 > URL: https://issues.apache.org/jira/browse/HDFS-14540 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: John Doe >Assignee: Anton Kutuzov >Priority: Major > > In the testDeleteBlockPool function, when the file deletion fails, the while > loop hangs. > {code:java} > fs1.delete(new Path("/alpha"), true); //deletion failure > > // Wait till all blocks are deleted from the dn2 for bpid1. > while ((MiniDFSCluster.getFinalizedDir(dn2StorageDir1, > bpid1).list().length != 0) || (MiniDFSCluster.getFinalizedDir( > dn2StorageDir2, bpid1).list().length != 0)) { > try { > Thread.sleep(3000); > } catch (Exception ignored) { > } > } > {code}
[jira] [Commented] (HDFS-14540) Block deletion failure causes an infinite polling in TestDeleteBlockPool
[ https://issues.apache.org/jira/browse/HDFS-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390757#comment-17390757 ] Konstantin Shvachko commented on HDFS-14540: I think we should just check the return value of {{fs1.delete()}} and assert it is successful. The rest of the test doesn't make sense without this delete succeeding. > Block deletion failure causes an infinite polling in TestDeleteBlockPool > > > Key: HDFS-14540 > URL: https://issues.apache.org/jira/browse/HDFS-14540 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: John Doe >Priority: Major > > In the testDeleteBlockPool function, when the file deletion fails, the while > loop hangs. > {code:java} > fs1.delete(new Path("/alpha"), true); //deletion failure > > // Wait till all blocks are deleted from the dn2 for bpid1. > while ((MiniDFSCluster.getFinalizedDir(dn2StorageDir1, > bpid1).list().length != 0) || (MiniDFSCluster.getFinalizedDir( > dn2StorageDir2, bpid1).list().length != 0)) { > try { > Thread.sleep(3000); > } catch (Exception ignored) { > } > } > {code}
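The suggested hardening can be shown with a plain java.io.File in place of the MiniDFSCluster filesystem. A minimal sketch (hypothetical, not the actual TestDeleteBlockPool code): check the delete result before polling, so a failed delete aborts the test instead of looping forever.

```java
import java.io.File;

public class AssertDeleteDemo {
    // Returns the delete result so the caller can assert on it before
    // entering any polling loop; File.delete() returns false on failure.
    static boolean deleteOrFail(File f) {
        return f.delete();
    }

    public static void main(String[] args) {
        File missing = new File("no-such-file-for-demo");
        if (!deleteOrFail(missing)) {
            // In the real test this would be assertTrue(...), failing fast
            // instead of polling forever for blocks that were never deleted.
            System.out.println("delete failed; abort instead of polling");
        }
    }
}
```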
[jira] [Updated] (HDFS-16130) [FGL] Implement Create File with FGL
[ https://issues.apache.org/jira/browse/HDFS-16130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-16130: --- Fix Version/s: Fine-Grained Locking > [FGL] Implement Create File with FGL > > > Key: HDFS-16130 > URL: https://issues.apache.org/jira/browse/HDFS-16130 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: Fine-Grained Locking >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Fix For: Fine-Grained Locking > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Implement FGL for Create File. > The create API acquires the global lock at multiple stages. Acquire the respective > partitioned lock instead and continue the create operation.
[jira] [Resolved] (HDFS-16130) [FGL] Implement Create File with FGL
[ https://issues.apache.org/jira/browse/HDFS-16130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-16130. Hadoop Flags: Reviewed Resolution: Fixed I just committed this. Fixed a few checkstyle warnings. Thank you [~prasad-acit]. > [FGL] Implement Create File with FGL > > > Key: HDFS-16130 > URL: https://issues.apache.org/jira/browse/HDFS-16130 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: Fine-Grained Locking >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Implement FGL for Create File. > The create API acquires the global lock at multiple stages. Acquire the respective > partitioned lock instead and continue the create operation.
[jira] [Resolved] (HDFS-16128) [FGL] Add support for saving/loading an FS Image for PartitionedGSet
[ https://issues.apache.org/jira/browse/HDFS-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-16128. Fix Version/s: Fine-Grained Locking Hadoop Flags: Reviewed Resolution: Fixed I just committed this. Thank you [~xinglin]. > [FGL] Add support for saving/loading an FS Image for PartitionedGSet > > > Key: HDFS-16128 > URL: https://issues.apache.org/jira/browse/HDFS-16128 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Major > Labels: pull-request-available > Fix For: Fine-Grained Locking > > > Add support to save Inodes stored in PartitionedGSet when saving an FS image > and load Inodes into PartitionedGSet from a saved FS image. > h1. Saving FSImage > *Original HDFS design*: iterate every inode in inodeMap and save them into > the FSImage file. > *FGL*: no change is needed here, since PartitionedGSet also provides an > iterator interface, to iterate over inodes stored in partitions. > h1. Loading an HDFS > *Original HDFS design*: it first loads the FSImage files and then loads edit > logs for recent changes. FSImage files contain different sections, including > INodeSections and INodeDirectorySections. An InodeSection contains serialized > Inodes objects and the INodeDirectorySection contains the parent inode for an > Inode. When loading an FSImage, the system first loads INodeSections and then > load the INodeDirectorySections, to set the parent inode for each inode. > After FSImage files are loaded, edit logs are then loaded. Edit log contains > recent changes to the filesystem, including Inodes creation/deletion. For a > newly created INode, the parent inode is set before it is added to the > inodeMap. > *FGL*: when adding an Inode into the partitionedGSet, we need the parent > inode of an inode, in order to determine which partition to store that inode, > when NAMESPACE_KEY_DEPTH = 2. 
Thus, in FGL, when loading FSImage files, we > use a temporary LightweightGSet (inodeMapTemp) to store inodes. When > LoadFSImage is done, the parent inode for all existing inodes in the FSImage > files is set. We can now move the inodes into a PartitionedGSet. Loading edit > logs works as usual, as the parent inode for an inode is set before it is > added to the inodeMap. > In theory, PartitionedGSet could store inodes before their parent inodes are > set. All these inodes would be stored in the 0th partition. However, we > decided to use a temporary LightweightGSet (inodeMapTemp) to store these > inodes, to make this case more transparent. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
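The two-phase load described in HDFS-16128 above can be sketched as follows. This is a hypothetical, heavily simplified model (plain HashMaps standing in for LightweightGSet/PartitionedGSet, and inodes reduced to id -> parentId pairs), not the actual Hadoop code: inodes first land in a flat temporary map while parent pointers are still being resolved, and only then migrate into per-parent partitions.

```java
import java.util.*;

// Hypothetical, simplified sketch of the two-phase FSImage load described
// above; class and field names are illustrative, not the actual HDFS code.
public class FsImageLoadSketch {
  static final int NUM_PARTITIONS = 2;

  // Phase 3: once every parent pointer is known (after INodeSection and
  // INodeDirectorySection are both applied to the temporary flat map),
  // migrate inodes into partitions chosen by parent id.
  static List<Map<Long, Long>> migrate(Map<Long, Long> inodeMapTemp) {
    List<Map<Long, Long>> partitions = new ArrayList<>();
    for (int p = 0; p < NUM_PARTITIONS; p++) partitions.add(new HashMap<>());
    for (Map.Entry<Long, Long> e : inodeMapTemp.entrySet()) {
      int p = (int) (e.getValue() % NUM_PARTITIONS); // partition keyed by parent
      partitions.get(p).put(e.getKey(), e.getValue());
    }
    return partitions;
  }

  public static void main(String[] args) {
    // Phases 1+2: inodeMapTemp (standing in for the temporary
    // LightweightGSet) maps inodeId -> parentId after both sections load.
    Map<Long, Long> inodeMapTemp = new HashMap<>();
    inodeMapTemp.put(1L, 1L); // root, its own parent in this toy model
    inodeMapTemp.put(2L, 1L);
    inodeMapTemp.put(3L, 2L);
    inodeMapTemp.put(4L, 2L);
    List<Map<Long, Long>> parts = migrate(inodeMapTemp);
    int total = parts.get(0).size() + parts.get(1).size();
    if (total != 4) throw new AssertionError("lost inodes during migration");
    System.out.println("migrated " + total + " inodes");
  }
}
```

The point of the temporary map is visible in `migrate()`: the partition index depends on the parent id, so migration cannot start until every parent is set.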
[jira] [Resolved] (HDFS-16125) [FGL] Fix the iterator for PartitionedGSet
[ https://issues.apache.org/jira/browse/HDFS-16125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-16125. Fix Version/s: Fine-Grained Locking Hadoop Flags: Reviewed Resolution: Fixed +1 on the latest patch. I just committed this to branch fgl, also re-based fgl to current trunk. Thank you [~xinglin]. > [FGL] Fix the iterator for PartitionedGSet > --- > > Key: HDFS-16125 > URL: https://issues.apache.org/jira/browse/HDFS-16125 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > Fix For: Fine-Grained Locking > > Time Spent: 1h > Remaining Estimate: 0h > > The iterator in PartitionedGSet would visit the first partition twice, since we > did not set the keyIterator to move to the first key during initialization. > > This is related to fgl: https://issues.apache.org/jira/browse/HDFS-14703
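The bug in HDFS-16125 above can be reconstructed in miniature. The following is an illustrative sketch, not the actual PartitionedGSet code: a two-level iterator starts already positioned on partition 0, so unless the key iterator is advanced past the first key during initialization, its first `next()` hands back partition 0 again.

```java
import java.util.*;

// Illustrative reconstruction of the reported bug: an iterator over
// partitions that starts holding partition 0. Without advancing the
// keyIterator during init, partition 0 is visited twice.
public class PartitionIteratorSketch {
  static List<Integer> visitOrder(int numPartitions, boolean advanceKeyIteratorOnInit) {
    List<Integer> keys = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) keys.add(i);
    Iterator<Integer> keyIterator = keys.iterator();
    int current = 0;                                   // we already hold partition 0
    if (advanceKeyIteratorOnInit) keyIterator.next();  // the fix: skip the key we hold
    List<Integer> order = new ArrayList<>();
    while (true) {
      order.add(current);                              // "visit" the current partition
      if (!keyIterator.hasNext()) break;
      current = keyIterator.next();
    }
    return order;
  }

  public static void main(String[] args) {
    System.out.println("buggy: " + visitOrder(3, false)); // [0, 0, 1, 2]
    System.out.println("fixed: " + visitOrder(3, true));  // [0, 1, 2]
  }
}
```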
[jira] [Updated] (HDFS-16125) [FGL] Fix the iterator for PartitionedGSet
[ https://issues.apache.org/jira/browse/HDFS-16125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-16125: --- Summary: [FGL] Fix the iterator for PartitionedGSet (was: fix the iterator for PartitionedGSet )
[jira] [Updated] (HDFS-16125) fix the iterator for PartitionedGSet
[ https://issues.apache.org/jira/browse/HDFS-16125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-16125: --- Parent: HDFS-14703 Issue Type: Sub-task (was: Bug)
[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345870#comment-17345870 ] Konstantin Shvachko edited comment on HDFS-14703 at 7/14/21, 5:28 PM: -- I did some performance benchmarks using a physical server (a d430 server in [Utah Emulab testbed|http://www.emulab.net]). I used either RAMDISK or SSD, as the storage for HDFS. By using RAMDISK, we can remove the time used by the SSD to make each write persistent. For the RAM case, we observed an improvement of 45% from fine-grained locking. For the SSD case, fine-grained locking gives us 20% improvement. We used an Intel SSD (model: SSDSC2BX200G4R). We noticed for trunk, the mkdir OPS is lower for the RAMDISK than SSD. We don't know the reason for this yet. We repeated the experiment for RAMDISK for trunk twice to confirm the performance number. h2. tmpfs, hadoop-tmp-dir = /run/hadoop-utos h3. 45% improvements fgl vs. trunk trunk {noformat:nowrap} 2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: # operations: 1000 2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Elapsed Time: 663510 2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Ops per sec: 15071.362 2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Average Time: 13 2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: — mkdirs stats — 2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: # operations: 1000 2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Elapsed Time: 710248 2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Ops per sec: 14079.5 2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Average Time: 14 2021-05-16 22:15:13,515 INFO namenode.FSEditLog: Ending log segment 8345565, 10019540 {noformat} fgl {noformat:nowrap} 2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: — mkdirs stats — 2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: # operations: 1000 2021-05-16 
21:06:46,476 INFO namenode.NNThroughputBenchmark: Elapsed Time: 445980 2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: Ops per sec: 22422.530 2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: Average Time: 8 {noformat} h2. SSD, hadoop.tmp.dir=/dev/sda4 h3. 23% improvement fgl vs. trunk trunk: {noformat:nowrap} 2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: — mkdirs stats — 2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: # operations: 1000 2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: Elapsed Time: 593839 2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: Ops per sec: 16839.581 2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: Average Time: 11 {noformat} fgl {noformat:nowrap} 2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: — mkdirs stats — 2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: # operations: 1000 2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: Elapsed Time: 481269 2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: Ops per sec: 20778.400 2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: Average Time: 9 {noformat} {noformat:nowrap} /dev/sda: ATA device, with non-removable media Model Number: INTEL SSDSC2BX200G4R Serial Number: BTHC523202RD200TGN Firmware Revision: G201DL2D {noformat}
[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380757#comment-17380757 ] Konstantin Shvachko commented on HDFS-14703: ??Shall I raise separate Jira for Create and trace the PR??? Yes please let's track {{create}} in a new jira. You can make it a subtask of this jira and follow [the standard process|https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute]. ??Just provided work-around to continue, we shall work on it and eventually optimize it better.?? It is fine as a work-around, but yes we should, and it would be good to design it early, as it may affect the structure of the entire implementation. A short design doc on the subject would be nice to have if you have any ideas. > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. Intended to improve > performance of NameNode write operations.
[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378314#comment-17378314 ] Konstantin Shvachko commented on HDFS-14703: Great progress [~prasad-acit]. It proves the concept works for creates as well. I liked that your changes are all confined in internal classes like FSDirectory. Noticed that you implemented {{getInode(id)}} by iterating through all inodes. This is probably the key part of this effort. We should eventually replace {{getInode(id)}} with {{getInode(key)}} to make the inode lookup efficient. But hey you still got 25% boost.
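The lookup concern in the comment above can be sketched like this. The code is illustrative (plain maps, invented method names), not the actual FSDirectory: `getInodeById` scans every partition the way the POC does, while a keyed lookup derives the partition from the key (here, the parent id) and probes exactly one map.

```java
import java.util.*;

// Illustrative sketch of iterate-all lookup vs keyed lookup over a
// partitioned inode map (not the actual Hadoop code).
public class InodeLookupSketch {
  static int partitionsProbed; // instrumentation to show the work done

  // POC behavior called out in the comment: iterate through all partitions.
  static String getInodeById(List<Map<Long, String>> parts, long id) {
    partitionsProbed = 0;
    for (Map<Long, String> part : parts) {
      partitionsProbed++;
      String inode = part.get(id);
      if (inode != null) return inode;
    }
    return null;
  }

  // Proposed direction: the key (parent id) selects the partition directly.
  static String getInodeByKey(List<Map<Long, String>> parts, long parentId, long id) {
    partitionsProbed = 1;
    return parts.get((int) (parentId % parts.size())).get(id);
  }

  public static void main(String[] args) {
    List<Map<Long, String>> parts = new ArrayList<>();
    for (int i = 0; i < 4; i++) parts.add(new HashMap<>());
    parts.get(3).put(7L, "inode7"); // inode 7 under parent 3 -> partition 3
    String byId = getInodeById(parts, 7L);
    int scanned = partitionsProbed;        // 4: walked every partition
    getInodeByKey(parts, 3L, 7L);
    System.out.println(byId + " found after probing " + scanned
        + " partitions vs " + partitionsProbed + " with the key");
  }
}
```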
[jira] [Updated] (HDFS-16040) RpcQueueTime metric counts requeued calls as unique events.
[ https://issues.apache.org/jira/browse/HDFS-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-16040: --- Fix Version/s: 3.3.2 3.2.3 2.10.2 3.1.5 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this to all active branches. Thank you, [~simbadzina]. > RpcQueueTime metric counts requeued calls as unique events. > --- > > Key: HDFS-16040 > URL: https://issues.apache.org/jira/browse/HDFS-16040 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.10.0, 3.3.0 >Reporter: Simbarashe Dzinamarira >Assignee: Simbarashe Dzinamarira >Priority: Major > Fix For: 3.1.5, 2.10.2, 3.2.3, 3.3.2 > > Attachments: HDFS-16040.001.patch, HDFS-16040.002.patch, > HDFS-16040.003.patch > > > The RpcQueueTime metric is updated every time a call is re-queued while > waiting for the server state to reach the call's client's state ID. This is > in contrast to RpcProcessingTime, which is only updated when the call is > finally processed. > On the Observer NameNode this can result in RpcQueueTimeNumOps being much > larger than RpcProcessingTimeNumOps. The re-queueing is an internal > optimization to avoid blocking and shouldn't result in an inflated metric.
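The metric skew described in HDFS-16040 above can be modeled in a few lines. This is a toy sketch with invented names, not Hadoop's actual metrics code: a call requeued k times should contribute one sample to RpcQueueTimeNumOps, not k + 1.

```java
// Toy model of the accounting fix described above; class and method names
// are illustrative, not Hadoop's actual metrics code.
public class QueueTimeMetricSketch {
  long numOps; // stands in for RpcQueueTimeNumOps

  // Buggy accounting: a sample is taken on every dequeue, including each
  // time the call is put back because the server state id is still behind.
  void processBuggy(int requeues) {
    for (int i = 0; i < requeues; i++) {
      numOps++; // requeued: an internal optimization, yet counted as an event
    }
    numOps++;   // finally processed
  }

  // Fixed accounting: one sample when the call is finally processed,
  // covering the whole time spent queued (including the requeues).
  void processFixed(int requeues) {
    numOps++;
  }

  public static void main(String[] args) {
    QueueTimeMetricSketch buggy = new QueueTimeMetricSketch();
    QueueTimeMetricSketch fixed = new QueueTimeMetricSketch();
    for (int call = 0; call < 1000; call++) {
      buggy.processBuggy(3); // each call requeued 3 times on the Observer
      fixed.processFixed(3);
    }
    System.out.println("buggy numOps = " + buggy.numOps); // buggy numOps = 4000
    System.out.println("fixed numOps = " + fixed.numOps); // fixed numOps = 1000
  }
}
```

With the fix, RpcQueueTimeNumOps tracks RpcProcessingTimeNumOps instead of being inflated by the requeue count.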
[jira] [Commented] (HDFS-16040) RpcQueueTime metric counts requeued calls as unique events.
[ https://issues.apache.org/jira/browse/HDFS-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352920#comment-17352920 ] Konstantin Shvachko commented on HDFS-16040: Wow - clean build. Adding my +1 to that. Will commit shortly.
[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352653#comment-17352653 ] Konstantin Shvachko commented on HDFS-15915: [~daryn] would appreciate your review. > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 3.4.0, 3.1.5, 2.10.2, 3.2.3, 3.3.2 > > Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, > HDFS-15915-03.patch, HDFS-15915-04.patch, HDFS-15915-05.patch, > testMkdirsRace.patch > > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the > edits op, remains unset until the time when the operation is scheduled for > syncing. At that time {{beginTransaction()}} will set the > {{FSEditLogOp.txid}} and increment the global transaction count. On a busy > NameNode this event can fall outside the write lock. > This causes problems for Observer reads. It also can potentially reshuffle > transactions and the Standby will apply them in the wrong order.
[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15915: --- Fix Version/s: 3.3.2 3.2.3 2.10.2 3.1.5 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this to trunk and all branches down to branch-2.10.
[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352023#comment-17352023 ] Konstantin Shvachko commented on HDFS-15915: Ran unit tests that failed on Jenkins. All passing locally. Will be committing this shortly.
[jira] [Commented] (HDFS-16040) RpcQueueTime metric counts requeued calls as unique events.
[ https://issues.apache.org/jira/browse/HDFS-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351510#comment-17351510 ] Konstantin Shvachko commented on HDFS-16040: I checked that the test fails without the fix and passes with it. I was wondering if the code correctly counts the queue time for Observer, that is, takes into account the time the call was requeued. It seems to me that it does. [~simbadzina] could you please double-check. I guess there will be some checkstyle warnings when Jenkins finishes.
[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351284#comment-17351284 ] Konstantin Shvachko commented on HDFS-15915: Thanks for the thorough review [~virajith]. BTW, this {{logEdit()}} method is only used in BackupNode, so it doesn't matter much. But I swapped the two lines in the v 05 patch.
[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15915: --- Attachment: HDFS-15915-05.patch
[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347336#comment-17347336 ] Konstantin Shvachko commented on HDFS-15915: Updated the patch per [~virajith]'s suggestions. Thanks. # The default implementation of {{EditLogOutputStream.getLastJournalledTxId()}} returns {{INVALID_TXID}} rather than {{0}}. # Changed {{beginTransaction()}} type to void. ??This change forces the txid to be assigned when the operation takes place under the FSN lock.?? Exactly right. The advantage of this in the non-Observer case is verifiability and proper enforcement. When you merely rely on placing operations into the queue in the right order, you cannot verify that with unit tests or asserts. And it is hard to detect a bug if there is one in this very multi-threaded code. With the patch the txId is generated when the operation is queued, so I could add asserts to ensure operations are queued and synced in the order they were applied on the active NN.
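The invariant discussed in the HDFS-15915 comments above — assign the txid inside the same lock that orders the mutation and the enqueue, so queue order provably equals txid order — can be demonstrated with a small sketch. This is a simplified stand-in for FSEditLogAsync, not the real code; the `synchronized` block plays the role of the namesystem write lock.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the fix's invariant: txid assignment and enqueue happen under
// one lock, so the queue is always in strict txid order and this can be
// asserted, unlike when beginTransaction() assigns the txid at sync time.
public class TxidOrderSketch {
  private long txid = 0;
  private final List<Long> queue = new ArrayList<>();

  synchronized void logEdit() {
    long id = ++txid;  // assigned while the "namesystem lock" is held
    queue.add(id);     // queued before the lock is released
  }

  synchronized boolean queueInTxidOrder() {
    for (int i = 1; i < queue.size(); i++) {
      if (queue.get(i) != queue.get(i - 1) + 1) return false;
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    TxidOrderSketch log = new TxidOrderSketch();
    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (int i = 0; i < 10_000; i++) pool.submit(log::logEdit);
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    if (!log.queueInTxidOrder()) throw new AssertionError("txid order violated");
    System.out.println("10000 edits queued in txid order");
  }
}
```

If the txid were instead drawn from a counter at sync time, outside the lock, two racing threads could enqueue in one order and draw ids in the other, which is exactly the reshuffling the issue describes.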
[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15915: --- Attachment: HDFS-15915-04.patch
[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346379#comment-17346379 ] Konstantin Shvachko commented on HDFS-14703: Thanks [~prasad-acit] and [~xinglin] for benchmarking. Very glad you guys could independently confirm the 30-45% improvement. I think the PartitionedGSet implementation should benefit from both *_more cores_* and a *_faster storage device_* for edits. For the storage device, NVMe SSDs perform best for journaling-type workloads in our experience. Also please take into account this is only a POC patch. Theoretically, we should be able to scale performance proportionally to the number of cores and partitions in the GSet, given we are not IO bound.
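The core idea stated in the HDFS-14703 summary — split the in-memory namespace into partitions, each with its own lock — can be sketched minimally as below. This is an illustrative toy (a list of HashMaps with per-partition ReentrantLocks), not the actual PartitionedGSet: writes to different partitions never contend, which is where the multi-core scaling is expected to come from.

```java
import java.util.*;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of per-partition locking (illustrative, not the actual
// PartitionedGSet): each partition carries its own lock, so concurrent
// writers only contend when they hit the same partition.
public class PartitionedMapSketch {
  static final int PARTITIONS = 4;
  private final List<Map<Long, String>> maps = new ArrayList<>();
  private final List<ReentrantLock> locks = new ArrayList<>();

  public PartitionedMapSketch() {
    for (int i = 0; i < PARTITIONS; i++) {
      maps.add(new HashMap<>());
      locks.add(new ReentrantLock());
    }
  }

  public void put(long key, String value) {
    int p = (int) (key % PARTITIONS);  // latch only the owning partition
    locks.get(p).lock();
    try {
      maps.get(p).put(key, value);
    } finally {
      locks.get(p).unlock();
    }
  }

  public int size() {
    int n = 0;
    for (Map<Long, String> m : maps) n += m.size();
    return n;
  }

  public static void main(String[] args) {
    PartitionedMapSketch gset = new PartitionedMapSketch();
    for (long k = 0; k < 100; k++) gset.put(k, "inode" + k);
    System.out.println("size = " + gset.size()); // size = 100
  }
}
```

The real implementation additionally has to deal with operations that span partitions (e.g. rename), which is why it remains a POC rather than a drop-in replacement.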
[jira] [Commented] (HDFS-16004) BackupNode and QJournal lack Permission check.
[ https://issues.apache.org/jira/browse/HDFS-16004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342108#comment-17342108 ] Konstantin Shvachko commented on HDFS-16004: Hey guys. I wouldn't worry about {{BackupNode}}. It was supposed to be removed as redundant in HDFS-4114. Same with {{JournalProtocol}}, as it is used exclusively by {{BackupNode}}. This is old code that is not supposed to be used. There were some controversial issues about removing {{BackupNode}}, but I don't think they still stand. {{QJournalProtocol}} is the one to be used with QJM. If that is fine, then we can close this issue as won't fix or not a problem.
> BackupNode and QJournal lack Permission check.
> --
>
> Key: HDFS-16004
> URL: https://issues.apache.org/jira/browse/HDFS-16004
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> I have some doubts when I configure secure HDFS. I know we have Service Level Authorization for protocols like NamenodeProtocol, DatanodeProtocol and so on.
> But I do not find such Authorization for JournalProtocol after reading the code in HDFSPolicyProvider. And if we have it, how can I configure such Authorization?
>
> Besides, even though NamenodeProtocol has Service Level Authorization, its methods still have Permission checks. Take startCheckpoint in NameNodeRpcServer, which implements NamenodeProtocol, for example:
>
> {code:java}
> public NamenodeCommand startCheckpoint(NamenodeRegistration registration)
>     throws IOException {
>   String operationName = "startCheckpoint";
>   checkNNStartup();
>   namesystem.checkSuperuserPrivilege(operationName);
>   ...
> {code}
>
> I found that the methods in BackupNodeRpcServer, which implements JournalProtocol, lack such Permission checks. See below:
>
> {code:java}
> public void startLogSegment(JournalInfo journalInfo, long epoch,
>     long txid) throws IOException {
>   namesystem.checkOperation(OperationCategory.JOURNAL);
>   verifyJournalRequest(journalInfo);
>   getBNImage().namenodeStartedLogSegment(txid);
> }
>
> @Override
> public void journal(JournalInfo journalInfo, long epoch, long firstTxId,
>     int numTxns, byte[] records) throws IOException {
>   namesystem.checkOperation(OperationCategory.JOURNAL);
>   verifyJournalRequest(journalInfo);
>   getBNImage().journal(firstTxId, numTxns, records);
> }
> {code}
>
> Do we need to add Permission checks for them?
>
> Please point out my mistakes if I am wrong or missed something. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
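If a permission check were wanted here, it could mirror the {{startCheckpoint}} pattern quoted above: invoke a superuser check before any journal work is done. The following is a self-contained model of that pattern; the class is a standalone sketch with illustrative names, not actual Hadoop code (though {{checkSuperuserPrivilege}} deliberately mirrors the {{NameNodeRpcServer}} call shown in the issue).

```java
// Standalone model of the "check privileges first" pattern discussed above.
// Not Hadoop code: the caller identity is passed explicitly for simplicity,
// whereas the real RPC server derives it from the connection's UGI.
public class JournalPermissionSketch {
  static class AccessControlException extends RuntimeException {
    AccessControlException(String msg) { super(msg); }
  }

  private final String superUser;

  JournalPermissionSketch(String superUser) {
    this.superUser = superUser;
  }

  // Mirrors namesystem.checkSuperuserPrivilege(operationName): reject the
  // call before any state is touched.
  void checkSuperuserPrivilege(String caller, String operationName) {
    if (!superUser.equals(caller)) {
      throw new AccessControlException(
          caller + " is not allowed to perform " + operationName);
    }
  }

  // A journal RPC handler would run the check as its first statement.
  String startLogSegment(String caller, long txid) {
    checkSuperuserPrivilege(caller, "startLogSegment");
    // ... verifyJournalRequest / journal bookkeeping would go here ...
    return "segment started at txid " + txid;
  }
}
```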
[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341089#comment-17341089 ] Konstantin Shvachko edited comment on HDFS-14703 at 5/10/21, 7:30 PM: -- Updated the POC patches to current trunk. There were indeed some missing parts in the first patch. See [^003-partitioned-inodeMap-POC.tar.gz]. Also created a remote branch called {{fgl}} in hadoop repo with both patches applied to current trunk. [~xinglin] is working on adding {{create()}} call to FGL. Right now only {{mkdirs()}} is supported. was (Author: shv): Updated the POC patches to current trunk. There were indeed some missing parts in the first patch. See [^003-partitioned-inodeMap-POC.tar.gz]. > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. Intended to improve > performance of NameNode write operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341089#comment-17341089 ] Konstantin Shvachko edited comment on HDFS-14703 at 5/8/21, 1:05 AM: - Updated the POC patches to current trunk. There were indeed some missing parts in the first patch. See [^003-partitioned-inodeMap-POC.tar.gz]. was (Author: shv): Updated the POC patches. There were indeed some missing parts in the first patch. See [003-partitioned-inodeMap-POC.tar.gz|https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz]. > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. Intended to improve > performance of NameNode write operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341089#comment-17341089 ] Konstantin Shvachko edited comment on HDFS-14703 at 5/8/21, 1:04 AM: - Updated the POC patches. There were indeed some missing parts in the first patch. See [003-partitioned-inodeMap-POC.tar.gz|https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz]. was (Author: shv): Updated the POC patches. There were indeed some missing parts in the first patch. See [https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz|https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz]. > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. Intended to improve > performance of NameNode write operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341089#comment-17341089 ] Konstantin Shvachko commented on HDFS-14703: Updated the POC patches. There were indeed some missing parts in the first patch. See [https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz|https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz]. > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. Intended to improve > performance of NameNode write operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-14703: --- Attachment: 003-partitioned-inodeMap-POC.tar.gz > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. Intended to improve > performance of NameNode write operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes
[ https://issues.apache.org/jira/browse/HDFS-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341046#comment-17341046 ] Konstantin Shvachko commented on HDFS-16001: Checked the test. This fixes it. +1 thanks [~aajisaka] > TestOfflineEditsViewer.testStored() fails reading negative value of > FSEditLogOpCodes > > > Key: HDFS-16001 > URL: https://issues.apache.org/jira/browse/HDFS-16001 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Konstantin Shvachko >Assignee: Akira Ajisaka >Priority: Blocker > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception > {noformat} > java.io.IOException: Op -54 has size -1314247195, but the minimum op size is > 17 > {noformat} > Seems like there is a corrupt record in {{editsStored}} file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15912) Allow ProtobufRpcEngine to be extensible
[ https://issues.apache.org/jira/browse/HDFS-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337546#comment-17337546 ] Konstantin Shvachko commented on HDFS-15912: Since all changes are in {{hadoop-common}}, this should be a HADOOP-* jira rather than HDFS-*. Could you please move it to the right jira project to adjust the visibility for the right audience? About the change itself: # There are some [checkstyle warnings|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2905/1/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt], which are actually right. It is perfectly fine for methods to be protected, but not for the members. A better way is to keep them private and provide getters/setters for those that are really needed. # I see some whitespace changes, like a blank line with spaces. > Allow ProtobufRpcEngine to be extensible > > > Key: HDFS-15912 > URL: https://issues.apache.org/jira/browse/HDFS-15912 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hector Sandoval Chaverri >Assignee: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > The ProtobufRpcEngine class doesn't allow for new RpcEngine implementations > to extend some of its inner classes (e.g. Invoker and > Server.ProtoBufRpcInvoker). Also, some of its methods are long enough such > that overriding them would result in a lot of code duplication (e.g. > Invoker#invoke and Server.ProtoBufRpcInvoker#call). > When implementing a new RpcEngine, it would be helpful to reuse most of the > code already in ProtobufRpcEngine. This would allow new fields to be added to > the RPC header or message with minimal code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
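The review suggestion above (keep members private and expose protected getters/setters, so subclasses of a new RpcEngine can reuse state without checkstyle visibility warnings) can be illustrated with a minimal sketch. The class and member names are hypothetical, not the real {{ProtobufRpcEngine}} internals.

```java
// Hypothetical base class: fields stay private; protected accessors give
// subclasses controlled access, which is the pattern the review asks for.
public class BaseInvoker {
  private String clientId;  // private, not protected

  protected String getClientId() {
    return clientId;
  }

  protected void setClientId(String clientId) {
    this.clientId = clientId;
  }

  // A long shared method that subclasses reuse rather than duplicate.
  public String invoke(String method) {
    return "invoked " + method + " as " + clientId;
  }
}

// A new engine extends the base and adjusts inherited state only through
// the accessors, e.g. to stamp extra header information on each call.
class ExtendedInvoker extends BaseInvoker {
  public String invokeWithHeader(String method, String clientId) {
    setClientId(clientId);
    return "header+" + invoke(method);
  }
}
```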
[jira] [Updated] (HDFS-15652) Make block size from NNThroughputBenchmark configurable
[ https://issues.apache.org/jira/browse/HDFS-15652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15652: --- Fix Version/s: 3.2.3 2.10.2 3.1.5 3.3.1 Just back-ported this to branches 3.3, 3.2, 3.1, and 2.10. Updated Fix Versions. Thanks [~ferhui] for contributing. > Make block size from NNThroughputBenchmark configurable > > > Key: HDFS-15652 > URL: https://issues.apache.org/jira/browse/HDFS-15652 > Project: Hadoop HDFS > Issue Type: Improvement > Components: benchmarks >Affects Versions: 3.3.0 >Reporter: Hui Fei >Assignee: Hui Fei >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > When test NNThroughputBenchmark, get following error logs. > {quote} > 2020-10-26 20:51:25,781 ERROR namenode.NNThroughputBenchmark: StatsDaemon 43 > failed: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Specified block > size is less than configured minimum value > (dfs.namenode.fs-limits.min-block-size): 16 < 1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2514) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2452) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.createOriginal(NameNodeRpcServer.java:824) > at > org.apache.hadoop.hdfs.server.namenode.ProtectionManager.create(ProtectionManager.java:344) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:792) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:326) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2985) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1562) > at org.apache.hadoop.ipc.Client.call(Client.java:1508) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy9.create(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:281) > at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy10.create(Unknown Source) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$CreateFileStats.executeOp(NNThroughputBenchmark.java:597) > at > 
org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.benchmarkOne(NNThroughputBenchmark.java:428) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:412) > {quote} > Because the NN has started and is serving, we should make the block size of the > client benchmark configurable, which will be convenient -- This message was sent by Atlassian Jira (v8.3.4#803005) - To
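The idea behind the fix above can be sketched as a tiny argument parser: let the benchmark accept a block size on the command line and validate it against the NameNode-enforced minimum, instead of hard-coding a value that fails the {{dfs.namenode.fs-limits.min-block-size}} check. The {{-blockSize}} flag name and the parsing shape here are illustrative assumptions, not the actual patch.

```java
// Hypothetical sketch of making a benchmark's block size configurable.
// MIN_BLOCK_SIZE mirrors the default dfs.namenode.fs-limits.min-block-size
// (1 MB) that triggered the error in the issue's log.
public class BlockSizeArg {
  static final long MIN_BLOCK_SIZE = 1048576L;

  static long parseBlockSize(String[] args, long defaultSize) {
    long size = defaultSize;
    for (int i = 0; i < args.length - 1; i++) {
      if ("-blockSize".equals(args[i])) {
        size = Long.parseLong(args[i + 1]);
      }
    }
    // Fail fast with the same kind of message the NameNode would produce.
    if (size < MIN_BLOCK_SIZE) {
      throw new IllegalArgumentException(
          "Specified block size is less than configured minimum value: "
              + size + " < " + MIN_BLOCK_SIZE);
    }
    return size;
  }
}
```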
[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15915: --- Attachment: HDFS-15915-03.patch > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, > HDFS-15915-03.patch, testMkdirsRace.patch > > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the > edits op, remains unset until the operation is scheduled for syncing. At that > time {{beginTransaction()}} will set the {{FSEditLogOp.txid}} and increment > the global transaction count. On a busy NameNode this event can fall outside > the write lock. > This causes problems for Observer reads. It can also potentially reshuffle > transactions, and the Standby will apply them in the wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
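The race described in the issue above, and the shape of the fix, can be modeled in a few lines: the bug is assigning the op's txid after the namesystem write lock is released, so txid order can disagree with the order of namespace modifications; the fix is to number the op under the same lock that populates it. This is a toy model with illustrative names, not the {{FSEditLogAsync}} internals.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model: an edit op is created AND numbered while holding the write
// lock, so txid order always matches the order of namespace changes.
// Assigning the txid later (outside the lock) is the race the issue fixes.
public class TxIdUnderLock {
  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();
  private final AtomicLong txIdGenerator = new AtomicLong(0);

  static class EditOp {
    final String path;
    final long txid;

    EditOp(String path, long txid) {
      this.path = path;
      this.txid = txid;
    }
  }

  EditOp logEdit(String path) {
    nsLock.writeLock().lock();
    try {
      // ... apply the namespace change here, still under the lock ...
      return new EditOp(path, txIdGenerator.incrementAndGet());
    } finally {
      nsLock.writeLock().unlock();
    }
  }
}
```

Because numbering happens inside the critical section, a later namespace change can never end up with a smaller txid, which is what Observer reads and Standby replay depend on.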
[jira] [Commented] (HDFS-15652) Make block size from NNThroughputBenchmark configurable
[ https://issues.apache.org/jira/browse/HDFS-15652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335066#comment-17335066 ] Konstantin Shvachko commented on HDFS-15652: I would like to backport this to earlier versions up to 2.10, if there are no objections. > Make block size from NNThroughputBenchmark configurable > > > Key: HDFS-15652 > URL: https://issues.apache.org/jira/browse/HDFS-15652 > Project: Hadoop HDFS > Issue Type: Improvement > Components: benchmarks >Affects Versions: 3.3.0 >Reporter: Hui Fei >Assignee: Hui Fei >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > When test NNThroughputBenchmark, get following error logs. > {quote} > 2020-10-26 20:51:25,781 ERROR namenode.NNThroughputBenchmark: StatsDaemon 43 > failed: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Specified block > size is less than configured minimum value > (dfs.namenode.fs-limits.min-block-size): 16 < 1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2514) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2452) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.createOriginal(NameNodeRpcServer.java:824) > at > org.apache.hadoop.hdfs.server.namenode.ProtectionManager.create(ProtectionManager.java:344) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:792) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:326) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) > at 
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2985) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1562) > at org.apache.hadoop.ipc.Client.call(Client.java:1508) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy9.create(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:281) > at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy10.create(Unknown Source) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$CreateFileStats.executeOp(NNThroughputBenchmark.java:597) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.benchmarkOne(NNThroughputBenchmark.java:428) > at > 
org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:412) > {quote} > Because the NN has started and is serving, we should make the block size of the > client benchmark configurable, which will be convenient -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335052#comment-17335052 ] Konstantin Shvachko commented on HDFS-15915: Patch v.02 fixes findbugs and whitespace warnings. Checked the test failures: * {{TestOfflineEditsViewer}} fails on trunk the same way as with the patch. Filed HDFS-16001 for it. * {{TestDirectoryScanner}} intermittently fails because of HDFS-11045. * All other tests passed locally. > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, > testMkdirsRace.patch > > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the > edits op, remains unset until the operation is scheduled for syncing. At that > time {{beginTransaction()}} will set the {{FSEditLogOp.txid}} and increment > the global transaction count. On a busy NameNode this event can fall outside > the write lock. > This causes problems for Observer reads. It can also potentially reshuffle > transactions, and the Standby will apply them in the wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15915: --- Attachment: HDFS-15915-02.patch > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, > testMkdirsRace.patch > > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the > edits op, remains unset until the operation is scheduled for syncing. At that > time {{beginTransaction()}} will set the {{FSEditLogOp.txid}} and increment > the global transaction count. On a busy NameNode this event can fall outside > the write lock. > This causes problems for Observer reads. It can also potentially reshuffle > transactions, and the Standby will apply them in the wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes
[ https://issues.apache.org/jira/browse/HDFS-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335051#comment-17335051 ] Konstantin Shvachko commented on HDFS-16001: This fails consistently on trunk but not in 2.10. I did not check other versions. Full exception here: {noformat:nowrap} Op -54 has size -1314247195, but the minimum op size is 17 Encountered exception. Exiting: Op -54 has size -1314247195, but the minimum op size is 17 java.io.IOException: Op -54 has size -1314247195, but the minimum op size is 17 at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOpFrame(FSEditLogOp.java:5244) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOp(FSEditLogOp.java:5186) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$Reader.readOp(FSEditLogOp.java:5059) at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOpImpl(EditLogFileInputStream.java:229) at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOp(EditLogFileInputStream.java:276) at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:67) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:158) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer.runOev(TestOfflineEditsViewer.java:208) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer.testStored(TestOfflineEditsViewer.java:176) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {noformat} > TestOfflineEditsViewer.testStored() fails reading negative value of > FSEditLogOpCodes > > > Key: HDFS-16001 > URL: https://issues.apache.org/jira/browse/HDFS-16001 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Konstantin Shvachko >Priority: Major > > {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception >
[jira] [Created] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes
Konstantin Shvachko created HDFS-16001: -- Summary: TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes Key: HDFS-16001 URL: https://issues.apache.org/jira/browse/HDFS-16001 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Reporter: Konstantin Shvachko -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes
[ https://issues.apache.org/jira/browse/HDFS-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-16001: --- Docs Text: (was: {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception {noformat} java.io.IOException: Op -54 has size -1314247195, but the minimum op size is 17 {noformat} Seems like there is a corrupt record in {{editsStored}} file.) Description: {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception {noformat} java.io.IOException: Op -54 has size -1314247195, but the minimum op size is 17 {noformat} Seems like there is a corrupt record in {{editsStored}} file. > TestOfflineEditsViewer.testStored() fails reading negative value of > FSEditLogOpCodes > > > Key: HDFS-16001 > URL: https://issues.apache.org/jira/browse/HDFS-16001 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Konstantin Shvachko >Priority: Major > > {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception > {noformat} > java.io.IOException: Op -54 has size -1314247195, but the minimum op size is > 17 > {noformat} > Seems like there is a corrupt record in {{editsStored}} file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7612) TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir
[ https://issues.apache.org/jira/browse/HDFS-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335043#comment-17335043 ] Konstantin Shvachko commented on HDFS-7612: --- Came across this again. Still not fixed. We just need to replace the default value of {{System.getProperty()}} with {{"target/test-classes"}} > TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir > - > > Key: HDFS-7612 > URL: https://issues.apache.org/jira/browse/HDFS-7612 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.6.0 >Reporter: Konstantin Shvachko >Priority: Major > Labels: newbie > > {code} > final String cacheDir = System.getProperty("test.cache.data", > "build/test/cache"); > {code} > results in > {{FileNotFoundException: build/test/cache/editsStoredParsed.xml (No such file > or directory)}} > when {{test.cache.data}} is not set. > I can see this failing while running in Eclipse. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
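The suggested one-line fix can be sketched as follows, assuming a Maven build that places test resources under target/test-classes (the class and method names here are illustrative, not the actual test code):

```java
public class CacheDirDefault {
    // Fall back to the Maven test-resources directory instead of the Ant-era
    // "build/test/cache", which does not exist in Maven builds.
    static String cacheDir() {
        return System.getProperty("test.cache.data", "target/test-classes");
    }

    public static void main(String[] args) {
        // When test.cache.data is not set (e.g. when running in Eclipse),
        // the default now points at a directory the build actually creates.
        System.out.println(cacheDir());
    }
}
```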
[jira] [Updated] (HDFS-7612) TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir
[ https://issues.apache.org/jira/browse/HDFS-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7612: -- Labels: newbie (was: ) > TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir > - > > Key: HDFS-7612 > URL: https://issues.apache.org/jira/browse/HDFS-7612 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.6.0 >Reporter: Konstantin Shvachko >Priority: Major > Labels: newbie > > {code} > final String cacheDir = System.getProperty("test.cache.data", > "build/test/cache"); > {code} > results in > {{FileNotFoundException: build/test/cache/editsStoredParsed.xml (No such file > or directory)}} > when {{test.cache.data}} is not set. > I can see this failing while running in Eclipse. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reassigned HDFS-15915: -- Assignee: Konstantin Shvachko > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15915-01.patch, testMkdirsRace.patch > > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. But one essential field the transaction id of the > edits op remains unset until the time when the operation is scheduled for > synching. At that time {{beginTransaction()}} will set the the > {{FSEditLogOp.txid}} and increment the global transaction count. On busy > NameNode this event can fall outside the write lock. > This causes problems for Observer reads. It also can potentially reshuffle > transactions and Standby will apply them in a wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15915: --- Status: Patch Available (was: Open) > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15915-01.patch, testMkdirsRace.patch > > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. But one essential field the transaction id of the > edits op remains unset until the time when the operation is scheduled for > synching. At that time {{beginTransaction()}} will set the the > {{FSEditLogOp.txid}} and increment the global transaction count. On busy > NameNode this event can fall outside the write lock. > This causes problems for Observer reads. It also can potentially reshuffle > transactions and Standby will apply them in a wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15915: --- Attachment: HDFS-15915-01.patch > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15915-01.patch, testMkdirsRace.patch > > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. But one essential field the transaction id of the > edits op remains unset until the time when the operation is scheduled for > synching. At that time {{beginTransaction()}} will set the the > {{FSEditLogOp.txid}} and increment the global transaction count. On busy > NameNode this event can fall outside the write lock. > This causes problems for Observer reads. It also can potentially reshuffle > transactions and Standby will apply them in a wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334483#comment-17334483 ] Konstantin Shvachko commented on HDFS-15915: Attaching a patch to fix the problem. There are a lot of moving parts in asynchronous journal logging, so it took me a while to get it working, although the actual fix doesn't look complex. # The main idea is that a new txId is assigned to the journal transaction when it is logged by {{logEdit(op)}}, while the call is still under {{fsn.writeLock}}, rather than later in {{logSync()}} as it is now. I think this is the right way to _*guarantee that all transactions are journalled in the same order as they were applied on the Active NameNode*_. # Currently we have no checks or tests against a mismatch in transaction order. This would have been a problem for regular HA with or without Observer. I could not build a test showing that the transaction order can actually be tampered with, but couldn't convince myself it is impossible either. The patch adds asserts to guarantee that the journal txId order matches the order in which transactions were applied on the ANN. # I had to rework {{TestEditLogRace.testDeadlock()}}. Changed it to mock {{doEditTransaction()}} instead of {{setTransactionId()}} for the "blocker thread". Also, with FSEditLogAsync we cannot really reuse the same operation instance for different transactions any more, since the txid is now set in the op before syncing. This is [~daryn]'s creation; would appreciate it if you could take a look. > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: testMkdirsRace.patch > > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. 
But one essential field the transaction id of the > edits op remains unset until the time when the operation is scheduled for > synching. At that time {{beginTransaction()}} will set the the > {{FSEditLogOp.txid}} and increment the global transaction count. On busy > NameNode this event can fall outside the write lock. > This causes problems for Observer reads. It also can potentially reshuffle > transactions and Standby will apply them in a wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15915: --- Target Version/s: 2.10.2 > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: testMkdirsRace.patch > > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. But one essential field the transaction id of the > edits op remains unset until the time when the operation is scheduled for > synching. At that time {{beginTransaction()}} will set the the > {{FSEditLogOp.txid}} and increment the global transaction count. On busy > NameNode this event can fall outside the write lock. > This causes problems for Observer reads. It also can potentially reshuffle > transactions and Standby will apply them in a wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307463#comment-17307463 ] Konstantin Shvachko commented on HDFS-15915: Attached the test reproducing the bug. Looks like [~zero45] warned about this problem in [his comment|https://issues.apache.org/jira/browse/HDFS-13399?focusedCommentId=16454623=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16454623]. I don't remember though what was the resolution back then. > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: testMkdirsRace.patch > > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. But one essential field the transaction id of the > edits op remains unset until the time when the operation is scheduled for > synching. At that time {{beginTransaction()}} will set the the > {{FSEditLogOp.txid}} and increment the global transaction count. On busy > NameNode this event can fall outside the write lock. > This causes problems for Observer reads. It also can potentially reshuffle > transactions and Standby will apply them in a wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15915: --- Attachment: testMkdirsRace.patch > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: testMkdirsRace.patch > > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. But one essential field the transaction id of the > edits op remains unset until the time when the operation is scheduled for > synching. At that time {{beginTransaction()}} will set the the > {{FSEditLogOp.txid}} and increment the global transaction count. On busy > NameNode this event can fall outside the write lock. > This causes problems for Observer reads. It also can potentially reshuffle > transactions and Standby will apply them in a wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
[ https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307458#comment-17307458 ] Konstantin Shvachko commented on HDFS-15915: # Suppose two {{mkdirs}} calls for the same path are running on the Active NameNode at the same time. Assume that the path does not exist yet and that the two RPCs are coming from two clients, c1 and c2. # Then one of them, e.g. c1, will create the directory in memory and generate the respective transaction {{MkdirOp}}, which has all the fields set except for {{txid}}. Then it will enqueue the transaction in {{FSEditLogAsync.logEdit(op)}} for further asynchronous processing. The handler thread processing this RPC from c1 is now free to release the write lock and give control to other threads. # {{FSEditLogAsync.run()}} will asynchronously process the transaction when it dequeues it. At that time it will assign the {{txid}} for the transaction, see {{logEdit() -> doEditTransaction() -> beginTransaction()}}, and increment the global transaction count {{FSEditLog.txid}}. This can happen either inside or outside of the namesystem lock. Under heavy load (a rare event) the call to {{logEdit()}} can happen outside the lock, and that causes the problem. # Now suppose that the {{MkdirOp}} has not been processed yet, but the second {{mkdirs()}} from client c2 starts executing. It can proceed because the write lock has been released. The c2 call will find that the directory already exists and will return to the client without generating any transactions. In the reply it will populate {{lastSeenStateId}}. But the stateId will be less than the txId of the {{MkdirOp}} client c2 has just seen, because that transaction has not been processed yet and the global tx count {{FSEditLog.txid}} did not advance. # Then, of course, going to the ObserverNode with that transaction id can cause a stale read if the client reaches the Observer before it tails the {{MkdirOp}} edit from the journal. 
I managed to reproduce this in a unit test. Attaching. The test spawns a bunch of {{mkdirs()}} calls on the same path. Then it mocks {{doEditTransaction()}} to delay async processing of the mkdir transaction on the Active NN. The delay is sufficient for another {{mkdirs()}} call to pass through and obtain the wrong {{lastSeenStateId}}. Then one can see a {{FileNotFoundException}}, which indicates a stale read from the Observer. _A straightforward solution seems to be to assign the transaction id at the time the op is created, before it is enqueued. The queue order should guarantee the same assignment result as now, but will avoid the race._ > Race condition with async edits logging due to updating txId outside of the > namesystem log > -- > > Key: HDFS-15915 > URL: https://issues.apache.org/jira/browse/HDFS-15915 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > > {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside > {{FSNamesystem.writeLock}}. But one essential field the transaction id of the > edits op remains unset until the time when the operation is scheduled for > synching. At that time {{beginTransaction()}} will set the the > {{FSEditLogOp.txid}} and increment the global transaction count. On busy > NameNode this event can fall outside the write lock. > This causes problems for Observer reads. It also can potentially reshuffle > transactions and Standby will apply them in a wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
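The proposed fix (assigning the txid while the op is created under the lock, before it is enqueued) can be sketched with a toy simulation; the names {{Op}}, {{globalTxId}}, and {{writeLock}} are illustrative stand-ins, not the real FSEditLogAsync internals:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

public class TxIdOrderSketch {
    static class Op { final long txid; Op(long t) { txid = t; } }

    // Simulates handler threads creating ops: the txid is assigned inside the
    // same critical section that enqueues the op (the proposed behavior), so
    // queue order matches txid order by construction.
    static long run(int threads, int opsPerThread) {
        final AtomicLong globalTxId = new AtomicLong();
        final ConcurrentLinkedQueue<Op> queue = new ConcurrentLinkedQueue<>();
        final Object writeLock = new Object();  // stands in for FSNamesystem.writeLock
        Thread[] handlers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            handlers[i] = new Thread(() -> {
                for (int j = 0; j < opsPerThread; j++) {
                    synchronized (writeLock) {
                        queue.add(new Op(globalTxId.incrementAndGet()));
                    }
                }
            });
            handlers[i].start();
        }
        try {
            for (Thread t : handlers) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        // The draining "sync" side now sees strictly increasing txids.
        long prev = 0;
        for (Op op; (op = queue.poll()) != null; ) {
            if (op.txid != prev + 1) throw new AssertionError("out of order: " + op.txid);
            prev = op.txid;
        }
        return prev;
    }

    public static void main(String[] args) {
        System.out.println("drained " + run(4, 100) + " ops in order");
    }
}
```

This mirrors the claim above: if assignment and enqueueing happen atomically under the lock, the draining thread can never observe reordered txids.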
[jira] [Created] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log
Konstantin Shvachko created HDFS-15915: -- Summary: Race condition with async edits logging due to updating txId outside of the namesystem log Key: HDFS-15915 URL: https://issues.apache.org/jira/browse/HDFS-15915 Project: Hadoop HDFS Issue Type: Bug Components: hdfs, namenode Reporter: Konstantin Shvachko {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the edits op, remains unset until the operation is scheduled for syncing. At that time {{beginTransaction()}} will set the {{FSEditLogOp.txid}} and increment the global transaction count. On a busy NameNode this event can fall outside the write lock. This causes problems for Observer reads. It can also potentially reshuffle transactions, so the Standby will apply them in the wrong order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14731) [FGL] Remove redundant locking on NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305081#comment-17305081 ] Konstantin Shvachko commented on HDFS-14731: Sure [~xilangyan], I was thinking about backporting to the 2.10 branch as well. Would you like to work on a backport patch? I haven't checked whether it is straightforward or not. > [FGL] Remove redundant locking on NameNode. > --- > > Key: HDFS-14731 > URL: https://issues.apache.org/jira/browse/HDFS-14731 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-14731.001.patch > > > Currently NameNode has two global locks: FSNamesystemLock and > FSDirectoryLock. An analysis shows that single FSNamesystemLock is sufficient > to guarantee consistency of the NameNode state. FSDirectoryLock can be > removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296958#comment-17296958 ] Konstantin Shvachko commented on HDFS-15808: Hey there is no way to modify a commit unless we force-push, which is not recommended and we do it only on feature branches. I guess you just need to make sure to set the right address in the future. > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Attachments: ExpiredHeartbeat.png, HDFS-15808-branch-3.3.001.patch, > lockLongHoldCount > > Time Spent: 6h 10m > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockLongHoldCount/WriteLockLongHoldCount), which are exposed in > JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15808: --- Fix Version/s: 3.2.3 2.10.2 3.1.5 3.4.0 3.3.1 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this to trunk and branches listed in "Fix Version". Thank you [~tomscut]. > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Attachments: ExpiredHeartbeat.png, HDFS-15808-branch-3.3.001.patch, > lockLongHoldCount > > Time Spent: 6h 10m > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockLongHoldCount/WriteLockLongHoldCount), which are exposed in > JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15808: --- Status: Patch Available (was: Open) > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: ExpiredHeartbeat.png, HDFS-15808-branch-3.3.001.patch, > lockLongHoldCount > > Time Spent: 6h 10m > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockLongHoldCount/WriteLockLongHoldCount), which are exposed in > JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15849) ExpiredHeartbeats metric should be of Type.COUNTER
[ https://issues.apache.org/jira/browse/HDFS-15849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15849: --- Fix Version/s: 3.2.3 2.10.2 3.1.5 3.4.0 3.3.1 Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. Thank you [~zhuqi] for contributing. > ExpiredHeartbeats metric should be of Type.COUNTER > -- > > Key: HDFS-15849 > URL: https://issues.apache.org/jira/browse/HDFS-15849 > Project: Hadoop HDFS > Issue Type: Bug > Components: metrics >Reporter: Konstantin Shvachko >Assignee: Qi Zhu >Priority: Major > Labels: newbie > Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Attachments: HDFS-15849.001.patch, HDFS-15849.002.patch > > > Currently {{ExpiredHeartbeats}} metric has default type, which makes it > {{Type.GAUGE}}. It should be {{Type.COUNTER}} for proper graphing. See > discussion in HDFS-15808. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293268#comment-17293268 ] Konstantin Shvachko commented on HDFS-15808: [~tomscut] I have a problem merging this to branch-3.3. We want it there, right? If so, could you please provide a patch for 3.3? > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: ExpiredHeartbeat.png, lockLongHoldCount > > Time Spent: 5h 50m > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockLongHoldCount/WriteLockLongHoldCount), which are exposed in > JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288566#comment-17288566 ] Konstantin Shvachko commented on HDFS-15808: +1 on pull request. Created HDFS-15849 to fix {{ExpiredHeartbeats}} > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: ExpiredHeartbeat.png, lockLongHoldCount > > Time Spent: 5h > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15849) ExpiredHeartbeats metric should be of Type.COUNTER
Konstantin Shvachko created HDFS-15849: -- Summary: ExpiredHeartbeats metric should be of Type.COUNTER Key: HDFS-15849 URL: https://issues.apache.org/jira/browse/HDFS-15849 Project: Hadoop HDFS Issue Type: Bug Components: metrics Reporter: Konstantin Shvachko Currently {{ExpiredHeartbeats}} metric has default type, which makes it {{Type.GAUGE}}. It should be {{Type.COUNTER}} for proper graphing. See discussion in HDFS-15808. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17287396#comment-17287396 ] Konstantin Shvachko commented on HDFS-15808: [~tomscut] sure, we all use different systems for managing metrics. {{RpcQueueTime}} is of type {{MutableRate}}, while {{ExpiredHeartbeats}} and your new metric are just a {{@Metric}}, which makes them of type {{GAUGE}}, as Erik pointed out. In my system {{ExpiredHeartbeats}} looks like this: !ExpiredHeartbeat.png! Good point [~xkrogen] about adding {{type=Type.COUNT}} to the annotation; this should fix the problem. > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: ExpiredHeartbeat.png, lockLongHoldCount > > Time Spent: 4.5h > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15808: --- Attachment: ExpiredHeartbeat.png > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: ExpiredHeartbeat.png, lockLongHoldCount > > Time Spent: 4.5h > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15808: --- Attachment: (was: ExpiredHeartbeat.png) > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: lockLongHoldCount > > Time Spent: 4.5h > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15808: --- Attachment: ExpiredHeartbeat.png > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: ExpiredHeartbeat.png, lockLongHoldCount > > Time Spent: 4.5h > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286133#comment-17286133 ] Konstantin Shvachko commented on HDFS-15808: Hey [~tomscut]. The patch looks fine, but I doubt the metric will be useful in its current form. A monotonically increasing counter doesn't tell you much when plotted. Over time it just becomes an incredibly large number, and it is hard to see its fluctuations. And you cannot set alerts for when the threshold is exceeded often. See e.g. {{ExpiredHeartbeats}} or {{LastWrittenTransactionId}} - not useful. I assume you need something like a rate. > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX.
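The rate suggested in the comment above can be derived from a monotonically increasing counter by sampling deltas over time. The sketch below is a hypothetical illustration only; it does not use Hadoop's metrics2 API, and the class and method names are invented.

```java
// Hypothetical sketch: deriving a per-second rate from a monotonically
// increasing counter, so fluctuations stay visible and alertable.
class CounterRate {
    private long lastValue;
    private long lastTimeMs;

    CounterRate(long initialValue, long nowMs) {
        this.lastValue = initialValue;
        this.lastTimeMs = nowMs;
    }

    /** Returns events per second since the previous sample. */
    public double sample(long currentValue, long nowMs) {
        long deltaEvents = currentValue - lastValue;
        long deltaMs = Math.max(1, nowMs - lastTimeMs);
        lastValue = currentValue;
        lastTimeMs = nowMs;
        return deltaEvents * 1000.0 / deltaMs;
    }
}
```

A monitoring system would plot or alert on the sampled rate rather than the raw counter value.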
[jira] [Commented] (HDFS-15792) ClasscastException while loading FSImage
[ https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282003#comment-17282003 ] Konstantin Shvachko commented on HDFS-15792: Hey guys, sorry for coming late to this. But we should avoid using {{ConcurrentHashMap}}. It is known to have performance issues and adds a lot of memory overhead. So whoever is using ACLs heavily will have larger namespace requirements - very bad for large clusters. I would prefer proper synchronization of the methods in {{ReferenceCountMap}}. Should we reopen this to revisit the fix? > ClasscastException while loading FSImage > > > Key: HDFS-15792 > URL: https://issues.apache.org/jira/browse/HDFS-15792 > Project: Hadoop HDFS > Issue Type: Bug > Components: nn >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Fix For: 3.3.1, 3.4.0, 2.10.2 > > Attachments: HDFS-15792-branch-2.10.001.patch, > HDFS-15792-branch-2.10.002.patch, HDFS-15792-branch-2.10.003.patch, > HDFS-15792-branch-2.10.004.patch, HDFS-15792.001.patch, HDFS-15792.002.patch, > HDFS-15792.003.patch, HDFS-15792.004.patch, HDFS-15792.005.patch, > HDFS-15792.addendum.001.patch, image-2021-01-27-12-00-34-846.png > > > FSImage loading has failed with a ClassCastException - > java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to > java.util.HashMap$TreeNode. > This is a usage issue with HashMap in concurrent scenarios. > The same issue has been reported on Java & closed as a usage issue. - > https://bugs.openjdk.java.net/browse/JDK-8173671 > 2020-12-28 11:36:26,127 | ERROR | main | An exception occurred when loading > INODE from fsiamge. | FSImageFormatProtobuf.java:442 > java.lang.ClassCastException
> : java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode > at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1835) > at java.util.HashMap$TreeNode.treeify(HashMap.java:1951) > at java.util.HashMap.treeifyBin(HashMap.java:772) > at java.util.HashMap.putVal(HashMap.java:644) > at java.util.HashMap.put(HashMap.java:612) > at > org.apache.hadoop.hdfs.util.ReferenceCountMap.put(ReferenceCountMap.java:53) > at > org.apache.hadoop.hdfs.server.namenode.AclStorage.addAclFeature(AclStorage.java:391) > at > org.apache.hadoop.hdfs.server.namenode.INodeWithAdditionalFields.addAclFeature(INodeWithAdditionalFields.java:349) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:225) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINode(FSImageFormatPBINode.java:406) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.readPBINodes(FSImageFormatPBINode.java:367) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:342) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader$2.call(FSImageFormatProtobuf.java:469) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2020-12-28 11:36:26,130 | ERROR | main | Failed to load image from > FSImageFile(file=/srv/BigData/namenode/current/fsimage_00198227480, > cpktTxId=00198227480) | FSImage.java:738 > java.io.IOException: java.lang.ClassCastException: java.util.HashMap$Node > cannot be cast to java.util.HashMap$TreeNode > at > org.apache.hadoop.io.MultipleIOException$Builder.add(MultipleIOException.java:68) > at > 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.runLoaderTasks(FSImageFormatProtobuf.java:444) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:360) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:263) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:971) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:955) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:820) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:733) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331) > at >
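The fix preferred in the comment above, synchronizing the methods of {{ReferenceCountMap}} instead of switching to {{ConcurrentHashMap}}, could look roughly like the following. This is a simplified sketch and not the actual Hadoop class: it guards a plain {{HashMap}} with synchronized methods, which prevents the concurrent-treeify corruption shown in the stack trace while avoiding ConcurrentHashMap's per-entry memory overhead.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch (not the real org.apache.hadoop.hdfs.util.ReferenceCountMap):
// synchronized methods make the plain HashMap safe for the parallel
// fsimage-loading threads without ConcurrentHashMap's extra footprint.
class SyncRefCountMap<E> {
    private final Map<E, Integer> counts = new HashMap<>();

    /** Interns the value and bumps its reference count. */
    public synchronized E put(E value) {
        counts.merge(value, 1, Integer::sum);
        return value;
    }

    /** Drops one reference; removes the entry when the count reaches zero. */
    public synchronized void remove(E value) {
        counts.computeIfPresent(value, (k, c) -> c > 1 ? c - 1 : null);
    }

    public synchronized int getReferenceCount(E value) {
        return counts.getOrDefault(value, 0);
    }
}
```

Coarse-grained synchronization is usually adequate here because fsimage loading uses only a handful of threads, so lock contention is low.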
[jira] [Resolved] (HDFS-15632) AbstractContractDeleteTest should set recursive peremeter to true for recursive test cases.
[ https://issues.apache.org/jira/browse/HDFS-15632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-15632. Fix Version/s: 3.2.3 2.10.2 3.1.5 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed I just committed this. Thank you [~antn.kutuzov] for contributing. > AbstractContractDeleteTest should set recursive peremeter to true for > recursive test cases. > --- > > Key: HDFS-15632 > URL: https://issues.apache.org/jira/browse/HDFS-15632 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Anton Kutuzov >Priority: Major > Labels: newbie, pull-request-available > Fix For: 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Time Spent: 20m > Remaining Estimate: 0h > > {{AbstractContractDeleteTest.testDeleteNonexistentPathRecursive()}} should > call {{delete(path, true)}} rather than {{false}} > Also {{AbstractContractDeleteTest.testDeleteNonexistentPathNonRecursive()}} > has a wrong assert message. Should be {{"... attempting to non-recursively > delete ..."}}
[jira] [Resolved] (HDFS-954) There are two security packages in hdfs, should be one
[ https://issues.apache.org/jira/browse/HDFS-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-954. -- Resolution: Won't Fix Hey [~antn.kutuzov] this is a rather old jira. I don't think it is a good idea to do repackaging at this point since it will make things harder to backport to older versions. Closing as won't fix. > There are two security packages in hdfs, should be one > -- > > Key: HDFS-954 > URL: https://issues.apache.org/jira/browse/HDFS-954 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Jakob Homan >Priority: Major > Labels: newbie > > Currently the test source tree has both > src/test/hdfs/org/apache/hadoop/hdfs/security with: > SecurityTestUtil.java > TestAccessToken.java > TestClientProtocolWithDelegationToken.java > and > src/test/hdfs/org/apache/hadoop/security with: > TestDelegationToken.java > TestGroupMappingServiceRefresh.java > TestPermission.java > These should be combined into one package and possibly some things moved to > common.
[jira] [Commented] (HDFS-15632) AbstractContractDeleteTest should set recursive peremeter to true for recursive test cases.
[ https://issues.apache.org/jira/browse/HDFS-15632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269631#comment-17269631 ] Konstantin Shvachko commented on HDFS-15632: +1 PR looks good. Will commit in a bit. > AbstractContractDeleteTest should set recursive peremeter to true for > recursive test cases. > --- > > Key: HDFS-15632 > URL: https://issues.apache.org/jira/browse/HDFS-15632 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Anton Kutuzov >Priority: Major > Labels: newbie, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {{AbstractContractDeleteTest.testDeleteNonexistentPathRecursive()}} should > call {{delete(path, true)}} rather than {{false}} > Also {{AbstractContractDeleteTest.testDeleteNonexistentPathNonRecursive()}} > has a wrong assert message. Should be {{"... attempting to non-recursively > delete ..."}}
[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md
[ https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268314#comment-17268314 ] Konstantin Shvachko commented on HDFS-15751: Created HADOOP-17477 to track implementation of {{msync()}} for {{ViewFS}} > Add documentation for msync() API to filesystem.md > -- > > Key: HDFS-15751 > URL: https://issues.apache.org/jira/browse/HDFS-15751 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Attachments: HDFS-15751-01.patch, HDFS-15751-02.patch, > HDFS-15751-03.patch > > > HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to > the API definitions.
[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md
[ https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257840#comment-17257840 ] Konstantin Shvachko commented on HDFS-15751: Thanks guys for taking care of that. > Add documentation for msync() API to filesystem.md > -- > > Key: HDFS-15751 > URL: https://issues.apache.org/jira/browse/HDFS-15751 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Attachments: HDFS-15751-01.patch, HDFS-15751-02.patch, > HDFS-15751-03.patch > > > HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to > the API definitions.
[jira] [Updated] (HDFS-15751) Add documentation for msync() API to filesystem.md
[ https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15751: --- Status: Patch Available (was: Open) > Add documentation for msync() API to filesystem.md > -- > > Key: HDFS-15751 > URL: https://issues.apache.org/jira/browse/HDFS-15751 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15751-01.patch > > > HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to > the API definitions.
[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md
[ https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254662#comment-17254662 ] Konstantin Shvachko commented on HDFS-15751: Added documentation for {{msync()}}. I put a reference to HDFS documentation describing [Consistent Reads from Observer|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html], particularly the semantics of {{msync()}} in HDFS. Otherwise it is hard to define it in abstract terms without the context. I was also thinking about {{AbstractContract}}* type tests for {{msync}} but could not think of anything valuable that can be tested here. Essentially we need to call {{mkdir()}} and then verify that after {{msync}} the metadata exists via a read call, which we do for HDFS in {{TestConsistentReadsObserver}}. But it is a probabilistic thing, as it can succeed even without synchronization if the standby catches up fast enough. I guess testing consistency contracts is similar to atomicity, which we don't test in {{AbstractContract}} tests, since it is not clear how. > Add documentation for msync() API to filesystem.md > -- > > Key: HDFS-15751 > URL: https://issues.apache.org/jira/browse/HDFS-15751 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15751-01.patch > > > HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to > the API definitions.
[jira] [Updated] (HDFS-15751) Add documentation for msync() API to filesystem.md
[ https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15751: --- Attachment: HDFS-15751-01.patch > Add documentation for msync() API to filesystem.md > -- > > Key: HDFS-15751 > URL: https://issues.apache.org/jira/browse/HDFS-15751 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15751-01.patch > > > HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to > the API definitions.
[jira] [Created] (HDFS-15751) Add documentation for msync() API to filesystem.md
Konstantin Shvachko created HDFS-15751: -- Summary: Add documentation for msync() API to filesystem.md Key: HDFS-15751 URL: https://issues.apache.org/jira/browse/HDFS-15751 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to the API definitions.
[jira] [Commented] (HDFS-15746) Standby NameNode crash when replay editlog
[ https://issues.apache.org/jira/browse/HDFS-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254641#comment-17254641 ] Konstantin Shvachko commented on HDFS-15746: Looks like your branch is either quite old or diverged a lot; it doesn't look like any of the Apache branches based on line numbers, etc. On trunk and all other maintained branches up to 2.10 there is no place where an NPE can happen inside {{BlockInfo.setGenerationStampAndVerifyReplicas()}}. So it could be specific to branch-2.7 or even your own fork. I found that you already reported that same problem in HDFS-14529 about [a year ago|https://issues.apache.org/jira/browse/HDFS-14529?focusedCommentId=16970092=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16970092]. You might want to look at Jiras related to this issue. Or upgrade your HDFS. I believe this problem has been solved in later releases. > Standby NameNode crash when replay editlog > -- > > Key: HDFS-15746 > URL: https://issues.apache.org/jira/browse/HDFS-15746 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15746.001.patch > > > The Standby NameNode met an NPE and crashed when replaying the editlog. After > digging through the logs and source code, the root cause was not found. But some > information may be useful for this case. > a. before the SBN crash, the ANN did one lease recovery. > {code:java} > 2020-12-23 12:37:45,946 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseLease: $PATH has not been closed. Lease recovery is > in progress. RecoveryId = 21696709510 for block blk_*_21658833701 > {code} > b. then one DataNode volume failed which managed one replica of > blk_*_21658833701 after lease recovery. > c. after half an hour, the SBN crashed because of the NPE below.
> {code:java} > 2020-12-23 13:13:36,703 ERROR > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception > on operation CloseOp [length=0, inodeId=0, path=$PATH, replication=3, > mtime=1608698268201, atime=1608343529481, blockSize=268435456, > blocks=[blk_$i_$j], permissions=user:group:rw-r--r--, aclEntries=null, > clientName=, clientMachine=, overwrite=false, storagePolicyId=0, > opCode=OP_CLOSE, txid=$txid] > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setGenerationStampAndVerifyReplicas(BlockInfo.java:455) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.commitBlock(BlockInfo.java:476) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:1248) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1065) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:244) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:152) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:843) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1706) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:428) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297) > 2020-12-23 13:13:36,703 ERROR org.apache.hadoop.ipc.Server: Error in Reader > java.nio.channels.ClosedChannelException > at > java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:197) > at > org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:1053) > at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1034) > 2020-12-23 13:13:36,703 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.16.39.26:50010 is added to blk_22374572883_21672067156
[jira] [Commented] (HDFS-15746) Standby NameNode crash when replay editlog
[ https://issues.apache.org/jira/browse/HDFS-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254245#comment-17254245 ] Konstantin Shvachko commented on HDFS-15746: [~hexiaoqiao] thanks for reporting this. Which version of Hadoop do you see this with? I agree with [~elgoiri] it would be really good to understand the root cause, so that we could add a unit test. > Standby NameNode crash when replay editlog > -- > > Key: HDFS-15746 > URL: https://issues.apache.org/jira/browse/HDFS-15746 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15746.001.patch > > > The Standby NameNode met an NPE and crashed when replaying the editlog. After > digging through the logs and source code, the root cause was not found. But some > information may be useful for this case. > a. before the SBN crash, the ANN did one lease recovery. > {code:java} > 2020-12-23 12:37:45,946 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseLease: $PATH has not been closed. Lease recovery is > in progress. RecoveryId = 21696709510 for block blk_*_21658833701 > {code} > b. then one DataNode volume failed which managed one replica of > blk_*_21658833701 after lease recovery. > c. after half an hour, the SBN crashed because of the NPE below.
> {code:java} > 2020-12-23 13:13:36,703 ERROR > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception > on operation CloseOp [length=0, inodeId=0, path=$PATH, replication=3, > mtime=1608698268201, atime=1608343529481, blockSize=268435456, > blocks=[blk_$i_$j], permissions=user:group:rw-r--r--, aclEntries=null, > clientName=, clientMachine=, overwrite=false, storagePolicyId=0, > opCode=OP_CLOSE, txid=$txid] > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setGenerationStampAndVerifyReplicas(BlockInfo.java:455) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.commitBlock(BlockInfo.java:476) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:1248) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1065) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:244) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:152) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:843) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1706) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:428) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297) > 2020-12-23 13:13:36,703 ERROR org.apache.hadoop.ipc.Server: Error in Reader > java.nio.channels.ClosedChannelException > at > java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:197) > at > org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:1053) > at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1034) > 2020-12-23 13:13:36,703 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.16.39.26:50010 is added to blk_22374572883_21672067156 > size 58762255 > 2020-12-23 13:13:36,704 FATAL > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error > encountered while tailing edits. Shutting down standby NN. > java.io.IOException: java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:254) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:152) > at >
[jira] [Commented] (HDFS-15567) [SBN Read] HDFS should expose msync() API to allow downstream applications call it explicitly.
[ https://issues.apache.org/jira/browse/HDFS-15567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248677#comment-17248677 ] Konstantin Shvachko commented on HDFS-15567: Hey Steve, I am not sure I fully understand what is broken here. I believe it is not an incompatible change. Could you please explain what you think the process is? It would be best if you could share a link to a document describing it. I would be glad to follow up with tests and documentation that are needed. As you can see, I proposed multiple solutions to the problem here. It seemed nobody was objecting, so I chose one and explained why. I believe we call it lazy consensus. > [SBN Read] HDFS should expose msync() API to allow downstream applications > call it explicitly. > -- > > Key: HDFS-15567 > URL: https://issues.apache.org/jira/browse/HDFS-15567 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, hdfs-client >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Attachments: HDFS-15567.001.patch, HDFS-15567.002.patch > > > Consistent reads from Standby introduced {{msync()}} API HDFS-13688, which > updates client's state ID with current state of the Active NameNode to > guarantee consistency of subsequent calls to an ObserverNode. Currently this > API is exposed via {{DFSClient}} only, which makes it hard for applications > to access {{msync()}}. One way is to use something like this: > {code} > if(fs instanceof DistributedFileSystem) { > ((DistributedFileSystem)fs).getClient().msync(); > } > {code} > This should be exposed both for {{FileSystem}} and {{FileContext}}.
[jira] [Comment Edited] (HDFS-15567) [SBN Read] HDFS should expose msync() API to allow downstream applications call it explicitly.
[ https://issues.apache.org/jira/browse/HDFS-15567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210550#comment-17210550 ] Konstantin Shvachko edited comment on HDFS-15567 at 12/13/20, 9:11 PM: --- Thanks for taking a look [~vagarychen]. * Both {{FileSystem}} and {{AbstractFileSystem}} throw {{UnsupportedOperationException}} with my patch. This is a standard pattern and a way for clients to learn if the operation is supported or not in the implementation. No-op will hide the problem and for {{msync}} in particular can lead to inconsistent results further down the road, which is hard to debug as we both know. * Logging in the tests is not "required", but it helped a lot in debugging problems that I fixed when some tests were failing. I decided to leave them in the code in case something breaks in the future. I agree we usually try to restrict change to bare minimum to avoid conflicts while backporting. In this case with code relatively recent I don't see it a blocker for backports. * Ran tests that failed on Jenkins locally. All passed. They are long running tests, which frequently fail on Jenkins builds. was (Author: shv): Thanks for taking a look [~vagarychen]. * Both {{FileSystem}} and {{AbstractFileSystem}} throw {{UnsupportedOperationException}} with my patch. This is a standard pattern and a way for clients to learn if the operation is supported or not in the implementation. No-op will hide the problem and for {{mscyn}} in particular can lead to inconsistent results further down the road, which is hard to debug as we both know. * Logging in the tests is not "required", but it helped a lot in debugging problems that I fixed when some tests were failing. I decided to leave them in the code in case something breaks in the future. I agree we usually try to restrict change to bare minimum to avoid conflicts while backporting. In this case with code relatively recent I don't see it a blocker for backports. 
* Ran tests that failed on Jenkins locally. All passed. They are long running tests, which frequently fail on Jenkins builds. > [SBN Read] HDFS should expose msync() API to allow downstream applications > call it explicitly. > -- > > Key: HDFS-15567 > URL: https://issues.apache.org/jira/browse/HDFS-15567 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, hdfs-client >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Attachments: HDFS-15567.001.patch, HDFS-15567.002.patch > > > Consistent reads from Standby introduced {{msync()}} API HDFS-13688, which > updates client's state ID with current state of the Active NameNode to > guarantee consistency of subsequent calls to an ObserverNode. Currently this > API is exposed via {{DFSClient}} only, which makes it hard for applications > to access {{msync()}}. One way is to use something like this: > {code} > if(fs instanceof DistributedFileSystem) { > ((DistributedFileSystem)fs).getClient().msync(); > } > {code} > This should be exposed both for {{FileSystem}} and {{FileContext}}.
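The {{UnsupportedOperationException}} choice defended in the comment above follows a standard Java pattern: the base class throws, so callers can detect that an operation is unsupported, instead of a silent no-op hiding stale reads. Below is a minimal self-contained sketch of that pattern; the class names are invented for illustration and are not Hadoop's.

```java
// Sketch of the "throw in the base class" pattern discussed above.
abstract class BaseFileSystem {
    /** Consistency barrier; implementations that support it override this. */
    public void msync() {
        throw new UnsupportedOperationException(
            getClass().getSimpleName() + " does not support msync");
    }
}

// A filesystem that does not implement the barrier: callers find out.
class PlainFileSystem extends BaseFileSystem {}

// A filesystem that does: the override replaces the throwing default.
class ObserverAwareFileSystem extends BaseFileSystem {
    boolean synced = false;

    @Override
    public void msync() {
        synced = true;  // stand-in for syncing the client state ID
    }
}
```

A caller that wants best-effort behavior can catch the exception and proceed, while one that requires the consistency guarantee can let it propagate.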
[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238894#comment-17238894 ] Konstantin Shvachko commented on HDFS-4452: --- This is an interesting observation [~honestman]. You are right: in your scenario the block creation will fail and the client will have to retry, either re-writing just the last block or the entire file. The good thing is that the namespace remains in a consistent state, which was the problem with the original issue in this jira. This is essentially a scenario for "Case 3" of {{analyzeFileState()}}. It would be good to confirm with a unit test that this is indeed possible. NameNode should not violate the contract of persisting all the data that was successfully reported to clients. > getAdditionalBlock() can create multiple blocks if the client times out and > retries. > > > Key: HDFS-4452 > URL: https://issues.apache.org/jira/browse/HDFS-4452 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Critical > Fix For: 2.0.3-alpha > > Attachments: TestAddBlockRetry.java, > getAdditionalBlock-branch2.patch, getAdditionalBlock.patch, > getAdditionalBlock.patch, getAdditionalBlock.patch > > > HDFS client tries to addBlock() to a file. If NameNode is busy the client can > timeout and will reissue the same request again. The two requests will race > with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in > creating two new blocks on the NameNode while the client will know of only > one of them. This eventually results in {{NotReplicatedYetException}} because > the extra block is never reported by any DataNode, which stalls file creation > and puts it in invalid state with an empty block in the middle.
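The retry scenario in HDFS-4452 can be illustrated with a toy model. The guard below mirrors only the idea behind the retry detection in {{analyzeFileState()}} (hand back the already-allocated block when the client's reported previous block lags behind); the class, method names, and integer block ids are hypothetical, not HDFS code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the retried addBlock() call: if a response is lost and the
// client reissues the request with the same previousBlock, a NameNode that
// blindly allocates would create a second block the client never learns of.
public class AddBlockRetry {
    final List<Integer> blocks = new ArrayList<>();
    private int nextId = 1;

    // previousBlock is the last block the client knows about (0 = none yet).
    int getAdditionalBlock(int previousBlock) {
        int last = blocks.isEmpty() ? 0 : blocks.get(blocks.size() - 1);
        if (last != previousBlock) {
            // A retry of an RPC whose reply was lost: the block was already
            // allocated, so return the existing block instead of a new one.
            return last;
        }
        int b = nextId++;
        blocks.add(b);
        return b;
    }

    public static void main(String[] args) {
        AddBlockRetry nn = new AddBlockRetry();
        System.out.println(nn.getAdditionalBlock(0)); // first request allocates
        System.out.println(nn.getAdditionalBlock(0)); // retry gets same block back
        System.out.println(nn.blocks.size());         // still one block in the file
    }
}
```

Without the guard, the second call would allocate a block the client never hears about, which is exactly the empty-block-in-the-middle state the issue describes.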
[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed
[ https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228786#comment-17228786 ] Konstantin Shvachko commented on HDFS-15562: Hey guys, I think generally the checkpointer should persist until the checkpoint completes and the image is transferred. With a large image we see transfers fail once in a while, so just ignoring image transfer failures isn't right. I understand that with multiple ObserverNodes some of them can be down. We already have logic for the ActiveNN and ObserverNodes to reject an image if they already have one recent enough, so frequent checkpoints should not overwhelm the active or the Observers. We may add logic for the Checkpointer not to re-create an image if it was created recently, but this does not seem to be a big concern. > StandbyCheckpointer will do checkpoint repeatedly while connecting > observer/active namenode failed > -- > > Key: HDFS-15562 > URL: https://issues.apache.org/jira/browse/HDFS-15562 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: SunHao >Assignee: Aihua Xu >Priority: Major > Labels: pull-request-available > Attachments: HDFS-15562.patch > > Time Spent: 20m > Remaining Estimate: 0h > > We find that the standby namenode will checkpoint over and over when > connecting to the observer/active namenode fails. > StandbyCheckpointer won't update “lastCheckpointTime” when uploading a new fsimage > to the other namenode fails, so the standby namenode will keep doing > checkpoints repeatedly.
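The looping behavior this issue reports can be sketched in isolation: if lastCheckpointTime only advances on a successful upload, a persistently failing transfer keeps the checkpoint condition true on every cycle. The field names and the period value below are illustrative, not the actual StandbyCheckpointer code:

```java
// Toy model of the reported behavior: the checkpoint trigger compares
// elapsed time against a period, and the timestamp is advanced only when
// the image upload succeeds, so repeated failures re-trigger immediately.
public class CheckpointLoop {
    long lastCheckpointTime = 0;  // ms; advanced on successful upload only
    static final long CHECKPOINT_PERIOD_MS = 3_600_000;

    boolean shouldCheckpoint(long nowMs) {
        return nowMs - lastCheckpointTime >= CHECKPOINT_PERIOD_MS;
    }

    // Returns true if a checkpoint was attempted in this cycle.
    boolean runCycle(long nowMs, boolean uploadSucceeds) {
        if (!shouldCheckpoint(nowMs)) {
            return false;
        }
        // ... save the namespace image locally, then upload it ...
        if (uploadSucceeds) {
            lastCheckpointTime = nowMs;
        }
        return true;
    }

    public static void main(String[] args) {
        CheckpointLoop c = new CheckpointLoop();
        System.out.println(c.runCycle(3_600_000, false)); // attempted, upload failed
        System.out.println(c.runCycle(3_600_001, false)); // retried immediately
        System.out.println(c.runCycle(3_600_002, true));  // succeeded
        System.out.println(c.runCycle(3_600_003, false)); // no attempt: too soon
    }
}
```

The comment above argues for keeping the retries (the transfer must eventually succeed) while optionally skipping the image re-creation step when a recent image already exists.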
[jira] [Resolved] (HDFS-15623) Respect configured values of rpc.engine
[ https://issues.apache.org/jira/browse/HDFS-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-15623. Fix Version/s: 3.2.3 2.10.2 3.1.5 3.4.0 3.3.1 Hadoop Flags: Reviewed Resolution: Fixed I just committed to trunk and branches 3.3, 3.2, 3.1, 2.10. Thank you [~hchaverri]. > Respect configured values of rpc.engine > --- > > Key: HDFS-15623 > URL: https://issues.apache.org/jira/browse/HDFS-15623 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hector Sandoval Chaverri >Assignee: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The HDFS Configuration allows users to specify the RPCEngine implementation > to use when communicating with Datanodes and Namenodes. However, the value is > overwritten to ProtobufRpcEngine.class in different classes. As an example, in > NameNodeRpcServer: > {{RPC.setProtocolEngine(conf, ClientNamenodeProtocolPB.class, > ProtobufRpcEngine.class);}} > The configured value of rpc.engine.[protocolName] should be respected to > allow for other implementations of RPCEngine to be used.
[jira] [Commented] (HDFS-15623) Respect configured values of rpc.engine
[ https://issues.apache.org/jira/browse/HDFS-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227602#comment-17227602 ] Konstantin Shvachko commented on HDFS-15623: Was looking into this a bit more. * So the right solution would be to use the standard config default mechanism, that is ## remove all hardcoded {{RPC.setProtocolEngine()}} calls ## and instead change {{RPC.getProtocolEngine()}} to use {{ProtobufRpcEngine2}} as the default {{RpcEngine}} rather than {{WritableRpcEngine}} * But this can break some unit tests, which still expect {{WritableRpcEngine}}, as people learned in HADOOP-12579. * Even though the problem above may have been fixed in hadoop 3, it is definitely present in hadoop 2. So I think this patch does the right thing if we want to make RpcEngines pluggable again, as was originally intended in HADOOP-6422. Will commit this shortly. > Respect configured values of rpc.engine > --- > > Key: HDFS-15623 > URL: https://issues.apache.org/jira/browse/HDFS-15623 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Hector Sandoval Chaverri >Assignee: Hector Sandoval Chaverri >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The HDFS Configuration allows users to specify the RPCEngine implementation > to use when communicating with Datanodes and Namenodes. However, the value is > overwritten to ProtobufRpcEngine.class in different classes. As an example, in > NameNodeRpcServer: > {{RPC.setProtocolEngine(conf, ClientNamenodeProtocolPB.class, > ProtobufRpcEngine.class);}} > The configured value of rpc.engine.[protocolName] should be respected to > allow for other implementations of RPCEngine to be used.
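The fix direction in the bullets above, respecting a configured rpc.engine.[protocolName] value and falling back to a single default, reduces to a lookup. This is a stand-alone sketch using a plain Map in place of Hadoop's Configuration; the key pattern mirrors rpc.engine.<protocolName> but nothing here is Hadoop code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of config-respecting engine selection: consult the configured
// value for the protocol first, and fall back to one default engine
// instead of hardcoding the engine at every call site.
public class RpcEngineLookup {
    static final String DEFAULT_ENGINE = "ProtobufRpcEngine2";

    static String getProtocolEngine(Map<String, String> conf, String protocolName) {
        return conf.getOrDefault("rpc.engine." + protocolName, DEFAULT_ENGINE);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("rpc.engine.ClientNamenodeProtocolPB", "CustomRpcEngine");
        // A configured protocol gets its engine; unconfigured ones get the default.
        System.out.println(getProtocolEngine(conf, "ClientNamenodeProtocolPB"));
        System.out.println(getProtocolEngine(conf, "DatanodeProtocolPB"));
    }
}
```

The hardcoded {{RPC.setProtocolEngine()}} calls defeat exactly this lookup, which is why the comment proposes removing them and changing the default instead.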
[jira] [Updated] (HDFS-15665) Balancer logging improvement
[ https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15665: --- Fix Version/s: 3.2.3 2.10.2 3.1.5 3.4.0 3.3.1 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this to trunk and branches 3.3, 3.2, 3.1, and 2.10. Thanks Chen for the review. There were conflicts for 3.1 and 2.10 related to the LOG type. I changed the Balancer and Dispatcher logs to slf4j. > Balancer logging improvement > > > Key: HDFS-15665 > URL: https://issues.apache.org/jira/browse/HDFS-15665 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3 > > Attachments: HDFS-15665.001.patch, HDFS-15665.002.patch > > > It would be good to have Balancer log all relevant configuration parameters > on each iteration along with some data, which reflects its progress and the > amount of resources it involves.
[jira] [Updated] (HDFS-15665) Balancer logging improvement
[ https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15665: --- Attachment: HDFS-15665.002.patch > Balancer logging improvement > > > Key: HDFS-15665 > URL: https://issues.apache.org/jira/browse/HDFS-15665 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15665.001.patch, HDFS-15665.002.patch > > > It would be good to have Balancer log all relevant configuration parameters > on each iteration along with some data, which reflects its progress and the > amount of resources it involves.
[jira] [Commented] (HDFS-15665) Balancer logging improvement
[ https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225092#comment-17225092 ] Konstantin Shvachko commented on HDFS-15665: Thanks for the review [~vagarychen]. * {{getInt()}} is actually called to print a log message for the parameter. The value may not be used by the Balancer itself. * Two log messages look better, in fact, because the first message is pretty long, as it prints {{NameNodeConnector}} including the URI and block pool id: {noformat:nowrap} 2020-11-02 10:42:59,939 [Listener at localhost/64077] INFO balancer.Balancer (Balancer.java:runOneIteration(641)) - Will move 100.79 MB in this iteration for NameNodeConnector[namenodeUri=hdfs://localhost:64069, bpid=BP-79516876-172.18.170.12-1604342573024] {noformat} So if I append the second line it will be hard to read the logs. Will update the patch with checkstyle fixes. > Balancer logging improvement > > > Key: HDFS-15665 > URL: https://issues.apache.org/jira/browse/HDFS-15665 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15665.001.patch > > > It would be good to have Balancer log all relevant configuration parameters > on each iteration along with some data, which reflects its progress and the > amount of resources it involves.
[jira] [Commented] (HDFS-15665) Balancer logging improvement
[ https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224959#comment-17224959 ] Konstantin Shvachko commented on HDFS-15665: Attached a patch * It logs additional config parameters {{dfs.namenode.get-blocks.max-qps}} and {{dfs.datanode.balance.bandwidthPerSec}} * Counts and logs number of blocks (in addition to bytes) moved in each iteration * Logs number of DN targets in each iteration * Prints the NameNode address for each iteration, which is useful in federation. > Balancer logging improvement > > > Key: HDFS-15665 > URL: https://issues.apache.org/jira/browse/HDFS-15665 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15665.001.patch > > > It would be good to have Balancer log all relevant configuration parameters > on each iteration along with some data, which reflects its progress and the > amount of resources it involves.
[jira] [Updated] (HDFS-15665) Balancer logging improvement
[ https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15665: --- Status: Patch Available (was: Open) > Balancer logging improvement > > > Key: HDFS-15665 > URL: https://issues.apache.org/jira/browse/HDFS-15665 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15665.001.patch > > > It would be good to have Balancer log all relevant configuration parameters > on each iteration along with some data, which reflects its progress and the > amount of resources it involves.
[jira] [Updated] (HDFS-15665) Balancer logging improvement
[ https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15665: --- Attachment: HDFS-15665.001.patch > Balancer logging improvement > > > Key: HDFS-15665 > URL: https://issues.apache.org/jira/browse/HDFS-15665 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15665.001.patch > > > It would be good to have Balancer log all relevant configuration parameters > on each iteration along with some data, which reflects its progress and the > amount of resources it involves.