[jira] [Resolved] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2024-01-05 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-14305.

Resolution: Fixed

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNodes could have overlapping ranges 
> for serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key, which 
> will cause clients to fail with an {{InvalidToken}} error.
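
A minimal, self-contained sketch (not the actual {{BlockTokenSecretManager}} code) reproducing the overlap with the numbers above, 100 standing in for {{Integer.MAX_VALUE}}:

{code:java}
public class SerialRangeOverlap {
  public static void main(String[] args) {
    final int maxValue = 100;               // stand-in for Integer.MAX_VALUE
    final int numNNs = 2;
    final int intRange = maxValue / numNNs; // 50

    for (int nnIndex = 0; nnIndex < numNNs; nnIndex++) {
      int nnRangeStart = intRange * nnIndex;
      // serialNo starts as a random (possibly negative) int, so
      // serialNo % intRange can be anywhere in [-(intRange-1), intRange-1].
      int lo = -(intRange - 1) + nnRangeStart;
      int hi = (intRange - 1) + nnRangeStart;
      System.out.println("nn" + (nnIndex + 1) + " -> [" + lo + ", " + hi + "]");
    }
    // Prints nn1 -> [-49, 49] and nn2 -> [1, 99]: the ranges overlap on [1, 49].
  }
}
{code}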






[jira] [Commented] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines

2022-03-22 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511011#comment-17511011
 ] 

Konstantin Shvachko commented on HDFS-16517:


Looks like the same issue as HADOOP-16161, as [~xinglin] found out.
The fix is equivalent. I did not compare the tests.
Should we just backport it, [~omalley]?

> In 2.10 the distance metric is wrong for non-DN machines
> 
>
> Key: HDFS-16517
> URL: https://issues.apache.org/jira/browse/HDFS-16517
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In 2.10, the metric for distance between the client and the data node is 
> wrong for machines that aren't running data nodes (i.e. 
> getWeightUsingNetworkLocation). The code works correctly in 3.3+. 
> Currently
>  
> ||Client||DataNode||getWeight||getWeightUsingNetworkLocation||
> |/rack1/node1|/rack1/node1|0|0|
> |/rack1/node1|/rack1/node2|2|2|
> |/rack1/node1|/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod1/rack1/node2|2|2|
> |/pod1/rack1/node1|/pod1/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod2/rack2/node2|6|4|
>  
> This bug will destroy data locality on clusters where the clients share racks 
> with DataNodes, but are running on machines that aren't running DataNodes, 
> such as striping federated HDFS clusters across racks.
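
A hedged sketch of the expected distance rule behind the {{getWeight}} column (illustrative code, not Hadoop's {{NetworkTopology}} implementation): distance counts the hops from each node up to their closest common ancestor, so same node = 0, same rack = 2, off-rack = 4, off-pod = 6:

{code:java}
import java.util.Arrays;

public class TopologyDistance {
  static int distance(String path1, String path2) {
    String[] a = path1.split("/");
    String[] b = path2.split("/");
    if (Arrays.equals(a, b)) {
      return 0;                                  // same node
    }
    int common = 0;
    int levels = Math.min(a.length, b.length);
    while (common < levels && a[common].equals(b[common])) {
      common++;                                  // shared ancestor levels
    }
    // hops from each node up to the common ancestor
    return (a.length - common) + (b.length - common);
  }

  public static void main(String[] args) {
    System.out.println(distance("/rack1/node1", "/rack1/node2"));           // 2
    System.out.println(distance("/rack1/node1", "/rack2/node2"));           // 4
    System.out.println(distance("/pod1/rack1/node1", "/pod2/rack2/node2")); // 6
  }
}
{code}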






[jira] [Updated] (HDFS-10650) DFSClient#mkdirs and DFSClient#primitiveMkdir should use default directory permission

2022-02-15 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-10650:
---
Fix Version/s: 2.10.2

Merged this into branch-2.10.
Updated fix version.

> DFSClient#mkdirs and DFSClient#primitiveMkdir should use default directory 
> permission
> -
>
> Key: HDFS-10650
> URL: https://issues.apache.org/jira/browse/HDFS-10650
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
> Fix For: 3.0.0-alpha1, 2.10.2
>
> Attachments: HDFS-10650.001.patch, HDFS-10650.002.patch
>
>
> These 2 DFSClient methods should use default directory permission to create a 
> directory.
> {code:java}
>   public boolean mkdirs(String src, FsPermission permission,
>   boolean createParent) throws IOException {
> if (permission == null) {
>   permission = FsPermission.getDefault();
> }
> {code}
> {code:java}
>   public boolean primitiveMkdir(String src, FsPermission absPermission, 
> boolean createParent)
> throws IOException {
> checkOpen();
> if (absPermission == null) {
>   absPermission = 
> FsPermission.getDefault().applyUMask(dfsClientConf.uMask);
> } 
> {code}
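
The attached patches aren't quoted here; as a hedged sketch, the described change would swap in the directory default (assuming {{FsPermission.getDirDefault()}}, which returns the default directory permission):

{code:java}
  public boolean mkdirs(String src, FsPermission permission,
      boolean createParent) throws IOException {
    if (permission == null) {
      permission = FsPermission.getDirDefault();  // was FsPermission.getDefault()
    }
{code}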






[jira] [Commented] (HDFS-16322) The NameNode implementation of ClientProtocol.truncate(...) can cause data loss.

2021-11-15 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444033#comment-17444033
 ] 

Konstantin Shvachko commented on HDFS-16322:


Is it specific to truncate?
Same thing should happen with {{mkdir()}}. Client A creates a directory, client 
B deletes it, then client A retries the create. Same with {{setPermission()}}?
From the NN perspective the two calls from client A are different calls. Since NN 
responded to the first call from client A, it treats the retry as the second 
call.

> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss.
> 
>
> Key: HDFS-16322
> URL: https://issues.apache.org/jira/browse/HDFS-16322
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: The runtime environment is Ubuntu 18.04, Java 1.8.0_222 
> and Apache Maven 3.6.0. 
> The bug can be reproduced by the testMultipleTruncate() in the 
> attachment. First, replace the file TestFileTruncate.java under the directory 
> "hadoop-3.3.1-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/"
>  with the attachment. Then run "mvn test 
> -Dtest=org.apache.hadoop.hdfs.server.namenode.TestFileTruncate#testMultipleTruncate"
>  to run the testcase. Finally, the "assertFileLength(p, n+newLength)" at line 
> 199 of TestFileTruncate.java will abort, because the retry of truncate() 
> changes the file size and causes data loss.
>Reporter: nhaorand
>Priority: Major
> Attachments: TestFileTruncate.java
>
>
> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss. If dfsclient drops the first response of a truncate RPC call, the retry 
> by retry cache will truncate the file again and cause data loss.
> HDFS-7926 avoids repeated execution of truncate(...) by checking if the file 
> is already being truncated with the same length. However, under concurrency, 
> after the first execution of truncate(...), concurrent requests from other 
> clients may append new data and change the file length. When truncate(...) is 
> retried after that, it will find the file has not been truncated with the 
> same length and truncate it again, which causes data loss.






[jira] [Resolved] (HDFS-7612) TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir

2021-10-21 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-7612.
---
Fix Version/s: 3.2.4
   3.3.2
   2.10.2
   3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

I just committed this to the four active branches.
Congratulations [~mkuchenbecker]!

> TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir
> -
>
> Key: HDFS-7612
> URL: https://issues.apache.org/jira/browse/HDFS-7612
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Assignee: Michael Kuchenbecker
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> final String cacheDir = System.getProperty("test.cache.data",
> "build/test/cache");
> {code}
> results in
> {{FileNotFoundException: build/test/cache/editsStoredParsed.xml (No such file 
> or directory)}}
> when {{test.cache.data}} is not set.
> I can see this failing while running in Eclipse.






[jira] [Assigned] (HDFS-7612) TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir

2021-10-21 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reassigned HDFS-7612:
-

Assignee: Michael Kuchenbecker

> TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir
> -
>
> Key: HDFS-7612
> URL: https://issues.apache.org/jira/browse/HDFS-7612
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Assignee: Michael Kuchenbecker
>Priority: Major
>  Labels: newbie, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> final String cacheDir = System.getProperty("test.cache.data",
> "build/test/cache");
> {code}
> results in
> {{FileNotFoundException: build/test/cache/editsStoredParsed.xml (No such file 
> or directory)}}
> when {{test.cache.data}} is not set.
> I can see this failing while running in Eclipse.






[jira] [Commented] (HDFS-13150) [Edit Tail Fast Path] Allow SbNN to tail in-progress edits from JN via RPC

2021-09-30 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423048#comment-17423048
 ] 

Konstantin Shvachko commented on HDFS-13150:


We ended up implementing quorum reads from JNs for the Observer fast path.
You should check the code, [~liutongwei].

> [Edit Tail Fast Path] Allow SbNN to tail in-progress edits from JN via RPC
> --
>
> Key: HDFS-13150
> URL: https://issues.apache.org/jira/browse/HDFS-13150
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, hdfs, journal-node, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: HDFS-12943, 3.3.0
>
> Attachments: edit-tailing-fast-path-design-v0.pdf, 
> edit-tailing-fast-path-design-v1.pdf, edit-tailing-fast-path-design-v2.pdf
>
>
> In the interest of making coordinated/consistent reads easier to complete 
> with low latency, it is advantageous to reduce the time between when a 
> transaction is applied on the ANN and when it is applied on the SbNN. We 
> propose adding a new "fast path" which can be used to tail edits when low 
> latency is desired. We leave the existing tailing logic in place, and fall 
> back to this path on startup, recovery, and when the fast path encounters 
> unrecoverable errors.






[jira] [Comment Edited] (HDFS-16220) [FGL]Configurable INodeMap#NAMESPACE_KEY_DEPTH_RANGES_STATIC

2021-09-29 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422417#comment-17422417
 ] 

Konstantin Shvachko edited comment on HDFS-16220 at 9/29/21, 11:22 PM:
---

# I don't think we should make Key depth configurable. We should generalize the 
long[] into a Key class, then we will be able to configure any Key class and 
load it using a Key factory. Different Key classes then can have different 
depths. The problem here is that one should be able to construct the keys while 
loading INodes from the image, so that they could be placed into the right 
partitions.
# Number of partitions should in the end be configurable. It should depend on 
the number of cores on your server. Increasing the number of partitions does 
not necessarily increase the parallelism because at any moment the CPU cannot 
support more threads than the number of cores. So this change is useful, but 
not critical.
And the main problem here is to be able to rebuild new partitions while 
reloading the fsimage. If you upgraded your NameNode to a server with more 
cores you should be able to adjust the number of partitions.


was (Author: shv):
# I don't think we should make Key depth configurable. We should generalize the 
long[] into a Key class, then we will be able to configure any Key class and 
load it using a Key factory. Different Key classes then can have different 
depths. The problem here is that one should be able to construct the keys while 
loading INodes from the image, so that they could be placed into the right 
partitions.
# Number of partitions should in the end be configurable. It should depend on 
the number of cores on your server. Increasing the number of partitions does 
not necessarily increase the parallelism because at any moment the CPU cannot 
support more threads than the number of cores. So this change is useful, but 
not critical.

> [FGL]Configurable INodeMap#NAMESPACE_KEY_DEPTH_RANGES_STATIC
> 
>
> Key: HDFS-16220
> URL: https://issues.apache.org/jira/browse/HDFS-16220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: debug1.jpg, debug2.jpg
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In INodeMap, NAMESPACE_KEY_DEPTH and NUM_RANGES_STATIC are fixed values; we 
> should make them configurable.






[jira] [Commented] (HDFS-16228) [FGL]Improve safer PartitionedGSet#size

2021-09-29 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422419#comment-17422419
 ] 

Konstantin Shvachko commented on HDFS-16228:


Atomic variables have the same semantics as volatile, except that in addition 
they provide atomic set-and-get methods.
In your patch you do call {{incrementAndGet()}}, but only for the purpose of 
incrementing since you never use the returned result.
So you might as well keep it volatile.
Besides, all GSet methods should be called under a higher level lock, so they 
should be safe.
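
As a minimal illustration of that point (generic Java, not the PartitionedGSet code):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

class Counters {
  private volatile long volatileSize;            // visibility only
  private final AtomicLong atomicSize = new AtomicLong();

  // Under a higher-level lock (as with GSet methods), a volatile increment
  // is already safe, and the cheaper choice:
  synchronized void incUnderLock() {
    volatileSize++;
  }

  // incrementAndGet() pays for an atomic read-modify-write, which is wasted
  // when the returned value is discarded:
  void incDiscardingResult() {
    atomicSize.incrementAndGet();                // result unused
  }
}
{code}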

> [FGL]Improve safer PartitionedGSet#size
> ---
>
> Key: HDFS-16228
> URL: https://issues.apache.org/jira/browse/HDFS-16228
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When multiple PartitionedEntry instances are working at the same time, there 
> may be inconsistencies in the PartitionedGSet#size operation.
> For example, there are some size++ or size-- operations in PartitionedGSet.






[jira] [Commented] (HDFS-16220) [FGL]Configurable INodeMap#NAMESPACE_KEY_DEPTH_RANGES_STATIC

2021-09-29 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422417#comment-17422417
 ] 

Konstantin Shvachko commented on HDFS-16220:


# I don't think we should make Key depth configurable. We should generalize the 
long[] into a Key class, then we will be able to configure any Key class and 
load it using a Key factory. Different Key classes then can have different 
depths. The problem here is that one should be able to construct the keys while 
loading INodes from the image, so that they could be placed into the right 
partitions.
# Number of partitions should in the end be configurable. It should depend on 
the number of cores on your server. Increasing the number of partitions does 
not necessarily increase the parallelism because at any moment the CPU cannot 
support more threads than the number of cores. So this change is useful, but 
not critical.
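
A hypothetical sketch of that generalization (names are illustrative, not from any patch):

{code:java}
// The point: a pluggable Key type replaces the raw long[] so different key
// depths can coexist, selected via a factory.
interface INodeKey extends Comparable<INodeKey> {
  int getDepth();            // number of ancestor levels the key encodes
}

interface INodeKeyFactory {
  // Must be computable while loading INodes from the FSImage, so each inode
  // can be routed to the right partition.
  INodeKey keyFor(long parentId, long inodeId);
}
{code}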

> [FGL]Configurable INodeMap#NAMESPACE_KEY_DEPTH_RANGES_STATIC
> 
>
> Key: HDFS-16220
> URL: https://issues.apache.org/jira/browse/HDFS-16220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: debug1.jpg, debug2.jpg
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In INodeMap, NAMESPACE_KEY_DEPTH and NUM_RANGES_STATIC are fixed values; we 
> should make them configurable.






[jira] [Updated] (HDFS-14216) NullPointerException happens in NamenodeWebHdfs

2021-09-10 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-14216:
---
Fix Version/s: 2.10.2

Also saw this NPE on branch-2.10. Back-port is clean. Adding Fix version.

> NullPointerException happens in NamenodeWebHdfs
> ---
>
> Key: HDFS-14216
> URL: https://issues.apache.org/jira/browse/HDFS-14216
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
> Fix For: 3.3.0, 3.2.1, 3.1.4, 2.10.2
>
> Attachments: HDFS-14216.branch-3.1.patch, HDFS-14216_1.patch, 
> HDFS-14216_2.patch, HDFS-14216_3.patch, HDFS-14216_4.patch, 
> HDFS-14216_5.patch, HDFS-14216_6.patch, hadoop-hires-namenode-hadoop11.log
>
>
>  workload
> {code:java}
> curl -i -X PUT -T $HOMEPATH/test.txt 
> "http://hadoop1:9870/webhdfs/v1/input?op=CREATE&excludedatanodes=hadoop2"
> {code}
> the method
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(String
>     excludeDatanodes) {
>   HashSet<Node> excludes = new HashSet<Node>();
>   if (excludeDatanodes != null) {
>     for (String host : StringUtils
>         .getTrimmedStringCollection(excludeDatanodes)) {
>       int idx = host.indexOf(":");
>       if (idx != -1) {
>         excludes.add(bm.getDatanodeManager().getDatanodeByXferAddr(
>             host.substring(0, idx), Integer.parseInt(host.substring(idx + 1))));
>       } else {
>         excludes.add(bm.getDatanodeManager().getDatanodeByHost(host)); // line 280
>       }
>     }
>   }
> }
> {code}
> when the datanode (e.g. hadoop2) is just wiped before line 280, or we give a 
> wrong DN name, then bm.getDatanodeManager().getDatanodeByHost(host) will 
> return null and *_excludes_* will *contain null*. When *_excludes_* is used 
> later, an NPE happens:
> {code:java}
> java.lang.NullPointerException
> at org.apache.hadoop.net.NodeBase.getPath(NodeBase.java:113)
> at 
> org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:672)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:533)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:491)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:323)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:384)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.put(NamenodeWebHdfsMethods.java:652)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:600)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:597)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:73)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:30)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
> {code}
>  
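
A hedged sketch of the natural null guard at line 280 (the committed patch may differ; see the attached patches):

{code:java}
DatanodeDescriptor dn = bm.getDatanodeManager().getDatanodeByHost(host);
if (dn != null) {        // skip DNs that were just removed or misnamed
  excludes.add(dn);
}
{code}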






[jira] [Commented] (HDFS-16211) Complete some descriptions related to AuthToken

2021-09-08 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412096#comment-17412096
 ] 

Konstantin Shvachko commented on HDFS-16211:


Hi [~jianghuazhu]. Thanks for contributing.
Generally changing documentation is a good thing.
But with this particular change I do not see how it clarifies anything about 
the {{AuthToken}} class.
Besides, since you commit your changes only into trunk, it increases the 
divergence between supported versions of Hadoop (3.3, 3.2, 2.10) and makes 
backports more complex.
If you are looking for some simpler tasks to get you started with Hadoop, I 
suggest searching for issues labeled "newbie" or "newbie++".

> Complete some descriptions related to AuthToken
> ---
>
> Key: HDFS-16211
> URL: https://issues.apache.org/jira/browse/HDFS-16211
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In AuthToken, some description information is missing.
> The purpose of this jira is to complete some descriptions related to 
> AuthToken.
> /**
>  */
> public class AuthToken implements Principal {
>   ..
> }






[jira] [Resolved] (HDFS-16141) [FGL] Address permission related issues with File / Directory

2021-08-13 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-16141.

Fix Version/s: Fine-Grained Locking
 Hadoop Flags: Reviewed
   Resolution: Fixed

I just committed this to fgl branch. Thank you [~prasad-acit].

> [FGL] Address permission related issues with File / Directory
> -
>
> Key: HDFS-16141
> URL: https://issues.apache.org/jira/browse/HDFS-16141
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
>  Labels: pull-request-available
> Fix For: Fine-Grained Locking
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Post FGL implementation (MKDIR & Create File), there are existing UTs that 
> got impacted and need to be addressed.
> Failed Tests:
> TestDFSPermission
> TestPermission
> TestFileCreation
> TestDFSMkdirs (Added tests)






[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-07-30 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390818#comment-17390818
 ] 

Konstantin Shvachko commented on HDFS-14703:


Some thoughts on [~daryn]'s comment:
* For small clusters/namespaces you don't need to do anything at all, 
performance should be great.
* 1 billion object namespaces can be effectively handled with Observers 
(HDFS-12943), as described in our [Exabyte Club 
blog|https://engineering.linkedin.com/blog/2021/the-exabyte-club--linkedin-s-journey-of-scaling-the-hadoop-distr].
* This namespace partitioning idea should help if you want to grow the 
workloads and cluster size further. And sure, it's a big "if" there.
* There is plenty of benchmark data above. I built the POC exactly with the 
purpose of obtaining some preliminary synthetic numbers. For me, 30% is the 
threshold separating worthy improvements.
* We won't know the real performance numbers until the feature is done. As with 
"Consistent Reads from Standby", our initial synthetic benchmarks showed ~50% 
improvement. The real numbers in production were 3x better in both average 
throughput and latency.
* You bring up good design concerns. But conceptually multiple partitions 
cannot be worse than a single one. When an operation spans all partitions, it's 
like taking a global lock as we do today. So in this case the performance of 
multiple partitions degenerates to the current level, but in all other cases 
multiple namespace operations can go in parallel.
* Let us know if you have concrete suggestions: you don't want it to sound like 
FUD.

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Assigned] (HDFS-14540) Block deletion failure causes an infinite polling in TestDeleteBlockPool

2021-07-30 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reassigned HDFS-14540:
--

Assignee: Anton Kutuzov

> Block deletion failure causes an infinite polling in TestDeleteBlockPool
> 
>
> Key: HDFS-14540
> URL: https://issues.apache.org/jira/browse/HDFS-14540
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: John Doe
>Assignee: Anton Kutuzov
>Priority: Major
>
> In the testDeleteBlockPool function, when the file deletion fails, the while 
> loop hangs.
> {code:java}
>   fs1.delete(new Path("/alpha"), true); //deletion failure
>   
>   // Wait till all blocks are deleted from the dn2 for bpid1.
>   while ((MiniDFSCluster.getFinalizedDir(dn2StorageDir1, 
>   bpid1).list().length != 0) || (MiniDFSCluster.getFinalizedDir(
>   dn2StorageDir2, bpid1).list().length != 0)) {
> try {
>   Thread.sleep(3000); 
> } catch (Exception ignored) {
> }
>   }
> {code}






[jira] [Commented] (HDFS-14540) Block deletion failure causes an infinite polling in TestDeleteBlockPool

2021-07-30 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390757#comment-17390757
 ] 

Konstantin Shvachko commented on HDFS-14540:


I think we should just check the return value of {{fs1.delete()}} and assert it 
is successful. The rest of the test doesn't make sense without this delete 
succeeding.
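
A minimal sketch of that suggestion (hedged; not a committed patch):

{code:java}
assertTrue("Deleting /alpha failed",
    fs1.delete(new Path("/alpha"), true));  // fail fast instead of polling forever
{code}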

> Block deletion failure causes an infinite polling in TestDeleteBlockPool
> 
>
> Key: HDFS-14540
> URL: https://issues.apache.org/jira/browse/HDFS-14540
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: John Doe
>Priority: Major
>
> In the testDeleteBlockPool function, when the file deletion fails, the while 
> loop hangs.
> {code:java}
>   fs1.delete(new Path("/alpha"), true); //deletion failure
>   
>   // Wait till all blocks are deleted from the dn2 for bpid1.
>   while ((MiniDFSCluster.getFinalizedDir(dn2StorageDir1, 
>   bpid1).list().length != 0) || (MiniDFSCluster.getFinalizedDir(
>   dn2StorageDir2, bpid1).list().length != 0)) {
> try {
>   Thread.sleep(3000); 
> } catch (Exception ignored) {
> }
>   }
> {code}






[jira] [Updated] (HDFS-16130) [FGL] Implement Create File with FGL

2021-07-23 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-16130:
---
Fix Version/s: Fine-Grained Locking

> [FGL] Implement Create File with FGL
> 
>
> Key: HDFS-16130
> URL: https://issues.apache.org/jira/browse/HDFS-16130
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: Fine-Grained Locking
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
>  Labels: pull-request-available
> Fix For: Fine-Grained Locking
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Implement FGL for Create File.
> Create API acquires the global lock at multiple stages. Acquire the respective 
> partitioned lock and continue the create operation.






[jira] [Resolved] (HDFS-16130) [FGL] Implement Create File with FGL

2021-07-23 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-16130.

Hadoop Flags: Reviewed
  Resolution: Fixed

I just committed this. Fixed a few checkstyle warnings.
Thank you [~prasad-acit].

> [FGL] Implement Create File with FGL
> 
>
> Key: HDFS-16130
> URL: https://issues.apache.org/jira/browse/HDFS-16130
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: Fine-Grained Locking
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Implement FGL for Create File.
> Create API acquires the global lock at multiple stages. Acquire the respective 
> partitioned lock and continue the create operation.






[jira] [Resolved] (HDFS-16128) [FGL] Add support for saving/loading an FS Image for PartitionedGSet

2021-07-23 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-16128.

Fix Version/s: Fine-Grained Locking
 Hadoop Flags: Reviewed
   Resolution: Fixed

I just committed this. Thank you [~xinglin].

> [FGL] Add support for saving/loading an FS Image for PartitionedGSet
> 
>
> Key: HDFS-16128
> URL: https://issues.apache.org/jira/browse/HDFS-16128
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Major
>  Labels: pull-request-available
> Fix For: Fine-Grained Locking
>
>
> Add support to save Inodes stored in PartitionedGSet when saving an FS image 
> and load Inodes into PartitionedGSet from a saved FS image.
> h1. Saving FSImage
> *Original HDFS design*: iterate every inode in inodeMap and save them into 
> the FSImage file. 
> *FGL*: no change is needed here, since PartitionedGSet also provides an 
> iterator interface, to iterate over inodes stored in partitions. 
> h1. Loading an HDFS image
> *Original HDFS design*: it first loads the FSImage files and then loads edit 
> logs for recent changes. FSImage files contain different sections, including 
> INodeSections and INodeDirectorySections. An InodeSection contains serialized 
> Inodes objects and the INodeDirectorySection contains the parent inode for an 
> Inode. When loading an FSImage, the system first loads INodeSections and then 
> load the INodeDirectorySections, to set the parent inode for each inode. 
> After FSImage files are loaded, edit logs are then loaded. Edit log contains 
> recent changes to the filesystem, including Inodes creation/deletion. For a 
> newly created INode, the parent inode is set before it is added to the 
> inodeMap.
> *FGL*: when adding an Inode into the partitionedGSet, we need the parent 
> inode of an inode, in order to determine which partition to store that inode, 
> when NAMESPACE_KEY_DEPTH = 2. Thus, in FGL, when loading FSImage files, we 
> used a temporary LightweightGSet (inodeMapTemp), to store inodes. When 
> LoadFSImage is done, the parent inode for all existing inodes in FSImage 
> files is set. We can now move the inodes into a partitionedGSet. Load edit 
> logs can work as usual, as the parent inode for an inode is set before it is 
> added to the inodeMap. 
> In theory, PartitionedGSet can support to store inodes without setting its 
> parent inodes. All these inodes will be stored in the 0th partition. However, 
> we decide to use a temporary LightweightGSet (inodeMapTemp) to store these 
> inodes, to make this case more transparent.          
>  
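
A hypothetical sketch of the two-phase load described above (names are illustrative; {{loadInodeSections()}}, {{capacity}} and {{inodeMap}} are stand-ins for the actual image-loading code):

{code:java}
// Flat temporary map while parents are unknown.
GSet<INode, INodeWithAdditionalFields> inodeMapTemp =
    new LightWeightGSet<>(capacity);

// Phase 1: load the INodeSections; parent inodes are not yet set, so the
// partition for an inode cannot be computed yet.
for (INodeWithAdditionalFields inode : loadInodeSections()) {   // stand-in
  inodeMapTemp.put(inode);
}

// Phase 2: the INodeDirectorySections set every parent pointer; only then
// migrate, because the partition key needs the parent.
for (INodeWithAdditionalFields inode : inodeMapTemp) {
  inodeMap.put(inode);   // the PartitionedGSet
}
{code}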






[jira] [Resolved] (HDFS-16125) [FGL] Fix the iterator for PartitionedGSet

2021-07-16 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-16125.

Fix Version/s: Fine-Grained Locking
 Hadoop Flags: Reviewed
   Resolution: Fixed

+1 on the latest patch.
I just committed this to branch fgl, and also re-based fgl to current trunk.
Thank you [~xinglin].

> [FGL] Fix the iterator for PartitionedGSet 
> ---
>
> Key: HDFS-16125
> URL: https://issues.apache.org/jira/browse/HDFS-16125
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: Fine-Grained Locking
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Iterator in PartitionedGSet would visit the first partition twice, since we 
> did not set the keyIterator to move to the first key during initialization.  
>  
> This is related to fgl: https://issues.apache.org/jira/browse/HDFS-14703
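
A self-contained illustration of this bug class (generic Java, not the PartitionedGSet source): if the constructor does not consume the first partition key, the element loop re-fetches partition 0 after exhausting it:

{code:java}
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

class PartitionIter {
  private final TreeMap<Integer, List<String>> partitions;
  private final Iterator<Integer> keyIterator;
  private Iterator<String> elemIterator;

  PartitionIter(TreeMap<Integer, List<String>> p) {
    partitions = p;
    keyIterator = p.keySet().iterator();
    elemIterator = p.firstEntry().getValue().iterator();
    keyIterator.next();   // the fix: position past the first partition's key
  }

  String next() {
    // Without the next() above, the first advance here would re-open the
    // first partition, visiting it twice.
    while (!elemIterator.hasNext() && keyIterator.hasNext()) {
      elemIterator = partitions.get(keyIterator.next()).iterator();
    }
    return elemIterator.hasNext() ? elemIterator.next() : null;
  }
}
{code}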






[jira] [Updated] (HDFS-16125) [FGL] Fix the iterator for PartitionedGSet

2021-07-14 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-16125:
---
Summary: [FGL] Fix the iterator for PartitionedGSet   (was: fix the 
iterator for PartitionedGSet )

> [FGL] Fix the iterator for PartitionedGSet 
> ---
>
> Key: HDFS-16125
> URL: https://issues.apache.org/jira/browse/HDFS-16125
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Iterator in PartitionedGSet would visit the first partition twice, since we 
> did not set the keyIterator to move to the first key during initialization.  
>  
> This is related to fgl: https://issues.apache.org/jira/browse/HDFS-14703






[jira] [Updated] (HDFS-16125) fix the iterator for PartitionedGSet

2021-07-14 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-16125:
---
Parent: HDFS-14703
Issue Type: Sub-task  (was: Bug)

> fix the iterator for PartitionedGSet 
> -
>
> Key: HDFS-16125
> URL: https://issues.apache.org/jira/browse/HDFS-16125
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Iterator in PartitionedGSet would visit the first partition twice, since we 
> did not set the keyIterator to move to the first key during initialization.  
>  
> This is related to fgl: https://issues.apache.org/jira/browse/HDFS-14703






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-07-14 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345870#comment-17345870
 ] 

Konstantin Shvachko edited comment on HDFS-14703 at 7/14/21, 5:28 PM:
--

I did some performance benchmarks using a physical server (a d430 server in 
[Utah Emulab testbed|http://www.emulab.net]). I used either RAMDISK or SSD as 
the storage for HDFS. By using RAMDISK, we can remove the time used by the SSD 
to make each write persistent. For the RAM case, we observed an improvement of 
45% from fine-grained locking. For the SSD case, fine-grained locking gives us 
20% improvement. We used an Intel SSD (model: SSDSC2BX200G4R).

We noticed that for trunk, the mkdir OPS is lower for RAMDISK than for SSD. We 
don't know the reason for this yet. We repeated the experiment for RAMDISK for 
trunk twice to confirm the performance number.
h2. tmpfs, hadoop-tmp-dir = /run/hadoop-utos
h3. 45% improvements fgl vs. trunk
trunk 
{noformat:nowrap}
2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: # operations: 10000000
2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
663510
2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
15071.362
2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Average Time: 13
2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —
2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: # operations: 10000000
2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
710248
2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
14079.5
2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Average Time: 14
2021-05-16 22:15:13,515 INFO namenode.FSEditLog: Ending log segment 8345565, 
10019540
{noformat}

fgl
{noformat:nowrap}
2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —
2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: # operations: 10000000
2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
445980
2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
22422.530
2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: Average Time: 8
{noformat}

h2. SSD, hadoop.tmp.dir=/dev/sda4
h3. 23% improvement fgl vs. trunk

trunk:
{noformat:nowrap}
2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —
2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: # operations: 10000000
2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
593839
2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
16839.581
2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: Average Time: 11
{noformat}

fgl
{noformat:nowrap}
2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —
2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: # operations: 10000000
2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
481269
2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
20778.400
2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: Average Time: 9
{noformat}
 
{noformat:nowrap}
/dev/sda:
ATA device, with non-removable media
Model Number:   INTEL SSDSC2BX200G4R
Serial Number:  BTHC523202RD200TGN
Firmware Revision:  G201DL2D
{noformat}


was (Author: xinglin):
I did some performance benchmarks using a physical server (a d430 server in 
[Utah Emulab testbed|http://www.emulab.net]). I used either RAMDISK or SSD as 
the storage for HDFS. By using RAMDISK, we can remove the time used by the SSD 
to make each write persistent. For the RAM case, we observed an improvement of 
45% from fine-grained locking. For the SSD case, fine-grained locking gives us 
20% improvement. We used an Intel SSD (model: SSDSC2BX200G4R).

We noticed that for trunk, the mkdir OPS is lower for RAMDISK than for SSD. We 
don't know the reason for this yet. We repeated the experiment for RAMDISK for 
trunk twice to confirm the performance number.
h1. tmpfs, hadoop-tmp-dir = /run/hadoop-utos
h1. 45% improvements fgl vs. trunk
h2. trunk 

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: # operations: 10000000

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
663510

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
15071.362

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Average Time: 13

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: # operations: 10000000

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
710248

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
14079.5

2021-05-16 22:15:13,515 INFO 

[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-07-14 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380757#comment-17380757
 ] 

Konstantin Shvachko commented on HDFS-14703:


??Shall I raise separate Jira for Create and trace the PR???
Yes, please let's track {{create}} in a new jira. You can make it a subtask of 
this jira and follow [the standard 
process|https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute].

??Just provided work-around to continue, we shall work on it and eventually 
optimize it better.??
It is fine as a workaround, but yes we should, and it would be good to design 
it early, as it may affect the structure of the entire implementation. A short 
design doc on the subject would be nice to have if you have any ideas.

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-07-09 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378314#comment-17378314
 ] 

Konstantin Shvachko commented on HDFS-14703:


Great progress [~prasad-acit]. It proves the concept works for creates as well.
I liked that your changes are all confined to internal classes like 
FSDirectory. Noticed that you implemented {{getInode(id)}} by iterating through 
all inodes. This is probably the key part of this effort. We should eventually 
replace {{getInode(id)}} with {{getInode(key)}} to make the inode lookup 
efficient.
But hey, you still got a 25% boost.

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Updated] (HDFS-16040) RpcQueueTime metric counts requeued calls as unique events.

2021-05-27 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-16040:
---
Fix Version/s: 3.3.2
   3.2.3
   2.10.2
   3.1.5
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I just committed this to all active branches. Thank you, [~simbadzina].

> RpcQueueTime metric counts requeued calls as unique events.
> ---
>
> Key: HDFS-16040
> URL: https://issues.apache.org/jira/browse/HDFS-16040
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0, 3.3.0
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
> Fix For: 3.1.5, 2.10.2, 3.2.3, 3.3.2
>
> Attachments: HDFS-16040.001.patch, HDFS-16040.002.patch, 
> HDFS-16040.003.patch
>
>
> The RpcQueueTime metric is updated every time a call is re-queued while 
> waiting for the server state to reach the call's client's state ID. This is 
> in contrast to RpcProcessingTime, which is only updated when the call is 
> finally processed.
> On the Observer NameNode this can result in RpcQueueTimeNumOps being much 
> larger than RpcProcessingTimeNumOps. The re-queueing is an internal 
> optimization to avoid blocking and shouldn't result in an inflated metric.
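
A hypothetical sketch of the intended semantics (illustrative names, not the actual {{ipc.Server}} code): the call is stamped once on first enqueue, and the queue-time metric is sampled exactly once, when the call is finally processed, no matter how many times it was re-queued:

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class RequeueMetricSketch {
  static class Call {
    final long firstEnqueueNanos = System.nanoTime();  // stamped once
  }

  private final BlockingQueue<Call> callQueue = new LinkedBlockingQueue<>();

  void requeue(Call call) throws InterruptedException {
    callQueue.put(call);           // internal optimization: no metric update here
  }

  void process(Call call) {
    long queuedMillis = (System.nanoTime() - call.firstEnqueueNanos) / 1_000_000;
    addRpcQueueTime(queuedMillis); // one RpcQueueTime sample per call
  }

  void addRpcQueueTime(long millis) { /* hypothetical metrics sink */ }
}
{code}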






[jira] [Commented] (HDFS-16040) RpcQueueTime metric counts requeued calls as unique events.

2021-05-27 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352920#comment-17352920
 ] 

Konstantin Shvachko commented on HDFS-16040:


Wow - clean build.
Adding my +1 to that. Will commit shortly.

> RpcQueueTime metric counts requeued calls as unique events.
> ---
>
> Key: HDFS-16040
> URL: https://issues.apache.org/jira/browse/HDFS-16040
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0, 3.3.0
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
> Attachments: HDFS-16040.001.patch, HDFS-16040.002.patch, 
> HDFS-16040.003.patch
>
>
> The RpcQueueTime metric is updated every time a call is re-queued while 
> waiting for the server state to reach the call's client's state ID. This is 
> in contrast to RpcProcessingTime, which is only updated when the call is 
> finally processed.
> On the Observer NameNode this can result in RpcQueueTimeNumOps being much 
> larger than RpcProcessingTimeNumOps. The re-queueing is an internal 
> optimization to avoid blocking and shouldn't result in an inflated metric.






[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-05-27 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352653#comment-17352653
 ] 

Konstantin Shvachko commented on HDFS-15915:


[~daryn] would appreciate your review.

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.4.0, 3.1.5, 2.10.2, 3.2.3, 3.3.2
>
> Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, 
> HDFS-15915-03.patch, HDFS-15915-04.patch, HDFS-15915-05.patch, 
> testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> syncing. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It also can potentially reshuffle 
> transactions, and the Standby will apply them in a wrong order.
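
A simplified, hypothetical sketch of the hazard (not the actual {{FSEditLogAsync}} source; {{createOp()}} and {{enqueue()}} are illustrative):

{code:java}
namesystem.writeLock();
try {
  FSEditLogOp op = createOp();   // fields populated under the lock, txid still unset
  editLogAsync.enqueue(op);      // scheduled for syncing
} finally {
  namesystem.writeUnlock();
}
// Later, on the sync path and possibly outside the write lock:
// beginTransaction() assigns op.txid and increments the global count,
// so txid order can disagree with the order the edits were created in.
{code}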






[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-05-26 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15915:
---
Fix Version/s: 3.3.2
   3.2.3
   2.10.2
   3.1.5
   3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I just committed this to trunk and all branches down to branch-2.10.

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.4.0, 3.1.5, 2.10.2, 3.2.3, 3.3.2
>
> Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, 
> HDFS-15915-03.patch, HDFS-15915-04.patch, HDFS-15915-05.patch, 
> testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> syncing. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It also can potentially reshuffle 
> transactions, and the Standby will apply them in a wrong order.






[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-05-26 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352023#comment-17352023
 ] 

Konstantin Shvachko commented on HDFS-15915:


Ran unit tests that failed on Jenkins. All passing locally. Will be committing 
this shortly.

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, 
> HDFS-15915-03.patch, HDFS-15915-04.patch, HDFS-15915-05.patch, 
> testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> syncing. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It also can potentially reshuffle 
> transactions, and the Standby will apply them in a wrong order.






[jira] [Commented] (HDFS-16040) RpcQueueTime metric counts requeued calls as unique events.

2021-05-26 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351510#comment-17351510
 ] 

Konstantin Shvachko commented on HDFS-16040:


I checked that the test fails without the fix and passes with it.
I was wondering if the code correctly counts the queue time for Observer, that 
is, takes into account the time the call was requeued. It seems to me that it 
does. [~simbadzina], could you please double-check?
I guess there will be some checkstyle warnings when Jenkins finishes.
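
For illustration, a minimal sketch (hypothetical names, not the actual patch) of the accounting I would expect: the enqueue timestamp survives re-queueing, so the metric is updated exactly once, when the call is finally processed.
{code:java}
// Sketch with hypothetical names: a requeued call keeps its original
// enqueue timestamp, and RpcQueueTime is recorded once per call.
void dispatch(Call call) {
  if (serverStateId() < call.getClientStateId()) {
    requeue(call);  // internal retry: no metric update here
  } else {
    long waitedMs = now() - call.getEnqueueTimeMs();   // includes requeued time
    rpcMetrics.addRpcQueueTime(waitedMs);              // single update per call
    process(call);
  }
}
{code}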

> RpcQueueTime metric counts requeued calls as unique events.
> ---
>
> Key: HDFS-16040
> URL: https://issues.apache.org/jira/browse/HDFS-16040
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0, 3.3.0
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
> Attachments: HDFS-16040.001.patch, HDFS-16040.002.patch
>
>
> The RpcQueueTime metric is updated every time a call is re-queued while 
> waiting for the server state to reach the call's client state ID. This is 
> in contrast to RpcProcessingTime, which is only updated when the call is 
> finally processed.
> On the Observer NameNode this can result in RpcQueueTimeNumOps being much 
> larger than RpcProcessingTimeNumOps. The re-queueing is an internal 
> optimization to avoid blocking and shouldn't result in an inflated metric.






[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-05-25 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351284#comment-17351284
 ] 

Konstantin Shvachko commented on HDFS-15915:


Thanks for the thorough review, [~virajith].
BTW, this {{logEdit()}} method is only used in BackupNode, so it doesn't matter 
much. But I swapped the two lines in the v05 patch.

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, 
> HDFS-15915-03.patch, HDFS-15915-04.patch, HDFS-15915-05.patch, 
> testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-05-25 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15915:
---
Attachment: HDFS-15915-05.patch

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, 
> HDFS-15915-03.patch, HDFS-15915-04.patch, HDFS-15915-05.patch, 
> testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-05-19 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347336#comment-17347336
 ] 

Konstantin Shvachko commented on HDFS-15915:


Updated the patch per [~virajith]'s suggestions. Thanks.
# The default implementation of {{EditLogOutputStream.getLastJournalledTxId()}} 
returns {{INVALID_TXID}} rather than {{0}}.
# Changed {{beginTransaction()}} type to void.

??This change forces the txid to be assigned when the operation takes place 
under the FSN lock.??

Exactly right. The advantage of this in the non-Observer case is verifiability 
and proper enforcement.
When you merely rely on placing operations into the queue in the right order, 
you cannot verify it, for example by writing unit tests or setting asserts. And 
it is hard to detect a bug if there is one in this very multi-threaded code.
With the patch the txId is generated when the operation is queued, so I could 
add asserts to ensure operations are queued and synced in the order they were 
applied on the active NN.
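
A minimal sketch of the kind of assert this enables (names are illustrative, not the exact patch code):
{code:java}
// Since the txId is assigned while the op is queued under the FSN write
// lock, the queueing path can verify that ops arrive in the same order
// they were applied on the active NN.
private long lastQueuedTxId = HdfsServerConstants.INVALID_TXID;

void enqueueEdit(FSEditLogOp op) {
  assert op.getTransactionId() > lastQueuedTxId :
      "edits must be queued in txId order";
  lastQueuedTxId = op.getTransactionId();
  editPendingQ.add(op);
}
{code}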

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, 
> HDFS-15915-03.patch, HDFS-15915-04.patch, testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-05-19 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15915:
---
Attachment: HDFS-15915-04.patch

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, 
> HDFS-15915-03.patch, HDFS-15915-04.patch, testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-17 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346379#comment-17346379
 ] 

Konstantin Shvachko commented on HDFS-14703:


Thanks [~prasad-acit] and [~xinglin] for benchmarking. Very glad you guys could 
independently confirm the 30-45% improvement.
I think the PartitionedGSet implementation should benefit from both *_more 
cores_* and a *_faster storage device_* for edits. For the storage device, NVMe 
SSDs perform best for journaling-type workloads in our experience.
Also please take into account that this is only a POC patch. Theoretically, we 
should be able to scale performance proportionally to the number of cores and 
partitions in the GSet, given we are not I/O bound.
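
To illustrate the idea (a toy sketch, not the POC code): each partition of the key space gets its own lock, so writers touching different partitions can proceed in parallel instead of serializing on a single global namesystem lock.
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy partitioned map with one lock per partition.
class PartitionedMap<K, V> {
  private final Map<K, V>[] partitions;
  private final ReentrantReadWriteLock[] locks;

  @SuppressWarnings("unchecked")
  PartitionedMap(int numPartitions) {
    partitions = new Map[numPartitions];
    locks = new ReentrantReadWriteLock[numPartitions];
    for (int i = 0; i < numPartitions; i++) {
      partitions[i] = new HashMap<>();
      locks[i] = new ReentrantReadWriteLock();
    }
  }

  // Stable mapping of a key to its partition.
  private int indexOf(K key) {
    return (key.hashCode() & Integer.MAX_VALUE) % partitions.length;
  }

  // Only the partition being written is locked; others remain available.
  V put(K key, V value) {
    int i = indexOf(key);
    locks[i].writeLock().lock();
    try {
      return partitions[i].put(key, value);
    } finally {
      locks[i].writeLock().unlock();
    }
  }
}
{code}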

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Commented] (HDFS-16004) BackupNode and QJournal lack Permission check.

2021-05-10 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342108#comment-17342108
 ] 

Konstantin Shvachko commented on HDFS-16004:


Hey guys. I wouldn't worry about {{BackupNode}}. It was supposed to be removed 
as redundant in HDFS-4114.
Same with {{JournalProtocol}}, as it is used exclusively by {{BackupNode}}.
This is old code that is not supposed to be used. There were some 
controversial issues about removing {{BackupNode}}, but I don't think they 
still stand.
{{QJournalProtocol}} is the one to be used with QJM.
If that is fine, then we can close this issue as won't fix or not a problem.

> BackupNode and QJournal lack Permission check.
> --
>
> Key: HDFS-16004
> URL: https://issues.apache.org/jira/browse/HDFS-16004
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I have some doubts when I configure secure HDFS. I know we have Service 
> Level Authorization for protocols like NamenodeProtocol, DatanodeProtocol, and 
> so on.
> But I do not find such Authorization for JournalProtocol after reading the 
> code in HDFSPolicyProvider. And if we have it, how can I configure such 
> Authorization?
>  
> Besides, even though NamenodeProtocol has Service Level Authorization, its 
> methods still have a Permission check. Take startCheckpoint in 
> NameNodeRpcServer, which implements NamenodeProtocol, for example:
>  
> {code}
> public NamenodeCommand startCheckpoint(NamenodeRegistration registration)
>     throws IOException {
>   String operationName = "startCheckpoint";
>   checkNNStartup();
>   namesystem.checkSuperuserPrivilege(operationName);  // the permission check
>   ...
> {code}
>  
> I found that the methods in BackupNodeRpcServer, which implements 
> JournalProtocol, lack such a Permission check. See below:
>  
> {code}
> public void startLogSegment(JournalInfo journalInfo, long epoch,
>     long txid) throws IOException {
>   namesystem.checkOperation(OperationCategory.JOURNAL);
>   verifyJournalRequest(journalInfo);
>   getBNImage().namenodeStartedLogSegment(txid);
> }
>
> @Override
> public void journal(JournalInfo journalInfo, long epoch, long firstTxId,
>     int numTxns, byte[] records) throws IOException {
>   namesystem.checkOperation(OperationCategory.JOURNAL);
>   verifyJournalRequest(journalInfo);
>   getBNImage().journal(firstTxId, numTxns, records);
> }
> {code}
>  
> Do we need to add a Permission check for them?
>  
> Please point out my mistakes if I am wrong or missed something.
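
For illustration, a sketch of what such a check could look like, mirroring the NameNodeRpcServer pattern quoted above (whether it is needed is exactly the open question here):
{code:java}
public void startLogSegment(JournalInfo journalInfo, long epoch,
    long txid) throws IOException {
  // hypothetical addition, following the startCheckpoint pattern
  namesystem.checkSuperuserPrivilege("startLogSegment");
  namesystem.checkOperation(OperationCategory.JOURNAL);
  verifyJournalRequest(journalInfo);
  getBNImage().namenodeStartedLogSegment(txid);
}
{code}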






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-10 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341089#comment-17341089
 ] 

Konstantin Shvachko edited comment on HDFS-14703 at 5/10/21, 7:30 PM:
--

Updated the POC patches to current trunk. There were indeed some missing parts 
in the first patch.
 See [^003-partitioned-inodeMap-POC.tar.gz].

Also created a remote branch called {{fgl}} in the hadoop repo with both patches 
applied to current trunk. [~xinglin] is working on adding the {{create()}} call 
to FGL. Right now only {{mkdirs()}} is supported.


was (Author: shv):
Updated the POC patches to current trunk. There were indeed some missing parts 
in the first patch.
 See [^003-partitioned-inodeMap-POC.tar.gz].

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-07 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341089#comment-17341089
 ] 

Konstantin Shvachko edited comment on HDFS-14703 at 5/8/21, 1:05 AM:
-

Updated the POC patches to current trunk. There were indeed some missing parts 
in the first patch.
 See [^003-partitioned-inodeMap-POC.tar.gz].


was (Author: shv):
Updated the POC patches. There were indeed some missing parts in the first 
patch.
See 
[003-partitioned-inodeMap-POC.tar.gz|https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz].

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-07 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341089#comment-17341089
 ] 

Konstantin Shvachko edited comment on HDFS-14703 at 5/8/21, 1:04 AM:
-

Updated the POC patches. There were indeed some missing parts in the first 
patch.
See 
[003-partitioned-inodeMap-POC.tar.gz|https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz].


was (Author: shv):
Updated the POC patches. There were indeed some missing parts in the first 
patch. See 
[https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz|https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz].

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-07 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341089#comment-17341089
 ] 

Konstantin Shvachko commented on HDFS-14703:


Updated the POC patches. There were indeed some missing parts in the first 
patch. See 
[https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz|https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz].

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Updated] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-07 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-14703:
---
Attachment: 003-partitioned-inodeMap-POC.tar.gz

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Commented] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes

2021-05-07 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341046#comment-17341046
 ] 

Konstantin Shvachko commented on HDFS-16001:


Checked the test. This fixes it.
+1 thanks [~aajisaka]

> TestOfflineEditsViewer.testStored() fails reading negative value of 
> FSEditLogOpCodes
> 
>
> Key: HDFS-16001
> URL: https://issues.apache.org/jira/browse/HDFS-16001
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception
> {noformat}
> java.io.IOException: Op -54 has size -1314247195, but the minimum op size is 
> 17
> {noformat}
> Seems like there is a corrupt record in {{editsStored}} file.






[jira] [Commented] (HDFS-15912) Allow ProtobufRpcEngine to be extensible

2021-04-30 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337546#comment-17337546
 ] 

Konstantin Shvachko commented on HDFS-15912:


Since all changes are in {{hadoop-common}}, this should be a HADOOP-* jira 
rather than an HDFS-* one.
Could you please move it to the right jira project to adjust the visibility for 
the right audience.

About the change itself:
# There are some [checkstyle 
warnings|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2905/1/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt],
 which are actually right. It is perfectly fine for methods to be protected, 
but not for the members. A better way is to keep them private and provide 
get/setters for those that are really needed (see the sketch after this list).
# I see some whitespace changes, like a blank line with spaces.
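
A small illustration of the suggestion (the field here is hypothetical, not the actual ProtobufRpcEngine code):
{code:java}
// Instead of widening a field to protected for subclasses:
//   protected RequestHeaderProto header;
// keep it private and expose only the accessors subclasses really need.
private RequestHeaderProto header;

protected RequestHeaderProto getHeader() {
  return header;
}

protected void setHeader(RequestHeaderProto header) {
  this.header = header;
}
{code}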

> Allow ProtobufRpcEngine to be extensible
> 
>
> Key: HDFS-15912
> URL: https://issues.apache.org/jira/browse/HDFS-15912
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The ProtobufRpcEngine class doesn't allow for new RpcEngine implementations 
> to extend some of its inner classes (e.g. Invoker and 
> Server.ProtoBufRpcInvoker). Also, some of its methods are long enough such 
> that overriding them would result in a lot of code duplication (e.g. 
> Invoker#invoke and Server.ProtoBufRpcInvoker#call).
> When implementing a new RpcEngine, it would be helpful to reuse most of the 
> code already in ProtobufRpcEngine. This would allow new fields to be added to 
> the RPC header or message with minimal code changes.






[jira] [Updated] (HDFS-15652) Make block size from NNThroughputBenchmark configurable

2021-04-29 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15652:
---
Fix Version/s: 3.2.3
   2.10.2
   3.1.5
   3.3.1

Just back-ported this to branches 3.3, 3.2, 3.1, and 2.10. Updated Fix Versions.
Thanks [~ferhui] for contributing.

> Make block size from NNThroughputBenchmark configurable 
> 
>
> Key: HDFS-15652
> URL: https://issues.apache.org/jira/browse/HDFS-15652
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: benchmarks
>Affects Versions: 3.3.0
>Reporter: Hui Fei
>Assignee: Hui Fei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When testing NNThroughputBenchmark, I get the following error logs.
> {quote}
> 2020-10-26 20:51:25,781 ERROR namenode.NNThroughputBenchmark: StatsDaemon 43 
> failed: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Specified block 
> size is less than configured minimum value 
> (dfs.namenode.fs-limits.min-block-size): 16 < 1048576
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2514)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2452)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.createOriginal(NameNodeRpcServer.java:824)
> at 
> org.apache.hadoop.hdfs.server.namenode.ProtectionManager.create(ProtectionManager.java:344)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:792)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:326)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2985)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1562)
> at org.apache.hadoop.ipc.Client.call(Client.java:1508)
> at org.apache.hadoop.ipc.Client.call(Client.java:1405)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> at com.sun.proxy.$Proxy9.create(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:281)
> at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy10.create(Unknown Source)
> at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$CreateFileStats.executeOp(NNThroughputBenchmark.java:597)
> at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.benchmarkOne(NNThroughputBenchmark.java:428)
> at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:412)
> {quote}
> Because the NN has started and is serving, we should make the block size of 
> the client benchmark configurable; that will be convenient.
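
As a sketch of the convenience this enables (assuming the benchmark takes the block size from the client {{Configuration}}), one can raise {{dfs.blocksize}} above the NameNode's {{dfs.namenode.fs-limits.min-block-size}}:
{code:java}
// Hypothetical driver: set a block size that satisfies the NN's minimum
// before launching the create benchmark against a running NameNode.
Configuration conf = new HdfsConfiguration();
conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 1048576L);  // 1 MB >= minimum
NNThroughputBenchmark.runBenchmark(conf,
    new String[] {"-op", "create", "-threads", "50", "-files", "10000"});
{code}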




[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-04-29 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15915:
---
Attachment: HDFS-15915-03.patch

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, 
> HDFS-15915-03.patch, testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Commented] (HDFS-15652) Make block size from NNThroughputBenchmark configurable

2021-04-28 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335066#comment-17335066
 ] 

Konstantin Shvachko commented on HDFS-15652:


I would like to backport this to earlier versions up to 2.10, if there are no 
objections.

> Make block size from NNThroughputBenchmark configurable 
> 
>
> Key: HDFS-15652
> URL: https://issues.apache.org/jira/browse/HDFS-15652
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: benchmarks
>Affects Versions: 3.3.0
>Reporter: Hui Fei
>Assignee: Hui Fei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When testing NNThroughputBenchmark, I get the following error logs.
> {quote}
> 2020-10-26 20:51:25,781 ERROR namenode.NNThroughputBenchmark: StatsDaemon 43 
> failed: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Specified block 
> size is less than configured minimum value 
> (dfs.namenode.fs-limits.min-block-size): 16 < 1048576
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2514)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2452)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.createOriginal(NameNodeRpcServer.java:824)
> at 
> org.apache.hadoop.hdfs.server.namenode.ProtectionManager.create(ProtectionManager.java:344)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:792)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:326)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2985)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1562)
> at org.apache.hadoop.ipc.Client.call(Client.java:1508)
> at org.apache.hadoop.ipc.Client.call(Client.java:1405)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> at com.sun.proxy.$Proxy9.create(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:281)
> at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy10.create(Unknown Source)
> at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$CreateFileStats.executeOp(NNThroughputBenchmark.java:597)
> at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.benchmarkOne(NNThroughputBenchmark.java:428)
> at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:412)
> {quote}
> Because the NN has started and is serving, we should make the block size of 
> the client benchmark configurable; that will be convenient.




[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-04-28 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335052#comment-17335052
 ] 

Konstantin Shvachko commented on HDFS-15915:


Patch v.02 fixes the findbugs and whitespace warnings.
Checked the test failures:
* {{TestOfflineEditsViewer}} fails on trunk the same way as with the patch. 
Filed HDFS-16001 for it.
* {{TestDirectoryScanner}} intermittently fails because of HDFS-11045.
* All other tests passed locally.

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, 
> testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-04-28 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15915:
---
Attachment: HDFS-15915-02.patch

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15915-01.patch, HDFS-15915-02.patch, 
> testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Commented] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes

2021-04-28 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335051#comment-17335051
 ] 

Konstantin Shvachko commented on HDFS-16001:


This fails consistently on trunk but not in 2.10. I did not check other 
versions.
Full exception here:
{noformat:nowrap}
Op -54 has size -1314247195, but the minimum op size is 17
Encountered exception. Exiting: Op -54 has size -1314247195, but the minimum op 
size is 17
java.io.IOException: Op -54 has size -1314247195, but the minimum op size is 17
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOpFrame(FSEditLogOp.java:5244)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOp(FSEditLogOp.java:5186)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$Reader.readOp(FSEditLogOp.java:5059)
at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOpImpl(EditLogFileInputStream.java:229)
at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOp(EditLogFileInputStream.java:276)
at 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at 
org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:67)
at 
org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:158)
at 
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer.runOev(TestOfflineEditsViewer.java:208)
at 
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer.testStored(TestOfflineEditsViewer.java:176)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{noformat}

> TestOfflineEditsViewer.testStored() fails reading negative value of 
> FSEditLogOpCodes
> 
>
> Key: HDFS-16001
> URL: https://issues.apache.org/jira/browse/HDFS-16001
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Priority: Major
>
> {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception
> 

[jira] [Created] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes

2021-04-28 Thread Konstantin Shvachko (Jira)
Konstantin Shvachko created HDFS-16001:
--

 Summary: TestOfflineEditsViewer.testStored() fails reading 
negative value of FSEditLogOpCodes
 Key: HDFS-16001
 URL: https://issues.apache.org/jira/browse/HDFS-16001
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Konstantin Shvachko









[jira] [Updated] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes

2021-04-28 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-16001:
---
  Docs Text:   (was: {{TestOfflineEditsViewer.testStored()}} fails 
consistently with an exception
{noformat}
java.io.IOException: Op -54 has size -1314247195, but the minimum op size is 17
{noformat}
Seems like there is a corrupt record in {{editsStored}} file.)
Description: 
{{TestOfflineEditsViewer.testStored()}} fails consistently with an exception
{noformat}
java.io.IOException: Op -54 has size -1314247195, but the minimum op size is 17
{noformat}
Seems like there is a corrupt record in {{editsStored}} file.

> TestOfflineEditsViewer.testStored() fails reading negative value of 
> FSEditLogOpCodes
> 
>
> Key: HDFS-16001
> URL: https://issues.apache.org/jira/browse/HDFS-16001
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Priority: Major
>
> {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception
> {noformat}
> java.io.IOException: Op -54 has size -1314247195, but the minimum op size is 
> 17
> {noformat}
> Seems like there is a corrupt record in {{editsStored}} file.






[jira] [Commented] (HDFS-7612) TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir

2021-04-28 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335043#comment-17335043
 ] 

Konstantin Shvachko commented on HDFS-7612:
---

Came across this again. Still not fixed.
We just need to replace the default value of {{System.getProperty()}} with 
{{"target/test-classes"}}, as sketched below.

> TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir
> -
>
> Key: HDFS-7612
> URL: https://issues.apache.org/jira/browse/HDFS-7612
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Priority: Major
>  Labels: newbie
>
> {code}
> final String cacheDir = System.getProperty("test.cache.data",
> "build/test/cache");
> {code}
> results in
> {{FileNotFoundException: build/test/cache/editsStoredParsed.xml (No such file 
> or directory)}}
> when {{test.cache.data}} is not set.
> I can see this failing while running in Eclipse.






[jira] [Updated] (HDFS-7612) TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir

2021-04-28 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-7612:
--
Labels: newbie  (was: )

> TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir
> -
>
> Key: HDFS-7612
> URL: https://issues.apache.org/jira/browse/HDFS-7612
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Priority: Major
>  Labels: newbie
>
> {code}
> final String cacheDir = System.getProperty("test.cache.data",
> "build/test/cache");
> {code}
> results in
> {{FileNotFoundException: build/test/cache/editsStoredParsed.xml (No such file 
> or directory)}}
> when {{test.cache.data}} is not set.
> I can see this failing while running in Eclipse.






[jira] [Assigned] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-04-28 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reassigned HDFS-15915:
--

Assignee: Konstantin Shvachko

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15915-01.patch, testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-04-28 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15915:
---
Status: Patch Available  (was: Open)

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15915-01.patch, testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-04-28 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15915:
---
Attachment: HDFS-15915-01.patch

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15915-01.patch, testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-04-28 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334483#comment-17334483
 ] 

Konstantin Shvachko commented on HDFS-15915:


Attaching a patch to fix the problem. There are a lot of moving parts in 
asynchronous journal logging, and it took me a while to get it working, although 
the actual fix doesn't look complex.
# The main idea is that a new txId is assigned to the journal transaction when 
it is logged by {{logEdit(op)}}, while the call is still under {{fsn.writeLock}}, 
rather than later in {{logSync()}} as it is now (see the sketch after this list).
I think this is the right way to _*guarantee that all transactions are 
journalled in the same order as they were applied on the Active NameNode*_.
# Currently we do not have checks or tests against a mismatch of the transaction 
order. This would have been a problem for regular HA, with or without Observer. 
I could not build a test showing that the order of transactions can be tampered 
with, but I couldn't convince myself it is impossible either.
The patch adds asserts to guarantee the journal txId order is the same as the 
order in which transactions were applied on the ANN.
# I had to rework {{TestEditLogRace.testDeadlock()}}. Changed it to mock 
{{doEditTransaction()}} instead of {{setTransactionId()}} for the "blocker 
thread". Also, with FSEditLogAsync we cannot really reuse the same operation 
instance for different transactions any more, as they now have the txid set 
before syncing. This is [~daryn]'s creation; would appreciate it if you could 
take a look.
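
A simplified sketch of the core idea from item 1 (illustrative, not the literal patch):
{code:java}
// The txId is fixed in logEdit(), while the caller still holds the FSN
// write lock, instead of later in logSync() where the ordering is no
// longer tied to the order in which edits were applied.
void logEdit(final FSEditLogOp op) {
  assert fsNamesystem.hasWriteLock();
  synchronized (this) {
    beginTransaction(op);  // sets op.txid and increments the global count
    enqueueEdit(op);       // hence queued in txId order by construction
  }
}
{code}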

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-03-23 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15915:
---
Target Version/s: 2.10.2

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
> edits op, remains unset until the operation is scheduled for 
> synching. At that time {{beginTransaction()}} will set the 
> {{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
> NameNode this event can fall outside the write lock. 
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.






[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-03-23 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307463#comment-17307463
 ] 

Konstantin Shvachko commented on HDFS-15915:


Attached the test reproducing the bug.
Looks like [~zero45] warned about this problem in [his 
comment|https://issues.apache.org/jira/browse/HDFS-13399?focusedCommentId=16454623=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16454623].
I don't remember, though, what the resolution was back then.

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of 
> the edits op, remains unset until the operation is scheduled for syncing. At 
> that time {{beginTransaction()}} will set the {{FSEditLogOp.txid}} and 
> increment the global transaction count. On a busy NameNode this event can 
> fall outside the write lock.
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-03-23 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15915:
---
Attachment: testMkdirsRace.patch

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: testMkdirsRace.patch
>
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of 
> the edits op, remains unset until the operation is scheduled for syncing. At 
> that time {{beginTransaction()}} will set the {{FSEditLogOp.txid}} and 
> increment the global transaction count. On a busy NameNode this event can 
> fall outside the write lock.
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-03-23 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307458#comment-17307458
 ] 

Konstantin Shvachko commented on HDFS-15915:


# Suppose two {{mkdirs}} for the same path are running on the Active NameNode 
at the same time. Assume that the path does not exist yet and that the two RPCs 
are coming from two clients c1 and c2.
# Then one of them, e.g. c1, will create the directory in memory and generate 
the respective transaction {{MkdirOp}}, which has all the fields except for 
{{txid}}. Then it will enqueue the transaction in 
{{FSEditLogAsync.logEdit(op)}} for further asynchronous processing. The handler 
thread processing this rpc from c1 is now free to release the write lock and 
give control to other threads.
# {{FSEditLogAsync.run()}} will asynchronously process the transaction when it 
dequeues it. At that time it will assign the {{txid}} for the transaction, see 
{{logEdit() -> doEditTransaction() -> beginTransaction()}}, and increment the 
global transaction count {{FSEditLog.txid}}. This can happen either inside or 
outside of the namesystem lock. Under heavy load (rare event) the call to 
{{logEdit()}} can happen outside the lock. And that causes the problem.
# Now suppose that the {{MkdirOp}} has not been processed yet, but the second 
{{mkdirs()}} from client c2 starts executing. It can proceed because the write 
lock has been released. The c2 call will find that the directory already 
exists and will return to the client without generating any transactions. In 
the reply it will populate {{lastSeenStateId}}. But the stateId will be less 
than the txId of the {{MkdirOp}} client c2 has just seen, because this 
transaction has not been processed yet and the global tx count 
{{FSEditLog.txid}} did not advance.
# Then of course going to the ObserverNode with that transaction id can cause 
a stale read if the client reaches the Observer before it tails the 
{{MkdirOp}} edit from the journal.

I managed to reproduce this in a unit test. Attaching. The test spawns a bunch 
of {{mkdirs()}} on the same path. Then it mocks {{doEditTransaction()}} to 
delay async processing of the mkdir transaction on the Active NN. The delay is 
sufficient for another {{mkdirs()}} call to pass through and obtain the wrong 
{{lastSeenStateId}}. Then one can see a {{FileNotFoundException}}, which 
indicates a stale read from the Observer.

_Seems like a straightforward solution is to assign the transaction id at the 
time of its creation before it is enqueued. The queue order should guarantee 
the same result of the assignment as now, but will avoid the race._
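
A self-contained toy model of the race window, under simplified assumptions 
(plain counters and a queue instead of the real NameNode classes):
{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: the txid is assigned when the op is dequeued, as in the
// current code, so a second request can read the global count before it
// advances and return a lastSeenStateId that is behind the MkdirOp.
public class StaleStateIdDemo {
  static final AtomicLong globalTxid = new AtomicLong(0);
  static final BlockingQueue<long[]> pendingQ = new ArrayBlockingQueue<>(16);

  public static void main(String[] args) throws Exception {
    // c1: creates the directory in memory, enqueues its MkdirOp without txid.
    pendingQ.add(new long[] {-1});

    // c2: finds the directory exists, returns lastSeenStateId immediately.
    long lastSeenStateId = globalTxid.get();   // still 0 - the race window

    // The async thread finally assigns the txid while syncing.
    long[] op = pendingQ.take();
    op[0] = globalTxid.incrementAndGet();      // the MkdirOp gets txid 1

    // c2's state id is behind the transaction it has already observed:
    System.out.println("c2 lastSeenStateId=" + lastSeenStateId
        + " < MkdirOp txid=" + op[0]);         // stale read on the Observer
  }
}
{code}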

> Race condition with async edits logging due to updating txId outside of the 
> namesystem log
> --
>
> Key: HDFS-15915
> URL: https://issues.apache.org/jira/browse/HDFS-15915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
>
> {{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
> {{FSNamesystem.writeLock}}. But one essential field, the transaction id of 
> the edits op, remains unset until the operation is scheduled for syncing. At 
> that time {{beginTransaction()}} will set the {{FSEditLogOp.txid}} and 
> increment the global transaction count. On a busy NameNode this event can 
> fall outside the write lock.
> This causes problems for Observer reads. It can also potentially reshuffle 
> transactions, and the Standby will apply them in the wrong order.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-03-23 Thread Konstantin Shvachko (Jira)
Konstantin Shvachko created HDFS-15915:
--

 Summary: Race condition with async edits logging due to updating 
txId outside of the namesystem log
 Key: HDFS-15915
 URL: https://issues.apache.org/jira/browse/HDFS-15915
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, namenode
Reporter: Konstantin Shvachko


{{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
{{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
edits op, remains unset until the operation is scheduled for syncing. At that 
time {{beginTransaction()}} will set the {{FSEditLogOp.txid}} and increment 
the global transaction count. On a busy NameNode this event can fall outside 
the write lock.
This causes problems for Observer reads. It can also potentially reshuffle 
transactions, and the Standby will apply them in the wrong order.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14731) [FGL] Remove redundant locking on NameNode.

2021-03-19 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305081#comment-17305081
 ] 

Konstantin Shvachko commented on HDFS-14731:


Sure [~xilangyan], I was thinking about backporting into the 2.10 branch as 
well.
Would you like to work on a backport patch? I haven't checked whether it is 
straightforward or not.

> [FGL] Remove redundant locking on NameNode.
> ---
>
> Key: HDFS-14731
> URL: https://issues.apache.org/jira/browse/HDFS-14731
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14731.001.patch
>
>
> Currently NameNode has two global locks: FSNamesystemLock and 
> FSDirectoryLock. An analysis shows that single FSNamesystemLock is sufficient 
> to guarantee consistency of the NameNode state. FSDirectoryLock can be 
> removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-03-07 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296958#comment-17296958
 ] 

Konstantin Shvachko commented on HDFS-15808:


Hey, there is no way to modify a commit unless we force-push, which is not 
recommended; we do it only on feature branches. I guess you just need to make 
sure to set the right address in the future.

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: ExpiredHeartbeat.png, HDFS-15808-branch-3.3.001.patch, 
> lockLongHoldCount
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockLongHoldCount/WriteLockLongHoldCount), which are exposed in 
> JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-03-06 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15808:
---
Fix Version/s: 3.2.3
   2.10.2
   3.1.5
   3.4.0
   3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I just committed this to trunk and branches listed in "Fix Version".
Thank you [~tomscut].

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: ExpiredHeartbeat.png, HDFS-15808-branch-3.3.001.patch, 
> lockLongHoldCount
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockLongHoldCount/WriteLockLongHoldCount), which are exposed in 
> JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-03-06 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15808:
---
Status: Patch Available  (was: Open)

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments: ExpiredHeartbeat.png, HDFS-15808-branch-3.3.001.patch, 
> lockLongHoldCount
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockLongHoldCount/WriteLockLongHoldCount), which are exposed in 
> JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15849) ExpiredHeartbeats metric should be of Type.COUNTER

2021-03-01 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15849:
---
Fix Version/s: 3.2.3
   2.10.2
   3.1.5
   3.4.0
   3.3.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I just committed this. Thank you [~zhuqi] for contributing.

> ExpiredHeartbeats metric should be of Type.COUNTER
> --
>
> Key: HDFS-15849
> URL: https://issues.apache.org/jira/browse/HDFS-15849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: metrics
>Reporter: Konstantin Shvachko
>Assignee: Qi Zhu
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: HDFS-15849.001.patch, HDFS-15849.002.patch
>
>
> Currently {{ExpiredHeartbeats}} metric has default type, which makes it 
> {{Type.GAUGE}}. It should be {{Type.COUNTER}} for proper graphing. See 
> discussion in HDFS-15808.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-03-01 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293268#comment-17293268
 ] 

Konstantin Shvachko commented on HDFS-15808:


[~tomscut] I have a problem merging this to branch-3.3. We want it there, 
right? If so, could you please provide a patch for 3.3?

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments: ExpiredHeartbeat.png, lockLongHoldCount
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockLongHoldCount/WriteLockLongHoldCount), which are exposed in 
> JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-22 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288566#comment-17288566
 ] 

Konstantin Shvachko commented on HDFS-15808:


+1 on pull request.
Created HDFS-15849 to fix {{ExpiredHeartbeats}}.

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments: ExpiredHeartbeat.png, lockLongHoldCount
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15849) ExpiredHeartbeats metric should be of Type.COUNTER

2021-02-22 Thread Konstantin Shvachko (Jira)
Konstantin Shvachko created HDFS-15849:
--

 Summary: ExpiredHeartbeats metric should be of Type.COUNTER
 Key: HDFS-15849
 URL: https://issues.apache.org/jira/browse/HDFS-15849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: metrics
Reporter: Konstantin Shvachko


Currently {{ExpiredHeartbeats}} metric has default type, which makes it 
{{Type.GAUGE}}. It should be {{Type.COUNTER}} for proper graphing. See 
discussion in HDFS-15808.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-19 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17287396#comment-17287396
 ] 

Konstantin Shvachko commented on HDFS-15808:


[~tomscut] sure, we all use different systems for managing metrics.
{{RpcQueueTime}} is of type {{MutableRate}}, while {{ExpiredHeartbeats}} and 
your new metric are just a {{@Metric}}, which makes them of type {{GAUGE}} as 
Erik pointed out.
In my system {{ExpiredHeartbeats}} looks like this:  !ExpiredHeartbeat.png!
Good point [~xkrogen] about adding {{type=Type.COUNTER}} to the annotation; 
this should fix the problem.
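
A sketch of the annotation fix, assuming the Hadoop metrics2 {{@Metric}} 
annotation; the getter body is illustrative rather than a quote of the actual 
patch:
{code:java}
// Declaring the metric type explicitly makes downstream systems treat the
// value as a monotonic counter rather than a gauge.
@Metric(value = {"ExpiredHeartbeats", "Number of expired heartbeats"},
    type = Metric.Type.COUNTER)
public long getExpiredHeartbeats() {
  return datanodeStatistics.getExpiredHeartbeats();
}
{code}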

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments: ExpiredHeartbeat.png, lockLongHoldCount
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-19 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15808:
---
Attachment: ExpiredHeartbeat.png

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments: ExpiredHeartbeat.png, lockLongHoldCount
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-19 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15808:
---
Attachment: (was:  ExpiredHeartbeat.png)

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments: lockLongHoldCount
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-19 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15808:
---
Attachment:  ExpiredHeartbeat.png

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments:  ExpiredHeartbeat.png, lockLongHoldCount
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-17 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286133#comment-17286133
 ] 

Konstantin Shvachko commented on HDFS-15808:


Hey [~tomscut]. The patch looks fine, but I doubt the metric will be useful in 
its current form. A monotonically increasing counter doesn't tell you much 
when plotted. Over time it just becomes an incredibly large number, and it is 
hard to see its fluctuations. And you cannot set alerts if the threshold is 
exceeded often.
See e.g. {{ExpiredHeartbeats}} or {{LastWrittenTransactionId}} - not useful.
I assume you need something like a rate.
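
A minimal sketch of the rate alternative, assuming the metrics2 
{{MutableRate}} type; the field name and the call site are hypothetical:
{code:java}
// A rate metric reports the number of events and their average value per
// reporting interval, which plots and alerts much better than a raw total.
@Metric("Write lock held beyond threshold")
MutableRate writeLockLongHold;

// At the unlock site, when the measured hold time exceeds the threshold:
//   writeLockLongHold.add(holdTimeMs);
{code}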

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15792) ClassCastException while loading FSImage

2021-02-09 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282003#comment-17282003
 ] 

Konstantin Shvachko commented on HDFS-15792:


Hey guys, sorry for coming late to this.
But we should avoid using {{ConcurrentHashMap}}. It is known to have 
performance issues and adds a lot of memory overhead. So whoever is using ACLs 
heavily will have larger namespace requirements - very bad for large clusters.

I would prefer proper synchronization of the methods in {{ReferenceCountMap}}.
Should we reopen this to revisit the fix?
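
A minimal sketch of that alternative, assuming a simplified shape for 
{{ReferenceCountMap}} (the real class stores reference-counted elements; this 
is an illustration, not the actual code):
{code:java}
import java.util.HashMap;
import java.util.Map;

// Coarse-grained synchronization over a plain HashMap avoids the concurrent
// treeify race without ConcurrentHashMap's per-entry memory overhead.
class ReferenceCountMapSketch<E> {
  private final Map<E, Long> counts = new HashMap<>();

  public synchronized E put(E e) {
    counts.merge(e, 1L, Long::sum);   // increment the reference count
    return e;
  }

  public synchronized void remove(E e) {
    // Decrement, and drop the entry when the last reference goes away.
    counts.computeIfPresent(e, (k, v) -> v > 1 ? v - 1 : null);
  }
}
{code}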

> ClassCastException while loading FSImage
> 
>
> Key: HDFS-15792
> URL: https://issues.apache.org/jira/browse/HDFS-15792
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 2.10.2
>
> Attachments: HDFS-15792-branch-2.10.001.patch, 
> HDFS-15792-branch-2.10.002.patch, HDFS-15792-branch-2.10.003.patch, 
> HDFS-15792-branch-2.10.004.patch, HDFS-15792.001.patch, HDFS-15792.002.patch, 
> HDFS-15792.003.patch, HDFS-15792.004.patch, HDFS-15792.005.patch, 
> HDFS-15792.addendum.001.patch, image-2021-01-27-12-00-34-846.png
>
>
> FSImage loading has failed with ClassCastException - 
> java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to 
> java.util.HashMap$TreeNode.
> This is a usage issue with HashMap in concurrent scenarios.
> The same issue has been reported against Java and closed as a usage issue - 
> https://bugs.openjdk.java.net/browse/JDK-8173671
> 2020-12-28 11:36:26,127 | ERROR | main | An exception occurred when loading 
> INODE from fsiamge. | FSImageFormatProtobuf.java:442
> java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to 
> java.util.HashMap$TreeNode
>   at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1835)
>   at java.util.HashMap$TreeNode.treeify(HashMap.java:1951)
>   at java.util.HashMap.treeifyBin(HashMap.java:772)
>   at java.util.HashMap.putVal(HashMap.java:644)
>   at java.util.HashMap.put(HashMap.java:612)
>   at 
> org.apache.hadoop.hdfs.util.ReferenceCountMap.put(ReferenceCountMap.java:53)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AclStorage.addAclFeature(AclStorage.java:391)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeWithAdditionalFields.addAclFeature(INodeWithAdditionalFields.java:349)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:225)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINode(FSImageFormatPBINode.java:406)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.readPBINodes(FSImageFormatPBINode.java:367)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:342)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader$2.call(FSImageFormatProtobuf.java:469)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2020-12-28 11:36:26,130 | ERROR | main | Failed to load image from 
> FSImageFile(file=/srv/BigData/namenode/current/fsimage_00198227480, 
> cpktTxId=00198227480) | FSImage.java:738
> java.io.IOException: java.lang.ClassCastException: java.util.HashMap$Node 
> cannot be cast to java.util.HashMap$TreeNode
>   at 
> org.apache.hadoop.io.MultipleIOException$Builder.add(MultipleIOException.java:68)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.runLoaderTasks(FSImageFormatProtobuf.java:444)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:360)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:263)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:227)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:971)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:955)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:820)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:733)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
>   at 
> 

[jira] [Resolved] (HDFS-15632) AbstractContractDeleteTest should set recursive parameter to true for recursive test cases.

2021-01-22 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-15632.

Fix Version/s: 3.2.3
   2.10.2
   3.1.5
   3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

I just committed this.
Thank you [~antn.kutuzov] for contributing.

> AbstractContractDeleteTest should set recursive parameter to true for 
> recursive test cases.
> ---
>
> Key: HDFS-15632
> URL: https://issues.apache.org/jira/browse/HDFS-15632
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Anton Kutuzov
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{AbstractContractDeleteTest.testDeleteNonexistentPathRecursive()}} should 
> call {{delete(path, true)}} rather than {{false}}
> Also {{AbstractContractDeleteTest.testDeleteNonexistentPathNonRecursive()}} 
> has a wrong assert message. Should be {{"... attempting to non-recursively 
> delete ..."}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-954) There are two security packages in hdfs, should be one

2021-01-21 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-954.
--
Resolution: Won't Fix

Hey [~antn.kutuzov], this is a rather old jira.
I don't think it is a good idea to do repackaging at this point, since it will 
make things harder to backport to older versions.
Closing as won't fix.

> There are two security packages in hdfs, should be one
> --
>
> Key: HDFS-954
> URL: https://issues.apache.org/jira/browse/HDFS-954
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Jakob Homan
>Priority: Major
>  Labels: newbie
>
> Currently the test source tree has both
> src/test/hdfs/org/apache/hadoop/hdfs/security with:
> SecurityTestUtil.java
> TestAccessToken.java
> TestClientProtocolWithDelegationToken.java
> and 
> src/test/hdfs/org/apache/hadoop/security with:
> TestDelegationToken.java
> TestGroupMappingServiceRefresh.java
> TestPermission.java
> These should be combined into one package and possibly some things moved to 
> common.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15632) AbstractContractDeleteTest should set recursive parameter to true for recursive test cases.

2021-01-21 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269631#comment-17269631
 ] 

Konstantin Shvachko commented on HDFS-15632:


+1 PR looks good. Will commit in a bit.

> AbstractContractDeleteTest should set recursive parameter to true for 
> recursive test cases.
> ---
>
> Key: HDFS-15632
> URL: https://issues.apache.org/jira/browse/HDFS-15632
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Anton Kutuzov
>Priority: Major
>  Labels: newbie, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{AbstractContractDeleteTest.testDeleteNonexistentPathRecursive()}} should 
> call {{delete(path, true)}} rather than {{false}}
> Also {{AbstractContractDeleteTest.testDeleteNonexistentPathNonRecursive()}} 
> has a wrong assert message. Should be {{"... attempting to non-recursively 
> delete ..."}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md

2021-01-19 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268314#comment-17268314
 ] 

Konstantin Shvachko commented on HDFS-15751:


Created HADOOP-17477 to track implementation of {{msync()}} for {{ViewFS}}.

> Add documentation for msync() API to filesystem.md
> --
>
> Key: HDFS-15751
> URL: https://issues.apache.org/jira/browse/HDFS-15751
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: HDFS-15751-01.patch, HDFS-15751-02.patch, 
> HDFS-15751-03.patch
>
>
> HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to 
> the API definitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md

2021-01-03 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257840#comment-17257840
 ] 

Konstantin Shvachko commented on HDFS-15751:


Thanks guys for taking care of that.

> Add documentation for msync() API to filesystem.md
> --
>
> Key: HDFS-15751
> URL: https://issues.apache.org/jira/browse/HDFS-15751
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: HDFS-15751-01.patch, HDFS-15751-02.patch, 
> HDFS-15751-03.patch
>
>
> HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to 
> the API definitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15751) Add documentation for msync() API to filesystem.md

2020-12-24 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15751:
---
Status: Patch Available  (was: Open)

> Add documentation for msync() API to filesystem.md
> --
>
> Key: HDFS-15751
> URL: https://issues.apache.org/jira/browse/HDFS-15751
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15751-01.patch
>
>
> HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to 
> the API definitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md

2020-12-24 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254662#comment-17254662
 ] 

Konstantin Shvachko commented on HDFS-15751:


Added documentation for {{msync()}}.
I put a reference to the HDFS documentation describing [Consistent Reads from 
Observer|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html],
particularly the semantics of {{msync()}} in HDFS. Otherwise it is hard to 
define it in abstract terms without the context.

I was also thinking about {{AbstractContract}}* type tests for {{msync}}, but 
could not think of anything valuable that can be tested here. Essentially we 
need to call {{mkdir()}} and then verify that after {{msync}} the metadata 
exists via a read call, which we do for HDFS in 
{{TestConsistentReadsObserver}}. But it is a probabilistic thing, as it can 
succeed even without synchronization if the standby catches up fast enough. I 
guess testing consistency contracts is similar to atomicity, which we don't 
test in {{AbstractContract}} tests, since it is not clear how.

> Add documentation for msync() API to filesystem.md
> --
>
> Key: HDFS-15751
> URL: https://issues.apache.org/jira/browse/HDFS-15751
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15751-01.patch
>
>
> HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to 
> the API definitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15751) Add documentation for msync() API to filesystem.md

2020-12-24 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15751:
---
Attachment: HDFS-15751-01.patch

> Add documentation for msync() API to filesystem.md
> --
>
> Key: HDFS-15751
> URL: https://issues.apache.org/jira/browse/HDFS-15751
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15751-01.patch
>
>
> HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to 
> the API definitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15751) Add documentation for msync() API to filesystem.md

2020-12-24 Thread Konstantin Shvachko (Jira)
Konstantin Shvachko created HDFS-15751:
--

 Summary: Add documentation for msync() API to filesystem.md
 Key: HDFS-15751
 URL: https://issues.apache.org/jira/browse/HDFS-15751
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko


HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to the 
API definitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15746) Standby NameNode crash when replay editlog

2020-12-24 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254641#comment-17254641
 ] 

Konstantin Shvachko commented on HDFS-15746:


Looks like your branch is either quite old or has diverged a lot; it doesn't 
look like any of the Apache branches based on line numbers, etc.
On trunk and all other maintained branches up to 2.10 there is no place where 
an NPE can happen inside {{BlockInfo.setGenerationStampAndVerifyReplicas()}}.
So it could be specific to branch-2.7 or even your own fork.

I found that you already reported the same problem in HDFS-14529 about [a year 
ago|https://issues.apache.org/jira/browse/HDFS-14529?focusedCommentId=16970092=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16970092].
You might want to look at Jiras related to this issue, or upgrade your HDFS. I 
believe this problem has been solved in later releases.

> Standby NameNode crash when replay editlog
> --
>
> Key: HDFS-15746
> URL: https://issues.apache.org/jira/browse/HDFS-15746
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-15746.001.patch
>
>
> The Standby NameNode hit an NPE and crashed when replaying the editlog. 
> After digging into the logs and source code, the root cause was not found, 
> but some information may be useful for this case.
> a. Before the SBN crash, the ANN did one lease recovery.
> {code:java}
> 2020-12-23 12:37:45,946 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseLease: $PATH has not been closed. Lease recovery is 
> in progress. RecoveryId = 21696709510 for block blk_*_21658833701
> {code}
> b. Then one Datanode volume failed which managed one replica of 
> blk_*_21658833701 after lease recovery.
> c. After half an hour, the SBN crashed because of an NPE, as follows.
> {code:java}
> 2020-12-23 13:13:36,703 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation CloseOp [length=0, inodeId=0, path=$PATH, replication=3, 
> mtime=1608698268201, atime=1608343529481, blockSize=268435456, 
> blocks=[blk_$i_$j], permissions=user:group:rw-r--r--, aclEntries=null, 
> clientName=, clientMachine=, overwrite=false, storagePolicyId=0, 
> opCode=OP_CLOSE, txid=$txid]
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setGenerationStampAndVerifyReplicas(BlockInfo.java:455)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.commitBlock(BlockInfo.java:476)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:1248)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1065)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:244)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:152)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:843)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1706)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:428)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> 2020-12-23 13:13:36,703 ERROR org.apache.hadoop.ipc.Server: Error in Reader
> java.nio.channels.ClosedChannelException
> at 
> java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:197)
> at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:1053)
> at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1034)
> 2020-12-23 13:13:36,703 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.16.39.26:50010 is added to blk_22374572883_21672067156 

[jira] [Commented] (HDFS-15746) Standby NameNode crash when replay editlog

2020-12-23 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254245#comment-17254245
 ] 

Konstantin Shvachko commented on HDFS-15746:


[~hexiaoqiao] thanks for reporting this. Which version of Hadoop do you see 
this with?
I agree with [~elgoiri] that it would be really good to understand the root 
cause, so that we could add a unit test.

> Standby NameNode crash when replay editlog
> --
>
> Key: HDFS-15746
> URL: https://issues.apache.org/jira/browse/HDFS-15746
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-15746.001.patch
>
>
> The Standby NameNode hit an NPE and crashed when replaying the editlog. 
> After digging into the logs and source code, the root cause was not found, 
> but some information may be useful for this case.
> a. Before the SBN crash, the ANN did one lease recovery.
> {code:java}
> 2020-12-23 12:37:45,946 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseLease: $PATH has not been closed. Lease recovery is 
> in progress. RecoveryId = 21696709510 for block blk_*_21658833701
> {code}
> b. Then one Datanode volume failed which managed one replica of 
> blk_*_21658833701 after lease recovery.
> c. After half an hour, the SBN crashed because of an NPE, as follows.
> {code:java}
> 2020-12-23 13:13:36,703 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation CloseOp [length=0, inodeId=0, path=$PATH, replication=3, 
> mtime=1608698268201, atime=1608343529481, blockSize=268435456, 
> blocks=[blk_$i_$j], permissions=user:group:rw-r--r--, aclEntries=null, 
> clientName=, clientMachine=, overwrite=false, storagePolicyId=0, 
> opCode=OP_CLOSE, txid=$txid]
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setGenerationStampAndVerifyReplicas(BlockInfo.java:455)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.commitBlock(BlockInfo.java:476)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:1248)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1065)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:244)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:152)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:843)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1706)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:428)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> 2020-12-23 13:13:36,703 ERROR org.apache.hadoop.ipc.Server: Error in Reader
> java.nio.channels.ClosedChannelException
> at 
> java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:197)
> at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:1053)
> at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1034)
> 2020-12-23 13:13:36,703 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.16.39.26:50010 is added to blk_22374572883_21672067156 
> size 58762255
> 2020-12-23 13:13:36,704 FATAL 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error 
> encountered while tailing edits. Shutting down standby NN.
> java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:254)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:152)
> at 
> 

[jira] [Commented] (HDFS-15567) [SBN Read] HDFS should expose msync() API to allow downstream applications call it explicitly.

2020-12-13 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248677#comment-17248677
 ] 

Konstantin Shvachko commented on HDFS-15567:


Hey Steve,
I am not sure I fully understand what is broken here. I believe it is not an 
incompatible change. 
Could you please explain what you think the process is?
Would be best if you could share a link to a document describing it.

I would be glad to follow up with tests and documentation that are needed.

As you can see I proposed multiple solutions to the problem here.
Seemed nobody was objecting, so I chose one and explained why.
I believe we call it lazy consensus.

> [SBN Read] HDFS should expose msync() API to allow downstream applications 
> call it explicitly.
> --
>
> Key: HDFS-15567
> URL: https://issues.apache.org/jira/browse/HDFS-15567
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: HDFS-15567.001.patch, HDFS-15567.002.patch
>
>
> Consistent reads from Standby introduced {{msync()}} API HDFS-13688, which 
> updates client's state ID with current state of the Active NameNode to 
> guarantee consistency of subsequent calls to an ObserverNode. Currently this 
> API is exposed via {{DFSClient}} only, which makes it hard for applications 
> to access {{msync()}}. One way is to use something like this:
> {code}
> if(fs instanceof DistributedFileSystem) {
>   ((DistributedFileSystem)fs).getClient().msync();
> }
> {code}
> This should be exposed both for {{FileSystem}} and {{FileContext}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15567) [SBN Read] HDFS should expose msync() API to allow downstream applications call it explicitly.

2020-12-13 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210550#comment-17210550
 ] 

Konstantin Shvachko edited comment on HDFS-15567 at 12/13/20, 9:11 PM:
---

Thanks for taking a look [~vagarychen].
* Both {{FileSystem}} and {{AbstractFileSystem}} throw 
{{UnsupportedOperationException}} with my patch. This is a standard pattern 
and a way for clients to learn whether the operation is supported in the 
implementation; see the sketch after this list. A no-op would hide the problem 
and, for {{msync}} in particular, can lead to inconsistent results further 
down the road, which are hard to debug, as we both know.
* Logging in the tests is not "required", but it helped a lot in debugging 
problems that I fixed when some tests were failing. I decided to leave them in 
the code in case something breaks in the future. I agree we usually try to 
restrict change to bare minimum to avoid conflicts while backporting. In this 
case with code relatively recent I don't see it a blocker for backports.
* Ran tests that failed on Jenkins locally. All passed. They are long running 
tests, which frequently fail on Jenkins builds.
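
A minimal sketch of the pattern mentioned in the first bullet, assuming the 
{{FileSystem}} API shape (illustrative, not a quote of the patch):
{code:java}
// Base class default: explicitly unsupported, so callers can detect and
// handle filesystems that have no notion of msync().
public void msync() throws IOException, UnsupportedOperationException {
  throw new UnsupportedOperationException(
      getClass().getCanonicalName() + " does not support method msync");
}
{code}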


was (Author: shv):
Thanks for taking a look [~vagarychen].
* Both {{FileSystem}} and {{AbstractFileSystem}} throw 
{{UnsupportedOperationException}} with my patch. This is a standard pattern and 
a way for clients to learn if the operation is supported or not in the 
implementation. No-op will hide the problem and for {{mscyn}} in particular can 
lead to inconsistent results further down the road, which is hard to debug as 
we both know.
* Logging in the tests is not "required", but it helped a lot in debugging 
problems that I fixed when some tests were failing. I decided to leave them in 
the code in case something breaks in the future. I agree we usually try to 
restrict change to bare minimum to avoid conflicts while backporting. In this 
case with code relatively recent I don't see it a blocker for backports.
* Ran tests that failed on Jenkins locally. All passed. They are long running 
tests, which frequently fail on Jenkins builds.

> [SBN Read] HDFS should expose msync() API to allow downstream applications 
> call it explicitly.
> --
>
> Key: HDFS-15567
> URL: https://issues.apache.org/jira/browse/HDFS-15567
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: HDFS-15567.001.patch, HDFS-15567.002.patch
>
>
> Consistent reads from Standby introduced {{msync()}} API HDFS-13688, which 
> updates client's state ID with current state of the Active NameNode to 
> guarantee consistency of subsequent calls to an ObserverNode. Currently this 
> API is exposed via {{DFSClient}} only, which makes it hard for applications 
> to access {{msync()}}. One way is to use something like this:
> {code}
> if(fs instanceof DistributedFileSystem) {
>   ((DistributedFileSystem)fs).getClient().msync();
> }
> {code}
> This should be exposed both for {{FileSystem}} and {{FileContext}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.

2020-11-25 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238894#comment-17238894
 ] 

Konstantin Shvachko commented on HDFS-4452:
---

This is an interesting observation [~honestman]. You are right: in your 
scenario the block creation will fail, and the client will have to retry, 
re-writing either just the last block or the entire file. The good thing is 
that the namespace remains in a consistent state, which was the problem with 
the original issue in this jira.
This is essentially a scenario for "Case 3" of {{analyzeFileState()}}. It 
would be good to confirm with a unit test that this is indeed possible. The 
NameNode should not violate the contract of persisting all the data that was 
successfully reported to clients.

> getAdditionalBlock() can create multiple blocks if the client times out and 
> retries.
> 
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Critical
> Fix For: 2.0.3-alpha
>
> Attachments: TestAddBlockRetry.java, 
> getAdditionalBlock-branch2.patch, getAdditionalBlock.patch, 
> getAdditionalBlock.patch, getAdditionalBlock.patch
>
>
> HDFS client tries to addBlock() to a file. If NameNode is busy the client can 
> timeout and will reissue the same request again. The two requests will race 
> with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in 
> creating two new blocks on the NameNode while the client will know of only 
> one of them. This eventually results in {{NotReplicatedYetException}} because 
> the extra block is never reported by any DataNode, which stalls file creation 
> and puts it in invalid state with an empty block in the middle.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed

2020-11-09 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228786#comment-17228786
 ] 

Konstantin Shvachko commented on HDFS-15562:


Hey guys, I think generally the checkpointer should persist until the 
checkpoint completes and the image is transferred. With large images we see 
transfers fail once in a while, so just ignoring image-transfer failures 
isn't right.
I understand that with multiple ObserverNodes some of them can be down.
We already have logic for the ActiveNN and ObserverNodes to reject an image 
if they already have one recent enough, so frequent checkpoints should not 
overwhelm the Active or the Observers.
We may add logic for the Checkpointer to not re-create an image if one was 
created recently, but this does not seem to be a big concern.
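
A rough sketch of the loop behavior I am describing, with illustrative names 
and thresholds (the real code lives in {{StandbyCheckpointer}} and is more 
involved):
{code:java}
import java.util.concurrent.TimeUnit;

/** Sketch of the checkpoint loop; fields, methods, and values are illustrative. */
final class CheckpointLoopSketch {
  private long lastCheckpointTime;     // advances only on a successful upload
  private long lastImageCreationTime;
  private static final long MIN_IMAGE_RECREATE_MS = TimeUnit.MINUTES.toMillis(5);

  void doCheckpointIteration() {
    long now = System.currentTimeMillis();
    // Optional guard mentioned above: don't re-create an image that is
    // already recent; just retry the transfer of the existing one.
    if (now - lastImageCreationTime > MIN_IMAGE_RECREATE_MS) {
      createNewImage();
      lastImageCreationTime = now;
    }
    // Persist until the image is actually transferred: only a successful
    // upload advances lastCheckpointTime, so a failed transfer is retried
    // rather than silently dropped.
    if (uploadImageToRemoteNameNodes()) {
      lastCheckpointTime = now;
    }
  }

  private void createNewImage() { /* save namespace to fsimage (elided) */ }
  private boolean uploadImageToRemoteNameNodes() { return true; /* elided */ }
}
{code}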

> StandbyCheckpointer will do checkpoint repeatedly while connecting 
> observer/active namenode failed
> --
>
> Key: HDFS-15562
> URL: https://issues.apache.org/jira/browse/HDFS-15562
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: SunHao
>Assignee: Aihua Xu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15562.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We find that the standby namenode will do checkpoints over and over when 
> connecting to the observer/active namenode fails.
> StandbyCheckpointer won't update "lastCheckpointTime" when uploading the new 
> fsimage to the other namenode fails, so the standby namenode will keep doing 
> checkpoints repeatedly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15623) Respect configured values of rpc.engine

2020-11-06 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-15623.

Fix Version/s: 3.2.3
   2.10.2
   3.1.5
   3.4.0
   3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed

I just committed to trunk and branches 3.3, 3.2, 3.1, 2.10.
Thank you [~hchaverri]

> Respect configured values of rpc.engine
> ---
>
> Key: HDFS-15623
> URL: https://issues.apache.org/jira/browse/HDFS-15623
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The HDFS Configuration allows users to specify the RPCEngine implementation 
> to use when communicating with Datanodes and Namenodes. However, the value is 
> overwritten to ProtobufRpcEngine.class in different classes. As an example in 
> NameNodeRpcServer:
> {{RPC.setProtocolEngine(conf, ClientNamenodeProtocolPB.class, 
> ProtobufRpcEngine.class);}}
> The configured value of {{rpc.engine.[protocolName]}} should be respected to 
> allow for other implementations of RPCEngine to be used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15623) Respect configured values of rpc.engine

2020-11-06 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227602#comment-17227602
 ] 

Konstantin Shvachko commented on HDFS-15623:


Was looking into this a bit more.
 * So the right solution would be to use the standard config-default 
mechanism, that is:
 ## remove all hardcoded {{RPC.setProtocolEngine()}} calls,
 ## and instead change {{RPC.getProtocolEngine()}} to use 
{{ProtobufRpcEngine2}} as the default {{RpcEngine}} rather than 
{{WritableRpcEngine}}.
 * But this can break some unit tests, which still expect 
{{WritableRpcEngine}}, as people learned in HADOOP-12579.
 * Even though the problem above may have been fixed in hadoop 3, it is 
definitely present in hadoop 2.

So I think this patch does the right thing if we want to make RpcEngines 
pluggable again, as was originally intended in HADOOP-6422.
Will commit this shortly.
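
For illustration, the lookup in step 2 would amount to something like the 
sketch below. It assumes the key pattern {{rpc.engine.<protocolClassName>}}, 
which is what {{RPC.setProtocolEngine()}} writes to the configuration; the 
default shown is the proposal above, not the committed change:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.ProtobufRpcEngine2;

final class RpcEngineLookupSketch {
  static Class<?> getProtocolEngineClass(Class<?> protocol, Configuration conf) {
    // Respect an explicitly configured engine; otherwise fall back to the
    // proposed default (ProtobufRpcEngine2 instead of WritableRpcEngine).
    return conf.getClass("rpc.engine." + protocol.getName(),
        ProtobufRpcEngine2.class);
  }
}
{code}
With that in place, a deployment could select another engine by setting, 
e.g., {{rpc.engine.org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB}} 
in its configuration.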

> Respect configured values of rpc.engine
> ---
>
> Key: HDFS-15623
> URL: https://issues.apache.org/jira/browse/HDFS-15623
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The HDFS Configuration allows users to specify the RPCEngine implementation 
> to use when communicating with Datanodes and Namenodes. However, the value is 
> overwritten to ProtobufRpcEngine.class in different classes. As an example in 
> NameNodeRpcServer:
> {{RPC.setProtocolEngine(conf, ClientNamenodeProtocolPB.class, 
> ProtobufRpcEngine.class);}}
> The configured value of {{rpc.engine.[protocolName]}} should be respected to 
> allow for other implementations of RPCEngine to be used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15665) Balancer logging improvement

2020-11-03 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15665:
---
Fix Version/s: 3.2.3
   2.10.2
   3.1.5
   3.4.0
   3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I just committed this to trunk and branches 3.3, 3.2, 3.1, and 2.10. Thanks 
Chen for the review.
There were conflicts for 3.1 and 2.10 related to the LOG type. I changed the 
Balancer and Dispatcher logs to slf4j.

> Balancer logging improvement
> 
>
> Key: HDFS-15665
> URL: https://issues.apache.org/jira/browse/HDFS-15665
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: HDFS-15665.001.patch, HDFS-15665.002.patch
>
>
> It would be good to have Balancer log all relevant configuration parameters 
> on each iteration along with some data, which reflects its progress and the 
> amount of resources it involves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15665) Balancer logging improvement

2020-11-02 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15665:
---
Attachment: HDFS-15665.002.patch

> Balancer logging improvement
> 
>
> Key: HDFS-15665
> URL: https://issues.apache.org/jira/browse/HDFS-15665
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15665.001.patch, HDFS-15665.002.patch
>
>
> It would be good to have Balancer log all relevant configuration parameters 
> on each iteration along with some data, which reflects its progress and the 
> amount of resources it involves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15665) Balancer logging improvement

2020-11-02 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225092#comment-17225092
 ] 

Konstantin Shvachko commented on HDFS-15665:


Thanks for the review [~vagarychen].
* {{getInt()}} is actually called to print a log message for the parameter. 
The value may not be used by the Balancer itself.
* Two log messages look better, in fact, because the first message is pretty 
long as it prints the {{NameNodeConnector}} including the URI and block pool 
id:
{noformat:nowrap}
2020-11-02 10:42:59,939 [Listener at localhost/64077] INFO  balancer.Balancer 
(Balancer.java:runOneIteration(641)) - Will move 100.79 MB in this iteration 
for NameNodeConnector[namenodeUri=hdfs://localhost:64069, 
bpid=BP-79516876-172.18.170.12-1604342573024]
{noformat}
Appending the second line to it would make the logs hard to read.

Will update the patch with checkstyle fixes.

> Balancer logging improvement
> 
>
> Key: HDFS-15665
> URL: https://issues.apache.org/jira/browse/HDFS-15665
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15665.001.patch
>
>
> It would be good to have Balancer log all relevant configuration parameters 
> on each iteration along with some data, which reflects its progress and the 
> amount of resources it involves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15665) Balancer logging improvement

2020-11-02 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224959#comment-17224959
 ] 

Konstantin Shvachko commented on HDFS-15665:


Attached a patch:
* It logs the additional config parameters {{dfs.namenode.get-blocks.max-qps}} 
and {{dfs.datanode.balance.bandwidthPerSec}}
* Counts and logs the number of blocks (in addition to bytes) moved in each 
iteration
* Logs the number of DN targets in each iteration
* Prints the NameNode address for each iteration, which is useful in federation.
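
For reference, the parameter logging amounts to something like this sketch; 
the defaults and message wording here are illustrative, not the patch itself:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class BalancerLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(BalancerLoggingSketch.class);

  static void logBalancerParameters(Configuration conf) {
    // getInt()/getLong() are called here mainly so the effective values are
    // logged, even if the Balancer process doesn't consume them directly.
    int maxQps = conf.getInt("dfs.namenode.get-blocks.max-qps", 20);
    long bandwidth = conf.getLong("dfs.datanode.balance.bandwidthPerSec",
        10L * 1024 * 1024);
    LOG.info("Balancer getBlocks max QPS = {}", maxQps);
    LOG.info("DataNode balancing bandwidth = {} bytes/s", bandwidth);
  }
}
{code}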

> Balancer logging improvement
> 
>
> Key: HDFS-15665
> URL: https://issues.apache.org/jira/browse/HDFS-15665
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15665.001.patch
>
>
> It would be good to have Balancer log all relevant configuration parameters 
> on each iteration along with some data, which reflects its progress and the 
> amount of resources it involves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15665) Balancer logging improvement

2020-11-02 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15665:
---
Status: Patch Available  (was: Open)

> Balancer logging improvement
> 
>
> Key: HDFS-15665
> URL: https://issues.apache.org/jira/browse/HDFS-15665
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15665.001.patch
>
>
> It would be good to have Balancer log all relevant configuration parameters 
> on each iteration along with some data, which reflects its progress and the 
> amount of resources it involves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15665) Balancer logging improvement

2020-11-02 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15665:
---
Attachment: HDFS-15665.001.patch

> Balancer logging improvement
> 
>
> Key: HDFS-15665
> URL: https://issues.apache.org/jira/browse/HDFS-15665
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15665.001.patch
>
>
> It would be good to have Balancer log all relevant configuration parameters 
> on each iteration along with some data, which reflects its progress and the 
> amount of resources it involves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


