[jira] [Commented] (HDFS-17366) NameNode Fine-Grained Locking via Namespace Tree
[ https://issues.apache.org/jira/browse/HDFS-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839393#comment-17839393 ] Brahma Reddy Battula commented on HDFS-17366: - Would it be good to upload the design doc to this Jira as well (somebody may not have access to Google Docs, or it may be under maintenance ...)? Are we going to create a separate branch for this development (please ignore if it's already done)? > NameNode Fine-Grained Locking via Namespace Tree > > > Key: HDFS-17366 > URL: https://issues.apache.org/jira/browse/HDFS-17366 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > > As we all know, the write performance of NameNode is limited by the global > lock. We aim to enable fine-grained locking based on the namespace tree to > improve the performance of NameNode write operations. > There are multiple motivations for creating this ticket: > * We have implemented this fine-grained locking and gained a nearly 7x > performance improvement in our prod environment > * Other companies have made similar improvements based on their internal > branches. Internal branches differ considerably from the community branch, so there has been > little feedback and discussion in the community. > * The topic of fine-grained locking has been discussed for a very long time, > but still without any results. > > We implemented this fine-grained locking based on the namespace tree to > maximize concurrency for disjoint or independent operations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
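The locking scheme the HDFS-17366 description outlines — disjoint subtrees proceeding concurrently under per-path locks — can be sketched roughly as follows. This is a toy illustration only, not the actual design or NameNode code; the class and method names are made up for the example:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: read-lock every ancestor component and
// write-lock only the final component, so writes under /a/x and
// /a/y can proceed concurrently while both hold a read lock on /a.
public class NamespaceLockSketch {
  private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

  private ReentrantReadWriteLock lockFor(String component) {
    return locks.computeIfAbsent(component, k -> new ReentrantReadWriteLock());
  }

  /** Acquire locks root-to-leaf (a fixed order avoids deadlock). */
  public void lockPathForWrite(String[] components) {
    for (int i = 0; i < components.length - 1; i++) {
      locksAncestorRead(components[i]);
    }
    lockFor(components[components.length - 1]).writeLock().lock();
  }

  private void locksAncestorRead(String component) {
    lockFor(component).readLock().lock();
  }

  /** Release in the reverse order of acquisition. */
  public void unlockPathForWrite(String[] components) {
    lockFor(components[components.length - 1]).writeLock().unlock();
    for (int i = components.length - 2; i >= 0; i--) {
      lockFor(components[i]).readLock().unlock();
    }
  }
}
```

Two operations on disjoint paths then contend only on shared ancestors' read locks, which is what lets independent operations run in parallel.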
[jira] [Commented] (HDFS-17377) Long Standing High Risk CVE in Hadoop
[ https://issues.apache.org/jira/browse/HDFS-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816374#comment-17816374 ] Brahma Reddy Battula commented on HDFS-17377: - [~prathapsagars] thanks for reporting this. Please mention the corresponding Hadoop version you are using for each finding. > Long Standing High Risk CVE in Hadoop > - > > Key: HDFS-17377 > URL: https://issues.apache.org/jira/browse/HDFS-17377 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0 >Reporter: Prathap Sagar S >Priority: Major > Attachments: HADOOP_CVE_LIST.xlsx > > > Our ongoing security scans are turning up several long-standing CVEs, even in > the most recent version of Hadoop, which is making it difficult for us to use > Hadoop in our ecosystem. A comprehensive list of all the long-standing CVEs > and the JARs holding them is attached. I'm asking for community assistance to > address these high-risk vulnerabilities as soon as possible. > > |Vulnerability ID|Severity|Package name|Package version|Package type|Package > path|Package suggested fix| > |CVE-2023-2976|High|com.google.guava:guava|30.1.1-jre|java|/hadoop-3.4.0/share/hadoop/common/lib/hadoop-shaded-guava-1.1.1.jar|v32.0.0-android| > |CVE-2023-2976|High|com.google.guava:guava|30.1.1-jre|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v32.0.0-android| > |CVE-2023-2976|High|com.google.guava:guava|12.0.1|java|/hadoop-3.4.0/share/hadoop/yarn/timelineservice/lib/guava-12.0.1.jar|v32.0.0-android| > |CVE-2023-2976|High|com.google.guava:guava|27.0-jre|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/guava-27.0-jre.jar|v32.0.0-android| > |CVE-2023-2976|High|com.google.guava:guava|27.0-jre|java|/hadoop-3.4.0/share/hadoop/common/lib/guava-27.0-jre.jar|v32.0.0-android| > |CVE-2023-2976|High|com.google.guava:guava|30.1.1-jre|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/hadoop-shaded-guava-1.1.1.jar|v32.0.0-android| > 
|CVE-2022-25647|High|com.google.code.gson:gson|2.8.5|java|/hadoop-3.4.0/share/hadoop/yarn/timelineservice/lib/hbase-shaded-gson-3.0.0.jar|v2.8.9| > |CVE-2022-3171|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v3.16.3| > |CVE-2022-3171|High|com.google.protobuf:protobuf-java|2.5.0|java|/hadoop-3.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar|v3.16.3| > |CVE-2022-3171|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/common/lib/hadoop-shaded-guava-1.1.1.jar|v3.16.3| > |CVE-2022-3171|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/common/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3| > |CVE-2022-3509|High|com.google.protobuf:protobuf-java|2.5.0|java|/hadoop-3.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar|v3.16.3| > |CVE-2022-3509|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v3.16.3| > |CVE-2022-3509|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3| > |CVE-2022-3509|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/common/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3| > |CVE-2022-3510|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3| > |CVE-2022-3510|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/common/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3| > |CVE-2022-3510|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v3.16.3| > |CVE-2022-3510|High|com.google.protobuf:protobuf-java|2.5.0|java|/hadoop-3.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar|v3.16.3| > 
|CVE-2023-39410|High|org.apache.avro:avro|1.9.2|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/avro-1.9.2.jar|v1.11.3| > |CVE-2023-39410|High|org.apache.avro:avro|1.9.2|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v1.11.3| > |CVE-2023-39410|High|org.apache.avro:avro|1.9.2|java|/hadoop-3.4.0/share/hadoop/common/lib/avro-1.9.2.jar|v1.11.3| > |CVE-2021-22570|Medium|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v3.16.3| > |CVE-2021-22570|Medium|com.google.protobuf:protobuf-java|2.5.0|java|/hadoop-3.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar|v3.16.3| > |CVE-2021-22570|Medium|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3| >
[jira] [Commented] (HDFS-7343) HDFS smart storage management
[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738021#comment-17738021 ] Brahma Reddy Battula commented on HDFS-7343: [~PhiloHe] thanks. {quote}iii) This project is under maintenance phase. We have no plan to move it into HDFS or somewhere as subproject, or make it become an apache incubation project. {quote} OK. I'm not sure whether it can be moved to Apache incubation or not. I am in favour of moving it to incubation or a subproject. Let's see if anybody else has any other thoughts on this. > HDFS smart storage management > - > > Key: HDFS-7343 > URL: https://issues.apache.org/jira/browse/HDFS-7343 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kai Zheng >Assignee: Wei Zhou >Priority: Major > Attachments: HDFS-Smart-Storage-Management-update.pdf, > HDFS-Smart-Storage-Management.pdf, > HDFSSmartStorageManagement-General-20170315.pdf, > HDFSSmartStorageManagement-Phase1-20170315.pdf, access_count_tables.jpg, > move.jpg, tables_in_ssm.xlsx > > > As discussed in HDFS-7285, it would be better to have a comprehensive and > flexible storage policy engine considering file attributes, metadata, data > temperature, storage type, EC codec, available hardware capabilities, > user/application preferences, etc. > Modified the title to re-purpose this issue. > We'd extend this effort a bit and aim to work on a comprehensive solution > to provide a smart storage management service for convenient, > intelligent and effective use of erasure coding or replicas, the HDFS cache > facility, HSM offerings, and all kinds of tools (balancer, mover, disk > balancer and so on) in a large cluster.
[jira] [Updated] (HDFS-16652) Upgrade jquery datatable version references to v1.10.19
[ https://issues.apache.org/jira/browse/HDFS-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-16652: Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) I am closing this as it's already merged to trunk. I can cherry-pick to other branches too. Please let me know which branches; only branch-3.3? > Upgrade jquery datatable version references to v1.10.19 > --- > > Key: HDFS-16652 > URL: https://issues.apache.org/jira/browse/HDFS-16652 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-16652.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > Upgrade jquery datatable version references in hdfs webapp to v1.10.19
[jira] [Commented] (HDFS-16860) Upgrade moment.min.js to 2.29.4
[ https://issues.apache.org/jira/browse/HDFS-16860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17645082#comment-17645082 ] Brahma Reddy Battula commented on HDFS-16860: - [~anuragparvatikar] added you as contributor. Please feel free to assign. > Upgrade moment.min.js to 2.29.4 > --- > > Key: HDFS-16860 > URL: https://issues.apache.org/jira/browse/HDFS-16860 > Project: Hadoop HDFS > Issue Type: Improvement > Components: build, ui >Affects Versions: 3.4.0 >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Labels: pull-request-available, transitive-cve > > Upgrade moment.min.js to 2.29.4 to resolve > https://nvd.nist.gov/vuln/detail/CVE-2022-31129 > "Users may notice a noticeable slowdown is observed with inputs above 10k > characters. Users who pass user-provided strings without sanity length checks > to moment constructor are vulnerable to (Re)DoS attacks. The problem is > patched in 2.29.4" > this only appears to affect the UI, not the yarn services, so it is a > self-harm DoS rather than anything important. "if you pass in big strings the > ui slows down" -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7343) HDFS smart storage management
[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641615#comment-17641615 ] Brahma Reddy Battula commented on HDFS-7343: {quote}Hi Brahma, currently we have no plan to merge this feature to upstream. We have a repo to maintain this project. See [https://github.com/Intel-bigdata/SSM] {quote} OK, thanks. [~zhouwei]/[~PhiloHe] i) Are any features pending? Is it production ready? ii) Are Kafka and ZK required to deploy this? iii) Any chance of moving this to Apache as a subproject or an incubation project? > HDFS smart storage management > - > > Key: HDFS-7343 > URL: https://issues.apache.org/jira/browse/HDFS-7343 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kai Zheng >Assignee: Wei Zhou >Priority: Major > Attachments: HDFS-Smart-Storage-Management-update.pdf, > HDFS-Smart-Storage-Management.pdf, > HDFSSmartStorageManagement-General-20170315.pdf, > HDFSSmartStorageManagement-Phase1-20170315.pdf, access_count_tables.jpg, > move.jpg, tables_in_ssm.xlsx > > > As discussed in HDFS-7285, it would be better to have a comprehensive and > flexible storage policy engine considering file attributes, metadata, data > temperature, storage type, EC codec, available hardware capabilities, > user/application preferences, etc. > Modified the title to re-purpose this issue. > We'd extend this effort a bit and aim to work on a comprehensive solution > to provide a smart storage management service for convenient, > intelligent and effective use of erasure coding or replicas, the HDFS cache > facility, HSM offerings, and all kinds of tools (balancer, mover, disk > balancer and so on) in a large cluster.
[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611945#comment-17611945 ] Brahma Reddy Battula commented on HDFS-14509: - If I remember correctly, yes, you need to have this patch before you upgrade. > DN throws InvalidToken due to inequality of password when upgrade NN 2.x to > 3.x > --- > > Key: HDFS-14509 > URL: https://issues.apache.org/jira/browse/HDFS-14509 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yuxuan Wang >Assignee: Yuxuan Wang >Priority: Blocker > Labels: release-blocker > Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-14509-001.patch, HDFS-14509-002.patch, > HDFS-14509-003.patch, HDFS-14509-branch-2.001.patch > > > According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need > to upgrade the NN first. And there will be an intermediate state where the NN is 3.x and > the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get > a block token from the NN and then deliver the token to the DN, which verifies the > token. But the verification in the code now is: > {code:title=BlockTokenSecretManager.java|borderStyle=solid} > public void checkAccess(...) > { > ... > id.readFields(new DataInputStream(new > ByteArrayInputStream(token.getIdentifier()))); > ... > if (!Arrays.equals(retrievePassword(id), token.getPassword())) { > throw new InvalidToken("Block token with " + id.toString() > + " doesn't have the correct token password"); > } > } > {code} > And {{retrievePassword(id)}} is: > {code} > public byte[] retrievePassword(BlockTokenIdentifier identifier) > { > ... > return createPassword(identifier.getBytes(), key.getKey()); > } > {code} > So, if the NN's identifier adds new fields, the DN will lose those fields and compute > the wrong password.
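The password mismatch described in HDFS-14509 can be demonstrated in isolation: the password is a MAC over the serialized identifier bytes, so if the NN serializes extra fields the DN does not know about, the DN recomputes a different password. The sketch below assumes an HMAC (Hadoop's SecretManager uses HmacSHA1 by default, but treat the exact algorithm here as illustrative) and uses made-up identifier strings rather than real BlockTokenIdentifier serialization:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: shows why a token password computed over a 3.x
// identifier (with extra fields) never matches one computed over the
// truncated identifier a 2.x DN reconstructs.
public class TokenPasswordSketch {
  static byte[] password(byte[] identifierBytes, byte[] key) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(key, "HmacSHA1"));
    return mac.doFinal(identifierBytes);  // MAC over the identifier bytes
  }

  public static void main(String[] args) throws Exception {
    byte[] key = "block-key".getBytes(StandardCharsets.UTF_8);
    // Hypothetical serialized identifiers: the NN writes an extra field.
    byte[] nnIdentifier = "user,blk_1,EXTRA_FIELD".getBytes(StandardCharsets.UTF_8);
    byte[] dnIdentifier = "user,blk_1".getBytes(StandardCharsets.UTF_8); // field lost on 2.x DN
    System.out.println(Arrays.equals(
        password(nnIdentifier, key), password(dnIdentifier, key))); // false
  }
}
```

Hence the InvalidToken error during a rolling 2.x-to-3.x upgrade, and why the patch must be in place on DNs before the NN starts serializing the new fields.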
[jira] [Comment Edited] (HDFS-16652) Upgrade jquery datatable version references to v1.10.19
[ https://issues.apache.org/jira/browse/HDFS-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566827#comment-17566827 ] Brahma Reddy Battula edited comment on HDFS-16652 at 7/14/22 1:00 PM: -- [~dmmkr] thanks for contributing . Committed to trunk (PR #4562). Can you update PR for branch-3.2 and branch-3.3 also..? was (Author: brahmareddy): [~dmmkr] thanks for contributing . Committed to trunk. Can you update PR for branch-3.2 and branch-3.3 also..? > Upgrade jquery datatable version references to v1.10.19 > --- > > Key: HDFS-16652 > URL: https://issues.apache.org/jira/browse/HDFS-16652 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-16652.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > Upgrade jquery datatable version references in hdfs webapp to v1.10.19 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16652) Upgrade jquery datatable version references to v1.10.19
[ https://issues.apache.org/jira/browse/HDFS-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-16652: Fix Version/s: 3.4.0 > Upgrade jquery datatable version references to v1.10.19 > --- > > Key: HDFS-16652 > URL: https://issues.apache.org/jira/browse/HDFS-16652 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-16652.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > Upgrade jquery datatable version references in hdfs webapp to v1.10.19 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16652) Upgrade jquery datatable version references to v1.10.19
[ https://issues.apache.org/jira/browse/HDFS-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566827#comment-17566827 ] Brahma Reddy Battula commented on HDFS-16652: - [~dmmkr] thanks for contributing . Committed to trunk. Can you update PR for branch-3.2 and branch-3.3 also..? > Upgrade jquery datatable version references to v1.10.19 > --- > > Key: HDFS-16652 > URL: https://issues.apache.org/jira/browse/HDFS-16652 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Labels: pull-request-available > Attachments: HDFS-16652.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > Upgrade jquery datatable version references in hdfs webapp to v1.10.19 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15443) Setting dfs.datanode.max.transfer.threads to a very small value can cause strange failure.
[ https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529787#comment-17529787 ] Brahma Reddy Battula commented on HDFS-15443: - Basically this config specifies the maximum number of files that a DataNode can serve at any one time. Yes, it depends on CPU (for checksum calculations), the number of open file descriptors, and the memory available for the thread stacks (and even on the network bandwidth). {quote}are at 20k today and have 200TB of usable space each server, want to see if we can increase to 32k. {quote} If the above-mentioned resources are sufficient and you are not reaching the expected throughput, you can increase it. But IMO, 20k itself is a big number. > Setting dfs.datanode.max.transfer.threads to a very small value can cause > strange failure. > -- > > Key: HDFS-15443 > URL: https://issues.apache.org/jira/browse/HDFS-15443 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: AMC-team >Assignee: AMC-team >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, > HDFS-15443.002.patch, HDFS-15443.003.patch > > > Configuration parameter dfs.datanode.max.transfer.threads specifies the > maximum number of threads to use for transferring data in and out of the DN. > This is a vital param that needs to be tuned carefully. > {code:java} > // DataXceiverServer.java > // Make sure the xceiver count is not exceeded > int curXceiverCount = datanode.getXceiverCount(); > if (curXceiverCount > maxXceiverCount) { > throw new IOException("Xceiver count " + curXceiverCount > + " exceeds the limit of concurrent xceivers: " > + maxXceiverCount); > } > {code} > There are many issues caused by not setting this param to an appropriate > value. However, there is no check code to restrict the parameter. 
> Although having a hard-and-fast rule is difficult because we need to consider > the number of cores, main memory etc., *we can prevent users from setting this > value to an absolutely wrong value by accident.* (e.g. a negative value that > totally breaks the availability of the datanode.) > *How to fix:* > Add proper check code for the parameter.
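The "add proper check code" suggestion above amounts to rejecting obviously wrong values at startup. This is a hypothetical sketch of such a guard, not the actual HDFS-15443 patch; the class and method names are invented for illustration:

```java
// Hypothetical startup validation for dfs.datanode.max.transfer.threads:
// a negative or zero value would make every transfer fail the xceiver
// count check, so reject it with a clear message instead.
public class TransferThreadsCheck {
  static final String KEY = "dfs.datanode.max.transfer.threads";

  /** Returns the value unchanged if valid, otherwise fails fast. */
  static int validate(int maxTransferThreads) {
    if (maxTransferThreads <= 0) {
      throw new IllegalArgumentException(
          KEY + " must be positive, got " + maxTransferThreads);
    }
    return maxTransferThreads;
  }
}
```

Failing fast at configuration-load time turns a "strange failure" at transfer time into an immediate, self-explanatory startup error.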
[jira] [Commented] (HDFS-16562) Upgrade moment.min.js to 2.29.2
[ https://issues.apache.org/jira/browse/HDFS-16562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528948#comment-17528948 ] Brahma Reddy Battula commented on HDFS-16562: - [~dmmkr] thanks for reporting it. Added you to the contributor list. IMO, as the score is high for this one, it's better to fix it. Can you install a cluster with this change and verify it? > Upgrade moment.min.js to 2.29.2 > --- > > Key: HDFS-16562 > URL: https://issues.apache.org/jira/browse/HDFS-16562 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Upgrade moment.min.js to 2.29.2 to resolve > [https://nvd.nist.gov/vuln/detail/CVE-2022-24785]
[jira] [Commented] (HDFS-16364) Remove unnecessary brackets in NameNodeRpcServer#L453
[ https://issues.apache.org/jira/browse/HDFS-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452922#comment-17452922 ] Brahma Reddy Battula commented on HDFS-16364: - Committed to trunk. [~wangzhaohui] thanks for the contribution. > Remove unnecessary brackets in NameNodeRpcServer#L453 > - > > Key: HDFS-16364 > URL: https://issues.apache.org/jira/browse/HDFS-16364 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: wangzhaohui >Assignee: wangzhaohui >Priority: Trivial > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h >
[jira] [Resolved] (HDFS-16364) Remove unnecessary brackets in NameNodeRpcServer#L453
[ https://issues.apache.org/jira/browse/HDFS-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula resolved HDFS-16364. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > Remove unnecessary brackets in NameNodeRpcServer#L453 > - > > Key: HDFS-16364 > URL: https://issues.apache.org/jira/browse/HDFS-16364 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: wangzhaohui >Assignee: wangzhaohui >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16317) Backport HDFS-14729 for branch-3.2
[ https://issues.apache.org/jira/browse/HDFS-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451099#comment-17451099 ] Brahma Reddy Battula commented on HDFS-16317: - [~ananysin] thanks for contribution.. Committed to branch-3.2.. Can you upload patch for branch-3.2.3 also..? > Backport HDFS-14729 for branch-3.2 > -- > > Key: HDFS-16317 > URL: https://issues.apache.org/jira/browse/HDFS-16317 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 3.2.2 >Reporter: Ananya Singh >Assignee: Ananya Singh >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Our security tool raised the following security flaw on Hadoop 3.2.2: > +[CVE-2015-9251 : > |http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2015-9251] > [https://nvd.nist.gov/vuln/detail/|https://nvd.nist.gov/vuln/detail/CVE-2021-29425] > > [CVE-2015-9251|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2015-9251]+ > +[CVE-2019-11358|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2019-11358] > : > [https://nvd.nist.gov/vuln/detail/|https://nvd.nist.gov/vuln/detail/CVE-2021-29425] > > [CVE-2019-11358|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2019-11358]+ > +[CVE-2020-11022 > |http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2020-11022] : > [https://nvd.nist.gov/vuln/detail/|https://nvd.nist.gov/vuln/detail/CVE-2021-29425] > > [CVE-2020-11022|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2020-11022]+ > > +[CVE-2020-11023 > |http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2020-11023] [ > |http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2020-11022] : > [https://nvd.nist.gov/vuln/detail/|https://nvd.nist.gov/vuln/detail/CVE-2021-29425] > > [CVE-2020-11023|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2020-11023]+ > > > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, 
e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14729) Upgrade Bootstrap and jQuery versions used in HDFS UIs
[ https://issues.apache.org/jira/browse/HDFS-14729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17442246#comment-17442246 ] Brahma Reddy Battula commented on HDFS-14729: - There are conflicts while cherry-picking, so it's better to raise another Jira for the backport. [~ananysin] can you please raise a Jira for the backport? > Upgrade Bootstrap and jQuery versions used in HDFS UIs > -- > > Key: HDFS-14729 > URL: https://issues.apache.org/jira/browse/HDFS-14729 > Project: Hadoop HDFS > Issue Type: Task > Components: ui >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14729.v1.patch > > > The current versions of bootstrap and jquery have multiple medium severity > CVEs reported to date and need to be updated to the latest versions with > no reported CVEs. > > I suggest updating the following libraries: > ||Library||From version||To version|| > |Bootstrap|3.3.7|3.4.1| > |jQuery|3.3.1|3.4.1|
[jira] [Commented] (HDFS-14729) Upgrade Bootstrap and jQuery versions used in HDFS UIs
[ https://issues.apache.org/jira/browse/HDFS-14729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17442238#comment-17442238 ] Brahma Reddy Battula commented on HDFS-14729: - going to cherry-pick this to branch-3.2 > Upgrade Bootstrap and jQuery versions used in HDFS UIs > -- > > Key: HDFS-14729 > URL: https://issues.apache.org/jira/browse/HDFS-14729 > Project: Hadoop HDFS > Issue Type: Task > Components: ui >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14729.v1.patch > > > The current versions of bootstrap and jquery have multiple medium severity > CVEs reported till date and needs to be updated to the latest versions with > no reported CVEs. > > I suggest updating the following libraries: > ||Library||From version||To version|| > |Bootstrap|3.3.7|3.4.1| > |jQuery|3.3.1|3.4.1| -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14729) Upgrade Bootstrap and jQuery versions used in HDFS UIs
[ https://issues.apache.org/jira/browse/HDFS-14729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425094#comment-17425094 ] Brahma Reddy Battula commented on HDFS-14729: - {quote}Can we backport this to branch-3.2? {quote} Sure.. can we raise one Jira for backport..? CC. [~vivekratnavel] and [~sunilg] > Upgrade Bootstrap and jQuery versions used in HDFS UIs > -- > > Key: HDFS-14729 > URL: https://issues.apache.org/jira/browse/HDFS-14729 > Project: Hadoop HDFS > Issue Type: Task > Components: ui >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14729.v1.patch > > > The current versions of bootstrap and jquery have multiple medium severity > CVEs reported till date and needs to be updated to the latest versions with > no reported CVEs. > > I suggest updating the following libraries: > ||Library||From version||To version|| > |Bootstrap|3.3.7|3.4.1| > |jQuery|3.3.1|3.4.1| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-15160: Fix Version/s: (was: 3.2.3) 3.2.4 > ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl > methods should use datanode readlock > --- > > Key: HDFS-15160 > URL: https://issues.apache.org/jira/browse/HDFS-15160 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.4 > > Attachments: HDFS-15160-branch-3.3-001.patch, HDFS-15160.001.patch, > HDFS-15160.002.patch, HDFS-15160.003.patch, HDFS-15160.004.patch, > HDFS-15160.005.patch, HDFS-15160.006.patch, HDFS-15160.007.patch, > HDFS-15160.008.patch, HDFS-15160.branch-3-3.001.patch, > image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png > > Time Spent: 6h 20m > Remaining Estimate: 0h > > Now we have HDFS-15150, we can start to move some DN operations to use the > read lock rather than the write lock to improve concurrence. The first step > is to make the changes to ReplicaMap, as many other methods make calls to it. > This Jira switches read operations against the volume map to use the readLock > rather than the write lock. > Additionally, some methods make a call to replicaMap.replicas() (eg > getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result > in a read only fashion, so they can also be switched to using a readLock. > Next is the directory scanner and disk balancer, which only require a read > lock. > Finally (for this Jira) are various "low hanging fruit" items in BlockSender > and fsdatasetImpl where is it fairly obvious they only need a read lock. > For now, I have avoided changing anything which looks too risky, as I think > its better to do any larger refactoring or risky changes each in their own > Jira. 
[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414558#comment-17414558 ] Brahma Reddy Battula commented on HDFS-15160: - Looks this merged to 3.2.4 not 3.2.3, are you planning to cherry-pick this commit..? But better to have different commits, dn't sqaush it..? > ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl > methods should use datanode readlock > --- > > Key: HDFS-15160 > URL: https://issues.apache.org/jira/browse/HDFS-15160 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.4 > > Attachments: HDFS-15160-branch-3.3-001.patch, HDFS-15160.001.patch, > HDFS-15160.002.patch, HDFS-15160.003.patch, HDFS-15160.004.patch, > HDFS-15160.005.patch, HDFS-15160.006.patch, HDFS-15160.007.patch, > HDFS-15160.008.patch, HDFS-15160.branch-3-3.001.patch, > image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png > > Time Spent: 6h 20m > Remaining Estimate: 0h > > Now we have HDFS-15150, we can start to move some DN operations to use the > read lock rather than the write lock to improve concurrence. The first step > is to make the changes to ReplicaMap, as many other methods make calls to it. > This Jira switches read operations against the volume map to use the readLock > rather than the write lock. > Additionally, some methods make a call to replicaMap.replicas() (eg > getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result > in a read only fashion, so they can also be switched to using a readLock. > Next is the directory scanner and disk balancer, which only require a read > lock. > Finally (for this Jira) are various "low hanging fruit" items in BlockSender > and fsdatasetImpl where is it fairly obvious they only need a read lock. 
> For now, I have avoided changing anything which looks too risky, as I think > it's better to do any larger refactoring or risky changes each in their own > Jira. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
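The read/write lock split described in HDFS-15160 above follows the standard java.util.concurrent pattern. As a minimal sketch only, with an illustrative class (this is not the actual ReplicaMap/FsDatasetImpl code): read-only lookups take the shared read lock so they can run concurrently, while mutations still take the exclusive write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative stand-in for a replica map guarded by a read/write lock.
// Read-only operations (lookups, scans for block reports) take the read
// lock so many can run at once; mutations still take the write lock.
public class ReplicaMapSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Long, String> replicas = new HashMap<>();

    public String get(long blockId) {
        lock.readLock().lock();          // shared: concurrent readers allowed
        try {
            return replicas.get(blockId);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void add(long blockId, String replica) {
        lock.writeLock().lock();         // exclusive: blocks readers and writers
        try {
            replicas.put(blockId, replica);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The same shape applies to the "read the result of replicas() in a read-only fashion" cases mentioned in the description: the scan happens entirely under the read lock.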
[jira] [Updated] (HDFS-9266) Avoid unsafe split and append on fields that might be IPv6 literals
[ https://issues.apache.org/jira/browse/HDFS-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-9266: --- Fix Version/s: HADOOP-17800 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) Committed to the HADOOP-17800 branch. [~newanja] and [~hemanthboyina], thanks for your contribution. > Avoid unsafe split and append on fields that might be IPv6 literals > --- > > Key: HDFS-9266 > URL: https://issues.apache.org/jira/browse/HDFS-9266 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Nemanja Matkovic >Assignee: Nemanja Matkovic >Priority: Major > Labels: ipv6 > Fix For: HADOOP-17800 > > Attachments: HDFS-9266-HADOOP-11890.1.patch, > HDFS-9266-HADOOP-11890.2.patch, HDFS-9266-HADOOP-17800.001.patch, > HDFS-9266-HADOOP-17800.002.patch > > Original Estimate: 48h > Remaining Estimate: 48h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
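The unsafe split the HDFS-9266 title refers to is the classic host:port pitfall: an IPv6 literal itself contains colons, so a naive split(":") mis-parses it. A small illustrative sketch follows; the helper name is hypothetical, and real code would typically rely on bracket-aware parsing such as Guava's HostAndPort rather than hand-rolling it.

```java
// Sketch of why a naive split(":") is unsafe for IPv6: the literal itself
// contains colons, so host:port must be split on the *last* colon, and
// bracketed literals like "[::1]" must have their brackets stripped.
public class HostPortSketch {
    static String[] splitHostPort(String addr) {
        int i = addr.lastIndexOf(':');           // last colon separates the port
        String host = addr.substring(0, i);
        String port = addr.substring(i + 1);
        if (host.startsWith("[") && host.endsWith("]")) {
            host = host.substring(1, host.length() - 1); // strip IPv6 brackets
        }
        return new String[] { host, port };
    }
}
```

By contrast, `"[::1]:8020".split(":")` yields four fragments, none of which is the host.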
[jira] [Updated] (HDFS-16147) load fsimage with parallelization and compression
[ https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-16147: Fix Version/s: (was: 3.3.0) > load fsimage with parallelization and compression > - > > Key: HDFS-16147 > URL: https://issues.apache.org/jira/browse/HDFS-16147 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namanode >Affects Versions: 3.3.0 >Reporter: liuyongpan >Priority: Minor > Attachments: HDFS-16147.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16147) load fsimage with parallelization and compression
[ https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-16147: Target Version/s: (was: 3.3.0) > load fsimage with parallelization and compression > - > > Key: HDFS-16147 > URL: https://issues.apache.org/jira/browse/HDFS-16147 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namanode >Affects Versions: 3.3.0 >Reporter: liuyongpan >Priority: Minor > Attachments: HDFS-16147.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12920) HDFS default value change (with adding time unit) breaks old version MR tarball work with Hadoop 3.x
[ https://issues.apache.org/jira/browse/HDFS-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387210#comment-17387210 ] Brahma Reddy Battula commented on HDFS-12920: - I mean, e.g., the 3.2.2 and 3.2.3 hdfs-default.xml can be different (if anybody writes a script or test case against it, that might fail). > HDFS default value change (with adding time unit) breaks old version MR > tarball work with Hadoop 3.x > > > Key: HDFS-12920 > URL: https://issues.apache.org/jira/browse/HDFS-12920 > Project: Hadoop HDFS > Issue Type: Bug > Components: configuration, hdfs >Reporter: Junping Du >Assignee: Akira Ajisaka >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 40m > Remaining Estimate: 0h > > After HADOOP-15059 got resolved, I tried to deploy the 2.9.0 tarball with 3.0.0 > RC1, and ran the job with the following errors: > {noformat} > 2017-12-12 13:29:06,824 INFO [main] > org.apache.hadoop.service.AbstractService: Service > org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NumberFormatException: For input string: "30s" > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NumberFormatException: For input string: "30s" > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:542) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:522) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1764) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:522) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:308) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1722) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1719) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1650) > {noformat} > This is because HDFS-10845, we are adding time unit to hdfs-default.xml but > it cannot be recognized by old version MR jars. > This break our rolling upgrade story, so should mark as blocker. > A quick workaround is to add values in hdfs-site.xml with removing all time > unit. But the right way may be to revert HDFS-10845 (and get rid of noisy > warnings). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12920) HDFS default value change (with adding time unit) breaks old version MR tarball work with Hadoop 3.x
[ https://issues.apache.org/jira/browse/HDFS-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387055#comment-17387055 ] Brahma Reddy Battula commented on HDFS-12920: - I think reverting HDFS-10845 would again be incompatible between 3.x (as it was committed in 3.0) and these new releases you committed to? If you want to revert, you might need to do it only in trunk? > HDFS default value change (with adding time unit) breaks old version MR > tarball work with Hadoop 3.x > > > Key: HDFS-12920 > URL: https://issues.apache.org/jira/browse/HDFS-12920 > Project: Hadoop HDFS > Issue Type: Bug > Components: configuration, hdfs >Reporter: Junping Du >Assignee: Akira Ajisaka >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 40m > Remaining Estimate: 0h > > After HADOOP-15059 got resolved, I tried to deploy the 2.9.0 tarball with 3.0.0 > RC1, and ran the job with the following errors: > {noformat} > 2017-12-12 13:29:06,824 INFO [main] > org.apache.hadoop.service.AbstractService: Service > org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NumberFormatException: For input string: "30s" > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NumberFormatException: For input string: "30s" > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:542) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:522) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1764) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:522) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:308) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1722) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1719) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1650) > {noformat} > This is because HDFS-10845, we are adding time unit to hdfs-default.xml but > it cannot be recognized by old version MR jars. > This break our rolling upgrade story, so should mark as blocker. > A quick workaround is to add values in hdfs-site.xml with removing all time > unit. But the right way may be to revert HDFS-10845 (and get rid of noisy > warnings). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12920) HDFS default value change (with adding time unit) breaks old version MR tarball work with new version (3.0) of hadoop
[ https://issues.apache.org/jira/browse/HDFS-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386179#comment-17386179 ] Brahma Reddy Battula commented on HDFS-12920: - [~jlowe] mentioned that this might not happen in a normal upgrade process, as the configs and jars come from the same tarball. And this shouldn't be a blocker for the 3.2.3 release at least, as we have shipped a couple of releases with this Jira open. > HDFS default value change (with adding time unit) breaks old version MR > tarball work with new version (3.0) of hadoop > - > > Key: HDFS-12920 > URL: https://issues.apache.org/jira/browse/HDFS-12920 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Junping Du >Priority: Blocker > > After HADOOP-15059 got resolved, I tried to deploy the 2.9.0 tarball with 3.0.0 > RC1, and ran the job with the following errors: > {noformat} > 2017-12-12 13:29:06,824 INFO [main] > org.apache.hadoop.service.AbstractService: Service > org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NumberFormatException: For input string: "30s" > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NumberFormatException: For input string: "30s" > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:542) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:522) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1764) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:522) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:308) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1722) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > 
at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1719) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1650) > {noformat} > This is because HDFS-10845, we are adding time unit to hdfs-default.xml but > it cannot be recognized by old version MR jars. > This break our rolling upgrade story, so should mark as blocker. > A quick workaround is to add values in hdfs-site.xml with removing all time > unit. But the right way may be to revert HDFS-10845 (and get rid of noisy > warnings). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
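The NumberFormatException above comes from an old client parsing the new suffixed default ("30s") with a bare Long.parseLong, while newer code understands unit suffixes via Configuration.getTimeDuration. A simplified illustration follows; this parser is a rough stand-in for what getTimeDuration does (the real Hadoop parser supports ns/us/ms/s/m/h/d), not the actual implementation.

```java
// Simplified illustration: why "30s" breaks an old client that expects a
// bare number, while a suffix-aware parser handles it.
public class TimeDurationSketch {
    // Rough stand-in for a seconds-suffix-aware parser; the real
    // Configuration.getTimeDuration supports many more suffixes.
    public static long parseMillis(String value) {
        if (value.endsWith("s") && !value.endsWith("ms")) {
            return Long.parseLong(value.substring(0, value.length() - 1)) * 1000L;
        }
        return Long.parseLong(value); // assume milliseconds if no suffix
    }

    public static boolean oldParserAccepts(String value) {
        try {
            Long.parseLong(value);    // effectively what an old 2.x client does
            return true;
        } catch (NumberFormatException e) {
            return false;             // -> For input string: "30s"
        }
    }
}
```

This is why the workaround in the description (rewriting the values in hdfs-site.xml without units) restores compatibility: both parsers accept a bare number.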
[jira] [Commented] (HDFS-16044) getListing call getLocatedBlocks even source is a directory
[ https://issues.apache.org/jira/browse/HDFS-16044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356659#comment-17356659 ] Brahma Reddy Battula commented on HDFS-16044: - Nice catch, [~pilchard]! [~weichiu], if there is any plan to re-spin the 3.3.1 RC, this would be a good candidate to include. > getListing call getLocatedBlocks even source is a directory > --- > > Key: HDFS-16044 > URL: https://issues.apache.org/jira/browse/HDFS-16044 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.1.1 >Reporter: ludun >Assignee: ludun >Priority: Major > Attachments: HDFS-16044.00.patch > > > In our production cluster, getListing is called very frequently and the > processing time of the RPC request is very high, so we tried to optimize the > performance of the getListing request. > After some checking, we found that even when the source and its children are > directories, the getListing request still calls getLocatedBlocks. > The request is as follows, with needLocation false: > {code:java} > 2021-05-27 15:16:07,093 TRACE ipc.ProtobufRpcEngine: 1: Call -> > 8-5-231-4/8.5.231.4:25000: getListing {src: > "/data/connector/test/topics/102test" startAfter: "" needLocation: false} > {code} > but the getListing request called getLocatedBlocks 1000 times, which is not needed. 
> {code:java} > `---ts=2021-05-27 14:19:15;thread_name=IPC Server handler 86 on > 25000;id=e6;is_daemon=true;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@5fcfe4b2 > `---[35.068532ms] > org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp:getListing() > +---[0.003542ms] > org.apache.hadoop.hdfs.server.namenode.INodesInPath:getPathComponents() #214 > +---[0.003053ms] > org.apache.hadoop.hdfs.server.namenode.FSDirectory:isExactReservedName() #95 > +---[0.002938ms] > org.apache.hadoop.hdfs.server.namenode.FSDirectory:readLock() #218 > +---[0.00252ms] > org.apache.hadoop.hdfs.server.namenode.INodesInPath:isDotSnapshotDir() #220 > +---[0.002788ms] > org.apache.hadoop.hdfs.server.namenode.INodesInPath:getPathSnapshotId() #223 > +---[0.002905ms] > org.apache.hadoop.hdfs.server.namenode.INodesInPath:getLastINode() #224 > +---[0.002785ms] > org.apache.hadoop.hdfs.server.namenode.INode:getStoragePolicyID() #230 > +---[0.002236ms] > org.apache.hadoop.hdfs.server.namenode.INode:isDirectory() #233 > +---[0.002919ms] > org.apache.hadoop.hdfs.server.namenode.INode:asDirectory() #242 > +---[0.003408ms] > org.apache.hadoop.hdfs.server.namenode.INodeDirectory:getChildrenList() #243 > +---[0.005942ms] > org.apache.hadoop.hdfs.server.namenode.INodeDirectory:nextChild() #244 > +---[0.002467ms] org.apache.hadoop.hdfs.util.ReadOnlyList:size() #245 > +---[0.005481ms] > org.apache.hadoop.hdfs.server.namenode.FSDirectory:getLsLimit() #247 > +---[0.002176ms] > org.apache.hadoop.hdfs.server.namenode.FSDirectory:getLsLimit() #248 > +---[min=0.00211ms,max=0.005157ms,total=2.247572ms,count=1000] > org.apache.hadoop.hdfs.util.ReadOnlyList:get() #252 > +---[min=0.001946ms,max=0.005411ms,total=2.041715ms,count=1000] > org.apache.hadoop.hdfs.server.namenode.INode:isSymlink() #253 > +---[min=0.002176ms,max=0.005426ms,total=2.264472ms,count=1000] > org.apache.hadoop.hdfs.server.namenode.INode:getLocalStoragePolicyID() #254 > +---[min=0.002251ms,max=0.006849ms,total=2.351935ms,count=1000] > 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp:getStoragePolicyID() > #95 > +---[min=0.006091ms,max=0.012333ms,total=6.439434ms,count=1000] > org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp:createFileStatus() > #257 > +---[min=0.00269ms,max=0.004995ms,total=2.788194ms,count=1000] > org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus:getLocatedBlocks() #265 > +---[0.003234ms] > org.apache.hadoop.hdfs.protocol.DirectoryListing:() #274 > `---[0.002457ms] > org.apache.hadoop.hdfs.server.namenode.FSDirectory:readUnlock() #277 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
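The fix implied by the trace is to consult the needLocation flag (and whether the child is a directory) before doing the expensive per-child location lookup. A hypothetical sketch of that control flow; the names only loosely mirror FSDirStatAndListingOp and are not the actual NameNode code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: only pay for block locations when the caller asked
// for them, and never for directories, which have no located blocks.
public class ListingSketch {
    static int locatedBlockCalls = 0; // instrumentation for the sketch

    static String createFileStatus(String name, boolean isDir, boolean needLocation) {
        if (needLocation && !isDir) {
            locatedBlockCalls++;      // stands in for the expensive getLocatedBlocks()
        }
        return name;
    }

    static List<String> getListing(List<String> children, boolean needLocation) {
        List<String> out = new ArrayList<>();
        for (String child : children) {
            boolean isDir = child.endsWith("/"); // toy convention for the sketch
            out.add(createFileStatus(child, isDir, needLocation));
        }
        return out;
    }
}
```

With the guard in place, a needLocation:false listing of 1000 children does zero location lookups instead of 1000.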
[jira] [Updated] (HDFS-15222) Correct the "hdfs fsck -list-corruptfileblocks" command output
[ https://issues.apache.org/jira/browse/HDFS-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-15222: Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. [~Sushma_28], thanks for the contribution, and [~SouryakantaDwivedy], thanks for reporting. > Correct the "hdfs fsck -list-corruptfileblocks" command output > --- > > Key: HDFS-15222 > URL: https://issues.apache.org/jira/browse/HDFS-15222 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, tools >Affects Versions: 3.1.1 > Environment: 3 node HA cluster >Reporter: Souryakanta Dwivedy >Assignee: Ravuri Sushma sree >Priority: Minor > Fix For: 3.4.0 > > Attachments: HDFS-15222.001.patch, HDFS-15222.002.patch, > HDFS-15222.003.patch, output1.PNG, output2.PNG > > > Output message of the "hdfs fsck -list-corruptfileblocks" command is not correct > > Steps: > * Create a directory and put files - > * Corrupt the file blocks > * check the corrupted file blocks with "hdfs fsck -list-corruptfileblocks" > command > It will display corrupted file blocks with message as "The list of corrupt > files under path '/path' are:" at the beginning which is wrong. > And at the end of output also the wrong message will display as "The > filesystem under path '/path' has CORRUPT files" > > Actual output : "The list of corrupt files under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT files" > Expected output : "The list of corrupted file blocks under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT file blocks" > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15222) Correct the "hdfs fsck -list-corruptfileblocks" command output
[ https://issues.apache.org/jira/browse/HDFS-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-15222: Summary: Correct the "hdfs fsck -list-corruptfileblocks" command output (was: HDFS: Output message of ""hdfs fsck -list-corruptfileblocks" command is not correct) > Correct the "hdfs fsck -list-corruptfileblocks" command output > --- > > Key: HDFS-15222 > URL: https://issues.apache.org/jira/browse/HDFS-15222 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, tools >Affects Versions: 3.1.1 > Environment: 3 node HA cluster >Reporter: Souryakanta Dwivedy >Assignee: Ravuri Sushma sree >Priority: Minor > Attachments: HDFS-15222.001.patch, HDFS-15222.002.patch, > HDFS-15222.003.patch, output1.PNG, output2.PNG > > > Output message of ""hdfs fsck -list-corruptfileblocks" command is not correct > > Steps :-Steps :- > * Create a directory and put files - > * Corrupt the file blocks > * check the corrupted file blocks with "hdfs fsck -list-corruptfileblocks" > command > It will display corrupted file blocks with message as "The list of corrupt > files under path '/path' are:" at the beginning which is wrong. > And at the end of output also the wrong message will display as "The > filesystem under path '/path' has CORRUPT files" > > Actual output : "The list of corrupt files under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT files" > Expected output : "The list of corrupted file blocks under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT file blocks" > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15222) HDFS: Output message of ""hdfs fsck -list-corruptfileblocks" command is not correct
[ https://issues.apache.org/jira/browse/HDFS-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313552#comment-17313552 ] Brahma Reddy Battula commented on HDFS-15222: - [~Sushma_28], thanks for uploading the patch. The latest patch LGTM, and the test failures look unrelated. > HDFS: Output message of ""hdfs fsck -list-corruptfileblocks" command is not > correct > --- > > Key: HDFS-15222 > URL: https://issues.apache.org/jira/browse/HDFS-15222 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, tools >Affects Versions: 3.1.1 > Environment: 3 node HA cluster >Reporter: Souryakanta Dwivedy >Assignee: Ravuri Sushma sree >Priority: Minor > Attachments: HDFS-15222.001.patch, HDFS-15222.002.patch, > HDFS-15222.003.patch, output1.PNG, output2.PNG > > > Output message of the "hdfs fsck -list-corruptfileblocks" command is not correct > > Steps: > * Create a directory and put files - > * Corrupt the file blocks > * check the corrupted file blocks with "hdfs fsck -list-corruptfileblocks" > command > It will display corrupted file blocks with message as "The list of corrupt > files under path '/path' are:" at the beginning which is wrong. > And at the end of output also the wrong message will display as "The > filesystem under path '/path' has CORRUPT files" > > Actual output : "The list of corrupt files under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT files" > Expected output : "The list of corrupted file blocks under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT file blocks" > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15735) NameNode memory Leak on frequent execution of fsck
[ https://issues.apache.org/jira/browse/HDFS-15735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313300#comment-17313300 ] Brahma Reddy Battula commented on HDFS-15735: - If there are no objections, can I commit this one this week? > NameNode memory Leak on frequent execution of fsck > > > Key: HDFS-15735 > URL: https://issues.apache.org/jira/browse/HDFS-15735 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-15735.001.patch > > > The memory of the cluster NameNode continues to grow, and the full gc > eventually leads to the failure of the active and standby HDFS > Htrace is used to track the processing time of fsck > Checking the code, it is found that the tracer object in NamenodeFsck.java was > only created but not closed; because of this the memory footprint continues to > grow -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15222) HDFS: Output message of ""hdfs fsck -list-corruptfileblocks" command is not correct
[ https://issues.apache.org/jira/browse/HDFS-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313299#comment-17313299 ] Brahma Reddy Battula commented on HDFS-15222: - [~Sushma_28] while committing this one, I thought one more update could make the message more meaningful. Can you please change the following {noformat} "The filesystem under path '/path' has CORRUPT file blocks"{noformat} to the following? {noformat} "The filesystem under path '/path' has CORRUPT blocks"{noformat} > HDFS: Output message of ""hdfs fsck -list-corruptfileblocks" command is not > correct > --- > > Key: HDFS-15222 > URL: https://issues.apache.org/jira/browse/HDFS-15222 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, tools >Affects Versions: 3.1.1 > Environment: 3 node HA cluster >Reporter: Souryakanta Dwivedy >Assignee: Ravuri Sushma sree >Priority: Minor > Attachments: HDFS-15222.001.patch, HDFS-15222.002.patch, output1.PNG, > output2.PNG > > > Output message of the "hdfs fsck -list-corruptfileblocks" command is not correct > > Steps: > * Create a directory and put files - > * Corrupt the file blocks > * check the corrupted file blocks with "hdfs fsck -list-corruptfileblocks" > command > It will display corrupted file blocks with message as "The list of corrupt > files under path '/path' are:" at the beginning which is wrong. > And at the end of output also the wrong message will display as "The > filesystem under path '/path' has CORRUPT files" > > Actual output : "The list of corrupt files under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT files" > Expected output : "The list of corrupted file blocks under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT file blocks" > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15494) TestReplicaCachingGetSpaceUsed#testReplicaCachingGetSpaceUsedByRBWReplica Fails on Windows
[ https://issues.apache.org/jira/browse/HDFS-15494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-15494: Fix Version/s: 3.4.0 3.3.1 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk and branch-3.3. [~Sushma_28] thanks for contribution. > TestReplicaCachingGetSpaceUsed#testReplicaCachingGetSpaceUsedByRBWReplica > Fails on Windows > -- > > Key: HDFS-15494 > URL: https://issues.apache.org/jira/browse/HDFS-15494 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15494.001.patch > > > TestReplicaCachingGetSpaceUsed #testReplicaCachingGetSpaceUsedByRBWReplica > Fails on Windows because when RBW should be renamed to Finalized, windows is > not supporting . > This should be skipped on Windows -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15494) TestReplicaCachingGetSpaceUsed#testReplicaCachingGetSpaceUsedByRBWReplica Fails on Windows
[ https://issues.apache.org/jira/browse/HDFS-15494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-15494: Summary: TestReplicaCachingGetSpaceUsed#testReplicaCachingGetSpaceUsedByRBWReplica Fails on Windows (was: TestReplicaCachingGetSpaceUsed #testReplicaCachingGetSpaceUsedByRBWReplica Fails on Windows) > TestReplicaCachingGetSpaceUsed#testReplicaCachingGetSpaceUsedByRBWReplica > Fails on Windows > -- > > Key: HDFS-15494 > URL: https://issues.apache.org/jira/browse/HDFS-15494 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-15494.001.patch > > > TestReplicaCachingGetSpaceUsed #testReplicaCachingGetSpaceUsedByRBWReplica > Fails on Windows because when RBW should be renamed to Finalized, windows is > not supporting . > This should be skipped on Windows -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
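Skipping a test on Windows, as HDFS-15494 does, is conventionally expressed as a JUnit assumption rather than letting the test fail. A generic sketch follows; this is not the actual Hadoop test code, and the class name is illustrative.

```java
// Generic sketch of the usual OS-based test guard (not the actual Hadoop
// test): compute the check from os.name, then skip rather than fail.
public class OsGuardSketch {
    static boolean isWindows(String osName) {
        return osName.toLowerCase().startsWith("windows");
    }
    // In a JUnit 4 test this becomes:
    //   org.junit.Assume.assumeTrue(!isWindows(System.getProperty("os.name")));
    // which reports the test as skipped on Windows instead of failed.
}
```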
[jira] [Commented] (HDFS-15932) Improve the balancer error message when process exits abnormally.
[ https://issues.apache.org/jira/browse/HDFS-15932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312028#comment-17312028 ] Brahma Reddy Battula commented on HDFS-15932: - [~prasad-acit] thanks for patch.. Patch lgtm. > Improve the balancer error message when process exits abnormally. > - > > Key: HDFS-15932 > URL: https://issues.apache.org/jira/browse/HDFS-15932 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Attachments: HDFS-15932.001.patch > > > The HDFS balancer exits abnormally. The content of the pid file is not > cleaned up, and the new balancer cannot be started. > Start the balancer (start-balancer.sh threshold 5) > Kill the balancer process (kill -9 ) > Re-execute the balancer, there will be an error message. > -- Balancer is running as process . Stop it first. > (But process already stopped, error message can be more detailed) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
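A friendlier startup check for the balancer would distinguish a live process from a stale pid file before printing "Stop it first". A rough sketch using ProcessHandle (Java 9+); the method names and message wording are illustrative, not the actual start-balancer.sh behavior.

```java
// Sketch: decide whether a pid file refers to a live process, so the
// startup error can say "stale pid file" instead of "Stop it first".
public class PidCheckSketch {
    static boolean isAlive(long pid) {
        // ProcessHandle.of returns empty if no such process exists.
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    static String describe(long pid, boolean alive) {
        return alive
            ? "Balancer is running as process " + pid + ". Stop it first."
            : "Found stale pid file for process " + pid + "; removing it and starting.";
    }
}
```

The shell wrapper would then remove the stale pid file and proceed instead of refusing to start.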
[jira] [Comment Edited] (HDFS-15222) HDFS: Output message of ""hdfs fsck -list-corruptfileblocks" command is not correct
[ https://issues.apache.org/jira/browse/HDFS-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283251#comment-17283251 ] Brahma Reddy Battula edited comment on HDFS-15222 at 2/11/21, 6:00 PM: --- [~Sushma_28] thanks for reporting. It makes sense to me. The patch LGTM; let Jenkins run on the latest code. This might go only in trunk, as it changes the output of the command and some test scripts might fail as they might validate the output. was (Author: brahmareddy): [~Sushma_28] thanks for reporting.. It's make sense me.. The patch LGTM, let jenkins run on latest code. this might go only in trunk, as this change the output of the command some test scripts might file if they validate the output. > HDFS: Output message of ""hdfs fsck -list-corruptfileblocks" command is not > correct > --- > > Key: HDFS-15222 > URL: https://issues.apache.org/jira/browse/HDFS-15222 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, tools >Affects Versions: 3.1.1 > Environment: 3 node HA cluster >Reporter: Souryakanta Dwivedy >Assignee: Ravuri Sushma sree >Priority: Minor > Attachments: HDFS-15222.001.patch, HDFS-15222.002.patch, output1.PNG, > output2.PNG > > > Output message of the "hdfs fsck -list-corruptfileblocks" command is not correct > > Steps: > * Create a directory and put files - > * Corrupt the file blocks > * check the corrupted file blocks with "hdfs fsck -list-corruptfileblocks" > command > It will display corrupted file blocks with message as "The list of corrupt > files under path '/path' are:" at the beginning which is wrong. 
> And at the end of output also the wrong message will display as "The > filesystem under path '/path' has CORRUPT files" > > Actual output : "The list of corrupt files under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT files" > Expected output : "The list of corrupted file blocks under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT file blocks" > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15222) HDFS: Output message of ""hdfs fsck -list-corruptfileblocks" command is not correct
[ https://issues.apache.org/jira/browse/HDFS-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283251#comment-17283251 ] Brahma Reddy Battula commented on HDFS-15222: - [~Sushma_28] thanks for reporting. It makes sense to me. The patch LGTM; let Jenkins run on the latest code. This might go only in trunk, as it changes the output of the command and some test scripts might fail if they validate the output. > HDFS: Output message of ""hdfs fsck -list-corruptfileblocks" command is not > correct > --- > > Key: HDFS-15222 > URL: https://issues.apache.org/jira/browse/HDFS-15222 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, tools >Affects Versions: 3.1.1 > Environment: 3 node HA cluster >Reporter: Souryakanta Dwivedy >Assignee: Ravuri Sushma sree >Priority: Minor > Attachments: HDFS-15222.001.patch, HDFS-15222.002.patch, output1.PNG, > output2.PNG > > > Output message of the "hdfs fsck -list-corruptfileblocks" command is not correct > > Steps: > * Create a directory and put files - > * Corrupt the file blocks > * check the corrupted file blocks with "hdfs fsck -list-corruptfileblocks" > command > It will display corrupted file blocks with message as "The list of corrupt > files under path '/path' are:" at the beginning which is wrong. > And at the end of output also the wrong message will display as "The > filesystem under path '/path' has CORRUPT files" > > Actual output : "The list of corrupt files under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT files" > Expected output : "The list of corrupted file blocks under path '/path' are:" > "The filesystem under path '/path' has > CORRUPT file blocks" > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15494) TestReplicaCachingGetSpaceUsed #testReplicaCachingGetSpaceUsedByRBWReplica Fails on Windows
[ https://issues.apache.org/jira/browse/HDFS-15494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283240#comment-17283240 ] Brahma Reddy Battula commented on HDFS-15494: - LGTM. Let's get a Jenkins run on the latest code. > TestReplicaCachingGetSpaceUsed #testReplicaCachingGetSpaceUsedByRBWReplica > Fails on Windows > --- > > Key: HDFS-15494 > URL: https://issues.apache.org/jira/browse/HDFS-15494 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-15494.001.patch > > > TestReplicaCachingGetSpaceUsed #testReplicaCachingGetSpaceUsedByRBWReplica > fails on Windows because renaming an RBW replica to Finalized is not > supported there. > The test should be skipped on Windows -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15735) NameNode memory Leak on frequent execution of fsck
[ https://issues.apache.org/jira/browse/HDFS-15735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283235#comment-17283235 ] Brahma Reddy Battula commented on HDFS-15735: - {quote} Tracer is a {{private}} variable, not used anywhere; Tracer is subject to removal due to a CVE (IIRC), see HADOOP-17387 and others, one recently mentioned too. {quote} Looks like you did not get what we meant; we were talking about the config *"namenode.fsck.htrace."* {quote}Harmless things are not always correct; closing the tracer in fsck() may impact anyone using the tracer after it (if so). Closing in the last line of fsck may not be the issue you are fixing: the moment control leaves the method, the tracer would be subject to GC? Closing it won't help; it will only make it subject to GC as well. {quote} How will that impact this..? {quote}Would request you to consider the other options as well. {quote} Let's see whether anybody else has an objection to going this way. {quote}On this note, I take my vote back. {quote} thanks. > NameNode memory Leak on frequent execution of fsck > > > Key: HDFS-15735 > URL: https://issues.apache.org/jira/browse/HDFS-15735 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-15735.001.patch > > > The memory of the cluster NameNode continues to grow, and the full GC > eventually leads to the failure of the active and standby NameNodes > Htrace is used to track the processing time of fsck > Checking the code, it was found that the tracer object in NamenodeFsck.java is > only created but never closed; because of this the memory footprint continues > to grow -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15812) after deleting data of hbase table hdfs size is not decreasing
[ https://issues.apache.org/jira/browse/HDFS-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283216#comment-17283216 ] Brahma Reddy Battula commented on HDFS-15812: - Have a look at the *namenode audit logs* after you delete the table; they can tell whether the requests reached HDFS or not. Looks like you are using *"hdp 3.1.4.0-315"*, which might not be completely *Apache Hadoop*. So IMO, as it's vendor specific, you can ask on the vendor forum as well. > after deleting data of hbase table hdfs size is not decreasing > -- > > Key: HDFS-15812 > URL: https://issues.apache.org/jira/browse/HDFS-15812 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.0.2-alpha > Environment: HDP 3.1.4.0-315 > Hbase 2.0.2.3.1.4.0-315 >Reporter: Satya Gaurav >Priority: Major > > I am deleting data from an hbase table; it's deleted from the hbase table, but > the size of the hdfs directory is not reducing. I even ran a major > compaction, but after that the hdfs size still didn't reduce. Any solution for > this issue? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15735) NameNode memory Leak on frequent execution of fsck
[ https://issues.apache.org/jira/browse/HDFS-15735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258055#comment-17258055 ] Brahma Reddy Battula commented on HDFS-15735: - {quote}I am not sure about it, why it being configurable makes it necessary to be here, why closing is better. Please hold it. -1 {quote} Removal can impact existing users of this feature, as they have configured it, while the proposed fix will not break anything. I am not sure why this needs to be held with a -1, and I feel that is not good practice. > NameNode memory Leak on frequent execution of fsck > > > Key: HDFS-15735 > URL: https://issues.apache.org/jira/browse/HDFS-15735 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-15735.001.patch > > > The memory of the cluster NameNode continues to grow, and the full GC > eventually leads to the failure of the active and standby NameNodes > Htrace is used to track the processing time of fsck > Checking the code, it was found that the tracer object in NamenodeFsck.java is > only created but never closed; because of this the memory footprint continues > to grow -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15735) NameNode memory Leak on frequent execution of fsck
[ https://issues.apache.org/jira/browse/HDFS-15735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252618#comment-17252618 ] Brahma Reddy Battula commented on HDFS-15735: - [~Sushma_28] thanks for reporting. Closing the tracer is better as of now. LGTM. > NameNode memory Leak on frequent execution of fsck > > > Key: HDFS-15735 > URL: https://issues.apache.org/jira/browse/HDFS-15735 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-15735.001.patch > > > The memory of the cluster NameNode continues to grow, and the full GC > eventually leads to the failure of the active and standby NameNodes > Htrace is used to track the processing time of fsck > Checking the code, it was found that the tracer object in NamenodeFsck.java is > only created but never closed; because of this the memory footprint continues > to grow -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
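The leak pattern debated above — a per-fsck() resource that is constructed but never closed — can be sketched in isolation. This is a minimal illustration, not the real HTrace API: the `Tracer` class below is a hypothetical stand-in `AutoCloseable`, and `openCount` models state (such as registered span receivers) that the GC alone does not release.

```java
// Sketch of the HDFS-15735 leak pattern. "Tracer" is a hypothetical
// stand-in, not org.apache.htrace.core.Tracer; openCount models
// resources (e.g. registered span receivers) that GC does not reclaim.
public class FsckTracerSketch {
    static int openCount = 0;

    static class Tracer implements AutoCloseable {
        Tracer() { openCount++; }                       // registers itself
        @Override public void close() { openCount--; }  // deregisters
    }

    // Leaky pattern: a Tracer is built on every fsck() call, never closed.
    static void fsckLeaky() {
        Tracer tracer = new Tracer();
        // ... walk the namespace, record spans ...
    }

    // Fixed pattern: try-with-resources closes the Tracer even on errors.
    static void fsckFixed() {
        try (Tracer tracer = new Tracer()) {
            // ... walk the namespace, record spans ...
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) fsckLeaky();
        System.out.println("open after leaky calls: " + openCount); // 1000
        openCount = 0;
        for (int i = 0; i < 1000; i++) fsckFixed();
        System.out.println("open after fixed calls: " + openCount); // 0
    }
}
```

With the leaky variant the count of live registrations grows with every fsck invocation, which is the same shape as the unbounded heap growth reported in the issue; the try-with-resources variant keeps it at zero regardless of call frequency.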
[jira] [Commented] (HDFS-15569) Speed up the Storage#doRecover during datanode rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-15569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250983#comment-17250983 ] Brahma Reddy Battula commented on HDFS-15569: - [~hemanthboyina] thanks for reporting and working on this. The changes are LGTM. > Speed up the Storage#doRecover during datanode rolling upgrade > --- > > Key: HDFS-15569 > URL: https://issues.apache.org/jira/browse/HDFS-15569 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Hemanth Boyina >Assignee: Hemanth Boyina >Priority: Major > Attachments: HDFS-15569.001.patch, HDFS-15569.002.patch, > HDFS-15569.003.patch > > > When upgrading a datanode from hadoop 2.7.2 to 3.1.1, the upgrade failed > because the JVM did not have enough memory. After adjusting the memory > configurations the datanode was re-upgraded. > Now the datanode upgrade has taken more time; on analyzing it was found that > Storage#deleteDir has taken more time in the RECOVER_UPGRADE state > {code:java} > "Thread-28" #270 daemon prio=5 os_prio=0 tid=0x7fed5a9b8000 nid=0x2b5c > runnable [0x7fdcdad2a000] > java.lang.Thread.State: RUNNABLE at java.io.UnixFileSystem.delete0(Native > Method) at java.io.UnixFileSystem.delete(UnixFileSystem.java:265) at > java.io.File.delete(File.java:1041) at > org.apache.hadoop.fs.FileUtil.deleteImpl(FileUtil.java:229) at > org.apache.hadoop.fs.FileUtil.fullyDeleteContents(FileUtil.java:270) at > org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:182) at > org.apache.hadoop.fs.FileUtil.fullyDeleteContents(FileUtil.java:285) at > org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:182) at > org.apache.hadoop.fs.FileUtil.fullyDeleteContents(FileUtil.java:285) at > org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:182) at > org.apache.hadoop.fs.FileUtil.fullyDeleteContents(FileUtil.java:285) at > org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:182) at > 
org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:153) at > org.apache.hadoop.hdfs.server.common.Storage.deleteDir(Storage.java:1348) at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.doRecover(Storage.java:782) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:174) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:224) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:253) > at > org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:455) > at > org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:389) > - locked <0x7fdf08ec7548> (a > org.apache.hadoop.hdfs.server.datanode.DataStorage) at > org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:557) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1761) > - locked <0x7fdf08ec7598> (a > org.apache.hadoop.hdfs.server.datanode.DataNode) at > org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1697) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:392) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:282) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:822) > at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
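The stack trace above shows one `delete` syscall per file under `FileUtil.fullyDelete`, which is why recovery stalls on large block directories. A common mitigation — a sketch only, under the assumption that moving the reclamation off the critical path is acceptable, and not necessarily what the HDFS-15569 patch does — is to rename the doomed directory aside (an O(1) metadata operation) and delete it in a background thread:

```java
// Sketch: recover by renaming the directory aside instead of deleting it
// inline. The names ("previous.tmp", ".trash") are illustrative, not the
// actual Storage/BlockPoolSliceStorage layout.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

public class RecoverDeleteSketch {
    // Rename the doomed directory out of the way, then reclaim it off-path.
    static Thread recoverFast(Path doomed) throws IOException {
        Path aside = doomed.resolveSibling(doomed.getFileName() + ".trash");
        Files.move(doomed, aside, StandardCopyOption.ATOMIC_MOVE); // O(1)
        Thread reaper = new Thread(() -> {
            try (Stream<Path> walk = Files.walk(aside)) {
                walk.sorted(Comparator.reverseOrder())  // children before parents
                    .forEach(p -> p.toFile().delete()); // slow part, now async
            } catch (IOException ignored) { }
        });
        reaper.start();
        return reaper;
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("storage");
        Path prev = Files.createDirectory(root.resolve("previous.tmp"));
        for (int i = 0; i < 100; i++) Files.createFile(prev.resolve("blk_" + i));
        Thread t = recoverFast(prev);  // returns right after the rename
        System.out.println("exists after rename: " + Files.exists(prev)); // false
        t.join();                      // background reclamation finished
    }
}
```

The startup path then only pays for the rename; the per-file deletes still happen, but no longer block the datanode from registering with the namenode.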
[jira] [Commented] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223698#comment-17223698 ] Brahma Reddy Battula commented on HDFS-15624: - {quote}Wanted to check HDFS-15660 too, but my namenode isn't starting now because of this, and it will take time to sort out. But I am still convinced it is different, so no need to hold this; still, I will respect opinions on how to go ahead. {quote} I think you can post on the mailing list; others can also try to help you sort it out. {quote}Right now, the PR is handling the backward compatibility related issues (due to the change in StorageType order) and the inclusion of the new storage policy, by bumping the LayoutVersion and adding a check to block NVDIMM related operations during upgrade. {quote} I don't think bumping the name layout version is the best solution; we need to look for another way (maybe checking the client version during the upgrade). {quote}HDFS-15660 will be handled soon enough to solve issues of both PROVIDED and NVDIMM in a generic way. {quote} Yes, I too prefer a generic way. {quote}In any case, if not able to fix any of these by the release time (which I think we still have some time), then we can think of revert. {quote} With provided storage we've had a couple of releases; there are alternatives to avoid this. > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Priority: Major > Labels: pull-request-available, release-blocker > Time Spent: 6h > Remaining Estimate: 0h > > HDFS-15025 adds a new storage type NVDIMM, changing the ordinal() of the > StorageType enum. Setting the quota by storage type depends on the > ordinal(); therefore, it may cause the quota setting to be invalid after > upgrade. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222805#comment-17222805 ] Brahma Reddy Battula commented on HDFS-15624: - {quote}, like the broken FsImage Compatibility, due to change in ordinal of Storage Types. Rolling Upgrade issue. {quote} did you try it..? can you describe which scenario you tried and paste snapshot before you talk about revert.? Based on the above > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: YaYun Wang >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222805#comment-17222805 ] Brahma Reddy Battula edited comment on HDFS-15624 at 10/29/20, 10:07 AM: - {quote}, like the broken FsImage Compatibility, due to change in ordinal of Storage Types. Rolling Upgrade issue. {quote} did you try it..? can you describe which scenario you tried and paste snapshot before you talk about revert.? was (Author: brahmareddy): {quote}, like the broken FsImage Compatibility, due to change in ordinal of Storage Types. Rolling Upgrade issue. {quote} did you try it..? can you describe which scenario you tried and paste snapshot before you talk about revert.? Based on the above > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: YaYun Wang >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222133#comment-17222133 ] Brahma Reddy Battula edited comment on HDFS-15624 at 10/28/20, 12:33 PM: - There is an issue after provided storage type introduced ( https://issues.apache.org/jira/browse/HDFS-15660), issue can be addressed there itself, so we can hold on till HDFS-15660 is addressed. was (Author: brahmareddy): there is an issue provided storage type itself ( https://issues.apache.org/jira/browse/HDFS-15660), issue can be addressed there itself > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: YaYun Wang >Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222133#comment-17222133 ] Brahma Reddy Battula commented on HDFS-15624: - there is an issue provided storage type itself ( https://issues.apache.org/jira/browse/HDFS-15660), issue can be addressed there itself > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: YaYun Wang >Priority: Major > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
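The compatibility hazard behind this issue — a persisted field encoded as `StorageType.ordinal()` changing meaning once a new constant is inserted — can be demonstrated in isolation. The two enums below are illustrative stand-ins, not the real `org.apache.hadoop.fs.StorageType` declarations:

```java
// Sketch of the HDFS-15624 hazard: persisting enum ordinals breaks when a
// constant is inserted. These enums are illustrative, not the real
// org.apache.hadoop.fs.StorageType order.
public class OrdinalHazard {
    // Hypothetical layout before the new type was added.
    enum OldStorageType { RAM_DISK, SSD, DISK, ARCHIVE, PROVIDED }
    // NVDIMM inserted mid-list shifts every later ordinal by one.
    enum NewStorageType { RAM_DISK, SSD, NVDIMM, DISK, ARCHIVE, PROVIDED }

    public static void main(String[] args) {
        // A quota record written by the old software stores ordinal 2,
        // meaning DISK ...
        int persisted = OldStorageType.DISK.ordinal();
        // ... but replayed by the new software, ordinal 2 decodes to NVDIMM.
        NewStorageType decoded = NewStorageType.values()[persisted];
        System.out.println(persisted + " -> " + decoded);       // 2 -> NVDIMM
        // Encoding by stable name instead of ordinal survives insertion.
        NewStorageType byName = NewStorageType.valueOf(OldStorageType.DISK.name());
        System.out.println("by name -> " + byName);             // by name -> DISK
    }
}
```

This is why appending new constants at the end of an enum (or serializing by name) is the compatibility-safe choice whenever ordinals can reach disk, as they do in `SetQuotaByStorageTypeOp` edit-log records.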
[jira] [Commented] (HDFS-7343) HDFS smart storage management
[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209992#comment-17209992 ] Brahma Reddy Battula commented on HDFS-7343: Any Update on this feature..? > HDFS smart storage management > - > > Key: HDFS-7343 > URL: https://issues.apache.org/jira/browse/HDFS-7343 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Kai Zheng >Assignee: Wei Zhou >Priority: Major > Attachments: HDFS-Smart-Storage-Management-update.pdf, > HDFS-Smart-Storage-Management.pdf, > HDFSSmartStorageManagement-General-20170315.pdf, > HDFSSmartStorageManagement-Phase1-20170315.pdf, access_count_tables.jpg, > move.jpg, tables_in_ssm.xlsx > > > As discussed in HDFS-7285, it would be better to have a comprehensive and > flexible storage policy engine considering file attributes, metadata, data > temperature, storage type, EC codec, available hardware capabilities, > user/application preference and etc. > Modified the title for re-purpose. > We'd extend this effort some bit and aim to work on a comprehensive solution > to provide smart storage management service in order for convenient, > intelligent and effective utilizing of erasure coding or replicas, HDFS cache > facility, HSM offering, and all kinds of tools (balancer, mover, disk > balancer and so on) in a large cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15566) NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0
[ https://issues.apache.org/jira/browse/HDFS-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-15566: Attachment: HDFS-15566-003.patch > NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0 > > > Key: HDFS-15566 > URL: https://issues.apache.org/jira/browse/HDFS-15566 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: HDFS-15566-001.patch, HDFS-15566-002.patch, > HDFS-15566-003.patch > > > * After rollingUpgrade NN from 3.1.3/3.2.1 to 3.3.0, if the NN is restarted, > it fails while replaying edit logs. > * HDFS-14922, HDFS-14924, and HDFS-15054 introduced the *modification time* > bits to the editLog transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the *modification time* bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > {noformat} > 2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361 > 2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. 
| > NameNode.java:1751 > java.lang.IllegalArgumentException > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:959) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:932) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15566) NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0
[ https://issues.apache.org/jira/browse/HDFS-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195155#comment-17195155 ] Brahma Reddy Battula commented on HDFS-15566: - [~weichiu] thanks for the review. Uploaded a patch to fix the checkstyle issue and handle writeFields. > NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0 > > > Key: HDFS-15566 > URL: https://issues.apache.org/jira/browse/HDFS-15566 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: HDFS-15566-001.patch, HDFS-15566-002.patch > > > * After rollingUpgrade NN from 3.1.3/3.2.1 to 3.3.0, if the NN is restarted, > it fails while replaying edit logs. > * HDFS-14922, HDFS-14924, and HDFS-15054 introduced the *modification time* > bits to the editLog transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the *modification time* bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > {noformat} > 2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361 > 2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. 
| > NameNode.java:1751 > java.lang.IllegalArgumentException > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:959) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:932) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15566) NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0
[ https://issues.apache.org/jira/browse/HDFS-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-15566: Attachment: HDFS-15566-002.patch > NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0 > > > Key: HDFS-15566 > URL: https://issues.apache.org/jira/browse/HDFS-15566 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: HDFS-15566-001.patch, HDFS-15566-002.patch > > > * After rollingUpgrade NN from 3.1.3/3.2.1 to 3.3.0, if the NN is restarted, > it fails while replaying edit logs. > * HDFS-14922, HDFS-14924, and HDFS-15054 introduced the *modification time* > bits to the editLog transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the *modification time* bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > {noformat} > 2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361 > 2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. 
| > NameNode.java:1751 > java.lang.IllegalArgumentException > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:959) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:932) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15566) NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0
[ https://issues.apache.org/jira/browse/HDFS-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194247#comment-17194247 ] Brahma Reddy Battula commented on HDFS-15566: - [~weichiu] thanks for taking a look. AFAIK I didn't introduce any new fields in the fsimage, so I don't need to handle it separately. The new layout version will be used for edits write/read during the rolling upgrade. > NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0 > > > Key: HDFS-15566 > URL: https://issues.apache.org/jira/browse/HDFS-15566 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: HDFS-15566-001.patch > > > * After rollingUpgrade NN from 3.1.3/3.2.1 to 3.3.0, if the NN is restarted, > it fails while replaying edit logs. > * HDFS-14922, HDFS-14924, and HDFS-15054 introduced the *modification time* > bits to the editLog transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the *modification time* bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > {noformat} > 2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361 > 2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. 
| > NameNode.java:1751 > java.lang.IllegalArgumentException > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:959) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:932) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
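The failure mode described above — the loader assuming the pre-upgrade layout, skipping the new *modification time* field, and thereby misaligning every later field — can be sketched with a toy record. The field names, trailer value, and layout numbers below are illustrative, not the real `FSEditLogOp` wire format (NameNode layout versions are negative, and more negative means newer):

```java
// Sketch of the HDFS-15566 failure mode: a reader that gates an optional
// field on an assumed layout version mis-parses records written by a newer
// layout. Field names and values are illustrative, not the real edit-log
// format.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class LayoutGateSketch {
    static final int OLD_LAYOUT = -64, NEW_LAYOUT = -65; // more negative = newer

    // Writer at NEW_LAYOUT appends a modification-time field.
    static byte[] write(long inodeId, long mtime) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeLong(inodeId);
        out.writeLong(mtime);   // extra field added by the new layout
        out.writeInt(0xCAFE);   // trailing field present in both layouts
        return bos.toByteArray();
    }

    // Reader skips mtime when it believes the stream is OLD_LAYOUT.
    static int readTrailer(byte[] bytes, int assumedLayout) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        in.readLong();                                   // inodeId
        if (assumedLayout <= NEW_LAYOUT) in.readLong();  // mtime, layout-gated
        return in.readInt();                             // trailer
    }

    public static void main(String[] args) throws Exception {
        byte[] rec = write(1001L, 1599500082L);
        // Correct layout assumption: the trailer decodes cleanly.
        System.out.println(Integer.toHexString(readTrailer(rec, NEW_LAYOUT)));
        // Old-layout assumption: every later field is misaligned garbage,
        // analogous to the IllegalArgumentException in the stack trace above.
        System.out.println(Integer.toHexString(readTrailer(rec, OLD_LAYOUT)));
    }
}
```

In the real bug the misread bytes eventually land in `ClientId.toString`, which rejects the malformed client id with `IllegalArgumentException`; the fix direction discussed in this thread is to make the loader honor the layout version the transactions were actually written with.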
[jira] [Updated] (HDFS-15566) NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0
[ https://issues.apache.org/jira/browse/HDFS-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-15566: Attachment: HDFS-15566-001.patch > NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0 > > > Key: HDFS-15566 > URL: https://issues.apache.org/jira/browse/HDFS-15566 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: HDFS-15566-001.patch > > > * After rollingUpgrade NN from 3.1.3/3.2.1 to 3.3.0, if the NN is restarted, > it fails while replaying edit logs. > * HDFS-14922, HDFS-14924, and HDFS-15054 introduced the *modification time* > bits to the editLog transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the *modification time* bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > {noformat} > 2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361 > 2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. 
| > NameNode.java:1751 > java.lang.IllegalArgumentException > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:959) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:932) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15566) NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0
Brahma Reddy Battula created HDFS-15566: --- Summary: NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0 Key: HDFS-15566 URL: https://issues.apache.org/jira/browse/HDFS-15566 Project: Hadoop HDFS Issue Type: Bug Reporter: Brahma Reddy Battula * After rollingUpgrade NN from 3.1.3/3.2.1 to 3.3.0, if the NN is restarted, it fails while replaying edit logs. * HDFS-14922, HDFS-14924, and HDFS-15054 introduced the *modification time* bits to the editLog transactions. * When NN is restarted and the edit logs are replayed, the NN reads the old layout version from the editLog file. When parsing the transactions, it assumes that the transactions are also from the previous layout and hence skips parsing the *modification time* bits. * This cascades into reading the wrong set of bits for other fields and leads to NN shutting down. {noformat} 2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361 2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. | NameNode.java:1751 java.lang.IllegalArgumentException at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606) at java.lang.String.valueOf(String.java:2994) at java.lang.StringBuilder.append(StringBuilder.java:131) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779) at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:959) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:932) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15566) NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0
[ https://issues.apache.org/jira/browse/HDFS-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned HDFS-15566: --- Assignee: Brahma Reddy Battula > NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0 > > > Key: HDFS-15566 > URL: https://issues.apache.org/jira/browse/HDFS-15566 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > > * After rollingUpgrade NN from 3.1.3/3.2.1 to 3.3.0, if the NN is restarted, > it fails while replaying edit logs. > * HDFS-14922, HDFS-14924, and HDFS-15054 introduced the *modification time* > bits to the editLog transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the *modification time* bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > {noformat} > 2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361 > 2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. 
| > NameNode.java:1751 > java.lang.IllegalArgumentException > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:959) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:932) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15566) NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0
[ https://issues.apache.org/jira/browse/HDFS-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-15566: Status: Patch Available (was: Open) > NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0 > > > Key: HDFS-15566 > URL: https://issues.apache.org/jira/browse/HDFS-15566 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: HDFS-15566-001.patch > > > * After rollingUpgrade NN from 3.1.3/3.2.1 to 3.3.0, if the NN is restarted, > it fails while replaying edit logs. > * HDFS-14922, HDFS-14924, and HDFS-15054 introduced the *modification time* > bits to the editLog transactions. > * When NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the *modification time* bits. > * This cascades into reading the wrong set of bits for other fields and > leads to NN shutting down. > {noformat} > 2020-09-07 19:34:42,085 | DEBUG | main | Stopping client | Client.java:1361 > 2020-09-07 19:34:42,087 | ERROR | main | Failed to start namenode. 
| > NameNode.java:1751 > java.lang.IllegalArgumentException > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at org.apache.hadoop.ipc.ClientId.toString(ClientId.java:56) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendRpcIdsToString(FSEditLogOp.java:318) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$700(FSEditLogOp.java:153) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$DeleteSnapshotOp.toString(FSEditLogOp.java:3606) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:305) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:188) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:932) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:779) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:337) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1136) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:742) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:654) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:716) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:959) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:932) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1744){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15054) Delete Snapshot not updating new modification time
[ https://issues.apache.org/jira/browse/HDFS-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192364#comment-17192364 ] Brahma Reddy Battula commented on HDFS-15054: - It looks like we need to change the LayoutVersion itself to fix the above issue; I will raise another issue to track this. > Delete Snapshot not updating new modification time > -- > > Key: HDFS-15054 > URL: https://issues.apache.org/jira/browse/HDFS-15054 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hemanth Boyina >Assignee: Hemanth Boyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15054.001.patch, HDFS-15054.002.patch > > > on creating a snapshot, we set the modification time for the snapshot, and along with > that we update the modification time of the directory the snapshot was created on > {code:java} > snapshotRoot.updateModificationTime(now, Snapshot.CURRENT_STATE_ID); > s.getRoot().setModificationTime(now, Snapshot.CURRENT_STATE_ID); {code} > So on deleting a snapshot, we should update the modification time of the directory > the snapshot was created on. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15054) Delete Snapshot not updating new modification time
[ https://issues.apache.org/jira/browse/HDFS-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17191842#comment-17191842 ] Brahma Reddy Battula commented on HDFS-15054: - Are HDFS-14922, HDFS-14924, and this change incompatible? Since we added a new field, I feel we should have something like the following to address the rolling-upgrade scenarios:
{code:java}
 @Override
 void readFields(DataInputStream in, int logVersion) throws IOException {
+  int flags = in.readInt();
   snapshotRoot = FSImageSerialization.readString(in);
   snapshotName = FSImageSerialization.readString(in);
-  mtime = FSImageSerialization.readLong(in);
+  if ((flags & 0x1) != 0) {
+    mtime = FSImageSerialization.readLong(in);
+  }
+
   // read RPC ids if necessary
   readRpcIds(in, logVersion);
 }
@@ -3483,9 +3484,15 @@
 @Override
 public void writeFields(DataOutputStream out, int logVersion)
     throws IOException {
+  int flags = (mtime != 0L && mtime != Long.MAX_VALUE) ? 0x1 : 0;
+  out.writeInt(flags);
   FSImageSerialization.writeString(snapshotRoot, out);
   FSImageSerialization.writeString(snapshotName, out);
-  FSImageSerialization.writeLong(mtime, out);
+  if (mtime != 0L && mtime != Long.MAX_VALUE) {
+    FSImageSerialization.writeLong(mtime, out);
+  }
{code}
> Delete Snapshot not updating new modification time > -- > > Key: HDFS-15054 > URL: https://issues.apache.org/jira/browse/HDFS-15054 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hemanth Boyina >Assignee: Hemanth Boyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15054.001.patch, HDFS-15054.002.patch > > > on creating a snapshot, we set the modification time for the snapshot, and along with > that we update the modification time of the directory the snapshot was created on > {code:java} > snapshotRoot.updateModificationTime(now, Snapshot.CURRENT_STATE_ID); > s.getRoot().setModificationTime(now, Snapshot.CURRENT_STATE_ID); {code} > So on deleting a snapshot, we should update the modification time of the directory > the snapshot was created on. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
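The diff above hides the optional field behind a flags bitmask. A self-contained sketch of that pattern (class and method names are mine, not Hadoop's), showing that a reader only consumes the mtime bytes when the writer flagged them as present:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of the flags-based optional-field encoding proposed above: the
// writer emits a bitmask first, and the reader consults it before consuming
// the optional mtime, so old and new serializations stay parseable.
public class FlagsEncodingSketch {
    static byte[] write(String name, long mtime) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            int flags = (mtime != 0L && mtime != Long.MAX_VALUE) ? 0x1 : 0;
            out.writeInt(flags);             // bit 0x1: mtime present
            out.writeUTF(name);
            if ((flags & 0x1) != 0) {
                out.writeLong(mtime);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int flags = in.readInt();
            String name = in.readUTF();      // mandatory field, always present
            // Only consume the mtime bytes if the writer serialized them.
            return ((flags & 0x1) != 0) ? in.readLong() : 0L;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read(write("s1", 12345L))); // 12345
        System.out.println(read(write("s2", 0L)));     // 0
    }
}
```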
[jira] [Commented] (HDFS-15550) Remove unused imports from TestFileTruncate.java
[ https://issues.apache.org/jira/browse/HDFS-15550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187291#comment-17187291 ] Brahma Reddy Battula commented on HDFS-15550: - [~Sushma_28] thanks for reporting... It's straightforward; +1 on the patch. The ASF warnings are because of YARN-10386 (I commented the same there), and the test failures are unrelated. > Remove unused imports from TestFileTruncate.java > > > Key: HDFS-15550 > URL: https://issues.apache.org/jira/browse/HDFS-15550 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Minor > Attachments: HDFS-15550.001.patch > > > {{import org.apache.hadoop.fs.BlockLocation and import org.junit.Assert > remain unused in }}{{TestFileTruncate.java}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15510) RBF: Quota and Content Summary was not correct in Multiple Destinations
[ https://issues.apache.org/jira/browse/HDFS-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179141#comment-17179141 ] Brahma Reddy Battula edited comment on HDFS-15510 at 8/17/20, 5:30 PM: --- thanks [~hemanthboyina] for reporting this issue. IMO, #3 would be the better choice, as you also suggested, because the application will be aware of what it set and can validate based on that. Setting the quota on multiple destinations with the same value should be fine, I feel: * As these commands come from the router to the NameNodes, the application/user might not interact with the NameNodes directly. * On unavailability of one of the namespaces, the user can still create files (e.g. say two namespaces NS1 and NS2, each with a quota of 10, and NS1 is unavailable for some reason). Hopefully, when applications are configured to use the router, they shouldn't connect to a nameservice directly. #2 is also reasonable, but it needs a change to the current invocation of the commands, as the quota value will differ across nameservices, and unavailability of a nameservice might mislead the user. E.g.: * ns1=5, ns2=5: while writing the 9th file, if only NS2 is available, it will throw a quota-exceeded exception at "5" even though the user set "10". * ns1=7, ns2=7, ns3=6: if ns3 is unavailable, we always fail writes after 14 files, even though namespaces ns1 and ns2 are available. #1 might not be better than the above two, as it can allow more than what the application/user set. was (Author: brahmareddy): thanks [~hemanthboyina] reporting this issue. IMO, #3 will be better choice as you also suggested because application will aware what they set and based on that application might validate. Setting Quota on multiple destinations with same value should fine I feel * As these commnads will come from the router to namenodes,application/user might not interact direc * On unavailablity of other namespaces user can able to create files ( e.g Say two namespaces NS1,NS2 with 10 quota, NS1 is unavaiable due to some reason). 
Hoping when applications configured with router they shouldn't connect to nameserice directly . Even #2 also better, but these needs change current invocation of the commands as the quota value will be different for nameservices and unavailable of name serice might mislead user. E.G : * ns1-5, ns2-5, while writing 9th file NS2 is avialble, then it will quota exceed exception with "5" even user set "10". * ns1-7,ns2-7,ns3-6,if ns3 is unaviable we always fail write after 14 files, even we've available namespace ns1 and ns2 #1, this might not better than above two, as this can allow more than application/user set. > RBF: Quota and Content Summary was not correct in Multiple Destinations > --- > > Key: HDFS-15510 > URL: https://issues.apache.org/jira/browse/HDFS-15510 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hemanth Boyina >Assignee: Hemanth Boyina >Priority: Critical > Attachments: 15510.png > > > steps : > *) create a mount entry with multiple destinations ( for suppose 2) > *) Set NS quota as 10 for mount entry by dfsrouteradmin command, Content > Summary on the Mount Entry shows NS quota as 20 > *) Create 10 files through router, on creating 11th file , NS Quota Exceeded > Exception is coming > though the Content Summary showing the NS quota as 20 , we are not able to > create 20 files > > the problem here is router stores the mount entry's NS quota as 10 , but > invokes NS quota on both the name services by set NS quota as 10 , so content > summary on mount entry aggregates the content summary of both the name > services by making NS quota as 20 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15510) RBF: Quota and Content Summary was not correct in Multiple Destinations
[ https://issues.apache.org/jira/browse/HDFS-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179141#comment-17179141 ] Brahma Reddy Battula commented on HDFS-15510: - thanks [~hemanthboyina] for reporting this issue. IMO, #3 would be the better choice, as you also suggested, because the application will be aware of what it set and can validate based on that. Setting the quota on multiple destinations with the same value should be fine, I feel: * As these commands come from the router to the NameNodes, the application/user might not interact with the NameNodes directly. * On unavailability of one of the namespaces, the user can still create files (e.g. say two namespaces NS1 and NS2, each with a quota of 10, and NS1 is unavailable for some reason). Hopefully, when applications are configured to use the router, they shouldn't connect to a nameservice directly. #2 is also reasonable, but it needs a change to the current invocation of the commands, as the quota value will differ across nameservices, and unavailability of a nameservice might mislead the user. E.g.: * ns1=5, ns2=5: while writing the 9th file, if only NS2 is available, it will throw a quota-exceeded exception at "5" even though the user set "10". * ns1=7, ns2=7, ns3=6: if ns3 is unavailable, we always fail writes after 14 files, even though namespaces ns1 and ns2 are available. #1 might not be better than the above two, as it can allow more than what the application/user set. 
> RBF: Quota and Content Summary was not correct in Multiple Destinations > --- > > Key: HDFS-15510 > URL: https://issues.apache.org/jira/browse/HDFS-15510 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hemanth Boyina >Assignee: Hemanth Boyina >Priority: Critical > Attachments: 15510.png > > > steps : > *) create a mount entry with multiple destinations ( for suppose 2) > *) Set NS quota as 10 for mount entry by dfsrouteradmin command, Content > Summary on the Mount Entry shows NS quota as 20 > *) Create 10 files through router, on creating 11th file , NS Quota Exceeded > Exception is coming > though the Content Summary showing the NS quota as 20 , we are not able to > create 20 files > > the problem here is router stores the mount entry's NS quota as 10 , but > invokes NS quota on both the name services by set NS quota as 10 , so content > summary on mount entry aggregates the content summary of both the name > services by making NS quota as 20 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
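The mismatch described in the issue reduces to simple arithmetic; this sketch is illustrative only (not RBF code): the router stores 10 for the mount entry but applies 10 to each destination, so the aggregated content summary doubles the figure while each individual namespace still enforces its own limit.

```java
// Illustrative arithmetic only; not the actual router implementation.
public class QuotaAggregationSketch {
    // What the aggregated content summary reports for the mount entry.
    static long reportedQuota(long perNsQuota, int destinations) {
        return perNsQuota * destinations;
    }

    // The file count at which any single namespace starts rejecting creates.
    static long effectiveLimitPerNs(long perNsQuota) {
        return perNsQuota;
    }

    public static void main(String[] args) {
        long set = 10;  // quota the admin set on the mount entry
        System.out.println(reportedQuota(set, 2));     // 20 shown in the summary
        System.out.println(effectiveLimitPerNs(set));  // but each NS caps at 10
    }
}
```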
[jira] [Commented] (HDFS-15503) File and directory permissions are not able to be modified from WebUI
[ https://issues.apache.org/jira/browse/HDFS-15503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170062#comment-17170062 ] Brahma Reddy Battula commented on HDFS-15503: - +1 > File and directory permissions are not able to be modified from WebUI > - > > Key: HDFS-15503 > URL: https://issues.apache.org/jira/browse/HDFS-15503 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hemanth Boyina >Assignee: Hemanth Boyina >Priority: Major > Attachments: HDFS-15503.001.patch, after-HDFS-15503.png, > before-HDFS-15503.png > > > After upgrading bootstrap from 3.3.7 to 3.4.1 the bootstrap popover content > is not being shown in Browse File System Permission column -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15503) File and directory permissions are not able to be modified from WebUI
[ https://issues.apache.org/jira/browse/HDFS-15503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170057#comment-17170057 ] Brahma Reddy Battula commented on HDFS-15503: - Added the broken link. > File and directory permissions are not able to be modified from WebUI > - > > Key: HDFS-15503 > URL: https://issues.apache.org/jira/browse/HDFS-15503 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hemanth Boyina >Assignee: Hemanth Boyina >Priority: Major > Attachments: HDFS-15503.001.patch, after-HDFS-15503.png, > before-HDFS-15503.png > > > After upgrading bootstrap from 3.3.7 to 3.4.1 the bootstrap popover content > is not being shown in Browse File System Permission column -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15434) RBF: MountTableResolver#getDestinationForPath failing with AssertionError from localCache
[ https://issues.apache.org/jira/browse/HDFS-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146447#comment-17146447 ] Brahma Reddy Battula commented on HDFS-15434: - [~inigoiri] and [~crh] did you come across this scenario..? > RBF: MountTableResolver#getDestinationForPath failing with AssertionError > from localCache > - > > Key: HDFS-15434 > URL: https://issues.apache.org/jira/browse/HDFS-15434 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Priority: Major > > {code:java} > org.apache.hadoop.ipc.Remote.Exception : java.lang.AssertionError > com.google.common.cache.LocalCache$Segment.evictEntries(LocalCache.java:2698) > at > com.google.common.cache.LocalCache$Segment.storeLoadedValue(LocalCache.java:3166) > > at > com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2386) > at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2351) > at > com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > > at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) at > at com.google.common.cache.LocalCache.get(LocalCache.java:3965) > at > com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764) > at > org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:382) > at > org.apache.hadoop.hdfs.server.federation.resolver.MultipleDestinationMountTableResolver.getDestinationForPath(MultipleDestinationMountTableResolver.java:87) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1406) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1389) > at > org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.getFileInfo(RouterClientProtocol.java:741) > at > 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:763) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan
[ https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135742#comment-17135742 ] Brahma Reddy Battula commented on HDFS-15406: - {quote}we get the datanode jstack, with 11M block , found that getDiskReport run nearly 23 min,then hold lock to process scan about 6 min. {quote} getDiskReport() (getVolumeReports() after HDFS-13947) can be improved by configuring more threads via "dfs.datanode.directoryscan.threads". {quote} hold lock to process scan about 6 min {quote} Not sure whether HDFS-9668 will address the same. > Improve the speed of Datanode Block Scan > > > Key: HDFS-15406 > URL: https://issues.apache.org/jira/browse/HDFS-15406 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > > In our customer cluster we have approx 10M blocks in one datanode > for the Datanode to scan all the blocks, it has taken nearly 5 mins > {code:java} > 2020-06-10 12:17:06,869 | INFO | > java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty > queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: > 11149530, missing metadata files:472, missing block files:472, missing blocks > in memory:0, mismatched blocks:0 | DirectoryScanner.java:473 > 2020-06-10 12:17:06,869 | WARN | > java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty > queue] | Lock held time above threshold: lock identifier: > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl > lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. 
The stack trace is: > java.lang.Thread.getStackTrace(Thread.java:1559) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148) > org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186) > org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133) > org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84) > org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > | InstrumentedLock.java:143 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
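For the parallelism suggestion above, a tuning sketch for hdfs-site.xml (the value 4 is only an example; dfs.datanode.directoryscan.threads defaults to 1):

```xml
<!-- Example only: raise the directory scanner's report-compilation
     parallelism so the DataNode's volumes are walked concurrently. -->
<property>
  <name>dfs.datanode.directoryscan.threads</name>
  <value>4</value>
</property>
```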
[jira] [Comment Edited] (HDFS-15379) DataStreamer should reset thread interrupted state in createBlockOutputStream
[ https://issues.apache.org/jira/browse/HDFS-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125997#comment-17125997 ] Brahma Reddy Battula edited comment on HDFS-15379 at 6/4/20, 3:10 PM: -- [~pilchard] thanks for reporting the issue and tagging me. IMO, channel operations are bound to the thread performing them: if that thread is interrupted, the stream/channel is closed for IO-safety reasons. With this fix, I think we are relaxing that? was (Author: brahmareddy): [~pilchard] thanks for reporting the issue and tagging me. > DataStreamer should reset thread interrupted state in createBlockOutputStream > - > > Key: HDFS-15379 > URL: https://issues.apache.org/jira/browse/HDFS-15379 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfsclient >Affects Versions: 2.7.7, 3.1.3 >Reporter: ludun >Assignee: ludun >Priority: Major > Attachments: HDFS-15379.001.patch, HDFS-15379.002.patch, > HDFS-15379.003.patch, HDFS-15379.004.patch > > > In createBlockOutputStream, if the thread was interrupted because of a timeout > connecting to the DataNode: > {code}2020-05-27 18:32:53,310 | DEBUG | Connecting to datanode > xx.xx.xx.xx:25009 | DataStreamer.java:251 > 2020-05-27 18:33:50,457 | INFO | Exception in createBlockOutputStream > blk_1115121199_41386360 | DataStreamer.java:1854 > java.io.InterruptedIOException: Interrupted while waiting for IO on channel > java.nio.channels.SocketChannel[connected local=/xx.xx.xx.xx:40370 > remote=/xx.xx.xx.xx:25009]. 615000 millis timeout left. 
> at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:342) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118) > at java.io.FilterInputStream.read(FilterInputStream.java:83) > at java.io.FilterInputStream.read(FilterInputStream.java:83) > at > org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:551) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1826) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1743) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:718) > {code} > then abandonBlockrpc to namenode also failed due to interrupted exception > immediately. > {code}2020-05-27 18:33:50,461 | DEBUG | Connecting to xx/xx.xx.xx.xx:25000 | > Client.java:814 > 2020-05-27 18:33:50,462 | DEBUG | Failed to connect to server: > xx/xx.xx.xx.xx:25000: try once and fail. 
| Client.java:956 > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) > at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:720) > at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:823) > at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:436) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1613) > at org.apache.hadoop.ipc.Client.call(Client.java:1444) > at org.apache.hadoop.ipc.Client.call(Client.java:1397) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:234) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy10.abandonBlock(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.abandonBlock(ClientNamenodeProtocolTranslatorPB.java:509) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at >
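The fix this thread discusses can be sketched in a few lines. This is a hedged illustration, not the actual DataStreamer patch (the class and method names below are invented): clearing the thread's interrupt flag with `Thread.interrupted()` before the follow-up abandonBlock RPC avoids the immediate ClosedByInterruptException, and restoring the flag afterwards preserves the interrupt for callers.

```java
// Hedged sketch (not the actual HDFS patch): after the streamer thread is
// interrupted while connecting to a DataNode, clear the interrupt flag
// before the follow-up abandonBlock RPC so the RPC socket is not closed
// with ClosedByInterruptException; restore the flag afterwards.
public class InterruptResetSketch {

    // Stand-in for the abandonBlock RPC: like NIO channels, it fails fast
    // when the calling thread carries a pending interrupt.
    static String abandonBlockRpc() throws java.io.IOException {
        if (Thread.currentThread().isInterrupted()) {
            throw new java.io.IOException("ClosedByInterruptException");
        }
        return "abandoned";
    }

    public static String recoverFromTimeout() throws java.io.IOException {
        boolean wasInterrupted = Thread.interrupted(); // clears AND records the flag
        try {
            return abandonBlockRpc();                  // now succeeds
        } finally {
            if (wasInterrupted) {
                Thread.currentThread().interrupt();    // reset state for the caller
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Thread.currentThread().interrupt();            // simulate the connect-timeout interrupt
        System.out.println(recoverFromTimeout());
        System.out.println("interrupt restored: " + Thread.currentThread().isInterrupted());
    }
}
```

Note the `Thread.interrupted()` vs `isInterrupted()` distinction: only the former clears the flag, which is exactly what lets the RPC proceed.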
[jira] [Commented] (HDFS-15218) RBF: MountTableRefresherService fail in secure cluster.
[ https://issues.apache.org/jira/browse/HDFS-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081523#comment-17081523 ] Brahma Reddy Battula commented on HDFS-15218: - [~surendrasingh] thanks for reporting. Patch LGTM (as the testcase will be addressed with HDFS-15198). I feel this can be merged to branch-3.3. [~elgoiri], what do you think? > RBF: MountTableRefresherService fail in secure cluster. > --- > > Key: HDFS-15218 > URL: https://issues.apache.org/jira/browse/HDFS-15218 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.1.1 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Attachments: HDFS-15218.001.patch > > > {code:java} > 2020-03-09 12:43:50,082 | ERROR | MountTableRefresh_linux-133:25020 | Failed > to refresh mount table entries cache at router X:25020 | > MountTableRefresherThread.java:69 > java.io.IOException: DestHost:destPort X:25020 , LocalHost:localPort > XXX/XXX:0. Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.refreshMountTableEntries(RouterAdminProtocolTranslatorPB.java:284) > at > org.apache.hadoop.hdfs.server.federation.router.MountTableRefresherThread.run(MountTableRefresherThread.java:65) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory
[ https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081471#comment-17081471 ] Brahma Reddy Battula edited comment on HDFS-14476 at 4/11/20, 6:35 PM: --- Hi all, this Jira is committed to 3.3.0 and the release is planned this week, so it shouldn't stay open. Can we close this Jira and raise a separate Jira to track the rest? was (Author: brahmareddy): Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > lock too long when fix inconsistent blocks between disk and in-memory > - > > Key: HDFS-14476 > URL: https://issues.apache.org/jira/browse/HDFS-14476 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0, 2.7.0, 3.0.3 >Reporter: Sean Chow >Assignee: Sean Chow >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14476-branch-2.01.patch, > HDFS-14476-branch-2.02.patch, HDFS-14476-branch-2.10.02.patch, > HDFS-14476.00.patch, HDFS-14476.002.patch, HDFS-14476.01.patch, > HDFS-14476.branch-3.2.001.patch, datanode-with-patch-14476.png > > > When the DirectoryScanner has the results of differences between disk and > in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, > {{FsDatasetImpl.checkAndUpdate}} is a synchronized call. > As I have about 6 million blocks on every DataNode, and every 6-hour scan > finds about 25000 abnormal blocks to fix, this leads to the lock on the > FsDatasetImpl object being held for a long time. > Let's assume every block needs 10ms to fix (because of SAS disk latency); > that will take 250 seconds to finish. That means all reads and writes will be > blocked for 3 minutes on that DataNode. > > {code:java} > 2019-05-06 08:06:51,704 INFO > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool > BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing > metadata files:23574, missing block files:23574, missing blocks in > memory:47625, mismatched blocks:0 > ... 
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Took 588402ms to process 1 commands from NN > {code} > It takes a long time to process commands from the NN because threads are blocked, and the NameNode will see a long lastContact time for this DataNode. > This may affect all HDFS versions. > *how to fix:* > Just as invalidate commands from the NameNode are processed with a batch size of 1000, fixing these abnormal blocks should be batched too, with a 2-second sleep between batches to allow normal block reads and writes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
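The batching approach proposed in this issue can be sketched as follows. This is an illustrative outline under assumed names (`BatchedCheckAndUpdate`, `fixBlock`, and the lock field are invented, not the actual patch): hold the dataset lock for one batch at a time and sleep between batches so normal reads and writes can acquire the lock.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch of the batching idea: reconcile abnormal blocks in batches
// of 1000, releasing the dataset lock and sleeping between batches so that
// normal block reads/writes are not blocked for minutes at a time.
public class BatchedCheckAndUpdate {
    static final int BATCH_SIZE = 1000;
    static final long SLEEP_BETWEEN_BATCHES_MS = 2000;

    private final ReentrantLock datasetLock = new ReentrantLock();
    int fixedCount = 0; // how many blocks have been reconciled so far

    // Placeholder for the per-block reconciliation work.
    void fixBlock(String block) { fixedCount++; }

    public void checkAndUpdateInBatches(List<String> abnormalBlocks) {
        for (int start = 0; start < abnormalBlocks.size(); start += BATCH_SIZE) {
            int end = Math.min(start + BATCH_SIZE, abnormalBlocks.size());
            datasetLock.lock(); // hold the lock only for one batch
            try {
                for (String block : abnormalBlocks.subList(start, end)) {
                    fixBlock(block);
                }
            } finally {
                datasetLock.unlock();
            }
            if (end < abnormalBlocks.size()) {
                try {
                    // Let other operations run before the next batch.
                    Thread.sleep(SLEEP_BETWEEN_BATCHES_MS);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
}
```

With 25000 abnormal blocks this yields 25 lock acquisitions of roughly 10 seconds each instead of one 250-second hold, matching the arithmetic in the description.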
[jira] [Updated] (HDFS-13639) SlotReleaser is not fast enough
[ https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-13639: Fix Version/s: (was: 3.3.0) > SlotReleaser is not fast enough > --- > > Key: HDFS-13639 > URL: https://issues.apache.org/jira/browse/HDFS-13639 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.4.0, 2.6.0, 3.0.2 > Environment: 1. YCSB: > {color:#00} recordcount=20 > fieldcount=1 > fieldlength=1000 > operationcount=1000 > > workload=com.yahoo.ycsb.workloads.CoreWorkload > > table=ycsb-test > columnfamily=C > readproportion=1 > updateproportion=0 > insertproportion=0 > scanproportion=0 > > maxscanlength=0 > requestdistribution=zipfian > > # default > readallfields=true > writeallfields=true > scanlengthdistribution=constan{color} > {color:#00}2. datanode:{color} > -Xmx2048m -Xms2048m -Xmn1024m -XX:MaxDirectMemorySize=1024m > -XX:MaxPermSize=256m -Xloggc:$run_dir/stdout/datanode_gc_${start_time}.log > -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError > -XX:HeapDumpPath=$log_dir -XX:+PrintGCApplicationStoppedTime > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 > -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled > -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=1 > -XX:+CMSScavengeBeforeRemark -XX:+PrintPromotionFailure > -XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent > -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking > -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps > {color:#00}3. 
regionserver:{color} > {color:#00}-Xmx10g -Xms10g -XX:MaxDirectMemorySize=10g > -XX:MaxGCPauseMillis=150 -XX:MaxTenuringThreshold=2 > -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=5 > -Xloggc:$run_dir/stdout/regionserver_gc_${start_time}.log -Xss256k > -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$log_dir -verbose:gc > -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime > -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy > -XX:+PrintTenuringDistribution -XX:+PrintSafepointStatistics > -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1 > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m > -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking > -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=65 > -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 > -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=64 > -XX:G1OldCSetRegionThresholdPercent=5{color} > {color:#00}block cache is disabled:{color}{color:#00} > hbase.bucketcache.size > 0.9 > {color} > >Reporter: Gang Xie >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-13639-2.4.diff, HDFS-13639.001.patch, > HDFS-13639.002.patch, ShortCircuitCache_new_slotReleaser.diff, > perf_after_improve_SlotReleaser.png, perf_before_improve_SlotReleaser.png > > > When test the performance of the ShortCircuit Read of the HDFS with YCSB, we > find that SlotReleaser of the ShortCircuitCache has some performance issue. > The problem is that, the qps of the slot releasing could only reach to 1000+ > while the qps of the slot allocating is ~3000. This means that the replica > info on datanode could not be released in time, which causes a lot of GCs and > finally full GCs. > > The fireflame graph shows that SlotReleaser spends a lot of time to do domain > socket connecting and throw/catching the exception when close the domain > socket and its streams. 
It doesn't make sense to connect and > close each time. Each time we connect to the domain socket, the DataNode > allocates a new thread to free the slot. There is a lot of initialization > work, and it's costly. We need to reuse the domain socket. > > After switching to reusing the domain socket (see the attached diff), we get a great > improvement (see the perf data): > # Without reusing the domain socket, the get qps of YCSB gets worse > and worse, and after about 45 mins full GC starts. When we reuse the domain > socket, no full GC is seen, the stress test finishes smoothly, and the qps of > allocating and releasing match. > # Due to the DataNode young GC, without the improvement the YCSB get qps is > even smaller than the one with the improvement, ~3700 vs ~4200. > The diff is against 2.4, and I think this issue exists up to the latest version. I > don't have a test env with 2.7 or higher versions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
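The reuse idea described in this issue can be sketched generically. This is a hedged illustration (the class and names are invented, not the actual ShortCircuitCache code): instead of opening a fresh domain-socket connection for every released slot, the releaser keeps one cached connection and only reopens it after a failure.

```java
import java.util.function.Supplier;

// Hedged sketch of connection reuse for the slot releaser: connect once,
// reuse the cached connection across releases, and reconnect only after
// the caller invalidates it (e.g. on an I/O error).
public class ReusableReleaserConnection<C> {
    private final Supplier<C> connector; // opens a new connection (expensive)
    private C cached;                    // the connection reused across releases
    int connectCount = 0;                // how often we actually connected

    public ReusableReleaserConnection(Supplier<C> connector) {
        this.connector = connector;
    }

    // Returns a live connection, creating one only when none is cached.
    public synchronized C get() {
        if (cached == null) {
            cached = connector.get();
            connectCount++;
        }
        return cached;
    }

    // Drop the cached connection after an I/O error so get() reconnects.
    public synchronized void invalidate() {
        cached = null;
    }
}
```

The point of the design is that the expensive work (connecting, and the DataNode-side thread setup it triggers) happens once per connection lifetime rather than once per released slot, which is what lets the release qps catch up with the allocation qps.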
[jira] [Comment Edited] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp
[ https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079780#comment-17079780 ] Brahma Reddy Battula edited comment on HDFS-14788 at 4/11/20, 6:29 PM: --- [~ste...@apache.org] can we close this Jira, as this is committed? And if there is any plan for branch-3.2/3.1, can we track it in a separate Jira? was (Author: brahmareddy): Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Use dynamic regex filter to ignore copy of source files in Distcp > - > > Key: HDFS-14788 > URL: https://issues.apache.org/jira/browse/HDFS-14788 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp >Affects Versions: 3.2.1 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Fix For: 3.3.0 > > > There is a feature in Distcp where we can ignore specific files so they are not copied > to the destination. This is currently based on a filter regex which is read > from a specific file. Creating a different regex file for > each Distcp job is a tedious task. What we propose is to > expose a regex_filter parameter which can be set during Distcp job creation > and use this filter in a new CopyFilter implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
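The proposed filter can be sketched as below. This is a hypothetical illustration of the idea, not the committed code: the class name, the `regexFilter` parameter, and the `shouldCopy` method are assumptions standing in for DistCp's actual CopyFilter contract.

```java
import java.util.regex.Pattern;

// Hedged sketch of the proposed regex_filter option: a copy filter built
// from a regex supplied at job-creation time, instead of a filters file
// that must be prepared on disk for every DistCp job.
public class DynamicRegexCopyFilter {
    private final Pattern excludePattern;

    public DynamicRegexCopyFilter(String regexFilter) {
        this.excludePattern = Pattern.compile(regexFilter);
    }

    // Paths fully matching the exclude regex are skipped; all others copy.
    public boolean shouldCopy(String path) {
        return !excludePattern.matcher(path).matches();
    }
}
```

For example, a job created with the filter `.*\.tmp` would skip temporary files while copying everything else, with no per-job filter file to manage.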
[jira] [Issue Comment Deleted] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp
[ https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-14788: Comment: was deleted (was: Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. ) > Use dynamic regex filter to ignore copy of source files in Distcp > - > > Key: HDFS-14788 > URL: https://issues.apache.org/jira/browse/HDFS-14788 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp >Affects Versions: 3.2.1 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Fix For: 3.3.0 > > > There is a feature in Distcp where we can ignore specific files to get copied > to the destination. This is currently based on a filter regex which is read > from a specific file. The process of creating different regex file for > different distcp jobs seems like a tedious task. What we are proposing is to > expose a regex_filter parameter which can be set during Distcp job creation > and use this filter in a new implementation CopyFilter class. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13287) TestINodeFile#testGetBlockType results in NPE when run alone
[ https://issues.apache.org/jira/browse/HDFS-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-13287: Fix Version/s: (was: 3.1.4) (was: 3.3.0) > TestINodeFile#testGetBlockType results in NPE when run alone > > > Key: HDFS-13287 > URL: https://issues.apache.org/jira/browse/HDFS-13287 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Minor > Attachments: HDFS-13287.01.patch > > > When TestINodeFile#testGetBlockType is run by itself, it results in the > following error: > {code:java} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.218 > s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestINodeFile > [ERROR] > testGetBlockType(org.apache.hadoop.hdfs.server.namenode.TestINodeFile) Time > elapsed: 0.023 s <<< ERROR! > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager.getPolicyInfoByID(ErasureCodingPolicyManager.java:220) > at > org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager.getByID(ErasureCodingPolicyManager.java:208) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat.getBlockLayoutRedundancy(INodeFile.java:207) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.(INodeFile.java:266) > at > org.apache.hadoop.hdfs.server.namenode.TestINodeFile.createStripedINodeFile(TestINodeFile.java:112) > at > org.apache.hadoop.hdfs.server.namenode.TestINodeFile.testGetBlockType(TestINodeFile.java:299) > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13287) TestINodeFile#testGetBlockType results in NPE when run alone
[ https://issues.apache.org/jira/browse/HDFS-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081412#comment-17081412 ] Brahma Reddy Battula edited comment on HDFS-13287 at 4/11/20, 6:25 PM: --- Removing the fix version , as this reverted. was (Author: brahmareddy): Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > TestINodeFile#testGetBlockType results in NPE when run alone > > > Key: HDFS-13287 > URL: https://issues.apache.org/jira/browse/HDFS-13287 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti >Priority: Minor > Fix For: 3.3.0, 3.1.4 > > Attachments: HDFS-13287.01.patch > > > When TestINodeFile#testGetBlockType is run by itself, it results in the > following error: > {code:java} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.218 > s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestINodeFile > [ERROR] > testGetBlockType(org.apache.hadoop.hdfs.server.namenode.TestINodeFile) Time > elapsed: 0.023 s <<< ERROR! > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager.getPolicyInfoByID(ErasureCodingPolicyManager.java:220) > at > org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager.getByID(ErasureCodingPolicyManager.java:208) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat.getBlockLayoutRedundancy(INodeFile.java:207) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.(INodeFile.java:266) > at > org.apache.hadoop.hdfs.server.namenode.TestINodeFile.createStripedINodeFile(TestINodeFile.java:112) > at > org.apache.hadoop.hdfs.server.namenode.TestINodeFile.testGetBlockType(TestINodeFile.java:299) > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14261) Kerberize JournalNodeSyncer unit test
[ https://issues.apache.org/jira/browse/HDFS-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081432#comment-17081432 ] Brahma Reddy Battula edited comment on HDFS-14261 at 4/11/20, 6:24 PM: --- Removing the fix version as this is reverted. was (Author: brahmareddy): Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Kerberize JournalNodeSyncer unit test > - > > Key: HDFS-14261 > URL: https://issues.apache.org/jira/browse/HDFS-14261 > Project: Hadoop HDFS > Issue Type: Test > Components: journal-node, security, test >Affects Versions: 3.2.0, 3.1.2 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14261.001.patch > > > This jira is an addition to HDFS-14140. Making the unit tests in > TestJournalNodeSync run on a Kerberized cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14261) Kerberize JournalNodeSyncer unit test
[ https://issues.apache.org/jira/browse/HDFS-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-14261: Fix Version/s: (was: 3.3.0) > Kerberize JournalNodeSyncer unit test > - > > Key: HDFS-14261 > URL: https://issues.apache.org/jira/browse/HDFS-14261 > Project: Hadoop HDFS > Issue Type: Test > Components: journal-node, security, test >Affects Versions: 3.2.0, 3.1.2 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Attachments: HDFS-14261.001.patch > > > This jira is an addition to HDFS-14140. Making the unit tests in > TestJournalNodeSync run on a Kerberized cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.
[ https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081406#comment-17081406 ] Brahma Reddy Battula edited comment on HDFS-14353 at 4/11/20, 6:22 PM: --- [~elgoiri] as this is reverted, can I remove the fix version for this jira..? was (Author: brahmareddy): Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Erasure Coding: metrics xmitsInProgress become to negative. > --- > > Key: HDFS-14353 > URL: https://issues.apache.org/jira/browse/HDFS-14353 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, erasure-coding >Affects Versions: 3.3.0 >Reporter: maobaolong >Assignee: maobaolong >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, > HDFS-14353.003.patch, HDFS-14353.004.patch, HDFS-14353.005.patch, > HDFS-14353.006.patch, HDFS-14353.007.patch, HDFS-14353.008.patch, > HDFS-14353.009.patch, screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12862) CacheDirective becomes invalid when NN restart or failover
[ https://issues.apache.org/jira/browse/HDFS-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081405#comment-17081405 ] Brahma Reddy Battula edited comment on HDFS-12862 at 4/11/20, 6:21 PM: --- [~weichiu] any update on branch-3.1 patch for this issue..? Can be raised another Jira to track branch-3.1 patch..? Issue should not be open As this merged to 3.3.0 which is going to be released. was (Author: brahmareddy): Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > CacheDirective becomes invalid when NN restart or failover > -- > > Key: HDFS-12862 > URL: https://issues.apache.org/jira/browse/HDFS-12862 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, hdfs >Affects Versions: 2.7.1 > Environment: >Reporter: Wang XL >Assignee: Wang XL >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: HDFS-12862-branch-2.7.1.001.patch, > HDFS-12862-trunk.002.patch, HDFS-12862-trunk.003.patch, > HDFS-12862-trunk.004.patch, HDFS-12862.005.patch, HDFS-12862.006.patch, > HDFS-12862.007.patch, HDFS-12862.branch-3.1.patch > > > The logic in FSNDNCacheOp#modifyCacheDirective is not correct. when modify > cacheDirective,the expiration in directive may be a relative expiryTime, and > EditLog will serial a relative expiry time. > {code:java} > // Some comments here > static void modifyCacheDirective( > FSNamesystem fsn, CacheManager cacheManager, CacheDirectiveInfo > directive, > EnumSet flags, boolean logRetryCache) throws IOException { > final FSPermissionChecker pc = getFsPermissionChecker(fsn); > cacheManager.modifyDirective(directive, pc, flags); > fsn.getEditLog().logModifyCacheDirectiveInfo(directive, logRetryCache); > } > {code} > But when SBN replay the log ,it will invoke > FSImageSerialization#readCacheDirectiveInfo as a absolute expiryTime.It will > result in the inconsistency . 
> {code:java} > public static CacheDirectiveInfo readCacheDirectiveInfo(DataInput in) > throws IOException { > CacheDirectiveInfo.Builder builder = > new CacheDirectiveInfo.Builder(); > builder.setId(readLong(in)); > int flags = in.readInt(); > if ((flags & 0x1) != 0) { > builder.setPath(new Path(readString(in))); > } > if ((flags & 0x2) != 0) { > builder.setReplication(readShort(in)); > } > if ((flags & 0x4) != 0) { > builder.setPool(readString(in)); > } > if ((flags & 0x8) != 0) { > builder.setExpiration( > CacheDirectiveInfo.Expiration.newAbsolute(readLong(in))); > } > if ((flags & ~0xF) != 0) { > throw new IOException("unknown flags set in " + > "ModifyCacheDirectiveInfoOp: " + flags); > } > return builder.build(); > } > {code} > In other words, fsn.getEditLog().logModifyCacheDirectiveInfo(directive, > logRetryCache) may serial a relative expiry time,But > builder.setExpiration(CacheDirectiveInfo.Expiration.newAbsolute(readLong(in))) >read it as a absolute expiryTime. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
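The mismatch described above can be made concrete with a small worked example. This is a hedged sketch with simplified types (the helper below is invented; it is not the real CacheDirectiveInfo API): if the edit log records a relative expiry but the replay path rebuilds it as absolute, the standby computes a wrong timestamp, so one fix is to normalize to an absolute time before logging.

```java
// Hedged sketch of the expiry inconsistency: a relative expiry must be
// converted to an absolute timestamp before it is written to the edit log,
// because readCacheDirectiveInfo always interprets the stored value as
// absolute (Expiration.newAbsolute).
public class ExpirySketch {

    static long toAbsolute(long expiryMillis, boolean isRelative, long nowMillis) {
        return isRelative ? nowMillis + expiryMillis : expiryMillis;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long relative = 300_000L; // "expire 300s from now"
        // If the raw relative value were replayed as absolute, the standby
        // would see a timestamp far in the past:
        long wrong = relative;
        long right = toAbsolute(relative, true, now);
        System.out.println(wrong + " vs " + right);
    }
}
```

In the wrong case the directive appears already expired on the standby after a restart or failover, which matches the symptom in the issue title.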
[jira] [Updated] (HDFS-10848) Move hadoop-hdfs-native-client module into hadoop-hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-10848: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Move hadoop-hdfs-native-client module into hadoop-hdfs-client > - > > Key: HDFS-10848 > URL: https://issues.apache.org/jira/browse/HDFS-10848 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Akira Ajisaka >Assignee: Huafeng Wang >Priority: Major > Attachments: HDFS-10848.001.patch > > > When a patch changes hadoop-hdfs-client module, Jenkins does not pick up the > tests in the native code. That way we overlooked test failure when committing > the patch. (ex. HDFS-10844) > [~aw] said in HDFS-10844, > bq. Ideally, all of this native code would be hdfs-client. Then when a change > is made to to that code, this code will also get tested. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10364) Log current node in reversexml tool when parse failed
[ https://issues.apache.org/jira/browse/HDFS-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-10364: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Log current node in reversexml tool when parse failed > - > > Key: HDFS-10364 > URL: https://issues.apache.org/jira/browse/HDFS-10364 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Trivial > Attachments: HDFS-10364.01.patch > > > Sometimes we want to modify the xml before converting it. If some error > happened, it's hard to find out. Adding a line to tell where the failure is > would be helpful. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12116) BlockReportTestBase#blockReport_08 and #blockReport_09 intermittently fail
[ https://issues.apache.org/jira/browse/HDFS-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-12116: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > BlockReportTestBase#blockReport_08 and #blockReport_09 intermittently fail > -- > > Key: HDFS-12116 > URL: https://issues.apache.org/jira/browse/HDFS-12116 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 0.22.0 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-12116.01.patch, HDFS-12116.02.patch, > HDFS-12116.03.patch, > TEST-org.apache.hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage.xml > > > This seems to be long-standing, but the failure rate (~10%) is slightly > higher in dist-test runs using CDH. > In both the _08 and _09 tests: > # an attempt is made to make a replica in {{TEMPORARY}} > state, by {{waitForTempReplica}}. > # Once that's returned, the test goes on to verify that block reports show the > correct pending replication blocks. > But there's a race condition. If the replica is replicated between steps #1 > and #2, {{getPendingReplicationBlocks}} could return 0 or 1, depending on how > many replicas are replicated, hence failing the test. > Failures are seen on both {{TestNNHandlesBlockReportPerStorage}} and > {{TestNNHandlesCombinedBlockReport}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
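Races like the one described above are usually fixed by polling for the expected state instead of asserting on it once; Hadoop's test tree has a `GenericTestUtils.waitFor` helper in this spirit. A minimal standalone sketch of the pattern (class and method names here are mine, not the actual test code):

```java
import java.util.function.BooleanSupplier;

public class WaitFor {
    /** Polls {@code condition} every {@code intervalMs} until it holds or
     *  {@code timeoutMs} elapses; returns whether it ever became true. */
    public static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs); // let the other thread (e.g. replication) make progress
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // The condition becomes true ~50ms in; the assertion no longer depends
        // on winning a race against a background thread.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 50, 10, 2000);
        System.out.println("condition met: " + ok);
    }
}
```

Rewriting step #2 this way tolerates the replica being replicated between the two steps, instead of failing on an exact pending-block count.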
[jira] [Updated] (HDFS-11987) DistributedFileSystem#create and append do not honor CreateFlag.CREATE|APPEND
[ https://issues.apache.org/jira/browse/HDFS-11987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-11987: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > DistributedFileSystem#create and append do not honor CreateFlag.CREATE|APPEND > - > > Key: HDFS-11987 > URL: https://issues.apache.org/jira/browse/HDFS-11987 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.1, 3.0.0-alpha3 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Major > Attachments: HDFS-11987.00.patch > > > {{DistributedFileSystem#create()}} and {{DistributedFIleSystem#append()}} do > not honor the expected behavior on {{CreateFlag.CREATE|APPEND}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11825) Make dfshealth datanode information more readable
[ https://issues.apache.org/jira/browse/HDFS-11825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-11825: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Make dfshealth datanode information more readable > - > > Key: HDFS-11825 > URL: https://issues.apache.org/jira/browse/HDFS-11825 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha2 >Reporter: Sarah Victor >Assignee: Sarah Victor >Priority: Trivial > Attachments: HADOOP-11825.1.patch > > > On the dfshealth.html#tab-datanode web page to monitor health of Hadoop nodes > (hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html) > (1) The Capacity column reports the node capacity but is sorted by another > value that is not reported (i.e. the % Capacity Used). It would be good to name > the column heading "Capacity Used [%]" and report both the Capacity Used > and the Capacity Used % > (2) The Block Pool Used column reports the block pools used but is sorted by > the % Block Pool Used. It would be good to name the column heading "Block > Pool Used [%]" > (3) The "Last contact" column is more easily understood if we call it "Last > Heartbeat". Adding the measurement unit [secs] to the header makes it more > easily understood. > (4) Adding the measurement unit [mins] to the "Last Block Report" header > makes it more easily understood. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12811) A cheaper, faster, less memory intensive Hadoop fsck
[ https://issues.apache.org/jira/browse/HDFS-12811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-12811: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > A cheaper, faster, less memory intensive Hadoop fsck > > > Key: HDFS-12811 > URL: https://issues.apache.org/jira/browse/HDFS-12811 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Rushabh Shah >Assignee: Rushabh Shah >Priority: Major > > A cheaper, faster, less memory-intensive approach is to eliminate the traversal > by directly scanning the inode table. > A side-effect is that paths would be scanned and displayed in a random order. A > new option, ex. "-direct" or "-fast", would likely be required to avoid > compatibility issues. > PS: This enhancement is valid only if the path is the root directory {{/}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9940) Balancer should not use property dfs.datanode.balance.max.concurrent.moves
[ https://issues.apache.org/jira/browse/HDFS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-9940: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Balancer should not use property dfs.datanode.balance.max.concurrent.moves > -- > > Key: HDFS-9940 > URL: https://issues.apache.org/jira/browse/HDFS-9940 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Minor > Labels: supportability > > It is very confusing for both Balancer and Datanode to use the same property > {{dfs.datanode.balance.max.concurrent.moves}}. It is especially so for the > Balancer because the property has "datanode" in the name string. Many > customers forget to set the property for the Balancer. > Change the Balancer to use a new property > {{dfs.balancer.max.concurrent.moves}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
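A rename like the one proposed for HDFS-9940 is usually paired with a fallback to the old key so existing configurations keep working. A standalone sketch of that lookup, using `java.util.Properties` as a stand-in for Hadoop's `Configuration` (the fallback behaviour and the default of 5 are assumptions about how such a change would typically be made, not the committed fix):

```java
import java.util.Properties;

public class BalancerConf {
    // Property names are from the issue; the fallback and default are assumptions.
    static final String BALANCER_KEY = "dfs.balancer.max.concurrent.moves";
    static final String DATANODE_KEY = "dfs.datanode.balance.max.concurrent.moves";
    static final int DEFAULT_MOVES = 5;

    /** New balancer-scoped key wins; the old datanode-scoped key is a fallback. */
    static int maxConcurrentMoves(Properties conf) {
        String v = conf.getProperty(BALANCER_KEY, conf.getProperty(DATANODE_KEY));
        return v == null ? DEFAULT_MOVES : Integer.parseInt(v);
    }

    /** Tiny helper to build a Properties from alternating key/value pairs. */
    static Properties props(String... kv) {
        Properties p = new Properties();
        for (int i = 0; i < kv.length; i += 2) p.setProperty(kv[i], kv[i + 1]);
        return p;
    }

    public static void main(String[] args) {
        Properties conf = props(DATANODE_KEY, "10"); // legacy setting still honoured
        System.out.println(maxConcurrentMoves(conf)); // 10
        conf.setProperty(BALANCER_KEY, "50");        // new key takes precedence
        System.out.println(maxConcurrentMoves(conf)); // 50
    }
}
```

The fallback keeps customers who only set the old datanode-scoped property from silently losing their setting.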
[jira] [Updated] (HDFS-7550) Minor followon cleanups from HDFS-7543
[ https://issues.apache.org/jira/browse/HDFS-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-7550: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Minor followon cleanups from HDFS-7543 > -- > > Key: HDFS-7550 > URL: https://issues.apache.org/jira/browse/HDFS-7550 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Priority: Minor > Attachments: HDFS-7550.001.patch > > > The commit of HDFS-7543 crossed paths with these comments: > FSDirMkdirOp.java > in #mkdirs, you removed the final String srcArg = src. This should be left > in. Many IDEs will whine about making assignments to formal args and that's > why it was put in in the first place. > FSDirRenameOp.java > #renameToInt, dstIIP (and resultingStat) could benefit from final's. > FSDirXAttrOp.java > I'm not sure why you've moved the call to getINodesInPath4Write and > checkXAttrChangeAccess inside the writeLock. > FSDirStatAndListing.java > The javadoc for the @param src needs to be changed to reflect that it's an > INodesInPath, not a String. Nit: it might be better to rename the > INodesInPath arg from src to iip. > #getFileInfo4DotSnapshot is now unused since you in-lined it into > #getFileInfo. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-14528: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Failover from Active to Standby Failed > > > Key: HDFS-14528 > URL: https://issues.apache.org/jira/browse/HDFS-14528 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, > HDFS-14528.005.patch, HDFS-14528.006.patch, HDFS-14528.007.patch, > HDFS-14528.2.Patch, ZKFC_issue.patch > > > *In a cluster with more than one Standby namenode, manual failover throws an > exception in some cases* > *When trying to execute the failover command from active to standby* > *._/hdfs haadmin -failover nn1 nn2, the below exception is thrown_* > Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on > connection exception: java.net.ConnectException: Connection refused > This is encountered in the following cases : > Scenario 1 : > Namenodes - NN1(Active) , NN2(Standby), NN3(Standby) > When trying to manually failover from NN1 to NN2 if NN3 is down, an exception is > thrown > Scenario 2 : > Namenodes - NN1(Active) , NN2(Standby), NN3(Standby) > ZKFC's - ZKFC1, ZKFC2, ZKFC3 > When trying to manually failover from NN1 to NN3 if NN3's ZKFC (ZKFC3) is > down, an exception is thrown -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-12257: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang >Priority: Major > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, > HDFS-12257.003.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10743) MiniDFSCluster test runtimes can be drastically reduced
[ https://issues.apache.org/jira/browse/HDFS-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-10743: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > MiniDFSCluster test runtimes can be drastically reduced > -- > > Key: HDFS-10743 > URL: https://issues.apache.org/jira/browse/HDFS-10743 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Kuhu Shukla >Priority: Major > Attachments: HDFS-10743.001.patch, HDFS-10743.002.patch, > HDFS-10743.003.patch > > > {{MiniDFSCluster}} tests have excessive runtimes. The main problem appears > to be the heartbeat interval. The NN may have to wait up to 3s (default > value) for all DNs to heartbeat, triggering registration, so the NN can go > active. Tests that repeatedly restart the NN are severely affected. > Example for varying heartbeat intervals for {{TestFSImageWithAcl}}: > * 3s = ~70s -- (disgusting, why I investigated) > * 1s = ~27s > * 500ms = ~17s -- (had to hack DNConf for millisecond precision) > That's a 4x improvement in runtime. > 17s is still excessively long for what the test does. Further areas to > explore when running tests: > * Reduce the numerous sleep intervals in DN's {{BPServiceActor}}. > * Ensure heartbeats and initial BR are sent immediately upon (re)registration. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10429) DataStreamer interrupted warning always appears when using CLI upload file
[ https://issues.apache.org/jira/browse/HDFS-10429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-10429: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > DataStreamer interrupted warning always appears when using CLI upload file > --- > > Key: HDFS-10429 > URL: https://issues.apache.org/jira/browse/HDFS-10429 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Zhiyuan Yang >Priority: Minor > Attachments: HDFS-10429.1.patch, HDFS-10429.2.patch, > HDFS-10429.3.patch > > > Every time I use 'hdfs dfs -put' upload file, this warning is printed: > {code:java} > 16/05/18 20:57:56 WARN hdfs.DataStreamer: Caught exception > java.lang.InterruptedException > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1245) > at java.lang.Thread.join(Thread.java:1319) > at > org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:871) > at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:519) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:696) > {code} > The reason is this: originally, DataStreamer::closeResponder always prints a > warning about InterruptedException; since HDFS-9812, > DFSOutputStream::closeImpl always forces threads to close, which causes > InterruptedException. > A simple fix is to use debug level log instead of warning level. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
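The proposed fix works because a debug-level message is suppressed under the default logging threshold while a warning is not. A self-contained illustration using `java.util.logging` (the real DataStreamer uses a different logging facade, and the message text here is illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogDemo {
    /** Logs one message at {@code msgLevel} to a logger thresholded at
     *  {@code loggerLevel} and returns the records that actually got through. */
    static List<LogRecord> emit(Level loggerLevel, Level msgLevel) {
        Logger log = Logger.getAnonymousLogger();
        log.setUseParentHandlers(false); // keep demo output off the console
        log.setLevel(loggerLevel);
        List<LogRecord> seen = new ArrayList<>();
        log.addHandler(new Handler() {
            @Override public void publish(LogRecord r) { seen.add(r); }
            @Override public void flush() {}
            @Override public void close() {}
        });
        log.log(msgLevel, "DataStreamer interrupted"); // illustrative message
        return seen;
    }

    public static void main(String[] args) {
        // Under the usual INFO threshold, warning-level chatter reaches the user
        // but debug-level does not -- which is why demoting the expected
        // InterruptedException message silences it for ordinary CLI uploads.
        System.out.println("warning published: " + !emit(Level.INFO, Level.WARNING).isEmpty());
        System.out.println("debug published: " + !emit(Level.INFO, Level.FINE).isEmpty());
    }
}
```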
[jira] [Updated] (HDFS-10911) Change edit log OP_UPDATE_BLOCKS to store delta blocks only.
[ https://issues.apache.org/jira/browse/HDFS-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-10911: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Change edit log OP_UPDATE_BLOCKS to store delta blocks only. > > > Key: HDFS-10911 > URL: https://issues.apache.org/jira/browse/HDFS-10911 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.3, 3.0.0-alpha1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Major > > Every time an HDFS client {{close}}s or {{hflush}}es an open file, the NameNode > enumerates all the blocks and stores them into the edit log (OP_UPDATE_BLOCKS). > This can cause problems when the client appends to a large file frequently > (i.e., a WAL). > Because HDFS is append-only, we can store only the blocks that have been > changed (delta blocks) in the edit log. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
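The append-only property is what makes a delta encoding safe: an update can only resize the last previously logged block or add blocks after it. A toy sketch of computing such a delta over block lists (not the actual edit-log format or the committed change):

```java
import java.util.List;

public class DeltaBlocks {
    /** Because an HDFS file is append-only, an update can only modify the last
     *  previously recorded block (e.g. its length) or append new blocks after
     *  it, so the delta is the tail of the new list starting at the old last
     *  block. This illustrates the idea only. */
    static <T> List<T> delta(List<T> oldBlocks, List<T> newBlocks) {
        int from = Math.max(0, oldBlocks.size() - 1); // old last block may have changed
        return newBlocks.subList(from, newBlocks.size());
    }

    public static void main(String[] args) {
        List<String> before = List.of("blk_1", "blk_2");
        List<String> after = List.of("blk_1", "blk_2", "blk_3");
        // Only blk_2 (possibly resized) and the new blk_3 need logging,
        // regardless of how many earlier blocks the file has.
        System.out.println(delta(before, after)); // [blk_2, blk_3]
    }
}
```

For a WAL-style file with thousands of blocks, each OP_UPDATE_BLOCKS entry then records a constant-size tail instead of the whole list.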
[jira] [Updated] (HDFS-11232) System.err should be System.out
[ https://issues.apache.org/jira/browse/HDFS-11232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-11232: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > System.err should be System.out > --- > > Key: HDFS-11232 > URL: https://issues.apache.org/jira/browse/HDFS-11232 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Ethan Li >Priority: Trivial > Attachments: HDFS-11232.001.patch, HDFS-11232.002.patch > > > In > /Users/Ethan/Worksplace/IntelliJWorkspace/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java, > System.err.println("Generating new cluster id:"); is used. I think it should > be System.out.println(...) since this is not an error message -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11885) createEncryptionZone should not block on initializing EDEK cache
[ https://issues.apache.org/jira/browse/HDFS-11885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-11885: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > createEncryptionZone should not block on initializing EDEK cache > > > Key: HDFS-11885 > URL: https://issues.apache.org/jira/browse/HDFS-11885 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Major > Attachments: HDFS-11885.001.patch, HDFS-11885.002.patch, > HDFS-11885.003.patch, HDFS-11885.004.patch > > > When creating an encryption zone, we call {{ensureKeyIsInitialized}}, which > calls {{provider.warmUpEncryptedKeys(keyName)}}. This is a blocking call, > which attempts to fill the key cache up to the low watermark. > If the KMS is down or slow, this can take a very long time, and cause the > createZone RPC to fail with a timeout. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
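One common shape for this kind of fix is to hand the warm-up to a background executor so the RPC returns immediately. A self-contained sketch of that pattern (method and variable names are illustrative, not the actual NameNode change):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class AsyncWarmup {
    /** Hypothetical sketch: submit the (possibly slow) EDEK cache warm-up to a
     *  background pool instead of blocking zone creation on the KMS. */
    static Future<?> createZoneNonBlocking(ExecutorService pool, Runnable warmUpEdekCache) {
        Future<?> warmup = pool.submit(warmUpEdekCache);
        // ... the actual zone-creation work would happen here, without
        // waiting on the warm-up future ...
        return warmup;
    }

    /** Returns true iff the "RPC" returned while the KMS was still slow. */
    static boolean demo() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch kmsResponded = new CountDownLatch(1);
        Runnable slowWarmup = () -> {
            try { kmsResponded.await(); } // simulate a slow or hung KMS
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        };
        Future<?> warmup = createZoneNonBlocking(pool, slowWarmup);
        boolean returnedBeforeWarmup = !warmup.isDone(); // RPC already came back
        kmsResponded.countDown();                        // now let the KMS answer
        warmup.get(5, TimeUnit.SECONDS);
        pool.shutdown();
        return returnedBeforeWarmup;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("zone created before warm-up finished: " + demo());
    }
}
```

The trade-off is that the first generateEncryptedKey calls against the new zone may still hit the KMS synchronously, but the createZone RPC itself can no longer time out on cache warm-up.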
[jira] [Updated] (HDFS-11464) Improve the selection in choosing storage for blocks
[ https://issues.apache.org/jira/browse/HDFS-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-11464: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Improve the selection in choosing storage for blocks > > > Key: HDFS-11464 > URL: https://issues.apache.org/jira/browse/HDFS-11464 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-11464.001.patch, HDFS-11464.002.patch, > HDFS-11464.003.patch, HDFS-11464.004.patch, HDFS-11464.005.patch > > > Currently the logic for choosing storage for blocks is not well designed. It > always uses the first valid storage of a given StorageType (see > {{DataNodeDescriptor#chooseStorage4Block}}). This is not a good > selection strategy. It means blocks will always be written to the same volume (the > first volume) and other valid volumes are never chosen. This problem is brought up > by this comment ( > https://issues.apache.org/jira/browse/HDFS-9807?focusedCommentId=15878382=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15878382 > ) > There is one solution from me: > * First, based on the existing storages in one node, extract all the valid > storages into a collection. > * Then, disrupt the order of these valid storages to get a new collection. > * Finally, take the first storage from the new collection. > These steps would be executed in {{DataNodeDescriptor#chooseStorage4Block}} > and replace the current logic. I think this improvement can be done as a subtask > under HDFS-11419. Any further comments are welcome. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
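The three proposed steps (collect valid storages, shuffle, take the first) can be sketched in a few lines with `Collections.shuffle`. `Storage` below is a toy stand-in for the real storage descriptor, so this only illustrates the selection idea, not the actual patch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class StorageChooser {
    /** Toy model of a datanode volume: a name plus a validity flag. */
    record Storage(String name, boolean valid) {}

    /** Sketch of the proposal: filter valid storages, shuffle the copy, and
     *  take the first element, so writes spread across volumes instead of
     *  always landing on the first valid one. */
    static Storage chooseStorage4Block(List<Storage> storages, Random rnd) {
        List<Storage> valid = new ArrayList<>();
        for (Storage s : storages) {
            if (s.valid()) valid.add(s); // step 1: extract valid storages
        }
        if (valid.isEmpty()) return null;
        Collections.shuffle(valid, rnd); // step 2: disrupt the order
        return valid.get(0);             // step 3: take the first storage
    }

    public static void main(String[] args) {
        List<Storage> storages = List.of(
            new Storage("vol1", true), new Storage("vol2", false), new Storage("vol3", true));
        // Prints vol1 or vol3, never the invalid vol2.
        System.out.println(chooseStorage4Block(storages, new Random()).name());
    }
}
```

Shuffling a copy keeps the original storage list untouched, and seeding the `Random` makes the behaviour reproducible in tests.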
[jira] [Updated] (HDFS-11455) Fix javac warnings in HDFS that are caused by deprecated FileSystem APIs
[ https://issues.apache.org/jira/browse/HDFS-11455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-11455: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Fix javac warnings in HDFS that are caused by deprecated FileSystem APIs > > > Key: HDFS-11455 > URL: https://issues.apache.org/jira/browse/HDFS-11455 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-11455.001.patch, HDFS-11455.002.patch, > HDFS-11455.003.patch > > > Many javac warnings appeared after the FileSystem APIs that promote > inefficient call patterns were deprecated in HADOOP-13321. The relevant > warnings: > {code} > [WARNING] > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[320,18] > [deprecation] isFile(Path) in FileSystem has been deprecated > [WARNING] > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[1409,18] > [deprecation] isFile(Path) in FileSystem has been deprecated > [WARNING] > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java:[778,19] > [deprecation] isDirectory(Path) in FileSystem has been deprecated > [WARNING] > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java:[787,20] > [deprecation] isDirectory(Path) in FileSystem has been deprecated > [WARNING] > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestQuotaByStorageType.java:[834,18] > [deprecation] isFile(Path) in FileSystem has been > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15140) Replace FoldedTreeSet in Datanode with SortedSet or TreeMap
[ https://issues.apache.org/jira/browse/HDFS-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-15140: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Replace FoldedTreeSet in Datanode with SortedSet or TreeMap > --- > > Key: HDFS-15140 > URL: https://issues.apache.org/jira/browse/HDFS-15140 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: HDFS-15140.001.patch, HDFS-15140.002.patch > > > Based on the problems discussed in HDFS-15131, I would like to explore > replacing the FoldedTreeSet structure in the datanode with a builtin Java > equivalent - either SortedSet or TreeMap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11543) Test multiple erasure coding implementations
[ https://issues.apache.org/jira/browse/HDFS-11543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-11543: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Test multiple erasure coding implementations > > > Key: HDFS-11543 > URL: https://issues.apache.org/jira/browse/HDFS-11543 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.0.0-alpha2 >Reporter: László Bence Nagy >Priority: Minor > Labels: test > > Potentially, multiple native erasure coding plugins will be available to be > used from HDFS later on. These plugins should be tested as well. For example, > the *NativeRSRawErasureCoderFactory* class - which is used for instantiating > the native ISA-L plugin's encoder and decoder objects - are used in 5 test > files under the > *hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/* > directory. The files are: > - *TestDFSStripedInputStream.java* > - *TestDFSStripedOutputStream.java* > - *TestDFSStripedOutputStreamWithFailure.java* > - *TestReconstructStripedFile.java* > - *TestUnsetAndChangeDirectoryEcPolicy.java* > Other erasure coding plugins should be tested in these cases as well in a > nice way (not by for example making a new file for every new erasure coding > plugin). For this purpose [parameterized > tests|https://github.com/junit-team/junit4/wiki/parameterized-tests] might be > used. > This is also true for the > *hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/erasurecode/rawcoder/* > directory where this approach could be used for example for the > interoperability tests (when it is checked that certain erasure coding > implementations are compatible with each other by doing the encoding and > decoding operations with different plugins and verifying their results). 
The > plugin pairs which should be tested could be the parameters for the > parameterized tests. > The parameterized test is just an idea, there can be other solutions as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
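The encode-with-one, decode-with-another matrix suggested above can be sketched without JUnit by treating each (encoder, decoder) pair as a parameter. The toy XOR coders below stand in for real plugin implementations (pure-Java vs. native ISA-L); the interface and names are illustrative, not Hadoop's actual rawcoder API:

```java
import java.util.Arrays;
import java.util.List;

public class CoderMatrixTest {
    /** Toy raw coder: one XOR parity unit over two data units. Real plugins
     *  would sit behind a shared interface like this. */
    interface RawCoder {
        byte[] encode(byte[] d0, byte[] d1);           // compute the parity unit
        byte[] decode(byte[] parity, byte[] survivor); // recover the lost data unit
    }

    static final RawCoder PLAIN_XOR = new RawCoder() {
        public byte[] encode(byte[] d0, byte[] d1) {
            byte[] p = new byte[d0.length];
            for (int i = 0; i < p.length; i++) p[i] = (byte) (d0[i] ^ d1[i]);
            return p;
        }
        public byte[] decode(byte[] parity, byte[] survivor) { return encode(parity, survivor); }
    };

    // A second "implementation": same math via OR/AND identities, standing in
    // for an independently implemented (e.g. native) plugin.
    static final RawCoder OR_AND_FORM = new RawCoder() {
        public byte[] encode(byte[] d0, byte[] d1) {
            byte[] p = new byte[d0.length];
            for (int i = 0; i < p.length; i++)
                p[i] = (byte) ((d0[i] | d1[i]) & ~(d0[i] & d1[i])); // equals XOR
            return p;
        }
        public byte[] decode(byte[] parity, byte[] survivor) { return encode(parity, survivor); }
    };

    /** The parameterized-test idea: each (encoder, decoder) pair is one case. */
    static boolean interoperable(RawCoder enc, RawCoder dec, byte[] d0, byte[] d1) {
        byte[] parity = enc.encode(d0, d1);
        return Arrays.equals(dec.decode(parity, d1), d0); // lose d0, rebuild it
    }

    public static void main(String[] args) {
        byte[] d0 = {1, 2, 3}, d1 = {4, 5, 6};
        for (RawCoder enc : List.of(PLAIN_XOR, OR_AND_FORM))
            for (RawCoder dec : List.of(PLAIN_XOR, OR_AND_FORM))
                System.out.println(interoperable(enc, dec, d0, d1)); // true x4
    }
}
```

In JUnit terms, the two nested loops would become the parameter source, so adding a plugin means adding one factory to the list rather than a new test file.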
[jira] [Updated] (HDFS-7408) Add a counter in the log that shows the number of block reports processed
[ https://issues.apache.org/jira/browse/HDFS-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-7408: --- Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > Add a counter in the log that shows the number of block reports processed > - > > Key: HDFS-7408 > URL: https://issues.apache.org/jira/browse/HDFS-7408 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Suresh Srinivas >Assignee: Surendra Singh Lilhore >Priority: Major > Attachments: HDFS-7408.001.patch > > > It would be great if the info log for block report processing printed how > many block reports have been processed. This can be useful for debugging > when the namenode is unresponsive, especially during startup, to understand > whether datanodes are sending block reports multiple times. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11489) KMS should warning about weak SSL ciphers
[ https://issues.apache.org/jira/browse/HDFS-11489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-11489: Target Version/s: 3.4.0 (was: 3.3.0) Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker. > KMS should warning about weak SSL ciphers > - > > Key: HDFS-11489 > URL: https://issues.apache.org/jira/browse/HDFS-11489 > Project: Hadoop HDFS > Issue Type: Improvement > Components: kms >Affects Versions: 2.9.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Minor > > HADOOP-14083 sets a list of default ciphers that contain a few weak ciphers > in order to maintain backwards compatibility. In addition, users can select > weak ciphers by env {{KMS_SSL_CIPHERS}}. It'd nice to get warnings about the > weak ciphers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org