[jira] [Resolved] (HDDS-451) PutKey failed due to error "Rejecting write chunk request. Chunk overwrite without explicit request"
[ https://issues.apache.org/jira/browse/HDDS-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDDS-451. -- Resolution: Cannot Reproduce Resolving as "Cannot Reproduce". > PutKey failed due to error "Rejecting write chunk request. Chunk overwrite > without explicit request" > > > Key: HDDS-451 > URL: https://issues.apache.org/jira/browse/HDDS-451 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.2.1 >Reporter: Nilotpal Nandi >Assignee: Shashikant Banerjee >Priority: Blocker > Labels: alpha2 > Attachments: all-node-ozone-logs-1536841590.tar.gz > > > steps taken : > -- > # Ran Put Key command to write 50GB data. Put Key client operation failed > after 17 mins. > error seen ozone.log : > > > {code} > 2018-09-13 12:11:53,734 [ForkJoinPool.commonPool-worker-20] DEBUG > (ChunkManagerImpl.java:85) - writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1 > chunk stage:COMMIT_DATA chunk > file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1 > tmp chunk file > 2018-09-13 12:11:56,576 [pool-3-thread-60] DEBUG (ChunkManagerImpl.java:85) - > writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > chunk stage:WRITE_DATA chunk > file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > tmp chunk file > 2018-09-13 12:11:56,739 [ForkJoinPool.commonPool-worker-20] DEBUG > (ChunkManagerImpl.java:85) - writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > chunk stage:COMMIT_DATA chunk > 
file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > tmp chunk file > 2018-09-13 12:12:21,410 [Datanode State Machine Thread - 0] DEBUG > (DatanodeStateMachine.java:148) - Executing cycle Number : 206 > 2018-09-13 12:12:51,411 [Datanode State Machine Thread - 0] DEBUG > (DatanodeStateMachine.java:148) - Executing cycle Number : 207 > 2018-09-13 12:12:53,525 [BlockDeletingService#1] DEBUG > (TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next > container, there is no pending deletion block contained in remaining > containers. > 2018-09-13 12:12:55,048 [Datanode ReportManager Thread - 1] DEBUG > (ContainerSet.java:191) - Starting container report iteration. > 2018-09-13 12:13:02,626 [pool-3-thread-1] ERROR (ChunkUtils.java:244) - > Rejecting write chunk request. Chunk overwrite without explicit request. > ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} > 2018-09-13 12:13:03,035 [pool-3-thread-1] INFO (ContainerUtils.java:149) - > Operation: WriteChunk : Trace ID: 54834b29-603d-4ba9-9d68-0885215759d8 : > Message: Rejecting write chunk request. OverWrite flag > required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED > 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] ERROR > (ChunkUtils.java:244) - Rejecting write chunk request. Chunk overwrite > without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} > 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] INFO > (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: > 54834b29-603d-4ba9-9d68-0885215759d8 : Message: Rejecting write chunk > request. OverWrite flag > required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED > > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
Tsz Wo Nicholas Sze created HDDS-826: Summary: Update Ratis to 0.3.0-6f3419a-SNAPSHOT Key: HDDS-826 URL: https://issues.apache.org/jira/browse/HDDS-826 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze RATIS-404 fixed a deadlock bug. We should update Ratis here.
[jira] [Created] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations
Tsz Wo Nicholas Sze created HDDS-691: Summary: Dependency convergence error for org.apache.hadoop:hadoop-annotations Key: HDDS-691 URL: https://issues.apache.org/jira/browse/HDDS-691 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze
[jira] [Created] (HDDS-632) TimeoutScheduler and SlidingWindow should use daemon threads
Tsz Wo Nicholas Sze created HDDS-632: Summary: TimeoutScheduler and SlidingWindow should use daemon threads Key: HDDS-632 URL: https://issues.apache.org/jira/browse/HDDS-632 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze In HDDS-625, we found that the Ozone client does not terminate. The SlidingWindow (debug) thread and the TimeoutScheduler threads are holding up process termination.
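The kind of fix this implies can be sketched as follows. This is a minimal illustration with hypothetical names (DaemonSchedulerSketch is not the actual Ratis code): the scheduler's thread factory marks its threads as daemon threads, which cannot keep the client JVM alive after the main thread exits.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;

public class DaemonSchedulerSketch {
    // Build a scheduler whose worker thread is a daemon thread, so an idle
    // scheduler cannot block JVM termination the way a user thread would.
    static ScheduledExecutorService newDaemonScheduler(String name) {
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(true);   // daemon threads do not prevent JVM shutdown
            return t;
        };
        return Executors.newSingleThreadScheduledExecutor(factory);
    }
}
```

With a factory like this, a client that finishes its work terminates even if the scheduler was never explicitly shut down.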
[jira] [Created] (HDDS-554) In XceiverClientSpi, implement sendCommand(..) using sendCommandAsync(..)
Tsz Wo Nicholas Sze created HDDS-554: Summary: In XceiverClientSpi, implement sendCommand(..) using sendCommandAsync(..) Key: HDDS-554 URL: https://issues.apache.org/jira/browse/HDDS-554 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: Ozone Client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze The advantages are two-fold -- # it simplifies the code, and # the async API is more efficient.
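The proposed structure can be sketched as below. This is an illustration only, with String standing in for the real request/response protos and simplified method signatures (the actual XceiverClientSpi methods differ): the blocking call simply waits on the async one, so only one code path performs the real work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class SyncOverAsync {
    // Stand-in for sendCommandAsync(..): returns a future reply.
    static CompletableFuture<String> sendCommandAsync(String request) {
        return CompletableFuture.supplyAsync(() -> "reply:" + request);
    }

    // Stand-in for sendCommand(..): implemented on top of the async API
    // by blocking until the future completes.
    static String sendCommand(String request)
            throws InterruptedException, ExecutionException {
        return sendCommandAsync(request).get();
    }
}
```

Implementing the synchronous call this way removes a duplicated send path and lets callers opt into pipelining by using the async variant directly.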
[jira] [Created] (HDDS-372) There are two buffer copies in ChunkOutputStream
Tsz Wo Nicholas Sze created HDDS-372: Summary: There are two buffer copies in ChunkOutputStream Key: HDDS-372 URL: https://issues.apache.org/jira/browse/HDDS-372 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: Ozone Client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Currently, there are two buffer copies in ChunkOutputStream: # from byte[] to ByteBuffer, and # from ByteBuffer to ByteString. We should eliminate the ByteBuffer in the middle. For zero-copy I/O, we should support WritableByteChannel instead of OutputStream; that won't be done in this JIRA.
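The cost being described can be illustrated with plain java.nio, using a byte[] result in place of protobuf's ByteString (the real code involves protobuf types, which this sketch deliberately avoids): the current path copies the data into an intermediate ByteBuffer and then out again, whereas wrapping the original array creates a view with no copy at all.

```java
import java.nio.ByteBuffer;

public class BufferCopies {
    // Two copies: byte[] -> intermediate ByteBuffer -> final byte[].
    static byte[] twoCopies(byte[] data) {
        ByteBuffer middle = ByteBuffer.allocate(data.length);
        middle.put(data);                 // copy 1: byte[] into ByteBuffer
        middle.flip();
        byte[] out = new byte[middle.remaining()];
        middle.get(out);                  // copy 2: ByteBuffer into byte[]
        return out;
    }

    // No copy: a ByteBuffer view directly over the caller's array.
    static ByteBuffer zeroCopy(byte[] data) {
        return ByteBuffer.wrap(data);
    }
}
```

Eliminating the middle buffer halves the per-chunk copy cost; true zero-copy additionally requires an interface, such as WritableByteChannel, that can consume the caller's buffer directly.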
[jira] [Resolved] (HDFS-13205) Incorrect path is passed to checkPermission during authorization of file under a snapshot (specifically under a subdir) after original subdir is deleted
[ https://issues.apache.org/jira/browse/HDFS-13205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-13205. Resolution: Not A Problem Resolving as Not A Problem. > Incorrect path is passed to checkPermission during authorization of file > under a snapshot (specifically under a subdir) after original subdir is > deleted > > > Key: HDFS-13205 > URL: https://issues.apache.org/jira/browse/HDFS-13205 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 2.7.4 >Reporter: Raghavender Rao Guruvannagari >Assignee: Shashikant Banerjee >Priority: Major > > Steps to reproduce the issue. > +As 'hdfs' superuser+ > – Create a folder (/hdptest/test) with 700 permissions and ( > /hdptest/test/mydir) with 755. > --HDFS Ranger policy is defined with RWX for user "test" on /hdptest/test/ > recursively. > --Allow snapshot on the directory /hdptest/test/mydir: > {code:java} > #su - test > [test@node1 ~]$ hdfs dfs -ls /hdptest/test/mydir > [test@node1 ~]$ hdfs dfs -mkdir /hdptest/test/mydir/test > [test@node1 ~]$ hdfs dfs -put /etc/passwd /hdptest/test/mydir/test > [test@node1 ~]$ hdfs lsSnapshottableDir > drwxr-xr-x 0 test hdfs 0 2018-01-25 14:22 1 65536 /hdptest/test/mydir > > {code} > > -->Create Snapshot > {code:java} > [test@node1 ~]$ hdfs dfs -createSnapshot /hdptest/test/mydir > Created snapshot /hdptest/test/mydir/.snapshot/s20180125-135430.953 > {code} > -->Verifying that snapshot directory has the current files from directory > and verify the file is accessible .snapshot path: > {code:java} > [test@node1 ~]$ hdfs dfs -ls -R > /hdptest/test/mydir/.snapshot/s20180125-135430.953 > drwxr-xr-x - test hdfs 0 2018-01-25 13:53 > /hdptest/test/mydir/.snapshot/s20180125-135430.953/test > -rw-r--r-- 3 test hdfs 3227 2018-01-25 13:53 > /hdptest/test/mydir/.snapshot/s20180125-135430.953/test/passwd > [test@node1 ~]$ hdfs dfs -cat > /hdptest/test/mydir/.snapshot/s20180125-135430.953/test/passwd | tail > 
livytest:x:1015:496::/home/livytest:/bin/bash > ehdpzepp:x:1016:496::/home/ehdpzepp:/bin/bash > zepptest:x:1017:496::/home/zepptest:/bin/bash > {code} > -->Remove the file from main directory and verified that file is still > accessible: > {code:java} > [test@node1 ~]$ hdfs dfs -rm /hdptest/test/mydir/test/passwd > 18/01/25 13:55:06 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://rangerSME/hdptest/test/mydir/test/passwd' to trash at: > hdfs://rangerSME/user/test/.Trash/Current/hdptest/test/mydir/test/passwd > [test@node1 ~]$ hdfs dfs -cat > /hdptest/test/mydir/.snapshot/s20180125-135430.953/test/passwd | tail > livytest:x:1015:496::/home/livytest:/bin/bash > {code} > -->Remove the parent directory of the file which was deleted, now accessing > the same file under .snapshot dir fails with permission denied error > {code:java} > [test@node1 ~]$ hdfs dfs -rm -r /hdptest/test/mydir/test > 18/01/25 13:55:25 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://rangerSME/hdptest/test/mydir/test' to trash at: > hdfs://rangerSME/user/test/.Trash/Current/hdptest/test/mydir/test1516888525269 > [test@node1 ~]$ hdfs dfs -cat > /hdptest/test/mydir/.snapshot/s20180125-135430.953/test/passwd | tail > cat: Permission denied: user=test, access=EXECUTE, > inode="/hdptest/test/mydir/.snapshot/s20180125-135430.953/test/passwd":hdfs:hdfs:drwxr-x--- > > {code} > Ranger policies are not honored in this case for .snapshot directories/files > after main directory is deleted under snapshotable directory. > Workaround is to provide execute permission at HDFS level for the parent > folder > {code:java} > #su - hdfs > #hdfs dfs -chmod 701 /hdptest/test > {code}
[jira] [Created] (HDDS-293) Reduce memory usage in KeyData
Tsz Wo Nicholas Sze created HDDS-293: Summary: Reduce memory usage in KeyData Key: HDDS-293 URL: https://issues.apache.org/jira/browse/HDDS-293 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Currently, the field chunks is declared as a List in KeyData as shown below. {code} //KeyData.java private List chunks; {code} It is expected that many KeyData objects only have a single chunk. We could reduce the memory usage.
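One common way to realize this kind of optimization is sketched below. This is a hypothetical illustration (KeyDataSketch, with String standing in for the chunk type, is not the actual KeyData code): the common single-chunk case stores a plain reference, and a List is only allocated once a second chunk arrives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class KeyDataSketch {
    // Either null, a single String chunk, or a List<String> of chunks.
    private Object chunks;

    @SuppressWarnings("unchecked")
    void addChunk(String chunk) {
        if (chunks == null) {
            chunks = chunk;                  // common case: one chunk, no List object
        } else if (chunks instanceof List) {
            ((List<String>) chunks).add(chunk);
        } else {
            // second chunk: upgrade the single reference to a List
            List<String> list = new ArrayList<>();
            list.add((String) chunks);
            list.add(chunk);
            chunks = list;
        }
    }

    @SuppressWarnings("unchecked")
    List<String> getChunks() {
        if (chunks == null) {
            return Collections.emptyList();
        }
        if (chunks instanceof List) {
            return (List<String>) chunks;
        }
        return Collections.singletonList((String) chunks);
    }
}
```

For objects that usually hold one element, this avoids the per-instance overhead of an ArrayList (object header, size field, and backing array).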
[jira] [Created] (HDDS-288) Fix bugs in OpenContainerBlockMap
Tsz Wo Nicholas Sze created HDDS-288: Summary: Fix bugs in OpenContainerBlockMap Key: HDDS-288 URL: https://issues.apache.org/jira/browse/HDDS-288 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze - OpenContainerBlockMap should not be synchronized, for better performance. - There is a memory leak in removeContainer(..) -- it sets the entry to null instead of removing it. - addChunkToMap may add the same chunk twice. See the comments below. {code} keyDataSet.putIfAbsent(blockID.getLocalID(), getKeyData(info, blockID)); // (1) when id is absent, it puts keyDataSet.computeIfPresent(blockID.getLocalID(), (key, value) -> { // (2) now, the id is present, it adds again. value.addChunk(info); return value; }); {code}
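The double-add described above can be reproduced in miniature, with String and Long as hypothetical stand-ins for ChunkInfo and the block's local id (DoubleAddDemo is not the actual OpenContainerBlockMap code). When the key is absent, putIfAbsent stores a list that already contains the chunk, and computeIfPresent then appends it a second time; computeIfAbsent avoids the problem by creating the entry and appending in one step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class DoubleAddDemo {
    static final ConcurrentMap<Long, List<String>> map = new ConcurrentHashMap<>();

    // Buggy pattern from the snippet above: on the first add for an id,
    // the chunk ends up in the list twice.
    static void addChunkBuggy(long id, String chunk) {
        List<String> fresh = new ArrayList<>();
        fresh.add(chunk);
        map.putIfAbsent(id, fresh);                       // (1) inserts list containing chunk
        map.computeIfPresent(id, (k, v) -> {              // (2) id now present: adds again
            v.add(chunk);
            return v;
        });
    }

    // Fixed pattern: create an empty list at most once, append exactly once.
    static void addChunkFixed(long id, String chunk) {
        map.computeIfAbsent(id, k -> new ArrayList<>()).add(chunk);
    }
}
```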
[jira] [Created] (HDDS-42) Inconsistent module names and descriptions
Tsz Wo Nicholas Sze created HDDS-42: --- Summary: Inconsistent module names and descriptions Key: HDDS-42 URL: https://issues.apache.org/jira/browse/HDDS-42 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze The hdds/ozone module names and descriptions are inconsistent: - Missing "Hadoop" in some cases. - Inconsistent use of acronyms. - Inconsistent capitalization.
[jira] [Created] (HDDS-28) Duplicate declaration in hadoop-tools/hadoop-ozone/pom.xml
Tsz Wo Nicholas Sze created HDDS-28: --- Summary: Duplicate declaration in hadoop-tools/hadoop-ozone/pom.xml Key: HDDS-28 URL: https://issues.apache.org/jira/browse/HDDS-28 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Filesystem Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze {code} [WARNING] Some problems were encountered while building the effective model for org.apache.hadoop:hadoop-ozone-filesystem:jar:3.2.0-SNAPSHOT [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.apache.hadoop:hadoop-hdds-server-framework:jar -> duplicate declaration of version (?) @ org.apache.hadoop:hadoop-ozone-filesystem:[unknown-version], /Users/szetszwo/hadoop/apache-hadoop/hadoop-tools/hadoop-ozone/pom.xml, line 173, column 17 [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.apache.hadoop:hadoop-hdds-server-scm:jar -> duplicate declaration of version (?) @ org.apache.hadoop:hadoop-ozone-filesystem:[unknown-version], /Users/szetszwo/hadoop/apache-hadoop/hadoop-tools/hadoop-ozone/pom.xml, line 178, column 17 [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.apache.hadoop:hadoop-hdds-client:jar -> duplicate declaration of version (?) @ org.apache.hadoop:hadoop-ozone-filesystem:[unknown-version], /Users/szetszwo/hadoop/apache-hadoop/hadoop-tools/hadoop-ozone/pom.xml, line 183, column 17 [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.apache.hadoop:hadoop-hdds-container-service:jar -> duplicate declaration of version (?) @ org.apache.hadoop:hadoop-ozone-filesystem:[unknown-version], /Users/szetszwo/hadoop/apache-hadoop/hadoop-tools/hadoop-ozone/pom.xml, line 188, column 17 [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.apache.hadoop:hadoop-ozone-ozone-manager:jar -> duplicate declaration of version (?) 
@ org.apache.hadoop:hadoop-ozone-filesystem:[unknown-version], /Users/szetszwo/hadoop/apache-hadoop/hadoop-tools/hadoop-ozone/pom.xml, line 193, column 17 {code}
[jira] [Created] (HDFS-13526) Use TimeoutScheduler in RaftClientImpl
Tsz Wo Nicholas Sze created HDFS-13526: -- Summary: Use TimeoutScheduler in RaftClientImpl Key: HDFS-13526 URL: https://issues.apache.org/jira/browse/HDFS-13526 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze TimeoutScheduler is thread safe and shuts down automatically when there are no tasks. Let's also use it in RaftClientImpl for submitting retry requests.
[jira] [Resolved] (HDFS-898) Sequential generation of block ids
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-898. -- Resolution: Duplicate This was done by HDFS-4645. Resolving ... > Sequential generation of block ids > -- > > Key: HDFS-898 > URL: https://issues.apache.org/jira/browse/HDFS-898 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 0.20.1 >Reporter: Konstantin Shvachko >Priority: Major > Attachments: DuplicateBlockIds.patch, FreeBlockIds.pdf, > HighBitProjection.pdf, blockid.tex, blockid20100122.pdf > > > This is a proposal to replace random generation of block ids with a > sequential generator in order to avoid block id reuse in the future.
[jira] [Created] (HDFS-13252) Code refactoring: Remove Diff.ListType
Tsz Wo Nicholas Sze created HDFS-13252: -- Summary: Code refactoring: Remove Diff.ListType Key: HDFS-13252 URL: https://issues.apache.org/jira/browse/HDFS-13252 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze In Diff, there are only two lists, created and deleted. It is easier to trace the code if the methods have the list type in the method name, instead of passing a ListType parameter.
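The refactoring can be sketched as below, with a hypothetical DiffSketch class and String elements in place of the real Diff types: instead of one accessor taking a ListType enum, each list gets an explicitly named method.

```java
import java.util.ArrayList;
import java.util.List;

public class DiffSketch {
    private final List<String> created = new ArrayList<>();
    private final List<String> deleted = new ArrayList<>();

    // Before: callers pass an enum, so the call site names the list indirectly.
    enum ListType { CREATED, DELETED }
    List<String> getList(ListType type) {
        return type == ListType.CREATED ? created : deleted;
    }

    // After: the list type is part of the method name, making call sites
    // self-describing and easy to trace.
    List<String> getCreatedList() { return created; }
    List<String> getDeletedList() { return deleted; }
}
```

Since Diff has exactly two lists, splitting the accessor costs nothing in API surface and removes the enum plumbing at every call site.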
[jira] [Created] (HDFS-13223) Reduce DiffListBySkipList memory usage
Tsz Wo Nicholas Sze created HDFS-13223: -- Summary: Reduce DiffListBySkipList memory usage Key: HDFS-13223 URL: https://issues.apache.org/jira/browse/HDFS-13223 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Reporter: Tsz Wo Nicholas Sze Assignee: Shashikant Banerjee There are several ways to reduce the memory footprint of DiffListBySkipList. - Move maxSkipLevels and skipInterval to DirectoryDiffListFactory. - Use an array for skipDiffList instead of a List. - Do not store the level 0 element in skipDiffList. - Do not create a new ChildrenDiff for the same value.
[jira] [Created] (HDFS-12839) Refactor ratis-server tests to reduce the use of DEFAULT_CALLID
Tsz Wo Nicholas Sze created HDFS-12839: -- Summary: Refactor ratis-server tests to reduce the use of DEFAULT_CALLID Key: HDFS-12839 URL: https://issues.apache.org/jira/browse/HDFS-12839 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor This JIRA helps reduce the patch size of RATIS-141. We refactor the tests so that DEFAULT_CALLID is only used in MiniRaftCluster.
[jira] [Created] (HDFS-12527) CLONE - javadoc: error - class file for org.apache.http.annotation.ThreadSafe not found
Tsz Wo Nicholas Sze created HDFS-12527: -- Summary: CLONE - javadoc: error - class file for org.apache.http.annotation.ThreadSafe not found Key: HDFS-12527 URL: https://issues.apache.org/jira/browse/HDFS-12527 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Tsz Wo Nicholas Sze Assignee: Mukul Kumar Singh {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.10.4:jar (module-javadocs) on project hadoop-hdfs-client: MavenReportException: Error while generating Javadoc: [ERROR] Exit code: 1 - /Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java:694: warning - Tag @link: reference not found: StripingCell [ERROR] javadoc: error - class file for org.apache.http.annotation.ThreadSafe not found [ERROR] [ERROR] Command line was: /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/jre/../bin/javadoc -J-Xmx768m @options @packages [ERROR] [ERROR] Refer to the generated Javadoc files in '/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs-client/target/api' dir. {code} To reproduce the error above, run {code} mvn package -Pdist -DskipTests -DskipDocs -Dtar {code}
[jira] [Created] (HDFS-12507) StripedBlockUtil.java:694: warning - Tag @link: reference not found: StripingCell
Tsz Wo Nicholas Sze created HDFS-12507: -- Summary: StripedBlockUtil.java:694: warning - Tag @link: reference not found: StripingCell Key: HDFS-12507 URL: https://issues.apache.org/jira/browse/HDFS-12507 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Tsz Wo Nicholas Sze {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.10.4:jar (module-javadocs) on project hadoop-hdfs-client: MavenReportException: Error while generating Javadoc: [ERROR] Exit code: 1 - /Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java:694: warning - Tag @link: reference not found: StripingCell [ERROR] javadoc: error - class file for org.apache.http.annotation.ThreadSafe not found [ERROR] [ERROR] Command line was: /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/jre/../bin/javadoc -J-Xmx768m @options @packages [ERROR] [ERROR] Refer to the generated Javadoc files in '/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs-client/target/api' dir. {code} To reproduce the error above, run {code} mvn package -Pdist -DskipTests -DskipDocs -Dtar {code}
[jira] [Created] (HDFS-12244) Ozone: the static cache provided by ContainerCache does not work in Unit tests
Tsz Wo Nicholas Sze created HDFS-12244: -- Summary: Ozone: the static cache provided by ContainerCache does not work in Unit tests Key: HDFS-12244 URL: https://issues.apache.org/jira/browse/HDFS-12244 Project: Hadoop HDFS Issue Type: Bug Components: ozone Reporter: Tsz Wo Nicholas Sze Since a cluster may have >1 datanodes, a static ContainerCache is shared among the datanodes. When one datanode shuts down, the cache is shut down as well, so the other datanodes can no longer use it. This results in "leveldb.DBException: Closed" {code} org.iq80.leveldb.DBException: Closed at org.fusesource.leveldbjni.internal.JniDB.get(JniDB.java:75) at org.apache.hadoop.utils.LevelDBStore.get(LevelDBStore.java:109) at org.apache.hadoop.ozone.container.common.impl.KeyManagerImpl.getKey(KeyManagerImpl.java:116) at org.apache.hadoop.ozone.container.common.impl.Dispatcher.handleGetSmallFile(Dispatcher.java:677) at org.apache.hadoop.ozone.container.common.impl.Dispatcher.smallFileHandler(Dispatcher.java:293) at org.apache.hadoop.ozone.container.common.impl.Dispatcher.dispatch(Dispatcher.java:121) at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatch(ContainerStateMachine.java:94) ... {code}
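One possible shape of a fix can be sketched as follows. This is a hypothetical illustration (PerDatanodeCache, with String handles in place of real DB references, is not the actual ContainerCache code): scoping the cache per datanode id instead of keeping one process-wide instance means that shutting one datanode down only discards its own entries.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PerDatanodeCache {
    // One cache per datanode id, rather than a single static cache shared
    // by every datanode running in the same test JVM.
    private static final ConcurrentMap<String, ConcurrentMap<String, String>> CACHES =
        new ConcurrentHashMap<>();

    static ConcurrentMap<String, String> forDatanode(String datanodeId) {
        return CACHES.computeIfAbsent(datanodeId, id -> new ConcurrentHashMap<>());
    }

    static void shutdownDatanode(String datanodeId) {
        // Only this datanode's entries go away; other datanodes keep working.
        CACHES.remove(datanodeId);
    }
}
```

In production a single datanode per JVM makes a static cache harmless; it is the multi-datanode MiniOzoneCluster tests that expose the shared lifetime.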
[jira] [Created] (HDFS-12163) Ozone: MiniOzoneCluster uses 400+ threads
Tsz Wo Nicholas Sze created HDFS-12163: -- Summary: Ozone: MiniOzoneCluster uses 400+ threads Key: HDFS-12163 URL: https://issues.apache.org/jira/browse/HDFS-12163 Project: Hadoop HDFS Issue Type: Bug Components: ozone, test Reporter: Tsz Wo Nicholas Sze Checked the number of active threads used in MiniOzoneCluster with various settings: - Local handlers - Distributed handlers - Ratis-Netty - Ratis-gRPC The results are similar for all the settings. It uses 400+ threads. Moreover, there is a thread leak -- a number of the threads do not shut down after the test is finished. Therefore, when tests run consecutively, the later tests use more threads. Will post the details in comments.
[jira] [Created] (HDFS-12006) Ozone: add TestDistributedOzoneVolumesRatis
Tsz Wo Nicholas Sze created HDFS-12006: -- Summary: Ozone: add TestDistributedOzoneVolumesRatis Key: HDFS-12006 URL: https://issues.apache.org/jira/browse/HDFS-12006 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Add Ratis tests similar to TestDistributedOzoneVolumes.
[jira] [Created] (HDFS-11989) Oz
Tsz Wo Nicholas Sze created HDFS-11989: -- Summary: Oz Key: HDFS-11989 URL: https://issues.apache.org/jira/browse/HDFS-11989 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze
[jira] [Created] (HDFS-11979) Ozone: TestContainerPersistence never uses MiniOzoneCluster
Tsz Wo Nicholas Sze created HDFS-11979: -- Summary: Ozone: TestContainerPersistence never uses MiniOzoneCluster Key: HDFS-11979 URL: https://issues.apache.org/jira/browse/HDFS-11979 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor
[jira] [Created] (HDFS-11977) Ozone: cannot enable test debug/trace log
Tsz Wo Nicholas Sze created HDFS-11977: -- Summary: Ozone: cannot enable test debug/trace log Key: HDFS-11977 URL: https://issues.apache.org/jira/browse/HDFS-11977 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Interestingly, the test debug/trace logs are not printed for Ozone classes even if we invoke GenericTestUtils.setLogLevel(log, Level.ALL). Other classes such as Object do not have such a problem. Here is a test: {code} @Test public void testLogLevel() throws Exception { runTestLogLevel(StorageContainerManager.class); runTestLogLevel(Object.class); } static void runTestLogLevel(Class clazz) throws Exception { final Logger log = LoggerFactory.getLogger(clazz); GenericTestUtils.setLogLevel(log, Level.ALL); log.trace(clazz.getSimpleName() + " trace log"); log.debug(clazz.getSimpleName() + " debug log"); log.info(clazz.getSimpleName() + " info log"); log.warn(clazz.getSimpleName() + " warn log"); log.error(clazz.getSimpleName() + " error log"); } {code} Output: {code} 2017-06-15 00:19:07,133 [Thread-0] INFO - StorageContainerManager info log 2017-06-15 00:19:07,135 [Thread-0] WARN - StorageContainerManager warn log 2017-06-15 00:19:07,135 [Thread-0] ERROR - StorageContainerManager error log 2017-06-15 00:19:07,135 [Thread-0] TRACE lang.Object(TestOzoneContainer.java:runTestLogLevel(64)) - Object trace log 2017-06-15 00:19:07,135 [Thread-0] DEBUG lang.Object(TestOzoneContainer.java:runTestLogLevel(65)) - Object debug log 2017-06-15 00:19:07,135 [Thread-0] INFO lang.Object(TestOzoneContainer.java:runTestLogLevel(66)) - Object info log 2017-06-15 00:19:07,135 [Thread-0] WARN lang.Object(TestOzoneContainer.java:runTestLogLevel(67)) - Object warn log 2017-06-15 00:19:07,135 [Thread-0] ERROR lang.Object(TestOzoneContainer.java:runTestLogLevel(68)) - Object error log {code}
[jira] [Created] (HDFS-11948) Ozone: change TestRatisManager to check cluster with data
Tsz Wo Nicholas Sze created HDFS-11948: -- Summary: Ozone: change TestRatisManager to check cluster with data Key: HDFS-11948 URL: https://issues.apache.org/jira/browse/HDFS-11948 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze TestRatisManager first creates multiple Ratis clusters. Then it changes the membership and closes some clusters. However, it does not test the clusters with data.
[jira] [Created] (HDFS-11947) BPOfferService prints an invalid warning message "Block pool ID needed, but service not yet registered with NN"
Tsz Wo Nicholas Sze created HDFS-11947: -- Summary: BPOfferService prints an invalid warning message "Block pool ID needed, but service not yet registered with NN" Key: HDFS-11947 URL: https://issues.apache.org/jira/browse/HDFS-11947 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze Priority: Minor
[jira] [Created] (HDFS-11946) Ozone: Containers in different datanodes are mapped to the same location
Tsz Wo Nicholas Sze created HDFS-11946: -- Summary: Ozone: Containers in different datanodes are mapped to the same location Key: HDFS-11946 URL: https://issues.apache.org/jira/browse/HDFS-11946 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Tsz Wo Nicholas Sze Assignee: Anu Engineer This is a problem in unit tests. Containers with the same container name in different datanodes are mapped to the same local path location. As a result, the first datanode will succeed in creating the container file but the remaining datanodes will fail to create the container file with FileAlreadyExistsException.
[jira] [Created] (HDFS-11865) Ozone: Do not initialize Ratis cluster during datanode startup
Tsz Wo Nicholas Sze created HDFS-11865: -- Summary: Ozone: Do not initialize Ratis cluster during datanode startup Key: HDFS-11865 URL: https://issues.apache.org/jira/browse/HDFS-11865 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze During a datanode startup, we currently pass dfs.container.ratis.conf so that the datanode is bound to a particular Ratis cluster. In this JIRA, we change the datanode so that it is no longer bound to any Ratis cluster during startup. We use the Ratis reinitialize request (RATIS-86) to set up a Ratis cluster later on.
[jira] [Created] (HDFS-11843) Ozone: XceiverClientRatis should implement XceiverClientSpi.connect()
Tsz Wo Nicholas Sze created HDFS-11843: -- Summary: Ozone: XceiverClientRatis should implement XceiverClientSpi.connect() Key: HDFS-11843 URL: https://issues.apache.org/jira/browse/HDFS-11843 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze When an XceiverClientRatis object is newly created, it automatically connects to the server. This is not correct behavior. It should implement XceiverClientSpi.connect().
[jira] [Created] (HDFS-11735) Ozone: In Ratis, leader should validate ContainerCommandRequestProto before propagating it to followers
Tsz Wo Nicholas Sze created HDFS-11735: -- Summary: Ozone: In Ratis, leader should validate ContainerCommandRequestProto before propagating it to followers Key: HDFS-11735 URL: https://issues.apache.org/jira/browse/HDFS-11735 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze
[jira] [Created] (HDFS-11734) Ozone: provide a way to validate ContainerCommandRequestProto
Tsz Wo Nicholas Sze created HDFS-11734: -- Summary: Ozone: provide a way to validate ContainerCommandRequestProto Key: HDFS-11734 URL: https://issues.apache.org/jira/browse/HDFS-11734 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Tsz Wo Nicholas Sze Assignee: Anu Engineer We need some API to check if a ContainerCommandRequestProto is valid. It is useful when the container pipeline is run with Ratis. Then, the leader could first check whether a ContainerCommandRequestProto is valid before the request is propagated to the followers.
[jira] [Created] (HDFS-11597) Ozone: Add Ratis management API
Tsz Wo Nicholas Sze created HDFS-11597: -- Summary: Ozone: Add Ratis management API Key: HDFS-11597 URL: https://issues.apache.org/jira/browse/HDFS-11597 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze We need an API to manage Raft clusters, e.g. - RaftClusterId createRaftCluster(MembershipConfiguration) - void closeRaftCluster(RaftClusterId) - MembershipConfiguration getMembers(RaftClusterId) - void changeMembership(RaftClusterId, newMembershipConfiguration)
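The four operations listed in HDFS-11597 can be sketched as a plain in-memory Java class. This is a hypothetical illustration only; RaftClusterId and MembershipConfiguration below are placeholder classes named after the issue description, not actual Ratis types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Placeholder id type for a Raft cluster (illustrative, not a Ratis class).
class RaftClusterId {
  final String id;
  RaftClusterId(String id) { this.id = id; }
}

// Placeholder membership type holding the peer names (illustrative).
class MembershipConfiguration {
  final List<String> peers;
  MembershipConfiguration(List<String> peers) { this.peers = peers; }
}

// In-memory sketch of the proposed management API.
class InMemoryRatisManager {
  private final Map<String, MembershipConfiguration> clusters = new HashMap<>();
  private int nextId = 0;

  // createRaftCluster: register a new cluster and hand back its id.
  RaftClusterId createRaftCluster(MembershipConfiguration conf) {
    RaftClusterId id = new RaftClusterId("cluster-" + nextId++);
    clusters.put(id.id, conf);
    return id;
  }

  // getMembers: look up the current membership of a cluster.
  MembershipConfiguration getMembers(RaftClusterId id) {
    return clusters.get(id.id);
  }

  // changeMembership: replace the membership configuration.
  void changeMembership(RaftClusterId id, MembershipConfiguration newConf) {
    clusters.put(id.id, newConf);
  }

  // closeRaftCluster: forget the cluster entirely.
  void closeRaftCluster(RaftClusterId id) {
    clusters.remove(id.id);
  }
}
```

A real implementation would of course drive actual Raft servers rather than a map, but the shape of the API is the same.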
[jira] [Created] (HDFS-11559) Ozone: MiniOzoneCluster prints too many log messages by default
Tsz Wo Nicholas Sze created HDFS-11559: -- Summary: Ozone: MiniOzoneCluster prints too many log messages by default Key: HDFS-11559 URL: https://issues.apache.org/jira/browse/HDFS-11559 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, test Reporter: Tsz Wo Nicholas Sze Priority: Minor When running tests using MiniOzoneCluster, it prints out tons of debug and trace log messages from all logs, including the ones from libraries such as - ipc.Server {code} 2017-03-21 15:13:13,053 [Thread-0] DEBUG ipc.Server (RPC.java:registerProtocolAndImpl(895)) - RpcKind = RPC_PROTOCOL_BUFFER Protocol Name = org.apache.hadoop.ipc.ProtocolMetaInfoPB version=1 ProtocolImpl=org.apache.hadoop.ipc.protobuf.ProtocolInfoProtos$ProtocolInfoService$2 protocolClass=org.apache.hadoop.ipc.ProtocolMetaInfoPB 2017-03-21 15:13:13,058 [Thread-0] DEBUG ipc.Server (RPC.java:registerProtocolAndImpl(895)) - RpcKind = RPC_PROTOCOL_BUFFER Protocol Name = org.apache.hadoop.hdfs.protocol.ClientProtocol version=1 ProtocolImpl=org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2 protocolClass=org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB 2017-03-21 15:13:13,058 [Thread-0] DEBUG ipc.Server (RPC.java:registerProtocolAndImpl(895)) - RpcKind = RPC_PROTOCOL_BUFFER Protocol Name = org.apache.hadoop.ha.HAServiceProtocol version=1 ProtocolImpl=org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2 protocolClass=org.apache.hadoop.ha.protocolPB.HAServiceProtocolPB {code} - netty {code} 2017-03-21 15:13:14,912 [Thread-0] DEBUG nio.NioEventLoop (Slf4JLogger.java:debug(76)) - -Dio.netty.noKeySetOptimization: false 2017-03-21 15:13:14,912 [Thread-0] DEBUG nio.NioEventLoop (Slf4JLogger.java:debug(76)) - -Dio.netty.selectorAutoRebuildThreshold: 512 2017-03-21 15:13:14,916 [Thread-0] TRACE nio.NioEventLoop (Slf4JLogger.java:trace(46)) - Instrumented an optimized java.util.Set into: sun.nio.ch.KQueueSelectorImpl@501c140b 2017-03-21
15:13:14,916 [Thread-0] TRACE nio.NioEventLoop (Slf4JLogger.java:trace(46)) - Instrumented an optimized java.util.Set into: sun.nio.ch.KQueueSelectorImpl@30ebe2a0 2017-03-21 15:13:14,917 [Thread-0] TRACE nio.NioEventLoop (Slf4JLogger.java:trace(46)) - Instrumented an optimized java.util.Set into: sun.nio.ch.KQueueSelectorImpl@98cbeb6 {code} - beanutils {code} 2017-03-21 15:13:10,490 [Thread-0] TRACE beanutils.BeanUtils (BeanUtilsBean.java:setProperty(888)) - setProperty(org.apache.commons.configuration2.PropertiesConfiguration@18f07b01, listDelimiterHandler, org.apache.commons.configuration2.convert.DefaultListDelimiterHandler@5473ddc2) 2017-03-21 15:13:10,491 [Thread-0] TRACE beanutils.BeanUtils (BeanUtilsBean.java:setProperty(906)) - Target bean = org.apache.commons.configuration2.PropertiesConfiguration@18f07b01 2017-03-21 15:13:10,491 [Thread-0] TRACE beanutils.BeanUtils (BeanUtilsBean.java:setProperty(907)) - Target name = listDelimiterHandler {code} - eclipse.jetty {code} 2017-03-21 15:13:14,796 [Thread-0] DEBUG component.ContainerLifeCycle (ContainerLifeCycle.java:addBean(323)) - org.eclipse.jetty.server.Server@32cd6303 added {qtp48399352{STOPPED,8<=0<=200,i=0,q=0},AUTO} 2017-03-21 15:13:14,797 [Thread-0] DEBUG util.DecoratedObjectFactory (DecoratedObjectFactory.java:addDecorator(52)) - Adding Decorator: org.eclipse.jetty.util.DeprecationWarning@b7a0755 2017-03-21 15:13:14,797 [Thread-0] DEBUG component.ContainerLifeCycle (ContainerLifeCycle.java:addBean(323)) - org.eclipse.jetty.server.session.SessionHandler@47175536 added {org.eclipse.jetty.server.session.HashSessionManager@51fce36f,AUTO} {code} The test output becomes extremely long.
[jira] [Created] (HDFS-11558) BPServiceActor thread name is too long
Tsz Wo Nicholas Sze created HDFS-11558: -- Summary: BPServiceActor thread name is too long Key: HDFS-11558 URL: https://issues.apache.org/jira/browse/HDFS-11558 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Currently, the thread name looks like {code} 2017-03-20 18:32:22,022 [DataNode: [[[DISK]file:/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/dn1_data0, [DISK]file:/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/dn1_data1]] heartbeating to localhost/127.0.0.1:51772] INFO ... {code} which contains the full path for each storage dir. It is unnecessarily long.
[jira] [Created] (HDFS-11513) Ozone: Separate XceiverServer and XceiverClient into interfaces and implementations
Tsz Wo Nicholas Sze created HDFS-11513: -- Summary: Ozone: Separate XceiverServer and XceiverClient into interfaces and implementations Key: HDFS-11513 URL: https://issues.apache.org/jira/browse/HDFS-11513 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze XceiverServer and XceiverClient are the endpoints that act as the communication layer for Ozone containers. We propose to separate them into interfaces and implementations so that we can use Ratis or some other library to implement them.
[jira] [Created] (HDFS-11429) Move out the Hadoop RPC config keys from RaftServerConfigKeys
Tsz Wo Nicholas Sze created HDFS-11429: -- Summary: Move out the Hadoop RPC config keys from RaftServerConfigKeys Key: HDFS-11429 URL: https://issues.apache.org/jira/browse/HDFS-11429 Project: Hadoop HDFS Issue Type: Bug Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze There are a few Hadoop RPC-specific config keys in RaftServerConfigKeys. We should move them to the ratis-hadoop module.
[jira] [Created] (HDFS-11168) Bump Netty 4 version
Tsz Wo Nicholas Sze created HDFS-11168: -- Summary: Bump Netty 4 version Key: HDFS-11168 URL: https://issues.apache.org/jira/browse/HDFS-11168 Project: Hadoop HDFS Issue Type: Bug Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze The current Netty 4 version is 4.1.0.Beta5. We should bump it to a non-beta version.
[jira] [Created] (HDFS-10791) Delete block meta file when the block file is missing
Tsz Wo Nicholas Sze created HDFS-10791: -- Summary: Delete block meta file when the block file is missing Key: HDFS-10791 URL: https://issues.apache.org/jira/browse/HDFS-10791 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Tsz Wo Nicholas Sze When the block file is missing, the block meta file should be deleted if it exists. Note that such a situation is possible: since the meta file is closed before the block file, the datanode could be killed in between.
[jira] [Created] (HDFS-10535) Rename AsyncDistributedFileSystem
Tsz Wo Nicholas Sze created HDFS-10535: -- Summary: Rename AsyncDistributedFileSystem Key: HDFS-10535 URL: https://issues.apache.org/jira/browse/HDFS-10535 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Per discussion in HDFS-9924, AsyncDistributedFileSystem is not a good name since we only support nonblocking calls for the moment.
[jira] [Resolved] (HDFS-8715) Checkpoint node keeps throwing exception
[ https://issues.apache.org/jira/browse/HDFS-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-8715. --- Resolution: Duplicate > Checkpoint node keeps throwing exception > > > Key: HDFS-8715 > URL: https://issues.apache.org/jira/browse/HDFS-8715 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.2 > Environment: centos 6.4, sun jdk 1.7 >Reporter: Jiahongchao > > I tried to start a checkpoint node using "bin/hdfs namenode -checkpoint", but it > keeps printing > 15/07/03 23:16:22 ERROR namenode.FSNamesystem: Swallowing exception in > NameNodeEditLogRoller: > java.lang.IllegalStateException: Bad state: BETWEEN_LOG_SEGMENTS > at > com.google.common.base.Preconditions.checkState(Preconditions.java:172) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.getCurSegmentTxId(FSEditLog.java:495) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem$NameNodeEditLogRoller.run(FSNamesystem.java:4718) > at java.lang.Thread.run(Thread.java:745)
[jira] [Resolved] (HDFS-10445) Add timeout tests for async DFS API
[ https://issues.apache.org/jira/browse/HDFS-10445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-10445. Resolution: Duplicate After HDFS-10431, all tests have timeouts now. Resolving as duplicate. > Add timeout tests for async DFS API > --- > > Key: HDFS-10445 > URL: https://issues.apache.org/jira/browse/HDFS-10445 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > > As a result of the HADOOP-13168 commit, async DFS APIs should also be tested in > the case of timeout (i.e. Future#get(int timeout, TimeUnit unit)).
[jira] [Created] (HDFS-10319) Balancer should not try to pair storages with different types
Tsz Wo Nicholas Sze created HDFS-10319: -- Summary: Balancer should not try to pair storages with different types Key: HDFS-10319 URL: https://issues.apache.org/jira/browse/HDFS-10319 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor This is a performance bug – Balancer may pair a source datanode and a target datanode with different storage types. Fortunately, it will fail to schedule any blocks in such a pair since it will find out later on that the storage types do not match. The bug won't lead to incorrect results.
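The fix described in HDFS-10319 amounts to a guard that skips source/target pairs whose storage types differ, rather than discovering the mismatch during block scheduling. The sketch below is illustrative only; the enum values and method names are placeholders, not Balancer's real API.

```java
// Placeholder storage types, modeled loosely on HDFS storage tiers.
enum StorageType { DISK, SSD, ARCHIVE, RAM_DISK }

class PairingGuard {
  // Pairing across storage types would always be rejected later when
  // individual blocks are scheduled, so filter such pairs up front.
  static boolean canPair(StorageType source, StorageType target) {
    return source == target;
  }
}
```

Checking the pair once up front avoids the wasted work of scheduling blocks that can never move.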
[jira] [Created] (HDFS-9943) Support reconfiguring namenode replication confs
Tsz Wo Nicholas Sze created HDFS-9943: - Summary: Support reconfiguring namenode replication confs Key: HDFS-9943 URL: https://issues.apache.org/jira/browse/HDFS-9943 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Xiaobing Zhou The following confs should be reconfigurable at runtime. - dfs.namenode.replication.work.multiplier.per.iteration - dfs.namenode.replication.max-streams - dfs.namenode.replication.max-streams-hard-limit
[jira] [Created] (HDFS-9924) [umbrella] Asynchronous HDFS Access
Tsz Wo Nicholas Sze created HDFS-9924: - Summary: [umbrella] Asynchronous HDFS Access Key: HDFS-9924 URL: https://issues.apache.org/jira/browse/HDFS-9924 Project: Hadoop HDFS Issue Type: New Feature Components: fs Reporter: Tsz Wo Nicholas Sze Assignee: Xiaobing Zhou This is an umbrella JIRA for supporting Asynchronous HDFS Access. Currently, all the API methods are blocking calls -- the caller is blocked until the method returns. It is very slow if a client makes a large number of independent calls in a single thread since each call has to wait until the previous call is finished. It is inefficient if a client needs to create a large number of threads to invoke the calls. We propose adding a new API to support asynchronous calls, i.e. the caller is not blocked. The methods in the new API immediately return a Java Future object. The return value can be obtained by the usual Future.get() method.
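The calling pattern proposed in HDFS-9924 can be sketched with a plain ExecutorService standing in for the asynchronous API: many independent calls are issued without blocking between them, and the results are collected at the end via the usual Future.get(). The class and method names here are illustrative, not the actual HDFS client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AsyncPatternSketch {
  // Issue n independent "calls" without waiting for each to finish,
  // then sum all the results. Each stand-in call returns 2*i.
  static int issueCalls(int n) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        final int call = i;
        // submit() returns immediately with a Future; the caller is
        // not blocked while the call runs.
        futures.add(pool.submit(() -> call * 2));
      }
      int sum = 0;
      for (Future<Integer> f : futures) {
        sum += f.get();  // obtain each return value via Future.get()
      }
      return sum;
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

The key point is the shape of the API: submit returns immediately, so one thread can have many calls in flight, which is exactly what the blocking API cannot do.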
[jira] [Created] (HDFS-9838) Refactor the excessReplicateMap to a class
Tsz Wo Nicholas Sze created HDFS-9838: - Summary: Refactor the excessReplicateMap to a class Key: HDFS-9838 URL: https://issues.apache.org/jira/browse/HDFS-9838 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h9838_20160219.patch There is a lot of code duplication for accessing the excessReplicateMap in BlockManager. Let's refactor the related code into a class.
[jira] [Created] (HDFS-9825) Balancer should not terminate if only one of the namenodes has error
Tsz Wo Nicholas Sze created HDFS-9825: - Summary: Balancer should not terminate if only one of the namenodes has error Key: HDFS-9825 URL: https://issues.apache.org/jira/browse/HDFS-9825 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer & mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Currently, the Balancer terminates if only one of the namenodes has an error in a federation setting. Instead, it should continue balancing the cluster with the remaining namenodes.
[jira] [Resolved] (HDFS-8050) Separate the client conf key from DFSConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-8050. --- Resolution: Duplicate This was done by other subtasks in HDFS-8048. > Separate the client conf key from DFSConfigKeys > --- > > Key: HDFS-8050 > URL: https://issues.apache.org/jira/browse/HDFS-8050 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > > Currently, all the conf keys are in DFSConfigKeys. We should separate the > public client DFSConfigKeys to a new class in org.apache.hadoop.hdfs.client > as described by [~wheat9] in HDFS-6566. > For the private conf keys, they may be moved to a new class in > org.apache.hadoop.hdfs.client.impl.
[jira] [Created] (HDFS-9822) BlockManager.validateReconstructionWork throws AssertionError
Tsz Wo Nicholas Sze created HDFS-9822: - Summary: BlockManager.validateReconstructionWork throws AssertionError Key: HDFS-9822 URL: https://issues.apache.org/jira/browse/HDFS-9822 Project: Hadoop HDFS Issue Type: Bug Components: erasure-coding Reporter: Tsz Wo Nicholas Sze Found the following AssertionError in https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/ {code} AssertionError: Should wait the previous reconstruction to finish at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100) at java.lang.Thread.run(Thread.java:745) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9729) Use null to replace DataNode.EMPTY_DEL_HINT
Tsz Wo Nicholas Sze created HDFS-9729: - Summary: Use null to replace DataNode.EMPTY_DEL_HINT Key: HDFS-9729 URL: https://issues.apache.org/jira/browse/HDFS-9729 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor When a delete-hint is unavailable, the current code may use null or DataNode.EMPTY_DEL_HINT as a default value. Let's uniformly use null for an empty delete-hint.
[jira] [Created] (HDFS-9726) Refactor IBR code to a new class
Tsz Wo Nicholas Sze created HDFS-9726: - Summary: Refactor IBR code to a new class Key: HDFS-9726 URL: https://issues.apache.org/jira/browse/HDFS-9726 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h9726_20160131.patch The IBR code is currently mainly in BPServiceActor. This JIRA is to refactor it into a new class.
[jira] [Created] (HDFS-9710) Change DN to send block receipt IBRs in batches
Tsz Wo Nicholas Sze created HDFS-9710: - Summary: Change DN to send block receipt IBRs in batches Key: HDFS-9710 URL: https://issues.apache.org/jira/browse/HDFS-9710 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze When a DN has received a block, it immediately sends a block receipt IBR RPC to the NN to report the block. Even if a DN has received multiple blocks at about the same time, it still sends multiple RPCs. This does not scale well since the NN has to process a huge number of RPCs when many DNs receive many blocks at the same time.
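The batching idea in HDFS-9710 can be sketched as follows: block receipts are queued locally instead of triggering an RPC each, and a periodic flush sends one RPC covering everything received since the last flush. The class and method names below are illustrative placeholders, not the actual datanode code.

```java
import java.util.ArrayList;
import java.util.List;

class IbrBatcher {
  private final List<String> pending = new ArrayList<>();
  private final List<List<String>> sentBatches = new ArrayList<>();

  // Called when a block is received: queue the receipt, send no RPC yet.
  synchronized void blockReceived(String blockId) {
    pending.add(blockId);
  }

  // Called periodically (e.g. on the heartbeat interval): send a single
  // RPC covering every block received since the last flush.
  synchronized void flush() {
    if (pending.isEmpty()) {
      return;  // nothing to report, no RPC at all
    }
    sentBatches.add(new ArrayList<>(pending));  // stand-in for the NN RPC
    pending.clear();
  }

  synchronized int rpcCount() {
    return sentBatches.size();
  }
}
```

With this shape, N blocks received between two flushes cost one RPC instead of N, which is the scalability win the issue describes.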
[jira] [Created] (HDFS-9690) addBlock is not idempotent
Tsz Wo Nicholas Sze created HDFS-9690: - Summary: addBlock is not idempotent Key: HDFS-9690 URL: https://issues.apache.org/jira/browse/HDFS-9690 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the bug. It failed in the following builds. - https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ - https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/ - https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
[jira] [Created] (HDFS-9654) Code refactoring for HDFS-8578
Tsz Wo Nicholas Sze created HDFS-9654: - Summary: Code refactoring for HDFS-8578 Key: HDFS-9654 URL: https://issues.apache.org/jira/browse/HDFS-9654 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor This is a code refactoring JIRA in order to change the datanode to process all storage/data dirs in parallel; see also HDFS-8578.
[jira] [Resolved] (HDFS-9573) o.a.h.hdfs.protocol.SnapshotDiffReport$DiffReportEntry$hashCode inconsistent with equals
[ https://issues.apache.org/jira/browse/HDFS-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-9573. --- Resolution: Invalid "SnapshotDiffReport$DiffReportEntry$hashCode inconsistent with equals" is clearly invalid. Resolving as Invalid. > o.a.h.hdfs.protocol.SnapshotDiffReport$DiffReportEntry$hashCode inconsistent > with equals > > > Key: HDFS-9573 > URL: https://issues.apache.org/jira/browse/HDFS-9573 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > DiffReportEntry.equals() uses field "type", but DiffReportEntry.hashCode() > doesn't. This breaks the rules on equals and hashCode: > * if a class overrides equals, it must override hashCode > * when they are both overridden, equals and hashCode must use the same set of > fields -- This message was sent by Atlassian JIRA (v6.3.4#6332)
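The equals/hashCode contract quoted in the last two bullets can be illustrated with a minimal class whose hashCode() hashes exactly the fields that equals() compares. This is a simplified stand-in, not the real DiffReportEntry.

```java
import java.util.Objects;

class Entry {
  final String type;
  final String path;

  Entry(String type, String path) {
    this.type = type;
    this.path = path;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof Entry)) {
      return false;
    }
    Entry e = (Entry) o;
    // equals compares type and path ...
    return type.equals(e.type) && path.equals(e.path);
  }

  @Override
  public int hashCode() {
    // ... so hashCode must hash the same two fields; hashing a
    // different set of fields would let equal objects land in
    // different hash buckets.
    return Objects.hash(type, path);
  }
}
```

The rule being demonstrated: equal objects must have equal hash codes, which only holds when both methods use the same set of fields.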
[jira] [Created] (HDFS-9527) The return type of FSNamesystem.getBlockCollection should be changed to INodeFile
Tsz Wo Nicholas Sze created HDFS-9527: - Summary: The return type of FSNamesystem.getBlockCollection should be changed to INodeFile Key: HDFS-9527 URL: https://issues.apache.org/jira/browse/HDFS-9527 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor FSNamesystem.getBlockCollection always returns INodeFile. After the change, unnecessary conversions from BlockCollection to INode/INodeFile can be avoided.
[jira] [Created] (HDFS-9528) Cleanup namenode audit/log/exception messages
Tsz Wo Nicholas Sze created HDFS-9528: - Summary: Cleanup namenode audit/log/exception messages Key: HDFS-9528 URL: https://issues.apache.org/jira/browse/HDFS-9528 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor - Clean up unnecessarily long methods for constructing message strings. - Avoid calling toString() methods.
[jira] [Resolved] (HDFS-4488) Confusing WebHDFS exception when host doesn't resolve
[ https://issues.apache.org/jira/browse/HDFS-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-4488. --- Resolution: Cannot Reproduce Target Version/s: 2.1.0-beta, 3.0.0 (was: 3.0.0, 2.1.0-beta) {code} $hadoop fs -ls webhdfs://unresolvable-host/ 15/12/04 11:48:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable -ls: java.net.UnknownHostException: unresolvable-host ... $echo $? 255 {code} The message is already fixed. Resolving as Cannot Reproduce. > Confusing WebHDFS exception when host doesn't resolve > - > > Key: HDFS-4488 > URL: https://issues.apache.org/jira/browse/HDFS-4488 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 0.23.0 >Reporter: Daryn Sharp > > {noformat} > $ hadoop fs -ls webhdfs://unresolvable-host/ > ls: unresolvable-host > $ echo $? > 1 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-2593) Rename webhdfs HTTP param 'delegation' to 'delegationtoken'
[ https://issues.apache.org/jira/browse/HDFS-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-2593. --- Resolution: Not A Problem Resolving this stale issue as Not A Problem. > Rename webhdfs HTTP param 'delegation' to 'delegationtoken' > --- > > Key: HDFS-2593 > URL: https://issues.apache.org/jira/browse/HDFS-2593 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 0.23.1, 1.0.0, 2.0.0-alpha >Reporter: Alejandro Abdelnur > > to be consistent with other param names and to be clearer for users on > what it is. > webhdfs spec doc should be updated as well.
[jira] [Created] (HDFS-9509) Add new metrics for measuring datanode storage statistics
Tsz Wo Nicholas Sze created HDFS-9509: - Summary: Add new metrics for measuring datanode storage statistics Key: HDFS-9509 URL: https://issues.apache.org/jira/browse/HDFS-9509 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Tsz Wo Nicholas Sze We already have sendDataPacketBlockedOnNetworkNanos and sendDataPacketTransferNanos for the transferTo case. We should add more metrics for the other cases.
[jira] [Resolved] (HDFS-3439) Balancer exits if fs.defaultFS is set to a different, but semantically identical, URI from dfs.namenode.rpc-address
[ https://issues.apache.org/jira/browse/HDFS-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-3439. --- Resolution: Cannot Reproduce Target Version/s: (was: ) Resolving as Cannot Reproduce. {code} $hadoop balancer -Dfs.defaultFS=hdfs://foo.example.com:8020/ -Ddfs.namenode.servicerpc-address=hdfs://foo.example.com:8020 DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it. 15/12/03 13:47:12 INFO balancer.Balancer: namenodes = [hdfs://foo.example.com:8020] {code} > Balancer exits if fs.defaultFS is set to a different, but semantically > identical, URI from dfs.namenode.rpc-address > --- > > Key: HDFS-3439 > URL: https://issues.apache.org/jira/browse/HDFS-3439 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers > > The balancer determines the set of NN URIs to balance by looking at > fs.defaultFS and all possible dfs.namenode.(service)rpc-address settings. If > fs.defaultFS is, for example, set to "hdfs://foo.example.com:8020/" (note the > trailing "/") and the rpc-address is set to "hdfs://foo.example.com:8020" > (without a "/"), then the balancer will conclude that there are two NNs and > try to balance both. However, since both of these URIs refer to the same > actual FS instance, the balancer will exit with "java.io.IOException: Another > balancer is running. Exiting ..." -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data
[ https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-5958. --- Resolution: Duplicate > One very large node in a cluster prevents balancer from balancing data > -- > > Key: HDFS-5958 > URL: https://issues.apache.org/jira/browse/HDFS-5958 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.2.0 > Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one > with 4Tb drive. >Reporter: Alexey Kovyrin > > In a cluster with a set of small nodes and one much larger node balancer > always selects the large node as the target even though it already has a copy > of each block in the cluster. > This causes the balancer to enter an infinite loop and stop balancing other > nodes because each balancing iteration selects the same target and then could > not find a single block to move. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-2220) balancer.Balancer: java.lang.NullPointerException while HADOOP_CONF_DIR is empty or wrong
[ https://issues.apache.org/jira/browse/HDFS-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-2220. --- Resolution: Cannot Reproduce Let's resolve this as Cannot Reproduce. > balancer.Balancer: java.lang.NullPointerException while HADOOP_CONF_DIR is > empty or wrong > - > > Key: HDFS-2220 > URL: https://issues.apache.org/jira/browse/HDFS-2220 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 0.20.204.0 >Reporter: Rajit Saha > > When HADOOP_CONF_DIR is empty or wrongly set and balancer is called without > a proper --config, on the client side STDOUT we get an NPE. > $ echo $HADOOP_CONF_DIR > $ hadoop balancer > Balancing took 46.0 milliseconds > 11/06/13 05:14:04 ERROR balancer.Balancer: java.lang.NullPointerException > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:136) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:176) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:206) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:200) > at > org.apache.hadoop.hdfs.server.balancer.Balancer.createNamenode(Balancer.java:911) > at > org.apache.hadoop.hdfs.server.balancer.Balancer.init(Balancer.java:860) > at > org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:1475) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) > at > org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:811) > I think it would be good to give a more meaningful error message instead of an NPE
[jira] [Resolved] (HDFS-2851) HA: Optimize stale block processing by triggering block reports immediately on failover
[ https://issues.apache.org/jira/browse/HDFS-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-2851. --- Resolution: Not A Problem Target Version/s: (was: ) Resolving this stale issue as Not A Problem. > HA: Optimize stale block processing by triggering block reports immediately > on failover > --- > > Key: HDFS-2851 > URL: https://issues.apache.org/jira/browse/HDFS-2851 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover, datanode, ha, namenode >Affects Versions: 2.0.0-alpha >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-2851-HDFS-1623-Test.patch > > > After Balancer runs, usedSpace is not balancing correctly. > {code} > java.util.concurrent.TimeoutException: Cluster failed to reached expected > values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: > 390, expected: 300), in more than 2 msec. > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:233) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes.testBalancerWithHANameNodes(TestBalancerWithHANameNodes.java:99) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-2621) Balancer is not checking ALREADY_RUNNING state and never returns this state.
[ https://issues.apache.org/jira/browse/HDFS-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-2621. --- Resolution: Not A Problem Resolving this stale issue as Not A Problem. > Balancer is not checking ALREADY_RUNNING state and never returns this state. > > > Key: HDFS-2621 > URL: https://issues.apache.org/jira/browse/HDFS-2621 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 0.23.1, 2.0.0-alpha >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds
[ https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-9434. --- Resolution: Fixed Sangjin, thanks for the review. I have committed the branch-2.6 patch. > Recommission a datanode with 500k blocks may pause NN for 30 seconds > > > Key: HDFS-9434 > URL: https://issues.apache.org/jira/browse/HDFS-9434 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Fix For: 2.6.3 > > Attachments: h9434_20151116.patch, h9434_20151116_branch-2.6.patch > > > In BlockManager, processOverReplicatedBlocksOnReCommission is called within > the namespace lock. There is a (not very useful) log message printed in > processOverReplicatedBlock. When there is a large number of blocks stored in > a storage, printing the log message for each block can pause NN to process > any other operations. We did see that it could pause NN for 30 seconds for > a storage with 500k blocks. > I suggest to change the log message to trace level as a quick fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
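The quick fix described above, demoting the per-block message to trace level and guarding it, can be sketched as follows. This uses java.util.logging purely for illustration (Hadoop's own logging API differs); the point is that the per-block string construction is skipped entirely when trace is disabled, so recommissioning a storage with 500k blocks no longer burns lock time on logging:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch, not BlockManager code: guard a trace-level per-block
// log message so it costs nothing when trace logging is off.
public class PerBlockLogging {
    private static final Logger LOG = Logger.getLogger("BlockManager");

    public static int process(int blockCount) {
        int processed = 0;
        for (int i = 0; i < blockCount; i++) {
            // The string concatenation only happens when trace is enabled.
            if (LOG.isLoggable(Level.FINEST)) {
                LOG.finest("Processing over-replicated block " + i);
            }
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(process(500_000)); // prints 500000
    }
}
```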
[jira] [Reopened] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds
[ https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze reopened HDFS-9434: --- Reopen for backporting to branch-2.6 > Recommission a datanode with 500k blocks may pause NN for 30 seconds > > > Key: HDFS-9434 > URL: https://issues.apache.org/jira/browse/HDFS-9434 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Fix For: 2.6.3 > > Attachments: h9434_20151116.patch > > > In BlockManager, processOverReplicatedBlocksOnReCommission is called within > the namespace lock. There is a (not very useful) log message printed in > processOverReplicatedBlock. When there is a large number of blocks stored in > a storage, printing the log message for each block can pause NN to process > any other operations. We did see that it could pause NN for 30 seconds for > a storage with 500k blocks. > I suggest to change the log message to trace level as a quick fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HDFS-8246) Get HDFS file name based on block pool id and block id
[ https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze reopened HDFS-8246: --- > Get HDFS file name based on block pool id and block id > -- > > Key: HDFS-8246 > URL: https://issues.apache.org/jira/browse/HDFS-8246 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client, namenode >Reporter: feng xu >Assignee: feng xu > Labels: BB2015-05-TBR > Attachments: HDFS-8246.0.patch > > > This feature provides HDFS shell command and C/Java API to retrieve HDFS file > name based on block pool id and block id. > 1. The Java API in class DistributedFileSystem > public String getFileName(String poolId, long blockId) throws IOException > 2. The C API in hdfs.c > char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId) > 3. The HDFS shell command > hdfs dfs [generic options] -fn > This feature is useful if you have HDFS block file name in local file system > and want to find out the related HDFS file name in HDFS name space > (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop). > Each HDFS block file name in local file system contains both block pool id > and block id, for sample HDFS block file name > /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825, > the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id > is 1073741825. The block pool id is uniquely related to a HDFS name > node/name space, and the block id is uniquely related to a HDFS file within > a HDFS name node/name space, so the combination of block pool id and a block > id is uniquely related a HDFS file name. > The shell command and C/Java API do not map the block pool id to name node, > so it’s user’s responsibility to talk to the correct name node in federation > environment that has multiple name nodes. 
The block pool id is used by name > node to check if the user is talking with the correct name node. > The implementation is straightforward. The client request to get HDFS file > name reaches the new method String getFileName(String poolId, long blockId) > in FSNamesystem in name node through RPC, and the new method does the > followings, > (1) Validate the block pool id. > (2) Create Block based on the block id. > (3) Get BlockInfoContiguous from Block. > (4) Get BlockCollection from BlockInfoContiguous. > (5) Get file name from BlockCollection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
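The five-step lookup chain described above (validate pool id, then block -> BlockInfoContiguous -> BlockCollection -> file name) can be sketched with toy stand-ins. Nothing below is real FSNamesystem code; the map simply plays the role of steps 2 through 5:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the proposed getFileName(poolId, blockId) lookup. The map is
// a stand-in for the real block -> block collection -> inode chain.
public class BlockToFileLookup {
    private final String blockPoolId;
    private final Map<Long, String> blockIdToFileName = new HashMap<>();

    public BlockToFileLookup(String blockPoolId) {
        this.blockPoolId = blockPoolId;
    }

    public void addBlock(long blockId, String fileName) {
        blockIdToFileName.put(blockId, fileName);
    }

    /** Mirrors the proposed API: reject requests for the wrong name space. */
    public String getFileName(String poolId, long blockId) {
        if (!blockPoolId.equals(poolId)) {       // step 1: validate pool id
            throw new IllegalArgumentException("Unknown block pool: " + poolId);
        }
        return blockIdToFileName.get(blockId);   // steps 2-5, collapsed
    }

    public static void main(String[] args) {
        BlockToFileLookup ns =
            new BlockToFileLookup("BP-97622798-10.3.11.84-1428081035160");
        ns.addBlock(1073741825L, "/apps/data/file1");
        System.out.println(
            ns.getFileName("BP-97622798-10.3.11.84-1428081035160", 1073741825L));
    }
}
```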
[jira] [Resolved] (HDFS-8246) Get HDFS file name based on block pool id and block id
[ https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-8246. --- Resolution: Won't Fix > Get HDFS file name based on block pool id and block id > -- > > Key: HDFS-8246 > URL: https://issues.apache.org/jira/browse/HDFS-8246 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client, namenode >Reporter: feng xu >Assignee: feng xu > Labels: BB2015-05-TBR > Attachments: HDFS-8246.0.patch > > > This feature provides HDFS shell command and C/Java API to retrieve HDFS file > name based on block pool id and block id. > 1. The Java API in class DistributedFileSystem > public String getFileName(String poolId, long blockId) throws IOException > 2. The C API in hdfs.c > char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId) > 3. The HDFS shell command > hdfs dfs [generic options] -fn > This feature is useful if you have HDFS block file name in local file system > and want to find out the related HDFS file name in HDFS name space > (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop). > Each HDFS block file name in local file system contains both block pool id > and block id, for sample HDFS block file name > /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825, > the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id > is 1073741825. The block pool id is uniquely related to a HDFS name > node/name space, and the block id is uniquely related to a HDFS file within > a HDFS name node/name space, so the combination of block pool id and a block > id is uniquely related a HDFS file name. > The shell command and C/Java API do not map the block pool id to name node, > so it’s user’s responsibility to talk to the correct name node in federation > environment that has multiple name nodes. 
The block pool id is used by name > node to check if the user is talking with the correct name node. > The implementation is straightforward. The client request to get HDFS file > name reaches the new method String getFileName(String poolId, long blockId) > in FSNamesystem in name node through RPC, and the new method does the > followings, > (1) Validate the block pool id. > (2) Create Block based on the block id. > (3) Get BlockInfoContiguous from Block. > (4) Get BlockCollection from BlockInfoContiguous. > (5) Get file name from BlockCollection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8244) HDFS Custom Storage Tier Policies
[ https://issues.apache.org/jira/browse/HDFS-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-8244. --- Resolution: Duplicate Resolving. > HDFS Custom Storage Tier Policies > - > > Key: HDFS-8244 > URL: https://issues.apache.org/jira/browse/HDFS-8244 > Project: Hadoop HDFS > Issue Type: New Feature > Components: balancer & mover, datanode, hdfs-client, namenode >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Hari Sekhon >Priority: Minor > > Feature request to be able to define custom HDFS storage policies. > For example, being able to define DISK:2, Archive:n - 2. > The motivation is integrating the archive tier with another, cheaper > storage system such as Hedvig, which we do not control. To hedge our bets in > case something goes wrong with that archive storage system (it's new and > unproven), we don't want just one copy of the data left on our cluster in > case we lose a node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8464) hdfs namenode UI shows "Max Non Heap Memory" is -1 B
[ https://issues.apache.org/jira/browse/HDFS-8464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-8464. --- Resolution: Duplicate > hdfs namenode UI shows "Max Non Heap Memory" is -1 B > > > Key: HDFS-8464 > URL: https://issues.apache.org/jira/browse/HDFS-8464 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 > Environment: suse11.3 >Reporter: tongshiquan >Priority: Minor > Attachments: screenshot-1.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HDFS-8727) Allow using path style addressing for accessing the s3 endpoint
[ https://issues.apache.org/jira/browse/HDFS-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze reopened HDFS-8727: --- Since no patch was committed, we should not resolve this as fixed. > Allow using path style addressing for accessing the s3 endpoint > --- > > Key: HDFS-8727 > URL: https://issues.apache.org/jira/browse/HDFS-8727 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Andrew Baptist >Assignee: Andrew Baptist > Labels: features > Attachments: hdfs-8728.patch.2 > > > There is no ability to specify using path style access for the s3 endpoint. > There are numerous non-amazon implementations of storage that support the > amazon API's but only support path style access such as Cleversafe and Ceph. > Additionally in many environments it is difficult to configure DNS correctly > to get virtual host style addressing to work -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8727) Allow using path style addressing for accessing the s3 endpoint
[ https://issues.apache.org/jira/browse/HDFS-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-8727. --- Resolution: Not A Problem Fix Version/s: (was: 2.7.2) > Allow using path style addressing for accessing the s3 endpoint > --- > > Key: HDFS-8727 > URL: https://issues.apache.org/jira/browse/HDFS-8727 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Andrew Baptist >Assignee: Andrew Baptist > Labels: features > Attachments: hdfs-8728.patch.2 > > > There is no ability to specify using path style access for the s3 endpoint. > There are numerous non-amazon implementations of storage that support the > amazon API's but only support path style access such as Cleversafe and Ceph. > Additionally in many environments it is difficult to configure DNS correctly > to get virtual host style addressing to work -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9441) Do not construct the path string when choosing block placement targets
Tsz Wo Nicholas Sze created HDFS-9441: - Summary: Do not construct the path string when choosing block placement targets Key: HDFS-9441 URL: https://issues.apache.org/jira/browse/HDFS-9441 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor - INodeFile.getName() is expensive since it involves quite a few string operations. The method is called in both ReplicationWork and ErasureCodingWork but the default BlockPlacementPolicy does not use the returned string. We should simply pass BlockCollection to reduce unnecessary computation when using the default BlockPlacementPolicy. - Another improvement: the return type of FSNamesystem.getBlockCollection should be changed to INodeFile since it always returns an INodeFile object. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds
Tsz Wo Nicholas Sze created HDFS-9434: - Summary: Recommission a datanode with 500k blocks may pause NN for 30 seconds Key: HDFS-9434 URL: https://issues.apache.org/jira/browse/HDFS-9434 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze In BlockManager, processOverReplicatedBlocksOnReCommission is called within the namespace lock. There is a (not very useful) log message printed in processOverReplicatedBlock. When there is a large number of blocks stored in a storage, printing the log message for each block can prevent the NN from processing any other operations. We did see that it could pause the NN for 30 seconds for a storage with 500k blocks. I suggest changing the log message to trace level as a quick fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9365) Balancer should call getNNServiceRpcAddressesForCluster after HDFS-6376
Tsz Wo Nicholas Sze created HDFS-9365: - Summary: Balancer should call getNNServiceRpcAddressesForCluster after HDFS-6376 Key: HDFS-9365 URL: https://issues.apache.org/jira/browse/HDFS-9365 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze HDFS-6376 added support for DistCp between two HA clusters. After the change, Balancer will use all the NNs from both the local and the remote clusters. It should call getNNServiceRpcAddressesForCluster and only use the local cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9346) TestDFSStripedOutputStreamWithFailure.testMultipleDatanodeFailure56 may throw NPE
Tsz Wo Nicholas Sze created HDFS-9346: - Summary: TestDFSStripedOutputStreamWithFailure.testMultipleDatanodeFailure56 may throw NPE Key: HDFS-9346 URL: https://issues.apache.org/jira/browse/HDFS-9346 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Tsz Wo Nicholas Sze Priority: Minor See the NPE in [build 13294|https://builds.apache.org/job/PreCommit-HDFS-Build/13294/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testMultipleDatanodeFailure56/]. It seems to be a bug in the test.
{code}
java.lang.NullPointerException: null
	at org.apache.hadoop.hdfs.MiniDFSCluster.stopDataNode(MiniDFSCluster.java:2157)
	at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.killDatanode(TestDFSStripedOutputStreamWithFailure.java:445)
	at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:374)
	at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTestWithMultipleFailure(TestDFSStripedOutputStreamWithFailure.java:301)
	at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.testMultipleDatanodeFailure56(TestDFSStripedOutputStreamWithFailure.java:172)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9205) Do not schedule corrupted blocks for replication
Tsz Wo Nicholas Sze created HDFS-9205: - Summary: Do not schedule corrupted blocks for replication Key: HDFS-9205 URL: https://issues.apache.org/jira/browse/HDFS-9205 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Corrupted blocks by definition are blocks that cannot be read. As a consequence, they cannot be replicated. In UnderReplicatedBlocks, there is a queue for QUEUE_WITH_CORRUPT_BLOCKS and chooseUnderReplicatedBlocks may choose blocks from it. It seems that scheduling corrupted blocks for replication wastes resources and potentially slows down replication of the higher-priority blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
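The proposed behavior can be sketched as follows. The queue layout is a simplification of UnderReplicatedBlocks' priority queues, and the method names are illustrative, not the real BlockManager API; the point is simply that the corrupt queue is never drawn from when scheduling replication work:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: when choosing under-replicated blocks to schedule,
// skip the corrupt queue entirely, since corrupt replicas cannot be read and
// so cannot serve as replication sources.
public class ReplicationScheduler {
    public static final int QUEUE_WITH_CORRUPT_BLOCKS = 4; // lowest priority

    public static List<Long> chooseBlocks(List<List<Long>> priorityQueues,
                                          int max) {
        List<Long> chosen = new ArrayList<>();
        for (int prio = 0; prio < priorityQueues.size(); prio++) {
            if (prio == QUEUE_WITH_CORRUPT_BLOCKS) {
                continue; // do not waste replication work on corrupt blocks
            }
            for (long blockId : priorityQueues.get(prio)) {
                if (chosen.size() >= max) {
                    return chosen;
                }
                chosen.add(blockId);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<List<Long>> queues = new ArrayList<>();
        for (int i = 0; i < 5; i++) queues.add(new ArrayList<>());
        queues.get(0).add(1L);                          // healthy, high priority
        queues.get(QUEUE_WITH_CORRUPT_BLOCKS).add(99L); // corrupt
        System.out.println(chooseBlocks(queues, 10));   // [1]; block 99 skipped
    }
}
```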
[jira] [Resolved] (HDFS-9194) AlreadyBeingCreatedException ... because pendingCreates is non-null but no leases found.
[ https://issues.apache.org/jira/browse/HDFS-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-9194. --- Resolution: Duplicate > AlreadyBeingCreatedException ... because pendingCreates is non-null but no > leases found. > > > Key: HDFS-9194 > URL: https://issues.apache.org/jira/browse/HDFS-9194 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > > There is a possible bug in FSDirectory.addFile causing no leases found for > under construction files. > {code} > //FSDirectory > INodeFile addFile(String path, PermissionStatus permissions, > short replication, long preferredBlockSize, > String clientName, String clientMachine) > throws FileAlreadyExistsException, QuotaExceededException, > UnresolvedLinkException, SnapshotAccessControlException, AclException { > long modTime = now(); > INodeFile newNode = newINodeFile(namesystem.allocateNewInodeId(), > permissions, modTime, modTime, replication, preferredBlockSize); > newNode.toUnderConstruction(clientName, clientMachine); > boolean added = false; > writeLock(); > try { > added = addINode(path, newNode); > } finally { > writeUnlock(); > } > ... > } > {code} > - newNode.toUnderConstruction(clientName, clientMachine) adds > FileUnderConstructionFeature to the INode, i.e. the file becomes an under > construction file. At this moment, there is no lease for this file yet. The > lease will be added later in FSNamesystem.startFileInternal(..). > - It is possible that addINode(path, newNode) adds the inode to the namespace > tree but throws QuotaExceededException later on when calling > updateModificationTime. (i.e. addINode -> addLastINode -> addChild -> > parent.addChild -> updateModificationTime throws QuotaExceededException) > Then, the newly added uc file is left in namespace but the corresponding > lease won't be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9194) AlreadyBeingCreatedException ... because pendingCreates is non-null but no leases found.
Tsz Wo Nicholas Sze created HDFS-9194: - Summary: AlreadyBeingCreatedException ... because pendingCreates is non-null but no leases found. Key: HDFS-9194 URL: https://issues.apache.org/jira/browse/HDFS-9194 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze There is a possible bug in FSDirectory.addFile causing no leases found for under construction files.
{code}
//FSDirectory
INodeFile addFile(String path, PermissionStatus permissions,
    short replication, long preferredBlockSize,
    String clientName, String clientMachine)
    throws FileAlreadyExistsException, QuotaExceededException,
      UnresolvedLinkException, SnapshotAccessControlException, AclException {
  long modTime = now();
  INodeFile newNode = newINodeFile(namesystem.allocateNewInodeId(),
      permissions, modTime, modTime, replication, preferredBlockSize);
  newNode.toUnderConstruction(clientName, clientMachine);

  boolean added = false;
  writeLock();
  try {
    added = addINode(path, newNode);
  } finally {
    writeUnlock();
  }
  ...
}
{code}
- newNode.toUnderConstruction(clientName, clientMachine) adds FileUnderConstructionFeature to the INode, i.e. the file becomes an under construction file. At this moment, there is no lease for this file yet. The lease will be added later in FSNamesystem.startFileInternal(..).
- It is possible that addINode(path, newNode) adds the inode to the namespace tree but throws QuotaExceededException later on when calling updateModificationTime (i.e. addINode -> addLastINode -> addChild -> parent.addChild -> updateModificationTime throws QuotaExceededException). Then, the newly added uc file is left in the namespace but the corresponding lease won't be added.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
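The invariant at issue can be shown with a toy model: every step that can fail (here, the quota check) should happen before the file is registered, so a failure never leaves an inode in the namespace without a matching lease. The classes below are stand-ins, not FSDirectory internals:

```java
import java.util.HashSet;
import java.util.Set;

// Toy sketch of the lease/namespace invariant: validate quota first, then
// register the inode and its lease together, so a quota failure cannot leave
// an under-construction file in the namespace with no lease.
public class LeaseConsistency {
    private final Set<String> namespace = new HashSet<>();
    private final Set<String> leases = new HashSet<>();
    private int remainingQuota;

    public LeaseConsistency(int quota) {
        this.remainingQuota = quota;
    }

    /** Adds a file; the fallible quota check runs before any state changes. */
    public boolean addFile(String path) {
        if (remainingQuota <= 0) {
            return false;          // fail before touching the namespace
        }
        remainingQuota--;
        namespace.add(path);
        leases.add(path);          // lease registered in the same step
        return true;
    }

    /** Every under-construction file must have a lease. */
    public boolean consistent() {
        return namespace.equals(leases);
    }

    public static void main(String[] args) {
        LeaseConsistency fs = new LeaseConsistency(1);
        System.out.println(fs.addFile("/a"));     // true
        System.out.println(fs.addFile("/b"));     // false: quota exhausted
        System.out.println(fs.consistent());      // true: no orphaned inode
    }
}
```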
[jira] [Resolved] (HDFS-8341) HDFS mover stuck in loop trying to move corrupt block with no other valid replicas, doesn't move rest of other data blocks
[ https://issues.apache.org/jira/browse/HDFS-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-8341. --- Resolution: Cannot Reproduce > HDFS mover stuck in loop trying to move corrupt block with no other valid > replicas, doesn't move rest of other data blocks > -- > > Key: HDFS-8341 > URL: https://issues.apache.org/jira/browse/HDFS-8341 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Hari Sekhon >Priority: Minor > > HDFS mover gets stuck looping on a block that fails to move and doesn't > migrate the rest of the blocks. > This is preventing recovery of data from a decomissioning external storage > tier used for archive (we've had problems with that proprietary "hyperscale" > storage product which is why a couple blocks here and there have checksum > problems or premature eof as shown below), but this should not prevent moving > all the other blocks to recover our data: > {code}hdfs mover -p /apps/hive/warehouse/ > 15/05/07 14:52:50 INFO mover.Mover: namenodes = > {hdfs://nameservice1=[/apps/hive/warehouse/]} > 15/05/07 14:52:51 INFO balancer.KeyManager: Block token params received from > NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec > 15/05/07 14:52:51 INFO block.BlockTokenSecretManager: Setting block keys > 15/05/07 14:52:51 INFO balancer.KeyManager: Update block keys every 2hrs, > 30mins, 0sec > 15/05/07 14:52:52 INFO block.BlockTokenSecretManager: Setting block keys > 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 
14:52:52 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:52:52 WARN balancer.Dispatcher: Failed to move > blk_1075156654_1438349 with size=134217728 from :1019:ARCHIVE to > :1019:DISK through :1019: block move is failed: opReplaceBlock > BP-120244285--1417023863606:blk_1075156654_1438349 received exception > java.io.EOFException: Premature EOF: no length prefix available > > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 WARN balancer.Dispatcher: Failed to move > blk_1075156654_1438349 with size=134217728 from :1019:ARCHIVE to > :1019:DISK through :1019: block move is failed: opReplaceBlock > BP-120244285--1417023863606:blk_1075156654_1438349 received exception > java.io.EOFException: Premature EOF: no length prefix available > .. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8341) (Summary & Description may be invalid) HDFS mover stuck in loop after failing to move block, doesn't move rest of blocks, can't get data back off decommissioning external
[ https://issues.apache.org/jira/browse/HDFS-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-8341. --- Resolution: Invalid Resolving as invalid. Please feel free to reopen if you disagree. > (Summary & Description may be invalid) HDFS mover stuck in loop after failing > to move block, doesn't move rest of blocks, can't get data back off > decommissioning external storage tier as a result > --- > > Key: HDFS-8341 > URL: https://issues.apache.org/jira/browse/HDFS-8341 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Hari Sekhon >Priority: Minor > > HDFS mover gets stuck looping on a block that fails to move and doesn't > migrate the rest of the blocks. > This is preventing recovery of data from a decomissioning external storage > tier used for archive (we've had problems with that proprietary "hyperscale" > storage product which is why a couple blocks here and there have checksum > problems or premature eof as shown below), but this should not prevent moving > all the other blocks to recover our data: > {code}hdfs mover -p /apps/hive/warehouse/ > 15/05/07 14:52:50 INFO mover.Mover: namenodes = > {hdfs://nameservice1=[/apps/hive/warehouse/]} > 15/05/07 14:52:51 INFO balancer.KeyManager: Block token params received from > NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec > 15/05/07 14:52:51 INFO block.BlockTokenSecretManager: Setting block keys > 15/05/07 14:52:51 INFO balancer.KeyManager: Update block keys every 2hrs, > 30mins, 0sec > 15/05/07 14:52:52 INFO block.BlockTokenSecretManager: Setting block keys > 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new 
node: > /default-rack/:1019 > 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:52:52 WARN balancer.Dispatcher: Failed to move > blk_1075156654_1438349 with size=134217728 from :1019:ARCHIVE to > :1019:DISK through :1019: block move is failed: opReplaceBlock > BP-120244285--1417023863606:blk_1075156654_1438349 received exception > java.io.EOFException: Premature EOF: no length prefix available > > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: > /default-rack/:1019 > 15/05/07 14:53:31 WARN balancer.Dispatcher: Failed to move > blk_1075156654_1438349 with size=134217728 from :1019:ARCHIVE to > :1019:DISK through :1019: block move is failed: opReplaceBlock > BP-120244285--1417023863606:blk_1075156654_1438349 received exception > java.io.EOFException: Premature EOF: no length prefix available > .. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8921) Add an option to Balancer so that it only uses the k-most over-utilized DNs or all over-utilized DNs as sources.
Tsz Wo Nicholas Sze created HDFS-8921: - Summary: Add an option to Balancer so that it only uses the k-most over-utilized DNs or all over-utilized DNs as sources. Key: HDFS-8921 URL: https://issues.apache.org/jira/browse/HDFS-8921 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer & mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Arpit suggested adding a separate option to source from the most over-utilized DataNodes first so the administrator does not have to pass the source DNs manually; see [this comment|https://issues.apache.org/jira/browse/HDFS-8826?focusedCommentId=14700576&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14700576]. The new option could allow specifying the k-most over-utilized DNs or all over-utilized DNs as sources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
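The selection logic the option describes could look like the sketch below. Names such as topOverUtilized are illustrative, not Balancer APIs: it picks all DNs above the utilization threshold, sorted most-over-utilized first, and optionally truncates to the k most over-utilized:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the proposed source-selection option.
public class SourceSelection {
    /**
     * Returns over-utilized DNs sorted by utilization descending.
     * k < 0 means "all over-utilized DNs"; otherwise only the top k.
     */
    public static List<String> topOverUtilized(Map<String, Double> utilization,
                                               double threshold, int k) {
        List<String> over = new ArrayList<>();
        for (Map.Entry<String, Double> e : utilization.entrySet()) {
            if (e.getValue() > threshold) {
                over.add(e.getKey());
            }
        }
        over.sort(Comparator.comparingDouble(
                (String dn) -> utilization.get(dn)).reversed());
        return k < 0 ? over : over.subList(0, Math.min(k, over.size()));
    }

    public static void main(String[] args) {
        Map<String, Double> u = new java.util.HashMap<>();
        u.put("dn1", 95.0);
        u.put("dn2", 40.0);
        u.put("dn3", 10.0);
        System.out.println(topOverUtilized(u, 35.0, 1));  // [dn1]
        System.out.println(topOverUtilized(u, 35.0, -1)); // [dn1, dn2]
    }
}
```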
[jira] [Created] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
Tsz Wo Nicholas Sze created HDFS-8838: - Summary: Tolerate datanode failures in DFSStripedOutputStream when the data length is small Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8824) Do not use small blocks for balancing the cluster
Tsz Wo Nicholas Sze created HDFS-8824: - Summary: Do not use small blocks for balancing the cluster Key: HDFS-8824 URL: https://issues.apache.org/jira/browse/HDFS-8824 Project: Hadoop HDFS Issue Type: Improvement Components: balancer & mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Balancer gets datanode block lists from the NN and then moves the blocks in order to balance the cluster. It should not use small blocks, since moving them generates a lot of overhead and they do not help balance the cluster much. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
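The filtering step this describes is simple: drop candidate blocks below a size floor before scheduling moves. The 1 MB floor below is an assumed example value, not one taken from the issue:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: filter out small blocks before scheduling moves,
// since tiny blocks add per-move overhead without shifting much data.
public class BlockFilter {
    public static final long MIN_BLOCK_SIZE = 1L << 20; // assumed 1 MB floor

    public static List<Long> movableBlocks(List<Long> blockSizes) {
        return blockSizes.stream()
                .filter(size -> size >= MIN_BLOCK_SIZE)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A 512-byte block is skipped; a ~2 MB block is kept.
        System.out.println(movableBlocks(List.of(512L, 2_000_000L)));
    }
}
```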
[jira] [Created] (HDFS-8825) Enhancements to Balancer
Tsz Wo Nicholas Sze created HDFS-8825: - Summary: Enhancements to Balancer Key: HDFS-8825 URL: https://issues.apache.org/jira/browse/HDFS-8825 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze This is an umbrella JIRA to enhance Balancer. The goal is to make it run faster and more efficiently, and to improve its usability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8826) Balancer may not move blocks efficiently in some cases
Tsz Wo Nicholas Sze created HDFS-8826: - Summary: Balancer may not move blocks efficiently in some cases Key: HDFS-8826 URL: https://issues.apache.org/jira/browse/HDFS-8826 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Balancer is inefficient in the following case: || Datanode || Utilization || Rack || | D1 | 95% | A | | D2 | 30% | B | | D3, D4, D5 | 0% | B | The average utilization is 25%, so D2 is within the 10% threshold. However, Balancer currently will first move blocks from D2 to D3, D4 and D5 since they are in the same rack. Only then will it move blocks from D1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
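The threshold arithmetic in the example above can be checked with a short sketch (illustrative helpers, not Balancer code): utilizations of 95%, 30%, 0%, 0%, 0% average to 25%, so D2 at 30% sits inside the default 10% band while D1 does not.

```java
// Sketch of the utilization/threshold arithmetic from the report.
public class UtilizationCheck {
    static double average(double[] u) {
        double sum = 0;
        for (double x : u) sum += x;
        return sum / u.length;
    }

    // A node within +/- threshold of the average is considered balanced.
    static boolean withinThreshold(double util, double avg, double threshold) {
        return Math.abs(util - avg) <= threshold;
    }

    public static void main(String[] args) {
        double[] u = {95, 30, 0, 0, 0};
        double avg = average(u);                          // 25.0
        System.out.println(avg);
        System.out.println(withinThreshold(30, avg, 10)); // true: D2 is balanced
        System.out.println(withinThreshold(95, avg, 10)); // false: D1 is over-utilized
    }
}
```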
[jira] [Resolved] (HDFS-852) Balancer shutdown synchronisation could do with a review
[ https://issues.apache.org/jira/browse/HDFS-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-852. -- Resolution: Not A Problem I think this issue got stale. Resolving as Not a Problem. Please feel free to reopen if you disagree. Balancer shutdown synchronisation could do with a review Key: HDFS-852 URL: https://issues.apache.org/jira/browse/HDFS-852 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 0.22.0 Reporter: Steve Loughran Priority: Minor Looking at the source of the Balancer, there are a lot of {{catch(InterruptedException)}} clauses, which run the risk of swallowing exceptions, making it harder to shut down a balancer. For example, the {{AccessKeyUpdater}} swallows the InterruptedExceptions which get used to tell it to shut down, and while it does poll the shared field {{shouldRun}}, that field isn't volatile: the shutdown may not work. Elsewhere, the {{dispatchBlocks()}} method swallows interruptions without even looking for any shutdown flag. This is all minor as it is shutdown logic, but it is the stuff that is hard to test and leads to problems in the field, the problems that leave the ops team resorting to {{kill -9}}, and we don't want that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
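The two fixes the report implies, a {{volatile}} shutdown flag and restoring the interrupt status rather than swallowing it, can be sketched as follows (illustrative code, not the Balancer's actual classes):

```java
// Minimal sketch of cooperative shutdown: the flag is volatile so writes
// from the shutting-down thread are visible to the worker, and the
// InterruptedException handler re-asserts the interrupt instead of
// swallowing it.
public class Worker implements Runnable {
    private volatile boolean shouldRun = true;   // volatile: visible across threads

    public void shutdown() { shouldRun = false; }

    @Override
    public void run() {
        while (shouldRun) {
            try {
                Thread.sleep(100);                      // stand-in for real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();     // re-assert, don't swallow
                shouldRun = false;                      // honor the shutdown request
            }
        }
    }
}
```

An interrupt delivered while the worker sleeps now both stops the loop and leaves the thread's interrupt status set for callers further up the stack.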
[jira] [Resolved] (HDFS-1676) DateFormat.getDateTimeInstance() is very expensive, we can cache it to improve performance
[ https://issues.apache.org/jira/browse/HDFS-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-1676. --- Resolution: Not A Problem Resolving this as not-a-problem. Please feel free to reopen if you disagree. DateFormat.getDateTimeInstance() is very expensive, we can cache it to improve performance -- Key: HDFS-1676 URL: https://issues.apache.org/jira/browse/HDFS-1676 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 0.21.0 Reporter: Xiaoming Shi Labels: newbie In the file: ./hadoop-0.21.0/hdfs/src/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java line:1520 In the while loop, DateFormat.getDateTimeInstance() is called in each iteration. We can cache the result by moving it outside the loop or adding a class member. This is similar to the Apache bug https://issues.apache.org/bugzilla/show_bug.cgi?id=48778 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
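A sketch of the suggested caching, using a {{ThreadLocal}} because {{DateFormat}} instances are not thread-safe (illustrative, not the actual Balancer code):

```java
import java.text.DateFormat;
import java.util.Date;

// Sketch of the suggested fix: create the formatter once per thread instead
// of calling DateFormat.getDateTimeInstance() on every loop iteration.
public class TimeStamping {
    // DateFormat is mutable and not thread-safe, so cache one per thread.
    private static final ThreadLocal<DateFormat> FORMAT =
        ThreadLocal.withInitial(DateFormat::getDateTimeInstance);

    static String now() {
        return FORMAT.get().format(new Date()); // reuses the cached instance
    }

    public static void main(String[] args) {
        System.out.println(now());
    }
}
```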
[jira] [Resolved] (HDFS-3619) isGoodBlockCandidate() in Balancer is not handling properly if replica factor > 3
[ https://issues.apache.org/jira/browse/HDFS-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-3619. --- Resolution: Not A Problem Resolving as not-a-problem. Please feel free to reopen if you disagree. isGoodBlockCandidate() in Balancer is not handling properly if replica factor > 3 Key: HDFS-3619 URL: https://issues.apache.org/jira/browse/HDFS-3619 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Junping Du Assignee: Junping Du Let's assume: 1. replica factor = 4 2. the source node in rack 1 has the 1st replica, the 2nd and 3rd replicas are in rack 2, the 4th replica is in rack 3, and the target node is in rack 3. It should be good for the balancer to move the replica from the source node to the target node, but isGoodBlockCandidate() will return false. I think we can fix it by simply checking that at least one replica node (other than the source) is on a different rack than the target node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
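The proposed check, at least one replica other than the source on a different rack than the target, can be sketched as follows (a hypothetical helper, not the actual isGoodBlockCandidate() implementation):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed fix: a move is rack-safe if some
// replica other than the source remains on a rack different from the target's.
public class RackCheck {
    // replicaRacks: rack of each replica; sourceIndex: which replica moves away
    static boolean isGoodCandidate(List<String> replicaRacks,
                                   int sourceIndex, String targetRack) {
        for (int i = 0; i < replicaRacks.size(); i++) {
            if (i != sourceIndex && !replicaRacks.get(i).equals(targetRack)) {
                return true; // another replica stays off the target's rack
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The JIRA's example: replicas on rack1 (source), rack2, rack2, rack3;
        // target on rack3 -> the move should be allowed.
        List<String> racks = Arrays.asList("rack1", "rack2", "rack2", "rack3");
        System.out.println(isGoodCandidate(racks, 0, "rack3")); // true
    }
}
```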
[jira] [Resolved] (HDFS-3411) Balancer fails to balance blocks between aboveAvgUtilized and belowAvgUtilized datanodes.
[ https://issues.apache.org/jira/browse/HDFS-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-3411. --- Resolution: Not A Problem Resolving as not-a-problem. Please feel free to reopen if you disagree. Balancer fails to balance blocks between aboveAvgUtilized and belowAvgUtilized datanodes. - Key: HDFS-3411 URL: https://issues.apache.org/jira/browse/HDFS-3411 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 0.23.0 Reporter: Ashish Singhi Scenario: replication set to 1. 1. Start 1 NN and 1 DN 2. Pump 1GB of data. 3. Start one more DN 4. Run balancer with threshold 1. Now DN1 is added into aboveAvgUtilizedDatanodes and DN2 into belowAvgUtilizedDatanodes. Hence overLoadedBytes and underLoadedBytes will both be 0, resulting in bytesLeftToMove equal to 0. Thus the balancer will exit without balancing the blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
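A sketch of the byte accounting the report describes: only nodes outside the threshold band contribute to overLoadedBytes, so when every node falls within the band, bytesLeftToMove is 0 and the balancer exits (illustrative arithmetic, not the actual Balancer code):

```java
// Sketch: a node contributes to overLoadedBytes only when its utilization
// exceeds (average + threshold). Nodes merely aboveAvg-but-within-threshold
// contribute nothing, which is why bytesLeftToMove can be 0.
public class BalancerMath {
    // capacity/used in bytes; avg and threshold in percent
    static long overLoadedBytes(long capacity, long used, double avg, double threshold) {
        double util = 100.0 * used / capacity;
        if (util > avg + threshold) {
            return (long) ((util - avg - threshold) * capacity / 100.0);
        }
        return 0; // within the threshold band: nothing to move
    }

    public static void main(String[] args) {
        System.out.println(overLoadedBytes(100, 50, 50.0, 1.0)); // 0: within the band
        System.out.println(overLoadedBytes(100, 60, 50.0, 1.0)); // 9: over-utilized
    }
}
```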
[jira] [Created] (HDFS-8818) Allow Balancer to run faster
Tsz Wo Nicholas Sze created HDFS-8818: - Summary: Allow Balancer to run faster Key: HDFS-8818 URL: https://issues.apache.org/jira/browse/HDFS-8818 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze The original design of Balancer intentionally makes it run slowly so that the balancing activities won't affect the normal cluster activities and the running jobs. There are new use cases where a cluster admin may choose to balance the cluster when the cluster load is low, or in a maintenance window. So we should have an option to allow Balancer to run faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8528) Erasure Coding: optimize client writing by making the writing of data and parity concurrently
[ https://issues.apache.org/jira/browse/HDFS-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-8528. --- Resolution: Duplicate This duplicates HDFS-8287. Let's resolve it. Erasure Coding: optimize client writing by making the writing of data and parity concurrent -- Key: HDFS-8528 URL: https://issues.apache.org/jira/browse/HDFS-8528 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo HDFS-8425 shows the client writing is not very efficient currently. One factor is that when the data buffers are full, the client suspends until the parities are encoded and written. This sub-task tries to make the two writes concurrent to improve efficiency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8540) Mover should exit with NO_MOVE_BLOCK if no block can be moved
Tsz Wo Nicholas Sze created HDFS-8540: - Summary: Mover should exit with NO_MOVE_BLOCK if no block can be moved Key: HDFS-8540 URL: https://issues.apache.org/jira/browse/HDFS-8540 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Reporter: Tsz Wo Nicholas Sze When there are files not satisfying their storage policy and no move is possible, Mover exits with SUCCESS. It should exit with NO_MOVE_BLOCK. The bug seems to be in the following code: when the StorageTypeDiff is not empty and scheduleMoves4Block returns false, it does not update hasRemaining. Also, there is no indication that no block could be moved for the entire iteration. {code} //Mover.processFile(..) if (!diff.removeOverlap(true)) { if (scheduleMoves4Block(diff, lb, ecSchema)) { hasRemaining |= (diff.existing.size() > 1 && diff.expected.size() > 1); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
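A hypothetical sketch of a fix along the lines suggested: track both remaining work and whether any move was ever scheduled, so the caller can exit with NO_MOVE_BLOCK (the names are illustrative, not Mover's actual fields):

```java
// Sketch: accumulate two facts per iteration so the exit code can
// distinguish "done" from "stuck".
public class MoveResult {
    boolean hasRemaining;          // some storage-policy work is still pending
    boolean noBlockMoved = true;   // true until at least one move is scheduled

    // diffExisting/diffExpected model diff.existing.size()/diff.expected.size()
    void record(boolean scheduled, int diffExisting, int diffExpected) {
        if (scheduled) {
            noBlockMoved = false;
            hasRemaining |= (diffExisting > 1 && diffExpected > 1);
        } else {
            hasRemaining = true;   // a needed move could not be scheduled
        }
    }
}
```

With this, an iteration where every `record` call has `scheduled == false` ends with `hasRemaining && noBlockMoved`, the condition for NO_MOVE_BLOCK.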
[jira] [Created] (HDFS-8541) Mover should exit with NO_MOVE_PROGRESS if there is no move progress
Tsz Wo Nicholas Sze created HDFS-8541: - Summary: Mover should exit with NO_MOVE_PROGRESS if there is no move progress Key: HDFS-8541 URL: https://issues.apache.org/jira/browse/HDFS-8541 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Priority: Minor HDFS-8143 changed Mover to exit after some retries when it fails to move blocks. Two additional suggestions: # The Mover retry counter should be incremented only if all moves fail. If there are some successful moves, the counter should be reset. # Mover should exit with NO_MOVE_PROGRESS instead of IO_EXCEPTION in case of failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8437) Fail/warn if HDFS is setup with an even number of QJMs.
Tsz Wo Nicholas Sze created HDFS-8437: - Summary: Fail/warn if HDFS is setup with an even number of QJMs. Key: HDFS-8437 URL: https://issues.apache.org/jira/browse/HDFS-8437 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor When setting up an even number (2n, n ≥ 1) of QJMs, the number of failures it can tolerate is the same as with one node fewer (2n-1). Therefore, it does not make sense to set up an even number of QJMs. We should either fail the setup or warn the users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
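The quorum arithmetic behind this: an ensemble of n JournalNodes needs a majority to commit, so it tolerates floor((n - 1) / 2) failures, and 2n nodes tolerate no more than 2n - 1 do.

```java
// Majority-quorum fault tolerance: an ensemble of n nodes tolerates
// floor((n - 1) / 2) failures.
public class QuorumMath {
    static int tolerated(int n) {
        return (n - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(tolerated(3)); // 1
        System.out.println(tolerated(4)); // 1 -- same as 3, so the 4th node buys nothing
        System.out.println(tolerated(5)); // 2
    }
}
```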
[jira] [Created] (HDFS-8433) blockToken is not set in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil
Tsz Wo Nicholas Sze created HDFS-8433: - Summary: blockToken is not set in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil Key: HDFS-8433 URL: https://issues.apache.org/jira/browse/HDFS-8433 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze The blockToken provided in LocatedStripedBlock is not used to create LocatedBlock in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil. We should also add EC tests with security on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8405) Fix a typo in NamenodeFsck
Tsz Wo Nicholas Sze created HDFS-8405: - Summary: Fix a typo in NamenodeFsck Key: HDFS-8405 URL: https://issues.apache.org/jira/browse/HDFS-8405 Project: Hadoop HDFS Issue Type: Bug Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma Priority: Minor DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY below should not be quoted. {code} res.append("\n ").append("DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY:\t") .append(minReplication); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
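A sketch of the corrected output: the constant's value (assumed here to be dfs.namenode.replication.min) should be appended, rather than the constant's name quoted as a string literal (names and the key value are assumptions for illustration):

```java
// Sketch of the fixed fsck line: print the config key's value, not the
// literal text "DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY".
public class FsckLine {
    static String line(int minReplication) {
        // Assumed value of DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY
        String key = "dfs.namenode.replication.min";
        return new StringBuilder()
            .append("\n ").append(key).append(":\t")
            .append(minReplication)
            .toString();
    }

    public static void main(String[] args) {
        System.out.println(line(1));
    }
}
```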
[jira] [Created] (HDFS-8397) Refactor the error handling code in DataStreamer
Tsz Wo Nicholas Sze created HDFS-8397: - Summary: Refactor the error handling code in DataStreamer Key: HDFS-8397 URL: https://issues.apache.org/jira/browse/HDFS-8397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor DataStreamer handles (1) bad datanode, (2) restarting datanode and (3) datanode replacement and keeps various state and indexes. This issue is to clean up the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream
Tsz Wo Nicholas Sze created HDFS-8383: - Summary: Tolerate multiple failures in DFSStripedOutputStream Key: HDFS-8383 URL: https://issues.apache.org/jira/browse/HDFS-8383 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8384) Allow NN to startup if there are files having a lease but are not under construction
Tsz Wo Nicholas Sze created HDFS-8384: - Summary: Allow NN to startup if there are files having a lease but are not under construction Key: HDFS-8384 URL: https://issues.apache.org/jira/browse/HDFS-8384 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor When there are files having a lease but not under construction, NN will fail to start up with {code} 15/05/12 00:36:31 ERROR namenode.FSImage: Unable to save image for /hadoop/hdfs/namenode java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getINodesUnderConstruction(LeaseManager.java:412) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFilesUnderConstruction(FSNamesystem.java:7124) ... {code} The actual problem is that the image could be corrupted by bugs like HDFS-7587. We should have an option/conf to allow NN to start up so that the problematic files could possibly be deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)