[jira] [Created] (HDDS-1688) Deadlock in ratis client
Rakesh R created HDDS-1688:
--

Summary: Deadlock in ratis client
Key: HDDS-1688
URL: https://issues.apache.org/jira/browse/HDDS-1688
Project: Hadoop Distributed Data Store
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Rakesh R
Attachments: Freon_baseline_100Threads_64MB_Keysize_8Keys_10buckets.bin

Ran a Freon benchmark in a three-node cluster with 100 writer threads. After some time, the client hung due to a deadlock.

+Freon with the args:-+
--numOfBuckets=10 --numOfKeys=8 --keySize=67108864 --numOfVolumes=100 --numOfThreads=100

There are 3 BLOCKED threads; the whole thread dump is attached.

{code}
Found one Java-level deadlock:
=============================
"grpc-default-executor-6":
  waiting for ownable synchronizer 0x00021546bd00, (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync),
  which is held by "ForkJoinPool.commonPool-worker-7"
"ForkJoinPool.commonPool-worker-7":
  waiting to lock monitor 0x7f48fc99c448 (object 0x00021546be30, a org.apache.ratis.util.SlidingWindow$Client),
  which is held by "grpc-default-executor-6"
{code}

{code}
ForkJoinPool.commonPool-worker-7 priority:5 - threadId:0x7f48d834b000 - nativeId:0x9ffb - nativeId (decimal):40955 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.ratis.util.SlidingWindow$Client.resetFirstSeqNum(SlidingWindow.java:348)
- waiting to lock <0x00021546be30> (a org.apache.ratis.util.SlidingWindow$Client)
at org.apache.ratis.client.impl.OrderedAsync.resetSlidingWindow(OrderedAsync.java:122)
at org.apache.ratis.client.impl.OrderedAsync$$Lambda$943/1670264164.accept(Unknown Source)
at org.apache.ratis.client.impl.RaftClientImpl.lambda$handleIOException$6(RaftClientImpl.java:352)
at org.apache.ratis.client.impl.RaftClientImpl$$Lambda$944/769363367.accept(Unknown Source)
at java.util.Optional.ifPresent(Optional.java:159)
at org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:352)
at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$10(OrderedAsync.java:235)
at org.apache.ratis.client.impl.OrderedAsync$$Lambda$776/1213731951.apply(Unknown Source)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:324)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.close(GrpcClientProtocolClient.java:313)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$400(GrpcClientProtocolClient.java:245)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient.lambda$close$1(GrpcClientProtocolClient.java:131)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$$Lambda$950/1948156329.accept(Unknown Source)
at java.util.Optional.ifPresent(Optional.java:159)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient.close(GrpcClientProtocolClient.java:131)
at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$close$1(PeerProxyMap.java:73)
at org.apache.ratis.util.PeerProxyMap$PeerAndProxy$$Lambda$948/427065222.run(Unknown Source)
at org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231)
at org.apache.ratis.util.LifeCycle$$Lambda$949/1311526821.get(Unknown Source)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229)
at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.close(PeerProxyMap.java:70)
- locked <0x0003e793ef48> (a org.apache.ratis.util.PeerProxyMap$PeerAndProxy)
at org.apache.ratis.util.PeerProxyMap.resetProxy(PeerProxyMap.java:126)
- locked <0x000215453400> (a java.lang.Object)
at org.apache.ratis.util.PeerProxyMap.handleException(PeerProxyMap.java:135)
at org.apache.ratis.client.impl.RaftClientRpcWithProxy.handleException(RaftClientRpcWithProxy.java:47)
at org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:375)
at org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:341)
at org.apache.ratis.client.impl.UnorderedAsync.lambda$sendRequestWithRetry$4(UnorderedAsync.java:108)
at org.apache.ratis.client.impl.UnorderedAsync$$Lambda$976/655038759.accept(Unknown Source)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:44
{code}
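The cycle in the dump is a classic lock-order inversion: one thread takes the SlidingWindow$Client monitor and then needs the ReentrantReadWriteLock, while the other takes them in the opposite order. A minimal, self-contained sketch of that ordering (hypothetical names, not the Ratis code; it uses tryLock() so it detects the inversion instead of actually hanging like the client did):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Two threads acquire a monitor and a fair RW lock in opposite orders,
// mirroring the grpc-executor vs. ForkJoin-worker threads in the dump.
class LockOrderSketch {
  static final Object slidingWindowMonitor = new Object();
  static final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(true); // fair, as in the dump

  public static String run() {
    final CountDownLatch bothHeld = new CountDownLatch(2);
    final CountDownLatch gaveUp = new CountDownLatch(1);
    final StringBuilder result = new StringBuilder();

    Thread grpc = new Thread(() -> {
      synchronized (slidingWindowMonitor) {      // order: monitor -> write lock
        bothHeld.countDown();
        awaitQuietly(bothHeld);
        if (!rwLock.writeLock().tryLock()) {     // a plain lock() here would deadlock
          result.append("inversion detected");
        } else {
          rwLock.writeLock().unlock();
        }
        gaveUp.countDown();
      }
    });
    Thread forkJoin = new Thread(() -> {
      rwLock.writeLock().lock();                 // order: write lock -> monitor
      try {
        bothHeld.countDown();
        awaitQuietly(bothHeld);
        awaitQuietly(gaveUp);                    // hold the lock until the other thread gives up
      } finally {
        rwLock.writeLock().unlock();
      }
    });
    grpc.start();
    forkJoin.start();
    joinQuietly(grpc);
    joinQuietly(forkJoin);
    return result.toString();
  }

  private static void awaitQuietly(CountDownLatch latch) {
    try { latch.await(5, TimeUnit.SECONDS); } catch (InterruptedException ignored) { }
  }

  private static void joinQuietly(Thread t) {
    try { t.join(10_000); } catch (InterruptedException ignored) { }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

The usual remedies are to establish a single acquisition order, or to avoid calling into lock-taking cleanup (here, resetSlidingWindow) while still holding the proxy/monitor locks.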
[jira] [Created] (HDDS-1687) Datanode process shutdown due to OOME
Rakesh R created HDDS-1687:
--

Summary: Datanode process shutdown due to OOME
Key: HDDS-1687
URL: https://issues.apache.org/jira/browse/HDDS-1687
Project: Hadoop Distributed Data Store
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Rakesh R
Attachments: baseline test - datanode error logs.0.5.0.rar

Ran a Freon benchmark in a three-node cluster; with more parallel writer threads, the datanode daemon hit an OOME and shut down. HDD was used as the storage type on the worker nodes.

+Freon with the args:-+
--numOfBuckets=10 --numOfKeys=8 --keySize=67108864 --numOfVolumes=100 --numOfThreads=100

*DN-2*: Process got killed during the test due to OOME.

{code}
2019-06-13 00:48:11,976 ERROR org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: Terminating with exit status 1: a0cb8914-b51c-41b1-b5d2-59313cf38c0b-SegmentedRaftLogWorker:Storage Directory /data/datab/ozone/metadir/ratis/cbf29739-cbd1-4b00-8a21-2db750004dc7 failed.
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:694)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:44)
at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:70)
at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:481)
at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:234)
at java.lang.Thread.run(Thread.java:748)
{code}

*DN3*: Process got killed during the test due to OOME. I could see lots of NPEs in the datanode logs.
{code}
2019-06-13 00:44:44,581 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9: follower 07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: follower responses installSnapshot Completed
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9: follower 07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,587 ERROR org.apache.ratis.server.impl.LogAppender: org.apache.ratis.server.impl.LogAppender$AppenderDaemon@554415fe unexpected exception
java.lang.NullPointerException: 83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: Previous TermIndex not found for firstIndex = 10062
at java.util.Objects.requireNonNull(Objects.java:290)
at org.apache.ratis.server.impl.LogAppender.assertProtos(LogAppender.java:234)
at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:221)
at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:169)
at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:113)
at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:80)
at java.lang.Thread.run(Thread.java:748)
{code}

OOME log messages are present in the *.out file:

{code}
Exception in thread "org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$267/386355867@1d9c10b3" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.start(LogAppender.java:68)
at org.apache.ratis.server.impl.LogAppender.startAppender(LogAppender.java:153)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at org.apache.ratis.server.impl.LeaderState.addAndStartSenders(LeaderState.java:372)
at org.apache.ratis.server.impl.LeaderState.restartSender(LeaderState.java:394)
at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:97)
at java.lang.Thread.run(Thread.java:748)
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
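For context on the DN-2 trace above: each new raft log segment constructs a BufferedWriteChannel, whose constructor allocates a fresh direct buffer. Direct memory is bounded by -XX:MaxDirectMemorySize rather than the Java heap, and is only reclaimed when the owning buffer object is garbage-collected, so a burst of segment rollovers can hit the limit before any GC runs. A tiny sketch of the allocation pattern (the buffer size here is an arbitrary stand-in, not the Ratis default):

```java
import java.nio.ByteBuffer;

// Mirrors the allocation the DN-2 stack trace points at: allocateDirect counts
// against -XX:MaxDirectMemorySize and throws
// OutOfMemoryError("Direct buffer memory") once that limit is exhausted.
class DirectBufferSketch {
  public static ByteBuffer newSegmentBuffer(int size) {
    return ByteBuffer.allocateDirect(size); // off-heap; not freed until the buffer is GC'd
  }

  public static void main(String[] args) {
    ByteBuffer b = newSegmentBuffer(8 * 1024);
    System.out.println(b.isDirect() + " capacity=" + b.capacity());
  }
}
```

Typical mitigations are raising -XX:MaxDirectMemorySize, pooling/reusing the write buffer across segments, or using heap buffers.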
[jira] [Created] (HDDS-1594) NullPointerException at the ratis client while running Freon benchmark
Rakesh R created HDDS-1594:
--

Summary: NullPointerException at the ratis client while running Freon benchmark
Key: HDDS-1594
URL: https://issues.apache.org/jira/browse/HDDS-1594
Project: Hadoop Distributed Data Store
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Rakesh R

Hit an NPE during a Freon benchmark test run. Below is the exception logged in the client-side output:

{code}
SEVERE: Exception while executing runnable org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed@6c585536
java.lang.NullPointerException
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:320)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$000(GrpcClientProtocolClient.java:245)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onError(GrpcClientProtocolClient.java:269)
at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
[jira] [Created] (HDFS-14393) Move stats related methods to MappableBlockLoader
Rakesh R created HDFS-14393:
---

Summary: Move stats related methods to MappableBlockLoader
Key: HDFS-14393
URL: https://issues.apache.org/jira/browse/HDFS-14393
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R

This jira sub-task is to move the stats-related methods into the specific loader, making FsDatasetCache cleaner so that DRAM and PMem implementations can be plugged in.
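A possible shape for the refactor, as a sketch (method names and semantics are assumptions, not taken from the actual patch): the stats live behind the loader abstraction, so FsDatasetCache no longer cares whether the medium is DRAM or PMem.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical loader-side stats API; FsDatasetCache would delegate to it.
abstract class MappableBlockLoaderSketch {
  public abstract long getCacheUsed();       // bytes currently cached on this medium
  public abstract long getCacheCapacity();   // total bytes available on this medium
  public abstract long reserve(long bytes);  // claim space before mapping a block; -1 on failure
  public abstract long release(long bytes);  // return space after unmapping a block
}

// A DRAM-style implementation backed by a simple atomic counter.
class MemoryLoaderSketch extends MappableBlockLoaderSketch {
  private final long capacity;
  private final AtomicLong used = new AtomicLong();

  public MemoryLoaderSketch(long capacity) { this.capacity = capacity; }

  public long getCacheUsed() { return used.get(); }
  public long getCacheCapacity() { return capacity; }

  public long reserve(long bytes) {
    while (true) {
      long cur = used.get();
      if (cur + bytes > capacity) return -1;                  // no room on this medium
      if (used.compareAndSet(cur, cur + bytes)) return cur + bytes;
    }
  }

  public long release(long bytes) { return used.addAndGet(-bytes); }

  public static void main(String[] args) {
    MemoryLoaderSketch loader = new MemoryLoaderSketch(100L);
    System.out.println(loader.reserve(60L) + " " + loader.reserve(60L) + " " + loader.release(60L));
  }
}
```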
[jira] [Reopened] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R reopened HDFS-14355:
-

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: caching, datanode
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, HDFS-14355.002.patch
>
> This task is to implement the caching to persistent memory using pure {{java.nio.MappedByteBuffer}}, which could be useful in case native support isn't available or convenient in some environments or platforms.
[jira] [Resolved] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R resolved HDFS-14355.
-
Resolution: Unresolved

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: caching, datanode
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, HDFS-14355.002.patch
>
> This task is to implement the caching to persistent memory using pure {{java.nio.MappedByteBuffer}}, which could be useful in case native support isn't available or convenient in some environments or platforms.
[jira] [Created] (HDFS-13808) [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and #getFileInfo() function
Rakesh R created HDFS-13808:
---

Summary: [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and #getFileInfo() function
Key: HDFS-13808
URL: https://issues.apache.org/jira/browse/HDFS-13808
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
[jira] [Resolved] (HDFS-13084) [SPS]: Fix the branch review comments
[ https://issues.apache.org/jira/browse/HDFS-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R resolved HDFS-13084.
-
Resolution: Fixed
Fix Version/s: HDFS-10285

I'm closing this issue, as the {{IntraSPSNameNodeContext}} code implementation specific to the internal SPS service has been removed from this branch. The internal SPS mechanism will be discussed and supported via the follow-up Jira task HDFS-12226. We have taken care of the comments related to this branch via the HDFS-13097, HDFS-13110, HDFS-13166 and HDFS-13381 Jira sub-tasks.

> [SPS]: Fix the branch review comments
> -
>
> Key: HDFS-13084
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Uma Maheswara Rao G
> Assignee: Rakesh R
> Priority: Major
> Fix For: HDFS-10285
>
> Fix the review comments provided by [~daryn]
[jira] [Created] (HDFS-13491) [SPS]: Discuss and implement efficient approach to send a copy of a block to another datanode
Rakesh R created HDFS-13491:
---

Summary: [SPS]: Discuss and implement efficient approach to send a copy of a block to another datanode
Key: HDFS-13491
URL: https://issues.apache.org/jira/browse/HDFS-13491
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R

This Jira task is to reach consensus on the block-transfer logic to another datanode and to implement it so that the block storage policy is satisfied.

Reference discussion thread
[jira] [Created] (HDFS-13381) [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path
Rakesh R created HDFS-13381:
---

Summary: [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path
Key: HDFS-13381
URL: https://issues.apache.org/jira/browse/HDFS-13381
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R

This Jira task will address the following comments:
# Use DFSUtilClient::makePathFromFileId instead of generics (one for the string path and another for the inodeId) as today.
# Only the context impl differs for external/internal SPS, so FileCollector and BlockMoveTaskHandler can simply be moved to the Context interface.
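For reference, {{makePathFromFileId}} builds an inode-based path under the HDFS reserved tree, which lets the satisfier address a file by its ID alone. A sketch of what it resolves to (the path format follows the well-known /.reserved/.inodes convention; the real helper lives in DFSUtilClient and may differ in detail):

```java
// Rebuild an HDFS path from a file (inode) ID via the reserved-inodes tree.
class ReservedPathSketch {
  public static String makePathFromFileId(long fileId) {
    return "/.reserved/.inodes/" + fileId; // resolved by the NN back to the inode
  }

  public static void main(String[] args) {
    System.out.println(makePathFromFileId(16386L));
  }
}
```

The advantage over string paths is that the reference stays valid across renames, since the inode ID does not change.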
[jira] [Created] (HDFS-13166) [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls
Rakesh R created HDFS-13166:
---

Summary: [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls
Key: HDFS-13166
URL: https://issues.apache.org/jira/browse/HDFS-13166
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R

Presently, {{#getLiveDatanodeStorageReport}} is fetched for every file and the computation is repeated each time. This task is to discuss and implement a cache mechanism to minimize the number of function calls. Probably we could define a configurable refresh interval and periodically refresh the DN cache by fetching the latest {{#getLiveDatanodeStorageReport}}.
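The refresh-interval idea can be sketched as follows (class name and interval handling are assumptions, not from a patch): the costly report fetch happens at most once per interval, no matter how many files are processed.

```java
import java.util.function.Supplier;

// Time-based cache around the live-DN report: refetch only when the
// configured refresh interval has elapsed, instead of once per file.
class CachedReportSketch<T> {
  private final Supplier<T> fetcher;      // wraps the costly getLiveDatanodeStorageReport() call
  private final long refreshIntervalMs;   // would come from a new config key
  private T cached;
  private long lastFetchMs;

  public CachedReportSketch(Supplier<T> fetcher, long refreshIntervalMs) {
    this.fetcher = fetcher;
    this.refreshIntervalMs = refreshIntervalMs;
  }

  // nowMs is passed in to keep the sketch deterministic; real code would use a clock.
  public synchronized T get(long nowMs) {
    if (cached == null || nowMs - lastFetchMs >= refreshIntervalMs) {
      cached = fetcher.get();             // the expensive fetch happens only here
      lastFetchMs = nowMs;
    }
    return cached;
  }

  public static void main(String[] args) {
    final int[] calls = {0};
    CachedReportSketch<Integer> cache = new CachedReportSketch<>(() -> ++calls[0], 30_000L);
    System.out.println(cache.get(0L) + " " + cache.get(1_000L) + " " + cache.get(31_000L));
  }
}
```

The trade-off is staleness: a DN that dies mid-interval is still reported live until the next refresh, so the interval should stay well below the DN heartbeat-expiry window.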
[jira] [Created] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR
Rakesh R created HDFS-13165:
---

Summary: [SPS]: Collects successfully moved block details via IBR
Key: HDFS-13165
URL: https://issues.apache.org/jira/browse/HDFS-13165
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R

This task is to make use of the existing IBR to get moved-block details and to remove the unwanted future-tracking logic in the BlockStorageMovementTracker code; it is no longer needed, as file-level tracking is maintained at the NN itself.

The following comments are taken from HDFS-10285, [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472]:

Comment-3)
{quote}BPServiceActor
Is it actually sending back the moved blocks? Aren’t IBRs sufficient?{quote}

Comment-21)
{quote}
BlockStorageMovementTracker
Many data structures are riddled with non-threadsafe race conditions and risk of CMEs. Ex. the moverTaskFutures map. Adding new blocks and/or adding to a block's list of futures is synchronized. However, the run loop does an unsynchronized block get, an unsynchronized future remove, an unsynchronized isEmpty, and possibly another unsynchronized get; only then does it do a synchronized remove of the block. The whole chunk of code should be synchronized.

Is the problematic moverTaskFutures even needed? It's aggregating futures per block for seemingly no reason. Why track all the futures at all instead of just relying on the completion service? As best I can tell, it's only used:
- to determine if a future from the completion service should be ignored during shutdown. Shutdown sets the running boolean to false and clears the entire data structure, so why not use the running boolean as a check just a little further down?
- as synchronization to sleep up to 2 seconds before performing a blocking moverCompletionService.take, but only when it thinks there are no active futures.
I'll ignore the missed notify race that the bounded wait masks, but the real question is: why not just do the blocking take? Why all the complexity? Am I missing something?

BlocksMovementsStatusHandler
Suffers the same type of thread-safety issues as StoragePolicySatisfyWorker. Ex. blockIdVsMovementStatus is inconsistently synchronized. It does synchronize to return an unmodifiable list, which sadly does nothing to protect the caller from CME. handle is iterating over a non-thread-safe list.
{quote}
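The reviewer's suggestion — drive the tracker off the completion service plus a running flag, instead of a separate per-block futures map — can be sketched like this (illustrative names, not the SPS code):

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// The completion service hands back finished "moves" in completion order;
// no side map of futures is needed to consume the results.
class TrackerSketch {
  private volatile boolean running = true;  // a shutdown() call would flip this

  public int drain(int tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    CompletionService<Integer> moves = new ExecutorCompletionService<>(pool);
    for (int i = 0; i < tasks; i++) {
      final int blockId = i;
      moves.submit(() -> blockId);          // stands in for one block move
    }
    int handled = 0;
    try {
      while (running && handled < tasks) {
        Integer movedBlock = moves.take().get(); // blocks until some move completes
        handled++;                               // real code would report movedBlock via IBR handling
      }
    } catch (InterruptedException | ExecutionException e) {
      Thread.currentThread().interrupt();        // shutdown path: stop draining
    }
    pool.shutdown();
    return handled;
  }

  public static void main(String[] args) {
    System.out.println(new TrackerSketch().drain(5));
  }
}
```

The blocking take() plus the volatile flag replaces both the bounded sleep and the futures map: during shutdown the flag is checked on each iteration, and an interrupt breaks the thread out of take() immediately.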
[jira] [Created] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier
Rakesh R created HDFS-13110:
---

Summary: [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier
Key: HDFS-13110
URL: https://issues.apache.org/jira/browse/HDFS-13110
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R

This task is to address [~daryn]'s comments.

*Comment No. 10)* NamenodeProtocolTranslatorPB: most of the API changes appear unnecessary. IntraSPSNameNodeContext#getFileInfo swallows all IOEs, on the assumption that any and all IOEs mean FNF, which probably isn't the intention during RPC exceptions.
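The flagged pattern, as a sketch (hypothetical helper, not the real IntraSPSNameNodeContext): swallowing every IOException makes a genuine RPC failure indistinguishable from a missing file, whereas catching only FileNotFoundException lets other IOEs reach the caller's retry/failover logic.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

class GetFileInfoSketch {
  @FunctionalInterface
  interface Rpc { Object call() throws IOException; }

  // Flagged pattern: an RPC timeout and a missing file both come back as null.
  public static Object swallowAll(Rpc rpc) {
    try { return rpc.call(); }
    catch (IOException e) { return null; }
  }

  // Narrower handling: only FNF means "no such file"; other IOEs propagate.
  public static Object narrow(Rpc rpc) throws IOException {
    try { return rpc.call(); }
    catch (FileNotFoundException e) { return null; }
  }

  // Demonstrates that narrow() lets a genuine RPC failure surface.
  public static boolean narrowPropagates() {
    try {
      narrow(() -> { throw new IOException("connection reset"); });
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(swallowAll(() -> { throw new IOException("connection reset"); })); // null
    System.out.println(narrowPropagates()); // true
  }
}
```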
[jira] [Created] (HDFS-13095) Improve slice tree traversal implementation
Rakesh R created HDFS-13095:
---

Summary: Improve slice tree traversal implementation
Key: HDFS-13095
URL: https://issues.apache.org/jira/browse/HDFS-13095
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R

This task is to refine the existing slice-tree traversal logic in the [ReencryptionHandler|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ReencryptionHandler.java#L74] class. Please refer to Daryn's review comments:

{quote}*FSTreeTraverser*
I need to study this more, but I have grave concerns this will work correctly in a mutating namesystem. Ex. renames and deletes, esp. in combination with snapshots. Looks like there's a chance it will go off in the weeds when backtracking out of a renamed directory. traverseDir may NPE if it's traversing a tree in a snapshot and one of the ancestors is deleted.

Not sure why it's bothering to re-check permissions during the crawl. The storage policy is inherited by the entire tree, regardless of whether the sub-contents are accessible. The effect of this patch is that the storage policy is enforced for all readable files, non-readable files violate the new storage policy, and new non-readable files will conform to the new storage policy. Very convoluted. Since new files will conform, it should just process the entire tree.
{quote}
[jira] [Created] (HDFS-13077) [SPS]: Fix review comments of external storage policy satisfier
Rakesh R created HDFS-13077:
---

Summary: [SPS]: Fix review comments of external storage policy satisfier
Key: HDFS-13077
URL: https://issues.apache.org/jira/browse/HDFS-13077
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R

This task is to address the following review comments from Uma:
- Implement login with the external SPS keytab.
- Make the SPS outstanding-requests queue limit configurable. The configuration could be {{dfs.storage.policy.satisfier.max.outstanding.paths}}.
- Fix checkstyle warnings.
[jira] [Created] (HDFS-13076) Merge work for HDFS-10285
Rakesh R created HDFS-13076:
---

Summary: Merge work for HDFS-10285
Key: HDFS-13076
URL: https://issues.apache.org/jira/browse/HDFS-13076
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R

This Jira is to run the aggregated HDFS-10285 branch patch against trunk and check for any Jenkins issues.
[jira] [Created] (HDFS-13057) [SPS]: Revisit configurations to make SPS service modes internal/external/none
Rakesh R created HDFS-13057:
---

Summary: [SPS]: Revisit configurations to make SPS service modes internal/external/none
Key: HDFS-13057
URL: https://issues.apache.org/jira/browse/HDFS-13057
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R

This task is to revisit the configurations to provide the SPS service modes {{internal/external/none}}:
- {{internal}}: the SPS service runs inside the NN
- {{external}}: the SPS service runs outside the NN
- {{none}}: the SPS service is completely disabled, with zero cost to the system

The proposed configuration item is {{dfs.storage.policy.satisfier.running.mode}} in the hdfs-site.xml file, with a string value. The mode can be changed via the {{reconfig}} command.
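The three modes could be modeled as an enum parsed from the config value (the parsing helper and its case-insensitivity are assumptions, not part of the proposal):

```java
// Constants mirror the proposed internal/external/none modes.
enum SpsModeSketch {
  INTERNAL,  // SPS runs inside the NN
  EXTERNAL,  // SPS runs as a separate service outside the NN
  NONE;      // feature disabled entirely, zero cost to the system

  // Value read from dfs.storage.policy.satisfier.running.mode in hdfs-site.xml.
  public static SpsModeSketch fromString(String value) {
    return valueOf(value.trim().toUpperCase(java.util.Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(fromString(" external "));
  }
}
```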
[jira] [Created] (HDFS-12982) [SPS]: Reduce the locking and cleanup the Namesystem access
Rakesh R created HDFS-12982:
---

Summary: [SPS]: Reduce the locking and cleanup the Namesystem access
Key: HDFS-12982
URL: https://issues.apache.org/jira/browse/HDFS-12982
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R

This task is to optimize the NS lock usage in SPS and clean up the Namesystem access via the {{Context}} interface.
[jira] [Created] (HDFS-12790) [SPS]: Rebasing HDFS-10285 branch after HDFS-10467, HDFS-12599 and HDFS-11968 commits
Rakesh R created HDFS-12790:
---

Summary: [SPS]: Rebasing HDFS-10285 branch after HDFS-10467, HDFS-12599 and HDFS-11968 commits
Key: HDFS-12790
URL: https://issues.apache.org/jira/browse/HDFS-12790
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R

This task is a continuation of the periodic rebasing of the HDFS-10285 branch onto trunk. To make the branch code compile against trunk, it needs to be refactored with the latest trunk changes: HDFS-10467, HDFS-12599 and HDFS-11968.
[jira] [Created] (HDFS-12570) [SPS]: Refactor Co-ordinator datanode logic to track the block storage movements
Rakesh R created HDFS-12570:
---

Summary: [SPS]: Refactor Co-ordinator datanode logic to track the block storage movements
Key: HDFS-12570
URL: https://issues.apache.org/jira/browse/HDFS-12570
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, namenode
Reporter: Rakesh R
Assignee: Rakesh R

This task is to refactor the C-DN block storage movements. Basically, the idea is to move the scheduling and tracking logic to the Namenode rather than keeping it at the special C-DN. Please refer to the discussion with [~andrew.wang] to understand the [background and the necessity of refactoring|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16141060&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16141060].
[jira] [Created] (HDFS-12291) [SPS]: Provide a mechanism to recursively iterate and satisfy storage policy of all the files under the given dir
Rakesh R created HDFS-12291:
---

Summary: [SPS]: Provide a mechanism to recursively iterate and satisfy storage policy of all the files under the given dir
Key: HDFS-12291
URL: https://issues.apache.org/jira/browse/HDFS-12291
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R

For a given source directory, SPS presently considers only the files immediately under that directory (one level of scanning) when satisfying the policy. It WON’T scan directories recursively and schedule SPS tasks to satisfy the storage policy of all files down to the leaf nodes. The idea of this jira is to discuss and implement an efficient recursive directory-iteration mechanism that satisfies the storage policy for all files under the given directory.
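The recursive iteration being proposed can be sketched generically (this is not the SPS implementation): walk the whole subtree iteratively, so files down to the leaves get scheduled instead of only the first level.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Iterative depth-first walk over a toy directory tree; an explicit stack
// avoids recursion depth limits on deep namespaces.
class DirWalkSketch {
  static class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    Node(String name) { this.name = name; }
    boolean isDir() { return !children.isEmpty(); } // simplification: empty dirs look like files
  }

  public static List<String> collectFiles(Node root) {
    List<String> files = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node n = stack.pop();
      if (n.isDir()) {
        n.children.forEach(stack::push);  // descend instead of stopping at one level
      } else {
        files.add(n.name);                // here SPS would schedule a satisfy task
      }
    }
    return files;
  }

  public static void main(String[] args) {
    Node root = new Node("/");
    Node dir = new Node("/d");
    dir.children.add(new Node("/d/a"));
    root.children.add(dir);
    root.children.add(new Node("/b"));
    System.out.println(collectFiles(root));
  }
}
```

A real implementation would additionally batch and checkpoint its position, since the namespace can mutate (renames, deletes) while the walk is in progress.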
[jira] [Created] (HDFS-12228) [SPS]: Add storage policy satisfier related metrics
Rakesh R created HDFS-12228:
---

Summary: [SPS]: Add storage policy satisfier related metrics
Key: HDFS-12228
URL: https://issues.apache.org/jira/browse/HDFS-12228
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R

This jira is to discuss and implement the metrics needed for the SPS feature. Below are a few candidate metrics:
# count of {{inprogress}} block movements
# count of {{successful}} block movements
# count of {{failed}} block movements

We need to analyse and add more.
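A hypothetical shape for the three counters above (names assumed; a real implementation would use the Hadoop Metrics2 framework rather than bare AtomicLongs):

```java
import java.util.concurrent.atomic.AtomicLong;

// In-progress moves go up on start and down on finish; each finish lands
// in exactly one of the success/failure counters.
class SpsMetricsSketch {
  public final AtomicLong inProgressBlockMoves = new AtomicLong();
  public final AtomicLong successfulBlockMoves = new AtomicLong();
  public final AtomicLong failedBlockMoves = new AtomicLong();

  public void moveStarted() {
    inProgressBlockMoves.incrementAndGet();
  }

  public void moveFinished(boolean success) {
    inProgressBlockMoves.decrementAndGet();
    (success ? successfulBlockMoves : failedBlockMoves).incrementAndGet();
  }

  public static void main(String[] args) {
    SpsMetricsSketch m = new SpsMetricsSketch();
    m.moveStarted();
    m.moveStarted();
    m.moveFinished(true);
    m.moveFinished(false);
    System.out.println(m.inProgressBlockMoves + " " + m.successfulBlockMoves + " " + m.failedBlockMoves);
  }
}
```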
[jira] [Created] (HDFS-12227) Add throttling to control the number of concurrent moves at the datanode
Rakesh R created HDFS-12227:
---

Summary: Add throttling to control the number of concurrent moves at the datanode
Key: HDFS-12227
URL: https://issues.apache.org/jira/browse/HDFS-12227
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Reporter: Rakesh R
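One plausible shape for such a throttle (everything here is an assumption, since the issue has no description yet; the slot count would presumably come from a new config key): a semaphore bounding concurrent moves, rejecting rather than queueing when all slots are busy.

```java
import java.util.concurrent.Semaphore;

// Bounds how many block moves a datanode executes at once.
class MoveThrottleSketch {
  private final Semaphore moveSlots;

  public MoveThrottleSketch(int maxConcurrentMoves) {
    this.moveSlots = new Semaphore(maxConcurrentMoves);
  }

  /** Returns false instead of blocking when all slots are busy. */
  public boolean tryStartMove() {
    return moveSlots.tryAcquire();
  }

  /** Must be called (e.g. in a finally block) when a move completes. */
  public void finishMove() {
    moveSlots.release();
  }

  public int availableSlots() {
    return moveSlots.availablePermits();
  }

  public static void main(String[] args) {
    MoveThrottleSketch t = new MoveThrottleSketch(2);
    System.out.println(t.tryStartMove() + " " + t.tryStartMove() + " " + t.tryStartMove());
  }
}
```

Rejecting (and letting the scheduler retry later) keeps a flood of move requests from piling up threads on a busy DN, which ties back to the native-thread OOME seen in HDDS-1687.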
[jira] [Created] (HDFS-12226) Follow-on work for Storage Policy Satisfier in Namenode
Rakesh R created HDFS-12226: --- Summary: Follow-on work for Storage Policy Satisfier in Namenode Key: HDFS-12226 URL: https://issues.apache.org/jira/browse/HDFS-12226 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Reporter: Rakesh R This is a follow-up jira of the HDFS-10285 Storage Policy Satisfier feature.
[jira] [Created] (HDFS-12214) Rename configuration property 'dfs.storage.policy.satisfier.activate' to 'dfs.storage.policy.satisfier.enable'
Rakesh R created HDFS-12214: --- Summary: Rename configuration property 'dfs.storage.policy.satisfier.activate' to 'dfs.storage.policy.satisfier.enable' Key: HDFS-12214 URL: https://issues.apache.org/jira/browse/HDFS-12214 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This sub-task is to address [~andrew.wang]'s review comments. Please refer to the [review comment|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16103734&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16103734] in the HDFS-10285 umbrella jira.
[jira] [Created] (HDFS-12152) [SPS]: Re-arrange StoragePolicySatisfyWorker stopping sequence to improve thread cleanup time
Rakesh R created HDFS-12152: --- Summary: [SPS]: Re-arrange StoragePolicySatisfyWorker stopping sequence to improve thread cleanup time Key: HDFS-12152 URL: https://issues.apache.org/jira/browse/HDFS-12152 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This jira is to re-arrange the StoragePolicySatisfyWorker#stop sequence of steps so that thread interruption and graceful shutdown happen quickly. I have observed that the [TestDataNodeUUID#testUUIDRegeneration|https://builds.apache.org/job/PreCommit-HDFS-Build/20271/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeUUID/testUUIDRegeneration/] test case is getting timed out frequently. On analysis, it looks like the function below always consumes the full 3-second waiting period. Probably we could improve the thread interruption sequence so that the thread finishes its #run method quickly.
{code}
// StoragePolicySatisfyWorker.java
void waitToFinishWorkerThread() {
  try {
    movementTrackerThread.join(3000);
  } catch (InterruptedException ignore) {
    // ignore
  }
}
{code}
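The re-arranged stop sequence can be sketched as follows. This is a standalone illustration, not the actual Hadoop class: the class and field names mirror StoragePolicySatisfyWorker for readability only, and the idea is simply to interrupt the tracker thread *before* joining, so the join returns as soon as the thread observes the interrupt instead of burning the full 3-second timeout.

```java
// Hypothetical sketch of the proposed stop sequence (interrupt, then join).
public class WorkerShutdownSketch {
    private final Thread movementTrackerThread = new Thread(() -> {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(100); // simulate periodic movement tracking work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore flag; loop exits
            }
        }
    });

    /** Interrupt first, then wait; the join no longer needs the full timeout. */
    long stopAndMeasureMillis() throws InterruptedException {
        long begin = System.currentTimeMillis();
        movementTrackerThread.interrupt();
        movementTrackerThread.join(3000);
        return System.currentTimeMillis() - begin;
    }

    /** Runs one start/stop cycle and reports how long the join waited. */
    static long demo() {
        try {
            WorkerShutdownSketch w = new WorkerShutdownSketch();
            w.movementTrackerThread.start();
            Thread.sleep(200); // let the worker run a little
            return w.stopAndMeasureMillis();
        } catch (InterruptedException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("join waited ~" + demo() + " ms");
    }
}
```

With the interrupt issued before the join, the shutdown takes roughly one sleep interval rather than the whole join timeout.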
[jira] [Created] (HDFS-12141) [SPS]: Fix checkstyle warnings
Rakesh R created HDFS-12141: --- Summary: [SPS]: Fix checkstyle warnings Key: HDFS-12141 URL: https://issues.apache.org/jira/browse/HDFS-12141 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This sub-task is to fix the applicable checkstyle warnings in the HDFS-10285 branch. Attached the checkstyle report.
[jira] [Resolved] (HDFS-8125) Erasure Coding: Expose refreshECSchemas command to reload predefined schemas
[ https://issues.apache.org/jira/browse/HDFS-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R resolved HDFS-8125. Resolution: Not A Problem Agreed, the way we are plugging in EC policies has changed and it's a hard-coded approach now. This jira is not required and I'm closing it as {{Not a problem}}. > Erasure Coding: Expose refreshECSchemas command to reload predefined schemas > > > Key: HDFS-8125 > URL: https://issues.apache.org/jira/browse/HDFS-8125 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > > This is to expose {{refreshECSchemas}} command to administrators. When > invoking this command it will reload predefined schemas from configuration > file and dynamically update the schema definitions maintained in Namenode. > Note: For more details please refer the > [discussion|https://issues.apache.org/jira/browse/HDFS-7866?focusedCommentId=14489387&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14489387] > with [~drankye]
[jira] [Created] (HDFS-11248) [SPS]: Handle partial block location movements
Rakesh R created HDFS-11248: --- Summary: [SPS]: Handle partial block location movements Key: HDFS-11248 URL: https://issues.apache.org/jira/browse/HDFS-11248 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This jira is to handle partial block location movements due to the unavailability of target nodes for the matching storage type. For example, assume A(disk,archive), B(disk) and C(disk,archive) are the only live nodes, where A & C have the archive storage type. Say we have a block with locations {{A(disk), B(disk), C(disk)}}. Again, assume the user changed the storage policy to COLD. Now, SPS internally starts preparing the src-target pairing like {{src=> (A, B, C) and target=> (A, C)}} and sends BLOCK_STORAGE_MOVEMENT to the coordinator. SPS keeps B in the source list, even though it doesn't have archive media, to indicate that it should retry to satisfy all block locations after some time. On receiving the movement command, the coordinator will pair the src-target nodes to schedule the actual physical movements like {{movetask=> (A, A), (B, C)}}. Here, ideally it should do {{(C, C)}} instead of {{(B, C)}}, but it mistakenly chooses B as the source, which creates the problem. IMHO, the implicit assumption that a retry is needed is creating confusion and leads to coding mistakes. One idea to fix this problem is to create a new {{retryNeeded}} flag to make it more readable. With this, SPS will prepare only the matching pairs and dummy source slots will be avoided, like {{src=> (A, C) and target=> (A, C)}}, and mark {{retryNeeded=true}} to convey that this {{trackId}} has only partial block movements.
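The proposed pairing can be sketched as below. This is a hypothetical, self-contained illustration (plain strings stand in for the real datanode/storage objects, and the class and method names are invented): only sources with a matching archive-capable target are paired, and anything left unmatched flips {{retryNeeded}} instead of producing a dummy source slot.

```java
import java.util.*;

// Hypothetical sketch of pairing with a retryNeeded flag instead of dummy
// source slots; names are illustrative, not the real SPS code.
public class PartialPairingSketch {
    static class Plan {
        final List<String> src = new ArrayList<>();
        final List<String> target = new ArrayList<>();
        boolean retryNeeded;
    }

    // Pair only sources whose node also has ARCHIVE media; unmatched
    // locations set retryNeeded so the trackId is retried later.
    static Plan pair(List<String> blockLocations, Set<String> archiveNodes) {
        Plan plan = new Plan();
        for (String node : blockLocations) {
            if (archiveNodes.contains(node)) {
                plan.src.add(node);
                plan.target.add(node); // DISK -> ARCHIVE move on the same node
            } else {
                plan.retryNeeded = true; // e.g. B has no archive media
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Plan p = pair(Arrays.asList("A", "B", "C"),
                      new HashSet<>(Arrays.asList("A", "C")));
        System.out.println("src=" + p.src + " target=" + p.target
                + " retryNeeded=" + p.retryNeeded);
    }
}
```

For the example in the description, this yields src=[A, C], target=[A, C] with retryNeeded=true, so the coordinator can never mispair (B, C).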
[jira] [Resolved] (HDFS-7955) Improve naming of classes, methods, and variables related to block replication and recovery
[ https://issues.apache.org/jira/browse/HDFS-7955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R resolved HDFS-7955. Resolution: Fixed Target Version/s: 3.0.0-alpha2 Thank you [~zhz], [~andrew.wang], [~szetszwo], [~umamaheswararao], [~drankye] and all others for the great support! This umbrella jira has covered most of the desired work. I'm closing this jira. IMHO, if needed we could create sub-task(s) under HDFS-8031 erasure coding follow-on umbrella jira and work on. Thanks! > Improve naming of classes, methods, and variables related to block > replication and recovery > --- > > Key: HDFS-7955 > URL: https://issues.apache.org/jira/browse/HDFS-7955 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: Zhe Zhang >Assignee: Rakesh R > Attachments: HDFS-7955-001.patch, HDFS-7955-002.patch, > HDFS-7955-003.patch, HDFS-7955-004.patch, HDFS-7955-5.patch > > > Many existing names should be revised to avoid confusion when blocks can be > both replicated and erasure coded. This JIRA aims to solicit opinions on > making those names more consistent and intuitive. > # In current HDFS _block recovery_ refers to the process of finalizing the > last block of a file, triggered by _lease recovery_. It is different from the > intuitive meaning of _recovering a lost block_. To avoid confusion, I can > think of 2 options: > #* Rename this process as _block finalization_ or _block completion_. I > prefer this option because this is literally not a recovery. > #* If we want to keep existing terms unchanged we can name all EC recovery > and re-replication logics as _reconstruction_. > # As Kai [suggested | > https://issues.apache.org/jira/browse/HDFS-7369?focusedCommentId=14361131&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14361131] > under HDFS-7369, several replication-based names should be made more generic: > #* {{UnderReplicatedBlocks}} and {{neededReplications}}. E.g. 
we can use > {{LowRedundancyBlocks}}/{{AtRiskBlocks}}, and > {{neededRecovery}}/{{neededReconstruction}}. > #* {{PendingReplicationBlocks}} > #* {{ReplicationMonitor}} > I'm sure the above list is incomplete; discussions and comments are very > welcome.
[jira] [Created] (HDFS-11193) [SPS]: Erasure coded files should be considered for satisfying storage policy
Rakesh R created HDFS-11193: --- Summary: [SPS]: Erasure coded files should be considered for satisfying storage policy Key: HDFS-11193 URL: https://issues.apache.org/jira/browse/HDFS-11193 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Rakesh R Assignee: Rakesh R Erasure coded striped files support the storage policies {{HOT, COLD, ALLSSD}}. An {{HdfsAdmin#satisfyStoragePolicy}} API call on a directory should consider all immediate files under that directory and check whether the files really match the namespace storage policy. All the mismatched striped blocks should be chosen for block movement.
[jira] [Created] (HDFS-11164) Mover should avoid unnecessary retries if the block is pinned
Rakesh R created HDFS-11164: --- Summary: Mover should avoid unnecessary retries if the block is pinned Key: HDFS-11164 URL: https://issues.apache.org/jira/browse/HDFS-11164 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover Reporter: Rakesh R Assignee: Rakesh R When the mover tries to move a pinned block to another datanode, it internally hits the following IOException and marks the block movement as a {{failure}}. Since the Mover has the {{dfs.mover.retry.max.attempts}} config, it will keep trying to move this block until it reaches {{retryMaxAttempts}}. This retry is unnecessary, and it would be good to avoid the retry attempts, as a pinned block won't be able to move.
{code}
2016-11-22 10:56:10,537 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: Failed to move blk_1073741825_1001 with size=52 from 127.0.0.1:19501:DISK to 127.0.0.1:19758:ARCHIVE through 127.0.0.1:19501
java.io.IOException: Got error, status=ERROR, status message opReplaceBlock BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 received exception java.io.IOException: Got error, status=ERROR, status message Not able to copy block 1073741825 to /127.0.0.1:19826 because it's pinned , copy block BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 from /127.0.0.1:19501, reportedBlock move is failed
	at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:118)
	at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:417)
	at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:358)
	at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$5(Dispatcher.java:322)
	at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:1075)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}
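The proposed behaviour can be sketched like this. It is a standalone illustration with invented names, not the real Dispatcher code: a "pinned" failure is detected from the datanode's status message (matching on the phrase seen in the log above is an illustrative shortcut), and the retry loop bails out immediately instead of exhausting {{retryMaxAttempts}}.

```java
// Hypothetical sketch: stop retrying a move once the failure is "pinned".
public class PinnedBlockRetrySketch {
    static final int RETRY_MAX_ATTEMPTS = 10; // stand-in for dfs.mover.retry.max.attempts

    // The datanode's error status for a pinned block contains this phrase.
    static boolean isBlockPinnedError(String statusMessage) {
        return statusMessage != null && statusMessage.contains("it's pinned");
    }

    /** Returns how many move attempts are made before giving up. */
    static int attemptMove(String statusMessage) {
        for (int attempt = 1; attempt <= RETRY_MAX_ATTEMPTS; attempt++) {
            // In this simulation every attempt fails with the same status.
            if (isBlockPinnedError(statusMessage)) {
                return attempt; // pinned: further retries can never succeed
            }
        }
        return RETRY_MAX_ATTEMPTS; // transient errors still use the full budget
    }

    public static void main(String[] args) {
        // Pinned failure: gives up after 1 attempt instead of 10.
        System.out.println(attemptMove(
            "Not able to copy block 1073741825 because it's pinned"));
        // Transient failure: exhausts the retry budget as before.
        System.out.println(attemptMove("connection reset"));
    }
}
```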
[jira] [Created] (HDFS-11151) [SPS]: Handle unable to choose target node for the required storage type by StoragePolicySatisfier
Rakesh R created HDFS-11151: --- Summary: [SPS]: Handle unable to choose target node for the required storage type by StoragePolicySatisfier Key: HDFS-11151 URL: https://issues.apache.org/jira/browse/HDFS-11151 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Presently SPS is not handling the case where it fails to choose a target node for the required storage type. In general, there are two cases: # For the given path, unable to find any target node for any of its blocks or block locations (src nodes). That means no block movement will be scheduled against this path. # For the given path, there are a few target nodes available for a few block locations (source nodes). That means some of the blocks or block locations (src nodes) under the given path will be scheduled for block movement.
[jira] [Created] (HDFS-11125) [SPS]: Use smaller batches of BlockMovingInfo into the block storage movement command
Rakesh R created HDFS-11125: --- Summary: [SPS]: Use smaller batches of BlockMovingInfo into the block storage movement command Key: HDFS-11125 URL: https://issues.apache.org/jira/browse/HDFS-11125 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This is a follow-up task of HDFS-11068, which sends all the blocks under a trackID over a single heartbeat response (DNA_BLOCK_STORAGE_MOVEMENT command). If there are many blocks under a given trackID (for example, a file containing many blocks), then those requests go across the network and come with a lot of overhead. In this jira, we will discuss and implement a mechanism to limit the list of items into smaller batches within a trackID.
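The batching idea can be sketched generically. This is a hypothetical standalone snippet (the class name, generic item type, and batch size are illustrative stand-ins for BlockMovingInfo and whatever limit the implementation picks): one trackID's item list is cut into fixed-size slices that can be sent over separate heartbeat responses.

```java
import java.util.*;

// Hypothetical sketch: split one trackID's items into bounded batches.
public class BatchingSketch {
    /** Cut the item list into batches of at most batchSize elements. */
    static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // copy the slice so each batch is independent of the source list
            batches.add(new ArrayList<>(
                items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> blockIds = new ArrayList<>();
        for (int i = 0; i < 10; i++) blockIds.add(i);
        // 10 items with batchSize 4 -> 3 batches: sizes 4, 4, 2
        System.out.println(toBatches(blockIds, 4));
    }
}
```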
[jira] [Created] (HDFS-11113) Document dfs.client.read.striped configuration in hdfs-default.xml
Rakesh R created HDFS-11113: --- Summary: Document dfs.client.read.striped configuration in hdfs-default.xml Key: HDFS-11113 URL: https://issues.apache.org/jira/browse/HDFS-11113 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation, hdfs-client Reporter: Rakesh R Assignee: Rakesh R Priority: Minor {{dfs.client.read.striped.threadpool.size}} should be covered in hdfs-default.xml.
[jira] [Created] (HDFS-11082) Erasure Coding : Provide replicated EC policy to just replicating the files
Rakesh R created HDFS-11082: --- Summary: Erasure Coding : Provide replicated EC policy to just replicating the files Key: HDFS-11082 URL: https://issues.apache.org/jira/browse/HDFS-11082 Project: Hadoop HDFS Issue Type: Sub-task Components: erasure-coding Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to provide a new {{replicated EC policy}} so that we can override the EC policy on a parent directory and go back to just replicating the files based on replication factors. Thanks [~andrew.wang] for the [discussions|https://issues.apache.org/jira/browse/HDFS-11072?focusedCommentId=15620743&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15620743].
[jira] [Created] (HDFS-11068) [SPS]: Provide unique trackID to track the block movement sends to coordinator
Rakesh R created HDFS-11068: --- Summary: [SPS]: Provide unique trackID to track the block movement sends to coordinator Key: HDFS-11068 URL: https://issues.apache.org/jira/browse/HDFS-11068 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Presently DatanodeManager uses constant value -1 as [trackID|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1607], which is a temporary value. As per discussion with [~umamaheswararao], one proposal is to use {{BlockCollectionId/InodeFileId}}.
[jira] [Created] (HDFS-11032) [SPS]: Handling of block movement failure at the coordinator datanode
Rakesh R created HDFS-11032: --- Summary: [SPS]: Handling of block movement failure at the coordinator datanode Key: HDFS-11032 URL: https://issues.apache.org/jira/browse/HDFS-11032 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to discuss and implement efficient failure (block movement failure) handling logic at the datanode coordinator. [Code reference|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L243]. Following are the possible errors during block movement: # Network errors (IOException) - provide retries (maybe hard-coded at 2 retries) if the block storage movement failed due to network errors. If it still ends up with errors after 2 retries, then mark it as failure/retry to the NN. # No disk space (IOException) - no retries; marked as failure/retry to the NN. # Block pinned - no retries; marked as success/no-retry to the NN. It is not possible to relocate this block to another datanode. # Gen_Stamp mismatches - no retries; marked as failure/retry to the NN. Could be a case where the file has been re-opened.
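The per-error policy listed above can be sketched as a small decision table. This is a hypothetical standalone encoding (the enum and method names are invented for illustration): only network errors get local retries at the coordinator, and only a pinned block is reported back in a way that stops the NN from rescheduling it.

```java
// Hypothetical sketch of the error-handling policy from the description.
public class MovementFailurePolicySketch {
    enum ErrorKind { NETWORK, NO_DISK_SPACE, BLOCK_PINNED, GEN_STAMP_MISMATCH }
    enum Report { FAILURE_RETRY, SUCCESS_NO_RETRY }

    /** Only network errors get local retries (hard-coded 2, per the jira). */
    static int localRetries(ErrorKind kind) {
        return kind == ErrorKind.NETWORK ? 2 : 0;
    }

    /** A pinned block can never be relocated, so the NN must not reschedule it. */
    static Report reportToNamenode(ErrorKind kind) {
        return kind == ErrorKind.BLOCK_PINNED
                ? Report.SUCCESS_NO_RETRY : Report.FAILURE_RETRY;
    }

    public static void main(String[] args) {
        for (ErrorKind k : ErrorKind.values()) {
            System.out.println(k + ": localRetries=" + localRetries(k)
                    + ", report=" + reportToNamenode(k));
        }
    }
}
```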
[jira] [Resolved] (HDFS-8331) Erasure Coding: Create FileStatus isErasureCoded() method
[ https://issues.apache.org/jira/browse/HDFS-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R resolved HDFS-8331. Resolution: Duplicate > Erasure Coding: Create FileStatus isErasureCoded() method > - > > Key: HDFS-8331 > URL: https://issues.apache.org/jira/browse/HDFS-8331 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > > The idea of this jira is to discuss the need of > {{FileStatus#isErasureCoded()}} API. This is just an initial thought, > presently the use case/necessity of this is not clear now. Probably will > revisit this once the feature is getting matured. > Thanks [~umamaheswararao], [~vinayrpet] , [~zhz] for the offline discussions.
[jira] [Created] (HDFS-10954) [SPS]: Report the failed block movement results back to NN from DN
Rakesh R created HDFS-10954: --- Summary: [SPS]: Report the failed block movement results back to NN from DN Key: HDFS-10954 URL: https://issues.apache.org/jira/browse/HDFS-10954 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This jira is a follow-up task of HDFS-10884. The HDFS-10884 jira provides a mechanism to collect all the failed block movement results at the {{co-ordinator datanode}} side. Now, the idea of this jira is to discuss an efficient way to report these failed block movement results to the namenode, so that the NN can take the necessary action based on this information.
[jira] [Created] (HDFS-10920) TestStorageMover#testNoSpaceDisk is failing intermittently
Rakesh R created HDFS-10920: --- Summary: TestStorageMover#testNoSpaceDisk is failing intermittently Key: HDFS-10920 URL: https://issues.apache.org/jira/browse/HDFS-10920 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Rakesh R Assignee: Rakesh R TestStorageMover#testNoSpaceDisk test case is failing frequently in the build. References: [HDFS-Build_16890|https://builds.apache.org/job/PreCommit-HDFS-Build/16890], [HDFS-Build_16895|https://builds.apache.org/job/PreCommit-HDFS-Build/16895]
[jira] [Created] (HDFS-10884) [SPS]: Add block movement tracker to track the completion of block movement future tasks at DN
Rakesh R created HDFS-10884: --- Summary: [SPS]: Add block movement tracker to track the completion of block movement future tasks at DN Key: HDFS-10884 URL: https://issues.apache.org/jira/browse/HDFS-10884 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Presently the [StoragePolicySatisfyWorker#processBlockMovingTasks()|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L147] function acts as a blocking call. The idea of this jira is to implement a mechanism to track these movements asynchronously, which would allow other movements while processing the previous one.
[jira] [Created] (HDFS-10794) Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work
Rakesh R created HDFS-10794: --- Summary: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work Key: HDFS-10794 URL: https://issues.apache.org/jira/browse/HDFS-10794 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to implement a mechanism to move the blocks to the given target in order to satisfy the block storage policy. The datanode receives the {{blocktomove}} details via the heartbeat response from the NN. More specifically, it's a datanode-side extension to handle the block storage movement commands.
[jira] [Created] (HDFS-10720) Fix intermittent test failure of TestDataNodeErasureCodingMetrics#testEcTasks
Rakesh R created HDFS-10720: --- Summary: Fix intermittent test failure of TestDataNodeErasureCodingMetrics#testEcTasks Key: HDFS-10720 URL: https://issues.apache.org/jira/browse/HDFS-10720 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R The test is wrongly finding the datanode to corrupt. Instead of finding a datanode which is used in the block locations, it simply gets a datanode from the cluster, which may not be a datanode present in the block locations.
{code}
byte[] indices = lastBlock.getBlockIndices();
// corrupt the first block
DataNode toCorruptDn = cluster.getDataNodes().get(indices[0]);
{code}
For example, the datanodes in the cluster.getDataNodes() array are indexed like 0->Dn1, 1->Dn2, 2->Dn3, 3->Dn4, 4->Dn5, 5->Dn6, 6->Dn7, 7->Dn8, 8->Dn9, 9->Dn10. Assume the datanodes which are part of the block locations are => Dn2, Dn3, Dn4, Dn5, Dn6, Dn7, Dn8, Dn9, Dn10. Now, in the failed scenario, it gets the datanode to corrupt as cluster.getDataNodes().get(0), which will be Dn1, and corrupting this datanode will not result in ECWork, failing the test. Ideally, the test should find a datanode from the block locations and corrupt it; that will trigger ECWork.
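The intended fix can be sketched with plain strings standing in for the MiniDFSCluster/DataNode objects (all names here are illustrative, not the real test code): for striped blocks, indices[j] is the block index stored at the j-th block location, so to corrupt a given block index the test should resolve it against the block's own location list, never against the full cluster datanode list.

```java
import java.util.*;

// Hypothetical sketch: resolve a striped block index against the block's
// location list rather than the cluster-wide datanode list.
public class CorruptDnSelectionSketch {
    /**
     * indices[j] is the block index stored at locations.get(j); return the
     * datanode holding block index `wanted`.
     */
    static String dnHoldingBlockIndex(List<String> locations, byte[] indices,
                                      int wanted) {
        for (int j = 0; j < indices.length; j++) {
            if (indices[j] == wanted) {
                return locations.get(j);
            }
        }
        throw new IllegalStateException("no location stores block index " + wanted);
    }

    public static void main(String[] args) {
        // The cluster has Dn1..Dn10 but this block lives only on Dn2..Dn10.
        List<String> locations = Arrays.asList(
            "Dn2", "Dn3", "Dn4", "Dn5", "Dn6", "Dn7", "Dn8", "Dn9", "Dn10");
        byte[] indices = {0, 1, 2, 3, 4, 5, 6, 7, 8};
        // Block index 0 resolves to Dn2 (a real location), never Dn1.
        System.out.println(dnHoldingBlockIndex(locations, indices, 0));
    }
}
```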
[jira] [Created] (HDFS-10660) Expose storage policy apis via HDFSAdmin interface
Rakesh R created HDFS-10660: --- Summary: Expose storage policy apis via HDFSAdmin interface Key: HDFS-10660 URL: https://issues.apache.org/jira/browse/HDFS-10660 Project: Hadoop HDFS Issue Type: Improvement Reporter: Rakesh R Assignee: Rakesh R Presently, the {{org.apache.hadoop.hdfs.client.HdfsAdmin.java}} interface has only the {{#setStoragePolicy()}} API exposed. This jira is to add the following set of APIs to HdfsAdmin.
{code}
HdfsAdmin#unsetStoragePolicy
HdfsAdmin#getStoragePolicy
HdfsAdmin#getAllStoragePolicies
{code}
Thanks [~arpitagarwal] for the offline discussions.
[jira] [Created] (HDFS-10592) Fix intermittent test failure of TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning
Rakesh R created HDFS-10592: --- Summary: Fix intermittent test failure of TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning Key: HDFS-10592 URL: https://issues.apache.org/jira/browse/HDFS-10592 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R This jira is to fix the {{TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning}} test case failure. Reference [Build_15973|https://builds.apache.org/job/PreCommit-HDFS-Build/15973/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestNameNodeResourceChecker/testCheckThatNameNodeResourceMonitorIsRunning/]
[jira] [Created] (HDFS-10590) Fix TestReconstructStripedBlocks.testCountLiveReplicas test failures
Rakesh R created HDFS-10590: --- Summary: Fix TestReconstructStripedBlocks.testCountLiveReplicas test failures Key: HDFS-10590 URL: https://issues.apache.org/jira/browse/HDFS-10590 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R This jira is to fix the test case failure. Please see the below stacktrace. Reference : [Build_15968|https://builds.apache.org/job/PreCommit-HDFS-Build/15968/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testCountLiveReplicas/]
{code}
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertTrue(Assert.java:52)
	at org.apache.hadoop.hdfs.server.namenode.TestReconstructStripedBlocks.testCountLiveReplicas(TestReconstructStripedBlocks.java:324)
{code}
[jira] [Created] (HDFS-10584) Allow long-running Mover tool to login with keytab
Rakesh R created HDFS-10584: --- Summary: Allow long-running Mover tool to login with keytab Key: HDFS-10584 URL: https://issues.apache.org/jira/browse/HDFS-10584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer & mover Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to give the {{mover}} tool the ability to log in from a keytab. That way, the RPC client would re-login from the keytab after expiration, which means the process could remain authenticated indefinitely. With some people wanting to run the mover non-stop in "daemon mode", that might be a reasonable feature to add. Recently the balancer has been enhanced with this feature. Thanks [~zhz] for the offline discussions.
[jira] [Created] (HDFS-10461) Erasure Coding: Optimize block checksum recalculation logic on the fly by reconstructing multiple missed blocks at a time
Rakesh R created HDFS-10461: --- Summary: Erasure Coding: Optimize block checksum recalculation logic on the fly by reconstructing multiple missed blocks at a time Key: HDFS-10461 URL: https://issues.apache.org/jira/browse/HDFS-10461 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This is an HDFS-9833 follow-on task. HDFS-9833 recomputes only one block checksum at a time. The reconstruction logic can be further optimized by reconstructing multiple blocks at a time. There are several cases to be considered, like: case-1) Live block indices: {{0, 4, 5, 6, 7, 8}} - consecutive missing data blocks 1, 2, 3 case-2) Live block indices: {{0, 2, 4, 6, 7, 8}} - jumbled missing data blocks 1, 3, 5
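The two cases above can be made concrete by computing the missing data-block indices from the live set. This is a hypothetical standalone helper (names invented for illustration), assuming an RS-6-3 layout where indices 0..5 are data blocks and 6..8 are parity.

```java
import java.util.*;

// Hypothetical sketch: derive missing data-block indices from live indices.
public class MissingBlockIndicesSketch {
    /** Data-block indices (0..numDataBlocks-1) absent from the live set. */
    static List<Integer> missingDataBlocks(int[] liveIndices, int numDataBlocks) {
        Set<Integer> live = new HashSet<>();
        for (int i : liveIndices) live.add(i);
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < numDataBlocks; i++) {
            if (!live.contains(i)) missing.add(i);
        }
        return missing;
    }

    public static void main(String[] args) {
        int numDataBlocks = 6; // RS-6-3: 6 data blocks, 3 parity blocks
        // case-1: consecutive gap -> [1, 2, 3]
        System.out.println(missingDataBlocks(new int[]{0, 4, 5, 6, 7, 8}, numDataBlocks));
        // case-2: jumbled gap -> [1, 3, 5]
        System.out.println(missingDataBlocks(new int[]{0, 2, 4, 6, 7, 8}, numDataBlocks));
    }
}
```

A multi-block reconstruction pass would feed this whole list to the decoder at once instead of reconstructing one index per pass.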
[jira] [Created] (HDFS-10460) Erasure Coding: Recompute block checksum for a particular range less than file size on the fly by reconstructing missed block
Rakesh R created HDFS-10460: --- Summary: Erasure Coding: Recompute block checksum for a particular range less than file size on the fly by reconstructing missed block Key: HDFS-10460 URL: https://issues.apache.org/jira/browse/HDFS-10460 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Rakesh R Assignee: Rakesh R This jira is an HDFS-9833 follow-on task to address reconstructing a block and then recalculating the block checksum for a particular range query. For example,
{code}
// create a file 'stripedFile1' with fileSize = cellSize * numDataBlocks = 65536 * 6 = 393216
FileChecksum stripedFileChecksum = getFileChecksum(stripedFile1, 10, true);
{code}
[jira] [Created] (HDFS-10434) Fix intermittent test failure of TestDataNodeErasureCodingMetrics
Rakesh R created HDFS-10434: --- Summary: Fix intermittent test failure of TestDataNodeErasureCodingMetrics Key: HDFS-10434 URL: https://issues.apache.org/jira/browse/HDFS-10434 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This jira is to fix the test case failure. Reference : [Build15485_TestDataNodeErasureCodingMetrics_testEcTasks|https://builds.apache.org/job/PreCommit-HDFS-Build/15485/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeErasureCodingMetrics/testEcTasks/]
{code}
Error Message
Bad value for metric EcReconstructionTasks expected:<1> but was:<0>

Stacktrace
java.lang.AssertionError: Bad value for metric EcReconstructionTasks expected:<1> but was:<0>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:228)
	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics.testEcTasks(TestDataNodeErasureCodingMetrics.java:92)
{code}
[jira] [Created] (HDFS-10407) Erasure Coding: Rename CorruptReplicasMap to CorruptRedundancyMap in BlockManager to more generic
Rakesh R created HDFS-10407: --- Summary: Erasure Coding: Rename CorruptReplicasMap to CorruptRedundancyMap in BlockManager to more generic Key: HDFS-10407 URL: https://issues.apache.org/jira/browse/HDFS-10407 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to rename the following entity in BlockManager: - {{CorruptReplicasMap}} to {{CorruptRedundancyMap}}
[jira] [Created] (HDFS-10368) Erasure Coding: Deprecate replication-related config keys
Rakesh R created HDFS-10368: --- Summary: Erasure Coding: Deprecate replication-related config keys Key: HDFS-10368 URL: https://issues.apache.org/jira/browse/HDFS-10368 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This jira is to visit the replication-based config keys and deprecate them. Please refer to the [discussion thread|https://issues.apache.org/jira/browse/HDFS-9869?focusedCommentId=15249363&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15249363]
[jira] [Created] (HDFS-10308) TestRetryCacheWithHA#testRetryCacheOnStandbyNN failing
Rakesh R created HDFS-10308: --- Summary: TestRetryCacheWithHA#testRetryCacheOnStandbyNN failing Key: HDFS-10308 URL: https://issues.apache.org/jira/browse/HDFS-10308 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Rakesh R Assignee: Rakesh R It is failing with the following exception: {code} java.lang.AssertionError: expected:<25> but was:<26> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testRetryCacheOnStandbyNN(TestRetryCacheWithHA.java:169) {code}
[jira] [Created] (HDFS-10236) Erasure Coding: Rename replication-based names in BlockManager to more generic [part-3]
Rakesh R created HDFS-10236: --- Summary: Erasure Coding: Rename replication-based names in BlockManager to more generic [part-3] Key: HDFS-10236 URL: https://issues.apache.org/jira/browse/HDFS-10236 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to rename the following entity in BlockManager: - {{getExpectedReplicaNum}} to {{getExpectedRedundancyNum}}
[jira] [Created] (HDFS-10186) DirectoryScanner: Improve logs by adding full path of both actual and expected block directories
Rakesh R created HDFS-10186: --- Summary: DirectoryScanner: Improve logs by adding full path of both actual and expected block directories Key: HDFS-10186 URL: https://issues.apache.org/jira/browse/HDFS-10186 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Priority: Minor As per the [discussion|https://issues.apache.org/jira/browse/HDFS-7648?focusedCommentId=15195908&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15195908], this jira is to improve the directory scanner log by adding the full paths of both the actual and expected block directories so that admins can take necessary actions.
[jira] [Created] (HDFS-9918) Erasure Coding : sort located striped blocks based on decommissioned states
Rakesh R created HDFS-9918: -- Summary: Erasure Coding : sort located striped blocks based on decommissioned states Key: HDFS-9918 URL: https://issues.apache.org/jira/browse/HDFS-9918 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This jira is a follow-on work of HDFS-8786, where we do decommissioning of datanodes having striped blocks. After decommissioning, the ordering of the storage list needs to change so that decommissioned datanodes appear last in the list. For example, assume we have a block group with storage list:- d0, d1, d2, d3, d4, d5, d6, d7, d8, d9 mapping to indices 0, 1, 2, 3, 4, 5, 6, 7, 8, 2 Here the internal block b2 is duplicated, located in d2 and d9. If d2 is a decommissioning node, then d2 and d9 should be switched in the storage list. Thanks [~jingzhao] for the [discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415]
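The switching described above can be sketched as a stable sort that moves decommissioning storages to the end of the list. The `Node` class and `decommissioning` flag below are illustrative stand-ins for the NameNode's `DatanodeInfo` state, not the actual types:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class StripedBlockSort {
    // Minimal stand-in for a datanode storage entry; the real code
    // operates on DatanodeInfo arrays inside the NameNode.
    static class Node {
        final String name;
        final boolean decommissioning;
        Node(String name, boolean decommissioning) {
            this.name = name;
            this.decommissioning = decommissioning;
        }
    }

    // Stable sort: live nodes keep their relative order, while
    // decommissioning nodes move to the end of the storage list.
    static List<Node> sortDecommissionedLast(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparing((Node n) -> n.decommissioning));
        return sorted;
    }

    public static void main(String[] args) {
        // d2 and d9 both hold internal block b2; d2 is decommissioning,
        // so the reader should prefer d9.
        List<Node> storages = Arrays.asList(
            new Node("d2", true), new Node("d9", false));
        List<Node> sorted = sortDecommissionedLast(storages);
        System.out.println(sorted.get(0).name + " " + sorted.get(1).name); // d9 d2
    }
}
```

Because the sort is stable, nodes that are not decommissioning retain their original relative order, which is what a reader-preference reordering wants.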
[jira] [Created] (HDFS-9879) Erasure Coding : schedule striped blocks to be cached on DataNodes
Rakesh R created HDFS-9879: -- Summary: Erasure Coding : schedule striped blocks to be cached on DataNodes Key: HDFS-9879 URL: https://issues.apache.org/jira/browse/HDFS-9879 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This jira is to discuss and implement the caching of striped block objects on the appropriate datanode. Presently it checks the block group size and schedules the blockGroupId to the datanode; this needs to be refined by checking {{StripedBlockUtil.getInternalBlockLength()}} and scheduling the proper blockId to the datanode. {code} CacheReplicationMonitor.java if (pendingCapacity < blockInfo.getNumBytes()) { LOG.trace("Block {}: DataNode {} is not a valid possibility " + "because the block has size {}, but the DataNode only has {} " + "bytes of cache remaining ({} pending bytes, {} already cached.)", blockInfo.getBlockId(), datanode.getDatanodeUuid(), blockInfo.getNumBytes(), pendingCapacity, pendingBytes, datanode.getCacheRemaining()); outOfCapacity++; continue; } for (DatanodeDescriptor datanode : chosen) { LOG.trace("Block {}: added to PENDING_CACHED on DataNode {}", blockInfo.getBlockId(), datanode.getDatanodeUuid()); pendingCached.add(datanode); boolean added = datanode.getPendingCached().add(cachedBlock); assert added; } {code}
[jira] [Created] (HDFS-9869) Erasure Coding: Rename replication-based names in BlockManager to more generic [part-2]
Rakesh R created HDFS-9869: -- Summary: Erasure Coding: Rename replication-based names in BlockManager to more generic [part-2] Key: HDFS-9869 URL: https://issues.apache.org/jira/browse/HDFS-9869 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to rename the following entities in BlockManager as, - {{PendingReplicationBlocks}} to {{PendingReconstructionBlocks}} - {{excessReplicateMap}} to {{extraRedundancyMap}}
[jira] [Created] (HDFS-9857) Erasure Coding: Rename replication-based names in BlockManager to more generic
Rakesh R created HDFS-9857: -- Summary: Erasure Coding: Rename replication-based names in BlockManager to more generic Key: HDFS-9857 URL: https://issues.apache.org/jira/browse/HDFS-9857 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to rename the following entities in BlockManager as, - {{UnderReplicatedBlocks}} to {{LowRedundancyBlocks}} - {{PendingReplicationBlocks}} to {{PendingReconstructionBlocks}} - {{neededReplications}} to {{neededReconstruction}} - {{excessReplicateMap}} to {{extraRedundancyMap}} Thanks [~zhz], [~andrew.wang] for the useful [discussions|https://issues.apache.org/jira/browse/HDFS-7955?focusedCommentId=15149406&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15149406]
[jira] [Created] (HDFS-9829) Erasure Coding: Improve few exception handling logic of ErasureCodingWorker
Rakesh R created HDFS-9829: -- Summary: Erasure Coding: Improve few exception handling logic of ErasureCodingWorker Key: HDFS-9829 URL: https://issues.apache.org/jira/browse/HDFS-9829 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Priority: Minor # Cancel remaining reads on InterruptedException. {code} } catch (InterruptedException e) { LOG.info("Read data interrupted.", e); break; } {code} # Shouldn't fail reconstruction due to IOException errors while reporting corrupt blocks. {code} } finally { // report corrupted blocks to NN reportCorruptedBlocks(corruptionMap); } {code} # Also, use {} instead of string concatenation in the logger. {code} LOG.debug("Using striped reads; pool threads=" + num); //... LOG.warn("Found Checksum error for " + reader.block + " from " + reader.source + " at " + e.getPos()); //... LOG.debug("Using striped block reconstruction; pool threads=" + num); //.. LOG.warn("Failed to reconstruct striped block: " + blockGroup, e); {code}
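For the third item, parameterized logging matters because the `{}` placeholders defer message construction until the logger decides the level is enabled, whereas string concatenation always builds the message eagerly. The helper below is only an illustrative sketch of the substitution that slf4j-style loggers perform, not the actual logger code:

```java
public class LogParam {
    // Mimics slf4j-style "{}" placeholder substitution. The real fix is
    // simply to pass arguments, e.g. LOG.debug("...{}...", num), instead
    // of concatenating strings before the level check.
    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = pattern.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(pattern, from, at).append(args[argIdx++]);
            from = at + 2; // skip past the "{}" placeholder
        }
        return sb.append(pattern.substring(from)).toString();
    }

    public static void main(String[] args) {
        System.out.println(format("Using striped reads; pool threads={}", 8));
        // prints: Using striped reads; pool threads=8
    }
}
```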
[jira] [Created] (HDFS-9775) Erasure Coding : Rename BlockRecoveryWork to BlockReconstructionWork
Rakesh R created HDFS-9775: -- Summary: Erasure Coding : Rename BlockRecoveryWork to BlockReconstructionWork Key: HDFS-9775 URL: https://issues.apache.org/jira/browse/HDFS-9775 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Rakesh R Assignee: Rakesh R This sub-task is to visit the block recovery work and recast the logic as reconstruction, i.e., rename "recovery" to "reconstruction".
[jira] [Created] (HDFS-9731) Erasure Coding: Improve naming of classes, methods, and variables related to EC recovery
Rakesh R created HDFS-9731: -- Summary: Erasure Coding: Improve naming of classes, methods, and variables related to EC recovery Key: HDFS-9731 URL: https://issues.apache.org/jira/browse/HDFS-9731 Project: Hadoop HDFS Issue Type: Sub-task Components: erasure-coding Reporter: Rakesh R Assignee: Rakesh R This sub-task is to visit the EC recovery logic and recast it as _reconstruction_, i.e., rename EC-related block repair logic to "reconstruction".
[jira] [Created] (HDFS-9472) concat() API does not resolve the .reserved path
Rakesh R created HDFS-9472: -- Summary: concat() API does not resolve the .reserved path Key: HDFS-9472 URL: https://issues.apache.org/jira/browse/HDFS-9472 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R The dfs#concat() API doesn't resolve the {{/.reserved/raw}} path. For example, if an input path is of the form {{/.reserved/raw/ezone/a}}, then this API doesn't work properly. IMHO, we can discuss here how to support this behavior.
[jira] [Created] (HDFS-9435) TestBlockRecovery#testRBWReplicas is failing intermittently
Rakesh R created HDFS-9435: -- Summary: TestBlockRecovery#testRBWReplicas is failing intermittently Key: HDFS-9435 URL: https://issues.apache.org/jira/browse/HDFS-9435 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R TestBlockRecovery#testRBWReplicas is failing in the [build 13536|https://builds.apache.org/job/PreCommit-HDFS-Build/13536/testReport/org.apache.hadoop.hdfs.server.datanode/TestBlockRecovery/testRBWReplicas/]. It looks like a bug in the test due to a race condition. Note: Attached logs taken from the build to this jira.
[jira] [Created] (HDFS-9433) DFS getEZForPath API on a non-existent file should throw FileNotFoundException
Rakesh R created HDFS-9433: -- Summary: DFS getEZForPath API on a non-existent file should throw FileNotFoundException Key: HDFS-9433 URL: https://issues.apache.org/jira/browse/HDFS-9433 Project: Hadoop HDFS Issue Type: Sub-task Components: encryption Reporter: Rakesh R Assignee: Rakesh R Presently the {{dfs.getEZForPath()}} API behaves differently for a non-existent normal file and a non-existent ezone file: - If the user passes a normal non-existent file, it returns a null value. For example, {{Path("/nonexistentfile")}} - If the user passes a non-existent file which is under an existing encryption zone, it returns the parent's encryption zone info. For example, {{Path("/ezone/nonexistentfile")}} Here the proposed idea is to unify the behavior by throwing FileNotFoundException. Please refer to the discussion [thread|https://issues.apache.org/jira/browse/HDFS-9348?focusedCommentId=14983301&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14983301].
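The proposed unified behavior can be sketched with hypothetical in-memory stand-ins for the namespace and zone map; the real check would live inside the NameNode (FSNamesystem / EncryptionZoneManager), and all names below are illustrative:

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class EZForPath {
    // Hypothetical stand-ins: which paths exist, and which directories
    // are encryption zone roots.
    static Set<String> existingFiles = new HashSet<>();
    static Map<String, String> zoneOfDir = new HashMap<>(); // dir -> zone root

    // Proposed behavior: throw FileNotFoundException for ANY non-existent
    // path, whether or not an ancestor is an encryption zone.
    static String getEZForPath(String path) throws FileNotFoundException {
        if (!existingFiles.contains(path)) {
            throw new FileNotFoundException("Path not found: " + path);
        }
        String parent = path.substring(0, Math.max(path.lastIndexOf('/'), 1));
        return zoneOfDir.get(parent); // null when not inside a zone
    }

    public static void main(String[] args) throws FileNotFoundException {
        existingFiles.add("/ezone/file1");
        zoneOfDir.put("/ezone", "/ezone");
        System.out.println(getEZForPath("/ezone/file1")); // /ezone
        try {
            getEZForPath("/ezone/nonexistentfile"); // under a zone, but absent
        } catch (FileNotFoundException e) {
            System.out.println("FNFE as proposed");
        }
    }
}
```

The key change is that the existence check happens before any zone lookup, so both {{Path("/nonexistentfile")}} and {{Path("/ezone/nonexistentfile")}} take the same FileNotFoundException path.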
[jira] [Created] (HDFS-9348) DFS GetErasureCodingPolicy API on a non-existent file should be handled properly
Rakesh R created HDFS-9348: -- Summary: DFS GetErasureCodingPolicy API on a non-existent file should be handled properly Key: HDFS-9348 URL: https://issues.apache.org/jira/browse/HDFS-9348 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Priority: Minor Presently, calling {{dfs#getErasureCodingPolicy()}} on a non-existent file returns the ErasureCodingPolicy info. As per the [discussion|https://issues.apache.org/jira/browse/HDFS-8777?focusedCommentId=14981077&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14981077] it has to validate and throw FileNotFoundException. Also, the {{dfs#getEncryptionZoneForPath()}} API has the same behavior; we can discuss adding the file existence validation there as well.
[jira] [Created] (HDFS-9261) Erasure Coding: Skip encoding the data cells if all the parity data streamers are failed for the current block group
Rakesh R created HDFS-9261: -- Summary: Erasure Coding: Skip encoding the data cells if all the parity data streamers are failed for the current block group Key: HDFS-9261 URL: https://issues.apache.org/jira/browse/HDFS-9261 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Priority: Minor {{DFSStripedOutputStream}} will continue writing with the minimum number (dataBlockNum) of live datanodes. It won't replace the failed datanodes immediately for the current block group. Consider a case where all the parity data streamers have failed; it is then unnecessary to encode the data block cells and generate the parity data. This is a corner case where the {{writeParityCells()}} step can be skipped.
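The corner case amounts to a guard before parity encoding: only encode if at least one parity streamer can still receive data. A minimal sketch with hypothetical names (the actual check would live in `DFSStripedOutputStream`):

```java
public class ParityGuard {
    // Returns true only if at least one parity streamer is still healthy;
    // when all have failed, encoding parity cells is wasted CPU work
    // because there is nowhere to write the result.
    static boolean shouldWriteParityCells(boolean[] parityStreamerHealthy) {
        for (boolean healthy : parityStreamerHealthy) {
            if (healthy) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // RS(6,3)-style group: three parity streamers, all failed.
        System.out.println(shouldWriteParityCells(new boolean[]{false, false, false})); // false
        System.out.println(shouldWriteParityCells(new boolean[]{true, false, false}));  // true
    }
}
```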
[jira] [Created] (HDFS-9256) Erasure Coding: Improve failure handling of ECWorker striped block reconstruction
Rakesh R created HDFS-9256: -- Summary: Erasure Coding: Improve failure handling of ECWorker striped block reconstruction Key: HDFS-9256 URL: https://issues.apache.org/jira/browse/HDFS-9256 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R As we know, reconstruction of a missed striped block is a costly operation. It involves the following steps:- step-1) read the data from the minimum number of sources (remotely reading the data) step-2) decode data for the targets (CPU cycles) step-3) transfer the data to the targets (remotely writing the data) Assume there is a failure in step-3 due to the target DN being disconnected, dead, etc. Presently {{ECWorker}} skips the failed DN and continues transferring data to the other targets. In the next round, it would again start the reconstruction operation from the first step. Considering the cost of reconstruction, it would be good to give another chance to retry the failed operation. The idea of this jira is to discuss the possible approaches and implement one.
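One possible approach is to retry only the failed transfer (step-3) a bounded number of times, so the already-read and already-decoded data from steps 1-2 is not thrown away. The sketch below is illustrative only; the names, retry bound, and failure handling are assumptions, not the actual ECWorker design:

```java
public class TransferRetry {
    // Stand-in for the step-3 transfer of decoded data to one target DN.
    interface Transfer {
        void run() throws Exception;
    }

    // Retry the transfer up to maxAttempts times before giving up.
    // Steps 1-2 (source reads and decoding) are NOT redone on retry.
    static boolean transferWithRetry(Transfer transfer, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                transfer.run();
                return true; // decoded data reached the target
            } catch (Exception e) {
                // log and retry the transfer only
            }
        }
        return false; // give up; the NameNode reschedules full reconstruction
    }

    public static void main(String[] args) {
        // Simulate a target DN that fails twice, then accepts the transfer.
        final int[] failures = {2};
        boolean ok = transferWithRetry(() -> {
            if (failures[0]-- > 0) throw new Exception("target DN disconnected");
        }, 3);
        System.out.println(ok); // true: third attempt succeeded
    }
}
```

The trade-off to discuss is how long a worker may hold the decoded buffers while retrying versus simply letting the NameNode reschedule the whole (expensive) reconstruction.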
[jira] [Created] (HDFS-9185) TestRecoverStripedFile is failing
Rakesh R created HDFS-9185: -- Summary: TestRecoverStripedFile is failing Key: HDFS-9185 URL: https://issues.apache.org/jira/browse/HDFS-9185 Project: Hadoop HDFS Issue Type: Bug Components: erasure-coding Reporter: Rakesh R Assignee: Rakesh R Priority: Critical Below is the message taken from the build: {code} Error Message Time out waiting for EC block recovery. Stacktrace java.io.IOException: Time out waiting for EC block recovery. at org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) at org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) at org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) {code} Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758
[jira] [Created] (HDFS-9172) Erasure Coding: Move DFSStripedIO stream related classes to hadoop-hdfs-client
Rakesh R created HDFS-9172: -- Summary: Erasure Coding: Move DFSStripedIO stream related classes to hadoop-hdfs-client Key: HDFS-9172 URL: https://issues.apache.org/jira/browse/HDFS-9172 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to move the striped stream related classes to the {{hadoop-hdfs-client}} project. This will help to be in sync with the HDFS-6200 proposal. - DFSStripedInputStream - DFSStripedOutputStream - StripedDataStreamer
[jira] [Created] (HDFS-9091) Erasure Coding: Provide DistributedFilesystem API to getAllErasureCodingPolicies
Rakesh R created HDFS-9091: -- Summary: Erasure Coding: Provide DistributedFilesystem API to getAllErasureCodingPolicies Key: HDFS-9091 URL: https://issues.apache.org/jira/browse/HDFS-9091 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This jira is to implement {{DFS#getAllErasureCodingPolicies()}}
[jira] [Created] (HDFS-8959) Provide an iterator-based API for listing all the snapshottable directories
Rakesh R created HDFS-8959: -- Summary: Provide an iterator-based API for listing all the snapshottable directories Key: HDFS-8959 URL: https://issues.apache.org/jira/browse/HDFS-8959 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Presently {{DistributedFileSystem#getSnapshottableDirListing()}} sends the entire {{SnapshottableDirectoryStatus[]}} array to the clients, so the client must have enough space to hold it in memory. There is a chance that client JVMs run out of memory because of this. Also, some time back there was a [comment|https://issues.apache.org/jira/browse/HDFS-8643?focusedCommentId=14658800&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14658800] about the RPC packet limitation, where a large snapshot list can again cause issues. I believe an iterator-based {{DistributedFileSystem#listSnapshottableDirs()}} API would be a good addition!
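An iterator-based API along these lines would fetch a bounded number of entries per RPC instead of one huge array. The sketch below is illustrative only: the batch size, the class name, and the in-memory list standing in for the NameNode response are all assumptions, not the proposed HDFS API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BatchedDirIterator implements Iterator<String> {
    private final List<String> all;   // stands in for the NameNode's full list
    private final int batchSize;
    private List<String> batch = new ArrayList<>();
    private int fetched = 0, pos = 0;

    BatchedDirIterator(List<String> all, int batchSize) {
        this.all = all;
        this.batchSize = batchSize;
    }

    // One simulated "RPC" fetches at most batchSize entries, so neither
    // the client heap nor the RPC packet ever holds the whole listing.
    private void fetchNextBatch() {
        int end = Math.min(fetched + batchSize, all.size());
        batch = new ArrayList<>(all.subList(fetched, end));
        fetched = end;
        pos = 0;
    }

    @Override
    public boolean hasNext() {
        if (pos >= batch.size() && fetched < all.size()) {
            fetchNextBatch();
        }
        return pos < batch.size();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return batch.get(pos++);
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("/a", "/b", "/c", "/d", "/e");
        BatchedDirIterator it = new BatchedDirIterator(dirs, 2);
        StringBuilder out = new StringBuilder();
        while (it.hasNext()) out.append(it.next());
        System.out.println(out); // /a/b/c/d/e
    }
}
```

This mirrors the style of existing paged HDFS listings, where the client resumes from a cursor rather than receiving everything in one response.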
[jira] [Created] (HDFS-8941) DistributedFileSystem listCorruptFileBlocks API should resolve relative path
Rakesh R created HDFS-8941: -- Summary: DistributedFileSystem listCorruptFileBlocks API should resolve relative path Key: HDFS-8941 URL: https://issues.apache.org/jira/browse/HDFS-8941 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Presently the {{DFS#listCorruptFileBlocks(path)}} API does not resolve the given path relative to the workingDir. This jira is to discuss and provide the implementation of the same.
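The expected fix qualifies a relative path against the working directory before the RPC, similar to what other DistributedFileSystem calls do. The string-based sketch below is a simplification for illustration, not the actual Hadoop `Path` resolution logic:

```java
public class PathResolver {
    // Qualify a relative path against the working directory; absolute
    // paths pass through unchanged. (Simplified: no scheme/authority or
    // "." / ".." handling, which the real Path class performs.)
    static String resolve(String workingDir, String path) {
        if (path.startsWith("/")) {
            return path; // already absolute
        }
        return workingDir.endsWith("/") ? workingDir + path
                                        : workingDir + "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(resolve("/user/rakesh", "corruptDir")); // /user/rakesh/corruptDir
        System.out.println(resolve("/user/rakesh", "/tmp/x"));     // /tmp/x
    }
}
```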
[jira] [Created] (HDFS-8899) Erasure Coding: use threadpool for EC recovery tasks
Rakesh R created HDFS-8899: -- Summary: Erasure Coding: use threadpool for EC recovery tasks Key: HDFS-8899 URL: https://issues.apache.org/jira/browse/HDFS-8899 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea is to use a thread pool for processing erasure coding recovery tasks at the datanode, instead of spawning a new daemon thread per task: {code} new Daemon(new ReconstructAndTransferBlock(recoveryInfo)).start(); {code}
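A sketch of the thread-pool alternative using a bounded `ExecutorService`: each submitted task stands in for `ReconstructAndTransferBlock`, and the pool size and queueing policy here are assumptions, not the actual fix:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class EcWorkerPool {
    // Replace "new Daemon(task).start()" per recovery task with a bounded
    // pool shared by all EC recovery tasks on the DataNode, so a burst of
    // reconstruction work cannot spawn an unbounded number of threads.
    static int runTasks(int numTasks, int poolSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < numTasks; i++) {
            // Each task stands in for one ReconstructAndTransferBlock.
            pool.submit(completed::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(10, 4)); // 10
    }
}
```

A bounded pool also gives a natural place to apply back-pressure (queue length) when many block groups need reconstruction at once.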
[jira] [Created] (HDFS-8853) Erasure Coding: Provide ECSchema validation when creating ECZone
Rakesh R created HDFS-8853: -- Summary: Erasure Coding: Provide ECSchema validation when creating ECZone Key: HDFS-8853 URL: https://issues.apache.org/jira/browse/HDFS-8853 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Presently the {{DFS#createErasureCodingZone(path, ecSchema, cellSize)}} doesn't have any validation that the given {{ecSchema}} is available in the {{ErasureCodingSchemaManager#activeSchemas}} list. Now, if it doesn't exist, it will create the ECZone with a {{null}} schema. IMHO we could improve this by doing the necessary basic sanity checks.
[jira] [Created] (HDFS-8773) Few FSNamesystem metrics are not documented in the Metrics page
Rakesh R created HDFS-8773: -- Summary: Few FSNamesystem metrics are not documented in the Metrics page Key: HDFS-8773 URL: https://issues.apache.org/jira/browse/HDFS-8773 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Rakesh R Assignee: Rakesh R This jira is to document missing metrics in the [Metrics page|https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/Metrics.html#FSNamesystem]. The following are not documented: {code} MissingReplOneBlocks NumFilesUnderConstruction NumActiveClients HAState FSState {code}
[jira] [Created] (HDFS-8721) Add a metric for number of encryption zones
Rakesh R created HDFS-8721: -- Summary: Add a metric for number of encryption zones Key: HDFS-8721 URL: https://issues.apache.org/jira/browse/HDFS-8721 Project: Hadoop HDFS Issue Type: Sub-task Components: encryption Reporter: Rakesh R Assignee: Rakesh R Would be good to expose the number of encryption zones.
[jira] [Created] (HDFS-8648) Revisit FsDirectory#resolvePath() function usage to check the call is made under proper lock
Rakesh R created HDFS-8648: -- Summary: Revisit FsDirectory#resolvePath() function usage to check the call is made under proper lock Key: HDFS-8648 URL: https://issues.apache.org/jira/browse/HDFS-8648 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R As per the [discussion|https://issues.apache.org/jira/browse/HDFS-8493?focusedCommentId=14595735&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14595735] in HDFS-8493, the usage of the {{FsDirectory#resolvePath}} function needs to be reviewed. It seems there are many places where the resolution {{fsd.resolvePath(pc, src, pathComponents);}} is done while holding only the fsn lock and not the fsd lock. As per the initial analysis, the following are such cases; probably it needs to filter out and fix the wrong usages. # FsDirAclOp.java -> getAclStatus() -> modifyAclEntries() -> removeAcl() -> removeDefaultAcl() -> setAcl() -> getAclStatus() # FsDirDeleteOp.java -> delete(fsn, src, recursive, logRetryCache) # FsDirRenameOp.java -> renameToInt(fsd, srcArg, dstArg, logRetryCache) -> renameToInt(fsd, srcArg, dstArg, logRetryCache, options) # FsDirStatAndListingOp.java -> getContentSummary(fsd, src) -> getFileInfo(fsd, srcArg, resolveLink) -> isFileClosed(fsd, src) -> getListingInt(fsd, srcArg, startAfter, needLocation) # FsDirWriteFileOp.java -> abandonBlock() -> completeFile(fsn, pc, srcArg, holder, last, fileId) -> getEncryptionKeyInfo(fsn, pc, src, supportedVersions) -> startFile() -> validateAddBlock() # FsDirXAttrOp.java -> getXAttrs(fsd, srcArg, xAttrs) -> listXAttrs(fsd, src) -> setXAttr(fsd, src, xAttr, flag, logRetryCache) # FSNamesystem.java -> createEncryptionZoneInt() -> getEZForPath() Thanks [~wheat9], [~vinayrpet] for the advice.
[jira] [Created] (HDFS-8643) Add snapshot names list to SnapshottableDirectoryStatus
Rakesh R created HDFS-8643: -- Summary: Add snapshot names list to SnapshottableDirectoryStatus Key: HDFS-8643 URL: https://issues.apache.org/jira/browse/HDFS-8643 Project: Hadoop HDFS Issue Type: Improvement Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to enhance {{SnapshottableDirectoryStatus}} by adding a {{snapshotNames}} attribute into it; presently it has the {{snapshotNumber}}. IMHO this would help users to get the list of snapshot names created. Also, the snapshot names can be used while renaming or deleting the snapshots. {code} org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus.java /** * @return Snapshot names for the directory. */ public List<String> getSnapshotNames() { return snapshotNames; } {code}
[jira] [Created] (HDFS-8642) Improve TestFileTruncate#setup by deleting the snapshots
Rakesh R created HDFS-8642: -- Summary: Improve TestFileTruncate#setup by deleting the snapshots Key: HDFS-8642 URL: https://issues.apache.org/jira/browse/HDFS-8642 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Priority: Minor I've observed that the {{TestFileTruncate#setup()}} function has to be improved by making it more independent. Presently, a failure in any of the snapshot-related tests will affect all the subsequent unit test cases. One such error has been observed in [Hadoop-Hdfs-trunk-2163|https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart] {code} org.apache.hadoop.ipc.RemoteException: The directory /test cannot be deleted since /test is snapshottable and already has snapshots at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.checkSnapshot(FSDirSnapshotOp.java:226) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:54) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.deleteInternal(FSDirDeleteOp.java:177) at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:104) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3046) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:939) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:608) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2172) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2166) at org.apache.hadoop.ipc.Client.call(Client.java:1440) at org.apache.hadoop.ipc.Client.call(Client.java:1371) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) at com.sun.proxy.$Proxy22.delete(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540) at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy23.delete(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1711) at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:718) at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714) at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.setup(TestFileTruncate.java:119) {code}
[jira] [Created] (HDFS-8632) Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes
Rakesh R created HDFS-8632: -- Summary: Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes Key: HDFS-8632 URL: https://issues.apache.org/jira/browse/HDFS-8632 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R I've noticed that some of the erasure coding classes are missing the {{@InterfaceAudience}} annotation. It would be good to identify the classes and add the proper annotation.
[jira] [Created] (HDFS-8606) Cleanup DFSOutputStream by removing unwanted changes
Rakesh R created HDFS-8606: -- Summary: Cleanup DFSOutputStream by removing unwanted changes Key: HDFS-8606 URL: https://issues.apache.org/jira/browse/HDFS-8606 Project: Hadoop HDFS Issue Type: Improvement Reporter: Rakesh R Assignee: Rakesh R This jira is to clean up a few changes done as part of HDFS-8386. As per [~szetszwo]'s comments, they will affect the write performance. Please see the discussion [here|https://issues.apache.org/jira/browse/HDFS-8386?focusedCommentId=14575386&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14575386] The following changes are needed as part of this jira: # remove "synchronized" from getStreamer() since it may unnecessarily block the caller # remove setStreamer(..) which is currently not used. We may add it in the HDFS-7285 branch and see how to do synchronization correctly.
[jira] [Resolved] (HDFS-3854) Implement a fence method which should fence the BK shared storage.
[ https://issues.apache.org/jira/browse/HDFS-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R resolved HDFS-3854. Resolution: Duplicate > Implement a fence method which should fence the BK shared storage. > -- > > Key: HDFS-3854 > URL: https://issues.apache.org/jira/browse/HDFS-3854 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Uma Maheswara Rao G >Assignee: Rakesh R > > Currently when machine down or network down, SSHFence can not ensure that, > other node is completely down. So, fence will fail and switch will not happen. > [ internally we did work around to return true when machine is not reachable, > as BKJM already has fencing] > It may be good idea to implement a fence method, which should ensure shared > storage fenced propertly and return true. > We can plug in this new method in ZKFC fence methods. > only pain points what I can see is, we may have to put the BKJM jar in ZKFC > lib for running this fence method. > thoughts?
[jira] [Created] (HDFS-8568) TestClusterId is failing
Rakesh R created HDFS-8568: -- Summary: TestClusterId is failing Key: HDFS-8568 URL: https://issues.apache.org/jira/browse/HDFS-8568 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R It fails with the below exception: {code} java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.namenode.TestClusterId.testFormatWithEmptyClusterIdOption(TestClusterId.java:292) {code}
[jira] [Created] (HDFS-8550) Erasure Coding: Fix FindBugs Multithreaded correctness Warning
Rakesh R created HDFS-8550: -- Summary: Erasure Coding: Fix FindBugs Multithreaded correctness Warning Key: HDFS-8550 URL: https://issues.apache.org/jira/browse/HDFS-8550 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Findbug warning:- Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.hdfs.DFSOutputStream Field org.apache.hadoop.hdfs.DFSOutputStream.streamer Synchronized 89% of the time Unsynchronized access at DFSOutputStream.java:[line 146] Unsynchronized access at DFSOutputStream.java:[line 859] Unsynchronized access at DFSOutputStream.java:[line 627] Unsynchronized access at DFSOutputStream.java:[line 630] Unsynchronized access at DFSOutputStream.java:[line 640] Unsynchronized access at DFSOutputStream.java:[line 342] Unsynchronized access at DFSOutputStream.java:[line 744] Unsynchronized access at DFSOutputStream.java:[line 903] Synchronized access at DFSOutputStream.java:[line 737] Synchronized access at DFSOutputStream.java:[line 913] Synchronized access at DFSOutputStream.java:[line 726] Synchronized access at DFSOutputStream.java:[line 756] Synchronized access at DFSOutputStream.java:[line 762] Synchronized access at DFSOutputStream.java:[line 757] Synchronized access at DFSOutputStream.java:[line 758] Synchronized access at DFSOutputStream.java:[line 762] Synchronized access at DFSOutputStream.java:[line 483] Synchronized access at DFSOutputStream.java:[line 486] Synchronized access at DFSOutputStream.java:[line 717] Synchronized access at DFSOutputStream.java:[line 719] Synchronized access at DFSOutputStream.java:[line 722] Synchronized access at DFSOutputStream.java:[line 408] Synchronized access at DFSOutputStream.java:[line 408] Synchronized access at DFSOutputStream.java:[line 423] Synchronized access at DFSOutputStream.java:[line 426] Synchronized access at DFSOutputStream.java:[line 
411] Synchronized access at DFSOutputStream.java:[line 452] Synchronized access at DFSOutputStream.java:[line 452] Synchronized access at DFSOutputStream.java:[line 439] Synchronized access at DFSOutputStream.java:[line 439] Synchronized access at DFSOutputStream.java:[line 439] Synchronized access at DFSOutputStream.java:[line 670] Synchronized access at DFSOutputStream.java:[line 580] Synchronized access at DFSOutputStream.java:[line 574] Synchronized access at DFSOutputStream.java:[line 592] Synchronized access at DFSOutputStream.java:[line 583] Synchronized access at DFSOutputStream.java:[line 581] Synchronized access at DFSOutputStream.java:[line 621] Synchronized access at DFSOutputStream.java:[line 609] Synchronized access at DFSOutputStream.java:[line 621] Synchronized access at DFSOutputStream.java:[line 597] Synchronized access at DFSOutputStream.java:[line 612] Synchronized access at DFSOutputStream.java:[line 597] Synchronized access at DFSOutputStream.java:[line 588] Synchronized access at DFSOutputStream.java:[line 624] Synchronized access at DFSOutputStream.java:[line 612] Synchronized access at DFSOutputStream.java:[line 588] Synchronized access at DFSOutputStream.java:[line 632] Synchronized access at DFSOutputStream.java:[line 632] Synchronized access at DFSOutputStream.java:[line 616] Synchronized access at DFSOutputStream.java:[line 633] Synchronized access at DFSOutputStream.java:[line 657] Synchronized access at DFSOutputStream.java:[line 658] Synchronized access at DFSOutputStream.java:[line 695] Synchronized access at DFSOutputStream.java:[line 698] Synchronized access at DFSOutputStream.java:[line 784] Synchronized access at DFSOutputStream.java:[line 795] Synchronized access at DFSOutputStream.java:[line 801] Synchronized access at DFSOutputStream.java:[line 155] Synchronized access at DFSOutputStream.java:[line 158] Synchronized access at DFSOutputStream.java:[line 433] Synchronized access at DFSOutputStream.java:[line 886] Synchronized 
access at DFSOutputStream.java:[line 463] Synchronized access at DFSOutputStream.java:[line 469] Synchronized access at DFSOutputStream.java:[line 463] Synchronized access at DFSOutputStream.java:[line 470] Synchronized access at DFSOutputStream.java:[line 465] Synchronized access at DFSOutputStream.java:[line 749] Synchronized access at DFSStripedOutputStream.java:[line 260] Synchronized access at DFSStripedOutputStream.java:[line 325] Synchronized access at DFSStripedOutputStream.java:[line 325] Synchronized access at DFSStripedOutputStream.java:[line 335] Synchronized access at DFSStripedOutputStream.java:[line 264] Synchronized access at DFSStripedOutputStream.java:[line 511] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8532) Make the visibility of DFSOutputStream#streamer member variable to private
Rakesh R created HDFS-8532: -- Summary: Make the visibility of DFSOutputStream#streamer member variable to private Key: HDFS-8532 URL: https://issues.apache.org/jira/browse/HDFS-8532 Project: Hadoop HDFS Issue Type: Improvement Reporter: Rakesh R Assignee: Rakesh R Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8495) Consolidate append() related implementation into a single class
Rakesh R created HDFS-8495: -- Summary: Consolidate append() related implementation into a single class Key: HDFS-8495 URL: https://issues.apache.org/jira/browse/HDFS-8495 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Rakesh R Assignee: Rakesh R This jira proposes to consolidate {{FSNamesystem#append()}} related methods into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8450) Erasure Coding: Consolidate erasure coding zone related implementation into a single class
Rakesh R created HDFS-8450: -- Summary: Erasure Coding: Consolidate erasure coding zone related implementation into a single class Key: HDFS-8450 URL: https://issues.apache.org/jira/browse/HDFS-8450 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea is to follow the same pattern suggested by HDFS-7416: it is good to consolidate all the erasure coding zone related implementation in {{FSNamesystem}}. Here, we propose an {{FSDirErasureCodingZoneOp}} class to hold the functions that perform the erasure coding zone operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8420) Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir
Rakesh R created HDFS-8420: -- Summary: Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir Key: HDFS-8420 URL: https://issues.apache.org/jira/browse/HDFS-8420 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Presently the resultant zone dir will come with {{.snapshot}} only when the zone dir itself is a snapshottable dir. It will return the path including the snapshot name, like {{/zone/.snapshot/snap1}}. Instead, this could be improved by returning only the path {{/zone}}. Thanks [~vinayrpet] for the helpful [discussion|https://issues.apache.org/jira/browse/HDFS-8266?focusedCommentId=14543821&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14543821] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8399) Erasure Coding: BlockManager is unnecessarily computing recovery work for the deleted blocks
Rakesh R created HDFS-8399: -- Summary: Erasure Coding: BlockManager is unnecessarily computing recovery work for the deleted blocks Key: HDFS-8399 URL: https://issues.apache.org/jira/browse/HDFS-8399 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Following exception occurred in the {{ReplicationMonitor}}. As per the initial analysis, I could see the exception is coming for the blocks of the deleted file. {code} 2015-05-14 14:14:40,485 FATAL util.ExitUtil (ExitUtil.java:terminate(127)) - Terminate called org.apache.hadoop.util.ExitUtil$ExitException: java.lang.AssertionError: Absolute path required at org.apache.hadoop.hdfs.server.namenode.INode.getPathNames(INode.java:744) at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:723) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath(FSDirectory.java:1655) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getECSchemaForPath(FSNamesystem.java:8435) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeRecoveryWorkForBlocks(BlockManager.java:1572) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockRecoveryWork(BlockManager.java:1402) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3894) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3846) at java.lang.Thread.run(Thread.java:722) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3865) at java.lang.Thread.run(Thread.java:722) Exception in thread "org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@1255079" org.apache.hadoop.util.ExitUtil$ExitException: java.lang.AssertionError: Absolute path required at 
org.apache.hadoop.hdfs.server.namenode.INode.getPathNames(INode.java:744) at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:723) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath(FSDirectory.java:1655) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getECSchemaForPath(FSNamesystem.java:8435) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeRecoveryWorkForBlocks(BlockManager.java:1572) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockRecoveryWork(BlockManager.java:1402) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3894) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3846) at java.lang.Thread.run(Thread.java:722) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3865) at java.lang.Thread.run(Thread.java:722) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8387) Revisit the long and int datatypes usage in striping logic
Rakesh R created HDFS-8387: -- Summary: Revisit the long and int datatypes usage in striping logic Key: HDFS-8387 URL: https://issues.apache.org/jira/browse/HDFS-8387 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to revisit the usage of the {{long}} and {{int}} data types in the striping logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8386) Improve synchronization of 'streamer' reference in DFSOutputStream - accessed inconsistently with respect to synchronization
Rakesh R created HDFS-8386: -- Summary: Improve synchronization of 'streamer' reference in DFSOutputStream - accessed inconsistently with respect to synchronization Key: HDFS-8386 URL: https://issues.apache.org/jira/browse/HDFS-8386 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Rakesh R Assignee: Rakesh R Presently the {{DFSOutputStream#streamer}} object reference is accessed inconsistently with respect to synchronization. It would be good to improve this part. This was noticed when implementing the erasure coding feature. Please refer to the related [discussion thread|https://issues.apache.org/jira/browse/HDFS-8294?focusedCommentId=14541411&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14541411] in the jira HDFS-8294 for more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8378) Erasure Coding: Few improvements for the erasure coding worker
Rakesh R created HDFS-8378: -- Summary: Erasure Coding: Few improvements for the erasure coding worker Key: HDFS-8378 URL: https://issues.apache.org/jira/browse/HDFS-8378 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Priority: Minor # The following log is confusing; make it tidy. A {{break;}} statement is missing, which causes these unwanted logs. {code} 2015-05-10 15:06:45,878 INFO datanode.DataNode (BPOfferService.java:processCommandFromActive(728)) - DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY 2015-05-10 15:06:45,879 WARN datanode.DataNode (BPOfferService.java:processCommandFromActive(732)) - Unknown DatanodeCommand action: 11 {code} # Add the exception trace to the log; this would improve debuggability {code} } catch (Throwable e) { LOG.warn("Failed to recover striped block: " + blockGroup); } {code} # Make the member variables present in ErasureCodingWorker, ReconstructAndTransferBlock, StripedReader {{private}} and {{final}} # Correct the spelling of the variable {{STRIPED_READ_TRHEAD_POOL}} to {{STRIPED_READ_THREAD_POOL}} # It would be good to add a debug log printing the striped read pool size {code} LOG.debug("Using striped reads; pool threads=" + num); {code} # Add a meaningful message to the precondition check: {code} Preconditions.checkArgument(liveIndices.length == sources.length); {code} # Remove the unused import {code} import org.apache.hadoop.hdfs.server.common.HdfsServerConstants; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
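The missing-break item in HDFS-8378 can be illustrated with a minimal, hypothetical sketch; the action constant, its value 11, and the messages below are illustrative stand-ins, not the real BPOfferService/DatanodeProtocol code. Without the {{break}}, control falls through into the default branch, which is exactly why both the INFO and the "Unknown DatanodeCommand" WARN lines appear for one command. (The second item, adding the exception trace, is simply passing {{e}} as the second argument to {{LOG.warn(...)}}.)

```java
public class CommandSketch {
    // Hypothetical action code; in the real protocol this constant is defined
    // elsewhere and the value 11 is only illustrative.
    static final int DNA_ERASURE_CODING_RECOVERY = 11;

    // Returns the message logged for a command action. The "break" below is
    // the statement the report says was missing; dropping it would let the
    // first case fall through and overwrite msg with the "Unknown" text.
    static String process(int action) {
        String msg;
        switch (action) {
            case DNA_ERASURE_CODING_RECOVERY:
                msg = "DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY";
                break; // prevents fall-through into default
            default:
                msg = "Unknown DatanodeCommand action: " + action;
        }
        return msg;
    }

    public static void main(String[] args) {
        System.out.println(process(DNA_ERASURE_CODING_RECOVERY));
    }
}
```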
[jira] [Created] (HDFS-8370) Erasure Coding: TestRecoverStripedFile#testRecoverOneParityBlock is failing
Rakesh R created HDFS-8370: -- Summary: Erasure Coding: TestRecoverStripedFile#testRecoverOneParityBlock is failing Key: HDFS-8370 URL: https://issues.apache.org/jira/browse/HDFS-8370 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This jira is to analyse more on the failure of this unit test. {code} java.io.IOException: Time out waiting for EC block recovery. at org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:333) at org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:234) at org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverOneParityBlock(TestRecoverStripedFile.java:98) {code} Exception occurred during recovery packet transferring: {code} 2015-05-09 15:08:08,910 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(826)) - Exception for BP-1332677436-67.195.81.147-1431184082022:blk_-9223372036854775792_1001 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:787) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:803) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA 
(v6.3.4#6332)
[jira] [Created] (HDFS-8368) Erasure Coding: DFS opening a non-existent file needs to be handled properly
Rakesh R created HDFS-8368: -- Summary: Erasure Coding: DFS opening a non-existent file needs to be handled properly Key: HDFS-8368 URL: https://issues.apache.org/jira/browse/HDFS-8368 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This jira is to address the bad exception thrown when opening a non-existent file. It throws an NPE as shown below: {code} java.lang.NullPointerException: null at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1184) at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:307) at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:303) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:303) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSClient(TestDistributedFileSystem.java:359) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testAllWithNoXmlDefaults(TestDistributedFileSystem.java:666) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
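The shape of the fix for HDFS-8368 can be sketched as follows. This is a hypothetical stand-in, not the actual DFSClient code: the map-based "namespace" merely simulates a lookup that returns null for a missing path. The point is to translate that null into a {{FileNotFoundException}} instead of letting it surface later as an NPE.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class OpenSketch {
    // Hypothetical namespace: path -> file info. In the real client the null
    // comes from the server returning no block locations for the path.
    static final Map<String, String> namespace = new HashMap<>();

    // Sketch of the fix: convert a null lookup result into a meaningful
    // FileNotFoundException rather than propagating a NullPointerException.
    static String open(String path) throws IOException {
        String info = namespace.get(path); // null when the file does not exist
        if (info == null) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return info;
    }

    public static void main(String[] args) throws IOException {
        namespace.put("/a", "located-blocks-of-/a");
        System.out.println(open("/a"));
        try {
            open("/missing");
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```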
[jira] [Created] (HDFS-8332) DistributedFileSystem listCacheDirectives() and listCachePools() API calls should check filesystem closed
Rakesh R created HDFS-8332: -- Summary: DistributedFileSystem listCacheDirectives() and listCachePools() API calls should check filesystem closed Key: HDFS-8332 URL: https://issues.apache.org/jira/browse/HDFS-8332 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R I could see that the {{listCacheDirectives()}} and {{listCachePools()}} APIs can be called even after the filesystem is closed. Instead, these calls should do {{checkOpen}} and throw: {code} java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:464) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
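The guard described in HDFS-8332 can be sketched like this. It is a hypothetical stand-in for the client, not real HDFS code: the flag, method bodies, and the "pool-listing" return value are illustrative. The pattern is simply that every public API call runs a {{checkOpen}}-style check before doing any work.

```java
import java.io.IOException;

public class CacheApiSketch {
    // Hypothetical stand-in for the client's "still running" flag.
    private volatile boolean clientRunning = true;

    // Mirrors the checkOpen idea: verify the client is usable before any call.
    void checkOpen() throws IOException {
        if (!clientRunning) {
            throw new IOException("Filesystem closed");
        }
    }

    void close() {
        clientRunning = false;
    }

    // Hypothetical stand-in for listCachePools(): guard first, then do work.
    String listCachePools() throws IOException {
        checkOpen();
        return "pool-listing";
    }

    public static void main(String[] args) throws IOException {
        CacheApiSketch client = new CacheApiSketch();
        System.out.println(client.listCachePools());
        client.close();
        try {
            client.listCachePools();
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```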
[jira] [Created] (HDFS-8331) Erasure Coding: Create FileStatus isErasureCoded() method
Rakesh R created HDFS-8331: -- Summary: Erasure Coding: Create FileStatus isErasureCoded() method Key: HDFS-8331 URL: https://issues.apache.org/jira/browse/HDFS-8331 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea of this jira is to discuss the need for a {{FileStatus#isErasureCoded()}} API. This is just an initial thought; presently the use case/necessity of this is not clear. We will probably revisit this once the feature matures. Thanks [~umamaheswararao], [~vinayrpet], [~zhz] for the offline discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8294) Erasure Coding: Fix Findbug warnings present in erasure coding
Rakesh R created HDFS-8294: -- Summary: Erasure Coding: Fix Findbug warnings present in erasure coding Key: HDFS-8294 URL: https://issues.apache.org/jira/browse/HDFS-8294 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8275) Erasure Coding: Implement batched listing of erasure coding zones
Rakesh R created HDFS-8275: -- Summary: Erasure Coding: Implement batched listing of erasure coding zones Key: HDFS-8275 URL: https://issues.apache.org/jira/browse/HDFS-8275 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R The idea of the jira is to provide a batch API in {{DistributedFileSystem}} to list the {{ECZoneInfo}}. API signature: {code} /** * List all ErasureCoding zones. Incrementally fetches results from the server. */ public RemoteIterator<ECZoneInfo> listErasureCodingZones() throws IOException { return dfs.listErasureCodingZones(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
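From the caller's side, the batch API above would be consumed through a RemoteIterator. The sketch below is self-contained and hypothetical: it declares a minimal stand-in for the RemoteIterator contract and serves two hard-coded zone names, whereas a real implementation would fetch each batch from the server behind hasNext()/next().

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;

public class ZoneListingSketch {
    // Minimal stand-in for the RemoteIterator contract; the real one may do
    // an RPC inside hasNext()/next(), hence the declared IOException.
    interface RemoteIterator<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    // Hypothetical zone names; a real client would page through server results.
    static RemoteIterator<String> listErasureCodingZones() {
        final Iterator<String> it = Arrays.asList("/zone1", "/zone2").iterator();
        return new RemoteIterator<String>() {
            public boolean hasNext() {
                return it.hasNext();
            }
            public String next() {
                return it.next();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // Typical consumption pattern for an incremental listing API.
        RemoteIterator<String> zones = listErasureCodingZones();
        while (zones.hasNext()) {
            System.out.println(zones.next());
        }
    }
}
```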