[jira] [Created] (HDDS-1688) Deadlock in ratis client

2019-06-14 Thread Rakesh R (JIRA)
Rakesh R created HDDS-1688:
--

 Summary: Deadlock in ratis client
 Key: HDDS-1688
 URL: https://issues.apache.org/jira/browse/HDDS-1688
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Rakesh R
 Attachments: Freon_baseline_100Threads_64MB_Keysize_8Keys_10buckets.bin

Ran the Freon benchmark on a three-node cluster with 100 writer threads. After 
some time the client hung due to a deadlock.

+Freon with the args:-+
--numOfBuckets=10 --numOfKeys=8 --keySize=67108864 --numOfVolumes=100 
--numOfThreads=100

The thread dump shows 3 BLOCKED threads. The full thread dump is attached.

{code}
Found one Java-level deadlock:
=
"grpc-default-executor-6":
  waiting for ownable synchronizer 0x00021546bd00, (a 
java.util.concurrent.locks.ReentrantReadWriteLock$FairSync),
  which is held by "ForkJoinPool.commonPool-worker-7"
"ForkJoinPool.commonPool-worker-7":
  waiting to lock monitor 0x7f48fc99c448 (object 0x00021546be30, a 
org.apache.ratis.util.SlidingWindow$Client),
  which is held by "grpc-default-executor-6"
{code}
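
The report above is the JVM's own deadlock detection output. As a minimal sketch (not part of Ratis or Ozone; the class name {{DeadlockProbe}} is hypothetical), the same check can be run from inside a client process with the standard {{ThreadMXBean}} API, which covers cycles that mix object monitors and ownable synchronizers such as {{ReentrantReadWriteLock}}:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class DeadlockProbe {
  /** Returns one line per deadlocked thread; empty when the JVM reports none. */
  static List<String> describeDeadlocks() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    // Detects cycles over both object monitors and ownable synchronizers.
    long[] ids = mx.findDeadlockedThreads();
    List<String> lines = new ArrayList<>();
    if (ids == null) {
      return lines; // null means no deadlock was found
    }
    for (ThreadInfo info : mx.getThreadInfo(ids)) {
      if (info == null) {
        continue; // the thread terminated in the meantime
      }
      lines.add(info.getThreadName() + " waits for " + info.getLockName()
          + " held by " + info.getLockOwnerName());
    }
    return lines;
  }

  public static void main(String[] args) {
    List<String> report = describeDeadlocks();
    System.out.println(report.isEmpty()
        ? "no deadlock detected" : String.join("\n", report));
  }
}
```

Calling such a probe periodically from a watchdog thread would surface a hang like this one without waiting for a manual thread dump.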

{code}
ForkJoinPool.commonPool-worker-7
priority:5 - threadId:0x7f48d834b000 - nativeId:0x9ffb - nativeId 
(decimal):40955 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.ratis.util.SlidingWindow$Client.resetFirstSeqNum(SlidingWindow.java:348)
- waiting to lock <0x00021546be30> (a 
org.apache.ratis.util.SlidingWindow$Client)
at 
org.apache.ratis.client.impl.OrderedAsync.resetSlidingWindow(OrderedAsync.java:122)
at 
org.apache.ratis.client.impl.OrderedAsync$$Lambda$943/1670264164.accept(Unknown 
Source)
at 
org.apache.ratis.client.impl.RaftClientImpl.lambda$handleIOException$6(RaftClientImpl.java:352)
at 
org.apache.ratis.client.impl.RaftClientImpl$$Lambda$944/769363367.accept(Unknown
 Source)
at java.util.Optional.ifPresent(Optional.java:159)
at 
org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:352)
at 
org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$10(OrderedAsync.java:235)
at 
org.apache.ratis.client.impl.OrderedAsync$$Lambda$776/1213731951.apply(Unknown 
Source)
at 
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at 
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:324)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.close(GrpcClientProtocolClient.java:313)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$400(GrpcClientProtocolClient.java:245)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient.lambda$close$1(GrpcClientProtocolClient.java:131)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$$Lambda$950/1948156329.accept(Unknown
 Source)
at java.util.Optional.ifPresent(Optional.java:159)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient.close(GrpcClientProtocolClient.java:131)
at 
org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$close$1(PeerProxyMap.java:73)
at 
org.apache.ratis.util.PeerProxyMap$PeerAndProxy$$Lambda$948/427065222.run(Unknown
 Source)
at 
org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231)
at org.apache.ratis.util.LifeCycle$$Lambda$949/1311526821.get(Unknown Source)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229)
at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.close(PeerProxyMap.java:70)
- locked <0x0003e793ef48> (a 
org.apache.ratis.util.PeerProxyMap$PeerAndProxy)
at org.apache.ratis.util.PeerProxyMap.resetProxy(PeerProxyMap.java:126)
- locked <0x000215453400> (a java.lang.Object)
at org.apache.ratis.util.PeerProxyMap.handleException(PeerProxyMap.java:135)
at 
org.apache.ratis.client.impl.RaftClientRpcWithProxy.handleException(RaftClientRpcWithProxy.java:47)
at 
org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:375)
at 
org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:341)
at 
org.apache.ratis.client.impl.UnorderedAsync.lambda$sendRequestWithRetry$4(UnorderedAsync.java:108)
at 
org.apache.ratis.client.impl.UnorderedAsync$$Lambda$976/655038759.accept(Unknown
 Source)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at 
java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:44

[jira] [Created] (HDDS-1687) Datanode process shutdown due to OOME

2019-06-14 Thread Rakesh R (JIRA)
Rakesh R created HDDS-1687:
--

 Summary: Datanode process shutdown due to OOME
 Key: HDDS-1687
 URL: https://issues.apache.org/jira/browse/HDDS-1687
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Rakesh R
 Attachments: baseline test - datanode error logs.0.5.0.rar

Ran the Freon benchmark on a three-node cluster. With more parallel writer 
threads, the datanode daemon hit an OOME and shut down. The worker nodes used 
HDD as the storage type.

+Freon with the args:-+
--numOfBuckets=10 --numOfKeys=8 --keySize=67108864 --numOfVolumes=100 
--numOfThreads=100


*DN-2* : The process was killed during the test due to an OOME.
{code}
2019-06-13 00:48:11,976 ERROR 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: Terminating 
with exit status 1: 
a0cb8914-b51c-41b1-b5d2-59313cf38c0b-SegmentedRaftLogWorker:Storage Directory 
/data/datab/ozone/metadir/ratis/cbf29739-cbd1-4b00-8a21-2db750004dc7 failed.
java.lang.OutOfMemoryError: Direct buffer memory
   at java.nio.Bits.reserveMemory(Bits.java:694)
   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
   at 
org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:44)
   at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:70)
   at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:481)
   at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:234)
   at java.lang.Thread.run(Thread.java:748)
{code}

*DN3* : The process was killed during the test due to an OOME. There are also 
many NPEs in the datanode logs.
{code}
2019-06-13 00:44:44,581 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 
83232f1f-4469-4a4d-b369-c131c8432ae9: follower 
07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, 
log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 
83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: 
follower responses installSnapshot Completed
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 
83232f1f-4469-4a4d-b369-c131c8432ae9: follower 
07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, 
log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,587 ERROR org.apache.ratis.server.impl.LogAppender: 
org.apache.ratis.server.impl.LogAppender$AppenderDaemon@554415fe unexpected 
exception
java.lang.NullPointerException: 
83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: 
Previous TermIndex not found for firstIndex = 10062
   at java.util.Objects.requireNonNull(Objects.java:290)
   at 
org.apache.ratis.server.impl.LogAppender.assertProtos(LogAppender.java:234)
   at 
org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:221)
   at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:169)
   at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:113)
   at 
org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:80)
   at java.lang.Thread.run(Thread.java:748)

OOME log messages present in the *.out file.

Exception in thread 
"org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$267/386355867@1d9c10b3"
 java.lang.OutOfMemoryError: unable to create new native thread
   at java.lang.Thread.start0(Native Method)
   at java.lang.Thread.start(Thread.java:717)
   at 
org.apache.ratis.server.impl.LogAppender$AppenderDaemon.start(LogAppender.java:68)
   at 
org.apache.ratis.server.impl.LogAppender.startAppender(LogAppender.java:153)
   at java.util.ArrayList.forEach(ArrayList.java:1257)
   at 
org.apache.ratis.server.impl.LeaderState.addAndStartSenders(LeaderState.java:372)
   at 
org.apache.ratis.server.impl.LeaderState.restartSender(LeaderState.java:394)
   at 
org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:97)
   at java.lang.Thread.run(Thread.java:748)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1594) NullPointerException at the ratis client while running Freon benchmark

2019-05-27 Thread Rakesh R (JIRA)
Rakesh R created HDDS-1594:
--

 Summary: NullPointerException at the ratis client while running 
Freon benchmark
 Key: HDDS-1594
 URL: https://issues.apache.org/jira/browse/HDDS-1594
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Rakesh R


The ratis client hits an NPE during a Freon benchmark test run. Below is the 
exception logged in the client-side output.

{code}
SEVERE: Exception while executing runnable 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed@6c585536
java.lang.NullPointerException
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:320)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$000(GrpcClientProtocolClient.java:245)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onError(GrpcClientProtocolClient.java:269)
at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at 
org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at 
org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

{code}






[jira] [Created] (HDFS-14393) Move stats related methods to MappableBlockLoader

2019-03-27 Thread Rakesh R (JIRA)
Rakesh R created HDFS-14393:
---

 Summary: Move stats related methods to MappableBlockLoader
 Key: HDFS-14393
 URL: https://issues.apache.org/jira/browse/HDFS-14393
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This Jira sub-task is to move the stats-related methods to the specific loader, 
making FsDatasetCache cleaner so that DRAM and PMem implementations can be 
plugged in.






[jira] [Reopened] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-17 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R reopened HDFS-14355:
-

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.
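
A minimal sketch of the idea, assuming the cache file lives on a filesystem backed by persistent memory (the class and method names here are hypothetical and are not taken from the attached patches). It copies block data into a {{java.nio.MappedByteBuffer}} obtained from {{FileChannel#map}}, using only the JDK:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; not the HDFS implementation.
class MappedBlockSketch {
  /** Copies blockData into a mapped region backed by cacheFile. */
  static MappedByteBuffer cacheBlock(Path cacheFile, byte[] blockData) {
    try (FileChannel ch = FileChannel.open(cacheFile, StandardOpenOption.CREATE,
        StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      // Mapping in READ_WRITE mode grows the file to the requested size.
      MappedByteBuffer buf =
          ch.map(FileChannel.MapMode.READ_WRITE, 0, blockData.length);
      buf.put(blockData);
      buf.force(); // flush the mapped content to the backing storage
      buf.flip();  // rewind so that callers can read the cached bytes
      return buf;  // the mapping stays valid after the channel is closed
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Caches blockData in a temporary file and reads it back from the mapping. */
  static byte[] roundTrip(byte[] blockData) {
    try {
      Path file = Files.createTempFile("block", ".cache");
      file.toFile().deleteOnExit();
      MappedByteBuffer buf = cacheBlock(file, blockData);
      byte[] out = new byte[buf.remaining()];
      buf.get(out);
      return out;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(new byte[] {1, 2, 3, 4}).length);
  }
}
```

No native library is involved here, which is the point of the pure-Java approach. Unmapping is left to the JVM in this sketch.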






[jira] [Resolved] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-17 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R resolved HDFS-14355.
-
Resolution: Unresolved

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.






[jira] [Created] (HDFS-13808) [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and #getFileInfo() function

2018-08-08 Thread Rakesh R (JIRA)
Rakesh R created HDFS-13808:
---

 Summary: [SPS]: Remove unwanted FSNamesystem 
#isFileOpenedForWrite() and #getFileInfo() function
 Key: HDFS-13808
 URL: https://issues.apache.org/jira/browse/HDFS-13808
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R









[jira] [Resolved] (HDFS-13084) [SPS]: Fix the branch review comments

2018-08-01 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R resolved HDFS-13084.
-
   Resolution: Fixed
Fix Version/s: HDFS-10285

I'm closing this issue because the {{IntraSPSNameNodeContext}} implementation, 
which was specific to the internal SPS service, has been removed from this 
branch. The internal SPS mechanism will be discussed and supported via the 
follow-up Jira task HDFS-12226. We have addressed the comments related to this 
branch via the Jira sub-tasks HDFS-13097, HDFS-13110, HDFS-13166 and HDFS-13381.

> [SPS]: Fix the branch review comments
> -
>
> Key: HDFS-13084
> URL: https://issues.apache.org/jira/browse/HDFS-13084
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Uma Maheswara Rao G
>Assignee: Rakesh R
>Priority: Major
> Fix For: HDFS-10285
>
>
> Fix the review comments provided by [~daryn]
>  






[jira] [Created] (HDFS-13491) [SPS]: Discuss and implement efficient approach to send a copy of a block to another datanode

2018-04-23 Thread Rakesh R (JIRA)
Rakesh R created HDFS-13491:
---

 Summary: [SPS]: Discuss and implement efficient approach to send a 
copy of a block to another datanode
 Key: HDFS-13491
 URL: https://issues.apache.org/jira/browse/HDFS-13491
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This Jira task is to reach consensus on the logic for transferring a block to 
another datanode, and to implement it to satisfy the block storage policy.

Reference discussion thread






[jira] [Created] (HDFS-13381) [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path

2018-04-02 Thread Rakesh R (JIRA)
Rakesh R created HDFS-13381:
---

 Summary: [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare 
satisfier file path
 Key: HDFS-13381
 URL: https://issues.apache.org/jira/browse/HDFS-13381
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This Jira task will address the following comments:
 # Use DFSUtilClient::makePathFromFileId instead of generics (one for string 
path and another for inodeId), as is done today.
 # Only the context implementation differs between external and internal SPS, 
so FileCollector and BlockMoveTaskHandler can simply move to the Context 
interface.






[jira] [Created] (HDFS-13166) [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls

2018-02-18 Thread Rakesh R (JIRA)
Rakesh R created HDFS-13166:
---

 Summary: [SPS]: Implement caching mechanism to keep LIVE datanodes 
to minimize costly getLiveDatanodeStorageReport() calls
 Key: HDFS-13166
 URL: https://issues.apache.org/jira/browse/HDFS-13166
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


Presently {{#getLiveDatanodeStorageReport}} is fetched, and the computation 
repeated, for every file. This task is to discuss and implement a caching 
mechanism to minimize the number of calls. One option is to define a 
configurable refresh interval and periodically refresh the DN cache by fetching 
the latest {{#getLiveDatanodeStorageReport}}.
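
The refresh-interval idea could be sketched as below. This is only an illustration under assumptions: {{RefreshingCache}} is a hypothetical name, the loader stands in for the costly storage report call, and the clock is injectable only to keep the sketch testable.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical illustration; not the SPS implementation.
class RefreshingCache<T> {
  private final Supplier<T> loader;      // stands in for the costly call
  private final long refreshIntervalMs;  // would come from configuration
  private final LongSupplier clock;      // millisecond clock
  private T value;
  private long loadedAt;
  private boolean loaded;

  RefreshingCache(Supplier<T> loader, long refreshIntervalMs, LongSupplier clock) {
    this.loader = loader;
    this.refreshIntervalMs = refreshIntervalMs;
    this.clock = clock;
  }

  /** Returns the cached value, reloading it once the interval has elapsed. */
  synchronized T get() {
    long now = clock.getAsLong();
    if (!loaded || now - loadedAt >= refreshIntervalMs) {
      value = loader.get();
      loadedAt = now;
      loaded = true;
    }
    return value;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    RefreshingCache<String> cache = new RefreshingCache<>(
        () -> "report#" + (++calls[0]), 60_000L, System::currentTimeMillis);
    System.out.println(cache.get());
    System.out.println(cache.get()); // served from the cache within the interval
  }
}
```

This variant refreshes lazily on access. A background refresh thread is the other obvious choice and keeps the slow call off the caller's path.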






[jira] [Created] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR

2018-02-18 Thread Rakesh R (JIRA)
Rakesh R created HDFS-13165:
---

 Summary: [SPS]: Collects successfully moved block details via IBR
 Key: HDFS-13165
 URL: https://issues.apache.org/jira/browse/HDFS-13165
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R


This task is to make use of the existing IBR to get the moved block details and 
to remove the unwanted future-tracking logic in the BlockStorageMovementTracker 
code, which is no longer needed because file-level tracking is maintained at 
the NN itself.

The following comments are taken from HDFS-10285, 
[here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472]

Comment-3)
{quote}BPServiceActor
Is it actually sending back the moved blocks? Aren’t IBRs sufficient?{quote}

Comment-21)
{quote}
BlockStorageMovementTracker
Many data structures are riddled with non-threadsafe race conditions and risk 
of CMEs.

Ex. The moverTaskFutures map. Adding new blocks and/or adding to a block's list 
of futures is synchronized. However the run loop does an unsynchronized block 
get, unsynchronized future remove, unsynchronized isEmpty, possibly another 
unsynchronized get, only then does it do a synchronized remove of the block. 
The whole chunk of code should be synchronized.

Is the problematic moverTaskFutures even needed? It's aggregating futures 
per-block for seemingly no reason. Why track all the futures at all instead of 
just relying on the completion service? As best I can tell:

It's only used to determine if a future from the completion service should be 
ignored during shutdown. Shutdown sets the running boolean to false and clears 
the entire datastructure so why not use the running boolean like a check just a 
little further down?
As synchronization to sleep up to 2 seconds before performing a blocking 
moverCompletionService.take, but only when it thinks there are no active 
futures. I'll ignore the missed notify race that the bounded wait masks, but 
the real question is why not just do the blocking take?
Why all the complexity? Am I missing something?

BlocksMovementsStatusHandler
Suffers same type of thread safety issues as StoragePolicySatisfyWorker. Ex. 
blockIdVsMovementStatus is inconsistent synchronized. Does synchronize to 
return an unmodifiable list which sadly does nothing to protect the caller from 
CME.

handle is iterating over a non-thread safe list.
{quote}
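
As a rough sketch of what the review comment suggests (relying on the completion service's blocking take plus a running flag, with no per-block map of futures), assuming hypothetical names and string results in place of real block move results:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; not the BlockStorageMovementTracker code.
class CompletionOnlyTracker {
  /** Collects results of submitted tasks using only the completion service. */
  static List<String> collect(CompletionService<String> service, int submitted,
      AtomicBoolean running) {
    List<String> results = new ArrayList<>();
    for (int i = 0; i < submitted; i++) {
      try {
        Future<String> done = service.take(); // blocking take, no sleep or poll
        if (!running.get()) {
          break; // shutting down: ignore the completed future
        }
        results.add(done.get());
      } catch (ExecutionException e) {
        results.add("failed: " + e.getCause().getMessage());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return results;
  }

  /** Submits two stand-in move tasks and returns their sorted results. */
  static List<String> demo() {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      CompletionService<String> service = new ExecutorCompletionService<>(pool);
      service.submit(() -> "blk_1 moved");
      service.submit(() -> "blk_2 moved");
      List<String> results = collect(service, 2, new AtomicBoolean(true));
      Collections.sort(results); // completion order is not deterministic
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```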






[jira] [Created] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-06 Thread Rakesh R (JIRA)
Rakesh R created HDFS-13110:
---

 Summary: [SPS]: Reduce the number of APIs in NamenodeProtocol used 
by external satisfier
 Key: HDFS-13110
 URL: https://issues.apache.org/jira/browse/HDFS-13110
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R


This task is to address [~daryn]'s comments. 

*Comment No.10)*

NamenodeProtocolTranslatorPB

Most of the api changes appear unnecessary.

IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption that 
any and all IOEs means FNF which probably isn’t the intention during rpc 
exceptions.
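
The concern about swallowed IOEs can be illustrated with a small sketch. The {{StatusSource}} interface and the class name are hypothetical stand-ins for the RPC call; only {{FileNotFoundException}} is treated as "file not found", and every other {{IOException}} propagates to the caller:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Optional;

// Hypothetical illustration; not the IntraSPSNameNodeContext code.
class FileInfoLookup {
  /** Hypothetical stand-in for the RPC that fetches a file's status. */
  interface StatusSource {
    String getFileInfo(String path) throws IOException;
  }

  /** Returns empty only when the file does not exist. */
  static Optional<String> getFileInfo(StatusSource source, String path)
      throws IOException {
    try {
      return Optional.ofNullable(source.getFileInfo(path));
    } catch (FileNotFoundException e) {
      return Optional.empty(); // the one case that really means "not found"
    }
    // Any other IOException (for example an RPC failure) is not caught here.
  }

  public static void main(String[] args) throws IOException {
    System.out.println(getFileInfo(p -> "status of " + p, "/a").isPresent());
  }
}
```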

 






[jira] [Created] (HDFS-13095) Improve slice tree traversal implementation

2018-01-31 Thread Rakesh R (JIRA)
Rakesh R created HDFS-13095:
---

 Summary: Improve slice tree traversal implementation
 Key: HDFS-13095
 URL: https://issues.apache.org/jira/browse/HDFS-13095
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R


This task is to refine the existing slice tree traversal logic in 
[ReencryptionHandler|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ReencryptionHandler.java#L74]
 class.

Please refer to Daryn's review comments:
{quote}*FSTreeTraverser*
 I need to study this more but I have grave concerns this will work correctly 
in a mutating namesystem.  Ex. renames and deletes esp. in combination with 
snapshots. Looks like there's a chance it will go off in the weeds when 
backtracking out of a renamed directory.

traverseDir may NPE if it's traversing a tree in a snapshot and one of the 
ancestors is deleted.

Not sure why it's bothering to re-check permissions during the crawl.  The 
storage policy is inherited by the entire tree, regardless of whether the 
sub-contents are accessible.  The effect of this patch is the storage policy is 
enforced for all readable files, non-readable violate the new storage policy, 
new non-readable will conform to the new storage policy.  Very convoluted.  
Since new files will conform, should just process the entire tree.
{quote}






[jira] [Created] (HDFS-13077) [SPS]: Fix review comments of external storage policy satisfier

2018-01-29 Thread Rakesh R (JIRA)
Rakesh R created HDFS-13077:
---

 Summary: [SPS]: Fix review comments of external storage policy 
satisfier
 Key: HDFS-13077
 URL: https://issues.apache.org/jira/browse/HDFS-13077
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This task is to address the following review comments from Uma:
 - Implement login with the external SPS keytab
 - Make the limit on the SPS outstanding-requests queue configurable. The 
configuration could be {{dfs.storage.policy.satisfier.max.outstanding.paths}}
 - Fix checkstyle warnings






[jira] [Created] (HDFS-13076) Merge work for HDFS-10285

2018-01-28 Thread Rakesh R (JIRA)
Rakesh R created HDFS-13076:
---

 Summary: Merge work for HDFS-10285
 Key: HDFS-13076
 URL: https://issues.apache.org/jira/browse/HDFS-13076
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R


This Jira is to run the aggregated HDFS-10285 branch patch against trunk and 
check for any Jenkins issues.






[jira] [Created] (HDFS-13057) [SPS]: Revisit configurations to make SPS service modes internal/external/none

2018-01-24 Thread Rakesh R (JIRA)
Rakesh R created HDFS-13057:
---

 Summary: [SPS]: Revisit configurations to make SPS service modes 
internal/external/none
 Key: HDFS-13057
 URL: https://issues.apache.org/jira/browse/HDFS-13057
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This task is to revisit the configurations so that the SPS service has three 
modes: {{internal/external/none}}
- {{internal}}: the SPS service runs within the NN
- {{external}}: the SPS service runs outside the NN
- {{none}}: the SPS service is completely disabled and adds zero cost to the 
system.

The proposed configuration item in the hdfs-site.xml file is 
{{dfs.storage.policy.satisfier.running.mode}}, with a string value. The mode 
can be changed via the {{reconfig}} command.
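
A minimal sketch of how the string value might be mapped to a mode. The enum name is hypothetical, and treating a missing value as {{none}} is an assumption made here for illustration, not something the proposal states:

```java
import java.util.Locale;

// Hypothetical illustration of parsing the proposed string value.
enum SpsMode {
  INTERNAL, EXTERNAL, NONE;

  /** Parses a configured value case-insensitively. */
  static SpsMode fromString(String value) {
    if (value == null || value.trim().isEmpty()) {
      return NONE; // assumption: an unset value means the service is disabled
    }
    // valueOf throws IllegalArgumentException for an unknown mode.
    return valueOf(value.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(fromString("external"));
  }
}
```

Rejecting unknown values loudly, rather than falling back to a default, makes a typo in hdfs-site.xml visible at startup or at reconfiguration time.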






[jira] [Created] (HDFS-12982) [SPS]: Reduce the locking and cleanup the Namesystem access

2018-01-02 Thread Rakesh R (JIRA)
Rakesh R created HDFS-12982:
---

 Summary: [SPS]: Reduce the locking and cleanup the Namesystem 
access
 Key: HDFS-12982
 URL: https://issues.apache.org/jira/browse/HDFS-12982
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This task is to optimize the NS lock usage in SPS and clean up the Namesystem 
access via the {{Context}} interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Created] (HDFS-12790) [SPS]: Rebasing HDFS-10285 branch after HDFS-10467, HDFS-12599 and HDFS-11968 commits

2017-11-08 Thread Rakesh R (JIRA)
Rakesh R created HDFS-12790:
---

 Summary: [SPS]: Rebasing HDFS-10285 branch after HDFS-10467, 
HDFS-12599 and HDFS-11968 commits
 Key: HDFS-12790
 URL: https://issues.apache.org/jira/browse/HDFS-12790
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This task continues the periodic rebasing of the HDFS-10285 branch onto trunk. 
To make the branch code compile against trunk, it needs to be refactored for 
the latest trunk changes: HDFS-10467, HDFS-12599 and HDFS-11968.






[jira] [Created] (HDFS-12570) [SPS]: Refactor Co-ordinator datanode logic to track the block storage movements

2017-09-29 Thread Rakesh R (JIRA)
Rakesh R created HDFS-12570:
---

 Summary: [SPS]: Refactor Co-ordinator datanode logic to track the 
block storage movements
 Key: HDFS-12570
 URL: https://issues.apache.org/jira/browse/HDFS-12570
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Rakesh R
Assignee: Rakesh R


This task is to refactor the C-DN block storage movements. The idea is to move 
the scheduling and tracking logic to the Namenode rather than keeping it at the 
special C-DN. Please refer to the discussion with [~andrew.wang] to understand 
the [background and the necessity of 
refactoring|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16141060&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16141060].






[jira] [Created] (HDFS-12291) [SPS]: Provide a mechanism to recursively iterate and satisfy storage policy of all the files under the given dir

2017-08-11 Thread Rakesh R (JIRA)
Rakesh R created HDFS-12291:
---

 Summary: [SPS]: Provide a mechanism to recursively iterate and 
satisfy storage policy of all the files under the given dir
 Key: HDFS-12291
 URL: https://issues.apache.org/jira/browse/HDFS-12291
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


For a given source directory, SPS presently considers only the files 
immediately under the directory (one level of scanning) when satisfying the 
policy. It does not scan the directory recursively and schedule SPS tasks 
to satisfy the storage policy of all the files down to the leaf nodes. 

The idea of this jira is to discuss and implement an efficient recursive 
directory iteration mechanism that satisfies the storage policy for all the 
files under the given directory.
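
A minimal sketch of such an iteration, assuming a local filesystem accessed 
through java.nio.file instead of the HDFS namespace. The names 
RecursiveFileCollector and collectFiles are hypothetical. It uses an explicit 
stack rather than recursion, so that very deep directory trees do not exhaust 
the call stack.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of HDFS: walks a local directory tree.
final class RecursiveFileCollector {
  private RecursiveFileCollector() {}

  /** Returns every non-directory entry under root, at any depth. */
  static List<Path> collectFiles(Path root) throws IOException {
    List<Path> files = new ArrayList<>();
    Deque<Path> pending = new ArrayDeque<>();
    pending.push(root);
    while (!pending.isEmpty()) {
      Path dir = pending.pop();
      // List one directory at a time; sub-directories are queued for later.
      try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
        for (Path child : children) {
          if (Files.isDirectory(child)) {
            pending.push(child);
          } else {
            files.add(child);
          }
        }
      }
    }
    return files;
  }
}
```

In SPS each collected file would then be queued for a storage policy check. 
That step is left out here.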






[jira] [Created] (HDFS-12228) [SPS]: Add storage policy satisfier related metrics

2017-07-29 Thread Rakesh R (JIRA)
Rakesh R created HDFS-12228:
---

 Summary: [SPS]: Add storage policy satisfier related metrics
 Key: HDFS-12228
 URL: https://issues.apache.org/jira/browse/HDFS-12228
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R


This jira is to discuss and implement the metrics needed for the SPS feature.

Below are a few metrics:
# count of {{inprogress}} block movements
# count of {{successful}} block movements
# count of {{failed}} block movements

More metrics need to be analysed and added.
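
As a rough illustration of the three counters listed above, here is a small 
sketch. The class SpsMovementMetrics and its method names are hypothetical. 
A real implementation would more likely register the counters with Hadoop's 
metrics framework than keep them in a plain object.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical holder for the three counters; names are illustrative only.
final class SpsMovementMetrics {
  private final LongAdder inProgress = new LongAdder();
  private final LongAdder successful = new LongAdder();
  private final LongAdder failed = new LongAdder();

  /** Call when a block movement is scheduled. */
  void movementStarted() {
    inProgress.increment();
  }

  /** Call when a block movement ends, successfully or not. */
  void movementFinished(boolean success) {
    inProgress.decrement();
    if (success) {
      successful.increment();
    } else {
      failed.increment();
    }
  }

  long inProgressCount() { return inProgress.sum(); }
  long successfulCount() { return successful.sum(); }
  long failedCount() { return failed.sum(); }
}
```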







[jira] [Created] (HDFS-12227) Add throttling to control the number of concurrent moves at the datanode

2017-07-29 Thread Rakesh R (JIRA)
Rakesh R created HDFS-12227:
---

 Summary: Add throttling to control the number of concurrent moves 
at the datanode
 Key: HDFS-12227
 URL: https://issues.apache.org/jira/browse/HDFS-12227
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Rakesh R









[jira] [Created] (HDFS-12226) Follow-on work for Storage Policy Satisfier in Namenode

2017-07-29 Thread Rakesh R (JIRA)
Rakesh R created HDFS-12226:
---

 Summary: Follow-on work for Storage Policy Satisfier in Namenode
 Key: HDFS-12226
 URL: https://issues.apache.org/jira/browse/HDFS-12226
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Reporter: Rakesh R


This is a follow-up jira for the HDFS-10285 Storage Policy Satisfier feature.






[jira] [Created] (HDFS-12214) Rename configuration property 'dfs.storage.policy.satisfier.activate' to 'dfs.storage.policy.satisfier.enable'

2017-07-28 Thread Rakesh R (JIRA)
Rakesh R created HDFS-12214:
---

 Summary: Rename configuration property 
'dfs.storage.policy.satisfier.activate' to 'dfs.storage.policy.satisfier.enable'
 Key: HDFS-12214
 URL: https://issues.apache.org/jira/browse/HDFS-12214
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This sub-task is to address [~andrew.wang]'s review comments. Please refer to the 
[review 
comment|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16103734&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16103734]
 in the HDFS-10285 umbrella jira.






[jira] [Created] (HDFS-12152) [SPS]: Re-arrange StoragePolicySatisfyWorker stopping sequence to improve thread cleanup time

2017-07-17 Thread Rakesh R (JIRA)
Rakesh R created HDFS-12152:
---

 Summary: [SPS]: Re-arrange StoragePolicySatisfyWorker stopping 
sequence to improve thread cleanup time
 Key: HDFS-12152
 URL: https://issues.apache.org/jira/browse/HDFS-12152
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to rearrange the StoragePolicySatisfyWorker#stop sequence of steps 
so that thread interruption and graceful shutdown complete quickly.

I have observed that the 
[TestDataNodeUUID#testUUIDRegeneration|https://builds.apache.org/job/PreCommit-HDFS-Build/20271/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeUUID/testUUIDRegeneration/]
 test case is timing out frequently. On analysis, it looks like the 
function below always waits for the full 3 seconds. We could probably 
improve the thread interruption sequence so that the thread finishes its #run 
method quickly.

{code}
StoragePolicySatisfyWorker.java

  void waitToFinishWorkerThread() {
    try {
      movementTrackerThread.join(3000);
    } catch (InterruptedException ignore) {
      // ignore
    }
  }
{code}
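
One possible ordering is sketched below: interrupt the tracker thread first 
and only then join it, so the join returns as soon as the thread notices the 
interrupt instead of waiting out the full timeout. TrackerStopSketch is a 
hypothetical stand-in, not the actual StoragePolicySatisfyWorker code, and the 
sleeping thread only simulates a tracker that is blocked waiting for results.

```java
// Hypothetical stand-in for the worker; only the stop ordering matters here.
final class TrackerStopSketch {
  private final Thread movementTrackerThread;

  TrackerStopSketch() {
    movementTrackerThread = new Thread(() -> {
      try {
        // Simulates a tracker blocked while waiting for movement results.
        Thread.sleep(60_000);
      } catch (InterruptedException e) {
        // Interrupted by stop(): fall through and let run() return.
      }
    }, "movementTracker");
    movementTrackerThread.setDaemon(true);
    movementTrackerThread.start();
  }

  /** Interrupts first, then waits; returns the elapsed milliseconds. */
  long stop() {
    long start = System.nanoTime();
    movementTrackerThread.interrupt();
    try {
      movementTrackerThread.join(3000);
    } catch (InterruptedException e) {
      // Preserve the caller's interrupt status instead of swallowing it.
      Thread.currentThread().interrupt();
    }
    return (System.nanoTime() - start) / 1_000_000;
  }

  boolean isStopped() {
    return !movementTrackerThread.isAlive();
  }
}
```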






[jira] [Created] (HDFS-12141) [SPS]: Fix checkstyle warnings

2017-07-14 Thread Rakesh R (JIRA)
Rakesh R created HDFS-12141:
---

 Summary: [SPS]: Fix checkstyle warnings
 Key: HDFS-12141
 URL: https://issues.apache.org/jira/browse/HDFS-12141
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This sub-task is to fix the applicable checkstyle warnings in HDFS-10285 
branch. Attached the checkstyle report.






[jira] [Resolved] (HDFS-8125) Erasure Coding: Expose refreshECSchemas command to reload predefined schemas

2017-01-18 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R resolved HDFS-8125.

Resolution: Not A Problem

Agreed, the way we plug in the EC policy has changed and it is now a hard-coded 
approach. This jira is not required, so I'm closing it as {{Not a 
problem}}.

> Erasure Coding: Expose refreshECSchemas command to reload predefined schemas
> 
>
> Key: HDFS-8125
> URL: https://issues.apache.org/jira/browse/HDFS-8125
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>
> This is to expose {{refreshECSchemas}} command to administrators. When 
> invoking this command it will reload predefined schemas from configuration 
> file and dynamically update the schema definitions maintained in Namenode.
> Note: For more details please refer the 
> [discussion|https://issues.apache.org/jira/browse/HDFS-7866?focusedCommentId=14489387&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14489387]
>  with [~drankye]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11248) [SPS]: Handle partial block location movements

2016-12-14 Thread Rakesh R (JIRA)
Rakesh R created HDFS-11248:
---

 Summary: [SPS]: Handle partial block location movements
 Key: HDFS-11248
 URL: https://issues.apache.org/jira/browse/HDFS-11248
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to handle partial block location movements caused by the 
unavailability of target nodes for the matching storage type. 

For example, say the only live nodes are A(disk,archive), B(disk) and 
C(disk,archive), so only A and C have the archive storage type. Say we have a 
block with locations {{A(disk), B(disk), C(disk)}}, and the user changes the 
storage policy to COLD. SPS then internally prepares the src-target pairing as 
{{src=> (A, B, C) and target=> (A, C)}} and sends BLOCK_STORAGE_MOVEMENT to the 
coordinator. SPS skips B as a target because it doesn't have archive media, 
which is meant to indicate that retries are needed after some time to satisfy 
all block locations. On receiving the movement command, the coordinator pairs 
the src-target nodes to schedule the actual physical movements as 
{{movetask=> (A, A), (B, C)}}. Ideally it should do {{(C, C)}} instead of 
{{(B, C)}}, but the source for C is chosen mistakenly, which creates a problem.

IMHO, the implicit assumption that a retry is needed creates confusion and 
leads to coding mistakes. One idea to fix this problem is to add a new 
{{retryNeeded}} flag to make it more readable. With this, SPS will prepare only 
the matching pairs and avoid dummy source slots, like {{src=> (A, C) 
and target=> (A, C)}}, and mark {{retryNeeded=true}} to convey that 
this {{trackId}} has only partial block movements.
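
The proposal can be sketched roughly as follows. The class 
PartialMovementPlan is hypothetical, nodes are modelled as plain strings, and 
the sketch assumes a source is paired only with a target on the same node, 
which matches the example above but simplifies real target selection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposal; names are illustrative only.
final class PartialMovementPlan {
  /** Each entry is {source, target}. */
  final List<String[]> pairs = new ArrayList<>();
  boolean retryNeeded;

  /**
   * Pairs a source with a target only when the source node itself offers
   * the required storage type (a simplification of target choice).
   */
  static PartialMovementPlan prepare(List<String> sources,
                                     Set<String> nodesWithRequiredType) {
    PartialMovementPlan plan = new PartialMovementPlan();
    for (String src : sources) {
      if (nodesWithRequiredType.contains(src)) {
        plan.pairs.add(new String[] {src, src});
      } else {
        // No dummy slot is emitted; the flag records the unmet location.
        plan.retryNeeded = true;
      }
    }
    return plan;
  }
}
```

With sources (A, B, C) and archive available on A and C, this yields the pairs 
(A, A) and (C, C) with retryNeeded set to true.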






[jira] [Resolved] (HDFS-7955) Improve naming of classes, methods, and variables related to block replication and recovery

2016-11-30 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R resolved HDFS-7955.

  Resolution: Fixed
Target Version/s: 3.0.0-alpha2

Thank you [~zhz], [~andrew.wang], [~szetszwo], [~umamaheswararao], [~drankye] 
and all others for the great support!

This umbrella jira has covered most of the desired work, so I'm closing it. 
IMHO, if needed we could create sub-task(s) under the HDFS-8031 erasure coding 
follow-on umbrella jira and work on them there. Thanks!

> Improve naming of classes, methods, and variables related to block 
> replication and recovery
> ---
>
> Key: HDFS-7955
> URL: https://issues.apache.org/jira/browse/HDFS-7955
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: Zhe Zhang
>Assignee: Rakesh R
> Attachments: HDFS-7955-001.patch, HDFS-7955-002.patch, 
> HDFS-7955-003.patch, HDFS-7955-004.patch, HDFS-7955-5.patch
>
>
> Many existing names should be revised to avoid confusion when blocks can be 
> both replicated and erasure coded. This JIRA aims to solicit opinions on 
> making those names more consistent and intuitive.
> # In current HDFS _block recovery_ refers to the process of finalizing the 
> last block of a file, triggered by _lease recovery_. It is different from the 
> intuitive meaning of _recovering a lost block_. To avoid confusion, I can 
> think of 2 options:
> #* Rename this process as _block finalization_ or _block completion_. I 
> prefer this option because this is literally not a recovery.
> #* If we want to keep existing terms unchanged we can name all EC recovery 
> and re-replication logics as _reconstruction_.  
> # As Kai [suggested | 
> https://issues.apache.org/jira/browse/HDFS-7369?focusedCommentId=14361131&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14361131]
>  under HDFS-7369, several replication-based names should be made more generic:
> #* {{UnderReplicatedBlocks}} and {{neededReplications}}. E.g. we can use 
> {{LowRedundancyBlocks}}/{{AtRiskBlocks}}, and 
> {{neededRecovery}}/{{neededReconstruction}}.
> #* {{PendingReplicationBlocks}}
> #* {{ReplicationMonitor}}
> I'm sure the above list is incomplete; discussions and comments are very 
> welcome.






[jira] [Created] (HDFS-11193) [SPS]: Erasure coded files should be considered for satisfying storage policy

2016-11-30 Thread Rakesh R (JIRA)
Rakesh R created HDFS-11193:
---

 Summary: [SPS]: Erasure coded files should be considered for 
satisfying storage policy
 Key: HDFS-11193
 URL: https://issues.apache.org/jira/browse/HDFS-11193
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Rakesh R
Assignee: Rakesh R


Erasure coded striped files support the storage policies {{HOT, COLD, ALLSSD}}. 
A {{HdfsAdmin#satisfyStoragePolicy}} API call on a directory should consider all 
immediate files under that directory and check whether the files really 
match the namespace storage policy. All the mismatched striped blocks 
should be chosen for block movement.






[jira] [Created] (HDFS-11164) Mover should avoid unnecessary retries if the block is pinned

2016-11-21 Thread Rakesh R (JIRA)
Rakesh R created HDFS-11164:
---

 Summary: Mover should avoid unnecessary retries if the block is 
pinned
 Key: HDFS-11164
 URL: https://issues.apache.org/jira/browse/HDFS-11164
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Rakesh R
Assignee: Rakesh R


When the mover tries to move a pinned block to another datanode, it 
internally hits the following IOException and marks the block movement as 
{{failure}}. Since the Mover has the {{dfs.mover.retry.max.attempts}} config, it 
will keep trying to move this block until it reaches {{retryMaxAttempts}}. This 
retry is unnecessary, and it would be good to avoid the retry attempts as a 
pinned block cannot be moved.

{code}
2016-11-22 10:56:10,537 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: 
Failed to move blk_1073741825_1001 with size=52 from 127.0.0.1:19501:DISK to 
127.0.0.1:19758:ARCHIVE through 127.0.0.1:19501
java.io.IOException: Got error, status=ERROR, status message opReplaceBlock 
BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 received 
exception java.io.IOException: Got error, status=ERROR, status message Not able 
to copy block 1073741825 to /127.0.0.1:19826 because it's pinned , copy block 
BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 from 
/127.0.0.1:19501, reportedBlock move is failed
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:118)
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:417)
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:358)
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$5(Dispatcher.java:322)
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:1075)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
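
A minimal sketch of the intended behaviour, assuming the pinned case can be 
told apart from other failures. The log above only shows a generic 
IOException, so the dedicated BlockPinnedException type, the MoveAttempt 
interface and moveWithRetries are all hypothetical and not the Mover's actual 
code.

```java
import java.io.IOException;

// Hypothetical types: a dedicated exception for pinned blocks is an
// assumption of this sketch, not something the Dispatcher is known to throw.
final class PinnedAwareRetry {
  static final class BlockPinnedException extends IOException {
    BlockPinnedException(String msg) {
      super(msg);
    }
  }

  interface MoveAttempt {
    void run() throws IOException;
  }

  /** Returns the number of attempts made before success or giving up. */
  static int moveWithRetries(MoveAttempt attempt, int retryMaxAttempts) {
    int attempts = 0;
    while (attempts < retryMaxAttempts) {
      attempts++;
      try {
        attempt.run();
        return attempts; // moved successfully
      } catch (BlockPinnedException e) {
        return attempts; // pinned: retrying cannot succeed, so stop now
      } catch (IOException e) {
        // Other failures may be transient, so loop and try again.
      }
    }
    return attempts;
  }
}
```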






[jira] [Created] (HDFS-11151) [SPS]: Handle unable to choose target node for the required storage type by StoragePolicySatisfier

2016-11-16 Thread Rakesh R (JIRA)
Rakesh R created HDFS-11151:
---

 Summary: [SPS]: Handle unable to choose target node for the 
required storage type by StoragePolicySatisfier
 Key: HDFS-11151
 URL: https://issues.apache.org/jira/browse/HDFS-11151
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


Presently SPS does not handle the case where it fails to choose a target node 
for the required storage type. In general, there are two cases:

# For the given path, no target node can be found for any of its blocks or 
block locations (src nodes). This means no block movement will be scheduled 
against this path.
# For the given path, target nodes are available for only a few block 
locations (source nodes). This means only some of the blocks or block locations 
(src nodes) under the given path will be scheduled for block movement.






[jira] [Created] (HDFS-11125) [SPS]: Use smaller batches of BlockMovingInfo into the block storage movement command

2016-11-10 Thread Rakesh R (JIRA)
Rakesh R created HDFS-11125:
---

 Summary: [SPS]: Use smaller batches of BlockMovingInfo into the 
block storage movement command
 Key: HDFS-11125
 URL: https://issues.apache.org/jira/browse/HDFS-11125
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This is a follow-up task of HDFS-11068, which sends all the blocks under a 
trackID in a single heartbeat response (DNA_BLOCK_STORAGE_MOVEMENT command). If 
there are many blocks under a given trackID (for example, a file containing 
many blocks), then those requests go across the network with a lot of overhead. 
In this jira, we will discuss and implement a mechanism to split the list of 
items into smaller batches within a trackID.
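
The batching itself could look something like the sketch below. The helper 
BlockMovingBatcher is hypothetical, and how the batch size would be chosen or 
configured is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the batch size and its configuration are assumptions.
final class BlockMovingBatcher {
  private BlockMovingBatcher() {}

  /** Splits items into consecutive batches of at most batchSize elements. */
  static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int from = 0; from < items.size(); from += batchSize) {
      int to = Math.min(from + batchSize, items.size());
      // Copy the slice so each batch is independent of the source list.
      batches.add(new ArrayList<>(items.subList(from, to)));
    }
    return batches;
  }
}
```

Each batch would then go out in its own command for the same trackID, so a 
single heartbeat response never carries the whole list.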






[jira] [Created] (HDFS-11113) Document dfs.client.read.striped configuration in hdfs-default.xml

2016-11-05 Thread Rakesh R (JIRA)
Rakesh R created HDFS-11113:
---

 Summary: Document dfs.client.read.striped configuration in 
hdfs-default.xml
 Key: HDFS-11113
 URL: https://issues.apache.org/jira/browse/HDFS-11113
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: documentation, hdfs-client
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor


{{dfs.client.read.striped.threadpool.size}} should be covered in 
hdfs-default.xml.







[jira] [Created] (HDFS-11082) Erasure Coding : Provide replicated EC policy to just replicating the files

2016-11-01 Thread Rakesh R (JIRA)
Rakesh R created HDFS-11082:
---

 Summary: Erasure Coding : Provide replicated EC policy to just 
replicating the files
 Key: HDFS-11082
 URL: https://issues.apache.org/jira/browse/HDFS-11082
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: erasure-coding
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to provide a new {{replicated EC policy}} so that we 
can override the EC policy on a parent directory and go back to just 
replicating the files based on replication factors.

Thanks [~andrew.wang] for the 
[discussions|https://issues.apache.org/jira/browse/HDFS-11072?focusedCommentId=15620743&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15620743].






[jira] [Created] (HDFS-11068) [SPS]: Provide unique trackID to track the block movement sends to coordinator

2016-10-27 Thread Rakesh R (JIRA)
Rakesh R created HDFS-11068:
---

 Summary: [SPS]: Provide unique trackID to track the block movement 
sends to coordinator
 Key: HDFS-11068
 URL: https://issues.apache.org/jira/browse/HDFS-11068
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


Presently DatanodeManager uses the constant value -1 as the 
[trackID|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1607],
 which is a temporary value. As per the discussion with [~umamaheswararao], one 
proposal is to use {{BlockCollectionId/InodeFileId}}.






[jira] [Created] (HDFS-11032) [SPS]: Handling of block movement failure at the coordinator datanode

2016-10-19 Thread Rakesh R (JIRA)
Rakesh R created HDFS-11032:
---

 Summary: [SPS]: Handling of block movement failure at the 
coordinator datanode
 Key: HDFS-11032
 URL: https://issues.apache.org/jira/browse/HDFS-11032
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to discuss and implement efficient failure (block 
movement failure) handling logic at the datanode coordinator.  [Code 
reference|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L243].

The following errors are possible during block movement:
# Network errors (IOException) - provide retries (maybe a hard-coded 2 
retries) if the block storage movement fails due to network errors. If it 
still ends up with errors after 2 retries, mark it as failure/retry to NN.
# No disk space (IOException) - no retries; mark as failure/retry to NN.
# Block pinned - no retries; mark as success/no-retry to NN. It is not 
possible to relocate this block to another datanode.
# Gen_Stamp mismatches - no retries; mark as failure/retry to NN. It could be 
that the file has been re-opened.
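
The four cases above can be written down as a small decision table. This is 
only a sketch of the proposal: the class MovementFailurePolicy, its enums and 
its method names are hypothetical and do not exist in the branch.

```java
// Hypothetical enums mirroring the four cases listed above.
final class MovementFailurePolicy {
  enum Failure { NETWORK_ERROR, NO_DISK_SPACE, BLOCK_PINNED, GEN_STAMP_MISMATCH }

  enum Report { SUCCESS_NO_RETRY, FAILURE_RETRY }

  /** The "hard-coded 2 retries" suggested for network errors. */
  static final int MAX_LOCAL_NETWORK_RETRIES = 2;

  private MovementFailurePolicy() {}

  /** How many times the coordinator retries locally before reporting. */
  static int localRetries(Failure failure) {
    return failure == Failure.NETWORK_ERROR ? MAX_LOCAL_NETWORK_RETRIES : 0;
  }

  /** What the coordinator reports to the NN once local retries are spent. */
  static Report reportToNamenode(Failure failure) {
    // A pinned block can never move, so asking the NN to retry is pointless.
    return failure == Failure.BLOCK_PINNED
        ? Report.SUCCESS_NO_RETRY
        : Report.FAILURE_RETRY;
  }
}
```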






[jira] [Resolved] (HDFS-8331) Erasure Coding: Create FileStatus isErasureCoded() method

2016-10-13 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R resolved HDFS-8331.

Resolution: Duplicate

> Erasure Coding: Create FileStatus isErasureCoded() method
> -
>
> Key: HDFS-8331
> URL: https://issues.apache.org/jira/browse/HDFS-8331
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>
> The idea of this jira is to discuss the need of 
> {{FileStatus#isErasureCoded()}} API. This is just an initial thought, 
> presently the use case/necessity of this is not clear now. Probably will 
> revisit this once the feature is getting matured.  
> Thanks [~umamaheswararao], [~vinayrpet] , [~zhz] for the offline discussions.






[jira] [Created] (HDFS-10954) [SPS]: Report the failed block movement results back to NN from DN

2016-10-04 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10954:
---

 Summary: [SPS]: Report the failed block movement results back to 
NN from DN
 Key: HDFS-10954
 URL: https://issues.apache.org/jira/browse/HDFS-10954
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This jira is a follow-up task of HDFS-10884. HDFS-10884 provides a mechanism 
to collect all the failed block movement results at the 
{{co-ordinator datanode}} side. The idea of this jira is to discuss an 
efficient way to report these failed block movement results to the namenode, so 
that the NN can take the necessary action based on this information.






[jira] [Created] (HDFS-10920) TestStorageMover#testNoSpaceDisk is failing intermittently

2016-09-28 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10920:
---

 Summary: TestStorageMover#testNoSpaceDisk is failing intermittently
 Key: HDFS-10920
 URL: https://issues.apache.org/jira/browse/HDFS-10920
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Rakesh R
Assignee: Rakesh R


TestStorageMover#testNoSpaceDisk test case is failing frequently in the build.

References:
[HDFS-Build_16890|https://builds.apache.org/job/PreCommit-HDFS-Build/16890], 
[HDFS-Build_16895|https://builds.apache.org/job/PreCommit-HDFS-Build/16895]







[jira] [Created] (HDFS-10884) [SPS]: Add block movement tracker to track the completion of block movement future tasks at DN

2016-09-21 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10884:
---

 Summary: [SPS]: Add block movement tracker to track the completion 
of block movement future tasks at DN
 Key: HDFS-10884
 URL: https://issues.apache.org/jira/browse/HDFS-10884
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


Presently the 
[StoragePolicySatisfyWorker#processBlockMovingTasks()|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L147]
 function acts as a blocking call. The idea of this jira is to implement a 
mechanism to track these movements asynchronously, so that other movements 
can proceed while the previous one is being processed.
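
One way to track movements without blocking the submitter is the JDK's 
ExecutorCompletionService, sketched below. The class AsyncMovementTracker is 
hypothetical, block IDs are modelled as plain longs, and the pool size of 4 is 
an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical tracker; block IDs are plain longs and "moves" are callables.
final class AsyncMovementTracker {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final CompletionService<Long> completed =
      new ExecutorCompletionService<>(pool);
  private int submitted;

  /** Queues a movement and returns immediately. */
  void submit(Callable<Long> movement) {
    completed.submit(movement);
    submitted++;
  }

  /** Drains results in completion order; call once submissions are done. */
  List<Long> awaitAll() {
    List<Long> movedBlockIds = new ArrayList<>();
    try {
      for (int i = 0; i < submitted; i++) {
        try {
          movedBlockIds.add(completed.take().get());
        } catch (ExecutionException e) {
          // A failed movement is skipped here; a real tracker would record it.
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // stop draining, keep the flag set
    } finally {
      pool.shutdown();
    }
    return movedBlockIds;
  }
}
```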






[jira] [Created] (HDFS-10794) Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work

2016-08-24 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10794:
---

 Summary: Provide storage policy satisfy worker at DN for 
co-ordinating the block storage movement work
 Key: HDFS-10794
 URL: https://issues.apache.org/jira/browse/HDFS-10794
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to implement a mechanism to move the blocks to the 
given target in order to satisfy the block storage policy. The datanode receives 
{{blocktomove}} details via the heartbeat response from the NN. More 
specifically, it's a datanode side extension to handle the block storage 
movement commands.






[jira] [Created] (HDFS-10720) Fix intermittent test failure of TestDataNodeErasureCodingMetrics#testEcTasks

2016-08-03 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10720:
---

 Summary: Fix intermittent test failure of 
TestDataNodeErasureCodingMetrics#testEcTasks
 Key: HDFS-10720
 URL: https://issues.apache.org/jira/browse/HDFS-10720
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R


The test wrongly chooses the datanode to be corrupted. Instead of finding a 
datanode that is used in the block locations, it simply gets a datanode from 
the cluster, which may not be a datanode present in the block locations.
{code}
byte[] indices = lastBlock.getBlockIndices();
//corrupt the first block
DataNode toCorruptDn = cluster.getDataNodes().get(indices[0]);
{code}

For example, the datanodes in the cluster.getDataNodes() array are indexed like 
0->Dn1, 1->Dn2, 2->Dn3, 3->Dn4, 4->Dn5, 5->Dn6, 6->Dn7, 7->Dn8, 8->Dn9, 9->Dn10

Assume the datanodes that are part of the block locations are Dn2, Dn3, Dn4, 
Dn5, Dn6, Dn7, Dn8, Dn9, Dn10. In the failing scenario, the test picks the 
datanode to corrupt as cluster.getDataNodes().get(0), which is Dn1. Corrupting 
this datanode does not result in ECWork, so the test fails. 

Ideally, the test should find a datanode from the block locations and corrupt 
it, which will trigger ECWork.
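
The difference between the two choices can be shown with plain lists. In this 
sketch datanodes are just names, not MiniDFSCluster objects, and the class 
CorruptTargetChooser is hypothetical.

```java
import java.util.List;

// Hypothetical simplification: datanodes are names, not real DataNode objects.
final class CorruptTargetChooser {
  private CorruptTargetChooser() {}

  /** Buggy choice: treats a block index as a position in the cluster list. */
  static String byClusterIndex(List<String> clusterDataNodes, byte[] blockIndices) {
    return clusterDataNodes.get(blockIndices[0]);
  }

  /** Intended choice: takes the node that actually stores the first block. */
  static String byBlockLocation(List<String> blockLocations) {
    return blockLocations.get(0);
  }
}
```

With the cluster Dn1..Dn10 and block locations Dn2..Dn10, the first method 
returns Dn1, which holds no block of the file, while the second returns Dn2.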






[jira] [Created] (HDFS-10660) Expose storage policy apis via HDFSAdmin interface

2016-07-20 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10660:
---

 Summary: Expose storage policy apis via HDFSAdmin interface
 Key: HDFS-10660
 URL: https://issues.apache.org/jira/browse/HDFS-10660
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Rakesh R
Assignee: Rakesh R


Presently, {{org.apache.hadoop.hdfs.client.HdfsAdmin.java}} interface has only 
{{#setStoragePolicy()}} API exposed. This jira is to add the following set of 
apis into HdfsAdmin.

{code}
HdfsAdmin#unsetStoragePolicy
HdfsAdmin#getStoragePolicy
HdfsAdmin#getAllStoragePolicies
{code}

Thanks [~arpitagarwal] for the offline discussions.






[jira] [Created] (HDFS-10592) Fix intermittent test failure of TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning

2016-07-04 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10592:
---

 Summary: Fix intermittent test failure of 
TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning
 Key: HDFS-10592
 URL: https://issues.apache.org/jira/browse/HDFS-10592
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to fix the 
{{TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning}} 
test case failure.

Reference 
[Build_15973|https://builds.apache.org/job/PreCommit-HDFS-Build/15973/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestNameNodeResourceChecker/testCheckThatNameNodeResourceMonitorIsRunning/]






[jira] [Created] (HDFS-10590) Fix TestReconstructStripedBlocks.testCountLiveReplicas test failures

2016-07-03 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10590:
---

 Summary: Fix TestReconstructStripedBlocks.testCountLiveReplicas 
test failures
 Key: HDFS-10590
 URL: https://issues.apache.org/jira/browse/HDFS-10590
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to fix the test case failure. Please see the below stacktrace.

Reference : 
[Build_15968|https://builds.apache.org/job/PreCommit-HDFS-Build/15968/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testCountLiveReplicas/]

{code}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at org.apache.hadoop.hdfs.server.namenode.TestReconstructStripedBlocks.testCountLiveReplicas(TestReconstructStripedBlocks.java:324)
{code}






[jira] [Created] (HDFS-10584) Allow long-running Mover tool to login with keytab

2016-06-27 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10584:
---

 Summary: Allow long-running Mover tool to login with keytab
 Key: HDFS-10584
 URL: https://issues.apache.org/jira/browse/HDFS-10584
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: balancer & mover
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to give the {{mover}} tool the ability to log in from a 
keytab. That way, the RPC client would re-login from the keytab after 
expiration, which means the process could remain authenticated indefinitely. 
With some people wanting to run the mover non-stop in "daemon mode", that might 
be a reasonable feature to add. The balancer was recently enhanced with this 
feature.

Thanks [~zhz] for the offline discussions.






[jira] [Created] (HDFS-10461) Erasure Coding: Optimize block checksum recalculation logic on the fly by reconstructing multiple missed blocks at a time

2016-05-25 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10461:
---

 Summary: Erasure Coding: Optimize block checksum recalculation 
logic on the fly by reconstructing multiple missed blocks at a time
 Key: HDFS-10461
 URL: https://issues.apache.org/jira/browse/HDFS-10461
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This is a follow-on task to HDFS-9833, which recomputes only one block 
checksum at a time. The reconstruction logic can be further optimized by 
reconstructing multiple blocks at a time.

Several cases need to be considered, for example:

case-1) Live block indices: {{0, 4, 5, 6, 7, 8}} - consecutive missing data blocks 1, 2, 3
case-2) Live block indices: {{0, 2, 4, 6, 7, 8}} - non-consecutive missing data blocks 1, 3, 5
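
The two cases above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and it assumes a layout with 6 data blocks (indices 0-5) followed by parity blocks (indices 6-8), which matches the index ranges in the examples.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HDFS.
final class MissingDataBlocks {
  /** Returns the data-block indices in [0, numDataBlocks) that are absent from liveIndices. */
  static List<Integer> missingDataIndices(int[] liveIndices, int numDataBlocks) {
    boolean[] live = new boolean[numDataBlocks];
    for (int idx : liveIndices) {
      if (idx >= 0 && idx < numDataBlocks) {
        live[idx] = true; // parity indices (>= numDataBlocks) are ignored here
      }
    }
    List<Integer> missing = new ArrayList<>();
    for (int i = 0; i < numDataBlocks; i++) {
      if (!live[i]) {
        missing.add(i);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    System.out.println(missingDataIndices(new int[] {0, 4, 5, 6, 7, 8}, 6)); // case-1
    System.out.println(missingDataIndices(new int[] {0, 2, 4, 6, 7, 8}, 6)); // case-2
  }
}
```

All missing indices are collected in one pass, so a single reconstruction could target them together rather than one block at a time.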







[jira] [Created] (HDFS-10460) Erasure Coding: Recompute block checksum for a particular range less than file size on the fly by reconstructing missed block

2016-05-25 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10460:
---

 Summary: Erasure Coding: Recompute block checksum for a particular 
range less than file size on the fly by reconstructing missed block
 Key: HDFS-10460
 URL: https://issues.apache.org/jira/browse/HDFS-10460
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Rakesh R
Assignee: Rakesh R


This jira is a follow-on task to HDFS-9833. It addresses reconstructing a missed block and then 
recalculating the block checksum for a particular range query.

For example,
{code}
// create a file 'stripedFile1' with
// fileSize = cellSize * numDataBlocks = 65536 * 6 = 393216
FileChecksum stripedFileChecksum = getFileChecksum(stripedFile1, 10, true);
{code}
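
To see why a short range matters, here is a rough sketch of how many internal data blocks a range starting at offset 0 touches. It assumes cells are striped round-robin across the data blocks and that the second argument in the example above is the requested range length; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of HDFS.
final class StripedRange {
  /** Number of distinct data blocks holding at least one of the first `length` bytes. */
  static int dataBlocksTouched(long length, int cellSize, int numDataBlocks) {
    if (length <= 0) {
      return 0;
    }
    long cells = (length + cellSize - 1) / cellSize; // ceil(length / cellSize)
    return (int) Math.min(cells, numDataBlocks);
  }

  public static void main(String[] args) {
    System.out.println(dataBlocksTouched(10, 65536, 6));     // range of 10 bytes
    System.out.println(dataBlocksTouched(393216, 65536, 6)); // whole file
  }
}
```

With a range of 10 bytes only the first internal block is involved, so only that block would need to be reconstructed if it is missing.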






[jira] [Created] (HDFS-10434) Fix intermittent test failure of TestDataNodeErasureCodingMetrics

2016-05-19 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10434:
---

 Summary: Fix intermittent test failure of 
TestDataNodeErasureCodingMetrics
 Key: HDFS-10434
 URL: https://issues.apache.org/jira/browse/HDFS-10434
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to fix the test case failure.

Reference : 
[Build15485_TestDataNodeErasureCodingMetrics_testEcTasks|https://builds.apache.org/job/PreCommit-HDFS-Build/15485/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeErasureCodingMetrics/testEcTasks/]

{code}
Error Message

Bad value for metric EcReconstructionTasks expected:<1> but was:<0>
Stacktrace

java.lang.AssertionError: Bad value for metric EcReconstructionTasks expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:228)
at org.apache.hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics.testEcTasks(TestDataNodeErasureCodingMetrics.java:92)
{code}






[jira] [Created] (HDFS-10407) Erasure Coding: Rename CorruptReplicasMap to CorruptRedundancyMap in BlockManager to more generic

2016-05-15 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10407:
---

 Summary: Erasure Coding: Rename CorruptReplicasMap to 
CorruptRedundancyMap in BlockManager to more generic
 Key: HDFS-10407
 URL: https://issues.apache.org/jira/browse/HDFS-10407
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to rename the following entity in BlockManager,

- {{CorruptReplicasMap}} to {{CorruptRedundancyMap}}






[jira] [Created] (HDFS-10368) Erasure Coding: Deprecate replication-related config keys

2016-05-05 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10368:
---

 Summary: Erasure Coding: Deprecate replication-related config keys
 Key: HDFS-10368
 URL: https://issues.apache.org/jira/browse/HDFS-10368
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to review the replication-based config keys and deprecate them.

Please refer to the [discussion thread|https://issues.apache.org/jira/browse/HDFS-9869?focusedCommentId=15249363&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15249363].






[jira] [Created] (HDFS-10308) TestRetryCacheWithHA#testRetryCacheOnStandbyNN failing

2016-04-19 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10308:
---

 Summary: TestRetryCacheWithHA#testRetryCacheOnStandbyNN failing
 Key: HDFS-10308
 URL: https://issues.apache.org/jira/browse/HDFS-10308
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Rakesh R
Assignee: Rakesh R


It is failing with the following exception:
{code}
java.lang.AssertionError: expected:<25> but was:<26>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testRetryCacheOnStandbyNN(TestRetryCacheWithHA.java:169)
{code}





[jira] [Created] (HDFS-10236) Erasure Coding: Rename replication-based names in BlockManager to more generic [part-3]

2016-03-30 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10236:
---

 Summary: Erasure Coding: Rename replication-based names in 
BlockManager to more generic [part-3]
 Key: HDFS-10236
 URL: https://issues.apache.org/jira/browse/HDFS-10236
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to rename the following entity in BlockManager as,

{{getExpectedReplicaNum}} to {{getExpectedRedundancyNum}}





[jira] [Created] (HDFS-10186) DirectoryScanner: Improve logs by adding full path of both actual and expected block directories

2016-03-20 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10186:
---

 Summary: DirectoryScanner: Improve logs by adding full path of 
both actual and expected block directories
 Key: HDFS-10186
 URL: https://issues.apache.org/jira/browse/HDFS-10186
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor


As per the 
[discussion|https://issues.apache.org/jira/browse/HDFS-7648?focusedCommentId=15195908&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15195908],
 this jira is to improve the directory scanner log by adding the full paths of both 
the actual and the expected block directories, so that admins can take the necessary actions.





[jira] [Created] (HDFS-9918) Erasure Coding : sort located striped blocks based on decommissioned states

2016-03-08 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9918:
--

 Summary: Erasure Coding : sort located striped blocks based on 
decommissioned states
 Key: HDFS-9918
 URL: https://issues.apache.org/jira/browse/HDFS-9918
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This jira is follow-on work to HDFS-8786, which handles decommissioning of 
datanodes that hold striped blocks.

After decommissioning, the storage list needs to be reordered so that the 
decommissioned datanodes appear only at the end of the list.

For example, assume we have a block group with storage list:-
d0, d1, d2, d3, d4, d5, d6, d7, d8, d9
mapping to indices
0, 1, 2, 3, 4, 5, 6, 7, 8, 2

Here the internal block b2 is duplicated: it is located on both d2 and d9. If d2 is a 
decommissioning node, then d2 and d9 should be swapped in the storage list.
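
A minimal sketch of the swap in the example above, using plain arrays instead of the real HDFS storage types. The class name, method name and the parallel-array representation are all assumptions made for illustration.

```java
// Hypothetical helper, not part of HDFS.
final class DecommissionSwap {
  /**
   * For every decommissioning storage, looks for a later, non-decommissioning
   * storage holding the same internal block index and swaps the two entries.
   */
  static void swapWithLiveDuplicate(String[] storages, int[] blockIndices, boolean[] decommissioning) {
    for (int p = 0; p < storages.length; p++) {
      if (!decommissioning[p]) {
        continue;
      }
      for (int q = p + 1; q < storages.length; q++) {
        if (blockIndices[q] == blockIndices[p] && !decommissioning[q]) {
          String s = storages[p];
          storages[p] = storages[q];
          storages[q] = s;
          // the block indices at p and q are equal, so only the flags need swapping
          decommissioning[p] = false;
          decommissioning[q] = true;
          break;
        }
      }
    }
  }
}
```

For the storage list d0..d9 with indices 0, 1, 2, 3, 4, 5, 6, 7, 8, 2 and d2 decommissioning, this leaves d9 at position 2 and d2 at the end of the list.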

Thanks [~jingzhao] for the 
[discussions|https://issues.apache.org/jira/browse/HDFS-8786?focusedCommentId=15180415&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180415]





[jira] [Created] (HDFS-9879) Erasure Coding : schedule striped blocks to be cached on DataNodes

2016-02-29 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9879:
--

 Summary: Erasure Coding : schedule striped blocks to be cached on 
DataNodes
 Key: HDFS-9879
 URL: https://issues.apache.org/jira/browse/HDFS-9879
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to discuss and implement the caching of striped block objects on the 
appropriate datanode.

Presently the code checks the block group size and schedules the blockGroupId to 
the datanode. This needs to be refined to check 
{{StripedBlockUtil.getInternalBlockLength()}} and schedule the proper blockId to 
the datanode.
{code}
// CacheReplicationMonitor.java

if (pendingCapacity < blockInfo.getNumBytes()) {
  LOG.trace("Block {}: DataNode {} is not a valid possibility " +
      "because the block has size {}, but the DataNode only has {} " +
      "bytes of cache remaining ({} pending bytes, {} already cached.)",
      blockInfo.getBlockId(), datanode.getDatanodeUuid(),
      blockInfo.getNumBytes(), pendingCapacity, pendingBytes,
      datanode.getCacheRemaining());
  outOfCapacity++;
  continue;
}

for (DatanodeDescriptor datanode : chosen) {
  LOG.trace("Block {}: added to PENDING_CACHED on DataNode {}",
      blockInfo.getBlockId(), datanode.getDatanodeUuid());
  pendingCached.add(datanode);
  boolean added = datanode.getPendingCached().add(cachedBlock);
  assert added;
}
{code}
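
To illustrate why the block group size is the wrong quantity to compare against a datanode's cache capacity, here is a sketch of the per-internal-block length arithmetic. It is not a copy of {{StripedBlockUtil.getInternalBlockLength()}}; the names are hypothetical, and it assumes round-robin cell striping with parity blocks as long as the longest data block.

```java
// Hypothetical sketch, not the Hadoop implementation.
final class InternalBlockLength {
  /** Length in bytes of internal block `idx` for a block group holding groupDataSize data bytes. */
  static long internalBlockLength(long groupDataSize, int cellSize, int numDataBlocks, int idx) {
    long stripeSize = (long) cellSize * numDataBlocks;
    long fullStripes = groupDataSize / stripeSize;
    long lastStripeLen = groupDataSize % stripeSize;
    long inLastStripe;
    if (idx < numDataBlocks) {
      // data block: whatever remains of the partial stripe after the earlier cells
      long remaining = lastStripeLen - (long) idx * cellSize;
      inLastStripe = Math.max(0L, Math.min((long) cellSize, remaining));
    } else {
      // parity block: assumed to match the first (longest) cell of the partial stripe
      inLastStripe = Math.min((long) cellSize, lastStripeLen);
    }
    return fullStripes * cellSize + inLastStripe;
  }
}
```

For a 393216-byte group with 65536-byte cells and 6 data blocks, each internal block is only 65536 bytes, so comparing the pending capacity against the whole group size would reject datanodes that could hold the internal block.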





[jira] [Created] (HDFS-9869) Erasure Coding: Rename replication-based names in BlockManager to more generic [part-2]

2016-02-28 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9869:
--

 Summary: Erasure Coding: Rename replication-based names in 
BlockManager to more generic [part-2]
 Key: HDFS-9869
 URL: https://issues.apache.org/jira/browse/HDFS-9869
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to rename the following entities in BlockManager as,
- {{PendingReplicationBlocks}} to {{PendingReconstructionBlocks}}
- {{excessReplicateMap}} to {{extraRedundancyMap}}





[jira] [Created] (HDFS-9857) Erasure Coding: Rename replication-based names in BlockManager to more generic

2016-02-24 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9857:
--

 Summary: Erasure Coding: Rename replication-based names in 
BlockManager to more generic
 Key: HDFS-9857
 URL: https://issues.apache.org/jira/browse/HDFS-9857
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to rename the following entities in BlockManager as,
- {{UnderReplicatedBlocks}} to {{LowRedundancyBlocks}}
- {{PendingReplicationBlocks}} to {{PendingReconstructionBlocks}}
- {{neededReplications}} to {{neededReconstruction}}
- {{excessReplicateMap}} to {{extraRedundancyMap}}

Thanks [~zhz], [~andrew.wang] for the useful 
[discussions|https://issues.apache.org/jira/browse/HDFS-7955?focusedCommentId=15149406&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15149406]





[jira] [Created] (HDFS-9829) Erasure Coding: Improve few exception handling logic of ErasureCodingWorker

2016-02-18 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9829:
--

 Summary: Erasure Coding: Improve few exception handling logic of 
ErasureCodingWorker
 Key: HDFS-9829
 URL: https://issues.apache.org/jira/browse/HDFS-9829
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor


# Cancel remaining reads on InterruptedException.
{code}
} catch (InterruptedException e) {
  LOG.info("Read data interrupted.", e);
  break;
}
{code}
# Reconstruction shouldn't fail because of an IOException raised while reporting 
corrupt blocks.
{code}
} finally {
  // report corrupted blocks to NN
  reportCorruptedBlocks(corruptionMap);
}
{code}
# Also, use {} instead of string concatenation in logger.
{code}
LOG.debug("Using striped reads; pool threads=" + num);
//...
LOG.warn("Found Checksum error for " + reader.block + " from "
+ reader.source + " at " + e.getPos());
//...
LOG.debug("Using striped block reconstruction; pool threads=" + num);
//..
LOG.warn("Failed to reconstruct striped block: " + blockGroup, e);
{code}
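
The second point above could look roughly like the following. This is a self-contained sketch with hypothetical names ({{Reporter}}, {{reconstruct}}); it only shows the pattern of containing a reporting failure inside the {{finally}} block, not the actual ErasureCodingWorker code.

```java
import java.io.IOException;

// Hypothetical sketch, not the ErasureCodingWorker code.
final class SafeCorruptionReport {
  /** Stand-in for the call that reports corrupted blocks to the NN. */
  interface Reporter {
    void report() throws IOException;
  }

  /** Runs the reconstruction work; a failing report is logged and does not fail the task. */
  static boolean reconstruct(Runnable work, Reporter reporter) {
    try {
      work.run();
      return true;
    } finally {
      try {
        reporter.report();
      } catch (IOException e) {
        // Reporting is best effort: log and continue.
        System.err.println("Failed to report corrupted blocks: " + e.getMessage());
      }
    }
  }
}
```

If the work itself throws, that exception still propagates; only the IOException from reporting is swallowed.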





[jira] [Created] (HDFS-9775) Erasure Coding : Rename BlockRecoveryWork to BlockReconstructionWork

2016-02-07 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9775:
--

 Summary: Erasure Coding : Rename BlockRecoveryWork to 
BlockReconstructionWork
 Key: HDFS-9775
 URL: https://issues.apache.org/jira/browse/HDFS-9775
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Rakesh R
Assignee: Rakesh R


This sub-task is to review the block recovery work and recast the logic as 
reconstruction, i.e., rename "recovery" to "reconstruction".





[jira] [Created] (HDFS-9731) Erasure Coding: Improve naming of classes, methods, and variables related to EC recovery

2016-02-01 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9731:
--

 Summary: Erasure Coding: Improve naming of classes, methods, and 
variables related to EC recovery
 Key: HDFS-9731
 URL: https://issues.apache.org/jira/browse/HDFS-9731
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: erasure-coding
Reporter: Rakesh R
Assignee: Rakesh R


This sub-task is to review the EC recovery logic and recast it as 
_reconstruction_, i.e., rename EC-related block repair logic to "reconstruction".





[jira] [Created] (HDFS-9472) concat() API does not resolve the .reserved path

2015-11-26 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9472:
--

 Summary: concat() API does not resolve the .reserved path
 Key: HDFS-9472
 URL: https://issues.apache.org/jira/browse/HDFS-9472
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R


The dfs#concat() API doesn't resolve the {{/.reserved/raw}} path. For example, if 
the input paths are of the form {{/.reserved/raw/ezone/a}}, then this API doesn't 
work properly. This jira is to discuss supporting this behavior.
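
As a rough illustration of the path mapping involved, the sketch below strips the {{/.reserved/raw}} prefix from a path. It is a simplification with hypothetical names: it shows only the string mapping and ignores everything else the raw view implies on the NameNode side.

```java
// Hypothetical sketch, not the NameNode path resolution code.
final class ReservedRawPath {
  static final String RAW_PREFIX = "/.reserved/raw";

  /** Maps /.reserved/raw/x/y to /x/y; other paths are returned unchanged. */
  static String resolve(String src) {
    if (src.equals(RAW_PREFIX)) {
      return "/";
    }
    if (src.startsWith(RAW_PREFIX + "/")) {
      return src.substring(RAW_PREFIX.length());
    }
    return src;
  }

  public static void main(String[] args) {
    System.out.println(resolve("/.reserved/raw/ezone/a"));
  }
}
```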





[jira] [Created] (HDFS-9435) TestBlockRecovery#testRBWReplicas is failing intermittently

2015-11-17 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9435:
--

 Summary: TestBlockRecovery#testRBWReplicas is failing 
intermittently
 Key: HDFS-9435
 URL: https://issues.apache.org/jira/browse/HDFS-9435
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R


TestBlockRecovery#testRBWReplicas is failing in the [build 
13536|https://builds.apache.org/job/PreCommit-HDFS-Build/13536/testReport/org.apache.hadoop.hdfs.server.datanode/TestBlockRecovery/testRBWReplicas/].
 It looks like a bug in the test caused by a race condition.

Note: Logs taken from the build are attached to this jira.





[jira] [Created] (HDFS-9433) DFS getEZForPath API on a non-existent file should throw FileNotFoundException

2015-11-16 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9433:
--

 Summary: DFS getEZForPath API on a non-existent file should throw 
FileNotFoundException
 Key: HDFS-9433
 URL: https://issues.apache.org/jira/browse/HDFS-9433
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: encryption
Reporter: Rakesh R
Assignee: Rakesh R


Presently the {{dfs.getEZForPath()}} API behaves differently for a non-existent 
normal file and a non-existent ezone file:

- If the user passes a normal non-existent file, it returns a null value. For 
example, {{Path("/nonexistentfile")}}
- If the user passes a non-existent file that is under an existing encryption 
zone, it returns the parent's encryption zone info. For example, 
{{Path("/ezone/nonexistentfile")}}

The proposed idea is to unify the behavior by throwing 
FileNotFoundException. Please refer to the discussion 
[thread|https://issues.apache.org/jira/browse/HDFS-9348?focusedCommentId=14983301&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14983301].
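
The proposed behavior can be sketched with an in-memory stand-in for the namespace. Everything here (class name, the set-based namespace, returning the zone root as a string) is a simplification for illustration, not the HDFS API.

```java
import java.io.FileNotFoundException;
import java.util.Set;

// Hypothetical sketch, not the HDFS API.
final class EzLookup {
  /**
   * Returns the root of the encryption zone containing path, or null if the
   * path exists outside any zone. Throws if the path does not exist at all.
   */
  static String getEZForPath(Set<String> existingPaths, Set<String> zoneRoots, String path)
      throws FileNotFoundException {
    if (!existingPaths.contains(path)) {
      // Same outcome whether or not a parent directory is an encryption zone.
      throw new FileNotFoundException("Path not found: " + path);
    }
    String best = null;
    for (String root : zoneRoots) {
      boolean inside = path.equals(root) || path.startsWith(root + "/");
      if (inside && (best == null || root.length() > best.length())) {
        best = root;
      }
    }
    return best;
  }
}
```

With this check in front, both {{/nonexistentfile}} and {{/ezone/nonexistentfile}} fail the same way.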





[jira] [Created] (HDFS-9348) DFS GetErasureCodingPolicy API on a non-existent file should be handled properly

2015-10-30 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9348:
--

 Summary: DFS GetErasureCodingPolicy API on a non-existent file 
should be handled properly
 Key: HDFS-9348
 URL: https://issues.apache.org/jira/browse/HDFS-9348
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor


Presently calling {{dfs#getErasureCodingPolicy()}} on a non-existent file 
returns the ErasureCodingPolicy info. As per the 
[discussion|https://issues.apache.org/jira/browse/HDFS-8777?focusedCommentId=14981077&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14981077]
 it has to validate the path and throw FileNotFoundException.

Also, the {{dfs#getEncryptionZoneForPath()}} API has the same behavior. We 
can discuss adding the file existence validation in that case as well.





[jira] [Created] (HDFS-9261) Erasure Coding: Skip encoding the data cells if all the parity data streamers are failed for the current block group

2015-10-16 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9261:
--

 Summary: Erasure Coding: Skip encoding the data cells if all the 
parity data streamers are failed for the current block group
 Key: HDFS-9261
 URL: https://issues.apache.org/jira/browse/HDFS-9261
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor


{{DFSStripedOutputStream}} will continue writing with the minimum number 
(dataBlockNum) of live datanodes. It won't replace the failed datanodes 
immediately for the current block group. Consider a case where all the parity 
data streamers have failed: it is then unnecessary to encode the data block cells 
and generate the parity data. In this corner case the 
{{writeParityCells()}} step can be skipped.
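
The check described above might look like this. It is a sketch with hypothetical names, assuming the streamers are indexed with data streamers first and parity streamers after them.

```java
// Hypothetical sketch, not the DFSStripedOutputStream code.
final class ParitySkip {
  /**
   * streamerHealthy has one flag per streamer: data streamers at
   * [0, numDataBlocks), parity streamers after them.
   * Returns false when every parity streamer has failed.
   */
  static boolean shouldWriteParityCells(boolean[] streamerHealthy, int numDataBlocks) {
    for (int i = numDataBlocks; i < streamerHealthy.length; i++) {
      if (streamerHealthy[i]) {
        return true;
      }
    }
    return false;
  }
}
```

A caller would encode and write the parity cells only when this returns true for the current block group.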





[jira] [Created] (HDFS-9256) Erasure Coding: Improve failure handling of ECWorker striped block reconstruction

2015-10-16 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9256:
--

 Summary: Erasure Coding: Improve failure handling of ECWorker 
striped block reconstruction
 Key: HDFS-9256
 URL: https://issues.apache.org/jira/browse/HDFS-9256
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


Reconstruction of a missed striped block is a costly operation. It 
involves the following steps:

step-1) read the data from the minimum number of sources (remotely reading the data)
step-2) decode the data for the targets (CPU cycles)
step-3) transfer the data to the targets (remotely writing the data)

Assume there is a failure in step-3 because a target DN is disconnected, dead, etc. 
Presently {{ECWorker}} skips the failed DN and continues transferring data 
to the other targets. In the next round, it has to start the 
reconstruction operation again from the first step. Considering the cost of 
reconstruction, it would be good to retry the failed 
operation. The idea of this jira is to discuss the possible approaches and 
implement one.
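
One possible approach, shown only as a sketch: keep the already decoded buffer and retry just the transfer step a bounded number of times, so that steps 1 and 2 are not repeated. The names ({{Transfer}}, {{transferWithRetry}}) and the fixed attempt limit are assumptions, not an agreed design.

```java
// Hypothetical sketch of one retry approach, not the ECWorker code.
final class TransferRetry {
  /** Stand-in for step-3: returns true when the target accepted the data. */
  interface Transfer {
    boolean send(byte[] decoded);
  }

  /** Returns the number of attempts used, or -1 if every attempt failed. */
  static int transferWithRetry(byte[] decoded, Transfer target, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      // The decoded data is reused as-is; no re-read or re-decode happens here.
      if (target.send(decoded)) {
        return attempt;
      }
    }
    return -1;
  }
}
```

A real implementation would also need to decide whether to retry against the same target or a replacement, which this sketch leaves open.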





[jira] [Created] (HDFS-9185) TestRecoverStripedFile is failing

2015-09-30 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9185:
--

 Summary: TestRecoverStripedFile is failing
 Key: HDFS-9185
 URL: https://issues.apache.org/jira/browse/HDFS-9185
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Critical


Below is the message taken from build:
{code}
Error Message

Time out waiting for EC block recovery.
Stacktrace

java.io.IOException: Time out waiting for EC block recovery.
at org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383)
at org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283)
at org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168)
{code}

Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758





[jira] [Created] (HDFS-9172) Erasure Coding: Move DFSStripedIO stream related classes to hadoop-hdfs-client

2015-09-28 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9172:
--

 Summary: Erasure Coding: Move DFSStripedIO stream related classes 
to hadoop-hdfs-client
 Key: HDFS-9172
 URL: https://issues.apache.org/jira/browse/HDFS-9172
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to move the following striped stream related classes to the 
{{hadoop-hdfs-client}} project, in line with the HDFS-6200 
proposal.

- DFSStripedInputStream
- DFSStripedOutputStream
- StripedDataStreamer





[jira] [Created] (HDFS-9091) Erasure Coding: Provide DistributedFilesystem API to getAllErasureCodingPolicies

2015-09-16 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9091:
--

 Summary: Erasure Coding: Provide DistributedFilesystem API to 
getAllErasureCodingPolicies
 Key: HDFS-9091
 URL: https://issues.apache.org/jira/browse/HDFS-9091
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to implement {{DFS#getAllErasureCodingPolicies()}}





[jira] [Created] (HDFS-8959) Provide an iterator-based API for listing all the snapshottable directories

2015-08-26 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8959:
--

 Summary: Provide an iterator-based API for listing all the 
snapshottable directories
 Key: HDFS-8959
 URL: https://issues.apache.org/jira/browse/HDFS-8959
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R


Presently {{DistributedFileSystem#getSnapshottableDirListing()}} sends the 
entire {{SnapshottableDirectoryStatus[]}} array to the client, so the client 
must have enough memory to hold it. There is a chance that 
client JVMs run out of memory because of this. Also, some time back there 
was a 
[comment|https://issues.apache.org/jira/browse/HDFS-8643?focusedCommentId=14658800&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14658800]
 about the RPC packet size limit; a large snapshot list can 
cause issues there as well.

I believe an iterator-based {{DistributedFileSystem#listSnapshottableDirs()}} API 
would be a good addition!
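
A minimal sketch of what such an iterator could do on the client side: fetch the listing in fixed-size batches and hold only one batch in memory at a time. The {{BatchSource}} interface and the offset/limit paging are assumptions for illustration; the real API may page differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch, not the HDFS client code.
final class BatchedDirIterator implements Iterator<String> {
  /** Stand-in for one RPC returning at most `limit` entries starting at `offset`. */
  interface BatchSource {
    List<String> fetch(int offset, int limit);
  }

  private final BatchSource source;
  private final int batchSize;
  private List<String> batch = new ArrayList<>();
  private int pos = 0;
  private int offset = 0;
  private boolean exhausted = false;

  BatchedDirIterator(BatchSource source, int batchSize) {
    this.source = source;
    this.batchSize = batchSize;
  }

  @Override
  public boolean hasNext() {
    if (pos < batch.size()) {
      return true;
    }
    if (exhausted) {
      return false;
    }
    batch = source.fetch(offset, batchSize); // next batch replaces the previous one
    pos = 0;
    offset += batch.size();
    if (batch.size() < batchSize) {
      exhausted = true; // a short batch means the listing is finished
    }
    return !batch.isEmpty();
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return batch.get(pos++);
  }
}
```

Each response stays bounded by the batch size, which addresses both the client memory concern and the RPC packet size concern.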





[jira] [Created] (HDFS-8941) DistributedFileSystem listCorruptFileBlocks API should resolve relative path

2015-08-22 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8941:
--

 Summary: DistributedFileSystem listCorruptFileBlocks API should 
resolve relative path
 Key: HDFS-8941
 URL: https://issues.apache.org/jira/browse/HDFS-8941
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R


Presently the {{DFS#listCorruptFileBlocks(path)}} API does not resolve the given 
path relative to the workingDir. This jira is to discuss and provide an 
implementation of that resolution.
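
The resolution in question amounts to the following, shown here on plain strings. The helper name is hypothetical and the sketch ignores URI schemes and path normalization.

```java
// Hypothetical sketch on plain strings, not the Hadoop Path API.
final class WorkingDirResolver {
  /** Absolute paths are returned unchanged; relative ones are appended to workingDir. */
  static String makeAbsolute(String workingDir, String path) {
    if (path.startsWith("/")) {
      return path;
    }
    return workingDir.endsWith("/") ? workingDir + path : workingDir + "/" + path;
  }

  public static void main(String[] args) {
    System.out.println(makeAbsolute("/user/alice", "data/file1"));
  }
}
```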





[jira] [Created] (HDFS-8899) Erasure Coding: use threadpool for EC recovery tasks

2015-08-14 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8899:
--

 Summary: Erasure Coding: use threadpool for EC recovery tasks
 Key: HDFS-8899
 URL: https://issues.apache.org/jira/browse/HDFS-8899
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea is to use a thread pool for processing erasure coding recovery tasks at 
the datanode, instead of starting a new daemon thread for each task:

{code}
new Daemon(new ReconstructAndTransferBlock(recoveryInfo)).start();
{code}
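
For comparison, a thread-pool version could be sketched with the standard {{java.util.concurrent}} classes as below. The class name, the fixed pool size and the counter standing in for real reconstruction work are assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the datanode code.
final class ReconstructionPool {
  /** Runs `tasks` dummy recovery tasks on at most poolSize threads; returns how many completed. */
  static int runAll(int tasks, int poolSize) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    AtomicInteger done = new AtomicInteger();
    for (int i = 0; i < tasks; i++) {
      // Each task here only bumps a counter; a real task would reconstruct and transfer a block.
      pool.execute(() -> done.incrementAndGet());
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return done.get();
  }
}
```

The pool caps the number of concurrent recovery threads, whereas one daemon thread per task grows with the number of pending tasks.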





[jira] [Created] (HDFS-8853) Erasure Coding: Provide ECSchema validation when creating ECZone

2015-08-03 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8853:
--

 Summary: Erasure Coding: Provide ECSchema validation when creating 
ECZone
 Key: HDFS-8853
 URL: https://issues.apache.org/jira/browse/HDFS-8853
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R


Presently {{DFS#createErasureCodingZone(path, ecSchema, cellSize)}} doesn't 
validate that the given {{ecSchema}} is available in the 
{{ErasureCodingSchemaManager#activeSchemas}} list. If it doesn't exist, 
the ECZone is created with a {{null}} schema. IMHO we could improve this by 
doing the necessary basic sanity checks.
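
Such a sanity check could be as simple as the sketch below. Schemas are represented by their names here, and the class and method names are hypothetical.

```java
import java.util.Set;

// Hypothetical sketch, not the NameNode code.
final class SchemaValidation {
  /** Rejects a schema name that is null or not in the active set. */
  static void validate(String schemaName, Set<String> activeSchemas) {
    if (schemaName == null || !activeSchemas.contains(schemaName)) {
      throw new IllegalArgumentException(
          "Schema '" + schemaName + "' is not an active schema: " + activeSchemas);
    }
  }
}
```

Failing fast at zone creation avoids ending up with a zone whose schema is {{null}}.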





[jira] [Created] (HDFS-8773) Few FSNamesystem metrics are not documented in the Metrics page

2015-07-14 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8773:
--

 Summary: Few FSNamesystem metrics are not documented in the 
Metrics page
 Key: HDFS-8773
 URL: https://issues.apache.org/jira/browse/HDFS-8773
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to document missing metrics in the [Metrics 
page|https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/Metrics.html#FSNamesystem].
 The following metrics are not documented:
{code}
MissingReplOneBlocks
NumFilesUnderConstruction
NumActiveClients
HAState
FSState
{code}





[jira] [Created] (HDFS-8721) Add a metric for number of encryption zones

2015-07-07 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8721:
--

 Summary: Add a metric for number of encryption zones
 Key: HDFS-8721
 URL: https://issues.apache.org/jira/browse/HDFS-8721
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: encryption
Reporter: Rakesh R
Assignee: Rakesh R


It would be good to expose the number of encryption zones as a metric.





[jira] [Created] (HDFS-8648) Revisit FsDirectory#resolvePath() function usage to check the call is made under proper lock

2015-06-22 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8648:
--

 Summary: Revisit FsDirectory#resolvePath() function usage to check 
the call is made under proper lock
 Key: HDFS-8648
 URL: https://issues.apache.org/jira/browse/HDFS-8648
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


As per the 
[discussion|https://issues.apache.org/jira/browse/HDFS-8493?focusedCommentId=14595735&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14595735]
 in HDFS-8493, the usage of the function {{FsDirectory#resolvePath}} needs to be 
reviewed. In many places the resolution 
{{fsd.resolvePath(pc, src, pathComponents);}} appears to be done while holding only the fsn lock and 
not the fsd lock. As per the initial analysis, the following are such cases; they probably 
need to be filtered and the wrong usages fixed.
# FsDirAclOp.java
-> getAclStatus()
-> modifyAclEntries()
-> removeAcl()
-> removeDefaultAcl()
-> setAcl()
-> getAclStatus()
# FsDirDeleteOp.java
-> delete(fsn, src, recursive, logRetryCache)
# FsDirRenameOp.java
-> renameToInt(fsd, srcArg, dstArg, logRetryCache)
-> renameToInt(fsd, srcArg, dstArg, logRetryCache, options)
# FsDirStatAndListingOp.java
-> getContentSummary(fsd, src)
-> getFileInfo(fsd, srcArg, resolveLink)
-> isFileClosed(fsd, src)
-> getListingInt(fsd, srcArg, startAfter, needLocation)
# FsDirWriteFileOp.java
-> abandonBlock()
-> completeFile(fsn, pc, srcArg, holder, last, fileId)
-> getEncryptionKeyInfo(fsn, pc, src, supportedVersions)
-> startFile()
-> validateAddBlock()
# FsDirXAttrOp.java
-> getXAttrs(fsd, srcArg, xAttrs)
-> listXAttrs(fsd, src)
-> setXAttr(fsd, src, xAttr, flag, logRetryCache)
# FSNamesystem.java
-> createEncryptionZoneInt()
-> getEZForPath()
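
One way to catch such call sites is to make the resolution itself verify that the caller holds the directory lock. The sketch below uses {{java.util.concurrent.locks.ReentrantReadWriteLock}}; the class and its methods are hypothetical and only illustrate the check.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not the FSDirectory code.
final class LockCheckedDirectory {
  private final ReentrantReadWriteLock dirLock = new ReentrantReadWriteLock();

  void readLock() {
    dirLock.readLock().lock();
  }

  void readUnlock() {
    dirLock.readLock().unlock();
  }

  /** True if the current thread holds the directory lock in read or write mode. */
  boolean hasReadLock() {
    return dirLock.getReadHoldCount() > 0 || dirLock.isWriteLockedByCurrentThread();
  }

  /** Refuses to resolve unless the caller holds the directory lock. */
  String resolvePath(String src) {
    if (!hasReadLock()) {
      throw new IllegalStateException("resolvePath called without the directory lock");
    }
    return src; // the actual resolution is omitted in this sketch
  }
}
```

With a check like this, a unit test run would surface every caller that resolves a path without taking the lock first.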

Thanks [~wheat9], [~vinayrpet] for the advice.





[jira] [Created] (HDFS-8643) Add snapshot names list to SnapshottableDirectoryStatus

2015-06-22 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8643:
--

 Summary: Add snapshot names list to SnapshottableDirectoryStatus
 Key: HDFS-8643
 URL: https://issues.apache.org/jira/browse/HDFS-8643
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to enhance {{SnapshottableDirectoryStatus}} by adding a 
{{snapshotNames}} attribute to it; presently it has only the {{snapshotNumber}}. 
IMHO this would help users get the list of snapshot names created. Also, 
the snapshot names can be used while renaming or deleting the snapshots.

{code}
// org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus.java

  /**
   * @return Snapshot names for the directory.
   */
  public List<String> getSnapshotNames() {
    return snapshotNames;
  }
{code}





[jira] [Created] (HDFS-8642) Improve TestFileTruncate#setup by deleting the snapshots

2015-06-21 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8642:
--

 Summary: Improve TestFileTruncate#setup by deleting the snapshots
 Key: HDFS-8642
 URL: https://issues.apache.org/jira/browse/HDFS-8642
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor


I've observed that the {{TestFileTruncate#setup()}} function can be improved by 
making it more independent. Presently a failure in any of the snapshot-related tests 
affects all the subsequent unit test cases. One such error has 
been observed in 
[Hadoop-Hdfs-trunk-2163|https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart]

{code}
https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart/

org.apache.hadoop.ipc.RemoteException: The directory /test cannot be deleted since /test is snapshottable and already has snapshots
at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.checkSnapshot(FSDirSnapshotOp.java:226)
at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:54)
at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.deleteInternal(FSDirDeleteOp.java:177)
at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:104)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3046)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:939)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:608)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2172)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2166)

at org.apache.hadoop.ipc.Client.call(Client.java:1440)
at org.apache.hadoop.ipc.Client.call(Client.java:1371)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy22.delete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540)
at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at com.sun.proxy.$Proxy23.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1711)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:718)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.setup(TestFileTruncate.java:119)
{code}
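
The trace shows {{setup()}} failing at the delete of /test because the directory still has snapshots. A minimal sketch of a more independent setup follows. It uses a toy in-memory stand-in (the hypothetical {{ToyFs}} class) rather than the real HDFS client, so every name in it is an assumption made for illustration; the only point is the ordering: drop leftover snapshots first, then delete and recreate the directory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a file system with snapshottable directories (hypothetical, not the HDFS API). */
final class ToyFs {
  // directory path -> names of the snapshots currently held on it
  private final Map<String, List<String>> snapshots = new HashMap<>();

  void mkdirs(String dir) {
    snapshots.putIfAbsent(dir, new ArrayList<>());
  }

  boolean exists(String dir) {
    return snapshots.containsKey(dir);
  }

  void createSnapshot(String dir, String name) {
    snapshots.get(dir).add(name);
  }

  List<String> listSnapshots(String dir) {
    return new ArrayList<>(snapshots.getOrDefault(dir, new ArrayList<>()));
  }

  void deleteSnapshot(String dir, String name) {
    snapshots.get(dir).remove(name);
  }

  /** Mimics the failure in the trace: a directory that still has snapshots cannot be deleted. */
  void delete(String dir) {
    List<String> held = snapshots.get(dir);
    if (held != null && !held.isEmpty()) {
      // Unchecked exception used only to keep the toy short.
      throw new IllegalStateException("The directory " + dir
          + " cannot be deleted since " + dir + " is snapshottable and already has snapshots");
    }
    snapshots.remove(dir);
  }
}

final class IndependentSetup {
  /** Removes any snapshots left by a previous test before recreating the test directory. */
  static void setup(ToyFs fs, String testDir) {
    if (fs.exists(testDir)) {
      for (String name : fs.listSnapshots(testDir)) {
        fs.deleteSnapshot(testDir, name);
      }
      fs.delete(testDir);
    }
    fs.mkdirs(testDir);
  }
}
```

In the real test the same ordering would presumably go through the HDFS snapshot APIs; that mapping is not shown here.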





[jira] [Created] (HDFS-8632) Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes

2015-06-18 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8632:
--

 Summary: Erasure Coding: Add InterfaceAudience annotation to the 
erasure coding classes
 Key: HDFS-8632
 URL: https://issues.apache.org/jira/browse/HDFS-8632
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


I've noticed that some of the erasure coding classes are missing the 
{{@InterfaceAudience}} annotation. It would be good to identify these classes 
and add the proper annotation.





[jira] [Created] (HDFS-8606) Cleanup DFSOutputStream by removing unwanted changes

2015-06-15 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8606:
--

 Summary: Cleanup DFSOutputStream by removing unwanted changes
 Key: HDFS-8606
 URL: https://issues.apache.org/jira/browse/HDFS-8606
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to clean up a few changes done as part of HDFS-8386. As per 
[~szetszwo]'s comments, those changes will affect the write performance. Please 
see the discussion 
[here|https://issues.apache.org/jira/browse/HDFS-8386?focusedCommentId=14575386&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14575386].

The following changes need to be made as part of this jira:
# Remove {{synchronized}} from {{getStreamer()}}, since it may unnecessarily block 
the caller.
# Remove {{setStreamer(..)}}, which is currently not used. We may add it in the 
HDFS-7285 branch and see how to do synchronization correctly.





[jira] [Resolved] (HDFS-3854) Implement a fence method which should fence the BK shared storage.

2015-06-14 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R resolved HDFS-3854.

Resolution: Duplicate

> Implement a fence method which should fence the BK shared storage.
> --
>
> Key: HDFS-3854
> URL: https://issues.apache.org/jira/browse/HDFS-3854
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Rakesh R
>
> Currently, when a machine or the network is down, SSHFence cannot ensure that 
> the other node is completely down. So the fence will fail and the switch will not happen.
> [Internally we worked around this by returning true when the machine is not 
> reachable, as BKJM already has fencing.]
> It may be a good idea to implement a fence method that ensures the shared 
> storage is fenced properly and returns true.
> We can plug this new method into the ZKFC fence methods.
> The only pain point I can see is that we may have to put the BKJM jar in the ZKFC 
> lib to run this fence method.
> Thoughts?





[jira] [Created] (HDFS-8568) TestClusterId is failing

2015-06-09 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8568:
--

 Summary: TestClusterId is failing
 Key: HDFS-8568
 URL: https://issues.apache.org/jira/browse/HDFS-8568
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R


It fails with the below exception:

{code}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at org.apache.hadoop.hdfs.server.namenode.TestClusterId.testFormatWithEmptyClusterIdOption(TestClusterId.java:292)
{code}





[jira] [Created] (HDFS-8550) Erasure Coding: Fix FindBugs Multithreaded correctness Warning

2015-06-05 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8550:
--

 Summary: Erasure Coding: Fix FindBugs Multithreaded correctness 
Warning
 Key: HDFS-8550
 URL: https://issues.apache.org/jira/browse/HDFS-8550
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


FindBugs warning: inconsistent synchronization of 
{{org.apache.hadoop.hdfs.DFSOutputStream.streamer}}; locked 89% of the time.
{code}
Bug type IS2_INCONSISTENT_SYNC
In class org.apache.hadoop.hdfs.DFSOutputStream
Field org.apache.hadoop.hdfs.DFSOutputStream.streamer
Synchronized 89% of the time
Unsynchronized access at DFSOutputStream.java:[line 146]
Unsynchronized access at DFSOutputStream.java:[line 859]
Unsynchronized access at DFSOutputStream.java:[line 627]
Unsynchronized access at DFSOutputStream.java:[line 630]
Unsynchronized access at DFSOutputStream.java:[line 640]
Unsynchronized access at DFSOutputStream.java:[line 342]
Unsynchronized access at DFSOutputStream.java:[line 744]
Unsynchronized access at DFSOutputStream.java:[line 903]
Synchronized access at DFSOutputStream.java:[line 737]
Synchronized access at DFSOutputStream.java:[line 913]
Synchronized access at DFSOutputStream.java:[line 726]
Synchronized access at DFSOutputStream.java:[line 756]
Synchronized access at DFSOutputStream.java:[line 762]
Synchronized access at DFSOutputStream.java:[line 757]
Synchronized access at DFSOutputStream.java:[line 758]
Synchronized access at DFSOutputStream.java:[line 762]
Synchronized access at DFSOutputStream.java:[line 483]
Synchronized access at DFSOutputStream.java:[line 486]
Synchronized access at DFSOutputStream.java:[line 717]
Synchronized access at DFSOutputStream.java:[line 719]
Synchronized access at DFSOutputStream.java:[line 722]
Synchronized access at DFSOutputStream.java:[line 408]
Synchronized access at DFSOutputStream.java:[line 408]
Synchronized access at DFSOutputStream.java:[line 423]
Synchronized access at DFSOutputStream.java:[line 426]
Synchronized access at DFSOutputStream.java:[line 411]
Synchronized access at DFSOutputStream.java:[line 452]
Synchronized access at DFSOutputStream.java:[line 452]
Synchronized access at DFSOutputStream.java:[line 439]
Synchronized access at DFSOutputStream.java:[line 439]
Synchronized access at DFSOutputStream.java:[line 439]
Synchronized access at DFSOutputStream.java:[line 670]
Synchronized access at DFSOutputStream.java:[line 580]
Synchronized access at DFSOutputStream.java:[line 574]
Synchronized access at DFSOutputStream.java:[line 592]
Synchronized access at DFSOutputStream.java:[line 583]
Synchronized access at DFSOutputStream.java:[line 581]
Synchronized access at DFSOutputStream.java:[line 621]
Synchronized access at DFSOutputStream.java:[line 609]
Synchronized access at DFSOutputStream.java:[line 621]
Synchronized access at DFSOutputStream.java:[line 597]
Synchronized access at DFSOutputStream.java:[line 612]
Synchronized access at DFSOutputStream.java:[line 597]
Synchronized access at DFSOutputStream.java:[line 588]
Synchronized access at DFSOutputStream.java:[line 624]
Synchronized access at DFSOutputStream.java:[line 612]
Synchronized access at DFSOutputStream.java:[line 588]
Synchronized access at DFSOutputStream.java:[line 632]
Synchronized access at DFSOutputStream.java:[line 632]
Synchronized access at DFSOutputStream.java:[line 616]
Synchronized access at DFSOutputStream.java:[line 633]
Synchronized access at DFSOutputStream.java:[line 657]
Synchronized access at DFSOutputStream.java:[line 658]
Synchronized access at DFSOutputStream.java:[line 695]
Synchronized access at DFSOutputStream.java:[line 698]
Synchronized access at DFSOutputStream.java:[line 784]
Synchronized access at DFSOutputStream.java:[line 795]
Synchronized access at DFSOutputStream.java:[line 801]
Synchronized access at DFSOutputStream.java:[line 155]
Synchronized access at DFSOutputStream.java:[line 158]
Synchronized access at DFSOutputStream.java:[line 433]
Synchronized access at DFSOutputStream.java:[line 886]
Synchronized access at DFSOutputStream.java:[line 463]
Synchronized access at DFSOutputStream.java:[line 469]
Synchronized access at DFSOutputStream.java:[line 463]
Synchronized access at DFSOutputStream.java:[line 470]
Synchronized access at DFSOutputStream.java:[line 465]
Synchronized access at DFSOutputStream.java:[line 749]
Synchronized access at DFSStripedOutputStream.java:[line 260]
Synchronized access at DFSStripedOutputStream.java:[line 325]
Synchronized access at DFSStripedOutputStream.java:[line 325]
Synchronized access at DFSStripedOutputStream.java:[line 335]
Synchronized access at DFSStripedOutputStream.java:[line 264]
Synchronized access at DFSStripedOutputStream.java:[line 511]
{code}
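
For readers unfamiliar with this warning: FindBugs reports IS2_INCONSISTENT_SYNC when a field is accessed under a lock most of the time but not always. The sketch below is a generic illustration with hypothetical names ({{Holder}}, {{value}}), not the actual {{DFSOutputStream}} code, and synchronizing every access is only one possible remedy.

```java
/** Generic illustration of consistent locking; all names are hypothetical. */
final class Holder {
  private String value = "initial";

  // Every read and write of 'value' goes through a synchronized method,
  // so there is no mix of locked and unlocked accesses to report.
  synchronized String get() {
    return value;
  }

  synchronized void set(String newValue) {
    value = newValue;
  }

  synchronized int length() {
    return value.length();
  }
}
```

Whether a synchronized accessor is acceptable depends on the call path; HDFS-8606 notes that a synchronized {{getStreamer()}} may unnecessarily block the caller.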





[jira] [Created] (HDFS-8532) Make the visibility of DFSOutputStream#streamer member variable to private

2015-06-03 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8532:
--

 Summary: Make the visibility of DFSOutputStream#streamer member 
variable to private
 Key: HDFS-8532
 URL: https://issues.apache.org/jira/browse/HDFS-8532
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Trivial








[jira] [Created] (HDFS-8495) Consolidate append() related implementation into a single class

2015-05-28 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8495:
--

 Summary: Consolidate append() related implementation into a single 
class
 Key: HDFS-8495
 URL: https://issues.apache.org/jira/browse/HDFS-8495
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Rakesh R
Assignee: Rakesh R


This jira proposes to consolidate {{FSNamesystem#append()}} related methods 
into a single class.





[jira] [Created] (HDFS-8450) Erasure Coding: Consolidate erasure coding zone related implementation into a single class

2015-05-20 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8450:
--

 Summary: Erasure Coding: Consolidate erasure coding zone related 
implementation into a single class
 Key: HDFS-8450
 URL: https://issues.apache.org/jira/browse/HDFS-8450
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea is to follow the same pattern suggested by HDFS-7416. It would be good to 
consolidate all the erasure coding zone related implementations of 
{{FSNamesystem}}. This jira proposes an {{FSDirErasureCodingZoneOp}} class to hold 
the functions that perform the related erasure coding zone operations.





[jira] [Created] (HDFS-8420) Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir

2015-05-18 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8420:
--

 Summary: Erasure Coding: ECZoneManager#getECZoneInfo is not 
resolving the path properly if zone dir itself is the snapshottable dir
 Key: HDFS-8420
 URL: https://issues.apache.org/jira/browse/HDFS-8420
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


Presently the resultant zone dir comes back with {{.snapshot}} only when the 
zone dir itself is a snapshottable dir. In that case the returned path includes 
the snapshot name, for example {{/zone/.snapshot/snap1}}. This could be improved 
by returning only the path {{/zone}}.
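
The intended result can be sketched as a small string-level helper. This is an assumption about the desired behaviour only: {{ZonePaths}} and {{stripSnapshotSuffix}} are hypothetical names, and the actual {{ECZoneManager}} may well resolve the path through inodes rather than strings.

```java
/** Hypothetical helper; not the actual ECZoneManager logic. */
final class ZonePaths {
  private static final String DOT_SNAPSHOT = "/.snapshot";

  /** Returns the path up to, but excluding, the first ".snapshot" component. */
  static String stripSnapshotSuffix(String path) {
    // Match ".snapshot" only as a whole path component.
    int idx = path.indexOf(DOT_SNAPSHOT + "/");
    if (idx < 0 && path.endsWith(DOT_SNAPSHOT)) {
      idx = path.length() - DOT_SNAPSHOT.length();
    }
    if (idx < 0) {
      return path; // no snapshot component present
    }
    return idx == 0 ? "/" : path.substring(0, idx);
  }
}
```

With this helper, {{/zone/.snapshot/snap1}} maps to {{/zone}}, while a path without a {{.snapshot}} component is returned unchanged.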

Thanks [~vinayrpet] for the helpful 
[discussion|https://issues.apache.org/jira/browse/HDFS-8266?focusedCommentId=14543821&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14543821]





[jira] [Created] (HDFS-8399) Erasure Coding: BlockManager is unnecessarily computing recovery work for the deleted blocks

2015-05-14 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8399:
--

 Summary: Erasure Coding: BlockManager is unnecessarily computing 
recovery work for the deleted blocks
 Key: HDFS-8399
 URL: https://issues.apache.org/jira/browse/HDFS-8399
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The following exception occurred in the {{ReplicationMonitor}}. As per the initial 
analysis, the exception is raised for the blocks of a deleted file.
{code}
2015-05-14 14:14:40,485 FATAL util.ExitUtil (ExitUtil.java:terminate(127)) - Terminate called
org.apache.hadoop.util.ExitUtil$ExitException: java.lang.AssertionError: Absolute path required
at org.apache.hadoop.hdfs.server.namenode.INode.getPathNames(INode.java:744)
at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:723)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath(FSDirectory.java:1655)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getECSchemaForPath(FSNamesystem.java:8435)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeRecoveryWorkForBlocks(BlockManager.java:1572)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockRecoveryWork(BlockManager.java:1402)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3894)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3846)
at java.lang.Thread.run(Thread.java:722)

at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3865)
at java.lang.Thread.run(Thread.java:722)
Exception in thread "org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@1255079" org.apache.hadoop.util.ExitUtil$ExitException: java.lang.AssertionError: Absolute path required
at org.apache.hadoop.hdfs.server.namenode.INode.getPathNames(INode.java:744)
at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:723)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath(FSDirectory.java:1655)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getECSchemaForPath(FSNamesystem.java:8435)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeRecoveryWorkForBlocks(BlockManager.java:1572)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockRecoveryWork(BlockManager.java:1402)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3894)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3846)
at java.lang.Thread.run(Thread.java:722)

at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3865)
at java.lang.Thread.run(Thread.java:722)
{code}





[jira] [Created] (HDFS-8387) Revisit the long and int datatypes usage in striping logic

2015-05-12 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8387:
--

 Summary: Revisit the long and int datatypes usage in striping logic
 Key: HDFS-8387
 URL: https://issues.apache.org/jira/browse/HDFS-8387
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to revisit the usage of {{long}} and {{int}} data 
types in the striping logic.
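
One general reason such a review matters is that Java evaluates {{int * int}} in 32 bits, so a product can overflow before it is widened to {{long}}. The sketch below illustrates only that language rule; the class, method and parameter names are hypothetical, and this jira does not say which expressions in the striping code are affected.

```java
/** Hypothetical striping arithmetic; the names are illustrative only. */
final class StripeOffsets {
  /** Risky: the multiplication is done in 32-bit int and can overflow before widening. */
  static long offsetWithIntMath(int cellSize, int cellIndex) {
    return cellSize * cellIndex;
  }

  /** Safer: widen one operand to long so the multiplication is done in 64 bits. */
  static long offsetWithLongMath(int cellSize, int cellIndex) {
    return (long) cellSize * cellIndex;
  }
}
```

For example, with a cell size of 65536 and a cell index of 40000 the true product is 2621440000, which exceeds {{Integer.MAX_VALUE}}; the int version wraps to a negative value while the long version does not.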





[jira] [Created] (HDFS-8386) Improve synchronization of 'streamer' reference in DFSOutputStream - accessed inconsistently with respect to synchronization

2015-05-12 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8386:
--

 Summary: Improve synchronization of 'streamer' reference in 
DFSOutputStream - accessed inconsistently with respect to synchronization
 Key: HDFS-8386
 URL: https://issues.apache.org/jira/browse/HDFS-8386
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Rakesh R
Assignee: Rakesh R


Presently the {{DFSOutputStream#streamer}} object reference is accessed 
inconsistently with respect to synchronization. It would be good to improve 
this part. This was noticed while implementing the erasure coding feature.

Please refer to the related 
[discussion thread|https://issues.apache.org/jira/browse/HDFS-8294?focusedCommentId=14541411&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14541411]
 in the jira HDFS-8294 for more details.





[jira] [Created] (HDFS-8378) Erasure Coding: Few improvements for the erasure coding worker

2015-05-12 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8378:
--

 Summary: Erasure Coding: Few improvements for the erasure coding 
worker
 Key: HDFS-8378
 URL: https://issues.apache.org/jira/browse/HDFS-8378
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor


# The following log is confusing and should be tidied up. The code is missing a 
{{break;}} statement, which causes the unwanted log.
{code}
2015-05-10 15:06:45,878 INFO  datanode.DataNode (BPOfferService.java:processCommandFromActive(728)) - DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
2015-05-10 15:06:45,879 WARN  datanode.DataNode (BPOfferService.java:processCommandFromActive(732)) - Unknown DatanodeCommand action: 11
{code}
# Add the exception trace to the log; this would improve debuggability.
{code}
} catch (Throwable e) {
   LOG.warn("Failed to recover striped block: " + blockGroup);
}
{code}
# Make the member variables in ErasureCodingWorker, 
ReconstructAndTransferBlock and StripedReader {{private}} {{final}}.
# Correct the spelling of the variable {{STRIPED_READ_TRHEAD_POOL}} to 
{{STRIPED_READ_THREAD_POOL}}.
# Add a debug log that prints the striped read pool size:
{code}
LOG.debug("Using striped reads; pool threads=" + num);
{code}
# Add meaningful message to the precondition check:
{code}
Preconditions.checkArgument(liveIndices.length == sources.length);
{code}
# Remove unused import
{code}
import org.apache.hadoop.hdfs.server.common.HdfsServerConstants;
{code}
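
The first item in the list can be reproduced in isolation. A minimal sketch, assuming a switch over the command action in which the erasure coding case lacks a {{break;}}: the class and method names are hypothetical, and the value 11 is inferred from the "Unknown DatanodeCommand action: 11" log line rather than taken from the source.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy dispatcher; not the actual BPOfferService code. */
final class CommandDispatch {
  // Assumed value, inferred from the WARN line in the log excerpt.
  static final int DNA_ERASURE_CODING_RECOVERY = 11;

  /** Without a break, control falls through into the default branch. */
  static List<String> withoutBreak(int action) {
    List<String> log = new ArrayList<>();
    switch (action) {
      case DNA_ERASURE_CODING_RECOVERY:
        log.add("INFO DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY");
        // missing break: execution continues into 'default'
      default:
        log.add("WARN Unknown DatanodeCommand action: " + action);
    }
    return log;
  }

  /** With the break, only the matching branch logs. */
  static List<String> withBreak(int action) {
    List<String> log = new ArrayList<>();
    switch (action) {
      case DNA_ERASURE_CODING_RECOVERY:
        log.add("INFO DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY");
        break;
      default:
        log.add("WARN Unknown DatanodeCommand action: " + action);
    }
    return log;
  }
}
```

For the exception-trace item, the usual shape is to pass the caught exception as a second argument to the logging call, assuming the logger in use offers a {{warn(message, throwable)}} overload.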





[jira] [Created] (HDFS-8370) Erasure Coding: TestRecoverStripedFile#testRecoverOneParityBlock is failing

2015-05-11 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8370:
--

 Summary: Erasure Coding: 
TestRecoverStripedFile#testRecoverOneParityBlock is failing
 Key: HDFS-8370
 URL: https://issues.apache.org/jira/browse/HDFS-8370
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to further analyse the failure of this unit test.

{code}
java.io.IOException: Time out waiting for EC block recovery.
at org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:333)
at org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:234)
at org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverOneParityBlock(TestRecoverStripedFile.java:98)
{code}

An exception occurred while transferring the recovery packet:
{code}
2015-05-09 15:08:08,910 INFO  datanode.DataNode (BlockReceiver.java:receiveBlock(826)) - Exception for BP-1332677436-67.195.81.147-1431184082022:blk_-9223372036854775792_1001
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:787)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:803)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
at java.lang.Thread.run(Thread.java:745)
{code}





[jira] [Created] (HDFS-8368) Erasure Coding: DFS opening a non-existent file need to be handled properly

2015-05-11 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8368:
--

 Summary: Erasure Coding: DFS opening a non-existent file need to 
be handled properly
 Key: HDFS-8368
 URL: https://issues.apache.org/jira/browse/HDFS-8368
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to address the bad exception thrown when opening a non-existent 
file. It throws an NPE, as shown below:

{code}
java.lang.NullPointerException: null
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1184)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:307)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:303)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:303)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
at org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSClient(TestDistributedFileSystem.java:359)
at org.apache.hadoop.hdfs.TestDistributedFileSystem.testAllWithNoXmlDefaults(TestDistributedFileSystem.java:666)
{code}
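
A common way to avoid this kind of NPE is to check the lookup result before using it and fail with a descriptive exception. The sketch below shows that pattern on a toy in-memory store; {{OpenSketch}} and its methods are hypothetical, and it is an assumption that {{FileNotFoundException}} is the exception the real fix should surface.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/** Toy file store; not the DFSClient implementation. */
final class OpenSketch {
  private final Map<String, byte[]> files = new HashMap<>();

  void create(String path, byte[] data) {
    files.put(path, data);
  }

  /** Fails with a descriptive exception instead of dereferencing a null lookup result. */
  byte[] open(String path) throws FileNotFoundException {
    byte[] data = files.get(path);
    if (data == null) {
      throw new FileNotFoundException("File does not exist: " + path);
    }
    return data;
  }
}
```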





[jira] [Created] (HDFS-8332) DistributedFileSystem listCacheDirectives() and listCachePools() API calls should check filesystem closed

2015-05-05 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8332:
--

 Summary: DistributedFileSystem listCacheDirectives() and 
listCachePools() API calls should check filesystem closed
 Key: HDFS-8332
 URL: https://issues.apache.org/jira/browse/HDFS-8332
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R


I could see that the {{listCacheDirectives()}} and {{listCachePools()}} APIs can 
be called even after the filesystem is closed. Instead, these calls should do 
{{checkOpen}} and throw:
{code}
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:464)
{code}
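
The guard described here can be sketched on a toy class. The names below ({{ClosableFsSketch}}, {{addCachePool}}) are hypothetical and the class is not the real {{DFSClient}}; it only shows a listing call that runs {{checkOpen()}} first and therefore fails with "Filesystem closed" after {{close()}}.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Toy client showing the check-before-use pattern. */
final class ClosableFsSketch {
  private volatile boolean open = true;
  private final List<String> cachePools = new ArrayList<>();

  void addCachePool(String name) {
    cachePools.add(name);
  }

  void close() {
    open = false;
  }

  private void checkOpen() throws IOException {
    if (!open) {
      throw new IOException("Filesystem closed");
    }
  }

  /** Refuses to list once the client has been closed. */
  List<String> listCachePools() throws IOException {
    checkOpen();
    return new ArrayList<>(cachePools);
  }
}
```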





[jira] [Created] (HDFS-8331) Erasure Coding: Create FileStatus isErasureCoded() method

2015-05-05 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8331:
--

 Summary: Erasure Coding: Create FileStatus isErasureCoded() method
 Key: HDFS-8331
 URL: https://issues.apache.org/jira/browse/HDFS-8331
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to discuss the need for a {{FileStatus#isErasureCoded()}} 
API. This is just an initial thought; presently the use case/necessity for it 
is not clear. We will probably revisit this once the feature matures.
Thanks [~umamaheswararao], [~vinayrpet], [~zhz] for the offline discussions.





[jira] [Created] (HDFS-8294) Erasure Coding: Fix Findbug warnings present in erasure coding

2015-04-29 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8294:
--

 Summary: Erasure Coding: Fix Findbug warnings present in erasure 
coding
 Key: HDFS-8294
 URL: https://issues.apache.org/jira/browse/HDFS-8294
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R








[jira] [Created] (HDFS-8275) Erasure Coding: Implement batched listing of erasure coding zones

2015-04-27 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8275:
--

 Summary: Erasure Coding: Implement batched listing of erasure 
coding zones
 Key: HDFS-8275
 URL: https://issues.apache.org/jira/browse/HDFS-8275
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea of the jira is to provide a batch API in {{DistributedFileSystem}} to 
list the {{ECZoneInfo}}.

API signature:
{code}
  /**
   * List all ErasureCoding zones. Incrementally fetches results from the server.
   */
  public RemoteIterator<ECZoneInfo> listErasureCodingZones() throws IOException {
    return dfs.listErasureCodingZones();
  }
{code}
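
"Incrementally fetches results" can be pictured as an iterator that asks for the next batch only once the current one is used up. The sketch below is a generic, self-contained version built on {{java.util.Iterator}}; {{BatchedIterator}} and {{ZoneListing}} are hypothetical names, the paging function stands in for the server call, and none of this is the Hadoop implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Generic batched iterator: fetches the next batch only when the current one is exhausted. */
final class BatchedIterator<T> implements Iterator<T> {
  private final Function<Integer, List<T>> fetchFrom; // start offset -> next batch (empty when done)
  private List<T> batch = new ArrayList<>();
  private int posInBatch = 0;
  private int fetched = 0; // elements fetched so far; also the next start offset
  private boolean exhausted = false;

  BatchedIterator(Function<Integer, List<T>> fetchFrom) {
    this.fetchFrom = fetchFrom;
  }

  @Override
  public boolean hasNext() {
    if (posInBatch < batch.size()) {
      return true;
    }
    if (exhausted) {
      return false;
    }
    batch = fetchFrom.apply(fetched); // one "round trip" per batch
    posInBatch = 0;
    fetched += batch.size();
    exhausted = batch.isEmpty();
    return !exhausted;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return batch.get(posInBatch++);
  }
}

final class ZoneListing {
  /** Stand-in for the server side: at most batchSize entries starting at 'start'. */
  static List<String> page(List<String> all, int start, int batchSize) {
    int end = Math.min(all.size(), start + batchSize);
    return start >= end ? new ArrayList<>() : new ArrayList<>(all.subList(start, end));
  }
}
```

A caller would wrap the paging call, for example {{new BatchedIterator<>(start -> ZoneListing.page(zones, start, 2))}}, and iterate as usual. Unlike this sketch, Hadoop's {{RemoteIterator}} declares {{IOException}} on {{hasNext()}} and {{next()}}.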




