from:"Hanisha Koneru \(JIRA\)"

[jira] [Created] (HDFS-15467) ObserverReadProxyProvider should skip logging first failover from each proxy

2020-07-13 Hanisha Koneru (Jira)

Hanisha Koneru created HDFS-15467:
-

 Summary: ObserverReadProxyProvider should skip logging first 
failover from each proxy
 Key: HDFS-15467
 URL: https://issues.apache.org/jira/browse/HDFS-15467
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Hanisha Koneru


After HADOOP-17116, {{RetryInvocationHandler}} skips logging the first 
failover INFO message from each proxy. However, {{ObserverReadProxyProvider}} uses 
a {{combinedProxy}} object, which combines all proxies into one and assigns 
{{combinedInfo}} as its ProxyInfo.
{noformat}
ObserverReadProxyProvider, lines 197-207:

for (int i = 0; i < nameNodeProxies.size(); i++) {
  if (i > 0) {
    combinedInfo.append(",");
  }
  combinedInfo.append(nameNodeProxies.get(i).proxyInfo);
}
combinedInfo.append(']');
T wrappedProxy = (T) Proxy.newProxyInstance(
    ObserverReadInvocationHandler.class.getClassLoader(),
    new Class[] {xface}, new ObserverReadInvocationHandler());
combinedProxy = new ProxyInfo<>(wrappedProxy, combinedInfo.toString());
{noformat}
{{RetryInvocationHandler}} relies on the {{ProxyInfo}} to distinguish 
between proxies when checking whether a failover from that proxy has happened before. 
Since the combined proxy looks like a single proxy, HADOOP-17116 does not take effect for 
{{ObserverReadProxyProvider}}. It would need to be handled separately.
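To make the failure mode concrete, here is a minimal sketch, assuming the "first failover already seen" bookkeeping is keyed on the ProxyInfo string as described above. FirstFailoverLogFilter and shouldLog are hypothetical names, not Hadoop's RetryInvocationHandler code, and nn1/nn2/nn3 and "[nn1,nn2,nn3]" are placeholder proxy strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not Hadoop's RetryInvocationHandler: skip the INFO log
 * for the first failover seen from each proxy, identified by its ProxyInfo
 * string.
 */
final class FirstFailoverLogFilter {
  // ProxyInfo strings for which a failover has already been seen.
  private final Set<String> proxiesSeen = new HashSet<>();

  /** Returns true if this failover should be logged at INFO level. */
  synchronized boolean shouldLog(String proxyInfo) {
    // Set.add returns true only the first time a key is added, so the
    // first failover from a proxy is not logged and later ones are.
    return !proxiesSeen.add(proxyInfo);
  }

  public static void main(String[] args) {
    // Separate ProxyInfo per NameNode: the first failover from each is skipped.
    FirstFailoverLogFilter perProxy = new FirstFailoverLogFilter();
    for (String nn : Arrays.asList("nn1", "nn2", "nn3")) {
      System.out.println(nn + " logged=" + perProxy.shouldLog(nn));
    }

    // One combined ProxyInfo for all NameNodes: every failover presents the
    // same key, so only the very first failover overall is skipped.
    FirstFailoverLogFilter combined = new FirstFailoverLogFilter();
    String combinedInfo = "[nn1,nn2,nn3]";
    for (String nn : Arrays.asList("nn1", "nn2", "nn3")) {
      System.out.println(nn + " via combined logged=" + combined.shouldLog(combinedInfo));
    }
  }
}
```

With the combined key, the failovers attributed to nn2 and nn3 are still logged even though each is the first from its NameNode, which matches the behavior described in this issue.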



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies

2020-03-11 Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057484#comment-17057484
 ] 

Hanisha Koneru commented on HDFS-15154:
---

[~swagle], patch LGTM overall. A few comments:

* We should also call DFSUtil#getDfsStoragePolicySetting in 
StoragePolicySatisfyManager, in case the deprecated config is set. 
* We might have to change the following log messages to indicate that either 
DFS_STORAGE_POLICIES_ENABLED_KEY is Disabled or DFS_STORAGE_POLICY_ENABLED_KEY 
is set to false.
{code:java}
LOG.info("Failed to change storage policy satisfier as {} set to {}.",
 DFSConfigKeys.DFS_STORAGE_POLICIES_ENABLED_KEY,
 DFSConfigKeys.DfsStoragePolicySetting.DISABLED);{code}
* We could add a new method in DFSUtil to check whether storage policies are 
enabled, since that check is done in multiple places.
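The last suggestion could look roughly like this: a minimal sketch, assuming configuration can be read as a plain key/value map. StoragePolicyConfigSketch, POLICIES_ENABLED_KEY and its value, canSetStoragePolicy, and the ENABLED/SUPERUSER_ONLY enum values are hypothetical; only DISABLED, the deprecated dfs.storage.policy.enabled key and the name getDfsStoragePolicySetting come from this thread. It is not the code in the attached patches.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of one DFSUtil-style helper that resolves the storage
 * policy setting once, honoring the deprecated boolean key. The new key name
 * and the enum values other than DISABLED are assumptions.
 */
final class StoragePolicyConfigSketch {
  // Only DISABLED appears in the review; ENABLED and SUPERUSER_ONLY are assumed.
  enum DfsStoragePolicySetting { DISABLED, ENABLED, SUPERUSER_ONLY }

  // Deprecated boolean key from the issue description.
  static final String DEPRECATED_ENABLED_KEY = "dfs.storage.policy.enabled";
  // Hypothetical placeholder for the new enum-valued key.
  static final String POLICIES_ENABLED_KEY = "example.storage.policies.setting";

  static DfsStoragePolicySetting getDfsStoragePolicySetting(Map<String, String> conf) {
    String deprecated = conf.get(DEPRECATED_ENABLED_KEY);
    if (deprecated != null && !Boolean.parseBoolean(deprecated.trim())) {
      // The deprecated key explicitly set to false wins and disables policies.
      return DfsStoragePolicySetting.DISABLED;
    }
    String value = conf.getOrDefault(POLICIES_ENABLED_KEY, "ENABLED");
    return DfsStoragePolicySetting.valueOf(value.trim().toUpperCase(Locale.ROOT));
  }

  /** The single check that callers could share instead of repeating it. */
  static boolean isStoragePolicyEnabled(Map<String, String> conf) {
    return getDfsStoragePolicySetting(conf) != DfsStoragePolicySetting.DISABLED;
  }

  /** Whether a caller may set a policy under the resolved setting. */
  static boolean canSetStoragePolicy(Map<String, String> conf, boolean isSuperUser) {
    switch (getDfsStoragePolicySetting(conf)) {
      case DISABLED:
        return false;
      case SUPERUSER_ONLY:
        return isSuperUser;
      default:
        return true;
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(isStoragePolicyEnabled(conf));        // true
    conf.put(POLICIES_ENABLED_KEY, "superuser_only");
    System.out.println(canSetStoragePolicy(conf, false));    // false
    conf.put(DEPRECATED_ENABLED_KEY, "false");
    System.out.println(isStoragePolicyEnabled(conf));        // false
  }
}
```

Keeping the deprecated-key handling inside one resolver means places like StoragePolicySatisfyManager would not each need to re-implement it.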

> Allow only hdfs superusers the ability to assign HDFS storage policies
> --
>
> Key: HDFS-15154
> URL: https://issues.apache.org/jira/browse/HDFS-15154
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: Bob Cauthen
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, 
> HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, 
> HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, 
> HDFS-15154.09.patch
>
>
> Please provide a way to restrict the ability to assign HDFS Storage Policies 
> to HDFS directories to HDFS superusers only.
> Currently, based on HDFS-7093, all storage policies can be disabled 
> cluster-wide by setting dfs.storage.policy.enabled to false.
> But we need a way to allow only HDFS superusers to assign an HDFS 
> Storage Policy to an HDFS directory.




[jira] [Commented] (HDFS-14612) SlowDiskReport won't update when SlowDisks is always empty in heartbeat

2020-03-10 Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056144#comment-17056144
 ] 

Hanisha Koneru commented on HDFS-14612:
---

Overall the patch LGTM.
The current place where checkAndUpdateReportIfNecessary is called seems correct.

> SlowDiskReport won't update when SlowDisks is always empty in heartbeat
> ---
>
> Key: HDFS-14612
> URL: https://issues.apache.org/jira/browse/HDFS-14612
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haibin Huang
>Assignee: Haibin Huang
>Priority: Major
> Attachments: HDFS-14612-001.patch, HDFS-14612-002.patch, 
> HDFS-14612-003.patch, HDFS-14612-004.patch, HDFS-14612-005.patch, 
> HDFS-14612-006.patch, HDFS-14612-007.patch, HDFS-14612.patch
>
>
> I found that SlowDiskReport is not updated when slowDisks is always empty in 
> org.apache.hadoop.hdfs.server.blockmanagement.*handleHeartbeat*. This may 
> leave an outdated SlowDiskReport in the NameNode's JMX until the next 
> time slowDisks is not empty. So I think *checkAndUpdateReportIfNecessary()* 
> should be called first whenever we get the JMX information about 
> SlowDiskReport; this keeps the SlowDiskReport in JMX always valid. 
>  
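The reporter's proposal, refreshing the report when it is read rather than only when a heartbeat carries slow disks, can be sketched as follows. This is a simplified, hypothetical model: SlowDiskReportSketch, the validity window and the injected clock are assumptions, and although it borrows the checkAndUpdateReportIfNecessary name from this thread, it is not the HDFS implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of the refresh-on-read idea: the slow-disk view is
 * pruned when it is read, not only when a heartbeat carries slow disks.
 * Names are illustrative, not the actual SlowDiskTracker API.
 */
final class SlowDiskReportSketch {
  private final long reportValidityMs;
  private final LongSupplier clock;
  // Disk id -> time (ms) it was last reported as slow.
  private final Map<String, Long> lastReportedMs = new HashMap<>();

  SlowDiskReportSketch(long reportValidityMs, LongSupplier clock) {
    this.reportValidityMs = reportValidityMs;
    this.clock = clock;
  }

  /** Heartbeat path: only reached when a DataNode reports slow disks. */
  synchronized void onSlowDiskReported(String diskId) {
    lastReportedMs.put(diskId, clock.getAsLong());
  }

  /** Drops entries that have not been refreshed within the validity window. */
  private void checkAndUpdateReportIfNecessary() {
    long now = clock.getAsLong();
    lastReportedMs.values().removeIf(t -> now - t > reportValidityMs);
  }

  /** JMX-facing getter: refresh first so an outdated report is never returned. */
  synchronized Set<String> getSlowDisks() {
    checkAndUpdateReportIfNecessary();
    return new TreeSet<>(lastReportedMs.keySet());
  }

  public static void main(String[] args) {
    long[] now = {0L};
    SlowDiskReportSketch tracker = new SlowDiskReportSketch(1000L, () -> now[0]);
    tracker.onSlowDiskReported("dn1:/data/1");
    System.out.println(tracker.getSlowDisks()); // [dn1:/data/1]
    now[0] = 5000L; // later heartbeats report no slow disks at all
    System.out.println(tracker.getSlowDisks()); // []
  }
}
```

Because the getter does the refresh, the JMX view stays current even if the heartbeat path never runs the update again.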




[jira] [Commented] (HDFS-14951) KMS Jetty server does not startup if trust store password is null

2019-11-21 Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979803#comment-16979803
 ] 

Hanisha Koneru commented on HDFS-14951:
---

Thank you [~smeng] and [~weichiu] for the reviews.
I have updated the patch and also included some unit tests.

> KMS Jetty server does not startup if trust store password is null
> -
>
> Key: HDFS-14951
> URL: https://issues.apache.org/jira/browse/HDFS-14951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-14951.001.patch, HDFS-14951.002.patch
>
>
> In HttpServer2, if the trustStore is set but the trust store password is not, 
> then we set the TrustStorePassword of SSLContextFactory to null. This results 
> in the Jetty server not starting up.
> {code:java}
> // In HttpServer2#createHttpsChannelConnector():
> if (trustStore != null) {
>   sslContextFactory.setTrustStorePath(trustStore);
>   sslContextFactory.setTrustStoreType(trustStoreType);
>   sslContextFactory.setTrustStorePassword(trustStorePassword);
> }
> {code}
> Before setting the trust store password, we should check that it is not null.
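A minimal sketch of that null check, assuming the three setters shown above are the only ones involved. TrustStoreSettings and RecordingSettings are hypothetical stand-ins for Jetty's SslContextFactory so the example runs without Jetty, and the trust store path is a placeholder; this is not the attached patch.

```java
/**
 * Hypothetical sketch of the proposed null check. TrustStoreSettings stands in
 * for the three Jetty SslContextFactory setters used in
 * HttpServer2#createHttpsChannelConnector().
 */
final class TrustStoreConfigSketch {
  static void configureTrustStore(TrustStoreSettings factory, String trustStore,
      String trustStoreType, String trustStorePassword) {
    if (trustStore != null) {
      factory.setTrustStorePath(trustStore);
      factory.setTrustStoreType(trustStoreType);
      // Only pass a password through when one is configured; passing null is
      // what the issue reports as keeping Jetty from starting.
      if (trustStorePassword != null) {
        factory.setTrustStorePassword(trustStorePassword);
      }
    }
  }

  public static void main(String[] args) {
    RecordingSettings settings = new RecordingSettings();
    configureTrustStore(settings, "/path/to/truststore.jks", "jks", null);
    System.out.println("password set: " + settings.passwordSet); // password set: false
  }
}

/** Stand-in for the subset of SslContextFactory used here (hypothetical). */
interface TrustStoreSettings {
  void setTrustStorePath(String path);
  void setTrustStoreType(String type);
  void setTrustStorePassword(String password);
}

/** Test double that records which setters were called. */
final class RecordingSettings implements TrustStoreSettings {
  String path;
  String type;
  boolean passwordSet;

  @Override public void setTrustStorePath(String path) { this.path = path; }
  @Override public void setTrustStoreType(String type) { this.type = type; }
  @Override public void setTrustStorePassword(String password) { passwordSet = true; }
}
```

Leaving the password unset, rather than explicitly setting it to null, keeps whatever default the factory would otherwise use.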




[jira] [Updated] (HDFS-14951) KMS Jetty server does not startup if trust store password is null

2019-11-21 Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-14951:
--
Attachment: HDFS-14951.002.patch





[jira] [Created] (HDDS-2595) Update Ratis version to latest snapshot version

2019-11-20 Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2595:


 Summary: Update Ratis version to latest snapshot version
 Key: HDDS-2595
 URL: https://issues.apache.org/jira/browse/HDDS-2595
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


Update the Ratis dependency to the latest snapshot 
([ce699ba|https://github.com/apache/incubator-ratis/commit/ce699ba]) to avoid 
out-of-memory exceptions (RATIS-714).




[jira] [Updated] (HDDS-2474) Remove OzoneClient exception Precondition check

2019-11-13 Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-2474:
-
Status: Patch Available  (was: Open)

> Remove OzoneClient exception Precondition check
> ---
>
> Key: HDDS-2474
> URL: https://issues.apache.org/jira/browse/HDDS-2474
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If RaftClientReply encounters an exception other than NotLeaderException, 
> NotReplicatedException, StateMachineException or LeaderNotReady, then it sets 
> success to false but no exception is set. This causes a Precondition 
> check failure in XceiverClientRatis, which expects that there should be an 
> exception if success=false.




[jira] [Created] (HDDS-2474) Remove OzoneClient exception Precondition check

2019-11-13 Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2474:


 Summary: Remove OzoneClient exception Precondition check
 Key: HDDS-2474
 URL: https://issues.apache.org/jira/browse/HDDS-2474
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


If RaftClientReply encounters an exception other than NotLeaderException, 
NotReplicatedException, StateMachineException or LeaderNotReady, then it sets 
success to false but no exception is set. This causes a Precondition 
check failure in XceiverClientRatis, which expects that there should be an 
exception if success=false.
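For illustration, one way a caller could cope with a failed reply that carries no exception, instead of asserting that one is present. This is a hypothetical sketch: Reply is a stand-in for a client reply (not Ratis' RaftClientReply), checkReply and the error messages are assumptions, and it is not the change made for this issue.

```java
import java.io.IOException;

/**
 * Hypothetical sketch: handle a failed reply that carries no exception without
 * tripping a precondition. Reply is a stand-in, not Ratis' RaftClientReply.
 */
final class ReplyHandlingSketch {
  static final class Reply {
    final boolean success;
    final Exception exception; // may be null even when success is false

    Reply(boolean success, Exception exception) {
      this.success = success;
      this.exception = exception;
    }
  }

  /**
   * Throws on failure. A precondition such as "exception != null when
   * success is false" would instead fail for the case described above.
   */
  static void checkReply(Reply reply) throws IOException {
    if (reply.success) {
      return;
    }
    if (reply.exception != null) {
      throw new IOException("Request failed", reply.exception);
    }
    // Failed without an attached exception: report a generic failure.
    throw new IOException("Request failed without an exception in the reply");
  }

  public static void main(String[] args) {
    try {
      checkReply(new Reply(false, null));
    } catch (IOException e) {
      System.out.println(e.getMessage()); // Request failed without an exception in the reply
    }
  }
}
```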




[jira] [Comment Edited] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp

2019-11-13 Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973716#comment-16973716
 ] 

Hanisha Koneru edited comment on HDDS-2392 at 11/13/19 9:30 PM:


Thank you [~avijayan].
This issue is fixed by RATIS-747


was (Author: hanishakoneru):
Fixed by RATIS-747

> Fix TestScmSafeMode#testSCMSafeModeRestrictedOp
> ---
>
> Key: HDDS-2392
> URL: https://issues.apache.org/jira/browse/HDDS-2392
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> After the Ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp 
> fails because the DNs fail to restart XceiverServerRatis. 
> RaftServer#start() fails with the following exception:
> {code:java}
> java.io.IOException: java.lang.IllegalStateException: Not started
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
>   at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
>   at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
>   at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
>   at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
>   at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Not started
>   at org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
>   at org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182)
>   at org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136)
>   at org.apache.ratis.server.impl.RaftServerMetrics.<init>(RaftServerMetrics.java:70)
>   at org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62)
>   at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:119)
>   at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> {code}




[jira] [Resolved] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp

2019-11-13 Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru resolved HDDS-2392.
--
Resolution: Fixed

Fixed by RATIS-747





[jira] [Commented] (HDDS-2468) scmcli close pipeline command not working

2019-11-13 Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973647#comment-16973647
 ] 

Hanisha Koneru commented on HDDS-2468:
--

Deactivate Pipeline is also failing with the same error.

> scmcli close pipeline command not working
> -
>
> Key: HDDS-2468
> URL: https://issues.apache.org/jira/browse/HDDS-2468
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Rajesh Balamohan
>Assignee: Nanda kumar
>Priority: Major
>
> Close pipeline command is failing with the following exception:
> {noformat}
> java.lang.IllegalArgumentException: Unknown command type: ClosePipeline
>   at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:219)
>   at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
>   at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:112)
>   at org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:29883)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}
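Going by the message in the trace, ClosePipeline (and, per the comment above, DeactivatePipeline) appears to reach the server-side translator's processRequest without a matching branch. The sketch below only illustrates that pattern under stated assumptions: the Type enum, the return strings and both method names are hypothetical stand-ins, not the generated protobuf types or the SCM code.

```java
/**
 * Hypothetical sketch of why a request translator reports
 * "Unknown command type": a switch over request types whose default branch
 * rejects anything without a case. The enum is a stand-in, not the generated
 * protobuf type.
 */
final class CommandDispatchSketch {
  enum Type { ListPipelines, ClosePipeline, DeactivatePipeline }

  /** Missing cases fall through to default and fail like the trace above. */
  static String processRequestWithoutCases(Type type) {
    switch (type) {
      case ListPipelines:
        return "listed";
      default:
        throw new IllegalArgumentException("Unknown command type: " + type);
    }
  }

  /** Handling every command type the client can send avoids the error. */
  static String processRequest(Type type) {
    switch (type) {
      case ListPipelines:
        return "listed";
      case ClosePipeline:
        return "closed";
      case DeactivatePipeline:
        return "deactivated";
      default:
        throw new IllegalArgumentException("Unknown command type: " + type);
    }
  }

  public static void main(String[] args) {
    try {
      processRequestWithoutCases(Type.ClosePipeline);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Unknown command type: ClosePipeline
    }
    System.out.println(processRequest(Type.ClosePipeline)); // closed
  }
}
```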




[jira] [Updated] (HDDS-2454) Improve OM HA robot tests

2019-11-08 Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-2454:
-
Status: Patch Available  (was: Open)

> Improve OM HA robot tests
> -
>
> Key: HDDS-2454
> URL: https://issues.apache.org/jira/browse/HDDS-2454
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In one CI run, testOMHA.robot failed because robot framework SSH commands 
> failed. This Jira aims to verify that the command execution succeeds.




[jira] [Updated] (HDDS-2454) Improve OM HA robot tests

2019-11-08 Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-2454:
-
Issue Type: Improvement  (was: Bug)

> Improve OM HA robot tests
> -
>
> Key: HDDS-2454
> URL: https://issues.apache.org/jira/browse/HDDS-2454
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> In one CI run, testOMHA.robot failed because robot framework SSH commands 
> failed. This Jira aims to verify that the command execution succeeds.




[jira] [Created] (HDDS-2454) Improve OM HA robot tests

2019-11-08 Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2454:


 Summary: Improve OM HA robot tests
 Key: HDDS-2454
 URL: https://issues.apache.org/jira/browse/HDDS-2454
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


In one CI run, testOMHA.robot failed because robot framework SSH commands 
failed. This Jira aims to verify that the command execution succeeds.




[jira] [Commented] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp

2019-11-06 Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968911#comment-16968911
 ] 

Hanisha Koneru commented on HDDS-2392:
--

The root cause for this is RATIS-649.

RaftServer#start fails because {{RaftServerMetrics}} initialization fails. 
[More details|https://issues.apache.org/jira/browse/RATIS-649?focusedCommentId=16968910=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16968910] 
about the root cause are posted in RATIS-649.

cc. [~avijayan], [~shashikant]
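Going by the stack trace, the failure appears to come from a memoized address supplier that asks the gRPC server for its port while RaftServerImpl is still being constructed (via RaftServerMetrics), before that server has started. The sketch below reproduces only that ordering hazard; Memoized, FakeServer and the port value 12345 are hypothetical stand-ins, not Ratis classes, and it does not describe the RATIS-649 or RATIS-747 changes.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the ordering hazard visible in the trace: a memoized
 * address supplier that asks a server for its port is evaluated during
 * construction (metrics setup), before the server is started.
 */
final class LazyAddressSketch {
  /** Minimal memoizing supplier: computes once, caches only on success. */
  static final class Memoized<T> implements Supplier<T> {
    private final Supplier<T> initializer;
    private T value;

    Memoized(Supplier<T> initializer) {
      this.initializer = initializer;
    }

    @Override
    public synchronized T get() {
      if (value == null) {
        value = initializer.get(); // an exception here leaves value unset
      }
      return value;
    }
  }

  /** Stand-in server whose port is only available after start(). */
  static final class FakeServer {
    private boolean started;

    void start() {
      started = true;
    }

    int getPort() {
      if (!started) {
        throw new IllegalStateException("Not started");
      }
      return 12345; // arbitrary placeholder port
    }
  }

  public static void main(String[] args) {
    FakeServer server = new FakeServer();
    Supplier<Integer> port = new Memoized<>(server::getPort);
    try {
      port.get(); // e.g. metrics asking for the peer address too early
    } catch (IllegalStateException e) {
      System.out.println("before start: " + e.getMessage()); // before start: Not started
    }
    server.start();
    System.out.println("after start: " + port.get()); // after start: 12345
  }
}
```

Deferring the first get() until after start(), or not touching the address during construction, avoids the exception in this model.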





[jira] [Assigned] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp

2019-11-04 Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru reassigned HDDS-2392:


Assignee: Hanisha Koneru

> Fix TestScmSafeMode#testSCMSafeModeRestrictedOp
> ---
>
> Key: HDDS-2392
> URL: https://issues.apache.org/jira/browse/HDDS-2392
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> After the Ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp 
> fails because the DNs fail to restart XceiverServerRatis. 
> RaftServer#start() fails with the following exception:
> {code:java}
> java.io.IOException: java.lang.IllegalStateException: Not started
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
>   at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
>   at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
>   at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
>   at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
>   at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Not started
>   at org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
>   at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
>   at org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182)
>   at org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136)
>   at org.apache.ratis.server.impl.RaftServerMetrics.<init>(RaftServerMetrics.java:70)
>   at org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62)
>   at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:119)
>   at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> {code}




[jira] [Comment Edited] (HDFS-14951) KMS Jetty server does not startup if trust store password is null

2019-11-01 Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965130#comment-16965130
 ] 

Hanisha Koneru edited comment on HDFS-14951 at 11/1/19 10:36 PM:
-

Pinging [~weichiu] and [~xyao] for review please.


was (Author: hanishakoneru):
Pinging [~weichiu] for review please.

> KMS Jetty server does not startup if trust store password is null
> -
>
> Key: HDFS-14951
> URL: https://issues.apache.org/jira/browse/HDFS-14951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-14951.001.patch
>
>
> In HttpServer2, if the trustStore is set but the trust store password is not, 
> then we set the TrustStorePassword of SSLContextFactory to null. This results 
> in the Jetty server not starting up.
> {code:java}
> // In HttpServer2#createHttpsChannelConnector():
> if (trustStore != null) {
>   sslContextFactory.setTrustStorePath(trustStore);
>   sslContextFactory.setTrustStoreType(trustStoreType);
>   sslContextFactory.setTrustStorePassword(trustStorePassword);
> }
> {code}
> Before setting the trust store password, we should check that it is not null.




[jira] [Commented] (HDFS-14951) KMS Jetty server does not startup if trust store password is null

2019-11-01 Thread Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965130#comment-16965130
 ] 

Hanisha Koneru commented on HDFS-14951:
---

Pinging [~weichiu] for review please.

> KMS Jetty server does not startup if trust store password is null
> -
>
> Key: HDFS-14951
> URL: https://issues.apache.org/jira/browse/HDFS-14951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-14951.001.patch
>
>
> In HttpServer2, if the trustStore is set but the trust store password is not, 
> then we set the TrustStorePassword of SslContextFactory to null. This results 
> in the Jetty server not starting up.
> {code:java}
> // In HttpServer2#createHttpsChannelConnector():
> if (trustStore != null) {
>   sslContextFactory.setTrustStorePath(trustStore);
>   sslContextFactory.setTrustStoreType(trustStoreType);
>   sslContextFactory.setTrustStorePassword(trustStorePassword);
> }
> {code}
> Before setting the trust store password, we should check that it is not null.




[jira] [Updated] (HDFS-14951) KMS Jetty server does not startup if trust store password is null

2019-11-01 Thread Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-14951:
--
Attachment: HDFS-14951.001.patch

> KMS Jetty server does not startup if trust store password is null
> -
>
> Key: HDFS-14951
> URL: https://issues.apache.org/jira/browse/HDFS-14951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-14951.001.patch
>
>
> In HttpServer2, if the trustStore is set but the trust store password is not, 
> then we set the TrustStorePassword of SslContextFactory to null. This results 
> in the Jetty server not starting up.
> {code:java}
> // In HttpServer2#createHttpsChannelConnector():
> if (trustStore != null) {
>   sslContextFactory.setTrustStorePath(trustStore);
>   sslContextFactory.setTrustStoreType(trustStoreType);
>   sslContextFactory.setTrustStorePassword(trustStorePassword);
> }
> {code}
> Before setting the trust store password, we should check that it is not null.




[jira] [Created] (HDFS-14951) KMS Jetty server does not startup if trust store password is null

2019-11-01 Thread Hanisha Koneru (Jira)

Hanisha Koneru created HDFS-14951:
-

 Summary: KMS Jetty server does not startup if trust store password 
is null
 Key: HDFS-14951
 URL: https://issues.apache.org/jira/browse/HDFS-14951
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


In HttpServer2, if the trustStore is set but the trust store password is not, 
then we set the TrustStorePassword of SslContextFactory to null. This results 
in the Jetty server not starting up.
{code:java}
// In HttpServer2#createHttpsChannelConnector():

if (trustStore != null) {
  sslContextFactory.setTrustStorePath(trustStore);
  sslContextFactory.setTrustStoreType(trustStoreType);
  sslContextFactory.setTrustStorePassword(trustStorePassword);
}
{code}
Before setting the trust store password, we should check that it is not null.
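Below is a minimal sketch of the proposed null check. The real code configures Jetty's `SslContextFactory`. To make the example compile on its own, a hypothetical `TrustStoreSettings` stub stands in for it and records which setters were called. `configureTrustStore` is an illustrative helper, not the code in HDFS-14951.001.patch.

```java
import java.util.ArrayList;
import java.util.List;

public class TrustStoreConfigSketch {

  /** Hypothetical stand-in for Jetty's SslContextFactory; records which setters were called. */
  static class TrustStoreSettings {
    final List<String> calls = new ArrayList<>();
    String path;
    String type;
    String password;

    void setTrustStorePath(String p) { calls.add("path"); path = p; }
    void setTrustStoreType(String t) { calls.add("type"); type = t; }
    void setTrustStorePassword(String pw) { calls.add("password"); password = pw; }
  }

  /** Mirrors the trust store block of createHttpsChannelConnector(), with the null check added. */
  static void configureTrustStore(TrustStoreSettings factory, String trustStore,
      String trustStoreType, String trustStorePassword) {
    if (trustStore != null) {
      factory.setTrustStorePath(trustStore);
      factory.setTrustStoreType(trustStoreType);
      // Only pass the password through when one was actually configured.
      if (trustStorePassword != null) {
        factory.setTrustStorePassword(trustStorePassword);
      }
    }
  }

  public static void main(String[] args) {
    TrustStoreSettings noPassword = new TrustStoreSettings();
    configureTrustStore(noPassword, "/etc/ssl/truststore.jks", "jks", null);
    System.out.println(noPassword.calls); // [path, type]

    TrustStoreSettings withPassword = new TrustStoreSettings();
    configureTrustStore(withPassword, "/etc/ssl/truststore.jks", "jks", "secret");
    System.out.println(withPassword.calls); // [path, type, password]
  }
}
```

When the password is missing, the setter is skipped. The factory therefore keeps its default instead of being explicitly overridden with null.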




[jira] [Updated] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp

2019-10-31 Thread Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-2392:
-
Description: 
After the Ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp 
fails because the DNs fail to restart XceiverServerRatis. 

RaftServer#start() fails with the following exception:
{code:java}
java.io.IOException: java.lang.IllegalStateException: Not started
at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Not started
at org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
at org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
at org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182)
at org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84)
at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
at org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136)
at org.apache.ratis.server.impl.RaftServerMetrics.<init>(RaftServerMetrics.java:70)
at org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62)
at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:119)
at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
{code}

  was:
After the Ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp 
fails because the DNs fail to restart XceiverServerRatis. 
RaftServer#start() fails with the following exception:
{code:java}
java.io.IOException: java.lang.IllegalStateException: Not started
at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Not started
at org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
at org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
at

[jira] [Created] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp

2019-10-31 Thread Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2392:


 Summary: Fix TestScmSafeMode#testSCMSafeModeRestrictedOp
 Key: HDDS-2392
 URL: https://issues.apache.org/jira/browse/HDDS-2392
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru


After the Ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp 
fails because the DNs fail to restart XceiverServerRatis. 
RaftServer#start() fails with the following exception:
{code:java}
java.io.IOException: java.lang.IllegalStateException: Not started
at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Not started
at org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
at org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
at org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182)
at org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84)
at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
at org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136)
at org.apache.ratis.server.impl.RaftServerMetrics.<init>(RaftServerMetrics.java:70)
at org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62)
at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:119)
at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
{code}
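The trace shows the failure path. While `RaftServerProxy` builds a `RaftServerImpl`, the `RaftServerMetrics` constructor asks for the server's peer. The peer address is resolved through a `MemoizedSupplier` that calls the gRPC `ServerImpl.getPort()`, and that call fails its "Not started" precondition because the gRPC server has not been started yet.

The plain-Java sketch below reproduces only that ordering problem. `FakeGrpcServer` and `Memoized` are hypothetical stand-ins, not the Ratis or gRPC classes, and the port number is arbitrary. The sketch does not show how the issue was fixed.

```java
import java.util.function.Supplier;

public class NotStartedSketch {

  /** Hypothetical stand-in for a gRPC server whose port is only available after start(). */
  static class FakeGrpcServer {
    private boolean started;
    private final int port;

    FakeGrpcServer(int port) { this.port = port; }

    void start() { started = true; }

    int getPort() {
      // Same contract as the checkState(...) in the trace: fail if not started.
      if (!started) {
        throw new IllegalStateException("Not started");
      }
      return port;
    }
  }

  /** Minimal memoizing supplier: caches the first successfully computed value. */
  static class Memoized<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private T value;

    Memoized(Supplier<T> delegate) { this.delegate = delegate; }

    @Override
    public synchronized T get() {
      if (value == null) {
        value = delegate.get();
      }
      return value;
    }
  }

  /** Returns the exception message if resolving the port before start() fails, else null. */
  static String resolveBeforeStart() {
    FakeGrpcServer server = new FakeGrpcServer(9858);
    Supplier<Integer> port = new Memoized<>(server::getPort);
    try {
      port.get(); // e.g. a metrics constructor asking for the peer address too early
      return null;
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  /** Resolving the port after start() succeeds. */
  static int resolveAfterStart() {
    FakeGrpcServer server = new FakeGrpcServer(9858);
    Supplier<Integer> port = new Memoized<>(server::getPort);
    server.start();
    return port.get();
  }

  public static void main(String[] args) {
    System.out.println(resolveBeforeStart()); // Not started
    System.out.println(resolveAfterStart());  // 9858
  }
}
```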




[jira] [Commented] (HDDS-2376) Fail to read data through XceiverClientGrpc

2019-10-30 Thread Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963573#comment-16963573
 ] 

Hanisha Koneru commented on HDDS-2376:
--

[~Sammi], I ran teragen on my cluster and it ran successfully. Is this error 
reproducible?

> Fail to read data through XceiverClientGrpc
> ---
>
> Key: HDDS-2376
> URL: https://issues.apache.org/jira/browse/HDDS-2376
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> Ran teragen; the application failed with the following stack trace:
> 19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in uber mode : false
> 19/10/29 14:35:59 INFO mapreduce.Job:  map 0% reduce 0%
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with state FAILED due to: Application application_1567133159094_0048 failed 2 times due to AM Container for appattempt_1567133159094_0048_02 exited with  exitCode: -1000
> For more detailed output, check application tracking page: http://host183:8088/cluster/app/application_1567133159094_0048 Then, click on links to logs of each attempt.
> Diagnostics: Unexpected OzoneException: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0
> java.io.IOException: Unexpected OzoneException: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0
>   at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
>   at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
>   at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
>   at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
>   at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
>   at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
>   at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
>   at java.io.DataInputStream.read(DataInputStream.java:100)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0
>   at org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
>   at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
>   at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
>   at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
>   at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
>   at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250)
>   at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233)
>   at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245)
>   at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335)
>   ... 26 more
> Caused by: Checksum mismatch at index 0
>
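For context, the `OzoneChecksumException` in this trace comes from checksum verification on the read path. The client checks the chunk data it received against stored checksums, and "index 0" appears to identify the first checksum that did not match.

The stdlib-only sketch below illustrates that kind of per-window comparison. It assumes CRC32 computed over fixed `bytesPerChecksum`-sized windows. `ChunkChecksumSketch` and its methods are hypothetical, and Ozone's `Checksum`/`ChecksumData` classes differ in detail.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class ChunkChecksumSketch {

  /** Computes one CRC32 value per bytesPerChecksum-sized window of data. */
  static List<Long> computeChecksums(byte[] data, int bytesPerChecksum) {
    List<Long> sums = new ArrayList<>();
    for (int off = 0; off < data.length; off += bytesPerChecksum) {
      int len = Math.min(bytesPerChecksum, data.length - off);
      CRC32 crc = new CRC32();
      crc.update(data, off, len);
      sums.add(crc.getValue());
    }
    return sums;
  }

  /** Returns the index of the first mismatching window, or -1 if every window matches. */
  static int firstMismatch(byte[] data, int bytesPerChecksum, List<Long> expected) {
    List<Long> actual = computeChecksums(data, bytesPerChecksum);
    int n = Math.max(actual.size(), expected.size());
    for (int i = 0; i < n; i++) {
      if (i >= actual.size() || i >= expected.size()
          || !actual.get(i).equals(expected.get(i))) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] written = "hello ozone checksum".getBytes(StandardCharsets.UTF_8);
    List<Long> stored = computeChecksums(written, 8);

    byte[] read = written.clone();
    read[0] ^= 1; // flip one bit in the first window
    System.out.println(firstMismatch(written, 8, stored)); // -1
    System.out.println(firstMismatch(read, 8, stored));    // 0
  }
}
```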

[jira] [Resolved] (HDDS-2285) GetBlock and ReadChunk commands should be sent to the same datanode

2019-10-28 Thread Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru resolved HDDS-2285.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> GetBlock and ReadChunk commands should be sent to the same datanode
> ---
>
> Key: HDDS-2285
> URL: https://issues.apache.org/jira/browse/HDDS-2285
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It can be observed that the GetBlock and ReadChunk commands are sent to two 
> different datanodes. They should be sent to the same datanode so that the 
> connection can be reused.
> {code}
> 19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command GetBlock to 
> datanode 172.26.32.224
> 19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command ReadChunk to 
> datanode 172.26.32.231
> {code}
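Below is a hypothetical sketch of the requested behavior. A datanode is chosen the first time a block is accessed, and every later command for that block goes to the same node, even if the pipeline's nodes are later supplied in a different order. `DatanodeSelector`, the first-node choice and the string commands are illustrative assumptions. They are not the `XceiverClientGrpc` API or the merged fix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SameDatanodeSketch {

  /** Hypothetical selector that remembers which datanode serves each block. */
  static class DatanodeSelector {
    private final Map<String, String> chosen = new HashMap<>();
    final List<String> sentLog = new ArrayList<>();

    /** Picks a datanode for the block on first use, then keeps returning the same one. */
    String datanodeFor(String blockId, List<String> pipeline) {
      return chosen.computeIfAbsent(blockId, id -> pipeline.get(0));
    }

    /** Records where a command would be sent. */
    void send(String command, String blockId, List<String> pipeline) {
      sentLog.add(command + " -> " + datanodeFor(blockId, pipeline));
    }
  }

  public static void main(String[] args) {
    DatanodeSelector selector = new DatanodeSelector();
    selector.send("GetBlock", "block-1", List.of("172.26.32.224", "172.26.32.231"));
    // Same block, nodes listed in a different order: the cached choice still wins.
    selector.send("ReadChunk", "block-1", List.of("172.26.32.231", "172.26.32.224"));
    System.out.println(selector.sentLog);
    // [GetBlock -> 172.26.32.224, ReadChunk -> 172.26.32.224]
  }
}
```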




[jira] [Updated] (HDDS-2285) GetBlock and ReadChunk commands should be sent to the same datanode

2019-10-28 Thread Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-2285:
-
Summary: GetBlock and ReadChunk commands should be sent to the same 
datanode  (was: GetBlock and ReadChunk command from the client should be sent 
to the same datanode to re-use the same connection)

> GetBlock and ReadChunk commands should be sent to the same datanode
> ---
>
> Key: HDDS-2285
> URL: https://issues.apache.org/jira/browse/HDDS-2285
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It can be observed that the GetBlock and ReadChunk commands are sent to two 
> different datanodes. They should be sent to the same datanode so that the 
> connection can be reused.
> {code}
> 19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command GetBlock to 
> datanode 172.26.32.224
> 19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command ReadChunk to 
> datanode 172.26.32.231
> {code}




[jira] [Commented] (HDDS-2271) Avoid buffer copying in KeyValueHandler

2019-10-16 Thread Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953233#comment-16953233
 ] 

Hanisha Koneru commented on HDDS-2271:
--

Thanks [~szetszwo] for the patch.
LGTM. +1.

> Avoid buffer copying in KeyValueHandler
> ---
>
> Key: HDDS-2271
> URL: https://issues.apache.org/jira/browse/HDDS-2271
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
> Attachments: o2271_20191015.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> - In handleGetSmallFile, it first reads the chunk data into a byte[] and then 
> copies it to a ByteString.
> - In handlePutBlock/handleGetBlock, to get the length, it (1) builds a 
> ContainerProtos.BlockData and then (2) copies it to a byte[].




[jira] [Commented] (HDDS-2275) In BatchOperation.SingleOperation, do not clone byte[]

2019-10-16 Thread Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953216#comment-16953216
 ] 

Hanisha Koneru commented on HDDS-2275:
--

Thanks [~szetszwo] for the patch.
LGTM. +1.

> In BatchOperation.SingleOperation, do not clone byte[]
> --
>
> Key: HDDS-2275
> URL: https://issues.apache.org/jira/browse/HDDS-2275
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: o2275_20191015.patch
>
>
> byte[] is cloned once in the constructor and then it is cloned again in the 
> getter methods.
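Below is a minimal sketch contrasting the two approaches. It uses simplified, hypothetical operation classes with `key`/`value` byte arrays; the real `BatchOperation.SingleOperation` may be shaped differently. The first class clones on construction and again in every getter, as the description says. The second keeps and returns the caller's arrays, as the issue title suggests.

```java
import java.util.Arrays;

public class SingleOperationSketch {

  /** Hypothetical version matching the description: clones in the constructor and in each getter. */
  static final class CloningOperation {
    private final byte[] key;
    private final byte[] value;

    CloningOperation(byte[] key, byte[] value) {
      this.key = key.clone();
      this.value = value.clone();
    }

    byte[] getKey() { return key.clone(); }
    byte[] getValue() { return value.clone(); }
  }

  /** Hypothetical version without cloning: stores and returns the caller's arrays directly. */
  static final class SharedOperation {
    private final byte[] key;
    private final byte[] value;

    SharedOperation(byte[] key, byte[] value) {
      this.key = key;
      this.value = value;
    }

    // Callers must treat the returned arrays as read-only.
    byte[] getKey() { return key; }
    byte[] getValue() { return value; }
  }

  public static void main(String[] args) {
    byte[] key = {1, 2, 3};
    byte[] value = {4, 5, 6};

    CloningOperation cloning = new CloningOperation(key, value);
    SharedOperation shared = new SharedOperation(key, value);

    System.out.println(cloning.getKey() == key);              // false: a fresh copy per call
    System.out.println(Arrays.equals(cloning.getKey(), key)); // true
    System.out.println(shared.getKey() == key);               // true: no copy made
  }
}
```

The trade-off is ownership. Without copies, callers must not mutate an array after handing it to the operation or after reading it back.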




[jira] [Created] (HDDS-2291) Acceptance tests for OM HA

2019-10-13 Thread Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2291:


 Summary: Acceptance tests for OM HA
 Key: HDDS-2291
 URL: https://issues.apache.org/jira/browse/HDDS-2291
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: HA, om
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


Add Robot tests for OM HA functionality.




[jira] [Updated] (HDDS-2240) Command line tool for OM Admin

2019-10-13 Thread Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-2240:
-
Summary: Command line tool for OM Admin  (was: Command line tool for OM HA)

> Command line tool for OM Admin
> --
>
> Key: HDDS-2240
> URL: https://issues.apache.org/jira/browse/HDDS-2240
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Add a command line tool (*ozone omha*) to get information related to OM HA. 
> This Jira proposes adding a _getServiceState_ option, which lists all the OMs 
> in the service and their corresponding Ratis server roles (LEADER/FOLLOWER). 
> More options can be added to this tool later.




[jira] [Resolved] (HDDS-2158) Fix Json Injection in JsonUtils

2019-10-04 Thread Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru resolved HDDS-2158.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> Fix Json Injection in JsonUtils
> ---
>
> Key: HDDS-2158
> URL: https://issues.apache.org/jira/browse/HDDS-2158
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> JsonUtils#toJsonStringWithDefaultPrettyPrinter() does not validate the JSON 
> string before serializing it, which could result in JSON injection.




[jira] [Created] (HDDS-2255) Improve Acl Handler Messages

2019-10-04 Thread Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2255:


 Summary: Improve Acl Handler Messages
 Key: HDDS-2255
 URL: https://issues.apache.org/jira/browse/HDDS-2255
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: om
Reporter: Hanisha Koneru


In the Add/Remove/Set Acl Key/Bucket/Volume handlers, we print a message about 
whether the operation was successful. If we try to add an ACL that already 
exists, the message only says that the operation failed. It would be better if 
the message stated why the operation failed, i.e. that the ACL already exists. 
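Below is a hypothetical sketch of the suggested change. Instead of only reporting failure, the handler says why an add request changed nothing. The `AclResult` enum, the message wording and the sample ACL string are illustrative assumptions, not the Ozone shell handler code.

```java
public class AclMessageSketch {

  /** Hypothetical outcomes of an add-ACL request. */
  enum AclResult { ADDED, ALREADY_EXISTS, FAILED }

  /** Builds a user-facing message that explains why nothing changed. */
  static String addAclMessage(String acl, String path, AclResult result) {
    switch (result) {
      case ADDED:
        return "ACL " + acl + " added successfully to " + path + ".";
      case ALREADY_EXISTS:
        return "ACL " + acl + " already exists on " + path + "; nothing to add.";
      default:
        return "Failed to add ACL " + acl + " to " + path + ".";
    }
  }

  public static void main(String[] args) {
    System.out.println(
        addAclMessage("user:alice:rw", "/vol1/bucket1", AclResult.ALREADY_EXISTS));
  }
}
```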




[jira] [Created] (HDDS-2240) Command line tool for OM HA

2019-10-03 Thread Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2240:


 Summary: Command line tool for OM HA
 Key: HDDS-2240
 URL: https://issues.apache.org/jira/browse/HDDS-2240
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


Add a command line tool (*ozone omha*) to get information related to OM HA. 
This Jira proposes adding a _getServiceState_ option, which lists all the OMs 
in the service and their corresponding Ratis server roles (LEADER/FOLLOWER). 
More options can be added to this tool later.
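As a rough illustration of the output _getServiceState_ describes, the sketch below prints each OM's node ID next to its Ratis role. `formatServiceState`, the node IDs and the `nodeId : ROLE` layout are hypothetical; the Jira does not specify the output format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class OmServiceStateSketch {

  /** Ratis server roles named in the proposal. */
  enum RaftRole { LEADER, FOLLOWER }

  /** Formats one "nodeId : ROLE" line per OM, in map iteration order. */
  static String formatServiceState(Map<String, RaftRole> roles) {
    StringJoiner out = new StringJoiner(System.lineSeparator());
    for (Map.Entry<String, RaftRole> e : roles.entrySet()) {
      out.add(e.getKey() + " : " + e.getValue());
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, RaftRole> roles = new LinkedHashMap<>();
    roles.put("om1", RaftRole.LEADER);
    roles.put("om2", RaftRole.FOLLOWER);
    roles.put("om3", RaftRole.FOLLOWER);
    System.out.println(formatServiceState(roles));
  }
}
```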




[jira] [Commented] (HDDS-2159) Fix Race condition in ProfileServlet#pid

2019-09-20 Thread Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934821#comment-16934821
 ] 

Hanisha Koneru commented on HDDS-2159:
--

If two threads run this method simultaneously, one could change the value of 
pid before the other has finished processing. 

> Fix Race condition in ProfileServlet#pid
> 
>
> Key: HDDS-2159
> URL: https://issues.apache.org/jira/browse/HDDS-2159
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a race condition in ProfileServlet: the servlet member field pid 
> should not be used for local assignment, because concurrent requests can overwrite it.




[jira] [Created] (HDDS-2159) Fix Race condition in ProfileServlet#pid

2019-09-20 Thread Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2159:


 Summary: Fix Race condition in ProfileServlet#pid
 Key: HDDS-2159
 URL: https://issues.apache.org/jira/browse/HDDS-2159
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


There is a race condition in ProfileServlet: the servlet member field pid 
should not be used for local assignment, because concurrent requests can overwrite it.
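Below is a minimal, hypothetical sketch of the race and the usual remedy. It does not use the servlet API, so it runs standalone, and it is not the actual `ProfileServlet` code. A servlet container normally serves concurrent requests with one shared servlet instance. Storing a per-request value in a field therefore lets one request overwrite another's value, while a local variable confines the value to the calling thread.

```java
public class ProfileHandlerSketch {

  // Racy pattern: a per-request value stored in a field of a shared instance.
  private Integer pid;

  /** Racy: another thread may overwrite pid between the assignment and the read. */
  String handleRacy(Integer requestedPid) {
    pid = requestedPid;
    return "profiling pid " + pid;
  }

  /** Safe: the per-request value lives in a local variable on this thread's stack. */
  String handleSafe(Integer requestedPid) {
    Integer localPid = requestedPid;
    return "profiling pid " + localPid;
  }

  public static void main(String[] args) throws InterruptedException {
    ProfileHandlerSketch shared = new ProfileHandlerSketch();
    Thread a = new Thread(() -> System.out.println(shared.handleSafe(101)));
    Thread b = new Thread(() -> System.out.println(shared.handleSafe(202)));
    a.start();
    b.start();
    a.join();
    b.join();
  }
}
```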




[jira] [Created] (HDDS-2158) Fix Json Injection in JsonUtils

2019-09-20 Thread Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2158:


 Summary: Fix Json Injection in JsonUtils
 Key: HDDS-2158
 URL: https://issues.apache.org/jira/browse/HDDS-2158
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


JsonUtils#toJsonStringWithDefaultPrettyPrinter() does not validate the JSON 
string before serializing it, which could result in JSON injection.
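The sketch below is for illustration only. One common defense against JSON injection is to escape untrusted text before embedding it in a JSON document. The issue asks for the input JSON string to be validated instead, and the patch may well use a JSON parser to do that. The stdlib-only `escapeJsonString` helper is a hypothetical sketch of the escaping technique, not the `JsonUtils` fix.

```java
public class JsonEscapeSketch {

  /** Escapes a raw string so it can be embedded as a JSON string value. */
  static String escapeJsonString(String raw) {
    StringBuilder sb = new StringBuilder(raw.length() + 16);
    for (int i = 0; i < raw.length(); i++) {
      char c = raw.charAt(i);
      switch (c) {
        case '"':  sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            // Other control characters are written as backslash-u hex escapes.
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Without escaping, this input would inject an extra "admin" field.
    String attackerInput = "bob\", \"admin\": true, \"x\": \"";
    System.out.println("{\"user\": \"" + escapeJsonString(attackerInput) + "\"}");
  }
}
```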




[jira] [Created] (HDDS-2150) Update dependency versions to avoid security vulnerabilities

2019-09-18 Thread Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2150:


 Summary: Update dependency versions to avoid security 
vulnerabilities
 Key: HDDS-2150
 URL: https://issues.apache.org/jira/browse/HDDS-2150
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


The following dependency versions have known security vulnerabilities. We 
should update them to newer versions.
 * Apache Thrift 0.11.0
 * Apache ZooKeeper 3.4.13
 * Jetty Servlet 9.3.24




[jira] [Updated] (HDDS-2139) Update BeanUtils and Jackson Databind dependency versions

2019-09-16 Thread Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-2139:
-
Description: 
The following Ozone dependencies have known security vulnerabilities. We should 
update them to newer versions.
 * Apache Commons BeanUtils version 1.9.3
 * FasterXML Jackson version 2.9.5

  was:
The following Ozone dependencies have known security vulnerabilities. We should 
update them to newer versions.
* Apache Commons BeanUtils version 1.9.3
* FasterXML Jackson version 2.9.5


> Update BeanUtils and Jackson Databind dependency versions
> -
>
> Key: HDDS-2139
> URL: https://issues.apache.org/jira/browse/HDDS-2139
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> The following Ozone dependencies have known security vulnerabilities. We 
> should update them to newer versions.
>  * Apache Commons BeanUtils version 1.9.3
>  * FasterXML Jackson version 2.9.5



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Created] (HDDS-2139) Update BeanUtils and Jackson Databind dependency versions

2019-09-16 Thread Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2139:


 Summary: Update BeanUtils and Jackson Databind dependency versions
 Key: HDDS-2139
 URL: https://issues.apache.org/jira/browse/HDDS-2139
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


The following Ozone dependencies have known security vulnerabilities. We should 
update them to newer versions.
* Apache Commons BeanUtils version 1.9.3
* FasterXML Jackson version 2.9.5




[jira] [Resolved] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment

2019-09-16 Thread Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru resolved HDDS-2107.
--
Resolution: Fixed

> Datanodes should retry forever to connect to SCM in an unsecure environment
> ---
>
> Key: HDDS-2107
> URL: https://issues.apache.org/jira/browse/HDDS-2107
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In an unsecure environment, the datanodes retry up to 10 times, waiting 
> 1000 milliseconds each time, before throwing this error:
> {code:java}
> Unable to communicate to SCM server at scm:9861 for past 0 seconds.
> java.net.ConnectException: Call From scm/10.65.36.118 to scm:9861 failed on 
> connection exception: java.net.ConnectException: Connection refused; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy33.getVersion(Unknown Source)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
>   at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>   ... 13 more
> {code}
> The datanodes should try forever to connect with SCM and not throw any errors.
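
The difference between the bounded retry described above and the requested "retry forever" behaviour can be sketched in plain Java. `RetryForeverSketch` and `callWithRetries` are hypothetical names, not Ozone code; a real change would presumably go through Hadoop's own retry policies (for example `RetryPolicies.RETRY_FOREVER` in `org.apache.hadoop.io.retry`) rather than a hand-rolled loop.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

/** Hypothetical helper illustrating bounded vs. unbounded connection retries. */
final class RetryForeverSketch {
  /** Passing this as maxRetries means "retry forever". */
  static final int RETRY_FOREVER = -1;

  private RetryForeverSketch() {}

  /**
   * Runs {@code action} until it succeeds. A ConnectException is retried after
   * sleeping {@code sleepMillis}; once {@code maxRetries} retries have failed
   * (never, if maxRetries is RETRY_FOREVER) the last exception is rethrown.
   */
  static <T> T callWithRetries(Callable<T> action, int maxRetries, long sleepMillis)
      throws Exception {
    int retries = 0;
    while (true) {
      try {
        return action.call();
      } catch (ConnectException e) {
        if (maxRetries != RETRY_FOREVER && retries >= maxRetries) {
          throw e; // bounded policy: give up and surface the error
        }
        retries++;
        Thread.sleep(sleepMillis);
      }
    }
  }
}
```

With `maxRetries = 10` and `sleepMillis = 1000` this mirrors the behaviour in the description; with `RETRY_FOREVER` the datanode would simply keep waiting for SCM to come up.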




[jira] [Commented] (HDFS-12831) HDFS throws FileNotFoundException on getFileBlockLocations(path-to-directory)

2019-09-04 Thread Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-12831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922860#comment-16922860
 ] 

Hanisha Koneru commented on HDFS-12831:
---

[~hemanthboyina] sure go ahead.

> HDFS throws FileNotFoundException on getFileBlockLocations(path-to-directory)
> -
>
> Key: HDFS-12831
> URL: https://issues.apache.org/jira/browse/HDFS-12831
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Hanisha Koneru
>Priority: Major
>
> The HDFS implementation of {{getFileBlockLocations(path, offset, len)}} 
> throws an exception if the path references a directory. 
> The base implementation and all other filesystems just return an empty 
> array, as implemented in {{getFileBlockLocations(filestatus, offset, len)}}; 
> filesystem.md documents this as the correct behaviour. 
> # This has been shown to break things: SPARK-14959
> # There are no contract tests for these APIs; this shows up in HADOOP-15044. 
> # Even if this is considered a wontfix, it should raise something like 
> {{PathIsDirectoryException}} rather than FNFE.
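
The behaviour the description asks for can be sketched as follows. `Status` and `Location` are hypothetical, minimal stand-ins for Hadoop's `FileStatus` and `BlockLocation`, and the method only models the contract described above (a directory yields an empty array instead of an exception); it is not the HDFS or `FileSystem` source.

```java
/** Hypothetical sketch of the getFileBlockLocations contract for directories. */
final class BlockLocationsSketch {
  /** Minimal stand-in for FileStatus. */
  static final class Status {
    final boolean isDirectory;
    final long length;

    Status(boolean isDirectory, long length) {
      this.isDirectory = isDirectory;
      this.length = length;
    }
  }

  /** Minimal stand-in for BlockLocation: a byte range of a file. */
  static final class Location {
    final long offset;
    final long length;

    Location(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  private BlockLocationsSketch() {}

  static Location[] getFileBlockLocations(Status status, long start, long len) {
    if (start < 0 || len < 0) {
      throw new IllegalArgumentException("Invalid start or len parameter");
    }
    // A directory has no blocks: report that with an empty array, not an exception.
    if (status.isDirectory || status.length <= start) {
      return new Location[0];
    }
    // Treat the whole file as one block, as a non-location-aware filesystem might.
    return new Location[] {new Location(0, status.length)};
  }
}
```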




[jira] [Commented] (HDDS-168) Add ScmGroupID to Datanode Version File

2019-08-28 Thread Hanisha Koneru (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917971#comment-16917971
 ] 

Hanisha Koneru commented on HDDS-168:
-

[~sdeka], ScmGroupID is for supporting multiple SCMs in a cluster. This could 
also be used when SCM HA is implemented.

> Add ScmGroupID to Datanode Version File
> ---
>
> Key: HDDS-168
> URL: https://issues.apache.org/jira/browse/HDDS-168
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> Add the field {{ScmGroupID}} to the Datanode Version file. This field identifies 
> the set of SCMs that this datanode talks to, or takes commands from.
> This value is not the same as the Cluster ID, since a cluster can technically 
> have more than one SCM group.
> Refer to [~anu]'s 
> [comment|https://issues.apache.org/jira/browse/HDDS-156?focusedCommentId=16511903=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16511903]
>  in HDDS-156.
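
Since a datanode VERSION file is a small key/value file, the proposal can be pictured with `java.util.Properties`. The key names `clusterID` and `scmGroupID`, the sample values and the `VersionFileSketch` class are assumptions for illustration, not the actual Ozone file layout.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

/** Hypothetical sketch: a VERSION file recording both a cluster ID and an SCM group ID. */
final class VersionFileSketch {
  private VersionFileSketch() {}

  static String write(String clusterId, String scmGroupId) throws IOException {
    Properties props = new Properties();
    props.setProperty("clusterID", clusterId);
    // Identifies the set of SCMs this datanode takes commands from; a cluster
    // may contain several such groups, so this is kept separate from clusterID.
    props.setProperty("scmGroupID", scmGroupId);
    StringWriter out = new StringWriter();
    props.store(out, "datanode VERSION (illustrative)");
    return out.toString();
  }

  static String readScmGroupId(String fileContents) throws IOException {
    Properties props = new Properties();
    props.load(new StringReader(fileContents));
    return props.getProperty("scmGroupID");
  }
}
```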




[jira] [Updated] (HDDS-1827) Load Snapshot info when OM Ratis server starts

2019-08-23 Thread Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1827:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Load Snapshot info when OM Ratis server starts
> --
>
> Key: HDDS-1827
> URL: https://issues.apache.org/jira/browse/HDDS-1827
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When the Ratis server starts, it looks for the latest snapshot to load. 
> Even though OM does not save snapshots via Ratis, we need to load the saved 
> snapshot index into Ratis so that the LogAppender knows not to look for logs 
> before the snapshot index. Otherwise, Ratis will replay the logs from the 
> beginning every time it starts up.
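
The effect of loading the saved snapshot index can be sketched as below. `SnapshotIndexSketch`, the single-number index file and the 1-based log numbering are hypothetical simplifications; the real OM persists its snapshot information in its own format and passes it to Ratis through its state machine.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: restore the last snapshot index so old log entries are not replayed. */
final class SnapshotIndexSketch {
  /** Returned when no snapshot has been saved yet: replay from the first entry. */
  static final long NO_SNAPSHOT = 0L;

  private SnapshotIndexSketch() {}

  static void saveSnapshotIndex(Path file, long index) throws IOException {
    Files.write(file, Long.toString(index).getBytes(StandardCharsets.UTF_8));
  }

  static long loadSnapshotIndex(Path file) throws IOException {
    if (!Files.exists(file)) {
      return NO_SNAPSHOT;
    }
    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
    return Long.parseLong(text);
  }

  /** Log entries are numbered from 1; only entries after the snapshot index are applied. */
  static long firstIndexToApply(long snapshotIndex) {
    return snapshotIndex + 1;
  }
}
```

Without the load step, `firstIndexToApply` would always start at 1, which is the "replay from the beginning" behaviour described above.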




[jira] [Created] (HDDS-2027) Corroborate log purging in TestOzoneManagerHA#testOMRestart

2019-08-23 Thread Hanisha Koneru (Jira)

Hanisha Koneru created HDDS-2027:


 Summary: Corroborate log purging in 
TestOzoneManagerHA#testOMRestart
 Key: HDDS-2027
 URL: https://issues.apache.org/jira/browse/HDDS-2027
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: test
Reporter: Hanisha Koneru


TestOzoneManagerHA#testOMRestart verifies that on OM restart, the snapshot is 
downloaded and the state is reloaded properly. For this, it is assumed that the 
logs are being purged periodically on the OMs. This Jira aims to corroborate 
that Ratis log purging is happening as expected.




[jira] [Updated] (HDDS-1105) Add mechanism in Recon to obtain DB snapshot 'delta' updates from Ozone Manager.

2019-08-19 Thread Hanisha Koneru (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1105:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add mechanism in Recon to obtain DB snapshot 'delta' updates from Ozone 
> Manager.
> 
>
> Key: HDDS-1105
> URL: https://issues.apache.org/jira/browse/HDDS-1105
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> *Some context*
> The FSCK server will periodically invoke this OM API, passing in the most 
> recent sequence number of its own RocksDB instance. The OM will use the 
> RocksDB getUpdatesSince() API to answer this query. Since the getUpdatesSince() 
> API only works against the RocksDB WAL, we have to configure the OM RocksDB WAL 
> (https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log) with a sufficient 
> max size to make this API useful. If the OM cannot get all transactions since 
> the given sequence number (due to WAL flushing), it can error out. In that 
> case, the FSCK server can fall back to getting the entire checkpoint snapshot 
> implemented in HDDS-1085.
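
The request-and-fallback protocol described above can be sketched with an in-memory stand-in for the WAL. `DeltaSyncSketch`, `WalUnavailableException` and the `TreeMap`-based log are hypothetical; on the OM the delta would come from RocksDB's `getUpdatesSince()` and the fallback from the full checkpoint of HDDS-1085.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: Recon pulls OM deltas and falls back to a full checkpoint. */
final class DeltaSyncSketch {
  /** Thrown when the retained WAL no longer covers the requested sequence number. */
  static final class WalUnavailableException extends Exception {
    WalUnavailableException(String message) {
      super(message);
    }
  }

  /** OM side: sequence number -> "key=value" update; older entries may be purged. */
  static final class OmWal {
    final TreeMap<Long, String> entries = new TreeMap<>();

    /** Returns all updates after seq, or fails if some of them were already purged. */
    List<String> updatesSince(long seq) throws WalUnavailableException {
      // An empty WAL is treated as "cannot answer" to keep the sketch short.
      if (entries.isEmpty() || entries.firstKey() > seq + 1) {
        throw new WalUnavailableException("WAL no longer covers sequence " + seq);
      }
      return new ArrayList<>(entries.tailMap(seq, false).values());
    }
  }

  private DeltaSyncSketch() {}

  /** Recon side: returns which path was taken, "delta" or "full-snapshot". */
  static String sync(OmWal wal, long lastAppliedSeq, Map<String, String> reconDb,
      Map<String, String> omCheckpoint) {
    try {
      for (String update : wal.updatesSince(lastAppliedSeq)) {
        String[] kv = update.split("=", 2);
        reconDb.put(kv[0], kv[1]);
      }
      return "delta";
    } catch (WalUnavailableException e) {
      // Fall back to replacing the local copy with the full OM checkpoint.
      reconDb.clear();
      reconDb.putAll(omCheckpoint);
      return "full-snapshot";
    }
  }
}
```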




[jira] [Commented] (HDDS-153) Add HA-aware proxy for OM client

2019-08-13 Thread Hanisha Koneru (JIRA)



[ 
https://issues.apache.org/jira/browse/HDDS-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906524#comment-16906524
 ] 

Hanisha Koneru commented on HDDS-153:
-

Yes, this has been done as part of HDDS-505 subtasks. Resolving this Jira.

> Add HA-aware proxy for OM client 
> -
>
> Key: HDDS-153
> URL: https://issues.apache.org/jira/browse/HDDS-153
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: DENG FEI
>Priority: Major
>
> This allows the client to talk to the OMs in the Ratis ring when a failover 
> (leader change) happens. 
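
Client-side failover across the OMs of a Ratis ring can be sketched as a round-robin over the configured OM node IDs. `OmFailoverSketch`, `NotLeaderException` and the node IDs are hypothetical; the real client work was done under the HDDS-505 subtasks and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of an HA-aware client that fails over between OMs. */
final class OmFailoverSketch {
  /** Thrown by a (simulated) OM that is not the current Ratis leader. */
  static final class NotLeaderException extends RuntimeException {
    NotLeaderException(String omNodeId) {
      super(omNodeId + " is not the leader");
    }
  }

  private final List<String> omNodeIds;
  private int current = 0;

  OmFailoverSketch(List<String> omNodeIds) {
    this.omNodeIds = new ArrayList<>(omNodeIds);
  }

  String currentOm() {
    return omNodeIds.get(current);
  }

  /** Moves to the next OM in round-robin order. */
  void performFailover() {
    current = (current + 1) % omNodeIds.size();
  }

  /** Sends the request, failing over at most once per OM before giving up. */
  <T> T invoke(Function<String, T> request) {
    for (int attempt = 0; attempt < omNodeIds.size(); attempt++) {
      try {
        return request.apply(currentOm());
      } catch (NotLeaderException e) {
        performFailover();
      }
    }
    throw new IllegalStateException("No OM leader found");
  }
}
```

The client remembers the OM that answered, so subsequent requests go straight to the current leader until the next leader change.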




[jira] [Resolved] (HDDS-153) Add HA-aware proxy for OM client

2019-08-13 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru resolved HDDS-153.
-
Resolution: Duplicate

> Add HA-aware proxy for OM client 
> -
>
> Key: HDDS-153
> URL: https://issues.apache.org/jira/browse/HDDS-153
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: DENG FEI
>Priority: Major
>
> This allows the client to talk to the OMs in the Ratis ring when a failover 
> (leader change) happens. 




[jira] [Updated] (HDDS-1827) Load Snapshot info when OM Ratis server starts

2019-07-19 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1827:
-
Status: Patch Available  (was: Open)

> Load Snapshot info when OM Ratis server starts
> --
>
> Key: HDDS-1827
> URL: https://issues.apache.org/jira/browse/HDDS-1827
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the Ratis server starts, it looks for the latest snapshot to load. 
> Even though OM does not save snapshots via Ratis, we need to load the saved 
> snapshot index into Ratis so that the LogAppender knows not to look for logs 
> before the snapshot index. Otherwise, Ratis will replay the logs from the 
> beginning every time it starts up.




[jira] [Created] (HDDS-1830) OzoneManagerDoubleBuffer#stop should wait for daemon thread to die

2019-07-18 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1830:


 Summary: OzoneManagerDoubleBuffer#stop should wait for daemon 
thread to die
 Key: HDDS-1830
 URL: https://issues.apache.org/jira/browse/HDDS-1830
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Hanisha Koneru


Based on [~arp]'s comment on HDDS-1649, OzoneManagerDoubleBuffer#stop() calls 
interrupt() on the daemon thread but does not call join(). The thread might 
therefore still be running when stop() returns. 
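
The fix being asked for is the usual interrupt-then-join shutdown. The class below is a hypothetical stand-in for the double buffer's flush thread, not the `OzoneManagerDoubleBuffer` source; it only shows that `stop()` returns after the daemon thread has exited.

```java
/** Hypothetical sketch: a daemon worker whose stop() waits for the thread to die. */
final class DaemonStopSketch {
  private final Thread daemon;

  DaemonStopSketch() {
    daemon = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          Thread.sleep(10); // stand-in for "flush pending transactions"
        } catch (InterruptedException e) {
          // sleep() clears the interrupt flag; restore it so the loop exits.
          Thread.currentThread().interrupt();
        }
      }
    }, "DoubleBufferFlushSketch");
    daemon.setDaemon(true);
  }

  void start() {
    daemon.start();
  }

  boolean isRunning() {
    return daemon.isAlive();
  }

  /** Interrupts the daemon and, unlike interrupt() alone, waits until it has exited. */
  void stop() throws InterruptedException {
    daemon.interrupt();
    daemon.join();
  }
}
```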




[jira] [Created] (HDDS-1829) On OM reload/ restart OmMetrics#numKeys should be updated

2019-07-18 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1829:


 Summary: On OM reload/ restart OmMetrics#numKeys should be updated
 Key: HDDS-1829
 URL: https://issues.apache.org/jira/browse/HDDS-1829
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Hanisha Koneru


When the OM is restarted or its state is reloaded, OM Metrics is re-initialized. 
The saved numKeys value might no longer be valid, as the DB state could have 
changed. Hence, the numKeys metric must be updated with the correct value when 
the metrics are re-initialized.
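
One way to picture the requirement: on re-initialization the metric is recomputed from the DB instead of carried over from before the reload. `KeyMetricsSketch` and the `Map`-based key table are hypothetical; the real OM would count entries in its own key table.

```java
import java.util.Map;

/** Hypothetical sketch: rebuild the numKeys metric from the key table on reload. */
final class KeyMetricsSketch {
  private long numKeys;

  void incNumKeys() {
    numKeys++;
  }

  long getNumKeys() {
    return numKeys;
  }

  /** Called after a restart or state reload; the old counter value is discarded. */
  void reinitialize(Map<String, ?> keyTable) {
    numKeys = keyTable.size();
  }
}
```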




[jira] [Updated] (HDDS-1827) Load Snapshot info when OM Ratis server starts

2019-07-18 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1827:
-
Summary: Load Snapshot info when OM Ratis server starts  (was: Load 
Snapshot info when start OM Ratis server)

> Load Snapshot info when OM Ratis server starts
> --
>
> Key: HDDS-1827
> URL: https://issues.apache.org/jira/browse/HDDS-1827
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> When the Ratis server starts, it looks for the latest snapshot to load. 
> Even though OM does not save snapshots via Ratis, we need to load the saved 
> snapshot index into Ratis so that the LogAppender knows not to look for logs 
> before the snapshot index. Otherwise, Ratis will replay the logs from the 
> beginning every time it starts up.




[jira] [Created] (HDDS-1827) Load Snapshot info when start OM Ratis server

2019-07-18 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1827:


 Summary: Load Snapshot info when start OM Ratis server
 Key: HDDS-1827
 URL: https://issues.apache.org/jira/browse/HDDS-1827
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


When the Ratis server starts, it looks for the latest snapshot to load. Even 
though OM does not save snapshots via Ratis, we need to load the saved snapshot 
index into Ratis so that the LogAppender knows not to look for logs before the 
snapshot index. Otherwise, Ratis will replay the logs from the beginning every 
time it starts up.




[jira] [Updated] (HDDS-1649) On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-17 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1649:
-
Description: 
Installing a DB checkpoint on the OM involves the following steps:
 1. When an OM follower receives an installSnapshot notification from the OM 
leader, it should initiate a new checkpoint on the OM leader and download that 
checkpoint over HTTP. 
 2. After downloading the checkpoint, the StateMachine must be paused so that 
the old OM DB can be replaced with the newly downloaded checkpoint. 
 3. The OM should be reloaded with the new state. All the services that depend 
on the OM DB (such as MetadataManager, KeyManager etc.) must be 
re-initialized/restarted. 
 4. Once the OM is ready with the new state, the state machine must be unpaused 
to resume participating in the Ratis ring.

  was:
Installing a DB checkpoint on the OM involves the following steps:
1. When an OM follower receives an installSnapshot notification from the OM 
leader, it should initiate a new checkpoint on the OM leader and download that 
checkpoint. 
2. After downloading the checkpoint, the StateMachine must be paused so that 
the old OM DB can be replaced with the newly downloaded checkpoint. 
3. The OM should be reloaded with the new state. All the services that depend 
on the OM DB (such as MetadataManager, KeyManager etc.) must be 
re-initialized/restarted. 
4. Once the OM is ready with the new state, the state machine must be unpaused 
to resume participating in the Ratis ring.


> On installSnapshot notification from OM leader, download checkpoint and 
> reload OM state
> ---
>
> Key: HDDS-1649
> URL: https://issues.apache.org/jira/browse/HDDS-1649
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Installing a DB checkpoint on the OM involves the following steps:
>  1. When an OM follower receives an installSnapshot notification from the OM 
> leader, it should initiate a new checkpoint on the OM leader and download that 
> checkpoint over HTTP. 
>  2. After downloading the checkpoint, the StateMachine must be paused so that 
> the old OM DB can be replaced with the newly downloaded checkpoint. 
>  3. The OM should be reloaded with the new state. All the services that 
> depend on the OM DB (such as MetadataManager, KeyManager etc.) must be 
> re-initialized/restarted. 
>  4. Once the OM is ready with the new state, the state machine must be 
> unpaused to resume participating in the Ratis ring.
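
The four steps can be sketched as a single method that runs them in order. The `Om` interface and its method names are hypothetical placeholders for the real OM operations; the point is only the ordering: download before pausing, and unpause only once the reloaded state is ready.

```java
import java.nio.file.Path;

/** Hypothetical sketch of installing a leader checkpoint on an OM follower. */
final class InstallCheckpointSketch {
  /** Placeholder for the OM operations involved; all names are illustrative. */
  interface Om {
    Path downloadCheckpointFromLeader() throws Exception; // step 1 (over HTTP)
    void pauseStateMachine();                             // step 2
    void replaceDb(Path checkpoint) throws Exception;     // step 2
    void reloadServices() throws Exception;               // step 3
    void unpauseStateMachine();                           // step 4
  }

  private InstallCheckpointSketch() {}

  static void onInstallSnapshotNotification(Om om) throws Exception {
    // Download first, so the state machine is paused only for the swap itself.
    Path checkpoint = om.downloadCheckpointFromLeader();
    om.pauseStateMachine();
    om.replaceDb(checkpoint);
    om.reloadServices(); // e.g. MetadataManager, KeyManager
    om.unpauseStateMachine();
  }
}
```

How a failure during the reload should be handled (for example, whether the state machine may be unpaused at all) is left open in this sketch.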




[jira] [Updated] (HDDS-1649) On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-17 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1649:
-
Description: 
Installing a DB checkpoint on the OM involves the following steps:
1. When an OM follower receives an installSnapshot notification from the OM 
leader, it should initiate a new checkpoint on the OM leader and download that 
checkpoint. 
2. After downloading the checkpoint, the StateMachine must be paused so that 
the old OM DB can be replaced with the newly downloaded checkpoint. 
3. The OM should be reloaded with the new state. All the services that depend 
on the OM DB (such as MetadataManager, KeyManager etc.) must be 
re-initialized/restarted. 
4. Once the OM is ready with the new state, the state machine must be unpaused 
to resume participating in the Ratis ring.

  was:When an OM follower receives installSnapshot notification from OM leader, 
it should initiate a new checkpoint on the OM leader and download that 
checkpoint.


> On installSnapshot notification from OM leader, download checkpoint and 
> reload OM state
> ---
>
> Key: HDDS-1649
> URL: https://issues.apache.org/jira/browse/HDDS-1649
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Installing a DB checkpoint on the OM involves the following steps:
> 1. When an OM follower receives an installSnapshot notification from the OM 
> leader, it should initiate a new checkpoint on the OM leader and download that 
> checkpoint. 
> 2. After downloading the checkpoint, the StateMachine must be paused so that 
> the old OM DB can be replaced with the newly downloaded checkpoint. 
> 3. The OM should be reloaded with the new state. All the services that 
> depend on the OM DB (such as MetadataManager, KeyManager etc.) must be 
> re-initialized/restarted. 
> 4. Once the OM is ready with the new state, the state machine must be 
> unpaused to resume participating in the Ratis ring.




[jira] [Updated] (HDDS-1775) Make OM KeyDeletingService compatible with HA model

2019-07-16 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1775:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Make OM KeyDeletingService compatible with HA model
> ---
>
> Key: HDDS-1775
> URL: https://issues.apache.org/jira/browse/HDDS-1775
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently, the OM KeyDeletingService directly deletes all the keys in the 
> DeletedTable after deleting the corresponding blocks through SCM. For HA 
> compatibility, the key purging should happen through the OM Ratis server. 
> This Jira introduces a PurgeKeys request in the OM protocol. This request will 
> be submitted to the OM's Ratis server after SCM deletes the blocks 
> corresponding to the deleted keys.
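
The change in control flow can be sketched as follows: instead of deleting DeletedTable entries itself, the service asks SCM to delete the blocks and then submits one purge request through Ratis, so every OM applies the same deletion. `Scm`, `RatisClient`, `PurgeKeysRequest` and `KeyDeletingSketch` are hypothetical stand-ins, not the Ozone protocol classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of an HA-friendly key deleting service. */
final class KeyDeletingSketch {
  /** Stand-in for the PurgeKeys request added to the OM protocol. */
  static final class PurgeKeysRequest {
    final List<String> keys;

    PurgeKeysRequest(List<String> keys) {
      this.keys = Collections.unmodifiableList(new ArrayList<>(keys));
    }
  }

  interface Scm {
    /** Deletes the blocks of the given keys; returns the keys whose blocks are gone. */
    List<String> deleteBlocks(List<String> deletedKeys);
  }

  interface RatisClient {
    /** Replicates the request so that every OM applies it to its own DB. */
    void submit(PurgeKeysRequest request);
  }

  private KeyDeletingSketch() {}

  static void runOnce(List<String> deletedTable, Scm scm, RatisClient ratis) {
    List<String> purgeable = scm.deleteBlocks(deletedTable);
    if (!purgeable.isEmpty()) {
      // No direct delete from the DeletedTable here: the purge goes through Ratis.
      ratis.submit(new PurgeKeysRequest(purgeable));
    }
  }
}
```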




[jira] [Created] (HDDS-1814) Improve KeyDeletingService using Java 8 method reference

2019-07-16 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1814:


 Summary: Improve KeyDeletingService using Java 8 method reference 
 Key: HDDS-1814
 URL: https://issues.apache.org/jira/browse/HDDS-1814
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Hanisha Koneru


Based on [~bharatviswa]'s comments on the HDDS-1775 PR 
([https://github.com/apache/hadoop/pull/1063]), we should use a Java 8 method 
reference instead of null checks on OzoneManager.




[jira] [Commented] (HDDS-1529) BlockInputStream: Avoid buffer copy if the whole chunk is being read

2019-07-10 Thread Hanisha Koneru (JIRA)



[ 
https://issues.apache.org/jira/browse/HDDS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882434#comment-16882434
 ] 

Hanisha Koneru commented on HDDS-1529:
--

[~hgadre], HDDS-1496 adds support to read partial chunks. When chunks are read 
from disk, they are stored in a local buffer and then the required part of the 
chunk is copied to the client buffer. This is required when the chunk boundary 
to be read does not coincide with the checksum boundary. But when we are 
reading the whole chunk, we do not need a double copy, i.e. a copy from disk 
to the local buffer and then to the client buffer. We can directly copy the 
data from disk to the client buffer.
Please let me know if this makes sense or if you have any questions.

> BlockInputStream: Avoid buffer copy if the whole chunk is being read
> 
>
> Key: HDDS-1529
> URL: https://issues.apache.org/jira/browse/HDDS-1529
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hrishikesh Gadre
>Priority: Major
>
> Currently, BlockInputStream reads chunk data from DNs, puts it in a local 
> buffer and then copies the data to the client's buffer. This is required for 
> partial chunk reads, where more chunk data than requested might have to be 
> read so that checksum verification can be done. But if the whole chunk is 
> being read, we can copy the data directly into the client buffer and avoid a 
> double buffer copy.
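
The two read paths can be sketched as below. `ChunkReadSketch` is hypothetical, the on-disk chunk is simulated by a byte array, and checksum verification is only marked by a comment; the sketch shows when the intermediate buffer can be skipped and how a partial read is widened to checksum boundaries.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch of whole-chunk vs. partial-chunk reads. */
final class ChunkReadSketch {
  private ChunkReadSketch() {}

  /**
   * Copies chunk[offset, offset + len) into the client buffer.
   * Returns true if an intermediate local buffer was needed.
   */
  static boolean read(byte[] chunk, int offset, int len, int bytesPerChecksum,
      ByteBuffer client) {
    if (offset == 0 && len == chunk.length) {
      // Whole chunk: no extra bytes are needed for verification, copy straight through.
      client.put(chunk, 0, len);
      return false;
    }
    // Partial read: widen the range to checksum boundaries so it can be verified.
    int alignedStart = (offset / bytesPerChecksum) * bytesPerChecksum;
    int alignedEnd = Math.min(chunk.length,
        ((offset + len + bytesPerChecksum - 1) / bytesPerChecksum) * bytesPerChecksum);
    byte[] local = Arrays.copyOfRange(chunk, alignedStart, alignedEnd);
    // ... verify checksums over `local` here ...
    client.put(local, offset - alignedStart, len);
    return true;
  }
}
```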




[jira] [Updated] (HDDS-1775) Make OM KeyDeletingService compatible with HA model

2019-07-10 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1775:
-
Status: Patch Available  (was: Open)

> Make OM KeyDeletingService compatible with HA model
> ---
>
> Key: HDDS-1775
> URL: https://issues.apache.org/jira/browse/HDDS-1775
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, the OM KeyDeletingService directly deletes all the keys in the 
> DeletedTable after deleting the corresponding blocks through SCM. For HA 
> compatibility, the key purging should happen through the OM Ratis server. 
> This Jira introduces a PurgeKeys request in the OM protocol. This request will 
> be submitted to the OM's Ratis server after SCM deletes the blocks 
> corresponding to the deleted keys.




[jira] [Updated] (HDDS-1649) On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-10 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1649:
-
Status: Patch Available  (was: Open)

> On installSnapshot notification from OM leader, download checkpoint and 
> reload OM state
> ---
>
> Key: HDDS-1649
> URL: https://issues.apache.org/jira/browse/HDDS-1649
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When an OM follower receives installSnapshot notification from OM leader, it 
> should initiate a new checkpoint on the OM leader and download that 
> checkpoint.




[jira] [Created] (HDDS-1775) Make OM KeyDeletingService compatible with HA model

2019-07-08 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1775:


 Summary: Make OM KeyDeletingService compatible with HA model
 Key: HDDS-1775
 URL: https://issues.apache.org/jira/browse/HDDS-1775
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


Currently, the OM KeyDeletingService directly deletes all the keys in the 
DeletedTable after deleting the corresponding blocks through SCM. For HA 
compatibility, the key purging should happen through the OM Ratis server. This 
Jira introduces a PurgeKeys request in the OM protocol. This request will be 
submitted to the OM's Ratis server after SCM deletes the blocks corresponding 
to the deleted keys.




[jira] [Commented] (HDDS-1717) MR Job fails as OMFailoverProxyProvider has dependency hadoop-3.2

2019-07-03 Thread Hanisha Koneru (JIRA)



[ 
https://issues.apache.org/jira/browse/HDDS-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878252#comment-16878252
 ] 

Hanisha Koneru commented on HDDS-1717:
--

I have tried resolving this without duplicating the classes. I have refactored 
OMProxyInfo to not extend FailoverProxyProvider.ProxyInfo.

> MR Job fails as OMFailoverProxyProvider has dependency hadoop-3.2
> -
>
> Key: HDDS-1717
> URL: https://issues.apache.org/jira/browse/HDDS-1717
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.4.0
> Environment: Ozone : 10 Node (1 SCM, 1 OM, 10 DN)
> HDP : 5 Node
> Both cluster are on separate nodes and hosted on HDP Ycloud.
>Reporter: Soumitra Sulav
>Assignee: Hanisha Koneru
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: syslog_mapred.err
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Mapreduce Jobs are failing with exception ??Couldn't create protocol class 
> org.apache.hadoop.ozone.client.rpc.RpcClient exception??
> The Ozone hadoop-ozone-filesystem-lib-current.jar was copied to the HDP 
> cluster's Hadoop and MapReduce classpaths under:
> {code:java}
> /usr/hdp/3.1.0.0-78/hadoop/lib/hadoop-ozone-filesystem-lib-current-0.5.0-SNAPSHOT.jar
> /usr/hdp/3.1.0.0-78/hadoop-mapreduce/hadoop-ozone-filesystem-lib-current-0.5.0-SNAPSHOT.jar
> {code}
> Excerpt from exception :
> {code:java}
> 2019-06-21 10:07:57,982 ERROR [main] org.apache.hadoop.ozone.client.OzoneClientFactory: Couldn't create protocol class org.apache.hadoop.ozone.client.rpc.RpcClient exception:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291)
>   at org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
>   at org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.<init>(BasicOzoneClientAdapterImpl.java:134)
>   at org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.<init>(OzoneClientAdapterImpl.java:50)
>   at org.apache.hadoop.fs.ozone.OzoneFileSystem.createAdapter(OzoneFileSystem.java:103)
>   at org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:143)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:160)
>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:116)
>   at org.apache.hadoop.mapreduce.lib.output.PathOutputCommitterFactory.createFileOutputCommitter(PathOutputCommitterFactory.java:134)
>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitterFactory.createOutputCommitter(FileOutputCommitterFactory.java:35)
>   at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(FileOutputFormat.java:338)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:552)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:534)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1802)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:534)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:311)
>   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1760)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1757)
>   at 
>

[jira] [Updated] (HDDS-1717) MR Job fails as OMFailoverProxyProvider has dependency hadoop-3.2

2019-07-03 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1717:
-
Summary: MR Job fails as OMFailoverProxyProvider has dependency hadoop-3.2  
(was: MR Job fails with exception)

> MR Job fails as OMFailoverProxyProvider has dependency hadoop-3.2
> -
>
> Key: HDDS-1717
> URL: https://issues.apache.org/jira/browse/HDDS-1717
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.4.0
> Environment: Ozone : 10 Node (1 SCM, 1 OM, 10 DN)
> HDP : 5 Node
> Both clusters are on separate nodes and hosted on HDP Ycloud.
>Reporter: Soumitra Sulav
>Assignee: Hanisha Koneru
>Priority: Blocker
> Attachments: syslog_mapred.err
>
>
> MapReduce jobs are failing with the exception ??Couldn't create protocol class 
> org.apache.hadoop.ozone.client.rpc.RpcClient exception??.
> The Ozone hadoop-ozone-filesystem-lib-current.jar was copied to the HDP cluster's 
> hadoop and mapreduce classpaths under:
> {code:java}
> /usr/hdp/3.1.0.0-78/hadoop/lib/hadoop-ozone-filesystem-lib-current-0.5.0-SNAPSHOT.jar
> /usr/hdp/3.1.0.0-78/hadoop-mapreduce/hadoop-ozone-filesystem-lib-current-0.5.0-SNAPSHOT.jar
> {code}
> Excerpt from the exception:
> {code:java}
> 2019-06-21 10:07:57,982 ERROR [main] org.apache.hadoop.ozone.client.OzoneClientFactory: Couldn't create protocol class org.apache.hadoop.ozone.client.rpc.RpcClient exception:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291)
>   at org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
>   at org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.<init>(BasicOzoneClientAdapterImpl.java:134)
>   at org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.<init>(OzoneClientAdapterImpl.java:50)
>   at org.apache.hadoop.fs.ozone.OzoneFileSystem.createAdapter(OzoneFileSystem.java:103)
>   at org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:143)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:160)
>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:116)
>   at org.apache.hadoop.mapreduce.lib.output.PathOutputCommitterFactory.createFileOutputCommitter(PathOutputCommitterFactory.java:134)
>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitterFactory.createOutputCommitter(FileOutputCommitterFactory.java:35)
>   at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(FileOutputFormat.java:338)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:552)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:534)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1802)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:534)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:311)
>   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1760)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1757)
>   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1691)
> Caused by: java.lang.VerifyError: Cannot inherit from final class
>   at java.lang.ClassLoader.defineClass1(Native

[jira] [Commented] (HDFS-12748) NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY

2019-07-03 Thread Hanisha Koneru (JIRA)



[ 
https://issues.apache.org/jira/browse/HDFS-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878204#comment-16878204
 ] 

Hanisha Koneru commented on HDFS-12748:
---

Hi [~xkrogen], does patch v05 look good to you?

> NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY
> 
>
> Key: HDFS-12748
> URL: https://issues.apache.org/jira/browse/HDFS-12748
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Jiandan Yang 
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: HDFS-12748.001.patch, HDFS-12748.002.patch, 
> HDFS-12748.003.patch, HDFS-12748.004.patch, HDFS-12748.005.patch
>
>
> In our production environment, the standby NN often does full GCs. Using MAT, we 
> found that the largest object is FileSystem$Cache, which contains 7,844,890 
> DistributedFileSystem instances.
> By viewing the call hierarchy of FileSystem.get(), I found that only 
> NamenodeWebHdfsMethods#get calls FileSystem.get(). I don't know why a 
> different DistributedFileSystem is created every time instead of being taken 
> from the cache.
> {code:java}
> case GETHOMEDIRECTORY: {
>   final String js = JsonUtil.toJsonString("Path",
>       FileSystem.get(conf != null ? conf : new Configuration())
>           .getHomeDirectory().toUri().getPath());
>   return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
> }
> {code}
> When we close the FileSystem after handling GETHOMEDIRECTORY, the NN no longer does full GCs.
> {code:java}
> case GETHOMEDIRECTORY: {
>   FileSystem fs = null;
>   try {
>     fs = FileSystem.get(conf != null ? conf : new Configuration());
>     final String js = JsonUtil.toJsonString("Path",
>         fs.getHomeDirectory().toUri().getPath());
>     return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
>   } finally {
>     if (fs != null) {
>       fs.close();
>     }
>   }
> }
> {code}
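One plausible explanation for the growth, offered as an assumption rather than a diagnosis: FileSystem.get() returns an instance from FileSystem$Cache, whose key includes the caller's UserGroupInformation, and UGI equality is based on object identity, so a fresh per-request UGI never matches an existing entry. Note also that calling close() on an instance obtained from FileSystem.get() closes the shared cached instance for every other holder of it. The sketch below models both sides with hypothetical names (CallerIdentity, getCachedClient, homeDirectory): an identity-keyed cache that gains one entry per request, and a helper that derives the home directory as prefix + "/" + user without creating a client at all. The "/user" prefix mirrors the HDFS default for dfs.user.home.dir.prefix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class HomeDirSketch {
  // Hypothetical stand-in for a per-request caller identity whose equality is
  // object identity (equals/hashCode are intentionally not overridden).
  static final class CallerIdentity {
    final String userName;

    CallerIdentity(String userName) {
      this.userName = userName;
    }
  }

  // Models a client cache keyed by caller identity.
  static final Map<CallerIdentity, Object> CACHE = new HashMap<>();

  // Leaky pattern: a fresh identity per request means a fresh entry per
  // request, even when the user name is the same.
  static Object getCachedClient(CallerIdentity caller) {
    return CACHE.computeIfAbsent(caller, k -> new Object());
  }

  // Allocation-free alternative: derive the home directory from a prefix and
  // the short user name instead of instantiating a client just to ask for it.
  static String homeDirectory(String prefix, String shortUserName) {
    Objects.requireNonNull(shortUserName, "shortUserName");
    String p = prefix.endsWith("/") ? prefix.substring(0, prefix.length() - 1) : prefix;
    return p + "/" + shortUserName;
  }

  public static void main(String[] args) {
    for (int i = 0; i < 3; i++) {
      getCachedClient(new CallerIdentity("alice")); // three "requests"
    }
    System.out.println(CACHE.size());                    // 3
    System.out.println(homeDirectory("/user", "alice")); // /user/alice
  }
}
```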




[jira] [Commented] (HDFS-12748) NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY

2019-07-02 Thread Hanisha Koneru (JIRA)



[ 
https://issues.apache.org/jira/browse/HDFS-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877160#comment-16877160
 ] 

Hanisha Koneru commented on HDFS-12748:
---

Thank you for working on this [~cheersyang]. LGTM. +1.

> NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY
> 
>
> Key: HDFS-12748
> URL: https://issues.apache.org/jira/browse/HDFS-12748
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Jiandan Yang 
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: HDFS-12748.001.patch, HDFS-12748.002.patch, 
> HDFS-12748.003.patch, HDFS-12748.004.patch, HDFS-12748.005.patch
>
>
> In our production environment, the standby NN often does full GCs. Using MAT, we 
> found that the largest object is FileSystem$Cache, which contains 7,844,890 
> DistributedFileSystem instances.
> By viewing the call hierarchy of FileSystem.get(), I found that only 
> NamenodeWebHdfsMethods#get calls FileSystem.get(). I don't know why a 
> different DistributedFileSystem is created every time instead of being taken 
> from the cache.
> {code:java}
> case GETHOMEDIRECTORY: {
>   final String js = JsonUtil.toJsonString("Path",
>       FileSystem.get(conf != null ? conf : new Configuration())
>           .getHomeDirectory().toUri().getPath());
>   return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
> }
> {code}
> When we close the FileSystem after handling GETHOMEDIRECTORY, the NN no longer does full GCs.
> {code:java}
> case GETHOMEDIRECTORY: {
>   FileSystem fs = null;
>   try {
>     fs = FileSystem.get(conf != null ? conf : new Configuration());
>     final String js = JsonUtil.toJsonString("Path",
>         fs.getHomeDirectory().toUri().getPath());
>     return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
>   } finally {
>     if (fs != null) {
>       fs.close();
>     }
>   }
> }
> {code}




[jira] [Resolved] (HDDS-1684) OM should create Ratis related dirs only if ratis is enabled

2019-06-20 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru resolved HDDS-1684.
--
Resolution: Fixed

> OM should create Ratis related dirs only if ratis is enabled
> 
>
> Key: HDDS-1684
> URL: https://issues.apache.org/jira/browse/HDDS-1684
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In OM, Ratis-related dirs (storage, snapshot, etc.) should only be created if 
> OM Ratis is enabled.
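A minimal sketch of the guard, assuming the enable flag is read from the key ozone.om.ratis.enable with a default of false, and that the Ratis dirs live under a metadata directory with the hypothetical names "ratis" and "snapshot"; the key, default, and directory names should be checked against the actual OM code. Configuration is modelled as a plain Map so the example has no Hadoop dependency.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RatisDirsSketch {
  // Assumed key name and default; verify against the OM configuration docs.
  static final String RATIS_ENABLE_KEY = "ozone.om.ratis.enable";
  // Hypothetical sub-directory names for the Ratis storage and snapshot dirs.
  static final String[] RATIS_DIRS = {"ratis", "snapshot"};

  // Creates the Ratis-related dirs under metaDir only when Ratis is enabled,
  // and returns the dirs it created (empty when Ratis is disabled).
  static List<Path> maybeCreateRatisDirs(Map<String, String> conf, Path metaDir)
      throws IOException {
    List<Path> created = new ArrayList<>();
    boolean ratisEnabled =
        Boolean.parseBoolean(conf.getOrDefault(RATIS_ENABLE_KEY, "false"));
    if (!ratisEnabled) {
      return created; // non-Ratis OM: leave no Ratis artifacts on disk
    }
    for (String name : RATIS_DIRS) {
      created.add(Files.createDirectories(metaDir.resolve(name)));
    }
    return created;
  }

  public static void main(String[] args) throws IOException {
    Path metaDir = Files.createTempDirectory("om-meta");
    Map<String, String> conf = new HashMap<>();
    System.out.println(maybeCreateRatisDirs(conf, metaDir).size()); // 0
    conf.put(RATIS_ENABLE_KEY, "true");
    System.out.println(maybeCreateRatisDirs(conf, metaDir).size()); // 2
  }
}
```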




[jira] [Created] (HDDS-1684) OM should create Ratis related dirs only if ratis is enabled

2019-06-13 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1684:


 Summary: OM should create Ratis related dirs only if ratis is 
enabled
 Key: HDDS-1684
 URL: https://issues.apache.org/jira/browse/HDDS-1684
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


In OM, Ratis-related dirs (storage, snapshot, etc.) should only be created if OM 
Ratis is enabled.




[jira] [Created] (HDDS-1683) Update Ratis to 0.4.0-300d9c5-SNAPSHOT

2019-06-13 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1683:


 Summary: Update Ratis to 0.4.0-300d9c5-SNAPSHOT
 Key: HDDS-1683
 URL: https://issues.apache.org/jira/browse/HDDS-1683
 Project: Hadoop Distributed Data Store
  Issue Type: Task
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


Update Ratis dependency to latest build - 0.4.0-300d9c5-SNAPSHOT




[jira] [Updated] (HDDS-1649) On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-11 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1649:
-
Summary: On installSnapshot notification from OM leader, download 
checkpoint and reload OM state  (was: On installSnapshot notification from OM 
leader, download checkpoint)

> On installSnapshot notification from OM leader, download checkpoint and 
> reload OM state
> ---
>
> Key: HDDS-1649
> URL: https://issues.apache.org/jira/browse/HDDS-1649
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> When an OM follower receives an installSnapshot notification from the OM leader, 
> it should initiate a new checkpoint on the OM leader and download that 
> checkpoint.




[jira] [Updated] (HDDS-1496) Support partial chunk reads and checksum verification

2019-06-06 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1496:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Support partial chunk reads and checksum verification
> -
>
> Key: HDDS-1496
> URL: https://issues.apache.org/jira/browse/HDDS-1496
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> BlockInputStream#readChunkFromContainer() reads the whole chunk from disk 
> even if only a part of the chunk is needed.
> This Jira aims to improve readChunkFromContainer so that it reads only the 
> part of the chunk file that the client needs, plus the part required to 
> verify the checksum.
> For example, let's say the client is reading from index 120 to 450 in the 
> chunk, and a checksum is stored for every 100 bytes of the chunk, i.e. the 
> first checksum covers bytes 0 to 99, the next covers bytes 100 to 199, and so 
> on. To verify bytes 120 to 450, we would need to read bytes 100 to 499 so 
> that checksum verification can be done.
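The range arithmetic in this example can be sketched as follows: round the start offset down and the end offset up to the nearest checksum boundary, clamping the end to the chunk length because the last checksum window may be shorter. The class and method names are hypothetical, and treating the client range as inclusive (120 to 450 is 331 bytes) is an assumption about the example; an exclusive end gives the same aligned range here.

```java
class ChecksumAlignedRange {
  // Returns the inclusive [start, end] byte range of a chunk that must be read
  // so that a client read of `length` bytes at `offset` can be verified, given
  // one checksum per `bytesPerChecksum` bytes. The last checksum window may be
  // shorter than bytesPerChecksum, so the end is clamped to the chunk length.
  static long[] alignedRange(long offset, long length, int bytesPerChecksum, long chunkLength) {
    if (offset < 0 || length <= 0 || bytesPerChecksum <= 0 || offset + length > chunkLength) {
      throw new IllegalArgumentException("read outside chunk or bad checksum size");
    }
    long start = (offset / bytesPerChecksum) * bytesPerChecksum; // round down
    long endExclusive = offset + length;
    long alignedEnd =
        ((endExclusive + bytesPerChecksum - 1) / bytesPerChecksum) * bytesPerChecksum; // round up
    alignedEnd = Math.min(alignedEnd, chunkLength);
    return new long[] {start, alignedEnd - 1};
  }

  public static void main(String[] args) {
    // The Jira example: bytes 120..450 inclusive (331 bytes), 100 bytes per checksum.
    long[] r = alignedRange(120, 331, 100, 1000);
    System.out.println(r[0] + ".." + r[1]); // 100..499
  }
}
```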




[jira] [Created] (HDDS-1651) Create a http.policy config for Ozone

2019-06-05 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1651:


 Summary: Create a http.policy config for Ozone
 Key: HDDS-1651
 URL: https://issues.apache.org/jira/browse/HDDS-1651
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru


Ozone currently uses dfs.http.policy as its HTTP policy. Ozone should have its own 
ozone.http.policy configuration and, if that is undefined, fall back to 
dfs.http.policy.
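A minimal sketch of the proposed lookup order, with configuration modelled as a plain Map (an assumption; the real code would use Ozone's configuration class). ozone.http.policy wins if set, otherwise dfs.http.policy is used, otherwise the HTTP_ONLY default that Hadoop uses for dfs.http.policy.

```java
import java.util.Locale;
import java.util.Map;

class HttpPolicySketch {
  static final String OZONE_HTTP_POLICY_KEY = "ozone.http.policy"; // proposed in this Jira
  static final String DFS_HTTP_POLICY_KEY = "dfs.http.policy";
  static final String DEFAULT_POLICY = "HTTP_ONLY"; // Hadoop's default for dfs.http.policy

  // Prefer the Ozone-specific key, fall back to the HDFS key, then the default.
  static String resolveHttpPolicy(Map<String, String> conf) {
    String value = conf.get(OZONE_HTTP_POLICY_KEY);
    if (isUnset(value)) {
      value = conf.get(DFS_HTTP_POLICY_KEY);
    }
    if (isUnset(value)) {
      value = DEFAULT_POLICY;
    }
    return value.trim().toUpperCase(Locale.ROOT);
  }

  private static boolean isUnset(String v) {
    return v == null || v.trim().isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(resolveHttpPolicy(Map.of("dfs.http.policy", "HTTPS_ONLY"))); // HTTPS_ONLY
  }
}
```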




[jira] [Created] (HDDS-1649) On installSnapshot notification from OM leader, download checkpoint

2019-06-05 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1649:


 Summary: On installSnapshot notification from OM leader, download 
checkpoint
 Key: HDDS-1649
 URL: https://issues.apache.org/jira/browse/HDDS-1649
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


When an OM follower receives an installSnapshot notification from the OM leader, it 
should initiate a new checkpoint on the OM leader and download that checkpoint.




[jira] [Updated] (HDDS-1224) Restructure code to validate the response from server in the Read path

2019-06-04 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1224:
-
Fix Version/s: 0.5.0

> Restructure code to validate the response from server in the Read path
> --
>
> Key: HDDS-1224
> URL: https://issues.apache.org/jira/browse/HDDS-1224
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0, 0.4.1
>
> Attachments: HDDS-1224.000.patch
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> In the read path, the response is validated in XceiverClientGrpc while data is 
> read from the datanodes, and an additional checksum verification of the read 
> chunk response happens in the Ozone client. The aim of this Jira is to modify 
> the function call to take a validator function as part of reading data, so 
> that all validation can happen in a single, unified 
> place.
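A sketch of what "take a validator function as part of reading data" could look like, under stated assumptions: ReadChunkResponse, Validator, and readChunk are hypothetical names, CRC32 stands in for whatever checksum type the container uses, and moving on to the next replica when a validator rejects a response is an assumption about how a caller might use the hook, not something this Jira specifies.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Supplier;
import java.util.zip.CRC32;

class ValidatedReadSketch {
  // Hypothetical response: the chunk bytes plus the checksum the server sent.
  static final class ReadChunkResponse {
    final byte[] data;
    final long checksum;

    ReadChunkResponse(byte[] data, long checksum) {
      this.data = data;
      this.checksum = checksum;
    }
  }

  static final class ValidationException extends RuntimeException {
    ValidationException(String msg) {
      super(msg);
    }
  }

  // A validator inspects a response and throws if it is unacceptable.
  @FunctionalInterface
  interface Validator {
    void validate(ReadChunkResponse response);
  }

  static long crc32(byte[] b) {
    CRC32 crc = new CRC32();
    crc.update(b, 0, b.length);
    return crc.getValue();
  }

  static final Validator NON_EMPTY = r -> {
    if (r.data.length == 0) throw new ValidationException("empty chunk");
  };
  static final Validator CHECKSUM = r -> {
    if (crc32(r.data) != r.checksum) throw new ValidationException("checksum mismatch");
  };

  // All checks run here, right after each response arrives. Assumption: a
  // response that fails any validator makes us try the next replica.
  static byte[] readChunk(List<Supplier<ReadChunkResponse>> replicas, List<Validator> validators) {
    ValidationException last = new ValidationException("no replicas to read from");
    for (Supplier<ReadChunkResponse> replica : replicas) {
      ReadChunkResponse response = replica.get();
      try {
        for (Validator v : validators) {
          v.validate(response);
        }
        return response.data; // first response that passes every check
      } catch (ValidationException e) {
        last = e;
      }
    }
    throw last;
  }

  public static void main(String[] args) {
    byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
    Supplier<ReadChunkResponse> corrupt = () -> new ReadChunkResponse(data, 0L);
    Supplier<ReadChunkResponse> healthy = () -> new ReadChunkResponse(data, crc32(data));
    byte[] out = readChunk(List.of(corrupt, healthy), List.of(NON_EMPTY, CHECKSUM));
    System.out.println(new String(out, StandardCharsets.UTF_8)); // hello
  }
}
```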




[jira] [Updated] (HDDS-1224) Restructure code to validate the response from server in the Read path

2019-06-04 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1224:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Restructure code to validate the response from server in the Read path
> --
>
> Key: HDDS-1224
> URL: https://issues.apache.org/jira/browse/HDDS-1224
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
> Attachments: HDDS-1224.000.patch
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> In the read path, the response is validated in XceiverClientGrpc while data is 
> read from the datanodes, and an additional checksum verification of the read 
> chunk response happens in the Ozone client. The aim of this Jira is to modify 
> the function call to take a validator function as part of reading data, so 
> that all validation can happen in a single, unified 
> place.




[jira] [Resolved] (HDDS-1580) Obtain Handler reference in ContainerScrubber

2019-05-28 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru resolved HDDS-1580.
--
Resolution: Fixed

> Obtain Handler reference in ContainerScrubber
> -
>
> Key: HDDS-1580
> URL: https://issues.apache.org/jira/browse/HDDS-1580
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Affects Versions: 0.5.0
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Obtain a reference to the Handler based on containerType in scrub() in 
> ContainerScrubber.java.
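A minimal sketch of looking up a handler by container type. ContainerType, Handler, and ScrubberHandlerSketch are hypothetical stand-ins for the datanode classes, not the real Ozone types; an EnumMap keeps the lookup cheap and makes a missing registration fail loudly instead of silently using the wrong handler.

```java
import java.util.EnumMap;
import java.util.Map;

class ScrubberHandlerSketch {
  // Hypothetical stand-ins for the datanode's container type and handler.
  enum ContainerType { KEY_VALUE_CONTAINER }

  interface Handler {
    // Returns true if the container passed the scrub.
    boolean scrub(long containerId);
  }

  private final Map<ContainerType, Handler> handlers = new EnumMap<>(ContainerType.class);

  void registerHandler(ContainerType type, Handler handler) {
    handlers.put(type, handler);
  }

  // Pick the handler from the container's type rather than hard-coding one.
  boolean scrub(ContainerType type, long containerId) {
    Handler handler = handlers.get(type);
    if (handler == null) {
      throw new IllegalStateException("No handler registered for " + type);
    }
    return handler.scrub(containerId);
  }

  public static void main(String[] args) {
    ScrubberHandlerSketch scrubber = new ScrubberHandlerSketch();
    scrubber.registerHandler(ContainerType.KEY_VALUE_CONTAINER, id -> id >= 0);
    System.out.println(scrubber.scrub(ContainerType.KEY_VALUE_CONTAINER, 42L)); // true
  }
}
```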




[jira] [Created] (HDDS-1529) BlockInputStream: Avoid buffer copy if the whole chunk is being read

2019-05-14 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1529:


 Summary: BlockInputStream: Avoid buffer copy if the whole chunk is 
being read
 Key: HDDS-1529
 URL: https://issues.apache.org/jira/browse/HDDS-1529
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Hanisha Koneru


Currently, BlockInputStream reads chunk data from DNs into a local buffer and then 
copies the data to the client's buffer. This is required for partial chunk reads, 
where more chunk data than requested might have to be read so that checksum 
verification can be done. But if the whole chunk is being read, we can 
copy the data directly into the client's buffer and avoid the double buffer copy.
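A sketch of the two paths, assuming a hypothetical ChunkReader that writes chunk bytes into a caller-supplied ByteBuffer. A whole-chunk read goes straight into the client's buffer; a partial read fetches the checksum-aligned range into a staging buffer and copies out only the requested slice. Checksum verification itself is omitted to keep the example short, and the stagedCopies counter exists only to make the difference observable.

```java
import java.nio.ByteBuffer;

class ChunkCopySketch {
  // Hypothetical reader: puts `len` bytes of the chunk, starting at `off`, into dst.
  interface ChunkReader {
    void readInto(ByteBuffer dst, long off, int len);
  }

  // Counts reads that went through the intermediate buffer (illustration only).
  static int stagedCopies = 0;

  static void read(ChunkReader reader, long chunkLen, long off, int len,
                   ByteBuffer client, int bytesPerChecksum) {
    if (off == 0 && len == chunkLen && client.remaining() >= len) {
      // Whole chunk: every checksum window is fully requested, so read straight
      // into the client's buffer (checksums would be verified on that buffer).
      reader.readInto(client, 0, len);
      return;
    }
    // Partial read: fetch the checksum-aligned range into a local buffer
    // (where it would be verified), then copy only the requested slice.
    long start = (off / bytesPerChecksum) * bytesPerChecksum;
    long end = Math.min(chunkLen,
        ((off + len + bytesPerChecksum - 1) / bytesPerChecksum) * bytesPerChecksum);
    ByteBuffer staging = ByteBuffer.allocate((int) (end - start));
    reader.readInto(staging, start, (int) (end - start));
    staging.flip();
    staging.position((int) (off - start));
    staging.limit((int) (off - start) + len);
    client.put(staging);
    stagedCopies++;
  }

  public static void main(String[] args) {
    byte[] chunk = new byte[300];
    for (int i = 0; i < chunk.length; i++) chunk[i] = (byte) i;
    ChunkReader reader = (dst, off, len) -> dst.put(chunk, (int) off, len);
    ByteBuffer part = ByteBuffer.allocate(50);
    read(reader, chunk.length, 120, 50, part, 100);
    System.out.println(part.get(0) + " " + stagedCopies); // 120 1
  }
}
```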




[jira] [Created] (HDDS-1528) Buffer Pool in BlockInputStream

2019-05-14 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1528:


 Summary: Buffer Pool in BlockInputStream
 Key: HDDS-1528
 URL: https://issues.apache.org/jira/browse/HDDS-1528
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Hanisha Koneru


In BlockInputStream, when a new chunk is being read, the data in old buffers is 
thrown away. So, if we want to read data in chunk1, then chunk2, and then go 
back to reading chunk1, we would have to fetch chunk1 again from the DNs.
Reads can be optimized if we maintain a buffer pool instead of only one 
buffer. This way, we can cache chunk data for re-reads.
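One way to sketch such a pool is a small LRU cache of chunk buffers keyed by chunk index, built on LinkedHashMap in access order with removeEldestEntry. The capacity, the key type, and the fetch callback are assumptions for illustration, not the eventual BlockInputStream design.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

class ChunkBufferPool {
  private final int maxChunks;
  private final LinkedHashMap<Integer, ByteBuffer> cache;
  int fetches = 0; // how many times a chunk had to be fetched from a datanode

  ChunkBufferPool(int maxChunks) {
    this.maxChunks = maxChunks;
    // accessOrder = true: iteration runs from least to most recently used,
    // so removeEldestEntry evicts the least recently used chunk.
    this.cache = new LinkedHashMap<Integer, ByteBuffer>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Integer, ByteBuffer> eldest) {
        return size() > ChunkBufferPool.this.maxChunks;
      }
    };
  }

  // Returns the chunk's bytes, fetching them only on a cache miss.
  ByteBuffer get(int chunkIndex, IntFunction<ByteBuffer> fetchFromDatanode) {
    ByteBuffer buf = cache.get(chunkIndex);
    if (buf == null) {
      buf = fetchFromDatanode.apply(chunkIndex);
      fetches++;
      cache.put(chunkIndex, buf);
    }
    // duplicate() shares the bytes but gives each reader its own position/limit.
    return buf.duplicate();
  }

  public static void main(String[] args) {
    ChunkBufferPool pool = new ChunkBufferPool(2);
    IntFunction<ByteBuffer> fetch = i -> ByteBuffer.wrap(new byte[] {(byte) i});
    pool.get(1, fetch);
    pool.get(2, fetch);
    pool.get(1, fetch); // re-read of chunk 1 is served from the pool
    System.out.println(pool.fetches); // 2
  }
}
```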




[jira] [Reopened] (HDDS-1474) "ozone.scm.datanode.id" config should take path for a dir and not a file

2019-05-09 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru reopened HDDS-1474:
--

> "ozone.scm.datanode.id" config should take path for a dir and not a file
> 
>
> Key: HDDS-1474
> URL: https://issues.apache.org/jira/browse/HDDS-1474
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Minor
>  Labels: newbie, pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently, the Ozone config "ozone.scm.datanode.id" takes a file path as its 
> value. It should instead take a directory path and assume the standard 
> filename "datanode.id".
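A minimal sketch of the proposed behaviour: treat the configured value as a directory and append the fixed file name. The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class DatanodeIdPathSketch {
  static final String DATANODE_ID_FILE_NAME = "datanode.id";

  // The configured value names a directory; the file name is fixed.
  static Path datanodeIdFile(String configuredDir) {
    if (configuredDir == null || configuredDir.trim().isEmpty()) {
      throw new IllegalArgumentException("ozone.scm.datanode.id must name a directory");
    }
    return Paths.get(configuredDir.trim()).resolve(DATANODE_ID_FILE_NAME);
  }

  public static void main(String[] args) {
    System.out.println(datanodeIdFile("/data/ozone/meta"));
  }
}
```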




[jira] [Resolved] (HDDS-1474) "ozone.scm.datanode.id" config should take path for a dir and not a file

2019-05-09 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru resolved HDDS-1474.
--
Resolution: Fixed

> "ozone.scm.datanode.id" config should take path for a dir and not a file
> 
>
> Key: HDDS-1474
> URL: https://issues.apache.org/jira/browse/HDDS-1474
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Minor
>  Labels: newbie, pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently, the Ozone config "ozone.scm.datanode.id" takes a file path as its 
> value. It should instead take a directory path and assume the standard 
> filename "datanode.id".




[jira] [Updated] (HDDS-1496) Support partial chunk reads and checksum verification

2019-05-08 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1496:
-
Summary: Support partial chunk reads and checksum verification  (was: 
readChunkFromContainer() should only read the required part of chunk file)

> Support partial chunk reads and checksum verification
> -
>
> Key: HDDS-1496
> URL: https://issues.apache.org/jira/browse/HDDS-1496
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> BlockInputStream#readChunkFromContainer() reads the whole chunk from disk 
> even if only a part of the chunk is needed.
> This Jira aims to improve readChunkFromContainer so that it reads only the 
> part of the chunk file that the client needs, plus the part required to 
> verify the checksum.
> For example, let's say the client is reading from index 120 to 450 in the 
> chunk, and a checksum is stored for every 100 bytes of the chunk, i.e. the 
> first checksum covers bytes 0 to 99, the next covers bytes 100 to 199, and so 
> on. To verify bytes 120 to 450, we would need to read bytes 100 to 499 so 
> that checksum verification can be done.




[jira] [Updated] (HDDS-1491) Ozone KeyInputStream seek() should not read the chunk file

2019-05-06 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1491:
-
Target Version/s: 0.5.0

> Ozone KeyInputStream seek() should not read the chunk file
> --
>
> Key: HDDS-1491
> URL: https://issues.apache.org/jira/browse/HDDS-1491
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> KeyInputStream#seek() calls BlockInputStream#seek() to adjust the buffer 
> position to the seek target. As part of the seek operation, the whole 
> chunk is read from the container and stored in the buffer so that the buffer 
> position can be advanced to the target position.
> We should not read from disk on a seek() operation. Instead, on the next read 
> operation, once the chunk file has been read into the buffer, 
> we can advance the buffer position to the previously requested seek position.
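A minimal sketch of a lazy seek, assuming a single chunk and a hypothetical fetchChunk callback that reads it from the container: seek() only records the target, and the next read() loads the chunk (once) and then applies the pending position. The fetches counter is there only to show that seek() does no I/O.

```java
import java.util.function.Supplier;

class LazySeekStream {
  private final Supplier<byte[]> fetchChunk; // hypothetical: reads the chunk from the container
  private final long chunkLength;
  private byte[] buffer;         // null until the first read
  private int bufferPos;
  private long pendingSeek = -1; // set by seek(), applied by the next read()
  int fetches = 0;

  LazySeekStream(long chunkLength, Supplier<byte[]> fetchChunk) {
    this.chunkLength = chunkLength;
    this.fetchChunk = fetchChunk;
  }

  // seek() only records the target position; it performs no I/O.
  void seek(long pos) {
    if (pos < 0 || pos > chunkLength) {
      throw new IllegalArgumentException("seek position out of range: " + pos);
    }
    pendingSeek = pos;
  }

  // read() loads the chunk on demand, then applies any pending seek.
  int read() {
    if (buffer == null) {
      buffer = fetchChunk.get();
      fetches++;
    }
    if (pendingSeek >= 0) {
      bufferPos = (int) pendingSeek;
      pendingSeek = -1;
    }
    if (bufferPos >= buffer.length) {
      return -1; // end of chunk
    }
    return buffer[bufferPos++] & 0xFF;
  }

  public static void main(String[] args) {
    byte[] chunk = {10, 11, 12, 13, 14, 15};
    LazySeekStream in = new LazySeekStream(chunk.length, () -> chunk);
    in.seek(4);                     // no fetch yet
    System.out.println(in.fetches); // 0
    System.out.println(in.read());  // 14
    System.out.println(in.fetches); // 1
  }
}
```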




[jira] [Updated] (HDDS-1496) readChunkFromContainer() should only read the required part of chunk file

2019-05-06 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1496:
-
Target Version/s: 0.5.0

> readChunkFromContainer() should only read the required part of chunk file
> -
>
> Key: HDDS-1496
> URL: https://issues.apache.org/jira/browse/HDDS-1496
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> BlockInputStream#readChunkFromContainer() reads the whole chunk from disk 
> even if only a part of the chunk is needed.
> This Jira aims to improve readChunkFromContainer so that it reads only the 
> part of the chunk file that the client needs, plus the part required to 
> verify the checksum.
> For example, let's say the client is reading from index 120 to 450 in the 
> chunk, and a checksum is stored for every 100 bytes of the chunk, i.e. the 
> first checksum covers bytes 0 to 99, the next covers bytes 100 to 199, and so 
> on. To verify bytes 120 to 450, we would need to read bytes 100 to 499 so 
> that checksum verification can be done.




[jira] [Created] (HDDS-1496) readChunkFromContainer() should only read the required part of chunk file

2019-05-06 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1496:


 Summary: readChunkFromContainer() should only read the required 
part of chunk file
 Key: HDDS-1496
 URL: https://issues.apache.org/jira/browse/HDDS-1496
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


BlockInputStream#readChunkFromContainer() reads the whole chunk from disk even 
when only part of the chunk is needed.
This Jira aims to improve readChunkFromContainer so that it reads only the part 
of the chunk file that the client needs, plus whatever extra bytes are required 
to verify the checksum.



For example, say the client is reading bytes 120 to 450 of the chunk, and a 
checksum is stored for every 100 bytes, i.e. the first checksum covers bytes 0 
to 99, the next covers bytes 100 to 199, and so on. To verify bytes 120 to 450, 
we would need to read bytes 100 to 499 so that checksum verification can be 
done.
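A minimal sketch of the range arithmetic described above, assuming a fixed number of bytes per checksum and inclusive byte indices. The class and method names are hypothetical and are not part of Ozone's BlockInputStream API.

```java
// Hypothetical helper (not Ozone's API): widens the byte range a client asked for
// so that it begins and ends on checksum-window boundaries.
final class ChecksumAlignedRange {
    final long start; // first byte to read from disk (inclusive)
    final long end;   // last byte to read from disk (inclusive)

    private ChecksumAlignedRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /**
     * @param firstByte   first byte requested by the client (inclusive)
     * @param lastByte    last byte requested by the client (inclusive)
     * @param bytesPerCrc number of bytes covered by each stored checksum
     * @param chunkLength total length of the chunk file in bytes
     */
    static ChecksumAlignedRange of(long firstByte, long lastByte,
                                   long bytesPerCrc, long chunkLength) {
        if (bytesPerCrc <= 0 || firstByte < 0 || lastByte < firstByte
                || lastByte >= chunkLength) {
            throw new IllegalArgumentException("invalid read range");
        }
        // Round the start down to the first byte of its checksum window.
        long start = (firstByte / bytesPerCrc) * bytesPerCrc;
        // Round the end up to the last byte of its checksum window; the final
        // window of a chunk may be shorter, so never go past the chunk's end.
        long end = Math.min((lastByte / bytesPerCrc + 1) * bytesPerCrc - 1,
                            chunkLength - 1);
        return new ChecksumAlignedRange(start, end);
    }

    long length() {
        return end - start + 1;
    }
}
```

With the numbers from the example, ChecksumAlignedRange.of(120, 450, 100, chunkLength) gives bytes 100 to 499 (400 bytes) for any chunk of at least 500 bytes; clamping to the chunk length covers a short final checksum window.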




[jira] [Created] (HDDS-1491) Ozone KeyInputStream seek() should not read the chunk file

2019-05-05 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1491:


 Summary: Ozone KeyInputStream seek() should not read the chunk file
 Key: HDDS-1491
 URL: https://issues.apache.org/jira/browse/HDDS-1491
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


KeyInputStream#seek() calls BlockInputStream#seek() to move the buffer 
position to the seek target. As part of the seek operation, the whole chunk 
is read from the container and stored in the buffer so that the buffer position 
can be advanced to the target position. 

We should not read from disk on a seek() operation. Instead, when a later read 
operation reads the chunk file into the buffer, we can advance the buffer 
position to the position recorded by the earlier seek().
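A minimal sketch of the proposed behavior, assuming a single chunk that is read in full on first access. LazySeekChunkStream and its chunk-reader callback are hypothetical stand-ins for BlockInputStream and the container read, not Ozone code.

```java
import java.nio.ByteBuffer;
import java.util.function.Supplier;

// Hypothetical stream: seek() only records the target; the chunk is read
// from "disk" (the supplier) lazily by the first read().
final class LazySeekChunkStream {
    private final Supplier<byte[]> chunkReader; // stand-in for reading the chunk file
    private ByteBuffer buffer;                  // null until the chunk has been read
    private long pendingSeek = -1;              // seek target not yet applied

    LazySeekChunkStream(Supplier<byte[]> chunkReader) {
        this.chunkReader = chunkReader;
    }

    /** Records the target position without touching the disk. */
    void seek(long pos) {
        if (pos < 0) {
            throw new IllegalArgumentException("negative position");
        }
        if (buffer != null) {
            buffer.position((int) pos); // chunk already cached: just move the cursor
        } else {
            pendingSeek = pos;          // defer until the next read()
        }
    }

    /** Returns the next byte (0-255), or -1 at the end of the chunk. */
    int read() {
        if (buffer == null) {
            buffer = ByteBuffer.wrap(chunkReader.get()); // the only disk read
            if (pendingSeek >= 0) {
                buffer.position((int) pendingSeek);      // apply the deferred seek now
                pendingSeek = -1;
            }
        }
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }
}
```

Repeated seek() calls before the first read() cost nothing; only the last recorded position is applied once the chunk is loaded.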




[jira] [Updated] (HDDS-1473) DataNode ID file should be human readable

2019-05-02 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1473:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> DataNode ID file should be human readable
> -
>
> Key: HDDS-1473
> URL: https://issues.apache.org/jira/browse/HDDS-1473
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Arpit Agarwal
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: newbie, pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The DataNode ID file should be human-readable to make debugging easier. We 
> should use YAML, as we already use it elsewhere for metadata files.
> Currently it is a binary file whose contents are protobuf-encoded. This is a 
> tiny file read once on startup, so performance is not a concern.




[jira] [Updated] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container

2019-04-26 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1403:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> KeyOutputStream writes fails after max retries while writing to a closed 
> container
> --
>
> Key: HDDS-1403
> URL: https://issues.apache.org/jira/browse/HDDS-1403
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently an Ozone client retries a write operation 5 times. The container 
> being written to may already have been closed by the time the write reaches 
> it, in which case the key write fails with this error even after retrying 
> multiple times. This needs to be fixed, as this is an internal error.




[jira] [Created] (HDDS-1464) Client should have different retry policies for different exceptions

2019-04-24 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1464:


 Summary: Client should have different retry policies for different 
exceptions
 Key: HDDS-1464
 URL: https://issues.apache.org/jira/browse/HDDS-1464
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Hanisha Koneru


The client should have different retry policies for different types of failures.

For example, if a key write fails because of a ContainerNotOpen exception, the 
client should wait for a specified interval before retrying. But if the key 
write fails because of, say, a Ratis leader election or a request timeout, we 
want the client to retry immediately.
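A minimal sketch of per-exception retry delays, assuming one delay per exception type and a default for everything else. The exception classes and the delay values are hypothetical placeholders. Hadoop's org.apache.hadoop.io.retry.RetryPolicies.retryByException applies a similar exception-to-policy mapping, which the Ozone client may or may not end up reusing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Ozone's ContainerNotOpen error.
class ContainerNotOpenException extends Exception {
    ContainerNotOpenException(String msg) { super(msg); }
}

// Hypothetical stand-in for a failure caused by a Ratis leader election.
class LeaderElectionException extends Exception {
    LeaderElectionException(String msg) { super(msg); }
}

// Maps exception types to the delay (ms) to wait before the next retry.
final class ExceptionRetryPolicy {
    private final Map<Class<? extends Exception>, Long> delayMillis = new LinkedHashMap<>();
    private final long defaultDelayMillis;

    ExceptionRetryPolicy(long defaultDelayMillis) {
        this.defaultDelayMillis = defaultDelayMillis;
    }

    /** Registers a delay for an exception type (and its subclasses). */
    ExceptionRetryPolicy on(Class<? extends Exception> type, long delay) {
        delayMillis.put(type, delay);
        return this;
    }

    /** Delay before the next retry; the first registered matching type wins. */
    long delayFor(Exception e) {
        for (Map.Entry<Class<? extends Exception>, Long> entry : delayMillis.entrySet()) {
            if (entry.getKey().isInstance(e)) {
                return entry.getValue();
            }
        }
        return defaultDelayMillis;
    }
}
```

Using a LinkedHashMap keeps registration order, so a more specific exception type registered first takes precedence over a broader one registered later.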




[jira] [Updated] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container

2019-04-18 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1403:
-
Status: Patch Available  (was: Open)

> KeyOutputStream writes fails after max retries while writing to a closed 
> container
> --
>
> Key: HDDS-1403
> URL: https://issues.apache.org/jira/browse/HDDS-1403
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently an Ozone client retries a write operation 5 times. The container 
> being written to may already have been closed by the time the write reaches 
> it, in which case the key write fails with this error even after retrying 
> multiple times. This needs to be fixed, as this is an internal error.




[jira] [Assigned] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container

2019-04-18 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru reassigned HDDS-1403:


Assignee: Hanisha Koneru

> KeyOutputStream writes fails after max retries while writing to a closed 
> container
> --
>
> Key: HDDS-1403
> URL: https://issues.apache.org/jira/browse/HDDS-1403
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> Currently an Ozone client retries a write operation 5 times. The container 
> being written to may already have been closed by the time the write reaches 
> it, in which case the key write fails with this error even after retrying 
> multiple times. This needs to be fixed, as this is an internal error.




[jira] [Updated] (HDDS-1376) Datanode exits while executing client command when scmId is null

2019-04-16 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1376:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Datanode exits while executing client command when scmId is null
> 
>
> Key: HDDS-1376
> URL: https://issues.apache.org/jira/browse/HDDS-1376
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Ozone Datanode exits with the following error. This happens because the DN 
> has not yet received an scmId from the SCM after registration but is already 
> processing a client command.
> {code}
> 2019-04-03 17:02:10,958 ERROR storage.RaftLogWorker (ExitUtils.java:terminate(133)) - Terminating with exit status 1: df6b578e-8d35-44f5-9b21-db7184dcc54e-RaftLogWorker failed.
> java.io.IOException: java.lang.NullPointerException: scmId cannot be null
> at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
> at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
> at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:83)
> at org.apache.ratis.server.storage.RaftLogWorker$StateMachineDataPolicy.getFromFuture(RaftLogWorker.java:76)
> at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:354)
> at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:219)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: scmId cannot be null
> at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:110)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:243)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:165)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:350)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:224)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149)
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:347)
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:354)
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$0(ContainerStateMachine.java:385)
> at java.util.concurrent.CompletableFuture$AsyncSupply.run$$$capture(CompletableFuture.java:1590)
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ... 1 more
> {code}




[jira] [Assigned] (HDDS-1414) TestOzoneManagerHA.testMultipartUploadWithOneOmNodeDown is flaky

2019-04-10 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru reassigned HDDS-1414:


Assignee: Hanisha Koneru

> TestOzoneManagerHA.testMultipartUploadWithOneOmNodeDown is flaky
> 
>
> Key: HDDS-1414
> URL: https://issues.apache.org/jira/browse/HDDS-1414
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: ozone-flaky-test
> Attachments: ci2.log
>
>
> TestOzoneManagerHA.testMultipartUploadWithOneOmNodeDown is flaky; we get the 
> exception below when it fails.
> {code}
> org.apache.ratis.protocol.AlreadyClosedException: SlidingWindow$Client client-04649B8D5AF3->RAFT is closed.
>  at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisClient.sendCommand(OzoneManagerRatisClient.java:133)
>  at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestToRatis(OzoneManagerProtocolServerSideTranslatorPB.java:97)
>  at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:83)
>  at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> Caused by: org.apache.ratis.protocol.AlreadyClosedException: SlidingWindow$Client client-04649B8D5AF3->RAFT is closed.
>  at org.apache.ratis.util.SlidingWindow$Client.alreadyClosed(SlidingWindow.java:350)
>  at org.apache.ratis.util.SlidingWindow$Client.submitNewRequest(SlidingWindow.java:224)
>  at org.apache.ratis.client.impl.RaftClientImpl.sendAsync(RaftClientImpl.java:207)
>  at org.apache.ratis.client.impl.RaftClientImpl.sendAsync(RaftClientImpl.java:174)
>  at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisClient.sendRequestAsync(OzoneManagerRatisClient.java:208)
>  at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisClient.sendCommandAsync(OzoneManagerRatisClient.java:168)
>  at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisClient.sendCommand(OzoneManagerRatisClient.java:132)
>  ... 11 more
> Caused by: org.apache.ratis.protocol.RaftRetryFailureException: Failed RaftClientRequest:client-04649B8D5AF3->omNode-1@group-523986131536, cid=71, seq=1*, RW, org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisClient$$Lambda$396/1529424209@7ae5da75 for 10 attempts with RetryLimited(maxAttempts=10, sleepTime=100ms)
>  at org.apache.ratis.client.impl.RaftClientImpl.newRaftRetryFailureException(RaftClientImpl.java:383)
>  at org.apache.ratis.client.impl.RaftClientImpl.handleAsyncRetryFailure(RaftClientImpl.java:388)
>  at org.apache.ratis.client.impl.RaftClientImpl.lambda$sendRequestAsync$14(RaftClientImpl.java:370)
>  at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>  at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>  at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:329)
>  at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$000(GrpcClientProtocolClient.java:245)
>  at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:257)
>  at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:248)
>  at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421)
>  at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
>  at 
>

[jira] [Updated] (HDDS-1371) Download RocksDB checkpoint from OM Leader to Follower

2019-04-05 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1371:
-
Description: 
If a follower OM is lagging far behind the leader OM, or is restarting or 
bootstrapping, it might need a RocksDB checkpoint from the leader to catch up. 
This is because the leader might have purged its logs after taking a snapshot.
This Jira aims to add support for downloading a RocksDB checkpoint from the 
leader OM to a follower OM through an HTTP servlet. We reuse the DBCheckpoint 
servlet used by the Recon server. 

  was:
If a follower OM is lagging far behind the leader OM, or is restarting or 
bootstrapping, it might need a RocksDB checkpoint from the leader to catch up. 
This is because the leader might have purged its logs after taking a snapshot.
This Jira aims to add support for downloading a RocksDB checkpoint from the 
leader OM to a follower OM through an HTTP servlet. We reuse the servlet used 
by the Recon server. 


> Download RocksDB checkpoint from OM Leader to Follower
> --
>
> Key: HDDS-1371
> URL: https://issues.apache.org/jira/browse/HDDS-1371
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If a follower OM is lagging far behind the leader OM, or is restarting or 
> bootstrapping, it might need a RocksDB checkpoint from the leader to catch 
> up. This is because the leader might have purged its logs after taking a 
> snapshot.
> This Jira aims to add support for downloading a RocksDB checkpoint from the 
> leader OM to a follower OM through an HTTP servlet. We reuse the DBCheckpoint 
> servlet used by the Recon server. 




[jira] [Created] (HDDS-1371) Download RocksDB checkpoint from OM Leader to Follower

2019-04-02 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1371:


 Summary: Download RocksDB checkpoint from OM Leader to Follower
 Key: HDDS-1371
 URL: https://issues.apache.org/jira/browse/HDDS-1371
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


If a follower OM is lagging far behind the leader OM, or is restarting or 
bootstrapping, it might need a RocksDB checkpoint from the leader to catch up. 
This is because the leader might have purged its logs after taking a snapshot.
This Jira aims to add support for downloading a RocksDB checkpoint from the 
leader OM to a follower OM through an HTTP servlet. We reuse the servlet used 
by the Recon server. 
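A small sketch of the condition that makes a checkpoint necessary, assuming the leader knows the oldest log index it still retains. The class, enum, and parameter names are hypothetical; in practice this check may be made by the leader or by the follower, and the sketch only illustrates the comparison itself.

```java
// Hypothetical planner: can a lagging follower catch up by replaying the
// leader's Raft log, or has the leader already purged the entries it needs?
final class CatchUpPlanner {
    enum Action { UP_TO_DATE, REPLAY_LOG, DOWNLOAD_CHECKPOINT }

    /**
     * @param followerLastApplied last log index applied on the follower (-1 for a fresh node)
     * @param leaderFirstLogIndex oldest log index the leader still retains
     * @param leaderLastLogIndex  newest log index on the leader
     */
    static Action plan(long followerLastApplied, long leaderFirstLogIndex,
                       long leaderLastLogIndex) {
        if (followerLastApplied >= leaderLastLogIndex) {
            return Action.UP_TO_DATE;
        }
        long nextNeeded = followerLastApplied + 1;
        // Entries below leaderFirstLogIndex were purged after a snapshot,
        // so log replay cannot close the gap and a checkpoint is required.
        return nextNeeded < leaderFirstLogIndex
                ? Action.DOWNLOAD_CHECKPOINT
                : Action.REPLAY_LOG;
    }
}
```

A bootstrapping OM (last applied index -1) therefore needs a checkpoint as soon as the leader has purged any prefix of its log.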




[jira] [Updated] (HDDS-1371) Download RocksDB checkpoint from OM Leader to Follower

2019-04-02 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1371:
-
Target Version/s: 0.5.0

> Download RocksDB checkpoint from OM Leader to Follower
> --
>
> Key: HDDS-1371
> URL: https://issues.apache.org/jira/browse/HDDS-1371
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> If a follower OM is lagging far behind the leader OM, or is restarting or 
> bootstrapping, it might need a RocksDB checkpoint from the leader to catch 
> up. This is because the leader might have purged its logs after taking a 
> snapshot.
> This Jira aims to add support for downloading a RocksDB checkpoint from the 
> leader OM to a follower OM through an HTTP servlet. We reuse the servlet used 
> by the Recon server. 




[jira] [Created] (HDDS-1339) Implement Ratis Snapshots on OM

2019-03-26 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1339:


 Summary: Implement Ratis Snapshots on OM
 Key: HDDS-1339
 URL: https://issues.apache.org/jira/browse/HDDS-1339
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


For bootstrapping and restarting OMs, we need to implement snapshots in OM. The 
OM state maintained in RocksDB will be checkpointed on demand. Ratis snapshots 
will only persist, on disk, the index of the last log entry applied by the state 
machine. This index will be stored in a file in the OM metadata directory.
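A minimal sketch of persisting the last applied index, assuming a plain-text file in the OM metadata directory. The file name is hypothetical, and writing to a temporary file followed by an atomic rename is one common way to avoid exposing a partially written index, not necessarily what the patch does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical store for the state machine's last applied log index.
final class SnapshotIndexFile {
    private final Path file;

    SnapshotIndexFile(Path metadataDir) {
        this.file = metadataDir.resolve("om.ratis.snapshot.index"); // hypothetical name
    }

    /** Writes a temp file and renames it, so readers never see a half-written index. */
    void save(long lastAppliedIndex) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(lastAppliedIndex).getBytes(StandardCharsets.UTF_8));
        // Note: no fsync here; surviving a power loss would also require
        // forcing the data (and directory entry) to disk.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.ATOMIC_MOVE);
    }

    /** Returns the saved index, or -1 if no snapshot has been taken yet. */
    long load() throws IOException {
        if (!Files.exists(file)) {
            return -1;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return Long.parseLong(text);
    }
}
```

On restart, the OM would read this index and resume applying log entries after it, on top of the RocksDB state captured by the checkpoint.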




[jira] [Assigned] (HDDS-1324) TestOzoneManagerHA seems to be flaky

2019-03-25 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru reassigned HDDS-1324:


Assignee: Hanisha Koneru

> TestOzoneManagerHA seems to be flaky
> 
>
> Key: HDDS-1324
> URL: https://issues.apache.org/jira/browse/HDDS-1324
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Arpit Agarwal
>Assignee: Hanisha Koneru
>Priority: Major
>
> TestOzoneManagerHA failed once with the following error:
> {code}
> [ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 105.931 s <<< FAILURE! - in org.apache.hadoop.ozone.om.TestOzoneManagerHA
> [ERROR] testOMRetryProxy(org.apache.hadoop.ozone.om.TestOzoneManagerHA)  Time elapsed: 21.781 s  <<< FAILURE!
> java.lang.AssertionError: expected:<30> but was:<10>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at org.apache.hadoop.ozone.om.TestOzoneManagerHA.testOMRetryProxy(TestOzoneManagerHA.java:305)
> {code}




[jira] [Created] (HDDS-1311) Make Install Snapshot option configurable

2019-03-19 Thread Hanisha Koneru (JIRA)

Hanisha Koneru created HDDS-1311:


 Summary: Make Install Snapshot option configurable
 Key: HDDS-1311
 URL: https://issues.apache.org/jira/browse/HDDS-1311
 Project: Hadoop Distributed Data Store
  Issue Type: New Feature
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


This Jira aims to make the install snapshot command from the leader to a 
follower configurable. By default, install snapshot should be enabled. 




[jira] [Commented] (HDFS-14320) Support skipTrash for WebHDFS

2019-03-08 Thread Hanisha Koneru (JIRA)



[ 
https://issues.apache.org/jira/browse/HDFS-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788248#comment-16788248
 ] 

Hanisha Koneru commented on HDFS-14320:
---

Thank you [~kpalanisamy] for working on this. 

The patch LGTM overall. Just one question: in 
{{NamenodeWebHdfsMethods#delete()}}, when we move the file to the 
appropriate trash directory, will {{FileSystem.get(config)}} return the default FS 
of the cluster or the FS corresponding to the fullPath of the file? Do we set 
the defaultFS in the configuration when initializing WebHdfs?
{code:java}
b = Trash.moveToAppropriateTrash(FileSystem.get(config),
new org.apache.hadoop.fs.Path(fullpath), config);{code}

> Support skipTrash for WebHDFS 
> --
>
> Key: HDFS-14320
> URL: https://issues.apache.org/jira/browse/HDFS-14320
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode, webhdfs
>Affects Versions: 3.2.0
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
> Attachments: HDFS-14320-001.patch, HDFS-14320-002.patch, 
> HDFS-14320-003.patch, HDFS-14320-004.patch, HDFS-14320-005.patch, 
> HDFS-14320-006.patch, HDFS-14320-007.patch
>
>
> Files/directories deleted via a WebHDFS REST call do not go to the trash; 
> they are deleted permanently. This feature is very important to us because 
> one of our users accidentally deleted a large directory.
> By default, the skiptrash option is set to true (skiptrash=true), so any 
> files deleted using curl will be permanently deleted.
> Example:
> curl -iv -X DELETE 
> "http://:50070/webhdfs/v1/tmp/sampledata?op=DELETE=hdfs=true;
>  
> Use skiptrash=false to move files to the trash instead.
> Example:
> curl -iv -X DELETE 
> "http://:50070/webhdfs/v1/tmp/sampledata?op=DELETE=hdfs=true=false;
>  
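A hypothetical helper showing the parameter semantics described above: an absent skiptrash behaves like skiptrash=true, and only skiptrash=false sends the path to the trash. The class and method names are illustrative and not taken from the attached patches; rejecting values other than true/false is an assumption.

```java
import java.util.Map;

// Hypothetical mapping from WebHDFS DELETE query parameters to a delete mode.
final class WebHdfsDeleteMode {
    enum Mode { DELETE_PERMANENTLY, MOVE_TO_TRASH }

    static Mode fromQuery(Map<String, String> queryParams) {
        // Absent parameter keeps the old behavior: skip the trash.
        String value = queryParams.getOrDefault("skiptrash", "true");
        if (value.equalsIgnoreCase("true")) {
            return Mode.DELETE_PERMANENTLY;
        }
        if (value.equalsIgnoreCase("false")) {
            return Mode.MOVE_TO_TRASH;
        }
        throw new IllegalArgumentException("skiptrash must be true or false: " + value);
    }
}
```

Defaulting to true keeps existing WebHDFS clients working unchanged, at the cost that callers must opt in to the safer trash behavior.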




[jira] [Commented] (HDDS-1175) Serve read requests directly from RocksDB

2019-03-06 Thread Hanisha Koneru (JIRA)



[ 
https://issues.apache.org/jira/browse/HDDS-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786345#comment-16786345
 ] 

Hanisha Koneru commented on HDDS-1175:
--

Thank you [~linyiqun] and [~arpitagarwal].

I have merged the PR to trunk.

> Serve read requests directly from RocksDB
> -
>
> Key: HDDS-1175
> URL: https://issues.apache.org/jira/browse/HDDS-1175
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1175.001.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We can serve read requests directly from the OM's RocksDB instead of going 
> through the Ratis server. OM should first check its role and serve read 
> requests only if it is the leader. 
> There can be a scenario where an OM loses its leader status but does not yet 
> know about the new election in the ring. This OM could serve stale reads for 
> the duration of the heartbeat timeout, but this should be acceptable (similar 
> to how a Standby NameNode could serve stale reads until it figures out its 
> new status).
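A minimal sketch of the leader-only read path described above. The role enum, exception, and map-backed store are hypothetical stand-ins for the OM's Ratis role check and RocksDB tables, not Ozone classes.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical reader that answers from the local store only on the leader.
final class LeaderOnlyReader {
    enum Role { LEADER, FOLLOWER, CANDIDATE }

    static final class NotLeaderException extends RuntimeException {
        NotLeaderException(Role role) {
            super("OM is " + role + ", not leader; retry on the leader");
        }
    }

    private final Supplier<Role> roleSupplier; // this OM's current view of its role
    private final Map<String, String> store;   // stand-in for the OM's RocksDB tables

    LeaderOnlyReader(Supplier<Role> roleSupplier, Map<String, String> store) {
        this.roleSupplier = roleSupplier;
        this.store = store;
    }

    String get(String key) {
        Role role = roleSupplier.get();
        if (role != Role.LEADER) {
            throw new NotLeaderException(role);
        }
        // No Ratis round trip: a deposed leader that has not yet noticed the
        // new election can return stale data until it learns its new role.
        return store.get(key);
    }
}
```

The stale-read window the description accepts is visible here: the role check reflects only what this OM currently believes, not the outcome of an election it has not yet heard about.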




[jira] [Updated] (HDDS-1175) Serve read requests directly from RocksDB

2019-03-06 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1175:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Serve read requests directly from RocksDB
> -
>
> Key: HDDS-1175
> URL: https://issues.apache.org/jira/browse/HDDS-1175
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1175.001.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We can serve read requests directly from the OM's RocksDB instead of going 
> through the Ratis server. OM should first check its role and serve read 
> requests only if it is the leader. 
> There can be a scenario where an OM loses its leader status but does not yet 
> know about the new election in the ring. This OM could serve stale reads for 
> the duration of the heartbeat timeout, but this should be acceptable (similar 
> to how a Standby NameNode could serve stale reads until it figures out its 
> new status).




[jira] [Updated] (HDDS-1225) Provide docker-compose for OM HA

2019-03-05 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1225:
-
Component/s: Ozone Manager
 HA
 docker

> Provide docker-compose for OM HA
> 
>
> Key: HDDS-1225
> URL: https://issues.apache.org/jira/browse/HDDS-1225
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: docker, HA, Ozone Manager
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> This Jira proposes adding a docker-compose file to run a local pseudo-cluster 
> with OM HA (3 OM nodes).




[jira] [Updated] (HDDS-1225) Provide docker-compose for OM HA

2019-03-05 Thread Hanisha Koneru (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDDS-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-1225:
-
Target Version/s: 0.5.0

> Provide docker-compose for OM HA
> 
>
> Key: HDDS-1225
> URL: https://issues.apache.org/jira/browse/HDDS-1225
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: docker, HA, Ozone Manager
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> This Jira proposes adding a docker-compose file to run a local pseudo-cluster 
> with OM HA (3 OM nodes).



