[jira] [Created] (HDFS-15619) Metric for ordered snapshot deletion GC thread

2020-10-08 Thread Nilotpal Nandi (Jira)
Nilotpal Nandi created HDFS-15619:
-

 Summary: Metric for ordered snapshot deletion GC thread
 Key: HDFS-15619
 URL: https://issues.apache.org/jira/browse/HDFS-15619
 Project: Hadoop HDFS
  Issue Type: Task
  Components: hdfs
Reporter: Nilotpal Nandi
Assignee: Nilotpal Nandi


The following information should be captured and exposed via JMX for the 
garbage collection thread of ordered snapshot deletion (a sketch of such a 
metrics source follows the list):
 * Number of snapshots pending GC
 * Number of times the GC thread has run
 * Number of snapshots already GCed
 * Average time taken by each GC run
 * Thread running status
 * Number of failed deletions by the GC thread
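
A minimal sketch of what such a metrics source might look like, using Hadoop's 
metrics2 annotations (the class and field names here are hypothetical, not the 
eventual implementation):

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Hypothetical metrics source for the snapshot deletion GC thread.
@Metrics(about = "Ordered snapshot deletion GC metrics", context = "dfs")
public class SnapshotDeletionGcMetrics {
  @Metric("Snapshots pending GC")
  private MutableGaugeLong pendingSnapshots;

  @Metric("Number of GC thread runs")
  private MutableCounterLong gcRunCount;

  @Metric("Snapshots already GCed")
  private MutableCounterLong snapshotsGced;

  @Metric("Time taken per GC run")  // MutableRate tracks count and average time
  private MutableRate gcRunTime;

  @Metric("GC thread running status (1 = running)")
  private MutableGaugeInt threadRunning;

  @Metric("Failed snapshot deletions by the GC thread")
  private MutableCounterLong failedDeletions;
}
{code}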



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2604) scmcli pipeline deactivate command not working

2019-11-21 Thread Nilotpal Nandi (Jira)
Nilotpal Nandi created HDDS-2604:


 Summary: scmcli pipeline deactivate command not working
 Key: HDDS-2604
 URL: https://issues.apache.org/jira/browse/HDDS-2604
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM Client
Reporter: Nilotpal Nandi
Assignee: Nilotpal Nandi


The ozone scmcli pipeline deactivate command does not work; it fails with an 
"Unknown command type" error and a non-zero exit code.

Output:
{noformat}
ozone scmcli pipeline deactivate 212e1f47-4890-49c2-a950-4d0b3a70cbfd
Unknown command type: DeactivatePipeline
root@st-ozone-kg2qce-l2ltm:/ansible# echo $?
255{noformat}
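
For reference, exit status 255 is how the shell reports a process exit code of 
-1 (exit statuses are taken modulo 256):

{noformat}
bash -c 'exit -1'; echo $?
255
{noformat}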



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14980) diskbalancer query command always tries to connect to port 9867

2019-11-12 Thread Nilotpal Nandi (Jira)
Nilotpal Nandi created HDFS-14980:
-

 Summary: diskbalancer query command always tries to connect to 
port 9867
 Key: HDFS-14980
 URL: https://issues.apache.org/jira/browse/HDFS-14980
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: diskbalancer
Reporter: Nilotpal Nandi


The diskbalancer query command always tries to connect to port 9867 even when 
the datanode IPC port is different.

In this setup, the datanode IPC port is set to 20001.

The diskbalancer report command works fine and connects to IPC port 20001:

 
{noformat}
hdfs diskbalancer -report -node 172.27.131.193
19/11/12 08:58:55 INFO command.Command: Processing report command
19/11/12 08:58:57 INFO balancer.KeyManager: Block token params received from 
NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
19/11/12 08:58:57 INFO block.BlockTokenSecretManager: Setting block keys
19/11/12 08:58:57 INFO balancer.KeyManager: Update block keys every 2hrs, 
30mins, 0sec
19/11/12 08:58:58 INFO command.Command: Reporting volume information for 
DataNode(s). These DataNode(s) are parsed from '172.27.131.193'.
Processing report command
Reporting volume information for DataNode(s). These DataNode(s) are parsed from 
'172.27.131.193'.
[172.27.131.193:20001] - : 3 
volumes with node data density 0.05.
[DISK: volume-/dataroot/ycloud/dfs/NEW_DISK1/] - 0.15 used: 
39343871181/259692498944, 0.85 free: 220348627763/259692498944, isFailed: 
False, isReadOnly: False, isSkip: False, isTransient: False.
[DISK: volume-/dataroot/ycloud/dfs/NEW_DISK2/] - 0.15 used: 
39371179986/259692498944, 0.85 free: 220321318958/259692498944, isFailed: 
False, isReadOnly: False, isSkip: False, isTransient: False.
[DISK: volume-/dataroot/ycloud/dfs/dn/] - 0.19 used: 49934903670/259692498944, 
0.81 free: 209757595274/259692498944, isFailed: False, isReadOnly: False, 
isSkip: False, isTransient: False.
 
{noformat}
 

But the diskbalancer query command fails, trying to connect to port 9867 (the 
default IPC port):

 
{noformat}
hdfs diskbalancer -query 172.27.131.193
19/11/12 06:37:15 INFO command.Command: Executing "query plan" command.
19/11/12 06:37:16 INFO ipc.Client: Retrying connect to server: 
/172.27.131.193:9867. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/11/12 06:37:17 INFO ipc.Client: Retrying connect to server: 
/172.27.131.193:9867. Already tried 1 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
..
..
..

19/11/12 06:37:25 ERROR tools.DiskBalancerCLI: Exception thrown while running 
DiskBalancerCLI.

{noformat}
 

 

Expectation:

The diskbalancer query command should pick up the datanode's configured IPC 
port without requiring it to be specified explicitly; a possible workaround is 
shown below.
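
Assuming the query command accepts an explicit host:port pair (hypothetical 
usage, not verified in this setup), the IPC port can be passed directly as a 
workaround:

{noformat}
hdfs diskbalancer -query 172.27.131.193:20001
{noformat}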



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2350) NullPointerException seen in datanode log while writing data

2019-10-23 Thread Nilotpal Nandi (Jira)
Nilotpal Nandi created HDDS-2350:


 Summary: NullPointerException seen in datanode log while writing 
data
 Key: HDDS-2350
 URL: https://issues.apache.org/jira/browse/HDDS-2350
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


A NullPointerException is seen in the datanode log while writing 10 GB of data. 
There is one pipeline with factor 3 during the write.
{noformat}
2019-10-23 11:25:45,674 ERROR 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: Error getting metrics 
from source 
ratis_core.ratis_leader.a23fb300-4c1e-420f-a21e-7e73d0c22cbe@group-4CA404C938C2
java.lang.NullPointerException
 at 
org.apache.ratis.server.impl.RaftLeaderMetrics.lambda$null$2(RaftLeaderMetrics.java:86)
 at 
com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.snapshotAllMetrics(HadoopMetrics2Reporter.java:239)
 at 
com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.getMetrics(HadoopMetrics2Reporter.java:219)
 at 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:381)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:368)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)
2019-10-23 11:25:55,673 ERROR 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: Error getting metrics 
from source 
ratis_core.ratis_leader.a23fb300-4c1e-420f-a21e-7e73d0c22cbe@group-4CA404C938C2
java.lang.NullPointerException
 at 
org.apache.ratis.server.impl.RaftLeaderMetrics.lambda$null$2(RaftLeaderMetrics.java:86)
 at 
com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.snapshotAllMetrics(HadoopMetrics2Reporter.java:239)
 at 
com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.getMetrics(HadoopMetrics2Reporter.java:219)
 at 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:381)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:368)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)
2019-10-23 11:26:05,674 ERROR 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: Error getting metrics 
from source 
ratis_core.ratis_leader.a23fb300-4c1e-420f-a21e-7e73d0c22cbe@group-4CA404C938C2
java.lang.NullPointerException
 at 
org.apache.ratis.server.impl.RaftLeaderMetrics.lambda$null$2(RaftLeaderMetrics.java:86)
 at 
com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.snapshotAllMetrics(HadoopMetrics2Reporter.java:239)
 at 
com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.getMetrics(HadoopMetrics2Reporter.java:219)
 at 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:381)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:368)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505){noformat}
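
The failure occurs inside a metric-snapshotting lambda, which typically means a 
gauge callback dereferences state that can be null at snapshot time. A minimal 
illustration of the pattern and its guard, using the Dropwizard metrics API 
(the class and field names are hypothetical, not the actual RaftLeaderMetrics 
code):

{code}
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class LeaderGaugeExample {
  interface PeerInfo { long getMatchIndex(); }

  private volatile PeerInfo peer;   // may still be null when a snapshot runs

  void register(MetricRegistry registry) {
    // Unsafe: throws NullPointerException if 'peer' is unset at snapshot time.
    registry.register("peer.matchIndex",
        (Gauge<Long>) () -> peer.getMatchIndex());

    // Safe: guard the dereference inside the lambda.
    registry.register("peer.matchIndexSafe",
        (Gauge<Long>) () -> peer == null ? 0L : peer.getMatchIndex());
  }
}
{code}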



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2043) "VOLUME_NOT_FOUND" exception thrown while listing volumes

2019-08-27 Thread Nilotpal Nandi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-2043:
-
Description: 
The ozone volume list command throws an OMException:

bin/ozone sh volume list --user root
 VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume 
info not found for vol-test-putfile-1566902803

With DEBUG logging enabled, here is the console output:

 

 
{noformat}
bin/ozone sh volume create /n1 ; echo $?
2019-08-27 11:47:16 DEBUG ThriftSenderFactory:33 - Using the UDP Sender to send 
spans to the agent.
2019-08-27 11:47:16 DEBUG SenderResolver:86 - Using sender UdpSender()
2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of 
successful kerberos logins and latency (milliseconds)])
2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of 
failed kerberos logins and latency (milliseconds)])
2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
always=false, valueName=Time, about=, interval=10, type=DEFAULT, 
value=[GetGroups])
2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field private 
org.apache.hadoop.metrics2.lib.MutableGaugeLong 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal 
with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal 
failures since startup])
2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field private 
org.apache.hadoop.metrics2.lib.MutableGaugeInt 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal 
failures since last successful login])
2019-08-27 11:47:16 DEBUG MetricsSystemImpl:231 - UgiMetrics, User and group 
related metrics
2019-08-27 11:47:16 DEBUG SecurityUtil:124 - Setting 
hadoop.security.token.service.use_ip to true
2019-08-27 11:47:16 DEBUG Shell:821 - setsid exited with exit code 0
2019-08-27 11:47:16 DEBUG Groups:449 - Creating new Groups object
2019-08-27 11:47:16 DEBUG Groups:151 - Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; 
cacheTimeout=30; warningDeltaMs=5000
2019-08-27 11:47:16 DEBUG UserGroupInformation:254 - hadoop login
2019-08-27 11:47:16 DEBUG UserGroupInformation:187 - hadoop login commit
2019-08-27 11:47:16 DEBUG UserGroupInformation:215 - using local 
user:UnixPrincipal: root
2019-08-27 11:47:16 DEBUG UserGroupInformation:221 - Using user: 
"UnixPrincipal: root" with name root
2019-08-27 11:47:16 DEBUG UserGroupInformation:235 - User entry: "root"
2019-08-27 11:47:16 DEBUG UserGroupInformation:766 - UGI loginUser:root 
(auth:SIMPLE)
2019-08-27 11:47:16 DEBUG OzoneClientFactory:287 - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-08-27 11:47:16 DEBUG Server:280 - rpcKind=RPC_PROTOCOL_BUFFER, 
rpcRequestWrapperClass=class 
org.apache.hadoop.ipc.ProtobufRpcEngine$RpcProtobufRequest, 
rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@710f4dc7
2019-08-27 11:47:16 DEBUG Client:63 - getting client out of cache: 
org.apache.hadoop.ipc.Client@24313fcc
2019-08-27 11:47:16 DEBUG Client:487 - The ping interval is 6 ms.
2019-08-27 11:47:16 DEBUG Client:785 - Connecting to 
nnandi-1.gce.cloudera.com/172.31.117.213:9862
2019-08-27 11:47:16 DEBUG Client:1064 - IPC Client (580871917) connection to 
nnandi-1.gce.cloudera.com/172.31.117.213:9862 from root: starting, having 
connections 1
2019-08-27 11:47:16 DEBUG Client:1127 - IPC Client (580871917) connection to 
nnandi-1.gce.cloudera.com/172.31.117.213:9862 from root sending #0 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2019-08-27 11:47:17 DEBUG Client:1181 - IPC Client (580871917) connection to 
nnandi-1.gce.cloudera.com/172.31.117.213:9862 from root got value #0
2019-08-27 11:47:17 DEBUG ProtobufRpcEngine:249 - Call: submitRequest took 230ms
2019-08-27 11:47:17 DEBUG Client:63 - getting client out of cache: 
org.apache.hadoop.ipc.Client@24313fcc
2019-08-27 11:47:17 DEBUG Groups:312 - 

[jira] [Updated] (HDDS-2043) "VOLUME_NOT_FOUND" exception thrown while listing volumes

2019-08-27 Thread Nilotpal Nandi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-2043:
-
Description: 
The ozone volume list command throws an OMException:

bin/ozone sh volume list --user root
 VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume 
info not found for vol-test-putfile-1566902803

  was:
ozone list volume command throws OMException

/opt/cloudera/parcels/CDH/bin/ozone sh volume list --user root
VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume info 
not found for vol-test-putfile-1566902803


> "VOLUME_NOT_FOUND" exception thrown while listing volumes
> -
>
> Key: HDDS-2043
> URL: https://issues.apache.org/jira/browse/HDDS-2043
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone CLI, Ozone Manager
>Reporter: Nilotpal Nandi
>Priority: Major
>
> ozone list volume command throws OMException
> bin/ozone sh volume list --user root
>  VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume 
> info not found for vol-test-putfile-1566902803



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2043) "VOLUME_NOT_FOUND" exception thrown while listing volumes

2019-08-27 Thread Nilotpal Nandi (Jira)
Nilotpal Nandi created HDDS-2043:


 Summary: "VOLUME_NOT_FOUND" exception thrown while listing volumes
 Key: HDDS-2043
 URL: https://issues.apache.org/jira/browse/HDDS-2043
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone CLI, Ozone Manager
Reporter: Nilotpal Nandi


The ozone volume list command throws an OMException:

/opt/cloudera/parcels/CDH/bin/ozone sh volume list --user root
VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume info 
not found for vol-test-putfile-1566902803



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1706) Replication Manager thread running too frequently

2019-06-19 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1706:
-
Status: Patch Available  (was: Open)

> Replication Manager thread running too frequently
> -
>
> Key: HDDS-1706
> URL: https://issues.apache.org/jira/browse/HDDS-1706
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1706.001.patch
>
>
> Replication manager is running too frequently, at a 3s interval instead of 
> 300s.
> {code}
> host: vc1337.halxg.cloudera.com, networkLocation: /default-rack, 
> certSerialId: null}.
> 2019-06-18 03:11:51,687 INFO 
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor 
> Thread took 4 milliseconds for processing 739 containers.
> .
> 2019-06-18 03:11:54,692 INFO 
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor 
> Thread took 4 milliseconds for processing 739 containers.
> {code}
> This is because of the following configuration lines:
> {code}
> @Config(key = "thread.interval",
> type = ConfigType.TIME,
> defaultValue = "3s",
> tags = {SCM, OZONE},
> {code}
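
A minimal sketch of the presumed fix, mirroring the excerpt above with only the 
default changed (assuming the intent is simply to restore the 300s default; the 
actual change is in HDDS-1706.001.patch):

{code}
@Config(key = "thread.interval",
    type = ConfigType.TIME,
    defaultValue = "300s",   // was "3s"
    tags = {SCM, OZONE},
{code}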



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1706) Replication Manager thread running too frequently

2019-06-19 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1706:
-
Attachment: HDDS-1706.001.patch

> Replication Manager thread running too frequently
> -
>
> Key: HDDS-1706
> URL: https://issues.apache.org/jira/browse/HDDS-1706
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1706.001.patch
>
>
> Replication manager is running too frequently, at a 3s interval instead of 
> 300s.
> {code}
> host: vc1337.halxg.cloudera.com, networkLocation: /default-rack, 
> certSerialId: null}.
> 2019-06-18 03:11:51,687 INFO 
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor 
> Thread took 4 milliseconds for processing 739 containers.
> .
> 2019-06-18 03:11:54,692 INFO 
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor 
> Thread took 4 milliseconds for processing 739 containers.
> {code}
> This is because of the following configuration lines:
> {code}
> @Config(key = "thread.interval",
> type = ConfigType.TIME,
> defaultValue = "3s",
> tags = {SCM, OZONE},
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1706) Replication Manager thread running too frequently

2019-06-19 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi reassigned HDDS-1706:


Assignee: Nilotpal Nandi

> Replication Manager thread running too frequently
> -
>
> Key: HDDS-1706
> URL: https://issues.apache.org/jira/browse/HDDS-1706
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Nilotpal Nandi
>Priority: Major
>
> Replication manager is running too frequently, at a 3s interval instead of 
> 300s.
> {code}
> host: vc1337.halxg.cloudera.com, networkLocation: /default-rack, 
> certSerialId: null}.
> 2019-06-18 03:11:51,687 INFO 
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor 
> Thread took 4 milliseconds for processing 739 containers.
> .
> 2019-06-18 03:11:54,692 INFO 
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor 
> Thread took 4 milliseconds for processing 739 containers.
> {code}
> This is because of the following configuration lines:
> {code}
> @Config(key = "thread.interval",
> type = ConfigType.TIME,
> defaultValue = "3s",
> tags = {SCM, OZONE},
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1497) Refactor blockade Tests

2019-05-29 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850826#comment-16850826
 ] 

Nilotpal Nandi commented on HDDS-1497:
--

Thanks [~shashikant] for the review. I have addressed your comments. Here are 
the inline responses:

1. Please update comments for property, getter and setter functions. - done
2. cluster.py:223-224 -> incorrect comments. - done
3. clusterUtils.py:324 -> "om_1" should be "om"? - This works fine with "om_1" 
too; the "om_1" string is present in om's container name.
4. cluster_utils.py:296 -> which file's checksum is it supposed to compute? Can 
you please update the comments? - done

> Refactor blockade Tests
> ---
>
> Key: HDDS-1497
> URL: https://issues.apache.org/jira/browse/HDDS-1497
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1497.001.patch, HDDS-1497.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1497) Refactor blockade Tests

2019-05-29 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1497:
-
Attachment: HDDS-1497.002.patch

> Refactor blockade Tests
> ---
>
> Key: HDDS-1497
> URL: https://issues.apache.org/jira/browse/HDDS-1497
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1497.001.patch, HDDS-1497.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-1534) freon should return non-zero exit code on failure

2019-05-23 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846644#comment-16846644
 ] 

Nilotpal Nandi edited comment on HDDS-1534 at 5/23/19 11:16 AM:


Thanks [~sdeka] for the review.

I have addressed your comment and uploaded a new patch.


was (Author: nilotpalnandi):
Thanks [~sdeka] for the review.

I havve uploaded new patch

> freon should return non-zero exit code on failure
> -
>
> Key: HDDS-1534
> URL: https://issues.apache.org/jira/browse/HDDS-1534
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1534.001.patch, HDDS-1534.002.patch
>
>
> Currently freon does not return any non-zero exit code even on failure.
> The status shows as "Failed" but the exit code is always zero.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1534) freon should return non-zero exit code on failure

2019-05-23 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846644#comment-16846644
 ] 

Nilotpal Nandi commented on HDDS-1534:
--

Thanks [~sdeka] for the review.

I have uploaded a new patch.

> freon should return non-zero exit code on failure
> -
>
> Key: HDDS-1534
> URL: https://issues.apache.org/jira/browse/HDDS-1534
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1534.001.patch, HDDS-1534.002.patch
>
>
> Currently freon does not return any non-zero exit code even on failure.
> The status shows as "Failed" but the exit code is always zero.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1534) freon should return non-zero exit code on failure

2019-05-23 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1534:
-
Attachment: HDDS-1534.002.patch

> freon should return non-zero exit code on failure
> -
>
> Key: HDDS-1534
> URL: https://issues.apache.org/jira/browse/HDDS-1534
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1534.001.patch, HDDS-1534.002.patch
>
>
> Currently freon does not return any non-zero exit code even on failure.
> The status shows as "Failed" but the exit code is always zero.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1534) freon should return non-zero exit code on failure

2019-05-15 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1534:
-
Attachment: HDDS-1534.001.patch

> freon should return non-zero exit code on failure
> -
>
> Key: HDDS-1534
> URL: https://issues.apache.org/jira/browse/HDDS-1534
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1534.001.patch
>
>
> Currently freon does not return any non-zero exit code even on failure.
> The status shows as "Failed" but the exit code is always zero.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1534) freon should return non-zero exit code on failure

2019-05-15 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1534:
-
Status: Patch Available  (was: Open)

> freon should return non-zero exit code on failure
> -
>
> Key: HDDS-1534
> URL: https://issues.apache.org/jira/browse/HDDS-1534
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1534.001.patch
>
>
> Currently freon does not return any non-zero exit code even on failure.
> The status shows as "Failed" but the exit code is always zero.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1497) Refactor blockade Tests

2019-05-15 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1497:
-
Status: Patch Available  (was: Open)

> Refactor blockade Tests
> ---
>
> Key: HDDS-1497
> URL: https://issues.apache.org/jira/browse/HDDS-1497
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1497.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1497) Refactor blockade Tests

2019-05-15 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1497:
-
Attachment: HDDS-1497.001.patch

> Refactor blockade Tests
> ---
>
> Key: HDDS-1497
> URL: https://issues.apache.org/jira/browse/HDDS-1497
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1497.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1534) freon should return non-zero exit code on failure

2019-05-15 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1534:


 Summary: freon should return non-zero exit code on failure
 Key: HDDS-1534
 URL: https://issues.apache.org/jira/browse/HDDS-1534
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi
Assignee: Nilotpal Nandi


Currently, freon does not return a non-zero exit code on failure.

The status shows as "Failed" but the exit code is always zero; a sketch of the 
needed change follows.
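
A minimal sketch of the kind of change needed (hypothetical names; freon's 
actual entry point runs through picocli's GenericCli, as the stack traces 
elsewhere in this archive show):

{code}
public class FreonExitExample {
  public static void main(String[] args) {
    // Hypothetical runner standing in for freon's load-generation logic.
    boolean failed = runLoadGenerator(args);

    // Map the reported status to the process exit code so scripts and CI
    // can detect failures, instead of always exiting 0.
    System.exit(failed ? 1 : 0);
  }

  private static boolean runLoadGenerator(String[] args) {
    return false;   // stub; the real logic lives in freon itself
  }
}
{code}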



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1497) Refactor blockade Tests

2019-05-07 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1497:


 Summary: Refactor blockade Tests
 Key: HDDS-1497
 URL: https://issues.apache.org/jira/browse/HDDS-1497
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi
Assignee: Nilotpal Nandi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1164) Add New blockade Tests to test Replica Manager

2019-04-03 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1164:
-
Attachment: HDDS-1164.004.patch

> Add New blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1164
> URL: https://issues.apache.org/jira/browse/HDDS-1164
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
>  Labels: postpone-to-craterlake
> Attachments: HDDS-1164.001.patch, HDDS-1164.002.patch, 
> HDDS-1164.003.patch, HDDS-1164.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1067) freon run on client gets hung when two of the datanodes are down in 3 datanode cluster

2019-03-29 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805260#comment-16805260
 ] 

Nilotpal Nandi commented on HDDS-1067:
--

Thanks [~shashikant]. I have uploaded the patch

> freon run on client gets hung when two of the datanodes are down in 3 
> datanode cluster
> --
>
> Key: HDDS-1067
> URL: https://issues.apache.org/jira/browse/HDDS-1067
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1067.001.patch, stack_file.txt
>
>
> steps taken :
> 
>  # created a 3-node docker cluster.
>  # wrote a key.
>  # created a partition such that 2 out of 3 datanodes cannot communicate with 
> any other node.
>  # the third datanode can communicate with scm, om and the client.
>  # ran freon to write a key.
> Observation :
> -
> The freon run hangs; there is no timeout.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1067) freon run on client gets hung when two of the datanodes are down in 3 datanode cluster

2019-03-29 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1067:
-
Status: Patch Available  (was: Open)

> freon run on client gets hung when two of the datanodes are down in 3 
> datanode cluster
> --
>
> Key: HDDS-1067
> URL: https://issues.apache.org/jira/browse/HDDS-1067
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1067.001.patch, stack_file.txt
>
>
> steps taken :
> 
>  # created a 3-node docker cluster.
>  # wrote a key.
>  # created a partition such that 2 out of 3 datanodes cannot communicate with 
> any other node.
>  # the third datanode can communicate with scm, om and the client.
>  # ran freon to write a key.
> Observation :
> -
> The freon run hangs; there is no timeout.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1067) freon run on client gets hung when two of the datanodes are down in 3 datanode cluster

2019-03-29 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1067:
-
Attachment: HDDS-1067.001.patch

> freon run on client gets hung when two of the datanodes are down in 3 
> datanode cluster
> --
>
> Key: HDDS-1067
> URL: https://issues.apache.org/jira/browse/HDDS-1067
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1067.001.patch, stack_file.txt
>
>
> steps taken :
> 
>  # created a 3-node docker cluster.
>  # wrote a key.
>  # created a partition such that 2 out of 3 datanodes cannot communicate with 
> any other node.
>  # the third datanode can communicate with scm, om and the client.
>  # ran freon to write a key.
> Observation :
> -
> The freon run hangs; there is no timeout.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1298) blockade tests failing as the nodes are not able to communicate with Ozone Manager

2019-03-29 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi reassigned HDDS-1298:


Assignee: Nilotpal Nandi

> blockade tests failing as the nodes are not able to communicate with Ozone 
> Manager
> --
>
> Key: HDDS-1298
> URL: https://issues.apache.org/jira/browse/HDDS-1298
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Critical
> Attachments: alllogs.log
>
>
> steps taken:
> 
>  # started a 3-datanode docker cluster.
>  # the freon run fails with the error: "No such service: ozoneManager"
>  
> {noformat}
> om_1 | STARTUP_MSG: build = https://github.com/apache/hadoop.git -r 
> e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on 
> 2019-01-15T17:34Z
> om_1 | STARTUP_MSG: java = 11.0.1
> om_1 | /
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:51 - registered UNIX signal 
> handlers for [TERM, HUP, INT]
> om_1 | 2019-03-18 06:31:41 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:484 - OM Service ID is not set. 
> Setting it to the default ID: omServiceIdDefault
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:490 - OM Node ID is not set. 
> Setting it to the OmStorage's OmID: 25501758-f7f6-42d5-8196-52a885af7e23
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:441 - Found matching OM address 
> with OMServiceId: null, OMNodeId: null, RPC Address: om:9862 and Ratis port: 
> 9872
> om_1 | 2019-03-18 06:31:42 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 2019-03-18 06:31:42 INFO log:192 - Logging initialized @4061ms
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: userTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:userTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: volumeTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:volumeTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: bucketTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:bucketTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: keyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:keyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: deletedTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:deletedTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: openKeyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:openKeyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: s3Table
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:s3Table
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: multipartInfoTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:multipartInfoTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: s3SecretTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:s3SecretTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: default
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:158 - Using default column 
> profile:DBProfile.DISK for Table:default
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:189 - Using default options. 
> DBProfile.DISK
> om_1 | 2019-03-18 06:31:42 INFO CallQueueManager:84 - Using callQueue: class 
> java.util.concurrent.LinkedBlockingQueue, queueCapacity: 2000, scheduler: 
> class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
> om_1 | 2019-03-18 06:31:42 INFO Server:1074 - Starting Socket Reader #1 for 
> port 9862
> om_1 | 2019-03-18 06:31:43 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 2019-03-18 

[jira] [Resolved] (HDDS-1298) blockade tests failing as the nodes are not able to communicate with Ozone Manager

2019-03-29 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi resolved HDDS-1298.
--
Resolution: Duplicate

> blockade tests failing as the nodes are not able to communicate with Ozone 
> Manager
> --
>
> Key: HDDS-1298
> URL: https://issues.apache.org/jira/browse/HDDS-1298
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Critical
> Attachments: alllogs.log
>
>
> steps taken:
> 
>  # started a 3-datanode docker cluster.
>  # the freon run fails with the error: "No such service: ozoneManager"
>  
> {noformat}
> om_1 | STARTUP_MSG: build = https://github.com/apache/hadoop.git -r 
> e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on 
> 2019-01-15T17:34Z
> om_1 | STARTUP_MSG: java = 11.0.1
> om_1 | /
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:51 - registered UNIX signal 
> handlers for [TERM, HUP, INT]
> om_1 | 2019-03-18 06:31:41 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:484 - OM Service ID is not set. 
> Setting it to the default ID: omServiceIdDefault
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:490 - OM Node ID is not set. 
> Setting it to the OmStorage's OmID: 25501758-f7f6-42d5-8196-52a885af7e23
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:441 - Found matching OM address 
> with OMServiceId: null, OMNodeId: null, RPC Address: om:9862 and Ratis port: 
> 9872
> om_1 | 2019-03-18 06:31:42 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 2019-03-18 06:31:42 INFO log:192 - Logging initialized @4061ms
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: userTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:userTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: volumeTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:volumeTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: bucketTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:bucketTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: keyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:keyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: deletedTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:deletedTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: openKeyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:openKeyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: s3Table
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:s3Table
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: multipartInfoTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:multipartInfoTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: s3SecretTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:s3SecretTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: default
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:158 - Using default column 
> profile:DBProfile.DISK for Table:default
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:189 - Using default options. 
> DBProfile.DISK
> om_1 | 2019-03-18 06:31:42 INFO CallQueueManager:84 - Using callQueue: class 
> java.util.concurrent.LinkedBlockingQueue, queueCapacity: 2000, scheduler: 
> class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
> om_1 | 2019-03-18 06:31:42 INFO Server:1074 - Starting Socket Reader #1 for 
> port 9862
> om_1 | 2019-03-18 06:31:43 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 2019-03-18 06:31:43 

[jira] [Updated] (HDDS-1164) Add New blockade Tests to test Replica Manager

2019-03-29 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1164:
-
Attachment: HDDS-1164.003.patch

> Add New blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1164
> URL: https://issues.apache.org/jira/browse/HDDS-1164
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
>  Labels: postpone-to-craterlake
> Attachments: HDDS-1164.001.patch, HDDS-1164.002.patch, 
> HDDS-1164.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1067) freon run on client gets hung when two of the datanodes are down in 3 datanode cluster

2019-03-29 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi reassigned HDDS-1067:


Assignee: Nilotpal Nandi  (was: Shashikant Banerjee)

> freon run on client gets hung when two of the datanodes are down in 3 
> datanode cluster
> --
>
> Key: HDDS-1067
> URL: https://issues.apache.org/jira/browse/HDDS-1067
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: stack_file.txt
>
>
> steps taken :
> 
>  # created a 3-node docker cluster.
>  # wrote a key.
>  # created a partition such that 2 out of 3 datanodes cannot communicate with 
> any other node.
>  # the third datanode can communicate with scm, om and the client.
>  # ran freon to write a key.
> Observation :
> -
> The freon run hangs; there is no timeout.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1164) Add New blockade Tests to test Replica Manager

2019-03-28 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1164:
-
Attachment: HDDS-1164.002.patch

> Add New blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1164
> URL: https://issues.apache.org/jira/browse/HDDS-1164
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
>  Labels: postpone-to-craterlake
> Attachments: HDDS-1164.001.patch, HDDS-1164.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1338) ozone shell commands are throwing InvocationTargetException

2019-03-26 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1338:


 Summary: ozone shell commands are throwing 
InvocationTargetException
 Key: HDDS-1338
 URL: https://issues.apache.org/jira/browse/HDDS-1338
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


ozone version
{noformat}
Source code repository g...@github.com:hortonworks/ozone.git -r 
310ebf5dc83b6c9e68d09246ed6c6f7cf6370fde
Compiled by jenkins on 2019-03-21T22:06Z
Compiled with protoc 2.5.0
From source with checksum 9c367143ad43b81ca84bfdaafd1c3f

Using HDDS 0.4.0.3.0.100.0-388
Source code repository g...@github.com:hortonworks/ozone.git -r 
310ebf5dc83b6c9e68d09246ed6c6f7cf6370fde
Compiled by jenkins on 2019-03-21T22:06Z
Compiled with protoc 2.5.0
From source with checksum f3297cbd3a5f59fb4e5fd551afa05ba9
{noformat}


Here is the output of the failing ozone volume create command:

{noformat}
hdfs@ctr-e139-1542663976389-91321-01-02 ~]$ ozone sh volume create 
testvolume11
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.0.100.0-388/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.0.100.0-388/hadoop-ozone/share/ozone/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/26 17:31:37 ERROR client.OzoneClientFactory: Couldn't create protocol 
class org.apache.hadoop.ozone.client.rpc.RpcClient exception:
java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at 
org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291)
 at 
org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
 at 
org.apache.hadoop.ozone.web.ozShell.OzoneAddress.createClient(OzoneAddress.java:111)
 at 
org.apache.hadoop.ozone.web.ozShell.volume.CreateVolumeHandler.call(CreateVolumeHandler.java:70)
 at 
org.apache.hadoop.ozone.web.ozShell.volume.CreateVolumeHandler.call(CreateVolumeHandler.java:38)
 at picocli.CommandLine.execute(CommandLine.java:919)
 at picocli.CommandLine.access$700(CommandLine.java:104)
 at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
 at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
 at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
 at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
 at picocli.CommandLine.parseWithHandler(CommandLine.java:1181)
 at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61)
 at org.apache.hadoop.ozone.web.ozShell.Shell.execute(Shell.java:82)
 at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52)
 at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:93)
Caused by: java.lang.VerifyError: Cannot inherit from final class
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.<init>(OzoneManagerProtocolClientSideTranslatorPB.java:169)
 at org.apache.hadoop.ozone.client.rpc.RpcClient.<init>(RpcClient.java:142)
 ... 20 more
Couldn't create protocol class org.apache.hadoop.ozone.client.rpc.RpcClient
{noformat}
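
The root cause line, java.lang.VerifyError: Cannot inherit from final class, 
usually indicates a classpath conflict: a subclass was compiled against a 
version of its superclass that was not final, while a different jar on the 
runtime classpath supplies a final one. A minimal illustration of the 
condition (hypothetical classes, not the actual Ozone types):

{code}
// Version A, present at compile time:
class Base { }                    // not final when Derived was compiled

// Version B, a different jar shadowing it at run time:
// final class Base { }           // final, so loading Derived fails with
//                                // java.lang.VerifyError: Cannot inherit
//                                // from final class

class Derived extends Base { }
{code}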

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1326) putkey operation failed with java.lang.ArrayIndexOutOfBoundsException

2019-03-22 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1326:


 Summary: putkey operation failed with 
java.lang.ArrayIndexOutOfBoundsException
 Key: HDDS-1326
 URL: https://issues.apache.org/jira/browse/HDDS-1326
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


steps taken :

---
 # tried to write a key in a 40-node cluster.
 # the write failed.

client output

---

 
{noformat}
e530-491c-ab03-3b1c34d1a751:c80390, 
974a806d-bf7d-4f1b-adb4-d51d802d368a:c80390, 
469bd8c4-5da2-43bb-bc4b-7edd884931e5:c80390]
2019-03-22 10:56:19,592 [main] WARN - Encountered exception {}
java.io.IOException: Unexpected Storage Container Exception: 
java.util.concurrent.ExecutionException: 
java.util.concurrent.CompletionException: 
org.apache.ratis.protocol.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException 
from Server 5d3eb91f-e530-491c-ab03-3b1c34d1a751: Container 1269 in CLOSED state
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:511)
 at 
org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:144)
 at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:565)
 at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:329)
 at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:273)
 at 
org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
 at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96)
 at 
org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:111)
 at 
org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:53)
 at picocli.CommandLine.execute(CommandLine.java:919)
 at picocli.CommandLine.access$700(CommandLine.java:104)
 at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
 at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
 at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
 at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
 at picocli.CommandLine.parseWithHandler(CommandLine.java:1181)
 at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61)
 at org.apache.hadoop.ozone.web.ozShell.Shell.execute(Shell.java:82)
 at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52)
 at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:93)
Caused by: java.util.concurrent.ExecutionException: 
java.util.concurrent.CompletionException: 
org.apache.ratis.protocol.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException 
from Server 5d3eb91f-e530-491c-ab03-3b1c34d1a751: Container 1269 in CLOSED state
 at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:529)
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481)
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
 ... 19 more
Caused by: java.util.concurrent.CompletionException: 
org.apache.ratis.protocol.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException 
from Server 5d3eb91f-e530-491c-ab03-3b1c34d1a751: Container 1269 in CLOSED state
 at 
org.apache.ratis.client.impl.RaftClientImpl.handleStateMachineException(RaftClientImpl.java:402)
 at 
org.apache.ratis.client.impl.RaftClientImpl.lambda$sendAsync$3(RaftClientImpl.java:198)
 at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
 at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
 at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
 at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
 at 
org.apache.ratis.client.impl.RaftClientImpl$PendingAsyncRequest.setReply(RaftClientImpl.java:95)
 at 
org.apache.ratis.client.impl.RaftClientImpl$PendingAsyncRequest.setReply(RaftClientImpl.java:75)
 at 
org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:127)
 at 
org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:279)
 at 
org.apache.ratis.client.impl.RaftClientImpl.lambda$sendRequestAsync$13(RaftClientImpl.java:344)
 at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
 at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
 at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
 at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
 at 

[jira] [Created] (HDDS-1325) Exception thrown while initializing ozoneClientAdapter

2019-03-22 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1325:


 Summary: Exception thrown while initializing ozoneClientAdapter 
 Key: HDDS-1325
 URL: https://issues.apache.org/jira/browse/HDDS-1325
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


ozone version :



 
{noformat}
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r 
568d3ab8b65d1348dec9c971feffe200e6cba2ef
Compiled by nnandi on 2019-03-19T03:54Z
Compiled with protoc 2.5.0
From source with checksum c44d339e20094d3054754078afbf4c
Using HDDS 0.5.0-SNAPSHOT
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r 
568d3ab8b65d1348dec9c971feffe200e6cba2ef
Compiled by nnandi on 2019-03-19T03:53Z
Compiled with protoc 2.5.0
From source with checksum b354934fb1352f4d5425114bf8dce11
{noformat}
 

 

steps taken:

---
 # Added the ozone libs to the hadoop classpath.
 # Tried to run the s3dupdo workload ([https://github.com/t3rmin4t0r/s3dupdo])

Here is the exception thrown :

 
{noformat}
java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at 
org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.lambda$createAdapter$1(OzoneClientAdapterFactory.java:65)
 at 
org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:105)
 at 
org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:61)
 at 
org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(OzoneFileSystem.java:167)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
 at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3326)
 at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:532)
 at org.notmysock.repl.Works$CopyWorker.run(Works.java:243)
 at org.notmysock.repl.Works$CopyWorker.call(Works.java:279)
 at org.notmysock.repl.Works$CopyWorker.call(Works.java:204)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.LinkageError: loader constraint violation: loader 
(instance of org/apache/hadoop/fs/ozone/FilteredClassLoader) previously 
initiated loading for a different type with name 
"org/apache/hadoop/security/token/Token"
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 at 
org.apache.hadoop.fs.ozone.FilteredClassLoader.loadClass(FilteredClassLoader.java:71)
 at java.lang.Class.getDeclaredMethods0(Native Method)
 at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
 at java.lang.Class.privateGetPublicMethods(Class.java:2902)
 at java.lang.Class.getMethods(Class.java:1615)
 at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:451)
 at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:339)
 at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:639)
 at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:557)
 at java.lang.reflect.WeakCache$Factory.get(WeakCache.java:230)
 at java.lang.reflect.WeakCache.get(WeakCache.java:127)
 at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:419)
 at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:719)
 at 
org.apache.hadoop.ozone.client.OzoneClientFactory.getClient(OzoneClientFactory.java:264)
 at 
org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
 at 
org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:140)
 at 
org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:104)
 at 
org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:75)
 ... 20 more{noformat}
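
The LinkageError above is the classic symptom of two classloaders each defining the same shared type. As a minimal hedged sketch (illustrative only, not the actual FilteredClassLoader source), a filtering classloader has to delegate shared API types such as org.apache.hadoop.security.token.Token to its parent; redefining them in the child is exactly what produces the loader constraint violation:

{noformat}
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch: shared Hadoop security types are always resolved by
// the parent loader, so only one loader ever owns them. Loading them in the
// child as well makes the JVM see two loaders claiming the same type name.
public class FilteringLoaderSketch extends URLClassLoader {
  private static final String[] DELEGATED = {"org.apache.hadoop.security."};

  public FilteringLoaderSketch(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve)
      throws ClassNotFoundException {
    for (String prefix : DELEGATED) {
      if (name.startsWith(prefix)) {
        return getParent().loadClass(name); // delegate shared types upward
      }
    }
    return super.loadClass(name, resolve);
  }
}
{noformat}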
 




[jira] [Commented] (HDDS-1298) blockade tests failing as the nodes are not able to communicate with Ozone Manager

2019-03-18 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794772#comment-16794772
 ] 

Nilotpal Nandi commented on HDDS-1298:
--

logs of all docker nodes:

[^alllogs.log]

> blockade tests failing as the nodes are not able to communicate with Ozone 
> Manager
> --
>
> Key: HDDS-1298
> URL: https://issues.apache.org/jira/browse/HDDS-1298
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Priority: Critical
> Attachments: alllogs.log
>
>
> steps taken:
> 
>  # Started a 3-datanode docker cluster.
>  # The freon run fails with the error: "No such service: ozoneManager"
>  
> {noformat}
> om_1 | STARTUP_MSG: build = https://github.com/apache/hadoop.git -r 
> e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on 
> 2019-01-15T17:34Z
> om_1 | STARTUP_MSG: java = 11.0.1
> om_1 | /
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:51 - registered UNIX signal 
> handlers for [TERM, HUP, INT]
> om_1 | 2019-03-18 06:31:41 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:484 - OM Service ID is not set. 
> Setting it to the default ID: omServiceIdDefault
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:490 - OM Node ID is not set. 
> Setting it to the OmStorage's OmID: 25501758-f7f6-42d5-8196-52a885af7e23
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:441 - Found matching OM address 
> with OMServiceId: null, OMNodeId: null, RPC Address: om:9862 and Ratis port: 
> 9872
> om_1 | 2019-03-18 06:31:42 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 2019-03-18 06:31:42 INFO log:192 - Logging initialized @4061ms
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: userTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:userTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: volumeTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:volumeTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: bucketTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:bucketTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: keyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:keyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: deletedTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:deletedTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: openKeyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:openKeyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: s3Table
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:s3Table
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: multipartInfoTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:multipartInfoTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: s3SecretTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:s3SecretTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: default
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:158 - Using default column 
> profile:DBProfile.DISK for Table:default
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:189 - Using default options. 
> DBProfile.DISK
> om_1 | 2019-03-18 06:31:42 INFO CallQueueManager:84 - Using callQueue: class 
> java.util.concurrent.LinkedBlockingQueue, queueCapacity: 2000, scheduler: 
> class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
> om_1 | 2019-03-18 06:31:42 INFO Server:1074 - Starting Socket Reader #1 for 
> port 9862
> om_1 | 2019-03-18 06:31:43 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 

[jira] [Updated] (HDDS-1298) blockade tests failing as the nodes are not able to communicate with Ozone Manager

2019-03-18 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1298:
-
Attachment: alllogs.log

> blockade tests failing as the nodes are not able to communicate with Ozone 
> Manager
> --
>
> Key: HDDS-1298
> URL: https://issues.apache.org/jira/browse/HDDS-1298
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Priority: Critical
> Attachments: alllogs.log
>
>
> steps taken:
> 
>  # Started a 3-datanode docker cluster.
>  # The freon run fails with the error: "No such service: ozoneManager"
>  
> {noformat}
> om_1 | STARTUP_MSG: build = https://github.com/apache/hadoop.git -r 
> e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on 
> 2019-01-15T17:34Z
> om_1 | STARTUP_MSG: java = 11.0.1
> om_1 | /
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:51 - registered UNIX signal 
> handlers for [TERM, HUP, INT]
> om_1 | 2019-03-18 06:31:41 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:484 - OM Service ID is not set. 
> Setting it to the default ID: omServiceIdDefault
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:490 - OM Node ID is not set. 
> Setting it to the OmStorage's OmID: 25501758-f7f6-42d5-8196-52a885af7e23
> om_1 | 2019-03-18 06:31:41 INFO OzoneManager:441 - Found matching OM address 
> with OMServiceId: null, OMNodeId: null, RPC Address: om:9862 and Ratis port: 
> 9872
> om_1 | 2019-03-18 06:31:42 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 2019-03-18 06:31:42 INFO log:192 - Logging initialized @4061ms
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: userTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:userTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: volumeTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:volumeTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: bucketTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:bucketTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: keyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:keyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: deletedTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:deletedTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: openKeyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:openKeyTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: s3Table
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:s3Table
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: multipartInfoTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:multipartInfoTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: s3SecretTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
> profile:DBProfile.DISK for Table:s3SecretTable
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
> table: default
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:158 - Using default column 
> profile:DBProfile.DISK for Table:default
> om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:189 - Using default options. 
> DBProfile.DISK
> om_1 | 2019-03-18 06:31:42 INFO CallQueueManager:84 - Using callQueue: class 
> java.util.concurrent.LinkedBlockingQueue, queueCapacity: 2000, scheduler: 
> class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
> om_1 | 2019-03-18 06:31:42 INFO Server:1074 - Starting Socket Reader #1 for 
> port 9862
> om_1 | 2019-03-18 06:31:43 WARN ScmUtils:77 - ozone.om.db.dirs is not 
> configured. We recommend adding this setting. Falling back to 
> ozone.metadata.dirs instead.
> om_1 | 2019-03-18 06:31:43 INFO OzoneManager:1129 - OzoneManager 

[jira] [Created] (HDDS-1298) blockade tests failing as the nodes are not able to communicate with Ozone Manager

2019-03-18 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1298:


 Summary: blockade tests failing as the nodes are not able to 
communicate with Ozone Manager
 Key: HDDS-1298
 URL: https://issues.apache.org/jira/browse/HDDS-1298
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


steps taken:


 # Started a 3-datanode docker cluster.
 # The freon run fails with the error: "No such service: ozoneManager"

 
{noformat}
om_1 | STARTUP_MSG: build = https://github.com/apache/hadoop.git -r 
e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on 
2019-01-15T17:34Z
om_1 | STARTUP_MSG: java = 11.0.1
om_1 | /
om_1 | 2019-03-18 06:31:41 INFO OzoneManager:51 - registered UNIX signal 
handlers for [TERM, HUP, INT]
om_1 | 2019-03-18 06:31:41 WARN ScmUtils:77 - ozone.om.db.dirs is not 
configured. We recommend adding this setting. Falling back to 
ozone.metadata.dirs instead.
om_1 | 2019-03-18 06:31:41 INFO OzoneManager:484 - OM Service ID is not set. 
Setting it to the default ID: omServiceIdDefault
om_1 | 2019-03-18 06:31:41 INFO OzoneManager:490 - OM Node ID is not set. 
Setting it to the OmStorage's OmID: 25501758-f7f6-42d5-8196-52a885af7e23
om_1 | 2019-03-18 06:31:41 INFO OzoneManager:441 - Found matching OM address 
with OMServiceId: null, OMNodeId: null, RPC Address: om:9862 and Ratis port: 
9872
om_1 | 2019-03-18 06:31:42 WARN ScmUtils:77 - ozone.om.db.dirs is not 
configured. We recommend adding this setting. Falling back to 
ozone.metadata.dirs instead.
om_1 | 2019-03-18 06:31:42 INFO log:192 - Logging initialized @4061ms
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
table: userTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
profile:DBProfile.DISK for Table:userTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
table: volumeTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
profile:DBProfile.DISK for Table:volumeTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
table: bucketTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
profile:DBProfile.DISK for Table:bucketTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
table: keyTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
profile:DBProfile.DISK for Table:keyTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
table: deletedTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
profile:DBProfile.DISK for Table:deletedTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
table: openKeyTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
profile:DBProfile.DISK for Table:openKeyTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
table: s3Table
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
profile:DBProfile.DISK for Table:s3Table
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
table: multipartInfoTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
profile:DBProfile.DISK for Table:multipartInfoTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
table: s3SecretTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column 
profile:DBProfile.DISK for Table:s3SecretTable
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for 
table: default
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:158 - Using default column 
profile:DBProfile.DISK for Table:default
om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:189 - Using default options. 
DBProfile.DISK
om_1 | 2019-03-18 06:31:42 INFO CallQueueManager:84 - Using callQueue: class 
java.util.concurrent.LinkedBlockingQueue, queueCapacity: 2000, scheduler: class 
org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
om_1 | 2019-03-18 06:31:42 INFO Server:1074 - Starting Socket Reader #1 for 
port 9862
om_1 | 2019-03-18 06:31:43 WARN ScmUtils:77 - ozone.om.db.dirs is not 
configured. We recommend adding this setting. Falling back to 
ozone.metadata.dirs instead.
om_1 | 2019-03-18 06:31:43 INFO OzoneManager:1129 - OzoneManager RPC server is 
listening at om/172.21.0.3:9862
om_1 | 2019-03-18 06:31:43 INFO MetricsConfig:118 - Loaded properties from 
hadoop-metrics2.properties
om_1 | 2019-03-18 06:31:43 INFO MetricsSystemImpl:374 - Scheduled Metric 
snapshot period at 10 second(s).
om_1 | 2019-03-18 06:31:43 INFO MetricsSystemImpl:191 - OzoneManager metrics 
system started
om_1 | 2019-03-18 06:31:43 INFO Server:1314 - IPC Server Responder: starting
om_1 | 2019-03-18 

[jira] [Comment Edited] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-03-15 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793727#comment-16793727
 ] 

Nilotpal Nandi edited comment on HDDS-1088 at 3/15/19 3:44 PM:
---

Thanks [~shashikant]. 

Please note that there is an existing issue due to which pylint throws an 
error. It needs to be resolved later.


was (Author: nilotpalnandi):
Thanks [~shashikant]. 

Please note that there is some existing issue due to which pylint is throwing 
error. Neeed to resolved later.

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, 
> HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, 
> HDDS-1088.006.patch, HDDS-1088.007.patch, HDDS-1088.008.patch
>
>
> We need to add tests for the Replica Manager covering scenarios like loss of 
> a node, adding new nodes, and under-replicated containers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-03-15 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793727#comment-16793727
 ] 

Nilotpal Nandi commented on HDDS-1088:
--

Thanks [~shashikant]. 

Please note that there is an existing issue due to which pylint throws an 
error. It needs to be resolved later.

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, 
> HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, 
> HDDS-1088.006.patch, HDDS-1088.007.patch, HDDS-1088.008.patch
>
>
> We need to add tests for the Replica Manager covering scenarios like loss of 
> a node, adding new nodes, and under-replicated containers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1289) get Key failed on SCM restart

2019-03-15 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1289:
-
Description: 
Seeing ContainerNotFoundException in the SCM log when a get-key operation is 
tried after SCM restart.

scm.log:

[^hadoop-hdfs-scm-ctr-e139-1542663976389-86524-01-03.log]

 
{noformat}
 
 
ozone version :

Source code repository g...@github.com:hortonworks/ozone.git -r 
67b7c4fd071b3f557bdb54be2a266b8a611cbce6
Compiled by jenkins on 2019-03-06T22:02Z
Compiled with protoc 2.5.0
From source with checksum 65be9a337d178cd3855f5c5a2f111
Using HDDS 0.4.0.3.0.100.0-348
Source code repository g...@github.com:hortonworks/ozone.git -r 
67b7c4fd071b3f557bdb54be2a266b8a611cbce6
Compiled by jenkins on 2019-03-06T22:01Z
Compiled with protoc 2.5.0
From source with checksum 324109cb3e8b188c1b89dc0b328c3a
[root@ctr-e139-1542663976389-86524-01-06 hdfs]# hadoop version
Hadoop 3.1.1.3.0.100.0-348
Source code repository g...@github.com:hortonworks/hadoop.git -r 
484434b1c2480bdc9314a7ee1ade8a0f4db1758f
Compiled by jenkins on 2019-03-06T22:14Z
Compiled with protoc 2.5.0
From source with checksum ba6aad94c14256ef3ad8634e3b5086
This command was run using 
/usr/hdp/3.0.100.0-348/hadoop/hadoop-common-3.1.1.3.0.100.0-348.jar
{noformat}
 

 

 
{noformat}
2019-03-13 17:00:54,348 ERROR container.ContainerReportHandler 
(ContainerReportHandler.java:processContainerReplicas(173)) - Received 
container report for an unknown container 22 from datanode 
80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: 
ctr-e139-1542663976389-86524-01-05.hwx.site} {} 
org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #22 at 
org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543)
 at 
org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230)
 at 
org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565)
 at 
org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393)
 at 
org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51)
 at 
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:00:54,349 ERROR 
container.ContainerReportHandler 
(ContainerReportHandler.java:processContainerReplicas(173)) - Received 
container report for an unknown container 23 from datanode 
80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: 
ctr-e139-1542663976389-86524-01-05.hwx.site} {} 
org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #23 at 
org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543)
 at 
org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230)
 at 
org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565)
 at 
org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393)
 at 
org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51)
 at 
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:01:24,230 ERROR 
container.ContainerReportHandler 
(ContainerReportHandler.java:processContainerReplicas(173)) - Received 
container report for an unknown container 22 from datanode 
076fd0d8-ab5f-4fbe-ad10-b71a1ccb19bf{ip: 172.27.39.88, host: 
ctr-e139-1542663976389-86524-01-04.hwx.site} {} 

[jira] [Created] (HDDS-1289) get Key failed on SCM restart

2019-03-15 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1289:


 Summary: get Key failed on SCM restart
 Key: HDDS-1289
 URL: https://issues.apache.org/jira/browse/HDDS-1289
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi
 Attachments: hadoop-hdfs-scm-ctr-e139-1542663976389-86524-01-03.log

Seeing ContainerNotFoundException in the SCM log when a get-key operation is 
tried after SCM restart.

scm.log:

[^hadoop-hdfs-scm-ctr-e139-1542663976389-86524-01-03.log]

 

 
{noformat}
2019-03-13 17:00:54,348 ERROR container.ContainerReportHandler 
(ContainerReportHandler.java:processContainerReplicas(173)) - Received 
container report for an unknown container 22 from datanode 
80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: 
ctr-e139-1542663976389-86524-01-05.hwx.site} {} 
org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #22 at 
org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543)
 at 
org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230)
 at 
org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565)
 at 
org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393)
 at 
org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51)
 at 
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:00:54,349 ERROR 
container.ContainerReportHandler 
(ContainerReportHandler.java:processContainerReplicas(173)) - Received 
container report for an unknown container 23 from datanode 
80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: 
ctr-e139-1542663976389-86524-01-05.hwx.site} {} 
org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #23 at 
org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543)
 at 
org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230)
 at 
org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565)
 at 
org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393)
 at 
org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51)
 at 
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:01:24,230 ERROR 
container.ContainerReportHandler 
(ContainerReportHandler.java:processContainerReplicas(173)) - Received 
container report for an unknown container 22 from datanode 
076fd0d8-ab5f-4fbe-ad10-b71a1ccb19bf{ip: 172.27.39.88, host: 
ctr-e139-1542663976389-86524-01-04.hwx.site} {} 
org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #22 at 
org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543)
 at 
org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230)
 at 
org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565)
 at 
org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393)
 at 
org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74)
 at 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159)
 at 

[jira] [Updated] (HDDS-1289) get Key failed on SCM restart

2019-03-15 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1289:
-
Component/s: SCM

> get Key failed on SCM restart
> -
>
> Key: HDDS-1289
> URL: https://issues.apache.org/jira/browse/HDDS-1289
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Priority: Critical
> Attachments: 
> hadoop-hdfs-scm-ctr-e139-1542663976389-86524-01-03.log
>
>
> Seeing ContainerNotFoundException in the SCM log when a get-key operation is 
> tried after SCM restart.
> scm.log:
> [^hadoop-hdfs-scm-ctr-e139-1542663976389-86524-01-03.log]
>  
>  
> {noformat}
> 2019-03-13 17:00:54,348 ERROR container.ContainerReportHandler 
> (ContainerReportHandler.java:processContainerReplicas(173)) - Received 
> container report for an unknown container 22 from datanode 
> 80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: 
> ctr-e139-1542663976389-86524-01-05.hwx.site} {} 
> org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #22 at 
> org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543)
>  at 
> org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565)
>  at 
> org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393)
>  at 
> org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51)
>  at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:00:54,349 ERROR 
> container.ContainerReportHandler 
> (ContainerReportHandler.java:processContainerReplicas(173)) - Received 
> container report for an unknown container 23 from datanode 
> 80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: 
> ctr-e139-1542663976389-86524-01-05.hwx.site} {} 
> org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #23 at 
> org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543)
>  at 
> org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565)
>  at 
> org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393)
>  at 
> org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51)
>  at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:01:24,230 ERROR 
> container.ContainerReportHandler 
> (ContainerReportHandler.java:processContainerReplicas(173)) - Received 
> container report for an unknown container 22 from datanode 
> 076fd0d8-ab5f-4fbe-ad10-b71a1ccb19bf{ip: 172.27.39.88, host: 
> ctr-e139-1542663976389-86524-01-04.hwx.site} {} 
> org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #22 at 
> org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543)
>  at 
> org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565)
>  at 
> 

[jira] [Created] (HDDS-1290) ozone.log is not getting created in logs directory

2019-03-15 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1290:


 Summary: ozone.log is not getting created in logs directory
 Key: HDDS-1290
 URL: https://issues.apache.org/jira/browse/HDDS-1290
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Reporter: Nilotpal Nandi


ozone.log is not getting created in the log directory of the client or any 
other node of the ozone cluster.

ozone version :

Source code repository g...@github.com:hortonworks/ozone.git -r 
67b7c4fd071b3f557bdb54be2a266b8a611cbce6
Compiled by jenkins on 2019-03-06T22:02Z
Compiled with protoc 2.5.0
From source with checksum 65be9a337d178cd3855f5c5a2f111

Using HDDS 0.4.0.3.0.100.0-348
Source code repository g...@github.com:hortonworks/ozone.git -r 
67b7c4fd071b3f557bdb54be2a266b8a611cbce6
Compiled by jenkins on 2019-03-06T22:01Z
Compiled with protoc 2.5.0
From source with checksum 324109cb3e8b188c1b89dc0b328c3a

[root@ctr-e139-1542663976389-86524-01-06 hdfs]# hadoop version
Hadoop 3.1.1.3.0.100.0-348
Source code repository g...@github.com:hortonworks/hadoop.git -r 
484434b1c2480bdc9314a7ee1ade8a0f4db1758f
Compiled by jenkins on 2019-03-06T22:14Z
Compiled with protoc 2.5.0
From source with checksum ba6aad94c14256ef3ad8634e3b5086
This command was run using 
/usr/hdp/3.0.100.0-348/hadoop/hadoop-common-3.1.1.3.0.100.0-348.jar
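
Note: ozone.log can only appear if the log4j configuration defines a file 
appender that writes it. As a hedged illustration of what is missing (the 
logger name, layout pattern, and use of HADOOP_LOG_DIR here are assumptions, 
not the stock configuration), attaching such an appender with the log4j 1.x 
API would look roughly like this:

{noformat}
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

// Hypothetical sketch: without a FileAppender like this (normally declared
// in log4j.properties), nothing ever writes an ozone.log file.
public class OzoneLogAppenderSketch {
  public static void main(String[] args) throws Exception {
    Logger ozoneLogger = Logger.getLogger("org.apache.hadoop.ozone");
    FileAppender appender = new FileAppender(
        new PatternLayout("%d{ISO8601} [%t] %-5p %c{2} (%F:%L) - %m%n"),
        System.getenv("HADOOP_LOG_DIR") + "/ozone.log");
    ozoneLogger.addAppender(appender);
    ozoneLogger.info("ozone.log appender attached"); // lands in ozone.log
  }
}
{noformat}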



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1290) ozone.log is not getting created in logs directory

2019-03-15 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1290:
-
Description: 
ozone.log is not getting created in the log directory of the client or any 
other node of the ozone cluster.

ozone version :
 
 Source code repository g...@github.com:hortonworks/ozone.git -r 
67b7c4fd071b3f557bdb54be2a266b8a611cbce6
 Compiled by jenkins on 2019-03-06T22:02Z
 Compiled with protoc 2.5.0
 From source with checksum 65be9a337d178cd3855f5c5a2f111

Using HDDS 0.4.0.3.0.100.0-348
 Source code repository g...@github.com:hortonworks/ozone.git -r 
67b7c4fd071b3f557bdb54be2a266b8a611cbce6
 Compiled by jenkins on 2019-03-06T22:01Z
 Compiled with protoc 2.5.0
 From source with checksum 324109cb3e8b188c1b89dc0b328c3a

[root@ctr-e139-1542663976389-86524-01-06 hdfs]# hadoop version
 Hadoop 3.1.1.3.0.100.0-348
 Source code repository g...@github.com:hortonworks/hadoop.git -r 
484434b1c2480bdc9314a7ee1ade8a0f4db1758f
 Compiled by jenkins on 2019-03-06T22:14Z
 Compiled with protoc 2.5.0
 From source with checksum ba6aad94c14256ef3ad8634e3b5086
 This command was run using 
/usr/hdp/3.0.100.0-348/hadoop/hadoop-common-3.1.1.3.0.100.0-348.jar

  was:
ozone.log is getting created in the log directory of the client or any other 
nodes of ozone cluster.

ozone version :

Source code repository g...@github.com:hortonworks/ozone.git -r 
67b7c4fd071b3f557bdb54be2a266b8a611cbce6
Compiled by jenkins on 2019-03-06T22:02Z
Compiled with protoc 2.5.0
From source with checksum 65be9a337d178cd3855f5c5a2f111

Using HDDS 0.4.0.3.0.100.0-348
Source code repository g...@github.com:hortonworks/ozone.git -r 
67b7c4fd071b3f557bdb54be2a266b8a611cbce6
Compiled by jenkins on 2019-03-06T22:01Z
Compiled with protoc 2.5.0
From source with checksum 324109cb3e8b188c1b89dc0b328c3a

[root@ctr-e139-1542663976389-86524-01-06 hdfs]# hadoop version
Hadoop 3.1.1.3.0.100.0-348
Source code repository g...@github.com:hortonworks/hadoop.git -r 
484434b1c2480bdc9314a7ee1ade8a0f4db1758f
Compiled by jenkins on 2019-03-06T22:14Z
Compiled with protoc 2.5.0
From source with checksum ba6aad94c14256ef3ad8634e3b5086
This command was run using 
/usr/hdp/3.0.100.0-348/hadoop/hadoop-common-3.1.1.3.0.100.0-348.jar


> ozone.log is not getting created in logs directory
> --
>
> Key: HDDS-1290
> URL: https://issues.apache.org/jira/browse/HDDS-1290
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Priority: Major
>
> ozone.log is not getting created in the log directory of the client or any 
> other node of the ozone cluster.
> ozone version :
>  
>  Source code repository g...@github.com:hortonworks/ozone.git -r 
> 67b7c4fd071b3f557bdb54be2a266b8a611cbce6
>  Compiled by jenkins on 2019-03-06T22:02Z
>  Compiled with protoc 2.5.0
>  From source with checksum 65be9a337d178cd3855f5c5a2f111
> Using HDDS 0.4.0.3.0.100.0-348
>  Source code repository g...@github.com:hortonworks/ozone.git -r 
> 67b7c4fd071b3f557bdb54be2a266b8a611cbce6
>  Compiled by jenkins on 2019-03-06T22:01Z
>  Compiled with protoc 2.5.0
>  From source with checksum 324109cb3e8b188c1b89dc0b328c3a
> [root@ctr-e139-1542663976389-86524-01-06 hdfs]# hadoop version
>  Hadoop 3.1.1.3.0.100.0-348
>  Source code repository g...@github.com:hortonworks/hadoop.git -r 
> 484434b1c2480bdc9314a7ee1ade8a0f4db1758f
>  Compiled by jenkins on 2019-03-06T22:14Z
>  Compiled with protoc 2.5.0
>  From source with checksum ba6aad94c14256ef3ad8634e3b5086
>  This command was run using 
> /usr/hdp/3.0.100.0-348/hadoop/hadoop-common-3.1.1.3.0.100.0-348.jar



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-03-15 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1088:
-
Attachment: HDDS-1088.008.patch

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, 
> HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, 
> HDDS-1088.006.patch, HDDS-1088.007.patch, HDDS-1088.008.patch
>
>
> We need to add tests for the Replica Manager covering scenarios like loss of 
> a node, adding new nodes, and under-replicated containers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-03-14 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1088:
-
Attachment: HDDS-1088.007.patch

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, 
> HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, 
> HDDS-1088.006.patch, HDDS-1088.007.patch
>
>
> We need to add tests for the Replica Manager covering scenarios like loss of 
> a node, adding new nodes, and under-replicated containers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-03-14 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1088:
-
Attachment: HDDS-1088.006.patch

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, 
> HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, 
> HDDS-1088.006.patch
>
>
> We need to add tests for the Replica Manager covering scenarios like loss of 
> a node, adding new nodes, and under-replicated containers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-03-14 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1088:
-
Attachment: (was: HDDS-1088.006.patch)

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, 
> HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, 
> HDDS-1088.006.patch
>
>
> We need to add tests for the Replica Manager covering scenarios like loss of 
> a node, adding new nodes, and under-replicated containers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-03-14 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1088:
-
Attachment: HDDS-1088.006.patch

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, 
> HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, 
> HDDS-1088.006.patch
>
>
> We need to add tests for the Replica Manager covering scenarios like loss of 
> a node, adding new nodes, and under-replicated containers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-03-14 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1088:
-
Attachment: HDDS-1088.005.patch

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, 
> HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch
>
>
> We need to add tests for the Replica Manager covering scenarios like loss of 
> a node, adding new nodes, and under-replicated containers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1251) all chunks are not deleted by block deletion even when all keys are deleted and all containers are closed

2019-03-12 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1251:


 Summary: all chunks are not deleted by block deletion even when 
all keys are deleted and all containers are closed
 Key: HDDS-1251
 URL: https://issues.apache.org/jira/browse/HDDS-1251
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


steps taken:

---
 # Created a 40-node cluster and wrote data on all datanodes.
 # Deleted all keys from the cluster; all containers are closed.

Block deletion was triggered and deleted most of the chunks from all datanodes.

But it could not delete all chunks even after several days.

 

expectation:

All chunks should be deleted if there is no key present in the cluster and all 
containers are closed.
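
To make the expected behaviour concrete, here is a minimal hedged sketch of 
the retry property a chunk-deletion pass needs (hypothetical names; this is 
not Ozone's actual block-deleting service): a chunk whose deletion fails must 
stay queued for a later pass, otherwise chunk files end up orphaned exactly as 
observed here.

{noformat}
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of an idempotent deletion pass: failed chunks stay
// pending so a later run retries them instead of leaking them forever.
public class ChunkGcSketch {
  private final Deque<Path> pending = new ArrayDeque<>();

  void enqueue(Path chunkFile) {
    pending.addLast(chunkFile);
  }

  void runPass() {
    for (int i = pending.size(); i > 0; i--) {
      Path chunk = pending.poll();
      try {
        Files.deleteIfExists(chunk);   // remove the chunk file if still present
      } catch (Exception e) {
        pending.addLast(chunk);        // keep it for the next pass, never drop
      }
    }
  }

  boolean isFullyCollected() {
    return pending.isEmpty();          // the end state this report expects
  }
}
{noformat}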



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-03-12 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1088:
-
Attachment: HDDS-1088.004.patch

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, 
> HDDS-1088.003.patch, HDDS-1088.004.patch
>
>
> We need to add tests for the Replica Manager covering scenarios like loss of 
> a node, adding new nodes, and under-replicated containers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-03-03 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1088:
-
Attachment: HDDS-1088.003.patch

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, 
> HDDS-1088.003.patch
>
>
> We need to add tests for the Replica Manager covering scenarios like loss of 
> a node, adding new nodes, and under-replicated containers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-03-03 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782714#comment-16782714
 ] 

Nilotpal Nandi commented on HDDS-1088:
--

Thanks [~shashikant] for the review. 

I have addressed the changes for comment #1.

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch
>
>
> We need to add tests for the Replica Manager covering scenarios like loss of 
> a node, adding new nodes, and under-replicated containers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1206) need to handle in the client when one of the datanode disk goes out of space

2019-03-01 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1206:


 Summary: need to handle in the client when one of the datanode 
disk goes out of space
 Key: HDDS-1206
 URL: https://issues.apache.org/jira/browse/HDDS-1206
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi
Assignee: Shashikant Banerjee


steps taken:


 # Created a 40-datanode cluster.
 # One of the datanodes has less than 5 GB of space.
 # Started writing a key of size 600 MB.

operation failed:

Error on the client:


{noformat}
Fri Mar 1 09:05:28 UTC 2019 Running 
/root/hadoop_trunk/ozone-0.4.0-SNAPSHOT/bin/ozone sh key put 
testvol172275910-1551431122-1/testbuck172275910-1551431122-1/test_file24 
/root/test_files/test_file24
original md5sum a6de00c9284708585f5a99b0490b0b23
2019-03-01 09:05:39,142 ERROR storage.BlockOutputStream: Unexpected Storage 
Container Exception:
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
ContainerID 79 creation failed
 at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
 at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
 at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
 at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
2019-03-01 09:05:39,578 ERROR storage.BlockOutputStream: Unexpected Storage 
Container Exception:
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
ContainerID 79 creation failed
 at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
 at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
 at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
 at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
2019-03-01 09:05:40,368 ERROR storage.BlockOutputStream: Unexpected Storage 
Container Exception:
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
ContainerID 79 creation failed
 at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
 at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
 at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
 at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
2019-03-01 09:05:40,450 ERROR storage.BlockOutputStream: Unexpected Storage 
Container Exception:
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
ContainerID 79 creation failed
 at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
 at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
 at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
 at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
 at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
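
One way the client could handle this, sketched below with hypothetical types 
(BlockAllocator and BlockWriter are illustrative stand-ins, not the actual 
Ozone client API): instead of surfacing the "ContainerID 79 creation failed" 
StorageContainerException to the caller, retry the chunk on a freshly 
allocated block while excluding the datanode that failed.

{noformat}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for the Ozone client write path. */
interface BlockWriter {
  void writeChunk(byte[] chunk) throws IOException;
  String targetNode();                        // datanode this block writes to
}

interface BlockAllocator {
  BlockWriter allocate(List<String> excludedNodes) throws IOException;
}

class OutOfSpaceRetrySketch {

  /** Retry on a new block, excluding nodes that already failed (e.g. disk full). */
  static void writeWithRetry(BlockAllocator allocator, byte[] chunk, int maxRetries)
      throws IOException {
    List<String> excluded = new ArrayList<>();
    IOException lastFailure = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      BlockWriter writer = allocator.allocate(excluded);
      try {
        writer.writeChunk(chunk);
        return;                               // write succeeded
      } catch (IOException e) {               // e.g. container creation failed
        lastFailure = e;
        excluded.add(writer.targetNode());    // do not pick this datanode again
      }
    }
    throw lastFailure;                        // retries exhausted
  }
}
{noformat}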

[jira] [Updated] (HDDS-1206) need to handle in the client when one of the datanode disks goes out of space

2019-03-01 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1206:
-
Component/s: Ozone Client

> need to handle in the client when one of the datanode disks goes out of space
> 
>
> Key: HDDS-1206
> URL: https://issues.apache.org/jira/browse/HDDS-1206
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Major
>
> Steps taken:
> 
>  # Created a 40-datanode cluster.
>  # One of the datanodes had less than 5 GB of space.
>  # Started writing a key of size 600 MB.
> The operation failed with the following error on the client:
> 
> {noformat}
> Fri Mar 1 09:05:28 UTC 2019 Ruuning 
> /root/hadoop_trunk/ozone-0.4.0-SNAPSHOT/bin/ozone sh key put 
> testvol172275910-1551431122-1/testbuck172275910-1551431122-1/test_file24 
> /root/test_files/test_file24
> original md5sum a6de00c9284708585f5a99b0490b0b23
> 2019-03-01 09:05:39,142 ERROR storage.BlockOutputStream: Unexpected Storage 
> Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 79 creation failed
>  at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:39,578 ERROR storage.BlockOutputStream: Unexpected Storage 
> Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 79 creation failed
>  at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,368 ERROR storage.BlockOutputStream: Unexpected Storage 
> Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 79 creation failed
>  at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,450 ERROR storage.BlockOutputStream: Unexpected Storage 
> Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 79 creation failed
>  at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at 
> 

[jira] [Updated] (HDDS-1164) Add New blockade Tests to test Replica Manager

2019-02-22 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1164:
-
Status: Patch Available  (was: Open)

> Add New blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1164
> URL: https://issues.apache.org/jira/browse/HDDS-1164
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1164.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1164) Add New blockade Tests to test Replica Manager

2019-02-22 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi reassigned HDDS-1164:


Assignee: Nilotpal Nandi

> Add New blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1164
> URL: https://issues.apache.org/jira/browse/HDDS-1164
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1164) Add New blockade Tests to test Replica Manager

2019-02-22 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1164:
-
Attachment: HDDS-1164.001.patch

> Add New blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1164
> URL: https://issues.apache.org/jira/browse/HDDS-1164
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1164.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-02-22 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1088:
-
Attachment: HDDS-1088.002.patch

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch
>
>
> We need to add tests that exercise the Replica Manager in scenarios such as 
> loss of a node, addition of new nodes, and under-replicated containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1164) Add New blockade Tests to test Replica Manager

2019-02-22 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1164:


 Summary: Add New blockade Tests to test Replica Manager
 Key: HDDS-1164
 URL: https://issues.apache.org/jira/browse/HDDS-1164
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1131) destroy pipeline failed with PipelineNotFoundException

2019-02-19 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1131:
-
Fix Version/s: 0.4.0

> destroy pipeline failed with PipelineNotFoundException
> --
>
> Key: HDDS-1131
> URL: https://issues.apache.org/jira/browse/HDDS-1131
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Priority: Major
> Fix For: 0.4.0
>
>
> Steps taken:
> 
>  # Created a 12-datanode cluster and ran a workload on all the nodes.
> Exceptions seen in the SCM log:
> 
> {noformat}
> 2019-02-18 07:17:51,112 INFO 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: destroying 
> pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with 
> group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
> 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
> 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858]
> 2019-02-18 07:17:51,112 INFO 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
> container Event triggered for container : #40
> 2019-02-18 07:17:51,113 INFO 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
> container Event triggered for container : #41
> 2019-02-18 07:17:51,114 INFO 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
> container Event triggered for container : #42
> 2019-02-18 07:22:51,127 WARN 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy 
> failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
> dn=a40a7b01-a30b-469c-b373-9fcb20a126ed{ip: 172.27.54.212, host: 
> ctr-e139-1542663976389-62237-01-07.hwx.site}
> 2019-02-18 07:22:51,139 WARN 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy 
> failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
> dn=8c77b16b-8054-49e3-b669-1ff759cfd271{ip: 172.27.23.196, host: 
> ctr-e139-1542663976389-62237-01-15.hwx.site}
> 2019-02-18 07:22:51,149 WARN 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy 
> failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
> dn=943007c8-4fdd-4926-89e2-2c8c52c05073{ip: 172.27.76.72, host: 
> ctr-e139-1542663976389-62237-01-06.hwx.site}
> 2019-02-18 07:22:51,150 ERROR 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Destroy pipeline 
> failed for pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with 
> group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
> 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
> 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858]
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
> PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb not found
>  at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:112)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.removePipeline(PipelineStateMap.java:247)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.removePipeline(PipelineStateManager.java:90)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removePipeline(SCMPipelineManager.java:261)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.destroyPipeline(RatisPipelineUtils.java:103)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.lambda$finalizeAndDestroyPipeline$1(RatisPipelineUtils.java:133)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104)
>  at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
>  at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1131) destroy pipeline failed with PipelineNotFoundException

2019-02-19 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1131:


 Summary: destroy pipeline failed with PipelineNotFoundException
 Key: HDDS-1131
 URL: https://issues.apache.org/jira/browse/HDDS-1131
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


Steps taken:

 # Created a 12-datanode cluster and ran a workload on all the nodes.

Exceptions seen in the SCM log:


{noformat}
2019-02-18 07:17:51,112 INFO 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: destroying 
pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with 
group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858]
2019-02-18 07:17:51,112 INFO 
org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
container Event triggered for container : #40
2019-02-18 07:17:51,113 INFO 
org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
container Event triggered for container : #41
2019-02-18 07:17:51,114 INFO 
org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
container Event triggered for container : #42
2019-02-18 07:22:51,127 WARN 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy failed 
for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
dn=a40a7b01-a30b-469c-b373-9fcb20a126ed{ip: 172.27.54.212, host: 
ctr-e139-1542663976389-62237-01-07.hwx.site}
2019-02-18 07:22:51,139 WARN 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy failed 
for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
dn=8c77b16b-8054-49e3-b669-1ff759cfd271{ip: 172.27.23.196, host: 
ctr-e139-1542663976389-62237-01-15.hwx.site}
2019-02-18 07:22:51,149 WARN 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy failed 
for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
dn=943007c8-4fdd-4926-89e2-2c8c52c05073{ip: 172.27.76.72, host: 
ctr-e139-1542663976389-62237-01-06.hwx.site}
2019-02-18 07:22:51,150 ERROR 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Destroy pipeline failed 
for pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with 
group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858]
org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb not found
 at 
org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:112)
 at 
org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.removePipeline(PipelineStateMap.java:247)
 at 
org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.removePipeline(PipelineStateManager.java:90)
 at 
org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removePipeline(SCMPipelineManager.java:261)
 at 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.destroyPipeline(RatisPipelineUtils.java:103)
 at 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.lambda$finalizeAndDestroyPipeline$1(RatisPipelineUtils.java:133)
 at 
org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85)
 at 
org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104)
 at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
 at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748){noformat}
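
The timing in the log suggests the failing destroy is the retry scheduled by 
finalizeAndDestroyPipeline, firing five minutes after the pipeline had already 
been removed. A hedged sketch of one possible fix, using illustrative types 
rather than the actual SCM classes: treat PipelineNotFoundException during 
destroy as "already done" so the retry cannot fail the operation.

{noformat}
/** Illustrative stand-ins for the SCM pipeline classes. */
class PipelineNotFoundException extends Exception {
  PipelineNotFoundException(String msg) { super(msg); }
}

interface PipelineStore {
  void removePipeline(String pipelineId) throws PipelineNotFoundException;
}

class IdempotentDestroySketch {

  /** Destroy is safe to call twice: "not found" means an earlier attempt won. */
  static void destroyPipeline(PipelineStore store, String pipelineId) {
    try {
      store.removePipeline(pipelineId);
    } catch (PipelineNotFoundException e) {
      // A concurrent or earlier destroy already removed the pipeline;
      // log at debug level instead of failing the whole operation.
    }
  }
}
{noformat}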



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1131) destroy pipeline failed with PipelineNotFoundException

2019-02-19 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1131:
-
Component/s: SCM

> destroy pipeline failed with PipelineNotFoundException
> --
>
> Key: HDDS-1131
> URL: https://issues.apache.org/jira/browse/HDDS-1131
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Priority: Major
>
> Steps taken:
> 
>  # Created a 12-datanode cluster and ran a workload on all the nodes.
> Exceptions seen in the SCM log:
> 
> {noformat}
> 2019-02-18 07:17:51,112 INFO 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: destroying 
> pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with 
> group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
> 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
> 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858]
> 2019-02-18 07:17:51,112 INFO 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
> container Event triggered for container : #40
> 2019-02-18 07:17:51,113 INFO 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
> container Event triggered for container : #41
> 2019-02-18 07:17:51,114 INFO 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
> container Event triggered for container : #42
> 2019-02-18 07:22:51,127 WARN 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy 
> failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
> dn=a40a7b01-a30b-469c-b373-9fcb20a126ed{ip: 172.27.54.212, host: 
> ctr-e139-1542663976389-62237-01-07.hwx.site}
> 2019-02-18 07:22:51,139 WARN 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy 
> failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
> dn=8c77b16b-8054-49e3-b669-1ff759cfd271{ip: 172.27.23.196, host: 
> ctr-e139-1542663976389-62237-01-15.hwx.site}
> 2019-02-18 07:22:51,149 WARN 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy 
> failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
> dn=943007c8-4fdd-4926-89e2-2c8c52c05073{ip: 172.27.76.72, host: 
> ctr-e139-1542663976389-62237-01-06.hwx.site}
> 2019-02-18 07:22:51,150 ERROR 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Destroy pipeline 
> failed for pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with 
> group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
> 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
> 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858]
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
> PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb not found
>  at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:112)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.removePipeline(PipelineStateMap.java:247)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.removePipeline(PipelineStateManager.java:90)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removePipeline(SCMPipelineManager.java:261)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.destroyPipeline(RatisPipelineUtils.java:103)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.lambda$finalizeAndDestroyPipeline$1(RatisPipelineUtils.java:133)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104)
>  at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
>  at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1126) datanode is trying to quasi-close a container which is already closed

2019-02-18 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1126:


 Summary: datanode is trying to quasi-close a container which is 
already closed
 Key: HDDS-1126
 URL: https://issues.apache.org/jira/browse/HDDS-1126
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


Steps taken:

 # Created a 12-datanode cluster and ran a workload on all the nodes.
 # Ran failure injection/restart on one datanode at a time, periodically and randomly.

Error seen in ozone.log:

 
{noformat}
2019-02-18 06:06:32,780 [Datanode State Machine Thread - 0] DEBUG 
(DatanodeStateMachine.java:176) - Executing cycle Number : 30
2019-02-18 06:06:32,784 [Command processor thread] DEBUG 
(CloseContainerCommandHandler.java:71) - Processing Close Container command.
2019-02-18 06:06:32,785 [Datanode State Machine Thread - 0] DEBUG 
(DatanodeStateMachine.java:176) - Executing cycle Number : 31
2019-02-18 06:06:32,785 [Command processor thread] ERROR 
(CloseContainerCommandHandler.java:118) - Can't close container #37
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
Cannot quasi close container #37 while in CLOSED state.
 at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.quasiCloseContainer(KeyValueHandler.java:903)
 at 
org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.quasiCloseContainer(ContainerController.java:93)
 at 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler.handle(CloseContainerCommandHandler.java:110)
 at 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
 at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:413)
 at java.lang.Thread.run(Thread.java:748)
2019-02-18 06:06:32,785 [Command processor thread] DEBUG 
(CloseContainerCommandHandler.java:71) - Processing Close Container command.
2019-02-18 06:06:32,788 [Command processor thread] DEBUG 
(CloseContainerCommandHandler.java:71) - Processing Close Container command.
2019-02-18 06:06:32,788 [Datanode State Machine Thread - 0] DEBUG 
(DatanodeStateMachine.java:176) - Executing cycle Number : 32
2019-02-18 06:06:34,430 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:06:36,608 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:06:38,876 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:06:41,084 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:06:43,297 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:06:45,469 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:06:47,684 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:06:49,958 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:06:52,124 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:06:54,344 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:06:56,499 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:06:58,764 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:07:00,969 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:07:02,788 [Datanode State Machine Thread - 0] DEBUG 
(DatanodeStateMachine.java:176) - Executing cycle Number : 33
2019-02-18 06:07:03,240 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
2019-02-18 06:07:05,486 [main] DEBUG (OzoneClientFactory.java:287) - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
 
{noformat}
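
A hedged sketch of the guard that would avoid this error, using illustrative 
types rather than the actual KeyValueHandler: treat a close command for an 
already-closed container as a no-op, since after a restart the command is 
simply stale rather than a real failure.

{noformat}
/** Illustrative container states and handle. */
enum ContainerState { OPEN, QUASI_CLOSED, CLOSED }

interface Container {
  ContainerState getState();
  void quasiClose();
}

class QuasiCloseGuardSketch {

  /** Skip the transition when the container is already at or past QUASI_CLOSED. */
  static void quasiCloseIfNeeded(Container container) {
    switch (container.getState()) {
      case CLOSED:
      case QUASI_CLOSED:
        return;                 // stale command after a restart: nothing to do
      case OPEN:
        container.quasiClose();
    }
  }
}
{noformat}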
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1125) java.lang.InterruptedException seen in datanode logs

2019-02-17 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1125:


 Summary: java.lang.InterruptedException seen in datanode logs
 Key: HDDS-1125
 URL: https://issues.apache.org/jira/browse/HDDS-1125
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


Steps taken:

 # Created a 12-datanode cluster and ran a workload on all the nodes.

Exception seen:

 
{noformat}
2019-02-15 10:16:48,713 ERROR org.apache.ratis.server.impl.LogAppender: 
943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, 
i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232
java.lang.InterruptedException
 at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
 at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:433)
 at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133)
 at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
 at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:101)
 at java.lang.Thread.run(Thread.java:748)
2019-02-15 10:16:48,714 ERROR org.apache.ratis.server.impl.LogAppender: 
GrpcLogAppender(943007c8-4fdd-4926-89e2-2c8c52c05073 -> 
8c77b16b-8054-49e3-b669-1ff759cfd271) hit IOException while loading raft log
org.apache.ratis.server.storage.RaftLogIOException: 
943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, 
i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232
 at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:440)
 at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133)
 at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
 at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:101)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException
 at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
 at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:433)
 ... 6 more
2019-02-15 10:16:48,715 ERROR org.apache.ratis.server.impl.LogAppender: 
943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, 
i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232
java.lang.InterruptedException
 at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
 at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:433)
 at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133)
 at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
 at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:101)
 at java.lang.Thread.run(Thread.java:748)
2019-02-15 10:16:48,715 ERROR org.apache.ratis.server.impl.LogAppender: 
GrpcLogAppender(943007c8-4fdd-4926-89e2-2c8c52c05073 -> 
a40a7b01-a30b-469c-b373-9fcb20a126ed) hit IOException while loading raft log
org.apache.ratis.server.storage.RaftLogIOException: 
943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, 
i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232
 at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:440)
 at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133)
 at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
 at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:101)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException
 at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:433)
 ... 6 more
2019-02-15 10:16:48,723 WARN 
org.apache.ratis.grpc.client.GrpcClientProtocolService: 
943007c8-4fdd-4926-89e2-2c8c52c05073-5: onError: 
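
The InterruptedException itself looks like a side effect of the log appender 
being torn down while readStateMachineData was still pending. A general Java 
pattern for that spot, sketched here with a generic helper (not the actual 
Ratis code): restore the interrupt flag and wrap the failure in one clean 
exception instead of dumping raw stack traces at ERROR level.

{noformat}
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

class InterruptHandlingSketch {

  /** Block on a future; keep the interrupt visible and avoid raw stack dumps. */
  static <T> T getChecked(CompletableFuture<T> future, String what) throws IOException {
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();     // preserve the interrupt for callers
      throw new IOException("interrupted while waiting for " + what, e);
    } catch (ExecutionException e) {
      throw new IOException("failed to load " + what, e.getCause());
    }
  }
}
{noformat}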

[jira] [Created] (HDDS-1124) java.lang.IllegalStateException exception in datanode log

2019-02-17 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1124:


 Summary: java.lang.IllegalStateException exception in datanode log
 Key: HDDS-1124
 URL: https://issues.apache.org/jira/browse/HDDS-1124
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


Steps taken:

 # Created a 12-datanode cluster and ran a workload on all the nodes.

Exception seen:

 
{noformat}
2019-02-15 10:15:53,355 INFO org.apache.ratis.server.storage.RaftLogWorker: 
943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from 
/data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3036
 to 
/data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3036-3047
2019-02-15 10:15:53,367 INFO org.apache.ratis.server.impl.RaftServerImpl: 
943007c8-4fdd-4926-89e2-2c8c52c05073: set configuration 3048: 
[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858], old=null at 3048
2019-02-15 10:15:53,523 INFO org.apache.ratis.server.storage.RaftLogWorker: 
943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment 
/data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048
2019-02-15 10:15:53,580 ERROR org.apache.ratis.grpc.server.GrpcLogAppender: 
Failed onNext serverReply {
 requestorId: "943007c8-4fdd-4926-89e2-2c8c52c05073"
 replyId: "a40a7b01-a30b-469c-b373-9fcb20a126ed"
 raftGroupId {
 id: "\001\323\357*\221,O\300\200\266\001#C\327j\333"
 }
 success: true
}
term: 3
nextIndex: 3049
followerCommit: 3047
java.lang.IllegalStateException: reply's next index is 3049, request's previous 
is term: 1
index: 3047
at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender.onSuccess(GrpcLogAppender.java:285)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNextImpl(GrpcLogAppender.java:230)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:215)
 at 
org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:197)
 at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421)
 at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
 at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: 
943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment 
log-3048_3066 to index:3066
2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: 
943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from 
/data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048
 to 
/data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3048-3066
2019-02-15 10:15:56,564 INFO org.apache.ratis.server.storage.RaftLogWorker: 
943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment 
/data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3067
2019-02-15 10:16:45,420 INFO org.apache.ratis.server.storage.RaftLogWorker: 
943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment 
log-3067_3077 to index:3077
{noformat}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-02-17 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1088:
-
Attachment: HDDS-1088.001.patch

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch
>
>
> We need to add tests that exercise the Replica Manager in scenarios such as 
> loss of a node, addition of new nodes, and under-replicated containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-02-17 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1088:
-
Status: Patch Available  (was: Open)

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1088.001.patch
>
>
> We need to add tests that exercise the Replica Manager in scenarios such as 
> loss of a node, addition of new nodes, and under-replicated containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-02-17 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi reassigned HDDS-1088:


Assignee: Nilotpal Nandi

> Add blockade Tests to test Replica Manager
> --
>
> Key: HDDS-1088
> URL: https://issues.apache.org/jira/browse/HDDS-1088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
>
> We need to add tests that exercise the Replica Manager in scenarios such as 
> loss of a node, addition of new nodes, and under-replicated containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1102) docker datanode stopped when new datanodes are added to the cluster

2019-02-17 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1102:
-
Attachment: allnode.log

> docker datanode stopped when new datanodes are added to the cluster
> 
>
> Key: HDDS-1102
> URL: https://issues.apache.org/jira/browse/HDDS-1102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Priority: Major
> Attachments: allnode.log, datanode.log
>
>
> Steps taken:
> 
>  # Created a 5-datanode cluster.
>  # Shut down 2 datanodes.
>  # Started the datanodes again.
> One of the datanodes then shut down.
> Exception seen:
>  
> {noformat}
> 2019-02-14 07:37:26 INFO LeaderElection:230 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 got exception when requesting votes: {}
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
>  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>  at 
> org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
>  at 
> org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
>  at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
>  at 
> org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265)
>  at 
> org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:83)
>  at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:187)
>  at 
> org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-14 07:37:26 INFO LeaderElection:46 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8: Election PASSED; received 1 response(s) 
> [6a0522ba-019e-4b77-ac1f-a9322cd525b8<-61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5#0:OK-t7]
>  and 1 exception(s); 6a0522ba-019e-4b77-ac1f-a9322cd525b8:t7, leader=null, 
> voted=6a0522ba-019e-4b77-ac1f-a9322cd525b8, 
> raftlog=6a0522ba-019e-4b77-ac1f-a9322cd525b8-SegmentedRaftLog:OPENED, conf=3: 
> [61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5:172.20.0.8:9858, 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8:172.20.0.6:9858, 
> 0f377918-aafa-4d8a-972a-6ead54048fba:172.20.0.3:9858], old=null
> 2019-02-14 07:37:26 INFO LeaderElection:52 - 0: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
> 2019-02-14 07:37:26 INFO RoleInfo:130 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: 
> shutdown LeaderElection
> 2019-02-14 07:37:26 INFO RaftServerImpl:161 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 changes role from CANDIDATE to LEADER at 
> term 7 for changeToLeader
> 2019-02-14 07:37:26 INFO RaftServerImpl:258 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8: change Leader from null to 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 at term 7 for becomeLeader, leader 
> elected after 1066ms
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.staging.catchup.gap = 1000 (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.sleep.time 
> = 25ms (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout 
> = 10s (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.watch.timeout.denomination = 1s (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> 

[jira] [Commented] (HDDS-1102) docker datanode stopped when new datanodes are added to the cluster

2019-02-17 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770480#comment-16770480
 ] 

Nilotpal Nandi commented on HDDS-1102:
--

Here are the logs from all nodes for a different run:

[^allnode.log]

> docker datanode stopped when new datanodes are added to the cluster
> 
>
> Key: HDDS-1102
> URL: https://issues.apache.org/jira/browse/HDDS-1102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Priority: Major
> Attachments: allnode.log, datanode.log
>
>
> Steps taken:
> 
>  # Created a 5-datanode cluster.
>  # Shut down 2 datanodes.
>  # Started the datanodes again.
> One of the datanodes then shut down.
> Exception seen:
>  
> {noformat}
> 2019-02-14 07:37:26 INFO LeaderElection:230 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 got exception when requesting votes: {}
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
>  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>  at 
> org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
>  at 
> org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
>  at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
>  at 
> org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265)
>  at 
> org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:83)
>  at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:187)
>  at 
> org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-14 07:37:26 INFO LeaderElection:46 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8: Election PASSED; received 1 response(s) 
> [6a0522ba-019e-4b77-ac1f-a9322cd525b8<-61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5#0:OK-t7]
>  and 1 exception(s); 6a0522ba-019e-4b77-ac1f-a9322cd525b8:t7, leader=null, 
> voted=6a0522ba-019e-4b77-ac1f-a9322cd525b8, 
> raftlog=6a0522ba-019e-4b77-ac1f-a9322cd525b8-SegmentedRaftLog:OPENED, conf=3: 
> [61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5:172.20.0.8:9858, 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8:172.20.0.6:9858, 
> 0f377918-aafa-4d8a-972a-6ead54048fba:172.20.0.3:9858], old=null
> 2019-02-14 07:37:26 INFO LeaderElection:52 - 0: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
> 2019-02-14 07:37:26 INFO RoleInfo:130 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: 
> shutdown LeaderElection
> 2019-02-14 07:37:26 INFO RaftServerImpl:161 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 changes role from CANDIDATE to LEADER at 
> term 7 for changeToLeader
> 2019-02-14 07:37:26 INFO RaftServerImpl:258 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8: change Leader from null to 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 at term 7 for becomeLeader, leader 
> elected after 1066ms
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.staging.catchup.gap = 1000 (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.sleep.time 
> = 25ms (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout 
> = 10s (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.watch.timeout.denomination = 1s (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
> 2019-02-14 

[jira] [Commented] (HDDS-1102) docker datanode stopped when new datanodes are added to the cluster

2019-02-14 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768068#comment-16768068
 ] 

Nilotpal Nandi commented on HDDS-1102:
--

Datanode log of the node that was shut down:

[^datanode.log]

> docker datanode stopped when new datanodes are added to the cluster
> 
>
> Key: HDDS-1102
> URL: https://issues.apache.org/jira/browse/HDDS-1102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Priority: Major
> Attachments: datanode.log
>
>
> Steps taken:
> 
>  # Created a 5-datanode cluster.
>  # Shut down 2 datanodes.
>  # Started the datanodes again.
> One of the datanodes then shut down.
> Exception seen:
>  
> {noformat}
> 2019-02-14 07:37:26 INFO LeaderElection:230 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 got exception when requesting votes: {}
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
>  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>  at 
> org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
>  at 
> org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
>  at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
>  at 
> org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265)
>  at 
> org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:83)
>  at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:187)
>  at 
> org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-14 07:37:26 INFO LeaderElection:46 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8: Election PASSED; received 1 response(s) 
> [6a0522ba-019e-4b77-ac1f-a9322cd525b8<-61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5#0:OK-t7]
>  and 1 exception(s); 6a0522ba-019e-4b77-ac1f-a9322cd525b8:t7, leader=null, 
> voted=6a0522ba-019e-4b77-ac1f-a9322cd525b8, 
> raftlog=6a0522ba-019e-4b77-ac1f-a9322cd525b8-SegmentedRaftLog:OPENED, conf=3: 
> [61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5:172.20.0.8:9858, 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8:172.20.0.6:9858, 
> 0f377918-aafa-4d8a-972a-6ead54048fba:172.20.0.3:9858], old=null
> 2019-02-14 07:37:26 INFO LeaderElection:52 - 0: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
> 2019-02-14 07:37:26 INFO RoleInfo:130 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: 
> shutdown LeaderElection
> 2019-02-14 07:37:26 INFO RaftServerImpl:161 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 changes role from CANDIDATE to LEADER at 
> term 7 for changeToLeader
> 2019-02-14 07:37:26 INFO RaftServerImpl:258 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8: change Leader from null to 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 at term 7 for becomeLeader, leader 
> elected after 1066ms
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.staging.catchup.gap = 1000 (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.sleep.time 
> = 25ms (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout 
> = 10s (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.watch.timeout.denomination = 1s (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
> 2019-02-14 07:37:26 INFO 

[jira] [Created] (HDDS-1102) docker datanode stopped when new datanodes are added to the cluster

2019-02-14 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1102:


 Summary: docker datanode stopped when new datanodes are added to 
the cluster
 Key: HDDS-1102
 URL: https://issues.apache.org/jira/browse/HDDS-1102
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


Steps taken:


 # Created a 5-datanode cluster.
 # Shut down 2 of the datanodes.
 # Started those datanodes again.

After the restart, one of the datanodes stopped.
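For reference, a minimal sketch of the repro against an ozone docker-compose environment (the compose file, the 'datanode' service name, and the container names are assumptions; adjust to the local setup):

{noformat}
# bring up a 5-datanode cluster (--scale is a standard docker-compose flag)
docker-compose up -d --scale datanode=5

# stop two of the datanode containers (names are illustrative)
docker stop ozone_datanode_4 ozone_datanode_5

# start them again; in this report, one of them subsequently stopped
docker start ozone_datanode_4 ozone_datanode_5
docker-compose ps
{noformat}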

Exception seen:

 
{noformat}
2019-02-14 07:37:26 INFO LeaderElection:230 - 
6a0522ba-019e-4b77-ac1f-a9322cd525b8 got exception when requesting votes: {}
java.util.concurrent.ExecutionException: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:192)
 at 
org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
 at 
org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
 at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
 at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
 at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
 at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
 at 
org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265)
 at 
org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:83)
 at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:187)
 at 
org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
2019-02-14 07:37:26 INFO LeaderElection:46 - 
6a0522ba-019e-4b77-ac1f-a9322cd525b8: Election PASSED; received 1 response(s) 
[6a0522ba-019e-4b77-ac1f-a9322cd525b8<-61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5#0:OK-t7]
 and 1 exception(s); 6a0522ba-019e-4b77-ac1f-a9322cd525b8:t7, leader=null, 
voted=6a0522ba-019e-4b77-ac1f-a9322cd525b8, 
raftlog=6a0522ba-019e-4b77-ac1f-a9322cd525b8-SegmentedRaftLog:OPENED, conf=3: 
[61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5:172.20.0.8:9858, 
6a0522ba-019e-4b77-ac1f-a9322cd525b8:172.20.0.6:9858, 
0f377918-aafa-4d8a-972a-6ead54048fba:172.20.0.3:9858], old=null
2019-02-14 07:37:26 INFO LeaderElection:52 - 0: 
java.util.concurrent.ExecutionException: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
2019-02-14 07:37:26 INFO RoleInfo:130 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: 
shutdown LeaderElection
2019-02-14 07:37:26 INFO RaftServerImpl:161 - 
6a0522ba-019e-4b77-ac1f-a9322cd525b8 changes role from CANDIDATE to LEADER at 
term 7 for changeToLeader
2019-02-14 07:37:26 INFO RaftServerImpl:258 - 
6a0522ba-019e-4b77-ac1f-a9322cd525b8: change Leader from null to 
6a0522ba-019e-4b77-ac1f-a9322cd525b8 at term 7 for becomeLeader, leader elected 
after 1066ms
2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
raft.server.staging.catchup.gap = 1000 (default)
2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.sleep.time = 
25ms (default)
2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout = 
10s (default)
2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
raft.server.watch.timeout.denomination = 1s (default)
2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
raft.server.log.appender.buffer.element-limit = 1 (custom)
2019-02-14 07:37:26 INFO GrpcConfigKeys$Server:43 - 
raft.grpc.server.leader.outstanding.appends.max = 128 (default)
2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
raft.server.rpc.request.timeout = 3000ms (default)
2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 

[jira] [Updated] (HDDS-1102) docker datanode stopped when new datanodes are added to the cluster

2019-02-14 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1102:
-
Attachment: datanode.log

> docker datanode stopped when new datanodes are added to the cluster
> 
>
> Key: HDDS-1102
> URL: https://issues.apache.org/jira/browse/HDDS-1102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Priority: Major
> Attachments: datanode.log
>
>
> steps taken:
> 
>  # created 5 datanode cluster.
>  # shutdown 2 datanodes
>  # started the datanodes again.
> One of the datanodes was shut down.
> exception seen :
>  
> {noformat}
> 2019-02-14 07:37:26 INFO LeaderElection:230 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 got exception when requesting votes: {}
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
>  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>  at 
> org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
>  at 
> org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
>  at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
>  at 
> org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265)
>  at 
> org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:83)
>  at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:187)
>  at 
> org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-14 07:37:26 INFO LeaderElection:46 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8: Election PASSED; received 1 response(s) 
> [6a0522ba-019e-4b77-ac1f-a9322cd525b8<-61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5#0:OK-t7]
>  and 1 exception(s); 6a0522ba-019e-4b77-ac1f-a9322cd525b8:t7, leader=null, 
> voted=6a0522ba-019e-4b77-ac1f-a9322cd525b8, 
> raftlog=6a0522ba-019e-4b77-ac1f-a9322cd525b8-SegmentedRaftLog:OPENED, conf=3: 
> [61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5:172.20.0.8:9858, 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8:172.20.0.6:9858, 
> 0f377918-aafa-4d8a-972a-6ead54048fba:172.20.0.3:9858], old=null
> 2019-02-14 07:37:26 INFO LeaderElection:52 - 0: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
> 2019-02-14 07:37:26 INFO RoleInfo:130 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: 
> shutdown LeaderElection
> 2019-02-14 07:37:26 INFO RaftServerImpl:161 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 changes role from CANDIDATE to LEADER at 
> term 7 for changeToLeader
> 2019-02-14 07:37:26 INFO RaftServerImpl:258 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8: change Leader from null to 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 at term 7 for becomeLeader, leader 
> elected after 1066ms
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.staging.catchup.gap = 1000 (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.sleep.time 
> = 25ms (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout 
> = 10s (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.watch.timeout.denomination = 1s (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.log.appender.buffer.element-limit 

[jira] [Commented] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor

2019-02-12 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766311#comment-16766311
 ] 

Nilotpal Nandi commented on HDDS-1047:
--

[~bharatviswa] and [~linyiqun]

Thanks for the comments.

I have uploaded a new patch with the changes.

> Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
> --
>
> Key: HDDS-1047
> URL: https://issues.apache.org/jira/browse/HDDS-1047
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1047.001.patch, HDDS-1047.002.patch, 
> HDDS-1047.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor

2019-02-12 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1047:
-
Attachment: HDDS-1047.003.patch

> Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
> --
>
> Key: HDDS-1047
> URL: https://issues.apache.org/jira/browse/HDDS-1047
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1047.001.patch, HDDS-1047.002.patch, 
> HDDS-1047.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1088) Add blockade Tests to test Replica Manager

2019-02-11 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1088:


 Summary: Add blockade Tests to test Replica Manager
 Key: HDDS-1088
 URL: https://issues.apache.org/jira/browse/HDDS-1088
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


We need to add tests that exercise the Replica Manager in scenarios such as 
loss of a node, addition of new nodes, and under-replicated containers.
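As a rough sketch of what a blockade-driven test could do (the blockade commands are from the blockade tool itself; the container names and the recovery check are assumptions):

{noformat}
# start the cluster under blockade control
blockade up

# simulate loss of a node: datanode_1 is cut off; containers not listed
# stay connected to each other in an implicit partition
blockade partition datanode_1

# ...wait for the Replica Manager to flag and re-replicate
# under-replicated containers...

# heal the partition, simulating the node rejoining the cluster
blockade join
blockade status
{noformat}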



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1082) OutOfMemoryError while reading key

2019-02-11 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1082:


 Summary: OutOfMemoryError while reading key
 Key: HDDS-1082
 URL: https://issues.apache.org/jira/browse/HDDS-1082
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nilotpal Nandi


Steps taken:


 # Put a key of size 100 GB.
 # Tried to read the key back.

Error thrown:

--
{noformat}
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /tmp/heapdump.bin ...
Heap dump file created [3883178021 bytes in 10.667 secs]
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
 at 
org.apache.ratis.thirdparty.com.google.protobuf.ByteString.toByteArray(ByteString.java:643)
 at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:217)
 at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.readChunkFromContainer(BlockInputStream.java:227)
 at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.prepareRead(BlockInputStream.java:188)
 at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:130)
 at 
org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.read(KeyInputStream.java:232)
 at 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:126)
 at 
org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:49)
 at java.io.InputStream.read(InputStream.java:101)
 at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
 at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:98)
 at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48)
 at picocli.CommandLine.execute(CommandLine.java:919)
 at picocli.CommandLine.access$700(CommandLine.java:104)
 at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
 at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
 at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
 at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
 at picocli.CommandLine.parseWithHandler(CommandLine.java:1181)
 at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61)
 at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52)
 at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:83){noformat}
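The trace shows the OutOfMemoryError occurring while the client materializes chunk bytes on the heap for checksum verification (ByteString.toByteArray inside Checksum.verifyChecksum on the read path), so reading a very large key can exhaust the client JVM's default heap. A minimal repro sketch (volume, bucket and key names are illustrative):

{noformat}
# write a 100 GB key
ozone sh key put volume1/bucket1/key1 /root/100G

# read it back; the client JVM hits OutOfMemoryError while verifying checksums
ozone sh key get volume1/bucket1/key1 /tmp/key1.out
{noformat}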



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1079) java.lang.RuntimeException: ManagedChannel allocation site exception seen on client cli when datanode restarted in one of the pipelines

2019-02-11 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1079:
-
Attachment: nodes-ozone-logs-1549879783.tar.gz

> java.lang.RuntimeException: ManagedChannel allocation site exception seen on 
> client cli when datanode restarted in one of the pipelines
> ---
>
> Key: HDDS-1079
> URL: https://issues.apache.org/jira/browse/HDDS-1079
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Priority: Major
> Attachments: nodes-ozone-logs-1549879783.tar.gz
>
>
> steps taken :
> 
>  # created 12 datanode cluster.
>  # started put key operation with size 100GB.
>  # Restarted one of the datanodes from one of the pipelines.
> exception seen  on cli :
> 
>  
> {noformat}
> [root@ctr-e139-1542663976389-62237-01-06 ~]# time ozone sh key put 
> volume1/bucket1/key1 /root/100G
> Feb 11, 2019 9:12:49 AM 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
>  cleanQueue
> SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=61, 
> target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~*
>  Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() 
> returns true.
> java.lang.RuntimeException: ManagedChannel allocation site
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411)
>  at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient.(GrpcClientProtocolClient.java:116)
>  at 
> org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:54)
>  at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:60)
>  at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:191)
>  at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:59)
>  at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:106)
>  at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsync(GrpcClientRpc.java:69)
>  at 
> org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:324)
>  at 
> org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286)
>  at 
> org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243)
>  at org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259)
>  at 
> org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104)
>  at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
>  at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Feb 11, 2019 9:12:49 AM 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
>  cleanQueue
> SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=29, 
> target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~*
>  Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() 
> returns true.
> java.lang.RuntimeException: ManagedChannel allocation site
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44)
>  

[jira] [Commented] (HDDS-1079) java.lang.RuntimeException: ManagedChannel allocation site exception seen on client cli when datanode restarted in one of the pipelines

2019-02-11 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764832#comment-16764832
 ] 

Nilotpal Nandi commented on HDDS-1079:
--

Logs are present at:

[^nodes-ozone-logs-1549879783.tar.gz]

> java.lang.RuntimeException: ManagedChannel allocation site exception seen on 
> client cli when datanode restarted in one of the pipelines
> ---
>
> Key: HDDS-1079
> URL: https://issues.apache.org/jira/browse/HDDS-1079
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Priority: Major
> Attachments: nodes-ozone-logs-1549879783.tar.gz
>
>
> steps taken :
> 
>  # created 12 datanode cluster.
>  # started put key operation with size 100GB.
>  # Restarted one of the datanodes from one of the pipelines.
> exception seen  on cli :
> 
>  
> {noformat}
> [root@ctr-e139-1542663976389-62237-01-06 ~]# time ozone sh key put 
> volume1/bucket1/key1 /root/100G
> Feb 11, 2019 9:12:49 AM 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
>  cleanQueue
> SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=61, 
> target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~*
>  Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() 
> returns true.
> java.lang.RuntimeException: ManagedChannel allocation site
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411)
>  at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient.(GrpcClientProtocolClient.java:116)
>  at 
> org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:54)
>  at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:60)
>  at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:191)
>  at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:59)
>  at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:106)
>  at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsync(GrpcClientRpc.java:69)
>  at 
> org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:324)
>  at 
> org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286)
>  at 
> org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243)
>  at org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259)
>  at 
> org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104)
>  at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
>  at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Feb 11, 2019 9:12:49 AM 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
>  cleanQueue
> SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=29, 
> target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~*
>  Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() 
> returns true.
> java.lang.RuntimeException: ManagedChannel allocation site
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53)
>  at 
> 

[jira] [Created] (HDDS-1079) java.lang.RuntimeException: ManagedChannel allocation site exception seen on client cli when datanode restarted in one of the pipelines

2019-02-11 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1079:


 Summary: java.lang.RuntimeException: ManagedChannel allocation 
site exception seen on client cli when datanode restarted in one of the 
pipelines
 Key: HDDS-1079
 URL: https://issues.apache.org/jira/browse/HDDS-1079
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Reporter: Nilotpal Nandi


Steps taken:


 # Created a 12-datanode cluster.
 # Started a put-key operation with a 100 GB key.
 # Restarted one of the datanodes in one of the pipelines (sketched below).
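A sketch of the repro (the put command is taken from the report below; the restart step assumes a docker deployment where datanodes run as containers, with an illustrative container name):

{noformat}
# start a long-running 100 GB write in the background
time ozone sh key put volume1/bucket1/key1 /root/100G &

# while the write is in flight, restart a datanode serving one of its pipelines
docker restart ozone_datanode_3
{noformat}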

Exception seen on the CLI:



 
{noformat}
[root@ctr-e139-1542663976389-62237-01-06 ~]# time ozone sh key put 
volume1/bucket1/key1 /root/100G
Feb 11, 2019 9:12:49 AM 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
 cleanQueue
SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=61, target=172.27.10.133:9858} 
was not shutdown properly!!! ~*~*~*
 Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() 
returns true.
java.lang.RuntimeException: ManagedChannel allocation site
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411)
 at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient.(GrpcClientProtocolClient.java:116)
 at 
org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:54)
 at 
org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:60)
 at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:191)
 at 
org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:59)
 at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:106)
 at 
org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsync(GrpcClientRpc.java:69)
 at 
org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:324)
 at 
org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286)
 at 
org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243)
 at org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259)
 at 
org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293)
 at 
org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85)
 at 
org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104)
 at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
 at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Feb 11, 2019 9:12:49 AM 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
 cleanQueue
SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=29, target=172.27.10.133:9858} 
was not shutdown properly!!! ~*~*~*
 Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() 
returns true.
java.lang.RuntimeException: ManagedChannel allocation site
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411)
 at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient.(GrpcClientProtocolClient.java:116)
 at 
org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:54)
 at 
org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:60)
 at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:191)
 at 

[jira] [Commented] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor

2019-02-07 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762997#comment-16762997
 ] 

Nilotpal Nandi commented on HDDS-1047:
--

[~linyiqun], thanks for the review.

I have addressed the comment and uploaded a new patch.

> Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
> --
>
> Key: HDDS-1047
> URL: https://issues.apache.org/jira/browse/HDDS-1047
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1047.001.patch, HDDS-1047.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor

2019-02-07 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1047:
-
Attachment: HDDS-1047.002.patch

> Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
> --
>
> Key: HDDS-1047
> URL: https://issues.apache.org/jira/browse/HDDS-1047
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1047.001.patch, HDDS-1047.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1040) Add blockade Tests for client failures

2019-02-07 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1040:
-
Attachment: HDDS-1040.003.patch

> Add blockade Tests for client failures
> --
>
> Key: HDDS-1040
> URL: https://issues.apache.org/jira/browse/HDDS-1040
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-1040.001.patch, HDDS-1040.002.patch, 
> HDDS-1040.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1067) freon run on client gets hung when two of the datanodes are down in 3 datanode cluster

2019-02-06 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1067:


 Summary: freon run on client gets hung when two of the datanodes 
are down in 3 datanode cluster
 Key: HDDS-1067
 URL: https://issues.apache.org/jira/browse/HDDS-1067
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Reporter: Nilotpal Nandi


Steps taken:


 # Created a 3-node docker cluster.
 # Wrote a key.
 # Created a partition such that 2 of the 3 datanodes cannot communicate with 
any other node.
 # Verified that the third datanode can still communicate with SCM, OM and the 
client.
 # Ran freon to write keys.

Observation:

--

The freon run hangs indefinitely; there is no timeout.
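A sketch of the partition and write steps with blockade (container names are illustrative; the freon subcommand and flags are assumptions and vary by version):

{noformat}
# isolate datanode_1 and datanode_2 from everything, including each other;
# containers not listed (datanode_3, scm, om) stay in an implicit partition
blockade partition datanode_1 datanode_2

# write keys against the degraded pipeline; this is where the hang is observed
ozone freon randomkeys --numOfVolumes 1 --numOfBuckets 1 --numOfKeys 10
{noformat}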

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1057) get key operation fails when client cannot communicate with 2 of the datanodes in 3 node cluster

2019-02-06 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1057:
-
Description: 
steps taken :

--
 # created 3 node docker cluster.
 # wrote a key
 # created partition such that 2 out of 3 datanodes cannot communicate with any 
other node.
 # Third datanode can communicate with scm, om and the client.
 # Tried to read the key

Exception seen :



 
{noformat}
Failed to execute command cmdType: GetBlock
E traceID: "9b3ebd93-e598-4ca2-a6f4-2389f2d35f63"
E containerID: 22
E datanodeUuid: "15345663-15c9-4fe3-9b8f-a46123ba8a6e"
E getBlock {
E blockID {
E containerID: 22
E localID: 101545011736215553
E blockCommitSequenceId: 5
E }
E }
E on datanode 15345663-15c9-4fe3-9b8f-a46123ba8a6e
E java.util.concurrent.ExecutionException: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
exception
E at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
E at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
E at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:220)
E at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:201)
E at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118)
E at 
org.apache.hadoop.ozone.client.io.KeyInputStream.getFromOmKeyInfo(KeyInputStream.java:305)
E at org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:608)
E at org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:284)
E at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:95)
E at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48)
E at picocli.CommandLine.execute(CommandLine.java:919)
E at picocli.CommandLine.access$700(CommandLine.java:104)
E at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
E at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
E at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
E at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
E at picocli.CommandLine.parseWithHandler(CommandLine.java:1181)
E at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61)
E at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52)
E at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:83)
E Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
E at 
org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526)
E at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
E at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
E at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
E at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
E at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
E at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
E at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
E at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
E at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
E at java.lang.Thread.run(Thread.java:748)
E Caused by: 

[jira] [Updated] (HDDS-1057) get key operation fails when client cannot communicate with 2 of the datanodes in 3 node cluster

2019-02-06 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1057:
-
Attachment: test_client_failure_isolate_two_datanodes_all_docker.log

> get key operation fails when client cannot communicate with 2 of the 
> datanodes in 3 node cluster
> 
>
> Key: HDDS-1057
> URL: https://issues.apache.org/jira/browse/HDDS-1057
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Priority: Major
> Attachments: test_client_failure_isolate_two_datanodes_all_docker.log
>
>
> steps taken :
> --
>  # created 3 node docker cluster.
>  # wrote a key
>  # created partition such that 2 out of 3 datanodes cannot communicate with 
> any other node.
>  # Third datanode can communicate with all other nodes.
>  # Tried to read the key
> Exception seen :
> 
>  
> {noformat}
> Failed to execute command cmdType: GetBlock
> E traceID: "9b3ebd93-e598-4ca2-a6f4-2389f2d35f63"
> E containerID: 22
> E datanodeUuid: "15345663-15c9-4fe3-9b8f-a46123ba8a6e"
> E getBlock {
> E blockID {
> E containerID: 22
> E localID: 101545011736215553
> E blockCommitSequenceId: 5
> E }
> E }
> E on datanode 15345663-15c9-4fe3-9b8f-a46123ba8a6e
> E java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> E at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> E at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> E at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:220)
> E at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:201)
> E at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118)
> E at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.getFromOmKeyInfo(KeyInputStream.java:305)
> E at org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:608)
> E at org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:284)
> E at 
> org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:95)
> E at 
> org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48)
> E at picocli.CommandLine.execute(CommandLine.java:919)
> E at picocli.CommandLine.access$700(CommandLine.java:104)
> E at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
> E at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
> E at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
> E at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
> E at picocli.CommandLine.parseWithHandler(CommandLine.java:1181)
> E at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61)
> E at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52)
> E at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:83)
> E Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> E at 
> org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526)
> E at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
> E at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
> E at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
> E at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
> E at 
> org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
> E at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
> E at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
> E at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
> E at 
> org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
> E at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
> E at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
> E at 
> 

[jira] [Commented] (HDDS-1057) get key operation fails when client cannot communicate with 2 of the datanodes in 3 node cluster

2019-02-06 Thread Nilotpal Nandi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761690#comment-16761690
 ] 

Nilotpal Nandi commented on HDDS-1057:
--

Logs are present at:

[^test_client_failure_isolate_two_datanodes_all_docker.log]

> get key operation fails when client cannot communicate with 2 of the 
> datanodes in 3 node cluster
> 
>
> Key: HDDS-1057
> URL: https://issues.apache.org/jira/browse/HDDS-1057
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Priority: Major
> Attachments: test_client_failure_isolate_two_datanodes_all_docker.log
>
>
> steps taken :
> --
>  # created 3 node docker cluster.
>  # wrote a key
>  # created partition such that 2 out of 3 datanodes cannot communicate with 
> any other node.
>  # Third datanode can communicate with all other nodes.
>  # Tried to read the key
> Exception seen :
> 
>  
> {noformat}
> Failed to execute command cmdType: GetBlock
> E traceID: "9b3ebd93-e598-4ca2-a6f4-2389f2d35f63"
> E containerID: 22
> E datanodeUuid: "15345663-15c9-4fe3-9b8f-a46123ba8a6e"
> E getBlock {
> E blockID {
> E containerID: 22
> E localID: 101545011736215553
> E blockCommitSequenceId: 5
> E }
> E }
> E on datanode 15345663-15c9-4fe3-9b8f-a46123ba8a6e
> E java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> E at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> E at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> E at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:220)
> E at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:201)
> E at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118)
> E at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.getFromOmKeyInfo(KeyInputStream.java:305)
> E at org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:608)
> E at org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:284)
> E at 
> org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:95)
> E at 
> org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48)
> E at picocli.CommandLine.execute(CommandLine.java:919)
> E at picocli.CommandLine.access$700(CommandLine.java:104)
> E at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
> E at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
> E at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
> E at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
> E at picocli.CommandLine.parseWithHandler(CommandLine.java:1181)
> E at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61)
> E at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52)
> E at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:83)
> E Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
> E at 
> org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526)
> E at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
> E at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
> E at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
> E at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
> E at 
> org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
> E at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
> E at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
> E at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
> E at 
> org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
> E at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
> E at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
> E at 
> 

[jira] [Created] (HDDS-1057) get key operation fails when client cannot communicate with 2 of the datanodes in 3 node cluster

2019-02-06 Thread Nilotpal Nandi (JIRA)
Nilotpal Nandi created HDDS-1057:


 Summary: get key operation fails when client cannot communicate 
with 2 of the datanodes in 3 node cluster
 Key: HDDS-1057
 URL: https://issues.apache.org/jira/browse/HDDS-1057
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Reporter: Nilotpal Nandi


Steps taken:

--
 # Created a 3-node docker cluster.
 # Wrote a key.
 # Created a partition such that 2 of the 3 datanodes cannot communicate with 
any other node.
 # Verified that the third datanode can still communicate with all other nodes.
 # Tried to read the key (sketched below).
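A sketch of the failing read (the partition step mirrors HDDS-1067; container, volume, bucket and key names are illustrative):

{noformat}
# cut datanode_1 and datanode_2 off from everything, including each other
blockade partition datanode_1 datanode_2

# the read fails even though datanode_3 is reachable and holds a replica
ozone sh key get volume1/bucket1/key1 /tmp/key1.out
{noformat}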

Exception seen:



 
{noformat}
Failed to execute command cmdType: GetBlock
E traceID: "9b3ebd93-e598-4ca2-a6f4-2389f2d35f63"
E containerID: 22
E datanodeUuid: "15345663-15c9-4fe3-9b8f-a46123ba8a6e"
E getBlock {
E blockID {
E containerID: 22
E localID: 101545011736215553
E blockCommitSequenceId: 5
E }
E }
E on datanode 15345663-15c9-4fe3-9b8f-a46123ba8a6e
E java.util.concurrent.ExecutionException: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
exception
E at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
E at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
E at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:220)
E at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:201)
E at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118)
E at 
org.apache.hadoop.ozone.client.io.KeyInputStream.getFromOmKeyInfo(KeyInputStream.java:305)
E at org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:608)
E at org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:284)
E at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:95)
E at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48)
E at picocli.CommandLine.execute(CommandLine.java:919)
E at picocli.CommandLine.access$700(CommandLine.java:104)
E at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
E at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
E at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
E at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
E at picocli.CommandLine.parseWithHandler(CommandLine.java:1181)
E at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61)
E at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52)
E at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:83)
E Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
UNAVAILABLE: io exception
E at 
org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526)
E at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
E at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
E at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
E at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
E at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
E at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
E at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
E at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
E at 

[jira] [Updated] (HDDS-1040) Add blockade Tests for client failures

2019-02-05 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1040:
-
Attachment: HDDS-1040.002.patch

> Add blockade Tests for client failures
> --
>
> Key: HDDS-1040
> URL: https://issues.apache.org/jira/browse/HDDS-1040
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-1040.001.patch, HDDS-1040.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1027) Add blockade Tests for datanode isolation and scm failures

2019-02-05 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1027:
-
Attachment: (was: HDDS-1027.002.patch)

> Add blockade Tests for datanode isolation and scm failures
> --
>
> Key: HDDS-1027
> URL: https://issues.apache.org/jira/browse/HDDS-1027
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1027.001.patch, HDDS-1027.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1027) Add blockade Tests for datanode isolation and scm failures

2019-02-05 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1027:
-
Attachment: HDDS-1027.002.patch

> Add blockade Tests for datanode isolation and scm failures
> --
>
> Key: HDDS-1027
> URL: https://issues.apache.org/jira/browse/HDDS-1027
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1027.001.patch, HDDS-1027.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1040) Add blockade Tests for client failures

2019-02-05 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi reassigned HDDS-1040:


Assignee: Nilotpal Nandi

> Add blockade Tests for client failures
> --
>
> Key: HDDS-1040
> URL: https://issues.apache.org/jira/browse/HDDS-1040
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1040.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1040) Add blockade Tests for client failures

2019-02-05 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1040:
-
Status: Patch Available  (was: Open)

> Add blockade Tests for client failures
> --
>
> Key: HDDS-1040
> URL: https://issues.apache.org/jira/browse/HDDS-1040
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1040.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1040) Add blockade Tests for client failures

2019-02-05 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1040:
-
Attachment: HDDS-1040.001.patch

> Add blockade Tests for client failures
> --
>
> Key: HDDS-1040
> URL: https://issues.apache.org/jira/browse/HDDS-1040
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1040.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor

2019-02-04 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1047:
-
Attachment: HDDS-1047.001.patch

> Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
> --
>
> Key: HDDS-1047
> URL: https://issues.apache.org/jira/browse/HDDS-1047
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1047.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor

2019-02-04 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi reassigned HDDS-1047:


Assignee: Nilotpal Nandi

> Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
> --
>
> Key: HDDS-1047
> URL: https://issues.apache.org/jira/browse/HDDS-1047
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1047.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor

2019-02-04 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-1047:
-
Status: Patch Available  (was: Open)

> Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
> --
>
> Key: HDDS-1047
> URL: https://issues.apache.org/jira/browse/HDDS-1047
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1047.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-997) Add blockade Tests for scm isolation and mixed node isolation

2019-01-31 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-997:

Attachment: HDDS-997.003.patch

> Add blockade Tests for scm isolation and mixed node isolation
> -
>
> Key: HDDS-997
> URL: https://issues.apache.org/jira/browse/HDDS-997
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-997.001.patch, HDDS-997.002.patch, 
> HDDS-997.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-997) Add blockade Tests for scm isolation and mixed node isolation

2019-01-31 Thread Nilotpal Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilotpal Nandi updated HDDS-997:

Attachment: (was: HDDS-997.003.patch)

> Add blockade Tests for scm isolation and mixed node isolation
> -
>
> Key: HDDS-997
> URL: https://issues.apache.org/jira/browse/HDDS-997
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-997.001.patch, HDDS-997.002.patch, 
> HDDS-997.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


