[jira] [Created] (HDFS-15619) Metric for ordered snapshot deletion GC thread
Nilotpal Nandi created HDFS-15619: - Summary: Metric for ordered snapshot deletion GC thread Key: HDFS-15619 URL: https://issues.apache.org/jira/browse/HDFS-15619 Project: Hadoop HDFS Issue Type: Task Components: hdfs Reporter: Nilotpal Nandi Assignee: Nilotpal Nandi The following info should be captured and shown in JMX for the garbage collection thread of ordered snapshot deletion: * Number of pending snapshots to be GCed * Number of times the GC thread ran * Number of snapshots already GCed * Average time taken by each GC run * Thread running status * Number of failed deletions by the GC thread -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
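A minimal sketch of how such a metrics source could look with the Hadoop metrics2 annotations, so the values surface in JMX automatically. The class and metric names below are hypothetical illustrations, not the HDFS-15619 implementation:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Hypothetical metrics source for the ordered snapshot deletion GC thread.
@Metrics(about = "Ordered snapshot deletion GC metrics", context = "dfs")
public class SnapshotDeletionGcMetrics {
  @Metric("Snapshots still pending garbage collection")
  private MutableGaugeLong pendingSnapshotsToGc;
  @Metric("Number of times the GC thread has run")
  private MutableCounterLong numGcRuns;
  @Metric("Snapshots already garbage collected")
  private MutableCounterLong numSnapshotsGced;
  @Metric("Failed snapshot deletions in the GC thread")
  private MutableCounterLong numGcFailures;
  @Metric("Time taken by each GC run")
  private MutableRate gcRunTime;            // exposes count and average, covering the average-time requirement
  @Metric("GC thread running status: 1 running, 0 stopped")
  private MutableGaugeInt gcThreadRunning;

  public static SnapshotDeletionGcMetrics create() {
    return DefaultMetricsSystem.instance().register(
        "SnapshotDeletionGc", "Ordered snapshot deletion GC metrics",
        new SnapshotDeletionGcMetrics());
  }
}
{code}
The GC thread would then call, for example, numGcRuns.incr() and gcRunTime.add(elapsedMillis) after each pass, numGcFailures.incr() on a failed deletion, and gcThreadRunning.set(1) / set(0) around its run loop.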
[jira] [Created] (HDDS-2604) scmcli pipeline deactivate command not working
Nilotpal Nandi created HDDS-2604: Summary: scmcli pipeline deactivate command not working Key: HDDS-2604 URL: https://issues.apache.org/jira/browse/HDDS-2604 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Client Reporter: Nilotpal Nandi Assignee: Nilotpal Nandi The scmcli pipeline deactivate command is not working. Output: {noformat} ozone scmcli pipeline deactivate 212e1f47-4890-49c2-a950-4d0b3a70cbfd Unknown command type: DeactivatePipeline root@st-ozone-kg2qce-l2ltm:/ansible# echo $? 255{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14980) diskbalancer query command always tries to contact port 9867
Nilotpal Nandi created HDFS-14980: - Summary: diskbalancer query command always tries to contact port 9867 Key: HDFS-14980 URL: https://issues.apache.org/jira/browse/HDFS-14980 Project: Hadoop HDFS Issue Type: Bug Components: diskbalancer Reporter: Nilotpal Nandi The diskbalancer query command always tries to connect to port 9867 even when the datanode IPC port is different. In this setup, the datanode IPC port is set to 20001. The diskbalancer report command works fine and connects to IPC port 20001: {noformat} hdfs diskbalancer -report -node 172.27.131.193 19/11/12 08:58:55 INFO command.Command: Processing report command 19/11/12 08:58:57 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec 19/11/12 08:58:57 INFO block.BlockTokenSecretManager: Setting block keys 19/11/12 08:58:57 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec 19/11/12 08:58:58 INFO command.Command: Reporting volume information for DataNode(s). These DataNode(s) are parsed from '172.27.131.193'. Processing report command Reporting volume information for DataNode(s). These DataNode(s) are parsed from '172.27.131.193'. [172.27.131.193:20001] - : 3 volumes with node data density 0.05. [DISK: volume-/dataroot/ycloud/dfs/NEW_DISK1/] - 0.15 used: 39343871181/259692498944, 0.85 free: 220348627763/259692498944, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False. [DISK: volume-/dataroot/ycloud/dfs/NEW_DISK2/] - 0.15 used: 39371179986/259692498944, 0.85 free: 220321318958/259692498944, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False. [DISK: volume-/dataroot/ycloud/dfs/dn/] - 0.19 used: 49934903670/259692498944, 0.81 free: 209757595274/259692498944, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False. {noformat} But the diskbalancer query command fails and tries to connect to port 9867 (the default port). {noformat} hdfs diskbalancer -query 172.27.131.193 19/11/12 06:37:15 INFO command.Command: Executing "query plan" command. 19/11/12 06:37:16 INFO ipc.Client: Retrying connect to server: /172.27.131.193:9867. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/11/12 06:37:17 INFO ipc.Client: Retrying connect to server: /172.27.131.193:9867. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) .. .. .. 19/11/12 06:37:25 ERROR tools.DiskBalancerCLI: Exception thrown while running DiskBalancerCLI. {noformat} Expectation: the diskbalancer query command should work without explicitly specifying the datanode IPC port. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
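The report path above resolves the datanode's configured IPC port (20001), while the query path appears to fall back to the default 9867. A rough sketch of what honouring the configuration could look like; this is illustrative only and not the DiskBalancerCLI code (nodeArg stands for the host string passed to -query):
{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetUtils;

// Illustrative helper: resolve the datanode IPC port from
// dfs.datanode.ipc.address instead of assuming the default 9867.
static InetSocketAddress resolveDatanodeIpc(Configuration conf, String nodeArg) {
  String ipcAddress = conf.get("dfs.datanode.ipc.address", "0.0.0.0:9867");
  int ipcPort = NetUtils.createSocketAddr(ipcAddress).getPort();
  // Use the configured port as the default when the user gives only a host.
  return NetUtils.createSocketAddr(nodeArg, ipcPort);
}
{code}
As a workaround, passing the port explicitly (for example hdfs diskbalancer -query 172.27.131.193:20001) may avoid the failure, if the command accepts a host:port target.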
[jira] [Created] (HDDS-2350) NullPointerException seen in datanode log while writing data
Nilotpal Nandi created HDDS-2350: Summary: NullPointerException seen in datanode log while writing data Key: HDDS-2350 URL: https://issues.apache.org/jira/browse/HDDS-2350 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi A NullPointerException is seen in the datanode log while writing 10GB of data. There is one pipeline with replication factor 3 while writing data. {noformat} 2019-10-23 11:25:45,674 ERROR org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: Error getting metrics from source ratis_core.ratis_leader.a23fb300-4c1e-420f-a21e-7e73d0c22cbe@group-4CA404C938C2 java.lang.NullPointerException at org.apache.ratis.server.impl.RaftLeaderMetrics.lambda$null$2(RaftLeaderMetrics.java:86) at com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.snapshotAllMetrics(HadoopMetrics2Reporter.java:239) at com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.getMetrics(HadoopMetrics2Reporter.java:219) at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:381) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:368) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) 2019-10-23 11:25:55,673 ERROR org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: Error getting metrics from source ratis_core.ratis_leader.a23fb300-4c1e-420f-a21e-7e73d0c22cbe@group-4CA404C938C2 java.lang.NullPointerException at org.apache.ratis.server.impl.RaftLeaderMetrics.lambda$null$2(RaftLeaderMetrics.java:86) at com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.snapshotAllMetrics(HadoopMetrics2Reporter.java:239) at com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.getMetrics(HadoopMetrics2Reporter.java:219) at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:381) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:368) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) 2019-10-23 11:26:05,674 ERROR org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: Error getting metrics from source ratis_core.ratis_leader.a23fb300-4c1e-420f-a21e-7e73d0c22cbe@group-4CA404C938C2 java.lang.NullPointerException at org.apache.ratis.server.impl.RaftLeaderMetrics.lambda$null$2(RaftLeaderMetrics.java:86) at com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.snapshotAllMetrics(HadoopMetrics2Reporter.java:239) at com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.getMetrics(HadoopMetrics2Reporter.java:219) at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406) at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:381) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:368) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
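The NPE originates inside a gauge lambda registered by RaftLeaderMetrics and is triggered each time the reporter snapshots all gauges. A generic illustration of the usual remedy, a null guard inside the gauge supplier; the class and metric names here are hypothetical and this is not the Ratis fix:
{code}
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import java.util.concurrent.atomic.AtomicReference;

// A gauge whose supplier may run before its backing state exists must
// null-guard, otherwise every reporter snapshot (as in
// HadoopMetrics2Reporter.snapshotAllMetrics above) throws NullPointerException.
class NullSafeGaugeExample {
  private final AtomicReference<Long> lastAppliedIndex = new AtomicReference<>(); // may hold null

  void registerMetrics(MetricRegistry registry) {
    registry.register("ratis_leader.last_applied_index",
        (Gauge<Long>) () -> {
          Long value = lastAppliedIndex.get();
          return value == null ? 0L : value;   // guard instead of throwing
        });
  }
}
{code}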
[jira] [Updated] (HDDS-2043) "VOLUME_NOT_FOUND" exception thrown while listing volumes
[ https://issues.apache.org/jira/browse/HDDS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-2043: - Description: ozone list volume command throws OMException bin/ozone sh volume list --user root VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume info not found for vol-test-putfile-1566902803 On enabling DEBUG log , here is the console output : {noformat} bin/ozone sh volume create /n1 ; echo $? 2019-08-27 11:47:16 DEBUG ThriftSenderFactory:33 - Using the UDP Sender to send spans to the agent. 2019-08-27 11:47:16 DEBUG SenderResolver:86 - Using sender UdpSender() 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)]) 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)]) 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[GetGroups]) 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal failures since startup]) 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal failures since last successful login]) 2019-08-27 11:47:16 DEBUG MetricsSystemImpl:231 - UgiMetrics, User and group related metrics 2019-08-27 11:47:16 DEBUG SecurityUtil:124 - Setting hadoop.security.token.service.use_ip to true 2019-08-27 11:47:16 DEBUG Shell:821 - setsid exited with exit code 0 2019-08-27 11:47:16 DEBUG Groups:449 - Creating new Groups object 2019-08-27 11:47:16 DEBUG Groups:151 - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30; warningDeltaMs=5000 2019-08-27 11:47:16 DEBUG UserGroupInformation:254 - hadoop login 2019-08-27 11:47:16 DEBUG UserGroupInformation:187 - hadoop login commit 2019-08-27 11:47:16 DEBUG UserGroupInformation:215 - using local user:UnixPrincipal: root 2019-08-27 11:47:16 DEBUG UserGroupInformation:221 - Using user: "UnixPrincipal: root" with name root 2019-08-27 11:47:16 DEBUG UserGroupInformation:235 - User entry: "root" 2019-08-27 11:47:16 DEBUG UserGroupInformation:766 - UGI loginUser:root (auth:SIMPLE) 2019-08-27 11:47:16 DEBUG OzoneClientFactory:287 - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-08-27 11:47:16 DEBUG Server:280 - rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcProtobufRequest, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@710f4dc7 2019-08-27 11:47:16 DEBUG Client:63 - getting client out of cache: org.apache.hadoop.ipc.Client@24313fcc 2019-08-27 11:47:16 DEBUG Client:487 - The ping interval is 6 ms. 2019-08-27 11:47:16 DEBUG Client:785 - Connecting to nnandi-1.gce.cloudera.com/172.31.117.213:9862 2019-08-27 11:47:16 DEBUG Client:1064 - IPC Client (580871917) connection to nnandi-1.gce.cloudera.com/172.31.117.213:9862 from root: starting, having connections 1 2019-08-27 11:47:16 DEBUG Client:1127 - IPC Client (580871917) connection to nnandi-1.gce.cloudera.com/172.31.117.213:9862 from root sending #0 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest 2019-08-27 11:47:17 DEBUG Client:1181 - IPC Client (580871917) connection to nnandi-1.gce.cloudera.com/172.31.117.213:9862 from root got value #0 2019-08-27 11:47:17 DEBUG ProtobufRpcEngine:249 - Call: submitRequest took 230ms 2019-08-27 11:47:17 DEBUG Client:63 - getting client out of cache: org.apache.hadoop.ipc.Client@24313fcc 2019-08-27 11:47:17 DEBUG Groups:312 -
[jira] [Updated] (HDDS-2043) "VOLUME_NOT_FOUND" exception thrown while listing volumes
[ https://issues.apache.org/jira/browse/HDDS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-2043: - Description: ozone list volume command throws OMException bin/ozone sh volume list --user root VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume info not found for vol-test-putfile-1566902803 was: ozone list volume command throws OMException /opt/cloudera/parcels/CDH/bin/ozone sh volume list --user root VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume info not found for vol-test-putfile-1566902803 > "VOLUME_NOT_FOUND" exception thrown while listing volumes > - > > Key: HDDS-2043 > URL: https://issues.apache.org/jira/browse/HDDS-2043 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone CLI, Ozone Manager >Reporter: Nilotpal Nandi >Priority: Major > > ozone list volume command throws OMException > bin/ozone sh volume list --user root > VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume > info not found for vol-test-putfile-1566902803 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2043) "VOLUME_NOT_FOUND" exception thrown while listing volumes
Nilotpal Nandi created HDDS-2043: Summary: "VOLUME_NOT_FOUND" exception thrown while listing volumes Key: HDDS-2043 URL: https://issues.apache.org/jira/browse/HDDS-2043 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone CLI, Ozone Manager Reporter: Nilotpal Nandi ozone list volume command throws OMException /opt/cloudera/parcels/CDH/bin/ozone sh volume list --user root VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume info not found for vol-test-putfile-1566902803 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1706) Replication Manager thread running too frequently
[ https://issues.apache.org/jira/browse/HDDS-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1706: - Status: Patch Available (was: Open) > Replication Manager thread running too frequently > - > > Key: HDDS-1706 > URL: https://issues.apache.org/jira/browse/HDDS-1706 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1706.001.patch > > > Replication manager is running too frequently at a 3s interval in place of > 300s. > {code} > host: vc1337.halxg.cloudera.com, networkLocation: /default-rack, > certSerialId: null}. > 2019-06-18 03:11:51,687 INFO > org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor > Thread took 4 milliseconds for processing 739 containers. > . > 2019-06-18 03:11:54,692 INFO > org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor > Thread took 4 milliseconds for processing 739 containers. > {code} > It is because of the following lines: > {code} > @Config(key = "thread.interval", > type = ConfigType.TIME, > defaultValue = "3s", > tags = {SCM, OZONE}, > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
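The quoted @Config annotation is the likely culprit: the default interval is 3s where 300s was intended. The attached patch presumably just raises the default (not verified against HDDS-1706.001.patch); schematically the change would be:
{code}
// Schematic only: the other annotation attributes stay as in the
// ReplicationManager configuration class; only the default changes.
@Config(key = "thread.interval",
    type = ConfigType.TIME,
    defaultValue = "300s",   // was "3s"
    tags = {SCM, OZONE})
{code}
Since this is only a default, deployments can still override the interval through the corresponding ozone configuration key.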
[jira] [Updated] (HDDS-1706) Replication Manager thread running too frequently
[ https://issues.apache.org/jira/browse/HDDS-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1706: - Attachment: HDDS-1706.001.patch > Replication Manager thread running too frequently > - > > Key: HDDS-1706 > URL: https://issues.apache.org/jira/browse/HDDS-1706 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1706.001.patch > > > Replication manager is running too frequently at a 3s interval in place of > 300s. > {code} > host: vc1337.halxg.cloudera.com, networkLocation: /default-rack, > certSerialId: null}. > 2019-06-18 03:11:51,687 INFO > org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor > Thread took 4 milliseconds for processing 739 containers. > . > 2019-06-18 03:11:54,692 INFO > org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor > Thread took 4 milliseconds for processing 739 containers. > {code} > It is because of the following lines: > {code} > @Config(key = "thread.interval", > type = ConfigType.TIME, > defaultValue = "3s", > tags = {SCM, OZONE}, > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1706) Replication Manager thread running too frequently
[ https://issues.apache.org/jira/browse/HDDS-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi reassigned HDDS-1706: Assignee: Nilotpal Nandi > Replication Manager thread running too frequently > - > > Key: HDDS-1706 > URL: https://issues.apache.org/jira/browse/HDDS-1706 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Nilotpal Nandi >Priority: Major > > Replication manager is running too frequently at a 3s interval in place of > 300s. > {code} > host: vc1337.halxg.cloudera.com, networkLocation: /default-rack, > certSerialId: null}. > 2019-06-18 03:11:51,687 INFO > org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor > Thread took 4 milliseconds for processing 739 containers. > . > 2019-06-18 03:11:54,692 INFO > org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor > Thread took 4 milliseconds for processing 739 containers. > {code} > It is because of the following lines: > {code} > @Config(key = "thread.interval", > type = ConfigType.TIME, > defaultValue = "3s", > tags = {SCM, OZONE}, > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1497) Refactor blockade Tests
[ https://issues.apache.org/jira/browse/HDDS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850826#comment-16850826 ] Nilotpal Nandi commented on HDDS-1497: -- Thanks [~shashikant] for the review. I have addressed your comments. Here are the inline responses: 1. Please update comments for property, getter and setter functions. - done 2. cluster.py:223-224 -> incorrect comments. - done 3. clusterUtils.py:324 -> "om_1" should be "om"? - This should work fine with "om_1" too; the "om_1" string is present in the om container's name. 4. cluster_utils.py:296 -> which file checksum is it supposed to compute? Can you please update the comments? - done > Refactor blockade Tests > --- > > Key: HDDS-1497 > URL: https://issues.apache.org/jira/browse/HDDS-1497 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1497.001.patch, HDDS-1497.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1497) Refactor blockade Tests
[ https://issues.apache.org/jira/browse/HDDS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1497: - Attachment: HDDS-1497.002.patch > Refactor blockade Tests > --- > > Key: HDDS-1497 > URL: https://issues.apache.org/jira/browse/HDDS-1497 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1497.001.patch, HDDS-1497.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-1534) freon should return non-zero exit code on failure
[ https://issues.apache.org/jira/browse/HDDS-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846644#comment-16846644 ] Nilotpal Nandi edited comment on HDDS-1534 at 5/23/19 11:16 AM: Thanks [~sdeka] for the review. I have addressed your comment and uploaded a new patch. was (Author: nilotpalnandi): Thanks [~sdeka] for the review. I havve uploaded new patch > freon should return non-zero exit code on failure > - > > Key: HDDS-1534 > URL: https://issues.apache.org/jira/browse/HDDS-1534 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1534.001.patch, HDDS-1534.002.patch > > > Currently freon does not return any non-zero exit code even on failure. > The status shows as "Failed" but the exit code is always zero. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1534) freon should return non-zero exit code on failure
[ https://issues.apache.org/jira/browse/HDDS-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846644#comment-16846644 ] Nilotpal Nandi commented on HDDS-1534: -- Thanks [~sdeka] for the review. I havve uploaded new patch > freon should return non-zero exit code on failure > - > > Key: HDDS-1534 > URL: https://issues.apache.org/jira/browse/HDDS-1534 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1534.001.patch, HDDS-1534.002.patch > > > Currently freon does not return any non-zero exit code even on failure. > The status shows as "Failed" but the exit code is always zero. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1534) freon should return non-zero exit code on failure
[ https://issues.apache.org/jira/browse/HDDS-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1534: - Attachment: HDDS-1534.002.patch > freon should return non-zero exit code on failure > - > > Key: HDDS-1534 > URL: https://issues.apache.org/jira/browse/HDDS-1534 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1534.001.patch, HDDS-1534.002.patch > > > Currently freon does not return any non-zero exit code even on failure. > The status shows as "Failed" but the exit code is always zero. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1534) freon should return non-zero exit code on failure
[ https://issues.apache.org/jira/browse/HDDS-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1534: - Attachment: HDDS-1534.001.patch > freon should return non-zero exit code on failure > - > > Key: HDDS-1534 > URL: https://issues.apache.org/jira/browse/HDDS-1534 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1534.001.patch > > > Currently freon does not return any non-zero exit code even on failure. > The status shows as "Failed" but the exit code is always zero. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1534) freon should return non-zero exit code on failure
[ https://issues.apache.org/jira/browse/HDDS-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1534: - Status: Patch Available (was: Open) > freon should return non-zero exit code on failure > - > > Key: HDDS-1534 > URL: https://issues.apache.org/jira/browse/HDDS-1534 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1534.001.patch > > > Currently freon does not return any non-zero exit code even on failure. > The status shows as "Failed" but the exit code is always zero. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1497) Refactor blockade Tests
[ https://issues.apache.org/jira/browse/HDDS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1497: - Status: Patch Available (was: Open) > Refactor blockade Tests > --- > > Key: HDDS-1497 > URL: https://issues.apache.org/jira/browse/HDDS-1497 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1497.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1497) Refactor blockade Tests
[ https://issues.apache.org/jira/browse/HDDS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1497: - Attachment: HDDS-1497.001.patch > Refactor blockade Tests > --- > > Key: HDDS-1497 > URL: https://issues.apache.org/jira/browse/HDDS-1497 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1497.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1534) freon should return non-zero exit code on failure
Nilotpal Nandi created HDDS-1534: Summary: freon should return non-zero exit code on failure Key: HDDS-1534 URL: https://issues.apache.org/jira/browse/HDDS-1534 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi Assignee: Nilotpal Nandi Currently freon does not return any non-zero exit code even on failure. The status shows as "Failed" but the exit code is always zero. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
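One way the tool could surface failures to shell callers, sketched under the assumption that freon can see its failed-operation count at the end of a run; this is illustrative, not the attached patch:
{code}
// Hypothetical helper: translate a failed freon run into a non-zero exit code
// so that scripts checking $? see the failure instead of a spurious 0.
static void exitIfFailed(long failedOperations) {
  if (failedOperations > 0) {
    System.err.println("freon: " + failedOperations + " operations failed");
    System.exit(1);
  }
}
{code}
Alternatively, throwing an exception out of the picocli command's call() would let the generic CLI wrapper report the error and exit non-zero.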
[jira] [Created] (HDDS-1497) Refactor blockade Tests
Nilotpal Nandi created HDDS-1497: Summary: Refactor blockade Tests Key: HDDS-1497 URL: https://issues.apache.org/jira/browse/HDDS-1497 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi Assignee: Nilotpal Nandi -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1164) Add New blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1164: - Attachment: HDDS-1164.004.patch > Add New blockade Tests to test Replica Manager > -- > > Key: HDDS-1164 > URL: https://issues.apache.org/jira/browse/HDDS-1164 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Labels: postpone-to-craterlake > Attachments: HDDS-1164.001.patch, HDDS-1164.002.patch, > HDDS-1164.003.patch, HDDS-1164.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1067) freon run on client gets hung when two of the datanodes are down in 3 datanode cluster
[ https://issues.apache.org/jira/browse/HDDS-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805260#comment-16805260 ] Nilotpal Nandi commented on HDDS-1067: -- Thanks [~shashikant]. I have uploaded the patch > freon run on client gets hung when two of the datanodes are down in 3 > datanode cluster > -- > > Key: HDDS-1067 > URL: https://issues.apache.org/jira/browse/HDDS-1067 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1067.001.patch, stack_file.txt > > > steps taken : > > # created 3 node docker cluster. > # wrote a key > # created partition such that 2 out of 3 datanodes cannot communicate with > any other node. > # Third datanode can communicate with scm, om and the client. > # ran freon to write key > Observation : > - > freon run is hung. There is no timeout. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
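Regarding the missing timeout, a generic illustration of bounding the wait on an asynchronous write reply so that a pipeline which has lost its datanodes surfaces as an error rather than an indefinite hang; this is only the general pattern, not the Ozone/Ratis client fix:
{code}
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Wait for an async reply, but give up after a configurable deadline instead
// of blocking forever on an unreachable pipeline.
static <T> T awaitReply(CompletableFuture<T> reply, long timeoutSeconds)
    throws IOException, InterruptedException {
  try {
    return reply.get(timeoutSeconds, TimeUnit.SECONDS);
  } catch (TimeoutException e) {
    reply.cancel(true);
    throw new IOException(
        "No reply from pipeline within " + timeoutSeconds + "s", e);
  } catch (ExecutionException e) {
    throw new IOException(e.getCause());
  }
}
{code}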
[jira] [Updated] (HDDS-1067) freon run on client gets hung when two of the datanodes are down in 3 datanode cluster
[ https://issues.apache.org/jira/browse/HDDS-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1067: - Status: Patch Available (was: Open) > freon run on client gets hung when two of the datanodes are down in 3 > datanode cluster > -- > > Key: HDDS-1067 > URL: https://issues.apache.org/jira/browse/HDDS-1067 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1067.001.patch, stack_file.txt > > > steps taken : > > # created 3 node docker cluster. > # wrote a key > # created partition such that 2 out of 3 datanodes cannot communicate with > any other node. > # Third datanode can communicate with scm, om and the client. > # ran freon to write key > Observation : > - > freon run is hung. There is no timeout. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1067) freon run on client gets hung when two of the datanodes are down in 3 datanode cluster
[ https://issues.apache.org/jira/browse/HDDS-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1067: - Attachment: HDDS-1067.001.patch > freon run on client gets hung when two of the datanodes are down in 3 > datanode cluster > -- > > Key: HDDS-1067 > URL: https://issues.apache.org/jira/browse/HDDS-1067 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1067.001.patch, stack_file.txt > > > steps taken : > > # created 3 node docker cluster. > # wrote a key > # created partition such that 2 out of 3 datanodes cannot communicate with > any other node. > # Third datanode can communicate with scm, om and the client. > # ran freon to write key > Observation : > - > freon run is hung. There is no timeout. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1298) blockade tests failing as the nodes are not able to communicate with Ozone Manager
[ https://issues.apache.org/jira/browse/HDDS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi reassigned HDDS-1298: Assignee: Nilotpal Nandi > blockade tests failing as the nodes are not able to communicate with Ozone > Manager > -- > > Key: HDDS-1298 > URL: https://issues.apache.org/jira/browse/HDDS-1298 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Critical > Attachments: alllogs.log > > > steps taken: > > # started 3 datanodes docker cluster. > # freon run fails with error : "No such service: ozoneManager" > > {noformat} > om_1 | STARTUP_MSG: build = https://github.com/apache/hadoop.git -r > e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on > 2019-01-15T17:34Z > om_1 | STARTUP_MSG: java = 11.0.1 > om_1 | / > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:51 - registered UNIX signal > handlers for [TERM, HUP, INT] > om_1 | 2019-03-18 06:31:41 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:484 - OM Service ID is not set. > Setting it to the default ID: omServiceIdDefault > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:490 - OM Node ID is not set. > Setting it to the OmStorage's OmID: 25501758-f7f6-42d5-8196-52a885af7e23 > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:441 - Found matching OM address > with OMServiceId: null, OMNodeId: null, RPC Address: om:9862 and Ratis port: > 9872 > om_1 | 2019-03-18 06:31:42 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 | 2019-03-18 06:31:42 INFO log:192 - Logging initialized @4061ms > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: userTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:userTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: volumeTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:volumeTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: bucketTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:bucketTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: keyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:keyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: deletedTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:deletedTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: openKeyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:openKeyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: s3Table > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:s3Table > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: multipartInfoTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > 
profile:DBProfile.DISK for Table:multipartInfoTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: s3SecretTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:s3SecretTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: default > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:158 - Using default column > profile:DBProfile.DISK for Table:default > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:189 - Using default options. > DBProfile.DISK > om_1 | 2019-03-18 06:31:42 INFO CallQueueManager:84 - Using callQueue: class > java.util.concurrent.LinkedBlockingQueue, queueCapacity: 2000, scheduler: > class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false. > om_1 | 2019-03-18 06:31:42 INFO Server:1074 - Starting Socket Reader #1 for > port 9862 > om_1 | 2019-03-18 06:31:43 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 | 2019-03-18
[jira] [Resolved] (HDDS-1298) blockade tests failing as the nodes are not able to communicate with Ozone Manager
[ https://issues.apache.org/jira/browse/HDDS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi resolved HDDS-1298. -- Resolution: Duplicate > blockade tests failing as the nodes are not able to communicate with Ozone > Manager > -- > > Key: HDDS-1298 > URL: https://issues.apache.org/jira/browse/HDDS-1298 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Critical > Attachments: alllogs.log > > > steps taken: > > # started 3 datanodes docker cluster. > # freon run fails with error : "No such service: ozoneManager" > > {noformat} > om_1 | STARTUP_MSG: build = https://github.com/apache/hadoop.git -r > e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on > 2019-01-15T17:34Z > om_1 | STARTUP_MSG: java = 11.0.1 > om_1 | / > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:51 - registered UNIX signal > handlers for [TERM, HUP, INT] > om_1 | 2019-03-18 06:31:41 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:484 - OM Service ID is not set. > Setting it to the default ID: omServiceIdDefault > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:490 - OM Node ID is not set. > Setting it to the OmStorage's OmID: 25501758-f7f6-42d5-8196-52a885af7e23 > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:441 - Found matching OM address > with OMServiceId: null, OMNodeId: null, RPC Address: om:9862 and Ratis port: > 9872 > om_1 | 2019-03-18 06:31:42 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 | 2019-03-18 06:31:42 INFO log:192 - Logging initialized @4061ms > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: userTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:userTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: volumeTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:volumeTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: bucketTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:bucketTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: keyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:keyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: deletedTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:deletedTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: openKeyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:openKeyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: s3Table > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:s3Table > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: multipartInfoTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > 
profile:DBProfile.DISK for Table:multipartInfoTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: s3SecretTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:s3SecretTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: default > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:158 - Using default column > profile:DBProfile.DISK for Table:default > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:189 - Using default options. > DBProfile.DISK > om_1 | 2019-03-18 06:31:42 INFO CallQueueManager:84 - Using callQueue: class > java.util.concurrent.LinkedBlockingQueue, queueCapacity: 2000, scheduler: > class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false. > om_1 | 2019-03-18 06:31:42 INFO Server:1074 - Starting Socket Reader #1 for > port 9862 > om_1 | 2019-03-18 06:31:43 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 | 2019-03-18 06:31:43
[jira] [Updated] (HDDS-1164) Add New blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1164: - Attachment: HDDS-1164.003.patch > Add New blockade Tests to test Replica Manager > -- > > Key: HDDS-1164 > URL: https://issues.apache.org/jira/browse/HDDS-1164 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Labels: postpone-to-craterlake > Attachments: HDDS-1164.001.patch, HDDS-1164.002.patch, > HDDS-1164.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1067) freon run on client gets hung when two of the datanodes are down in 3 datanode cluster
[ https://issues.apache.org/jira/browse/HDDS-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi reassigned HDDS-1067: Assignee: Nilotpal Nandi (was: Shashikant Banerjee) > freon run on client gets hung when two of the datanodes are down in 3 > datanode cluster > -- > > Key: HDDS-1067 > URL: https://issues.apache.org/jira/browse/HDDS-1067 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: stack_file.txt > > > steps taken : > > # created 3 node docker cluster. > # wrote a key > # created partition such that 2 out of 3 datanodes cannot communicate with > any other node. > # Third datanode can communicate with scm, om and the client. > # ran freon to write key > Observation : > - > freon run is hung. There is no timeout. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1164) Add New blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1164: - Attachment: HDDS-1164.002.patch > Add New blockade Tests to test Replica Manager > -- > > Key: HDDS-1164 > URL: https://issues.apache.org/jira/browse/HDDS-1164 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Labels: postpone-to-craterlake > Attachments: HDDS-1164.001.patch, HDDS-1164.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1338) ozone shell commands are throwing InvocationTargetException
Nilotpal Nandi created HDDS-1338: Summary: ozone shell commands are throwing InvocationTargetException Key: HDDS-1338 URL: https://issues.apache.org/jira/browse/HDDS-1338 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi ozone version {noformat} Source code repository g...@github.com:hortonworks/ozone.git -r 310ebf5dc83b6c9e68d09246ed6c6f7cf6370fde Compiled by jenkins on 2019-03-21T22:06Z Compiled with protoc 2.5.0 >From source with checksum 9c367143ad43b81ca84bfdaafd1c3f Using HDDS 0.4.0.3.0.100.0-388 Source code repository g...@github.com:hortonworks/ozone.git -r 310ebf5dc83b6c9e68d09246ed6c6f7cf6370fde Compiled by jenkins on 2019-03-21T22:06Z Compiled with protoc 2.5.0 >From source with checksum f3297cbd3a5f59fb4e5fd551afa05ba9 {noformat} Here is the ozone volume create failure output : {noformat} hdfs@ctr-e139-1542663976389-91321-01-02 ~]$ ozone sh volume create testvolume11 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/hdp/3.0.100.0-388/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/3.0.100.0-388/hadoop-ozone/share/ozone/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/26 17:31:37 ERROR client.OzoneClientFactory: Couldn't create protocol class org.apache.hadoop.ozone.client.rpc.RpcClient exception: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291) at org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169) at org.apache.hadoop.ozone.web.ozShell.OzoneAddress.createClient(OzoneAddress.java:111) at org.apache.hadoop.ozone.web.ozShell.volume.CreateVolumeHandler.call(CreateVolumeHandler.java:70) at org.apache.hadoop.ozone.web.ozShell.volume.CreateVolumeHandler.call(CreateVolumeHandler.java:38) at picocli.CommandLine.execute(CommandLine.java:919) at picocli.CommandLine.access$700(CommandLine.java:104) at picocli.CommandLine$RunLast.handle(CommandLine.java:1083) at picocli.CommandLine$RunLast.handle(CommandLine.java:1051) at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959) at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242) at picocli.CommandLine.parseWithHandler(CommandLine.java:1181) at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61) at org.apache.hadoop.ozone.web.ozShell.Shell.execute(Shell.java:82) at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52) at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:93) Caused by: java.lang.VerifyError: Cannot inherit from final class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at 
java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.(OzoneManagerProtocolClientSideTranslatorPB.java:169) at org.apache.hadoop.ozone.client.rpc.RpcClient.(RpcClient.java:142) ... 20 more Couldn't create protocol class org.apache.hadoop.ozone.client.rpc.RpcClient {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1326) putkey operation failed with java.lang.ArrayIndexOutOfBoundsException
Nilotpal Nandi created HDDS-1326: Summary: putkey operation failed with java.lang.ArrayIndexOutOfBoundsException Key: HDDS-1326 URL: https://issues.apache.org/jira/browse/HDDS-1326 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi steps taken : --- # trying to write key in 40 node cluster. # write failed. client output --- {noformat} e530-491c-ab03-3b1c34d1a751:c80390, 974a806d-bf7d-4f1b-adb4-d51d802d368a:c80390, 469bd8c4-5da2-43bb-bc4b-7edd884931e5:c80390] 2019-03-22 10:56:19,592 [main] WARN - Encountered exception {} java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.util.concurrent.CompletionException: org.apache.ratis.protocol.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 5d3eb91f-e530-491c-ab03-3b1c34d1a751: Container 1269 in CLOSED state at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:511) at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:144) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:565) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:329) at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:273) at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96) at org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:111) at org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:53) at picocli.CommandLine.execute(CommandLine.java:919) at picocli.CommandLine.access$700(CommandLine.java:104) at picocli.CommandLine$RunLast.handle(CommandLine.java:1083) at picocli.CommandLine$RunLast.handle(CommandLine.java:1051) at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959) at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242) at picocli.CommandLine.parseWithHandler(CommandLine.java:1181) at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61) at org.apache.hadoop.ozone.web.ozShell.Shell.execute(Shell.java:82) at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52) at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:93) Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.CompletionException: org.apache.ratis.protocol.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 5d3eb91f-e530-491c-ab03-3b1c34d1a751: Container 1269 in CLOSED state at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:529) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496) ... 
19 more Caused by: java.util.concurrent.CompletionException: org.apache.ratis.protocol.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 5d3eb91f-e530-491c-ab03-3b1c34d1a751: Container 1269 in CLOSED state at org.apache.ratis.client.impl.RaftClientImpl.handleStateMachineException(RaftClientImpl.java:402) at org.apache.ratis.client.impl.RaftClientImpl.lambda$sendAsync$3(RaftClientImpl.java:198) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) at org.apache.ratis.client.impl.RaftClientImpl$PendingAsyncRequest.setReply(RaftClientImpl.java:95) at org.apache.ratis.client.impl.RaftClientImpl$PendingAsyncRequest.setReply(RaftClientImpl.java:75) at org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:127) at org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:279) at org.apache.ratis.client.impl.RaftClientImpl.lambda$sendRequestAsync$13(RaftClientImpl.java:344) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) at
[jira] [Created] (HDDS-1325) Exception thrown while initializing ozoneClientAdapter
Nilotpal Nandi created HDDS-1325: Summary: Exception thrown while initializing ozoneClientAdapter Key: HDDS-1325 URL: https://issues.apache.org/jira/browse/HDDS-1325 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi ozone version : {noformat} Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r 568d3ab8b65d1348dec9c971feffe200e6cba2ef Compiled by nnandi on 2019-03-19T03:54Z Compiled with protoc 2.5.0 >From source with checksum c44d339e20094d3054754078afbf4c Using HDDS 0.5.0-SNAPSHOT Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r 568d3ab8b65d1348dec9c971feffe200e6cba2ef Compiled by nnandi on 2019-03-19T03:53Z Compiled with protoc 2.5.0 >From source with checksum b354934fb1352f4d5425114bf8dce11 {noformat} steps taken : --- # Add ozone libs in hadoop classpath. # Tried to run s3dupdo workload ([https://github.com/t3rmin4t0r/s3dupdo]) Here is the exception thrown : {noformat} java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.lambda$createAdapter$1(OzoneClientAdapterFactory.java:65) at org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:105) at org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:61) at org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(OzoneFileSystem.java:167) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352) at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3326) at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:532) at org.notmysock.repl.Works$CopyWorker.run(Works.java:243) at org.notmysock.repl.Works$CopyWorker.call(Works.java:279) at org.notmysock.repl.Works$CopyWorker.call(Works.java:204) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/hadoop/fs/ozone/FilteredClassLoader) previously initiated loading for a different type with name "org/apache/hadoop/security/token/Token" at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.fs.ozone.FilteredClassLoader.loadClass(FilteredClassLoader.java:71) at 
java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) at java.lang.Class.privateGetPublicMethods(Class.java:2902) at java.lang.Class.getMethods(Class.java:1615) at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:451) at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:339) at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:639) at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:557) at java.lang.reflect.WeakCache$Factory.get(WeakCache.java:230) at java.lang.reflect.WeakCache.get(WeakCache.java:127) at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:419) at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:719) at org.apache.hadoop.ozone.client.OzoneClientFactory.getClient(OzoneClientFactory.java:264) at org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169) at org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:140) at org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:104) at org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:75) ... 20 more{noformat}
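The loader constraint violation above indicates that org.apache.hadoop.security.token.Token appears to be visible both from the application classpath and from Ozone's FilteredClassLoader once the ozone libs are added to the hadoop classpath. For context, a minimal sketch of the kind of setup that exercises this path; the jar location and o3fs URI below are placeholders, not taken from the report:
{noformat}
# sketch only - jar location and o3fs authority are placeholders
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/path/to/hadoop-ozone-filesystem-lib.jar"

# Any client that initializes OzoneFileSystem goes through OzoneClientAdapterFactory
# (and hence FilteredClassLoader), e.g. a plain listing against an o3fs URI:
hadoop fs -ls o3fs://bucket.volume/
{noformat}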
[jira] [Commented] (HDDS-1298) blockade tests failing as the nodes are not able to communicate with Ozone Manager
[ https://issues.apache.org/jira/browse/HDDS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794772#comment-16794772 ] Nilotpal Nandi commented on HDDS-1298: -- logs of all docker nodes: [^alllogs.log] > blockade tests failing as the nodes are not able to communicate with Ozone > Manager > -- > > Key: HDDS-1298 > URL: https://issues.apache.org/jira/browse/HDDS-1298 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Priority: Critical > Attachments: alllogs.log > > > steps taken: > > # started 3 datanodes docker cluster. > # freon run fails with error : "No such service: ozoneManager" > > {noformat} > om_1 | STARTUP_MSG: build = https://github.com/apache/hadoop.git -r > e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on > 2019-01-15T17:34Z > om_1 | STARTUP_MSG: java = 11.0.1 > om_1 | / > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:51 - registered UNIX signal > handlers for [TERM, HUP, INT] > om_1 | 2019-03-18 06:31:41 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:484 - OM Service ID is not set. > Setting it to the default ID: omServiceIdDefault > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:490 - OM Node ID is not set. > Setting it to the OmStorage's OmID: 25501758-f7f6-42d5-8196-52a885af7e23 > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:441 - Found matching OM address > with OMServiceId: null, OMNodeId: null, RPC Address: om:9862 and Ratis port: > 9872 > om_1 | 2019-03-18 06:31:42 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 | 2019-03-18 06:31:42 INFO log:192 - Logging initialized @4061ms > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: userTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:userTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: volumeTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:volumeTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: bucketTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:bucketTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: keyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:keyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: deletedTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:deletedTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: openKeyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:openKeyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: s3Table > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:s3Table > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: multipartInfoTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - 
Using default column > profile:DBProfile.DISK for Table:multipartInfoTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: s3SecretTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:s3SecretTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: default > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:158 - Using default column > profile:DBProfile.DISK for Table:default > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:189 - Using default options. > DBProfile.DISK > om_1 | 2019-03-18 06:31:42 INFO CallQueueManager:84 - Using callQueue: class > java.util.concurrent.LinkedBlockingQueue, queueCapacity: 2000, scheduler: > class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false. > om_1 | 2019-03-18 06:31:42 INFO Server:1074 - Starting Socket Reader #1 for > port 9862 > om_1 | 2019-03-18 06:31:43 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 |
[jira] [Updated] (HDDS-1298) blockade tests failing as the nodes are not able to communicate with Ozone Manager
[ https://issues.apache.org/jira/browse/HDDS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1298: - Attachment: alllogs.log > blockade tests failing as the nodes are not able to communicate with Ozone > Manager > -- > > Key: HDDS-1298 > URL: https://issues.apache.org/jira/browse/HDDS-1298 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Priority: Critical > Attachments: alllogs.log > > > steps taken: > > # started 3 datanodes docker cluster. > # freon run fails with error : "No such service: ozoneManager" > > {noformat} > om_1 | STARTUP_MSG: build = https://github.com/apache/hadoop.git -r > e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on > 2019-01-15T17:34Z > om_1 | STARTUP_MSG: java = 11.0.1 > om_1 | / > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:51 - registered UNIX signal > handlers for [TERM, HUP, INT] > om_1 | 2019-03-18 06:31:41 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:484 - OM Service ID is not set. > Setting it to the default ID: omServiceIdDefault > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:490 - OM Node ID is not set. > Setting it to the OmStorage's OmID: 25501758-f7f6-42d5-8196-52a885af7e23 > om_1 | 2019-03-18 06:31:41 INFO OzoneManager:441 - Found matching OM address > with OMServiceId: null, OMNodeId: null, RPC Address: om:9862 and Ratis port: > 9872 > om_1 | 2019-03-18 06:31:42 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 | 2019-03-18 06:31:42 INFO log:192 - Logging initialized @4061ms > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: userTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:userTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: volumeTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:volumeTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: bucketTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:bucketTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: keyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:keyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: deletedTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:deletedTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: openKeyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:openKeyTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: s3Table > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:s3Table > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: multipartInfoTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for 
Table:multipartInfoTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: s3SecretTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column > profile:DBProfile.DISK for Table:s3SecretTable > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for > table: default > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:158 - Using default column > profile:DBProfile.DISK for Table:default > om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:189 - Using default options. > DBProfile.DISK > om_1 | 2019-03-18 06:31:42 INFO CallQueueManager:84 - Using callQueue: class > java.util.concurrent.LinkedBlockingQueue, queueCapacity: 2000, scheduler: > class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false. > om_1 | 2019-03-18 06:31:42 INFO Server:1074 - Starting Socket Reader #1 for > port 9862 > om_1 | 2019-03-18 06:31:43 WARN ScmUtils:77 - ozone.om.db.dirs is not > configured. We recommend adding this setting. Falling back to > ozone.metadata.dirs instead. > om_1 | 2019-03-18 06:31:43 INFO OzoneManager:1129 - OzoneManager
[jira] [Created] (HDDS-1298) blockade tests failing as the nodes are not able to communicate with Ozone Manager
Nilotpal Nandi created HDDS-1298: Summary: blockade tests failing as the nodes are not able to communicate with Ozone Manager Key: HDDS-1298 URL: https://issues.apache.org/jira/browse/HDDS-1298 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi steps taken: # started 3 datanodes docker cluster. # freon run fails with error : "No such service: ozoneManager" {noformat} om_1 | STARTUP_MSG: build = https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on 2019-01-15T17:34Z om_1 | STARTUP_MSG: java = 11.0.1 om_1 | / om_1 | 2019-03-18 06:31:41 INFO OzoneManager:51 - registered UNIX signal handlers for [TERM, HUP, INT] om_1 | 2019-03-18 06:31:41 WARN ScmUtils:77 - ozone.om.db.dirs is not configured. We recommend adding this setting. Falling back to ozone.metadata.dirs instead. om_1 | 2019-03-18 06:31:41 INFO OzoneManager:484 - OM Service ID is not set. Setting it to the default ID: omServiceIdDefault om_1 | 2019-03-18 06:31:41 INFO OzoneManager:490 - OM Node ID is not set. Setting it to the OmStorage's OmID: 25501758-f7f6-42d5-8196-52a885af7e23 om_1 | 2019-03-18 06:31:41 INFO OzoneManager:441 - Found matching OM address with OMServiceId: null, OMNodeId: null, RPC Address: om:9862 and Ratis port: 9872 om_1 | 2019-03-18 06:31:42 WARN ScmUtils:77 - ozone.om.db.dirs is not configured. We recommend adding this setting. Falling back to ozone.metadata.dirs instead. om_1 | 2019-03-18 06:31:42 INFO log:192 - Logging initialized @4061ms om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for table: userTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column profile:DBProfile.DISK for Table:userTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for table: volumeTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column profile:DBProfile.DISK for Table:volumeTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for table: bucketTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column profile:DBProfile.DISK for Table:bucketTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for table: keyTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column profile:DBProfile.DISK for Table:keyTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for table: deletedTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column profile:DBProfile.DISK for Table:deletedTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for table: openKeyTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column profile:DBProfile.DISK for Table:openKeyTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for table: s3Table om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column profile:DBProfile.DISK for Table:s3Table om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for table: multipartInfoTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column profile:DBProfile.DISK for Table:multipartInfoTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for table: s3SecretTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:152 - Using default column profile:DBProfile.DISK for Table:s3SecretTable om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:101 - using custom profile for table: 
default om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:158 - Using default column profile:DBProfile.DISK for Table:default om_1 | 2019-03-18 06:31:42 INFO DBStoreBuilder:189 - Using default options. DBProfile.DISK om_1 | 2019-03-18 06:31:42 INFO CallQueueManager:84 - Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 2000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false. om_1 | 2019-03-18 06:31:42 INFO Server:1074 - Starting Socket Reader #1 for port 9862 om_1 | 2019-03-18 06:31:43 WARN ScmUtils:77 - ozone.om.db.dirs is not configured. We recommend adding this setting. Falling back to ozone.metadata.dirs instead. om_1 | 2019-03-18 06:31:43 INFO OzoneManager:1129 - OzoneManager RPC server is listening at om/172.21.0.3:9862 om_1 | 2019-03-18 06:31:43 INFO MetricsConfig:118 - Loaded properties from hadoop-metrics2.properties om_1 | 2019-03-18 06:31:43 INFO MetricsSystemImpl:374 - Scheduled Metric snapshot period at 10 second(s). om_1 | 2019-03-18 06:31:43 INFO MetricsSystemImpl:191 - OzoneManager metrics system started om_1 | 2019-03-18 06:31:43 INFO Server:1314 - IPC Server Responder: starting om_1 | 2019-03-18
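For reference, a hedged sketch of the failing flow; the compose invocation, service names and freon options are assumptions rather than details from the report. The "No such service: ozoneManager" message reads like a docker-compose exec against a service named ozoneManager, while the OM service in the logs above appears as "om":
{noformat}
# sketch only - service names and freon options are assumptions
docker-compose up -d --scale datanode=3                    # 3-datanode cluster plus om and scm services
docker-compose exec ozoneManager ozone freon randomkeys    # fails with "No such service: ozoneManager"
docker-compose exec om ozone freon randomkeys              # the OM service is named "om_1" in the logs above
{noformat}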
[jira] [Comment Edited] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793727#comment-16793727 ] Nilotpal Nandi edited comment on HDDS-1088 at 3/15/19 3:44 PM: --- Thanks [~shashikant]. Please note that there is some existing issue due to which pylint is throwing error. Need to be resolved later. was (Author: nilotpalnandi): Thanks [~shashikant]. Please note that there is some existing issue due to which pylint is throwing error. Neeed to resolved later. > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, > HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, > HDDS-1088.006.patch, HDDS-1088.007.patch, HDDS-1088.008.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793727#comment-16793727 ] Nilotpal Nandi commented on HDDS-1088: -- Thanks [~shashikant]. Please note that there is some existing issue due to which pylint is throwing error. Neeed to resolved later. > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, > HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, > HDDS-1088.006.patch, HDDS-1088.007.patch, HDDS-1088.008.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
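For illustration, a hedged sketch of the kind of blockade-driven scenario these tests automate; the container names and exact sequence are assumptions, not taken from the patches:
{noformat}
# sketch only - container names are placeholders
blockade up                  # bring up the dockerized ozone cluster defined in blockade.yml
blockade kill datanode_1     # simulate loss of a node; its containers become under-replicated
blockade status              # replica manager should schedule re-replication on the remaining nodes
blockade start datanode_1    # bring the node back and verify replica counts converge again
{noformat}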
[jira] [Updated] (HDDS-1289) get Key failed on SCM restart
[ https://issues.apache.org/jira/browse/HDDS-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1289: - Description: Seeing ContainerNotFoundException in scm log when get key operation tried after scm restart. scm.log: [^hadoop-hdfs-scm-ctr-e139-1542663976389-86524-01-03.log] {noformat} ozone version : Source code repository g...@github.com:hortonworks/ozone.git -r 67b7c4fd071b3f557bdb54be2a266b8a611cbce6 Compiled by jenkins on 2019-03-06T22:02Z Compiled with protoc 2.5.0 >From source with checksum 65be9a337d178cd3855f5c5a2f111 Using HDDS 0.4.0.3.0.100.0-348 Source code repository g...@github.com:hortonworks/ozone.git -r 67b7c4fd071b3f557bdb54be2a266b8a611cbce6 Compiled by jenkins on 2019-03-06T22:01Z Compiled with protoc 2.5.0 >From source with checksum 324109cb3e8b188c1b89dc0b328c3a root@ctr-e139-1542663976389-86524-01-06 hdfs# hadoop version Hadoop 3.1.1.3.0.100.0-348 Source code repository g...@github.com:hortonworks/hadoop.git -r 484434b1c2480bdc9314a7ee1ade8a0f4db1758f Compiled by jenkins on 2019-03-06T22:14Z Compiled with protoc 2.5.0 >From source with checksum ba6aad94c14256ef3ad8634e3b5086 This command was run using /usr/hdp/3.0.100.0-348/hadoop/hadoop-common-3.1.1.3.0.100.0-348.jar {noformat} {noformat} 2019-03-13 17:00:54,348 ERROR container.ContainerReportHandler (ContainerReportHandler.java:processContainerReplicas(173)) - Received container report for an unknown container 22 from datanode 80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: ctr-e139-1542663976389-86524-01-05.hwx.site} {} org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #22 at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543) at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230) at org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565) at org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393) at org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51) at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:00:54,349 ERROR container.ContainerReportHandler (ContainerReportHandler.java:processContainerReplicas(173)) - Received container report for an unknown container 23 from datanode 80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: ctr-e139-1542663976389-86524-01-05.hwx.site} {} org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #23 at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543) at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230) at 
org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565) at org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393) at org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51) at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:01:24,230 ERROR container.ContainerReportHandler (ContainerReportHandler.java:processContainerReplicas(173)) - Received container report for an unknown container 22 from datanode 076fd0d8-ab5f-4fbe-ad10-b71a1ccb19bf{ip: 172.27.39.88, host: ctr-e139-1542663976389-86524-01-04.hwx.site} {}
[jira] [Created] (HDDS-1289) get Key failed on SCM restart
Nilotpal Nandi created HDDS-1289: Summary: get Key failed on SCM restart Key: HDDS-1289 URL: https://issues.apache.org/jira/browse/HDDS-1289 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi Attachments: hadoop-hdfs-scm-ctr-e139-1542663976389-86524-01-03.log Seeing ContainerNotFoundException in scm log when get key operation tried after scm restart. scm.log: [^hadoop-hdfs-scm-ctr-e139-1542663976389-86524-01-03.log] {noformat} 2019-03-13 17:00:54,348 ERROR container.ContainerReportHandler (ContainerReportHandler.java:processContainerReplicas(173)) - Received container report for an unknown container 22 from datanode 80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: ctr-e139-1542663976389-86524-01-05.hwx.site} {} org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #22 at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543) at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230) at org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565) at org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393) at org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51) at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:00:54,349 ERROR container.ContainerReportHandler (ContainerReportHandler.java:processContainerReplicas(173)) - Received container report for an unknown container 23 from datanode 80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: ctr-e139-1542663976389-86524-01-05.hwx.site} {} org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #23 at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543) at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230) at org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565) at org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393) at org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51) at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:01:24,230 ERROR container.ContainerReportHandler (ContainerReportHandler.java:processContainerReplicas(173)) - Received container report for an unknown container 22 from datanode 076fd0d8-ab5f-4fbe-ad10-b71a1ccb19bf{ip: 172.27.39.88, host: ctr-e139-1542663976389-86524-01-04.hwx.site} {} org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #22 at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543) at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230) at org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565) at org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393) at org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74) at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159) at
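A hedged outline of the reproduction flow implied by the description; volume, bucket, key and file names are placeholders:
{noformat}
# sketch only - names and paths are placeholders
ozone sh key put vol1/bucket1/key1 /tmp/localfile      # write a key before the restart
# restart the SCM service, then try to read the key back
ozone sh key get vol1/bucket1/key1 /tmp/localfile.out  # get fails; SCM logs the ContainerNotFoundException above
{noformat}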
[jira] [Updated] (HDDS-1289) get Key failed on SCM restart
[ https://issues.apache.org/jira/browse/HDDS-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1289: - Component/s: SCM > get Key failed on SCM restart > - > > Key: HDDS-1289 > URL: https://issues.apache.org/jira/browse/HDDS-1289 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Nilotpal Nandi >Priority: Critical > Attachments: > hadoop-hdfs-scm-ctr-e139-1542663976389-86524-01-03.log > > > Seeing ContainerNotFoundException in scm log when get key operation tried > after scm restart. > scm.log: > [^hadoop-hdfs-scm-ctr-e139-1542663976389-86524-01-03.log] > > > {noformat} > 2019-03-13 17:00:54,348 ERROR container.ContainerReportHandler > (ContainerReportHandler.java:processContainerReplicas(173)) - Received > container report for an unknown container 22 from datanode > 80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: > ctr-e139-1542663976389-86524-01-05.hwx.site} {} > org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #22 at > org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543) > at > org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230) > at > org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565) > at > org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393) > at > org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51) > at > org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:00:54,349 ERROR > container.ContainerReportHandler > (ContainerReportHandler.java:processContainerReplicas(173)) - Received > container report for an unknown container 23 from datanode > 80f046cb-6fe2-4a05-bb67-9bf46f48723b{ip: 172.27.69.155, host: > ctr-e139-1542663976389-86524-01-05.hwx.site} {} > org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #23 at > org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543) > at > org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230) > at > org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565) > at > org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerReplica(SCMContainerManager.java:393) > at > org.apache.hadoop.hdds.scm.container.ReportHandlerHelper.processContainerReplica(ReportHandlerHelper.java:74) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:159) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:110) > at > 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51) > at > org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) 2019-03-13 17:01:24,230 ERROR > container.ContainerReportHandler > (ContainerReportHandler.java:processContainerReplicas(173)) - Received > container report for an unknown container 22 from datanode > 076fd0d8-ab5f-4fbe-ad10-b71a1ccb19bf{ip: 172.27.39.88, host: > ctr-e139-1542663976389-86524-01-04.hwx.site} {} > org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: #22 at > org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:543) > at > org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.updateContainerReplica(ContainerStateMap.java:230) > at > org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerReplica(ContainerStateManager.java:565) > at >
[jira] [Created] (HDDS-1290) ozone.log is not getting created in logs directory
Nilotpal Nandi created HDDS-1290: Summary: ozone.log is not getting created in logs directory Key: HDDS-1290 URL: https://issues.apache.org/jira/browse/HDDS-1290 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Reporter: Nilotpal Nandi ozone.log is getting created in the log directory of the client or any other nodes of ozone cluster. ozone version : Source code repository g...@github.com:hortonworks/ozone.git -r 67b7c4fd071b3f557bdb54be2a266b8a611cbce6 Compiled by jenkins on 2019-03-06T22:02Z Compiled with protoc 2.5.0 >From source with checksum 65be9a337d178cd3855f5c5a2f111 Using HDDS 0.4.0.3.0.100.0-348 Source code repository g...@github.com:hortonworks/ozone.git -r 67b7c4fd071b3f557bdb54be2a266b8a611cbce6 Compiled by jenkins on 2019-03-06T22:01Z Compiled with protoc 2.5.0 >From source with checksum 324109cb3e8b188c1b89dc0b328c3a [root@ctr-e139-1542663976389-86524-01-06 hdfs]# hadoop version Hadoop 3.1.1.3.0.100.0-348 Source code repository g...@github.com:hortonworks/hadoop.git -r 484434b1c2480bdc9314a7ee1ade8a0f4db1758f Compiled by jenkins on 2019-03-06T22:14Z Compiled with protoc 2.5.0 >From source with checksum ba6aad94c14256ef3ad8634e3b5086 This command was run using /usr/hdp/3.0.100.0-348/hadoop/hadoop-common-3.1.1.3.0.100.0-348.jar -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1290) ozone.log is not getting created in logs directory
[ https://issues.apache.org/jira/browse/HDDS-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1290: - Description: ozone.log is not getting created in the log directory of the client or any other nodes of ozone cluster. ozone version : Source code repository g...@github.com:hortonworks/ozone.git -r 67b7c4fd071b3f557bdb54be2a266b8a611cbce6 Compiled by jenkins on 2019-03-06T22:02Z Compiled with protoc 2.5.0 From source with checksum 65be9a337d178cd3855f5c5a2f111 Using HDDS 0.4.0.3.0.100.0-348 Source code repository g...@github.com:hortonworks/ozone.git -r 67b7c4fd071b3f557bdb54be2a266b8a611cbce6 Compiled by jenkins on 2019-03-06T22:01Z Compiled with protoc 2.5.0 From source with checksum 324109cb3e8b188c1b89dc0b328c3a [root@ctr-e139-1542663976389-86524-01-06 hdfs]# hadoop version Hadoop 3.1.1.3.0.100.0-348 Source code repository g...@github.com:hortonworks/hadoop.git -r 484434b1c2480bdc9314a7ee1ade8a0f4db1758f Compiled by jenkins on 2019-03-06T22:14Z Compiled with protoc 2.5.0 From source with checksum ba6aad94c14256ef3ad8634e3b5086 This command was run using /usr/hdp/3.0.100.0-348/hadoop/hadoop-common-3.1.1.3.0.100.0-348.jar was: ozone.log is getting created in the log directory of the client or any other nodes of ozone cluster. ozone version : Source code repository g...@github.com:hortonworks/ozone.git -r 67b7c4fd071b3f557bdb54be2a266b8a611cbce6 Compiled by jenkins on 2019-03-06T22:02Z Compiled with protoc 2.5.0 >From source with checksum 65be9a337d178cd3855f5c5a2f111 Using HDDS 0.4.0.3.0.100.0-348 Source code repository g...@github.com:hortonworks/ozone.git -r 67b7c4fd071b3f557bdb54be2a266b8a611cbce6 Compiled by jenkins on 2019-03-06T22:01Z Compiled with protoc 2.5.0 >From source with checksum 324109cb3e8b188c1b89dc0b328c3a [root@ctr-e139-1542663976389-86524-01-06 hdfs]# hadoop version Hadoop 3.1.1.3.0.100.0-348 Source code repository g...@github.com:hortonworks/hadoop.git -r 484434b1c2480bdc9314a7ee1ade8a0f4db1758f Compiled by jenkins on 2019-03-06T22:14Z Compiled with protoc 2.5.0 >From source with checksum ba6aad94c14256ef3ad8634e3b5086 This command was run using /usr/hdp/3.0.100.0-348/hadoop/hadoop-common-3.1.1.3.0.100.0-348.jar > ozone.log is not getting created in logs directory > -- > > Key: HDDS-1290 > URL: https://issues.apache.org/jira/browse/HDDS-1290 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Priority: Major > > ozone.log is not getting created in the log directory of the client or any > other nodes of ozone cluster. 
> ozone version : > > Source code repository g...@github.com:hortonworks/ozone.git -r > 67b7c4fd071b3f557bdb54be2a266b8a611cbce6 > Compiled by jenkins on 2019-03-06T22:02Z > Compiled with protoc 2.5.0 > From source with checksum 65be9a337d178cd3855f5c5a2f111 > Using HDDS 0.4.0.3.0.100.0-348 > Source code repository g...@github.com:hortonworks/ozone.git -r > 67b7c4fd071b3f557bdb54be2a266b8a611cbce6 > Compiled by jenkins on 2019-03-06T22:01Z > Compiled with protoc 2.5.0 > From source with checksum 324109cb3e8b188c1b89dc0b328c3a > [root@ctr-e139-1542663976389-86524-01-06 hdfs]# hadoop version > Hadoop 3.1.1.3.0.100.0-348 > Source code repository g...@github.com:hortonworks/hadoop.git -r > 484434b1c2480bdc9314a7ee1ade8a0f4db1758f > Compiled by jenkins on 2019-03-06T22:14Z > Compiled with protoc 2.5.0 > From source with checksum ba6aad94c14256ef3ad8634e3b5086 > This command was run using > /usr/hdp/3.0.100.0-348/hadoop/hadoop-common-3.1.1.3.0.100.0-348.jar -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
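ozone.log is normally produced by a dedicated log4j file appender rather than by the regular service .log/.out files, so a first check is whether the active log4j configuration references it at all. A hedged sketch follows; the config path and property names are assumptions, not the shipped defaults:
{noformat}
# sketch only - the config path is a placeholder for the log4j.properties actually in use
grep -n "ozone.log" /etc/hadoop/conf/log4j.properties
# a file appender along these (illustrative) lines is what would create the file:
#   log4j.logger.org.apache.hadoop.ozone=INFO,OZONEFILE
#   log4j.appender.OZONEFILE=org.apache.log4j.DailyRollingFileAppender
#   log4j.appender.OZONEFILE.File=${hadoop.log.dir}/ozone.log
# if no such appender is present (or hadoop.log.dir points elsewhere), ozone.log will never appear
{noformat}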
[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1088: - Attachment: HDDS-1088.008.patch > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, > HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, > HDDS-1088.006.patch, HDDS-1088.007.patch, HDDS-1088.008.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1088: - Attachment: HDDS-1088.007.patch > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, > HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, > HDDS-1088.006.patch, HDDS-1088.007.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1088: - Attachment: HDDS-1088.006.patch > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, > HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, > HDDS-1088.006.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1088: - Attachment: (was: HDDS-1088.006.patch) > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, > HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, > HDDS-1088.006.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1088: - Attachment: HDDS-1088.006.patch > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, > HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch, > HDDS-1088.006.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1088: - Attachment: HDDS-1088.005.patch > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, > HDDS-1088.003.patch, HDDS-1088.004.patch, HDDS-1088.005.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1251) all chunks are not deleted by block deletion even when all keys are deleted and all containers are closed
Nilotpal Nandi created HDDS-1251: Summary: all chunks are not deleted by block deletion even when all keys are deleted and all containers are closed Key: HDDS-1251 URL: https://issues.apache.org/jira/browse/HDDS-1251 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi steps taken : --- # created a 40-node cluster and wrote data on all datanodes. # deleted all keys from the cluster; all containers are closed. Block deletion was triggered and deleted most of the chunks from all datanodes, but it could not delete all chunks even after several days. expectation : all chunks should be deleted if there is no key present in the cluster and all containers are closed -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
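A hedged way to check the leftover chunks directly on a datanode; the data directory below is a placeholder for whatever hdds.datanode.dir points to:
{noformat}
# sketch only - /data/hdds is a placeholder for the configured hdds.datanode.dir
find /data/hdds -path "*chunks*" -type f | wc -l   # expected to drop to 0 once block deletion fully catches up
{noformat}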
[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1088: - Attachment: HDDS-1088.004.patch > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, > HDDS-1088.003.patch, HDDS-1088.004.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1088: - Attachment: HDDS-1088.003.patch > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch, > HDDS-1088.003.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782714#comment-16782714 ] Nilotpal Nandi commented on HDDS-1088: -- Thanks [~shashikant] for the review. I have addressed the changes for comment # 1. > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1206) need to handle in the client when one of the datanode disk goes out of space
Nilotpal Nandi created HDDS-1206: Summary: need to handle in the client when one of the datanode disk goes out of space Key: HDDS-1206 URL: https://issues.apache.org/jira/browse/HDDS-1206 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi Assignee: Shashikant Banerjee steps taken : # create 40 datanode cluster. # one of the datanodes has less than 5 GB space. # Started writing key of size 600MB. operation failed: Error on the client: {noformat} Fri Mar 1 09:05:28 UTC 2019 Ruuning /root/hadoop_trunk/ozone-0.4.0-SNAPSHOT/bin/ozone sh key put testvol172275910-1551431122-1/testbuck172275910-1551431122-1/test_file24 /root/test_files/test_file24 original md5sum a6de00c9284708585f5a99b0490b0b23 2019-03-01 09:05:39,142 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-03-01 09:05:39,578 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-03-01 09:05:40,368 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-03-01 09:05:40,450 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 79 creation failed at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
[jira] [Updated] (HDDS-1206) need to handle in the client when one of the datanode disk goes out of space
[ https://issues.apache.org/jira/browse/HDDS-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1206: - Component/s: Ozone Client > need to handle in the client when one of the datanode disk goes out of space > > > Key: HDDS-1206 > URL: https://issues.apache.org/jira/browse/HDDS-1206 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Assignee: Shashikant Banerjee >Priority: Major > > steps taken : > > # create 40 datanode cluster. > # one of the datanodes has less than 5 GB space. > # Started writing key of size 600MB. > operation failed: > Error on the client: > > {noformat} > Fri Mar 1 09:05:28 UTC 2019 Ruuning > /root/hadoop_trunk/ozone-0.4.0-SNAPSHOT/bin/ozone sh key put > testvol172275910-1551431122-1/testbuck172275910-1551431122-1/test_file24 > /root/test_files/test_file24 > original md5sum a6de00c9284708585f5a99b0490b0b23 > 2019-03-01 09:05:39,142 ERROR storage.BlockOutputStream: Unexpected Storage > Container Exception: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 79 creation failed > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) > at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) > at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) > at > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-03-01 09:05:39,578 ERROR storage.BlockOutputStream: Unexpected Storage > Container Exception: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 79 creation failed > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) > at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) > at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) > at > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-03-01 09:05:40,368 ERROR storage.BlockOutputStream: Unexpected Storage > Container Exception: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 79 creation failed > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) > at > 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) > at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) > at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) > at > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-03-01 09:05:40,450 ERROR storage.BlockOutputStream: Unexpected Storage > Container Exception: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 79 creation failed > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) > at >
[jira] [Updated] (HDDS-1164) Add New blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1164: - Status: Patch Available (was: Open) > Add New blockade Tests to test Replica Manager > -- > > Key: HDDS-1164 > URL: https://issues.apache.org/jira/browse/HDDS-1164 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1164.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1164) Add New blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi reassigned HDDS-1164: Assignee: Nilotpal Nandi > Add New blockade Tests to test Replica Manager > -- > > Key: HDDS-1164 > URL: https://issues.apache.org/jira/browse/HDDS-1164 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1164) Add New blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1164: - Attachment: HDDS-1164.001.patch > Add New blockade Tests to test Replica Manager > -- > > Key: HDDS-1164 > URL: https://issues.apache.org/jira/browse/HDDS-1164 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1164.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1088: - Attachment: HDDS-1088.002.patch > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch, HDDS-1088.002.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1164) Add New blockade Tests to test Replica Manager
Nilotpal Nandi created HDDS-1164: Summary: Add New blockade Tests to test Replica Manager Key: HDDS-1164 URL: https://issues.apache.org/jira/browse/HDDS-1164 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1131) destroy pipeline failed with PipelineNotFoundException
[ https://issues.apache.org/jira/browse/HDDS-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1131: - Fix Version/s: 0.4.0 > destroy pipeline failed with PipelineNotFoundException > -- > > Key: HDDS-1131 > URL: https://issues.apache.org/jira/browse/HDDS-1131 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Nilotpal Nandi >Priority: Major > Fix For: 0.4.0 > > > steps taken : > > # created 12 datanodes cluster and running workload on all the nodes > exceptions seen in scm log > > {noformat} > 2019-02-18 07:17:51,112 INFO > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: destroying > pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with > group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, > 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, > 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858] > 2019-02-18 07:17:51,112 INFO > org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close > container Event triggered for container : #40 > 2019-02-18 07:17:51,113 INFO > org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close > container Event triggered for container : #41 > 2019-02-18 07:17:51,114 INFO > org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close > container Event triggered for container : #42 > 2019-02-18 07:22:51,127 WARN > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy > failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb > dn=a40a7b01-a30b-469c-b373-9fcb20a126ed{ip: 172.27.54.212, host: > ctr-e139-1542663976389-62237-01-07.hwx.site} > 2019-02-18 07:22:51,139 WARN > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy > failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb > dn=8c77b16b-8054-49e3-b669-1ff759cfd271{ip: 172.27.23.196, host: > ctr-e139-1542663976389-62237-01-15.hwx.site} > 2019-02-18 07:22:51,149 WARN > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy > failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb > dn=943007c8-4fdd-4926-89e2-2c8c52c05073{ip: 172.27.76.72, host: > ctr-e139-1542663976389-62237-01-06.hwx.site} > 2019-02-18 07:22:51,150 ERROR > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Destroy pipeline > failed for pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with > group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, > 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, > 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858] > org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: > PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb not found > at > org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:112) > at > org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.removePipeline(PipelineStateMap.java:247) > at > org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.removePipeline(PipelineStateManager.java:90) > at > org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removePipeline(SCMPipelineManager.java:261) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.destroyPipeline(RatisPipelineUtils.java:103) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.lambda$finalizeAndDestroyPipeline$1(RatisPipelineUtils.java:133) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85) > at > 
org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104) > at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) > at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1131) destroy pipeline failed with PipelineNotFoundException
Nilotpal Nandi created HDDS-1131: Summary: destroy pipeline failed with PipelineNotFoundException Key: HDDS-1131 URL: https://issues.apache.org/jira/browse/HDDS-1131 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi steps taken : # created 12 datanodes cluster and running workload on all the nodes exceptions seen in scm log {noformat} 2019-02-18 07:17:51,112 INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: destroying pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858] 2019-02-18 07:17:51,112 INFO org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close container Event triggered for container : #40 2019-02-18 07:17:51,113 INFO org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close container Event triggered for container : #41 2019-02-18 07:17:51,114 INFO org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close container Event triggered for container : #42 2019-02-18 07:22:51,127 WARN org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb dn=a40a7b01-a30b-469c-b373-9fcb20a126ed{ip: 172.27.54.212, host: ctr-e139-1542663976389-62237-01-07.hwx.site} 2019-02-18 07:22:51,139 WARN org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb dn=8c77b16b-8054-49e3-b669-1ff759cfd271{ip: 172.27.23.196, host: ctr-e139-1542663976389-62237-01-15.hwx.site} 2019-02-18 07:22:51,149 WARN org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb dn=943007c8-4fdd-4926-89e2-2c8c52c05073{ip: 172.27.76.72, host: ctr-e139-1542663976389-62237-01-06.hwx.site} 2019-02-18 07:22:51,150 ERROR org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Destroy pipeline failed for pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858] org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb not found at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:112) at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.removePipeline(PipelineStateMap.java:247) at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.removePipeline(PipelineStateManager.java:90) at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removePipeline(SCMPipelineManager.java:261) at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.destroyPipeline(RatisPipelineUtils.java:103) at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.lambda$finalizeAndDestroyPipeline$1(RatisPipelineUtils.java:133) at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85) at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104) at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748){noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1131) destroy pipeline failed with PipelineNotFoundException
[ https://issues.apache.org/jira/browse/HDDS-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1131: - Component/s: SCM > destroy pipeline failed with PipelineNotFoundException > -- > > Key: HDDS-1131 > URL: https://issues.apache.org/jira/browse/HDDS-1131 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Nilotpal Nandi >Priority: Major > > steps taken : > > # created 12 datanodes cluster and running workload on all the nodes > exceptions seen in scm log > > {noformat} > 2019-02-18 07:17:51,112 INFO > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: destroying > pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with > group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, > 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, > 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858] > 2019-02-18 07:17:51,112 INFO > org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close > container Event triggered for container : #40 > 2019-02-18 07:17:51,113 INFO > org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close > container Event triggered for container : #41 > 2019-02-18 07:17:51,114 INFO > org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close > container Event triggered for container : #42 > 2019-02-18 07:22:51,127 WARN > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy > failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb > dn=a40a7b01-a30b-469c-b373-9fcb20a126ed{ip: 172.27.54.212, host: > ctr-e139-1542663976389-62237-01-07.hwx.site} > 2019-02-18 07:22:51,139 WARN > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy > failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb > dn=8c77b16b-8054-49e3-b669-1ff759cfd271{ip: 172.27.23.196, host: > ctr-e139-1542663976389-62237-01-15.hwx.site} > 2019-02-18 07:22:51,149 WARN > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy > failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb > dn=943007c8-4fdd-4926-89e2-2c8c52c05073{ip: 172.27.76.72, host: > ctr-e139-1542663976389-62237-01-06.hwx.site} > 2019-02-18 07:22:51,150 ERROR > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Destroy pipeline > failed for pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with > group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, > 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, > 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858] > org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: > PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb not found > at > org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:112) > at > org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.removePipeline(PipelineStateMap.java:247) > at > org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.removePipeline(PipelineStateManager.java:90) > at > org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removePipeline(SCMPipelineManager.java:261) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.destroyPipeline(RatisPipelineUtils.java:103) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.lambda$finalizeAndDestroyPipeline$1(RatisPipelineUtils.java:133) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85) > at > 
org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104) > at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) > at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
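The stack trace shows the scheduled destroy firing after the pipeline has already been removed from the SCM state map, so the second removal trips PipelineNotFoundException. One way to make the teardown tolerant of that race is to treat "already gone" as success rather than an error; the snippet below is a standalone sketch of that idea with hypothetical types, not the actual SCM code or an agreed fix.

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PipelineRegistrySketch {
  // Hypothetical stand-in for the SCM's pipeline state map.
  private final Map<String, Object> pipelines = new ConcurrentHashMap<>();

  /**
   * Removes the pipeline if it is still registered.
   * Returns true if this call removed it, false if it was already gone.
   */
  public boolean destroyPipeline(String pipelineId) {
    Object removed = pipelines.remove(pipelineId);
    if (removed == null) {
      // A concurrent or earlier destroy already cleaned this pipeline up;
      // treating that as success avoids the PipelineNotFoundException above.
      return false;
    }
    // ... tear down the Ratis group on the member datanodes here ...
    return true;
  }
}
{noformat}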
[jira] [Created] (HDDS-1126) datanode is trying to quasi-close a container which is already closed
Nilotpal Nandi created HDDS-1126: Summary: datanode is trying to qausi-close a container which is already closed Key: HDDS-1126 URL: https://issues.apache.org/jira/browse/HDDS-1126 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi steps taken : # created 12 datanodes cluster and running workload on all the nodes # running failure injection/restart on 1 datanode at a time periodically and randomly. Error seen in ozone.log : -- {noformat} 2019-02-18 06:06:32,780 [Datanode State Machine Thread - 0] DEBUG (DatanodeStateMachine.java:176) - Executing cycle Number : 30 2019-02-18 06:06:32,784 [Command processor thread] DEBUG (CloseContainerCommandHandler.java:71) - Processing Close Container command. 2019-02-18 06:06:32,785 [Datanode State Machine Thread - 0] DEBUG (DatanodeStateMachine.java:176) - Executing cycle Number : 31 2019-02-18 06:06:32,785 [Command processor thread] ERROR (CloseContainerCommandHandler.java:118) - Can't close container #37 org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Cannot quasi close container #37 while in CLOSED state. at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.quasiCloseContainer(KeyValueHandler.java:903) at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.quasiCloseContainer(ContainerController.java:93) at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler.handle(CloseContainerCommandHandler.java:110) at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93) at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:413) at java.lang.Thread.run(Thread.java:748) 2019-02-18 06:06:32,785 [Command processor thread] DEBUG (CloseContainerCommandHandler.java:71) - Processing Close Container command. 2019-02-18 06:06:32,788 [Command processor thread] DEBUG (CloseContainerCommandHandler.java:71) - Processing Close Container command. 2019-02-18 06:06:32,788 [Datanode State Machine Thread - 0] DEBUG (DatanodeStateMachine.java:176) - Executing cycle Number : 32 2019-02-18 06:06:34,430 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:06:36,608 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:06:38,876 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:06:41,084 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:06:43,297 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:06:45,469 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:06:47,684 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:06:49,958 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:06:52,124 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 
2019-02-18 06:06:54,344 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:06:56,499 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:06:58,764 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:07:00,969 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:07:02,788 [Datanode State Machine Thread - 0] DEBUG (DatanodeStateMachine.java:176) - Executing cycle Number : 33 2019-02-18 06:07:03,240 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. 2019-02-18 06:07:05,486 [main] DEBUG (OzoneClientFactory.java:287) - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol. {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
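The error above is the datanode's close-container handler asking for a quasi-close while the container is already CLOSED, i.e. a close command arriving late. A state-aware handler can simply skip that case; the following is a standalone sketch with hypothetical state names, not the actual KeyValueHandler logic.

{noformat}
public class CloseContainerSketch {
  // Hypothetical container lifecycle states for illustration.
  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

  /** Decide what a close command should do given the container's current state. */
  static void handleCloseCommand(long containerId, State current, boolean pipelineAvailable) {
    if (current == State.CLOSED) {
      // The container already reached CLOSED; a late close command
      // (like the one for container #37 above) is ignored instead of failing.
      return;
    }
    if (pipelineAvailable) {
      // Normal path: close the container through its Ratis pipeline.
    } else {
      // Pipeline is gone: quasi-close locally and let SCM reconcile replicas later.
    }
  }
}
{noformat}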
[jira] [Created] (HDDS-1125) java.lang.InterruptedException seen in datanode logs
Nilotpal Nandi created HDDS-1125: Summary: java.lang.InterruptedException seen in datanode logs Key: HDDS-1125 URL: https://issues.apache.org/jira/browse/HDDS-1125 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi steps taken : # created 12 datanodes cluster and running workload on all the nodes exception seen : - {noformat} 2019-02-15 10:16:48,713 ERROR org.apache.ratis.server.impl.LogAppender: 943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232 java.lang.InterruptedException at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) at org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:433) at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133) at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171) at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152) at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96) at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:101) at java.lang.Thread.run(Thread.java:748) 2019-02-15 10:16:48,714 ERROR org.apache.ratis.server.impl.LogAppender: GrpcLogAppender(943007c8-4fdd-4926-89e2-2c8c52c05073 -> 8c77b16b-8054-49e3-b669-1ff759cfd271) hit IOException while loading raft log org.apache.ratis.server.storage.RaftLogIOException: 943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232 at org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:440) at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133) at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171) at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152) at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96) at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:101) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.InterruptedException at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) at org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:433) ... 
6 more 2019-02-15 10:16:48,715 ERROR org.apache.ratis.server.impl.LogAppender: 943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232 java.lang.InterruptedException at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) at org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:433) at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133) at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171) at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152) at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96) at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:101) at java.lang.Thread.run(Thread.java:748) 2019-02-15 10:16:48,715 ERROR org.apache.ratis.server.impl.LogAppender: GrpcLogAppender(943007c8-4fdd-4926-89e2-2c8c52c05073 -> a40a7b01-a30b-469c-b373-9fcb20a126ed) hit IOException while loading raft log org.apache.ratis.server.storage.RaftLogIOException: 943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232 at org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:440) at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133) at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171) at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152) at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96) at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:101) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.InterruptedException at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) at org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:433) ... 6 more 2019-02-15 10:16:48,723 WARN org.apache.ratis.grpc.client.GrpcClientProtocolService: 943007c8-4fdd-4926-89e2-2c8c52c05073-5: onError:
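Here the log appender thread is blocked in CompletableFuture.get() when it gets interrupted (typically during a role change or shutdown), and the interruption is then surfaced as a RaftLogIOException. The generic Java idiom for this situation is to restore the interrupt flag and rethrow as a checked I/O error; the helper below shows that idiom in isolation and is not Ratis's actual code.

{noformat}
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public final class FutureWaitSketch {
  private FutureWaitSketch() { }

  /**
   * Waits for the future, converting interruption into an IOException while
   * preserving the thread's interrupt status for the caller.
   */
  static <T> T getChecked(CompletableFuture<T> future, String what) throws IOException {
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // keep the interrupt visible to the caller
      throw new InterruptedIOException("Interrupted while waiting for " + what);
    } catch (ExecutionException e) {
      throw new IOException("Failed waiting for " + what, e.getCause());
    }
  }
}
{noformat}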
[jira] [Created] (HDDS-1124) java.lang.IllegalStateException in datanode log
Nilotpal Nandi created HDDS-1124: Summary: java.lang.IllegalStateException exception in datanode log Key: HDDS-1124 URL: https://issues.apache.org/jira/browse/HDDS-1124 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi steps taken : # created 12 datanodes cluster and running workload on all the nodes exception seen : --- {noformat} 2019-02-15 10:15:53,355 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3036 to /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3036-3047 2019-02-15 10:15:53,367 INFO org.apache.ratis.server.impl.RaftServerImpl: 943007c8-4fdd-4926-89e2-2c8c52c05073: set configuration 3048: [a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858], old=null at 3048 2019-02-15 10:15:53,523 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048 2019-02-15 10:15:53,580 ERROR org.apache.ratis.grpc.server.GrpcLogAppender: Failed onNext serverReply { requestorId: "943007c8-4fdd-4926-89e2-2c8c52c05073" replyId: "a40a7b01-a30b-469c-b373-9fcb20a126ed" raftGroupId { id: "\001\323\357*\221,O\300\200\266\001#C\327j\333" } success: true } term: 3 nextIndex: 3049 followerCommit: 3047 java.lang.IllegalStateException: reply's next index is 3049, request's previous is term: 1 index: 3047 at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60) at org.apache.ratis.grpc.server.GrpcLogAppender.onSuccess(GrpcLogAppender.java:285) at org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNextImpl(GrpcLogAppender.java:230) at org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:215) at org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:197) at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421) at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33) at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519) at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment log-3048_3066 to index:3066 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048 to 
/data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3048-3066 2019-02-15 10:15:56,564 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3067 2019-02-15 10:16:45,420 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment log-3067_3077 to index:3077 {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1088: - Attachment: HDDS-1088.001.patch > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1088: - Status: Patch Available (was: Open) > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1088.001.patch > > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1088) Add blockade Tests to test Replica Manager
[ https://issues.apache.org/jira/browse/HDDS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi reassigned HDDS-1088: Assignee: Nilotpal Nandi > Add blockade Tests to test Replica Manager > -- > > Key: HDDS-1088 > URL: https://issues.apache.org/jira/browse/HDDS-1088 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > > We need to add tests for testing Replica Manager for scenarios like loss of > node, adding new nodes, under-replicated containers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1102) docker datanode stopped when new datanodes are added to the cluster
[ https://issues.apache.org/jira/browse/HDDS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1102: - Attachment: allnode.log > docker datanode stopped when new datanodes are added to the cluster > > > Key: HDDS-1102 > URL: https://issues.apache.org/jira/browse/HDDS-1102 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Priority: Major > Attachments: allnode.log, datanode.log > > > steps taken: > > # created 5 datanode cluster. > # shutdown 2 datanodes > # started the datanodes again. > One of the datanodes was shut down. > exception seen : > > {noformat} > 2019-02-14 07:37:26 INFO LeaderElection:230 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 got exception when requesting votes: {} > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214) > at > org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146) > at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102) > Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139) > at > org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265) > at > org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:83) > at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:187) > at > org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-02-14 07:37:26 INFO LeaderElection:46 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8: Election PASSED; received 1 response(s) > [6a0522ba-019e-4b77-ac1f-a9322cd525b8<-61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5#0:OK-t7] > and 1 exception(s); 6a0522ba-019e-4b77-ac1f-a9322cd525b8:t7, leader=null, > voted=6a0522ba-019e-4b77-ac1f-a9322cd525b8, > raftlog=6a0522ba-019e-4b77-ac1f-a9322cd525b8-SegmentedRaftLog:OPENED, conf=3: > [61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5:172.20.0.8:9858, > 6a0522ba-019e-4b77-ac1f-a9322cd525b8:172.20.0.6:9858, > 0f377918-aafa-4d8a-972a-6ead54048fba:172.20.0.3:9858], old=null > 2019-02-14 07:37:26 INFO LeaderElection:52 - 0: > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. 
> 2019-02-14 07:37:26 INFO RoleInfo:130 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: > shutdown LeaderElection > 2019-02-14 07:37:26 INFO RaftServerImpl:161 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 changes role from CANDIDATE to LEADER at > term 7 for changeToLeader > 2019-02-14 07:37:26 INFO RaftServerImpl:258 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8: change Leader from null to > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 at term 7 for becomeLeader, leader > elected after 1066ms > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.staging.catchup.gap = 1000 (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.sleep.time > = 25ms (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout > = 10s (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.watch.timeout.denomination = 1s (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.log.appender.buffer.byte-limit = 33554432 (custom) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - >
[jira] [Commented] (HDDS-1102) docker datanode stopped when new datanodes are added to the cluster
[ https://issues.apache.org/jira/browse/HDDS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770480#comment-16770480 ] Nilotpal Nandi commented on HDDS-1102: -- Here is all node logs for a different run : [^allnode.log] > docker datanode stopped when new datanodes are added to the cluster > > > Key: HDDS-1102 > URL: https://issues.apache.org/jira/browse/HDDS-1102 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Priority: Major > Attachments: allnode.log, datanode.log > > > steps taken: > > # created 5 datanode cluster. > # shutdown 2 datanodes > # started the datanodes again. > One of the datanodes was shut down. > exception seen : > > {noformat} > 2019-02-14 07:37:26 INFO LeaderElection:230 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 got exception when requesting votes: {} > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214) > at > org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146) > at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102) > Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139) > at > org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265) > at > org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:83) > at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:187) > at > org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-02-14 07:37:26 INFO LeaderElection:46 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8: Election PASSED; received 1 response(s) > [6a0522ba-019e-4b77-ac1f-a9322cd525b8<-61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5#0:OK-t7] > and 1 exception(s); 6a0522ba-019e-4b77-ac1f-a9322cd525b8:t7, leader=null, > voted=6a0522ba-019e-4b77-ac1f-a9322cd525b8, > raftlog=6a0522ba-019e-4b77-ac1f-a9322cd525b8-SegmentedRaftLog:OPENED, conf=3: > [61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5:172.20.0.8:9858, > 6a0522ba-019e-4b77-ac1f-a9322cd525b8:172.20.0.6:9858, > 0f377918-aafa-4d8a-972a-6ead54048fba:172.20.0.3:9858], old=null > 2019-02-14 07:37:26 INFO LeaderElection:52 - 0: > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not 
found. > 2019-02-14 07:37:26 INFO RoleInfo:130 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: > shutdown LeaderElection > 2019-02-14 07:37:26 INFO RaftServerImpl:161 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 changes role from CANDIDATE to LEADER at > term 7 for changeToLeader > 2019-02-14 07:37:26 INFO RaftServerImpl:258 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8: change Leader from null to > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 at term 7 for becomeLeader, leader > elected after 1066ms > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.staging.catchup.gap = 1000 (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.sleep.time > = 25ms (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout > = 10s (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.watch.timeout.denomination = 1s (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.log.appender.buffer.byte-limit = 33554432 (custom) > 2019-02-14
[jira] [Commented] (HDDS-1102) docker datanode stopped when new datanodes are added to the cluster
[ https://issues.apache.org/jira/browse/HDDS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768068#comment-16768068 ] Nilotpal Nandi commented on HDDS-1102: -- datanode log of the node which was shutdown : [^datanode.log] > docker datanode stopped when new datanodes are added to the cluster > > > Key: HDDS-1102 > URL: https://issues.apache.org/jira/browse/HDDS-1102 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Priority: Major > Attachments: datanode.log > > > steps taken: > > # created 5 datanode cluster. > # shutdown 2 datanodes > # started the datanodes again. > One of the datanodes was shut down. > exception seen : > > {noformat} > 2019-02-14 07:37:26 INFO LeaderElection:230 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 got exception when requesting votes: {} > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214) > at > org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146) > at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102) > Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139) > at > org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265) > at > org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:83) > at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:187) > at > org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-02-14 07:37:26 INFO LeaderElection:46 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8: Election PASSED; received 1 response(s) > [6a0522ba-019e-4b77-ac1f-a9322cd525b8<-61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5#0:OK-t7] > and 1 exception(s); 6a0522ba-019e-4b77-ac1f-a9322cd525b8:t7, leader=null, > voted=6a0522ba-019e-4b77-ac1f-a9322cd525b8, > raftlog=6a0522ba-019e-4b77-ac1f-a9322cd525b8-SegmentedRaftLog:OPENED, conf=3: > [61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5:172.20.0.8:9858, > 6a0522ba-019e-4b77-ac1f-a9322cd525b8:172.20.0.6:9858, > 0f377918-aafa-4d8a-972a-6ead54048fba:172.20.0.3:9858], old=null > 2019-02-14 07:37:26 INFO LeaderElection:52 - 0: > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. 
> 2019-02-14 07:37:26 INFO RoleInfo:130 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: > shutdown LeaderElection > 2019-02-14 07:37:26 INFO RaftServerImpl:161 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 changes role from CANDIDATE to LEADER at > term 7 for changeToLeader > 2019-02-14 07:37:26 INFO RaftServerImpl:258 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8: change Leader from null to > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 at term 7 for becomeLeader, leader > elected after 1066ms > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.staging.catchup.gap = 1000 (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.sleep.time > = 25ms (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout > = 10s (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.watch.timeout.denomination = 1s (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.log.appender.buffer.byte-limit = 33554432 (custom) > 2019-02-14 07:37:26 INFO
[jira] [Created] (HDDS-1102) docker datanode stopped when new datanodes are added to the cluster
Nilotpal Nandi created HDDS-1102: Summary: docker datanode stopped when new datanodes are added to the cluster Key: HDDS-1102 URL: https://issues.apache.org/jira/browse/HDDS-1102 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi steps taken: # created 5 datanode cluster. # shutdown 2 datanodes # started the datanodes again. One of the datanodes was shut down. exception seen : {noformat} 2019-02-14 07:37:26 INFO LeaderElection:230 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8 got exception when requesting votes: {} java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214) at org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146) at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102) Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233) at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214) at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139) at org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265) at org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:83) at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:187) at org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-02-14 07:37:26 INFO LeaderElection:46 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: Election PASSED; received 1 response(s) [6a0522ba-019e-4b77-ac1f-a9322cd525b8<-61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5#0:OK-t7] and 1 exception(s); 6a0522ba-019e-4b77-ac1f-a9322cd525b8:t7, leader=null, voted=6a0522ba-019e-4b77-ac1f-a9322cd525b8, raftlog=6a0522ba-019e-4b77-ac1f-a9322cd525b8-SegmentedRaftLog:OPENED, conf=3: [61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5:172.20.0.8:9858, 6a0522ba-019e-4b77-ac1f-a9322cd525b8:172.20.0.6:9858, 0f377918-aafa-4d8a-972a-6ead54048fba:172.20.0.3:9858], old=null 2019-02-14 07:37:26 INFO LeaderElection:52 - 0: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. 
2019-02-14 07:37:26 INFO RoleInfo:130 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: shutdown LeaderElection 2019-02-14 07:37:26 INFO RaftServerImpl:161 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8 changes role from CANDIDATE to LEADER at term 7 for changeToLeader 2019-02-14 07:37:26 INFO RaftServerImpl:258 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: change Leader from null to 6a0522ba-019e-4b77-ac1f-a9322cd525b8 at term 7 for becomeLeader, leader elected after 1066ms 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.staging.catchup.gap = 1000 (default) 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.sleep.time = 25ms (default) 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout = 10s (default) 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout.denomination = 1s (default) 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default) 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.log.appender.buffer.byte-limit = 33554432 (custom) 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.log.appender.buffer.element-limit = 1 (custom) 2019-02-14 07:37:26 INFO GrpcConfigKeys$Server:43 - raft.grpc.server.leader.outstanding.appends.max = 128 (default) 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.request.timeout = 3000ms (default) 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default) 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 -
[jira] [Updated] (HDDS-1102) docker datanode stopped when new datanodes are added to the cluster
[ https://issues.apache.org/jira/browse/HDDS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1102: - Attachment: datanode.log > docker datanode stopped when new datanodes are added to the cluster > > > Key: HDDS-1102 > URL: https://issues.apache.org/jira/browse/HDDS-1102 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Priority: Major > Attachments: datanode.log > > > steps taken: > > # created 5 datanode cluster. > # shutdown 2 datanodes > # started the datanodes again. > One of the datanodes was shut down. > exception seen : > > {noformat} > 2019-02-14 07:37:26 INFO LeaderElection:230 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 got exception when requesting votes: {} > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214) > at > org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146) > at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102) > Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139) > at > org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265) > at > org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:83) > at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:187) > at > org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-02-14 07:37:26 INFO LeaderElection:46 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8: Election PASSED; received 1 response(s) > [6a0522ba-019e-4b77-ac1f-a9322cd525b8<-61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5#0:OK-t7] > and 1 exception(s); 6a0522ba-019e-4b77-ac1f-a9322cd525b8:t7, leader=null, > voted=6a0522ba-019e-4b77-ac1f-a9322cd525b8, > raftlog=6a0522ba-019e-4b77-ac1f-a9322cd525b8-SegmentedRaftLog:OPENED, conf=3: > [61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5:172.20.0.8:9858, > 6a0522ba-019e-4b77-ac1f-a9322cd525b8:172.20.0.6:9858, > 0f377918-aafa-4d8a-972a-6ead54048fba:172.20.0.3:9858], old=null > 2019-02-14 07:37:26 INFO LeaderElection:52 - 0: > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found. 
> 2019-02-14 07:37:26 INFO RoleInfo:130 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: > shutdown LeaderElection > 2019-02-14 07:37:26 INFO RaftServerImpl:161 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 changes role from CANDIDATE to LEADER at > term 7 for changeToLeader > 2019-02-14 07:37:26 INFO RaftServerImpl:258 - > 6a0522ba-019e-4b77-ac1f-a9322cd525b8: change Leader from null to > 6a0522ba-019e-4b77-ac1f-a9322cd525b8 at term 7 for becomeLeader, leader > elected after 1066ms > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.staging.catchup.gap = 1000 (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.sleep.time > = 25ms (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout > = 10s (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.watch.timeout.denomination = 1s (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.log.appender.buffer.byte-limit = 33554432 (custom) > 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - > raft.server.log.appender.buffer.element-limit
[jira] [Commented] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
[ https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766311#comment-16766311 ] Nilotpal Nandi commented on HDDS-1047: -- [~bharatviswa] and [~linyiqun] Thanks for the comments. I have uploaded the patch with changes. > Fix TestRatisPipelineProvider#testCreatePipelineWithFactor > -- > > Key: HDDS-1047 > URL: https://issues.apache.org/jira/browse/HDDS-1047 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1047.001.patch, HDDS-1047.002.patch, > HDDS-1047.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
[ https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1047: - Attachment: HDDS-1047.003.patch > Fix TestRatisPipelineProvider#testCreatePipelineWithFactor > -- > > Key: HDDS-1047 > URL: https://issues.apache.org/jira/browse/HDDS-1047 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1047.001.patch, HDDS-1047.002.patch, > HDDS-1047.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1088) Add blockade Tests to test Replica Manager
Nilotpal Nandi created HDDS-1088: Summary: Add blockade Tests to test Replica Manager Key: HDDS-1088 URL: https://issues.apache.org/jira/browse/HDDS-1088 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi We need to add blockade tests for the Replica Manager covering scenarios such as loss of a node, addition of new nodes, and under-replicated containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
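The blockade suite runs Python tests against a docker-compose cluster, so the following is only a rough in-process illustration of the same scenarios rather than an actual blockade test; it is a sketch built on the test-only MiniOzoneCluster class, and the replication assertion is left as a hypothetical helper.
{noformat}
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.MiniOzoneCluster;

public class ReplicaManagerScenarioSketch {
  public static void main(String[] args) throws Exception {
    OzoneConfiguration conf = new OzoneConfiguration();
    // Bring up a small in-process cluster (the real blockade tests use docker containers).
    MiniOzoneCluster cluster = MiniOzoneCluster.newBuilder(conf)
        .setNumDatanodes(5)
        .build();
    cluster.waitForClusterToBeReady();
    try {
      // Scenario: loss of a node.
      cluster.shutdownHddsDatanode(0);
      // The test would then write keys and assert that the Replica Manager
      // re-replicates containers that became under-replicated, e.g.:
      // assertContainersFullyReplicated(cluster);  // hypothetical helper
    } finally {
      cluster.shutdown();
    }
  }
}
{noformat}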
[jira] [Created] (HDDS-1082) OutOfMemoryError while reading key
Nilotpal Nandi created HDDS-1082: Summary: OutOfMemoryError while reading key Key: HDDS-1082 URL: https://issues.apache.org/jira/browse/HDDS-1082 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nilotpal Nandi steps taken : # put key with size 100GB # Tried to read back the key. error thrown: -- {noformat} java.lang.OutOfMemoryError: Java heap space Dumping heap to /tmp/heapdump.bin ... Heap dump file created [3883178021 bytes in 10.667 secs] Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at org.apache.ratis.thirdparty.com.google.protobuf.ByteString.toByteArray(ByteString.java:643) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:217) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readChunkFromContainer(BlockInputStream.java:227) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.prepareRead(BlockInputStream.java:188) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:130) at org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.read(KeyInputStream.java:232) at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:126) at org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:49) at java.io.InputStream.read(InputStream.java:101) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100) at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:98) at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48) at picocli.CommandLine.execute(CommandLine.java:919) at picocli.CommandLine.access$700(CommandLine.java:104) at picocli.CommandLine$RunLast.handle(CommandLine.java:1083) at picocli.CommandLine$RunLast.handle(CommandLine.java:1051) at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959) at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242) at picocli.CommandLine.parseWithHandler(CommandLine.java:1181) at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61) at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52) at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:83){noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
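The trace above shows the entire chunk being materialized in Checksum.verifyChecksum via ByteString.toByteArray, so the failure does not depend on how the caller copies the data. For reproducing the same read path outside the shell, a minimal sketch with the Java client follows; the volume, bucket, key and output path are placeholders, and the API usage is assumed to mirror the RpcClient/KeyInputStream path in the stack trace.
{noformat}
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.OzoneClient;
import org.apache.hadoop.ozone.client.OzoneClientFactory;
import org.apache.hadoop.ozone.client.io.OzoneInputStream;

public class GetKeySketch {
  public static void main(String[] args) throws Exception {
    OzoneConfiguration conf = new OzoneConfiguration();
    try (OzoneClient client = OzoneClientFactory.getRpcClient(conf)) {
      OzoneBucket bucket = client.getObjectStore()
          .getVolume("volume1")
          .getBucket("bucket1");
      byte[] buffer = new byte[4 * 1024 * 1024]; // bounded copy buffer
      try (OzoneInputStream in = bucket.readKey("key1");
           OutputStream out = new FileOutputStream("/tmp/key1.out")) {
        int n;
        // Even with a bounded buffer here, the heap blow-up happens below
        // this API, inside BlockInputStream/Checksum.
        while ((n = in.read(buffer)) != -1) {
          out.write(buffer, 0, n);
        }
      }
    }
  }
}
{noformat}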
[jira] [Updated] (HDDS-1079) java.lang.RuntimeException: ManagedChannel allocation site exception seen on client cli when datanode restarted in one of the pipelines
[ https://issues.apache.org/jira/browse/HDDS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1079: - Attachment: nodes-ozone-logs-1549879783.tar.gz > java.lang.RuntimeException: ManagedChannel allocation site exception seen on > client cli when datanode restarted in one of the pipelines > --- > > Key: HDDS-1079 > URL: https://issues.apache.org/jira/browse/HDDS-1079 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Priority: Major > Attachments: nodes-ozone-logs-1549879783.tar.gz > > > steps taken : > > # created 12 datanode cluster. > # started put key operation with size 100GB. > # Restarted one of the datanodes from one of the pipelines. > exception seen on cli : > > > {noformat} > [root@ctr-e139-1542663976389-62237-01-06 ~]# time ozone sh key put > volume1/bucket1/key1 /root/100G > Feb 11, 2019 9:12:49 AM > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference > cleanQueue > SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=61, > target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~* > Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() > returns true. > java.lang.RuntimeException: ManagedChannel allocation site > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44) > at > org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.(GrpcClientProtocolClient.java:116) > at > org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:54) > at > org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:60) > at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:191) > at > org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:59) > at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:106) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsync(GrpcClientRpc.java:69) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:324) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286) > at > org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243) > at org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259) > at > org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104) > at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) > at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Feb 11, 2019 9:12:49 AM > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference > cleanQueue > SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=29, > target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~* > Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() > returns true. > java.lang.RuntimeException: ManagedChannel allocation site > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44) >
[jira] [Commented] (HDDS-1079) java.lang.RuntimeException: ManagedChannel allocation site exception seen on client cli when datanode restarted in one of the pipelines
[ https://issues.apache.org/jira/browse/HDDS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764832#comment-16764832 ] Nilotpal Nandi commented on HDDS-1079: -- logs present at : [^nodes-ozone-logs-1549879783.tar.gz] > java.lang.RuntimeException: ManagedChannel allocation site exception seen on > client cli when datanode restarted in one of the pipelines > --- > > Key: HDDS-1079 > URL: https://issues.apache.org/jira/browse/HDDS-1079 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Priority: Major > Attachments: nodes-ozone-logs-1549879783.tar.gz > > > steps taken : > > # created 12 datanode cluster. > # started put key operation with size 100GB. > # Restarted one of the datanodes from one of the pipelines. > exception seen on cli : > > > {noformat} > [root@ctr-e139-1542663976389-62237-01-06 ~]# time ozone sh key put > volume1/bucket1/key1 /root/100G > Feb 11, 2019 9:12:49 AM > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference > cleanQueue > SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=61, > target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~* > Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() > returns true. > java.lang.RuntimeException: ManagedChannel allocation site > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44) > at > org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.(GrpcClientProtocolClient.java:116) > at > org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:54) > at > org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:60) > at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:191) > at > org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:59) > at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:106) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsync(GrpcClientRpc.java:69) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:324) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286) > at > org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243) > at org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259) > at > org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104) > at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) > at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Feb 11, 2019 9:12:49 AM > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference > cleanQueue > SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=29, > target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~* > Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() > returns true. > java.lang.RuntimeException: ManagedChannel allocation site > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53) > at >
[jira] [Created] (HDDS-1079) java.lang.RuntimeException: ManagedChannel allocation site exception seen on client cli when datanode restarted in one of the pipelines
Nilotpal Nandi created HDDS-1079: Summary: java.lang.RuntimeException: ManagedChannel allocation site exception seen on client cli when datanode restarted in one of the pipelines Key: HDDS-1079 URL: https://issues.apache.org/jira/browse/HDDS-1079 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Reporter: Nilotpal Nandi steps taken : # created 12 datanode cluster. # started put key operation with size 100GB. # Restarted one of the datanodes from one of the pipelines. exception seen on cli : {noformat} [root@ctr-e139-1542663976389-62237-01-06 ~]# time ozone sh key put volume1/bucket1/key1 /root/100G Feb 11, 2019 9:12:49 AM org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference cleanQueue SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=61, target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~* Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() returns true. java.lang.RuntimeException: ManagedChannel allocation site at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103) at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53) at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44) at org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411) at org.apache.ratis.grpc.client.GrpcClientProtocolClient.(GrpcClientProtocolClient.java:116) at org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:54) at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:60) at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:191) at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:59) at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:106) at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsync(GrpcClientRpc.java:69) at org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:324) at org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286) at org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243) at org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259) at org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293) at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85) at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104) at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Feb 11, 2019 9:12:49 AM org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference cleanQueue 
SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=29, target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~* Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() returns true. java.lang.RuntimeException: ManagedChannel allocation site at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103) at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53) at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44) at org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411) at org.apache.ratis.grpc.client.GrpcClientProtocolClient.(GrpcClientProtocolClient.java:116) at org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:54) at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:60) at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:191) at
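The SEVERE message above is gRPC's orphaned-channel detector: it fires when a ManagedChannel becomes garbage-collected without shutdown()/awaitTermination() having been called. The expected shutdown pattern, shown here with the plain io.grpc API rather than the Ratis-shaded classes and with a placeholder target address, is:
{noformat}
import java.util.concurrent.TimeUnit;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class ChannelShutdownSketch {
  public static void main(String[] args) throws InterruptedException {
    // Placeholder target; in Ozone the datanode address comes from the pipeline.
    ManagedChannel channel = ManagedChannelBuilder
        .forTarget("172.27.10.133:9858")
        .usePlaintext()
        .build();
    try {
      // ... issue RPCs on the channel ...
    } finally {
      // Shut down and wait for termination so the orphaned-channel
      // warning above is never triggered.
      channel.shutdown();
      if (!channel.awaitTermination(5, TimeUnit.SECONDS)) {
        channel.shutdownNow();
      }
    }
  }
}
{noformat}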
[jira] [Commented] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
[ https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762997#comment-16762997 ] Nilotpal Nandi commented on HDDS-1047: -- [~linyiqun] , thanks for the review. I have addressed the comment and uploaded a new patch. > Fix TestRatisPipelineProvider#testCreatePipelineWithFactor > -- > > Key: HDDS-1047 > URL: https://issues.apache.org/jira/browse/HDDS-1047 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1047.001.patch, HDDS-1047.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
[ https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1047: - Attachment: HDDS-1047.002.patch > Fix TestRatisPipelineProvider#testCreatePipelineWithFactor > -- > > Key: HDDS-1047 > URL: https://issues.apache.org/jira/browse/HDDS-1047 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1047.001.patch, HDDS-1047.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1040) Add blockade Tests for client failures
[ https://issues.apache.org/jira/browse/HDDS-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1040: - Attachment: HDDS-1040.003.patch > Add blockade Tests for client failures > -- > > Key: HDDS-1040 > URL: https://issues.apache.org/jira/browse/HDDS-1040 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-1040.001.patch, HDDS-1040.002.patch, > HDDS-1040.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1067) freon run on client gets hung when two of the datanodes are down in 3 datanode cluster
Nilotpal Nandi created HDDS-1067: Summary: freon run on client gets hung when two of the datanodes are down in 3 datanode cluster Key: HDDS-1067 URL: https://issues.apache.org/jira/browse/HDDS-1067 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Reporter: Nilotpal Nandi steps taken : # created a 3-node docker cluster. # wrote a key # created a partition such that 2 out of 3 datanodes cannot communicate with any other node. # The third datanode can communicate with SCM, OM and the client. # ran freon to write keys Observation : - the freon run hangs; there is no timeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
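Since freon itself never times out in this situation, a test harness can bound the run externally. A minimal sketch using a Future with a deadline follows; the freon invocation is a placeholder Runnable and the 10-minute limit is an arbitrary choice, not something freon provides.
{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedFreonRunSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    // Placeholder for the actual freon / key-write invocation.
    Future<?> run = executor.submit(() -> {
      // launch "ozone freon ..." or drive the client write path here
    });
    try {
      // Treat the run as failed if it does not finish within 10 minutes.
      run.get(10, TimeUnit.MINUTES);
    } catch (TimeoutException e) {
      run.cancel(true);
      throw new AssertionError("freon run did not finish within the timeout", e);
    } finally {
      executor.shutdownNow();
    }
  }
}
{noformat}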
[jira] [Updated] (HDDS-1057) get key operation fails when client cannot communicate with 2 of the datanodes in 3 node cluster
[ https://issues.apache.org/jira/browse/HDDS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1057: - Description: steps taken : -- # created 3 node docker cluster. # wrote a key # created partition such that 2 out of 3 datanodes cannot communicate with any other node. # Third datanode can communicate with scm, om and the client. # Tried to read the key Exception seen : {noformat} Failed to execute command cmdType: GetBlock E traceID: "9b3ebd93-e598-4ca2-a6f4-2389f2d35f63" E containerID: 22 E datanodeUuid: "15345663-15c9-4fe3-9b8f-a46123ba8a6e" E getBlock { E blockID { E containerID: 22 E localID: 101545011736215553 E blockCommitSequenceId: 5 E } E } E on datanode 15345663-15c9-4fe3-9b8f-a46123ba8a6e E java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception E at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) E at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) E at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:220) E at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:201) E at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118) E at org.apache.hadoop.ozone.client.io.KeyInputStream.getFromOmKeyInfo(KeyInputStream.java:305) E at org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:608) E at org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:284) E at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:95) E at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48) E at picocli.CommandLine.execute(CommandLine.java:919) E at picocli.CommandLine.access$700(CommandLine.java:104) E at picocli.CommandLine$RunLast.handle(CommandLine.java:1083) E at picocli.CommandLine$RunLast.handle(CommandLine.java:1051) E at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959) E at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242) E at picocli.CommandLine.parseWithHandler(CommandLine.java:1181) E at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61) E at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52) E at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:83) E Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception E at org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526) E at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434) E at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) E at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) E at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) E at org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678) E at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) E at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) E at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) E at org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397) E at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) E at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) E at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546) E at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467) E at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584) E at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) E at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) E at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) E at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) E at java.lang.Thread.run(Thread.java:748) E Caused by:
[jira] [Updated] (HDDS-1057) get key operation fails when client cannot communicate with 2 of the datanodes in 3 node cluster
[ https://issues.apache.org/jira/browse/HDDS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1057: - Attachment: test_client_failure_isolate_two_datanodes_all_docker.log > get key operation fails when client cannot communicate with 2 of the > datanodes in 3 node cluster > > > Key: HDDS-1057 > URL: https://issues.apache.org/jira/browse/HDDS-1057 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Priority: Major > Attachments: test_client_failure_isolate_two_datanodes_all_docker.log > > > steps taken : > -- > # created 3 node docker cluster. > # wrote a key > # created partition such that 2 out of 3 datanodes cannot communicate with > any other node. > # Third datanode can communicate with all other nodes. > # Tried to read the key > Exception seen : > > > {noformat} > Failed to execute command cmdType: GetBlock > E traceID: "9b3ebd93-e598-4ca2-a6f4-2389f2d35f63" > E containerID: 22 > E datanodeUuid: "15345663-15c9-4fe3-9b8f-a46123ba8a6e" > E getBlock { > E blockID { > E containerID: 22 > E localID: 101545011736215553 > E blockCommitSequenceId: 5 > E } > E } > E on datanode 15345663-15c9-4fe3-9b8f-a46123ba8a6e > E java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > E at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > E at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > E at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:220) > E at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:201) > E at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118) > E at > org.apache.hadoop.ozone.client.io.KeyInputStream.getFromOmKeyInfo(KeyInputStream.java:305) > E at org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:608) > E at org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:284) > E at > org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:95) > E at > org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48) > E at picocli.CommandLine.execute(CommandLine.java:919) > E at picocli.CommandLine.access$700(CommandLine.java:104) > E at picocli.CommandLine$RunLast.handle(CommandLine.java:1083) > E at picocli.CommandLine$RunLast.handle(CommandLine.java:1051) > E at > picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959) > E at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242) > E at picocli.CommandLine.parseWithHandler(CommandLine.java:1181) > E at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61) > E at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52) > E at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:83) > E Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > UNAVAILABLE: io exception > E at > org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526) > E at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434) > E at > org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) > E at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) > 
E at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) > E at > org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678) > E at > org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) > E at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) > E at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) > E at > org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397) > E at > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) > E at > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) > E at >
[jira] [Commented] (HDDS-1057) get key operation fails when client cannot communicate with 2 of the datanodes in 3 node cluster
[ https://issues.apache.org/jira/browse/HDDS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761690#comment-16761690 ] Nilotpal Nandi commented on HDDS-1057: -- logs present at : [^test_client_failure_isolate_two_datanodes_all_docker.log] > get key operation fails when client cannot communicate with 2 of the > datanodes in 3 node cluster > > > Key: HDDS-1057 > URL: https://issues.apache.org/jira/browse/HDDS-1057 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Priority: Major > Attachments: test_client_failure_isolate_two_datanodes_all_docker.log > > > steps taken : > -- > # created 3 node docker cluster. > # wrote a key > # created partition such that 2 out of 3 datanodes cannot communicate with > any other node. > # Third datanode can communicate with all other nodes. > # Tried to read the key > Exception seen : > > > {noformat} > Failed to execute command cmdType: GetBlock > E traceID: "9b3ebd93-e598-4ca2-a6f4-2389f2d35f63" > E containerID: 22 > E datanodeUuid: "15345663-15c9-4fe3-9b8f-a46123ba8a6e" > E getBlock { > E blockID { > E containerID: 22 > E localID: 101545011736215553 > E blockCommitSequenceId: 5 > E } > E } > E on datanode 15345663-15c9-4fe3-9b8f-a46123ba8a6e > E java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > E at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > E at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > E at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:220) > E at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:201) > E at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118) > E at > org.apache.hadoop.ozone.client.io.KeyInputStream.getFromOmKeyInfo(KeyInputStream.java:305) > E at org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:608) > E at org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:284) > E at > org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:95) > E at > org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48) > E at picocli.CommandLine.execute(CommandLine.java:919) > E at picocli.CommandLine.access$700(CommandLine.java:104) > E at picocli.CommandLine$RunLast.handle(CommandLine.java:1083) > E at picocli.CommandLine$RunLast.handle(CommandLine.java:1051) > E at > picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959) > E at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242) > E at picocli.CommandLine.parseWithHandler(CommandLine.java:1181) > E at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61) > E at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52) > E at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:83) > E Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > UNAVAILABLE: io exception > E at > org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526) > E at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434) > E at > org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) > E at > 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) > E at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) > E at > org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678) > E at > org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) > E at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) > E at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) > E at > org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397) > E at > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) > E at > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) > E at >
[jira] [Created] (HDDS-1057) get key operation fails when client cannot communicate with 2 of the datanodes in 3 node cluster
Nilotpal Nandi created HDDS-1057: Summary: get key operation fails when client cannot communicate with 2 of the datanodes in 3 node cluster Key: HDDS-1057 URL: https://issues.apache.org/jira/browse/HDDS-1057 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Reporter: Nilotpal Nandi steps taken : -- # created 3 node docker cluster. # wrote a key # created partition such that 2 out of 3 datanodes cannot communicate with any other node. # Third datanode can communicate with all other nodes. # Tried to read the key Exception seen : {noformat} Failed to execute command cmdType: GetBlock E traceID: "9b3ebd93-e598-4ca2-a6f4-2389f2d35f63" E containerID: 22 E datanodeUuid: "15345663-15c9-4fe3-9b8f-a46123ba8a6e" E getBlock { E blockID { E containerID: 22 E localID: 101545011736215553 E blockCommitSequenceId: 5 E } E } E on datanode 15345663-15c9-4fe3-9b8f-a46123ba8a6e E java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception E at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) E at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) E at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:220) E at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:201) E at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118) E at org.apache.hadoop.ozone.client.io.KeyInputStream.getFromOmKeyInfo(KeyInputStream.java:305) E at org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:608) E at org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:284) E at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:95) E at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48) E at picocli.CommandLine.execute(CommandLine.java:919) E at picocli.CommandLine.access$700(CommandLine.java:104) E at picocli.CommandLine$RunLast.handle(CommandLine.java:1083) E at picocli.CommandLine$RunLast.handle(CommandLine.java:1051) E at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959) E at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242) E at picocli.CommandLine.parseWithHandler(CommandLine.java:1181) E at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61) E at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52) E at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:83) E Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception E at org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526) E at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434) E at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) E at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) E at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) E at org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678) E at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) E at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) E at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) E at org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397) E at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) E at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) E at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546) E at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467) E at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584) E at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) E at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) E at
[jira] [Updated] (HDDS-1040) Add blockade Tests for client failures
[ https://issues.apache.org/jira/browse/HDDS-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1040: - Attachment: HDDS-1040.002.patch > Add blockade Tests for client failures > -- > > Key: HDDS-1040 > URL: https://issues.apache.org/jira/browse/HDDS-1040 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-1040.001.patch, HDDS-1040.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1027) Add blockade Tests for datanode isolation and scm failures
[ https://issues.apache.org/jira/browse/HDDS-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1027: - Attachment: (was: HDDS-1027.002.patch) > Add blockade Tests for datanode isolation and scm failures > -- > > Key: HDDS-1027 > URL: https://issues.apache.org/jira/browse/HDDS-1027 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1027.001.patch, HDDS-1027.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1027) Add blockade Tests for datanode isolation and scm failures
[ https://issues.apache.org/jira/browse/HDDS-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1027: - Attachment: HDDS-1027.002.patch > Add blockade Tests for datanode isolation and scm failures > -- > > Key: HDDS-1027 > URL: https://issues.apache.org/jira/browse/HDDS-1027 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1027.001.patch, HDDS-1027.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1040) Add blockade Tests for client failures
[ https://issues.apache.org/jira/browse/HDDS-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi reassigned HDDS-1040: Assignee: Nilotpal Nandi > Add blockade Tests for client failures > -- > > Key: HDDS-1040 > URL: https://issues.apache.org/jira/browse/HDDS-1040 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1040.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1040) Add blockade Tests for client failures
[ https://issues.apache.org/jira/browse/HDDS-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1040: - Status: Patch Available (was: Open) > Add blockade Tests for client failures > -- > > Key: HDDS-1040 > URL: https://issues.apache.org/jira/browse/HDDS-1040 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1040.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1040) Add blockade Tests for client failures
[ https://issues.apache.org/jira/browse/HDDS-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1040: - Attachment: HDDS-1040.001.patch > Add blockade Tests for client failures > -- > > Key: HDDS-1040 > URL: https://issues.apache.org/jira/browse/HDDS-1040 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1040.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
[ https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1047: - Attachment: HDDS-1047.001.patch > Fix TestRatisPipelineProvider#testCreatePipelineWithFactor > -- > > Key: HDDS-1047 > URL: https://issues.apache.org/jira/browse/HDDS-1047 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1047.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
[ https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi reassigned HDDS-1047: Assignee: Nilotpal Nandi > Fix TestRatisPipelineProvider#testCreatePipelineWithFactor > -- > > Key: HDDS-1047 > URL: https://issues.apache.org/jira/browse/HDDS-1047 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1047.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1047) Fix TestRatisPipelineProvider#testCreatePipelineWithFactor
[ https://issues.apache.org/jira/browse/HDDS-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-1047: - Status: Patch Available (was: Open) > Fix TestRatisPipelineProvider#testCreatePipelineWithFactor > -- > > Key: HDDS-1047 > URL: https://issues.apache.org/jira/browse/HDDS-1047 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1047.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-997) Add blockade Tests for scm isolation and mixed node isolation
[ https://issues.apache.org/jira/browse/HDDS-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-997: Attachment: HDDS-997.003.patch > Add blockade Tests for scm isolation and mixed node isolation > - > > Key: HDDS-997 > URL: https://issues.apache.org/jira/browse/HDDS-997 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-997.001.patch, HDDS-997.002.patch, > HDDS-997.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-997) Add blockade Tests for scm isolation and mixed node isolation
[ https://issues.apache.org/jira/browse/HDDS-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nilotpal Nandi updated HDDS-997: Attachment: (was: HDDS-997.003.patch) > Add blockade Tests for scm isolation and mixed node isolation > - > > Key: HDDS-997 > URL: https://issues.apache.org/jira/browse/HDDS-997 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-997.001.patch, HDDS-997.002.patch, > HDDS-997.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org