On 2015/9/30 14:32, gjprabu wrote:
> Hi Joseph,
>
> Thanks for your reply. We took node1 down ourselves, and these are the
> logs it produced. Now all the nodes are reachable, but we still face I/O
> errors on particular directories while moving data within the same disk
> (copying does not have any problem). We also have a problem while
> unmounting: the umount process goes into "D" state, and then I need to
> restart the server itself. Is there any solution for this issue?
>
> ls -althr
> ls: cannot access MICKEYLITE_3_0_M4_1_TEST: Input/output error
> ls: cannot access MICKEYLITE_3_0_M4_1_OLD: Input/output error
> total 0
> d????????? ? ? ? ? ? MICKEYLITE_3_0_M4_1_TEST
> d????????? ? ? ? ? ? MICKEYLITE_3_0_M4_1_OLD
>
> Regards
> G.J

Could you please show me the umount process stack?

cat /proc/<pid>/stack
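Joseph's request above can be scripted. A minimal sketch, assuming a Linux /proc filesystem; the PIDs are discovered at run time rather than taken from the report:

```shell
# Find every task in uninterruptible sleep ("D" state, like the hung umount)
# and dump its kernel stack.  Reading /proc/<pid>/stack usually needs root.
for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
    echo "=== pid $pid ($(cat /proc/"$pid"/comm 2>/dev/null)) ==="
    cat /proc/"$pid"/stack 2>/dev/null || echo "  (run as root to read the stack)"
done
```

If the magic SysRq key is enabled, `echo w > /proc/sysrq-trigger` logs the stacks of all blocked tasks to dmesg in one shot, which captures the same information for every stuck task at once.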
> ---- On Wed, 30 Sep 2015 08:54:16 +0530 Joseph Qi <joseph...@huawei.com> wrote ----
>
> Hi Prabu,
>
> [193918.928968] (kworker/u128:1,51132,30):dlm_do_assert_master:1717 ERROR: Error -112 when sending message 502 (key 0xc3460ae7) to node 1
> [193918.929004] (kworker/u128:3,63088,31):dlm_send_remote_convert_request:392 ERROR: Error -112 when sending message 504 (key 0xc3460ae7) to node 1
>
> The above error messages show that the link between this node and node 1 is down, so it cannot send dlm messages.
>
> On 2015/9/29 19:52, gjprabu wrote:
> > Hi Joseph,
> >
> > We rebooted Node1 and Node7 ourselves for testing purposes; this is what the logs show. I have cross-checked the configuration in /etc/ocfs2/cluster.conf and it is fine. Can anybody help with this issue? I hope this issue is related to OCFS2, not to RBD.
> >
> > /sys/kernel/config/cluster/ocfs2/node/
> > [root@integ-cm2 node]# ls
> > integ-ci-1 integ-cm1 integ-cm2 integ-hm2 integ-hm5 integ-hm8 integ-hm9
> >
> > Also please find the missed-out logs.
> > [ 475.407086] o2dlm: Joining domain A895BC216BE641A8A7E20AA89D57E051 ( 1 3 4 7 ) 4 nodes
> > [ 880.734421] o2dlm: Node 2 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [ 892.746728] o2dlm: Node 2 leaves domain A895BC216BE641A8A7E20AA89D57E051 ( 1 3 4 7 ) 4 nodes
> > [ 905.264066] o2dlm: Node 2 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [12313.418294] o2cb: o2dlm has evicted node 1 from domain A895BC216BE641A8A7E20AA89D57E051
> > [12315.042208] o2cb: o2dlm has evicted node 1 from domain A895BC216BE641A8A7E20AA89D57E051
> > [12315.402103] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [12315.402111] o2dlm: Node 4 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [12315.402114] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [12320.402074] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [12320.402080] o2dlm: Node 4 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [12320.402083] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [12698.830376] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [181348.383986] o2cb: o2dlm has evicted node 7 from domain A895BC216BE641A8A7E20AA89D57E051
> > [181349.048120] o2cb: o2dlm has evicted node 7 from domain A895BC216BE641A8A7E20AA89D57E051
> > [181351.972048] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 7
> > [181351.972056] o2dlm: Node 1 (he) is the Recovery Master for the dead node 7 in domain A895BC216BE641A8A7E20AA89D57E051
> > [181351.972059] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [181356.972035] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 7
> > [181356.972040] o2dlm: Node 1 (he) is the Recovery Master for the dead node 7 in domain A895BC216BE641A8A7E20AA89D57E051
> > [181356.972042] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [181361.972046] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 7
> > [181361.972054] o2dlm: Node 1 (he) is the Recovery Master for the dead node 7 in domain A895BC216BE641A8A7E20AA89D57E051
> > [181361.972057] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [181366.972049] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 7
> > [181366.972056] o2dlm: Node 1 (he) is the Recovery Master for the dead node 7 in domain A895BC216BE641A8A7E20AA89D57E051
> > [181366.972059] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [181599.543509] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [183251.706097] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 ) 4 nodes
> > [183462.532465] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [183506.924225] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 ) 4 nodes
> > [183709.344072] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [183905.441289] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 ) 4 nodes
> > [184103.391770] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [184175.702196] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 ) 4 nodes
> > [184363.166986] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [193918.928968] (kworker/u128:1,51132,30):dlm_do_assert_master:1717 ERROR: Error -112 when sending message 502 (key 0xc3460ae7) to node 1
> > [193918.929004] (kworker/u128:3,63088,31):dlm_send_remote_convert_request:392 ERROR: Error -112 when sending message 504 (key 0xc3460ae7) to node 1
> > [193918.929035] o2dlm: Waiting on the death of node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193918.929083] o2cb: o2dlm has evicted node 1 from domain A895BC216BE641A8A7E20AA89D57E051
> > [193920.386365] o2cb: o2dlm has evicted node 1 from domain A895BC216BE641A8A7E20AA89D57E051
> > [193921.972105] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193921.972114] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193921.972116] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [193926.972056] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193926.972063] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193926.972066] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [193931.972054] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193931.972062] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193931.972065] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [193936.972101] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193936.972108] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193936.972110] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [193941.972066] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193941.972072] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193941.972075] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [193946.972077] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193946.972084] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193946.972086] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [193951.972107] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193951.972114] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193951.972116] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [193956.972073] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193956.972081] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193956.972084] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [193961.972075] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193961.972082] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193961.972084] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [193966.972051] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193966.972059] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193966.972062] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [193971.972115] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193971.972122] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193971.972124] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [193976.972103] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [193976.972111] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [193976.972114] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [194143.962241] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [199847.473092] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 ) 4 nodes
> > [208215.106305] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [258418.054204] o2cb: o2dlm has evicted node 7 from domain A895BC216BE641A8A7E20AA89D57E051
> > [258418.957738] o2cb: o2dlm has evicted node 7 from domain A895BC216BE641A8A7E20AA89D57E051
> > [264056.408719] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [264464.605542] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 ) 4 nodes
> > [275619.497198] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > [426628.076148] o2cb: o2dlm has evicted node 1 from domain A895BC216BE641A8A7E20AA89D57E051
> > [426628.885084] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > [426628.891170] o2dlm: Node 3 (me) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > [426634.182384] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > [427001.383315] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> >
> > Regards
> > Prabu
> >
> > ---- On Tue, 29 Sep 2015 15:01:40 +0530 Joseph Qi <joseph...@huawei.com> wrote ----
> >
> > On 2015/9/29 15:18, gjprabu wrote:
> > > Hi Joseph,
> > >
> > > We have seven nodes in total, and this problem occurs on multiple nodes simultaneously, not on one particular node. We checked the network and it is fine. When we remount the ocfs2 partition, the problem is fixed temporarily, but the same problem reoccurs after some time.
> > >
> > > We also have a problem while unmounting: the umount process goes into "D" state, and then I need to restart the server itself. Is there any solution for this issue?
> > >
> > > I have tried running fsck.ocfs2 on the problematic machine, but it throws an error:
> > >
> > > fsck.ocfs2 1.8.0
> > > fsck.ocfs2: I/O error on channel while opening "/zoho/build/downloads"
> >
> > IMO, this can happen if the mountpoint is offline.
> >
> > > Please refer to the latest logs from one node.
> > >
> > > [258418.054204] o2cb: o2dlm has evicted node 7 from domain A895BC216BE641A8A7E20AA89D57E051
> > > [258418.957738] o2cb: o2dlm has evicted node 7 from domain A895BC216BE641A8A7E20AA89D57E051
> > > [264056.408719] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > > [264464.605542] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 ) 4 nodes
> > > [275619.497198] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> > > [426628.076148] o2cb: o2dlm has evicted node 1 from domain A895BC216BE641A8A7E20AA89D57E051
> > > [426628.885084] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> > > [426628.891170] o2dlm: Node 3 (me) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> > > [426634.182384] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> > > [427001.383315] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> >
> > The above messages show that nodes in your cluster are frequently joining and leaving. I suggest you check the cluster config on each node (/etc/ocfs2/cluster.conf as well as /sys/kernel/config/cluster/<cluster_name>/node/). I haven't used ocfs2 along with ceph rbd, so I am not sure whether it is related to rbd.
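Joseph's suggestion to cross-check /etc/ocfs2/cluster.conf against the runtime configfs view can be scripted. A sketch, assuming the o2cb stack's configfs layout (per-node attributes `num`, `ipv4_address`, `ipv4_port`) and the cluster name "ocfs2" from the conf in this thread:

```shell
# Print each node the o2cb cluster stack has actually registered, so the
# output can be diffed against /etc/ocfs2/cluster.conf on every node.
base=/sys/kernel/config/cluster/ocfs2/node
if [ -d "$base" ]; then
    for n in "$base"/*; do
        printf '%s number=%s ip=%s:%s\n' "$(basename "$n")" \
            "$(cat "$n"/num)" "$(cat "$n"/ipv4_address)" "$(cat "$n"/ipv4_port)"
    done
else
    echo "o2cb configfs tree not present (cluster not online on this host)"
fi
```

Running this on each of the seven nodes and comparing the outputs would show whether any node is registered with a mismatched number or IP, which is one way a cluster can appear "fine" in the conf file yet still flap.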
> > > Regards
> > > G.J
> > >
> > > ---- On Fri, 25 Sep 2015 06:26:57 +0530 Joseph Qi <joseph...@huawei.com> wrote ----
> > >
> > > On 2015/9/24 18:30, gjprabu wrote:
> > > > Hi All,
> > > >
> > > > Can someone tell me what kind of issue this is?
> > > >
> > > > Regards
> > > > Prabu GJ
> > > >
> > > > ---- On Wed, 23 Sep 2015 18:26:13 +0530 gjprabu <gjpr...@zohocorp.com> wrote ----
> > > >
> > > > Hi All,
> > > >
> > > > We have faced this issue on a local machine as well, but not on all the clients; only two ocfs2 clients are facing this issue.
> > > >
> > > > Regards
> > > > Prabu GJ
> > > >
> > > > ---- On Wed, 23 Sep 2015 17:49:51 +0530 gjprabu <gjpr...@zohocorp.com> wrote ----
> > > >
> > > > Hi All,
> > > >
> > > > We are using ocfs2 on an RBD mount and everything works fine, but while writing or moving data via scripts, it shows the error below after the write. Can anybody please help with this issue?
> > > >
> > > > # ls -althr
> > > > ls: cannot access MICKEYLITE_3_0_M4_1_TEST: Input/output error
> > > > ls: cannot access MICKEYLITE_3_0_M4_1_OLD: Input/output error
> > > > total 0
> > > > d????????? ? ? ? ? ? MICKEYLITE_3_0_M4_1_TEST
> > > > d????????? ? ? ? ? ? MICKEYLITE_3_0_M4_1_OLD
> > > >
> > > > Partition details:
> > > >
> > > > /dev/rbd0  ocfs2  9.6T  140G  9.5T  2%  /zoho/build/downloads
> > > >
> > > > /etc/ocfs2/cluster.conf:
> > > >
> > > > cluster:
> > > >     node_count = 7
> > > >     heartbeat_mode = local
> > > >     name = ocfs2
> > > >
> > > > node:
> > > >     ip_port = 7777
> > > >     ip_address = 10.1.1.50
> > > >     number = 1
> > > >     name = integ-hm5
> > > >     cluster = ocfs2
> > > >
> > > > node:
> > > >     ip_port = 7777
> > > >     ip_address = 10.1.1.51
> > > >     number = 2
> > > >     name = integ-hm9
> > > >     cluster = ocfs2
> > > >
> > > > node:
> > > >     ip_port = 7777
> > > >     ip_address = 10.1.1.52
> > > >     number = 3
> > > >     name = integ-hm2
> > > >     cluster = ocfs2
> > > >
> > > > node:
> > > >     ip_port = 7777
> > > >     ip_address = 10.1.1.53
> > > >     number = 4
> > > >     name = integ-ci-1
> > > >     cluster = ocfs2
> > > >
> > > > node:
> > > >     ip_port = 7777
> > > >     ip_address = 10.1.1.54
> > > >     number = 5
> > > >     name = integ-cm2
> > > >     cluster = ocfs2
> > > >
> > > > node:
> > > >     ip_port = 7777
> > > >     ip_address = 10.1.1.55
> > > >     number = 6
> > > >     name = integ-cm1
> > > >     cluster = ocfs2
> > > >
> > > > node:
> > > >     ip_port = 7777
> > > >     ip_address = 10.1.1.56
> > > >     number = 7
> > > >     name = integ-hm8
> > > >     cluster = ocfs2
> > > >
> > > > Error on dmesg:
> > > >
> > > > [516421.342393] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: status = -112
> > > > [517119.689992] (httpd,64399,31):dlm_do_master_request:1383 ERROR: link to 1 went down!
> > > > [517119.690003] (dlm_thread,51005,25):dlm_send_proxy_ast_msg:482 ERROR: A895BC216BE641A8A7E20AA89D57E051: res S000000000000000000000200000000, error -112 send AST to node 1
> > > > [517119.690028] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: status = -112
> > > > [517119.690034] (dlm_thread,51005,25):dlm_send_proxy_ast_msg:482 ERROR: A895BC216BE641A8A7E20AA89D57E051: res S000000000000000000000200000000, error -107 send AST to node 1
> > > > [517119.690036] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: status = -107
> > > > [517119.700425] (httpd,64399,31):dlm_get_lock_resource:968 ERROR: status = -112
> > > > [517517.894949] (dlm_thread,51005,25):dlm_send_proxy_ast_msg:482 ERROR: A895BC216BE641A8A7E20AA89D57E051: res S000000000000000000000200000000, error -112 send AST to node 1
> > > > [517517.899640] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: status = -112
> > >
> > > These error messages mean the connection between this node and node 1 has a problem. You have to check the network.
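The negative status codes in these dlm messages are Linux errno values with the sign flipped, which is how Joseph can read "network problem" straight off them. A quick decode, assuming python3 is available:

```shell
# Decode the dlm status codes: -112 and -107 are -EHOSTDOWN and -ENOTCONN.
python3 -c '
import errno, os
for code in (112, 107):
    print(-code, errno.errorcode[code], "-", os.strerror(code))
'
# On Linux this prints:
#   -112 EHOSTDOWN - Host is down
#   -107 ENOTCONN - Transport endpoint is not connected
```

Both codes point at the o2net TCP connection to node 1 (port 7777 in this cluster.conf) rather than at the filesystem itself.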
> > > > Regards
> > > > Prabu GJ
> > > >
> > > > _______________________________________________
> > > > Ocfs2-users mailing list
> > > > Ocfs2-users@oss.oracle.com
> > > > https://oss.oracle.com/mailman/listinfo/ocfs2-users

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users