Hi,
On 10/22/15 21:00, gjprabu wrote:
Hi Eric,
Thanks for your reply. We are still facing the same issue. We found the
dmesg logs below; the node 1 entries are expected, because we took node 1
down and brought it back up ourselves. Other than that, we did not find
any error messages. We also have a problem while unmounting: the umount
process goes into the "D" state, and fsck fails with "fsck.ocfs2: I/O
error". If you need us to run any other commands, please let me know.
1. System logs across boots
# journalctl --list-boots
If there is just one boot record, please see "man journald.conf" to
configure saving system logs across boots.
Then you can use "journalctl -b xxx" to see the system log of any
specific boot.
I can't see exactly which steps lead to that error message. It would be
better to reproduce your problems from a clean state.
2. The umount issue may be caused by the cluster being in a bad state:
communication between the nodes has hung.
3. Please pass the device to fsck.ocfs2 instead of the mount point.
4. Did you build the CEPH RBD setup on an ocfs2 cluster that was in good
condition? It's better to test thoroughly that the cluster is healthy
before working on it.
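For point 1, a minimal sketch of checking whether the journal is kept across boots. It assumes a systemd host whose main configuration is /etc/systemd/journald.conf; the helper name journald_storage is made up for illustration, and it ignores drop-in files under journald.conf.d.

```shell
#!/bin/sh
# Print the Storage= value set in a journald.conf-style file.
# Falls back to "auto", the documented default, when the key is not set.
journald_storage() {
    conf="$1"
    value=$(sed -n 's/^[[:space:]]*Storage[[:space:]]*=[[:space:]]*//p' \
        "$conf" 2>/dev/null | tail -n 1)
    echo "${value:-auto}"
}

# Typical use (as root):
#   journald_storage /etc/systemd/journald.conf
# With Storage=auto, logs persist only if /var/log/journal exists:
#   mkdir -p /var/log/journal
#   systemctl restart systemd-journald
# After the next reboot, earlier boots become visible:
#   journalctl --list-boots
#   journalctl -b -1 -k      # kernel messages from the previous boot
```

With persistent storage in place, the kernel messages from the boot in which the I/O errors first appeared can be collected even after the node has been restarted.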
Thanks,
Eric
*ocfs2 version*
debugfs.ocfs2 1.8.0
*# cat /etc/sysconfig/o2cb*
#
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running /etc/init.d/o2cb configure.
# On Debian based systems the preferred method is running
# 'dpkg-reconfigure ocfs2-tools'.
#
# O2CB_STACK: The name of the cluster stack backing O2CB.
O2CB_STACK=o2cb
# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=31
# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000
# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=2000
# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=2000
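To read the values above as wall-clock time: the OCFS2 documentation gives the disk heartbeat timeout as (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, so a threshold of 31 means 60 seconds, while the network idle timeout of 30000 ms is 30 seconds. A small sketch that does this arithmetic from the file; the function names o2cb_get and o2cb_timeouts are hypothetical.

```shell
#!/bin/sh
# Print the value of KEY from a KEY=value style file (last match wins).
o2cb_get() {
    sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Report the disk heartbeat and network idle timeouts in seconds.
o2cb_timeouts() {
    threshold=$(o2cb_get O2CB_HEARTBEAT_THRESHOLD "$1")
    idle_ms=$(o2cb_get O2CB_IDLE_TIMEOUT_MS "$1")
    [ -n "$threshold" ] && [ -n "$idle_ms" ] || return 1
    # The heartbeat runs every 2 seconds; the threshold counts iterations.
    echo "disk_heartbeat_s=$(( (threshold - 1) * 2 ))"
    echo "net_idle_s=$(( idle_ms / 1000 ))"
}

# Typical use:
#   o2cb_timeouts /etc/sysconfig/o2cb
```

For the configuration shown here this prints disk_heartbeat_s=60 and net_idle_s=30. These settings are expected to be identical on every node of the cluster.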
*# fsck.ocfs2 -fy /home/build/downloads/*
fsck.ocfs2 1.8.0
fsck.ocfs2: I/O error on channel while opening "/zoho/build/downloads/"
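The command above passes a directory to fsck.ocfs2, which expects a block device (point 3 of Eric's reply). A minimal sketch for looking up the device behind a mount point in /proc/mounts; the function name mount_source is hypothetical, the actual RBD device name is not given in this thread, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Print the device backing a mount point, using a /proc/mounts-style table.
# Usage: mount_source MOUNTPOINT [TABLE]
mount_source() {
    mnt="${1%/}"                  # drop a trailing slash
    table="${2:-/proc/mounts}"
    # The last matching entry wins, since later mounts shadow earlier ones.
    awk -v m="${mnt:-/}" '
        $2 == m { dev = $1 }
        END { if (dev != "") print dev; else exit 1 }' "$table"
}

# Typical use (read-only check, answering "no" to every repair prompt):
#   dev=$(mount_source /home/build/downloads/) && fsck.ocfs2 -fn "$dev"
```

The -n flag keeps the check read-only. A repairing run with -y is normally done only after the volume has been unmounted on all nodes.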
_*dmesg logs*_
[ 4229.886284] o2dlm: Joining domain A895BC216BE641A8A7E20AA89D57E051
( 5 ) 1 nodes
[ 4251.437451] o2dlm: Node 3 joins domain
A895BC216BE641A8A7E20AA89D57E051 ( 3 5 ) 2 nodes
[ 4267.836392] o2dlm: Node 1 joins domain
A895BC216BE641A8A7E20AA89D57E051 ( 1 3 5 ) 3 nodes
[ 4292.755589] o2dlm: Node 2 joins domain
A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 5 ) 4 nodes
[ 4306.262165] o2dlm: Node 4 joins domain
A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 5 ) 5 nodes
[316476.505401] (kworker/u192:0,95923,0):dlm_do_assert_master:1717
ERROR: Error -112 when sending message 502 (key 0xc3460ae7) to node 1
[316476.505470] o2cb: o2dlm has evicted node 1 from domain
A895BC216BE641A8A7E20AA89D57E051
[316480.437231] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316480.442389] o2cb: o2dlm has evicted node 1 from domain
A895BC216BE641A8A7E20AA89D57E051
[316480.442412] (kworker/u192:0,95923,20):dlm_begin_reco_handler:2765
A895BC216BE641A8A7E20AA89D57E051: dead_node previously set to 1, node
3 changing it to 1
[316480.541237] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316480.541241] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[316485.542733] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316485.542740] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316485.542742] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[316490.544535] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316490.544538] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316490.544539] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[316495.546356] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316495.546362] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316495.546364] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[316500.548135] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316500.548139] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316500.548140] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[316505.549947] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316505.549951] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316505.549952] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[316510.551734] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316510.551739] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316510.551740] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[316515.553543] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316515.553547] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316515.553548] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[316520.555337] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316520.555341] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316520.555343] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[316525.557131] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316525.557136] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316525.557153] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[316530.558952] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316530.558955] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316530.558957] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[316535.560781] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1
[316535.560789] o2dlm: Node 3 (he) is the Recovery Master for the dead
node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316535.560792] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051
[319419.525609] o2dlm: Node 1 joins domain
A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 5 ) 5 nodes
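The Begin/End recovery pairs above repeat roughly every five seconds for node 1. A small sketch for summarizing such loops, assuming unwrapped dmesg lines on standard input (in this mail the lines are wrapped); the function name o2dlm_recovery_counts is made up for illustration.

```shell
#!/bin/sh
# Count o2dlm "Begin recovery" events per dead node in dmesg-style text.
o2dlm_recovery_counts() {
    awk '/o2dlm: Begin recovery on domain/ {
             # The node number is the word following "node".
             for (i = 1; i < NF; i++)
                 if ($i == "node") count[$(i + 1)]++
         }
         END {
             for (n in count)
                 print "node " n ": " count[n] " recovery attempts"
         }' | sort
}

# Typical use:
#   dmesg | o2dlm_recovery_counts
```

A high count for one node within a short window suggests that recovery keeps restarting rather than finishing once.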
*ps -auxxxxx | grep umount*
root 32083 21.8 0.0 125620 2828 pts/14 D+ 19:37 0:18 umount
/home/build/repository
root 32196 0.0 0.0 112652 2264 pts/8 S+ 19:38 0:00 grep
--color=auto umount
*cat /proc/32083/stack*
[<ffffffff8132ad7d>] o2net_send_message_vec+0x71d/0xb00
[<ffffffff81352148>] dlm_send_remote_unlock_request.isra.2+0x128/0x410
[<ffffffff813527db>] dlmunlock_common+0x3ab/0x9e0
[<ffffffff81353088>] dlmunlock+0x278/0x800
[<ffffffff8131f765>] o2cb_dlm_unlock+0x35/0x50
[<ffffffff8131ecfe>] ocfs2_dlm_unlock+0x1e/0x30
[<ffffffff812a8776>] ocfs2_drop_lock.isra.29.part.30+0x1f6/0x700
[<ffffffff812ae40d>] ocfs2_simple_drop_lockres+0x2d/0x40
[<ffffffff8129b43c>] ocfs2_dentry_lock_put+0x5c/0x80
[<ffffffff8129b4a2>] ocfs2_dentry_iput+0x42/0x1d0
[<ffffffff81204dc2>] __dentry_kill+0x102/0x1f0
[<ffffffff81205294>] shrink_dentry_list+0xe4/0x2a0
[<ffffffff81205aa8>] shrink_dcache_parent+0x38/0x90
[<ffffffff81205b16>] do_one_tree+0x16/0x50
[<ffffffff81206e9f>] shrink_dcache_for_umount+0x2f/0x90
[<ffffffff811efb15>] generic_shutdown_super+0x25/0x100
[<ffffffff811eff57>] kill_block_super+0x27/0x70
[<ffffffff811f02a9>] deactivate_locked_super+0x49/0x60
[<ffffffff811f089e>] deactivate_super+0x4e/0x70
[<ffffffff8120da83>] cleanup_mnt+0x43/0x90
[<ffffffff8120db22>] __cleanup_mnt+0x12/0x20
[<ffffffff81093ba4>] task_work_run+0xc4/0xe0
[<ffffffff81013c67>] do_notify_resume+0x97/0xb0
[<ffffffff817d2ee7>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
Regards
Prabu
---- On Wed, 21 Oct 2015 08:32:15 +0530 *Eric Ren <z...@suse.com>*
wrote ----
Hi Prabu,
I guess others, like me, are not familiar with this setup that
combines CEPH RBD and OCFS2.
We'd really like to help you, but I think the ocfs2 developers cannot
get any information about what happened to ocfs2 from your description.
So I'm wondering if you can reproduce the problem and tell us the steps.
Once developers can reproduce it, it's likely to be resolved ;-)
BTW, any dmesg log about ocfs2, especially the initial error message
and the stack backtrace, will be helpful!
Thanks,
Eric
On 10/20/15 17:29, gjprabu wrote:
Hi
We are looking forward to your input on this.
Regards
Prabu
---- On Fri, 09 Oct 2015 12:08:19 +0530 *gjprabu
<gjpr...@zohocorp.com>* wrote ----
Hi All,
Could anybody please help me with this issue?
Regards
Prabu
---- On Thu, 08 Oct 2015 12:33:57 +0530 *gjprabu
<gjpr...@zohocorp.com>* wrote ----
Hi All,
We have servers with OCFS2 mounted on CEPH RBD. We are facing
I/O errors while moving data within the same disk (copying
does not cause any problem). As a temporary workaround we
remount the partition and the issue is resolved, but after
some time the problem reappears. If anybody has faced the
same issue, please help us.
Note: We have 5 nodes in total. Two nodes are working fine;
the other nodes show the input/output error below.
ls -althr
ls: cannot access LITE_3_0_M4_1_TEST: Input/output error
ls: cannot access LITE_3_0_M4_1_OLD: Input/output error
total 0
d????????? ? ? ? ? ? LITE_3_0_M4_1_TEST
d????????? ? ? ? ? ? LITE_3_0_M4_1_OLD
cluster:
node_count=5
heartbeat_mode = local
name=ocfs2
node:
ip_port = 7777
ip_address = 192.168.113.42
number = 1
name = integ-hm9
cluster = ocfs2
node:
ip_port = 7777
ip_address = 192.168.112.115
number = 2
name = integ-hm2
cluster = ocfs2
node:
ip_port = 7777
ip_address = 192.168.113.43
number = 3
name = integ-ci-1
cluster = ocfs2
node:
ip_port = 7777
ip_address = 192.168.112.217
number = 4
name = integ-hm8
cluster = ocfs2
node:
ip_port = 7777
ip_address = 192.168.112.192
number = 5
name = integ-hm5
cluster = ocfs2
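A quick consistency check for a cluster.conf like the one above: the declared node_count should match the number of node: stanzas. This is only a sketch; the function name check_node_count is hypothetical, and the usual path /etc/ocfs2/cluster.conf is an assumption, since the thread does not say where the file lives.

```shell
#!/bin/sh
# Compare node_count with the number of "node:" stanzas in cluster.conf.
# Returns 0 when they agree, 1 otherwise.
check_node_count() {
    awk -F'=' '
        /^[ \t]*node:[ \t]*$/      { nodes++ }
        /^[ \t]*node_count[ \t]*=/ { declared = $2 + 0 }
        END {
            printf "declared=%d stanzas=%d\n", declared, nodes
            exit (declared == nodes ? 0 : 1)
        }' "$1"
}

# Typical use, on each node:
#   check_node_count /etc/ocfs2/cluster.conf
```

For the file quoted here, the five node stanzas agree with node_count=5. Running the same check on every node, together with a diff of the files, helps rule out configuration drift between nodes.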
Regards
Prabu
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users