Re: [Gluster-users] GlusterFS mount crash

2022-12-02 Thread Angel Docampo
Yes, I'm on 10.3 on a brand new installation (i.e., no upgrade whatsoever).

OK, I've finally read up on how to get core dumps on Debian. Soft limits are
0 by default, so no core dumps can be generated. I set the soft ulimit to
unlimited and killed a test process with a SIGSEGV signal, and then I was
able to see the core dump. Great!
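For reference, this is roughly the sequence I ended up with (the sleep
process below is just a stand-in for any test process, and the dump gets
picked up by systemd-coredump once that is installed):

ulimit -S -c unlimited
sleep 300 &
kill -SEGV %1
coredumpctl list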

I've moved all the qcow2 files to another location but haven't destroyed the
volume or deleted any files. I will create some test VMs there and leave
things running for a few days to see if the dump is generated properly next
time, hopefully soon enough.

Thank you Xavi.

*Angel Docampo*

<+34-93-1592929>


On Thu, Dec 1, 2022 at 18:02, Xavi Hernandez ()
wrote:

> I'm not so sure the problem is with sharding. Basically it's saying that
> seek is not supported, which means that something between shard and the
> bricks doesn't support it. DHT didn't support seek before 10.3, but if I'm
> not wrong you are already using 10.3, so the message is weird. But in any
> case this shouldn't cause a crash. The stack trace seems to indicate that
> the crash happens inside disperse, but without a core dump there's little
> more that I can do.
>
>
>
> On Thu, Dec 1, 2022 at 5:27 PM Angel Docampo 
> wrote:
>
>> Well, that lasted longer this time, but it crashed once again, same node,
>> same mountpoint... fortunately, I had preventively moved all the VMs to
>> the underlying ZFS filesystem these past days, so none of them have been
>> affected this time...
>>
>> dmesg shows this
>> [2022-12-01 15:49:54]  INFO: task iou-wrk-637144:946532 blocked for more
>> than 120 seconds.
>> [2022-12-01 15:49:54]Tainted: P  IO  5.15.74-1-pve #1
>> [2022-12-01 15:49:54]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [2022-12-01 15:49:54]  task:iou-wrk-637144  state:D stack:0
>> pid:946532 ppid: 1 flags:0x4000
>> [2022-12-01 15:49:54]  Call Trace:
>> [2022-12-01 15:49:54]   
>> [2022-12-01 15:49:54]   __schedule+0x34e/0x1740
>> [2022-12-01 15:49:54]   ? kmem_cache_free+0x271/0x290
>> [2022-12-01 15:49:54]   ? mempool_free_slab+0x17/0x20
>> [2022-12-01 15:49:54]   schedule+0x69/0x110
>> [2022-12-01 15:49:54]   rwsem_down_write_slowpath+0x231/0x4f0
>> [2022-12-01 15:49:54]   ? ttwu_queue_wakelist+0x40/0x1c0
>> [2022-12-01 15:49:54]   down_write+0x47/0x60
>> [2022-12-01 15:49:54]   fuse_file_write_iter+0x1a3/0x430
>> [2022-12-01 15:49:54]   ? apparmor_file_permission+0x70/0x170
>> [2022-12-01 15:49:54]   io_write+0xf6/0x330
>> [2022-12-01 15:49:54]   ? update_cfs_group+0x9c/0xc0
>> [2022-12-01 15:49:54]   ? dequeue_entity+0xd8/0x490
>> [2022-12-01 15:49:54]   io_issue_sqe+0x401/0x1fc0
>> [2022-12-01 15:49:54]   ? lock_timer_base+0x3b/0xd0
>> [2022-12-01 15:49:54]   io_wq_submit_work+0x76/0xd0
>> [2022-12-01 15:49:54]   io_worker_handle_work+0x1a7/0x5f0
>> [2022-12-01 15:49:54]   io_wqe_worker+0x2c0/0x360
>> [2022-12-01 15:49:54]   ? finish_task_switch.isra.0+0x7e/0x2b0
>> [2022-12-01 15:49:54]   ? io_worker_handle_work+0x5f0/0x5f0
>> [2022-12-01 15:49:54]   ? io_worker_handle_work+0x5f0/0x5f0
>> [2022-12-01 15:49:54]   ret_from_fork+0x1f/0x30
>> [2022-12-01 15:49:54]  RIP: 0033:0x0
>> [2022-12-01 15:49:54]  RSP: 002b: EFLAGS: 0207
>> [2022-12-01 15:49:54]  RAX:  RBX: 0011 RCX:
>> 
>> [2022-12-01 15:49:54]  RDX: 0001 RSI: 0001 RDI:
>> 0120
>> [2022-12-01 15:49:54]  RBP: 0120 R08: 0001 R09:
>> 00f0
>> [2022-12-01 15:49:54]  R10: 00f8 R11: 0001239a4128 R12:
>> db90
>> [2022-12-01 15:49:54]  R13: 0001 R14: 0001 R15:
>> 0100
>> [2022-12-01 15:49:54]   
>>
>> My gluster volume log shows plenty of errors like this
>> The message "I [MSGID: 133017] [shard.c:7275:shard_seek] 0-vmdata-shard:
>> seek called on 73f0ad95-f7e3-4a68-8d08-9f7e03182baa. [Operation not
>> supported]" repeated 1564 times between [2022-12-01 00:20:09.578233 +]
>> and [2022-12-01 00:22:09.436927 +]
>> [2022-12-01 00:22:09.516269 +] I [MSGID: 133017]
>> [shard.c:7275:shard_seek] 0-vmdata-shard: seek called on
>> 73f0ad95-f7e3-4a68-8d08-9f7e03182baa. [Operation not supported]
>>
>> and of this
>> [2022-12-01 09:05:08.525867 +] I [MSGID: 133017]
>> [shard.c:7275:shard_seek] 0-vmdata-shard: seek called on
>> 3ed993c4-bbb5-4938-86e9-6d22b8541e8e. [Operation not supported]
>>
>> Then simply the same
>> pending frames:
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(1) op(FSYNC)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) 

Re: [Gluster-users] GlusterFS mount crash

2022-12-01 Thread Xavi Hernandez
I'm not so sure the problem is with sharding. Basically it's saying that
seek is not supported, which means that something between shard and the
bricks doesn't support it. DHT didn't support seek before 10.3, but if I'm
not wrong you are already using 10.3, so the message is weird. But in any
case this shouldn't cause a crash. The stack trace seems to indicate that
the crash happens inside disperse, but without a core dump there's little
more that I can do.



On Thu, Dec 1, 2022 at 5:27 PM Angel Docampo 
wrote:

> Well, that lasted longer this time, but it crashed once again, same node,
> same mountpoint... fortunately, I had preventively moved all the VMs to
> the underlying ZFS filesystem these past days, so none of them have been
> affected this time...
>
> dmesg shows this
> [2022-12-01 15:49:54]  INFO: task iou-wrk-637144:946532 blocked for more
> than 120 seconds.
> [2022-12-01 15:49:54]Tainted: P  IO  5.15.74-1-pve #1
> [2022-12-01 15:49:54]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [2022-12-01 15:49:54]  task:iou-wrk-637144  state:D stack:0 pid:946532
> ppid: 1 flags:0x4000
> [2022-12-01 15:49:54]  Call Trace:
> [2022-12-01 15:49:54]   
> [2022-12-01 15:49:54]   __schedule+0x34e/0x1740
> [2022-12-01 15:49:54]   ? kmem_cache_free+0x271/0x290
> [2022-12-01 15:49:54]   ? mempool_free_slab+0x17/0x20
> [2022-12-01 15:49:54]   schedule+0x69/0x110
> [2022-12-01 15:49:54]   rwsem_down_write_slowpath+0x231/0x4f0
> [2022-12-01 15:49:54]   ? ttwu_queue_wakelist+0x40/0x1c0
> [2022-12-01 15:49:54]   down_write+0x47/0x60
> [2022-12-01 15:49:54]   fuse_file_write_iter+0x1a3/0x430
> [2022-12-01 15:49:54]   ? apparmor_file_permission+0x70/0x170
> [2022-12-01 15:49:54]   io_write+0xf6/0x330
> [2022-12-01 15:49:54]   ? update_cfs_group+0x9c/0xc0
> [2022-12-01 15:49:54]   ? dequeue_entity+0xd8/0x490
> [2022-12-01 15:49:54]   io_issue_sqe+0x401/0x1fc0
> [2022-12-01 15:49:54]   ? lock_timer_base+0x3b/0xd0
> [2022-12-01 15:49:54]   io_wq_submit_work+0x76/0xd0
> [2022-12-01 15:49:54]   io_worker_handle_work+0x1a7/0x5f0
> [2022-12-01 15:49:54]   io_wqe_worker+0x2c0/0x360
> [2022-12-01 15:49:54]   ? finish_task_switch.isra.0+0x7e/0x2b0
> [2022-12-01 15:49:54]   ? io_worker_handle_work+0x5f0/0x5f0
> [2022-12-01 15:49:54]   ? io_worker_handle_work+0x5f0/0x5f0
> [2022-12-01 15:49:54]   ret_from_fork+0x1f/0x30
> [2022-12-01 15:49:54]  RIP: 0033:0x0
> [2022-12-01 15:49:54]  RSP: 002b: EFLAGS: 0207
> [2022-12-01 15:49:54]  RAX:  RBX: 0011 RCX:
> 
> [2022-12-01 15:49:54]  RDX: 0001 RSI: 0001 RDI:
> 0120
> [2022-12-01 15:49:54]  RBP: 0120 R08: 0001 R09:
> 00f0
> [2022-12-01 15:49:54]  R10: 00f8 R11: 0001239a4128 R12:
> db90
> [2022-12-01 15:49:54]  R13: 0001 R14: 0001 R15:
> 0100
> [2022-12-01 15:49:54]   
>
> My gluster volume log shows plenty of errors like this
> The message "I [MSGID: 133017] [shard.c:7275:shard_seek] 0-vmdata-shard:
> seek called on 73f0ad95-f7e3-4a68-8d08-9f7e03182baa. [Operation not
> supported]" repeated 1564 times between [2022-12-01 00:20:09.578233 +]
> and [2022-12-01 00:22:09.436927 +]
> [2022-12-01 00:22:09.516269 +] I [MSGID: 133017]
> [shard.c:7275:shard_seek] 0-vmdata-shard: seek called on
> 73f0ad95-f7e3-4a68-8d08-9f7e03182baa. [Operation not supported]
>
> and of this
> [2022-12-01 09:05:08.525867 +] I [MSGID: 133017]
> [shard.c:7275:shard_seek] 0-vmdata-shard: seek called on
> 3ed993c4-bbb5-4938-86e9-6d22b8541e8e. [Operation not supported]
>
> Then simply the same
> pending frames:
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(1) op(FSYNC)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 11
> time of crash:
> 2022-12-01 14:45:14 +
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 10.3
> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f1e23db3a54]
> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f1e23dbbfc0]
>
> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f1e23b76d60]
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f1e200e9a14]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f1e200cb414]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0xd072)[0x7f1e200bf072]
>
> 

Re: [Gluster-users] GlusterFS mount crash

2022-12-01 Thread Angel Docampo
Well, that lasted longer this time, but it crashed once again, same node,
same mountpoint... fortunately, I had preventively moved all the VMs to the
underlying ZFS filesystem these past days, so none of them have been
affected this time...

dmesg shows this
[2022-12-01 15:49:54]  INFO: task iou-wrk-637144:946532 blocked for more
than 120 seconds.
[2022-12-01 15:49:54]Tainted: P  IO  5.15.74-1-pve #1
[2022-12-01 15:49:54]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[2022-12-01 15:49:54]  task:iou-wrk-637144  state:D stack:0 pid:946532
ppid: 1 flags:0x4000
[2022-12-01 15:49:54]  Call Trace:
[2022-12-01 15:49:54]   
[2022-12-01 15:49:54]   __schedule+0x34e/0x1740
[2022-12-01 15:49:54]   ? kmem_cache_free+0x271/0x290
[2022-12-01 15:49:54]   ? mempool_free_slab+0x17/0x20
[2022-12-01 15:49:54]   schedule+0x69/0x110
[2022-12-01 15:49:54]   rwsem_down_write_slowpath+0x231/0x4f0
[2022-12-01 15:49:54]   ? ttwu_queue_wakelist+0x40/0x1c0
[2022-12-01 15:49:54]   down_write+0x47/0x60
[2022-12-01 15:49:54]   fuse_file_write_iter+0x1a3/0x430
[2022-12-01 15:49:54]   ? apparmor_file_permission+0x70/0x170
[2022-12-01 15:49:54]   io_write+0xf6/0x330
[2022-12-01 15:49:54]   ? update_cfs_group+0x9c/0xc0
[2022-12-01 15:49:54]   ? dequeue_entity+0xd8/0x490
[2022-12-01 15:49:54]   io_issue_sqe+0x401/0x1fc0
[2022-12-01 15:49:54]   ? lock_timer_base+0x3b/0xd0
[2022-12-01 15:49:54]   io_wq_submit_work+0x76/0xd0
[2022-12-01 15:49:54]   io_worker_handle_work+0x1a7/0x5f0
[2022-12-01 15:49:54]   io_wqe_worker+0x2c0/0x360
[2022-12-01 15:49:54]   ? finish_task_switch.isra.0+0x7e/0x2b0
[2022-12-01 15:49:54]   ? io_worker_handle_work+0x5f0/0x5f0
[2022-12-01 15:49:54]   ? io_worker_handle_work+0x5f0/0x5f0
[2022-12-01 15:49:54]   ret_from_fork+0x1f/0x30
[2022-12-01 15:49:54]  RIP: 0033:0x0
[2022-12-01 15:49:54]  RSP: 002b: EFLAGS: 0207
[2022-12-01 15:49:54]  RAX:  RBX: 0011 RCX:

[2022-12-01 15:49:54]  RDX: 0001 RSI: 0001 RDI:
0120
[2022-12-01 15:49:54]  RBP: 0120 R08: 0001 R09:
00f0
[2022-12-01 15:49:54]  R10: 00f8 R11: 0001239a4128 R12:
db90
[2022-12-01 15:49:54]  R13: 0001 R14: 0001 R15:
0100
[2022-12-01 15:49:54]   

My gluster volume log shows plenty of errors like this
The message "I [MSGID: 133017] [shard.c:7275:shard_seek] 0-vmdata-shard:
seek called on 73f0ad95-f7e3-4a68-8d08-9f7e03182baa. [Operation not
supported]" repeated 1564 times between [2022-12-01 00:20:09.578233 +]
and [2022-12-01 00:22:09.436927 +]
[2022-12-01 00:22:09.516269 +] I [MSGID: 133017]
[shard.c:7275:shard_seek] 0-vmdata-shard: seek called on
73f0ad95-f7e3-4a68-8d08-9f7e03182baa. [Operation not supported]

and of this
[2022-12-01 09:05:08.525867 +] I [MSGID: 133017]
[shard.c:7275:shard_seek] 0-vmdata-shard: seek called on
3ed993c4-bbb5-4938-86e9-6d22b8541e8e. [Operation not supported]

Then simply the same
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(FSYNC)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2022-12-01 14:45:14 +
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.3
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f1e23db3a54]
/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f1e23dbbfc0]

/lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f1e23b76d60]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f1e200e9a14]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f1e200cb414]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0xd072)[0x7f1e200bf072]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/performance/readdir-ahead.so(+0x316d)[0x7f1e200a316d]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/distribute.so(+0x5bdd4)[0x7f1e197aadd4]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x1e69c)[0x7f1e2008b69c]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x16551)[0x7f1e20083551]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x25abf)[0x7f1e20092abf]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x25d21)[0x7f1e20092d21]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x167be)[0x7f1e200837be]


Re: [Gluster-users] GlusterFS mount crash

2022-11-25 Thread Angel Docampo
I did also notice that loop0... AFAIK, I wasn't using any loop device, at
least not consciously.
After looking for the same messages on the other gluster/proxmox nodes, I
saw no trace of it.
Then I saw that on that node there is a single LXC container whose disk
lives on the glusterfs and which is indeed using ext4.
After today's crash I was unable to boot it up again and the logs went
silent; I just tried to boot it up, and this immediately appeared in dmesg
[2022-11-25 18:04:18]  loop0: detected capacity change from 0 to 16777216
[2022-11-25 18:04:18]  EXT4-fs (loop0): error loading journal
[2022-11-25 18:05:26]  loop0: detected capacity change from 0 to 16777216
[2022-11-25 18:05:26]  EXT4-fs (loop0): INFO: recovery required on readonly
filesystem
[2022-11-25 18:05:26]  EXT4-fs (loop0): write access unavailable, cannot
proceed (try mounting with noload)

And the LXC container didn't boot up. I manually moved the LXC container to
the underlying ZFS where gluster lives, the LXC booted up, and the dmesg log
shows
[2022-11-25 18:24:06]  loop0: detected capacity change from 0 to 16777216
[2022-11-25 18:24:06]  EXT4-fs warning (device loop0):
ext4_multi_mount_protect:326: MMP interval 42 higher than expected, please
wait.
[2022-11-25 18:24:50]  EXT4-fs (loop0): mounted filesystem with ordered
data mode. Opts: (null). Quota mode: none.
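As a side note, if I ever need to pull data out of an image stuck in that
"recovery required on readonly filesystem" state, a read-only loop mount
with the noload option dmesg suggested looks like the way; just a sketch,
and the image path below is only a placeholder:

mount -o ro,loop,noload /path/to/vm-disk.raw /mnt/recover   # noload skips journal replay on the read-only image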

So, to recapitulate:
- the loop device on the host belongs to that LXC; not surprising, but I
didn't know it.
- the LXC container had a lot of I/O issues just before the two crashes,
the one from today and the one 4 days ago, this past Monday
- as a side note, this gluster has been in production since last Thursday,
so the first crash came exactly 4 days after this LXC was started with its
storage on the gluster, and exactly 4 days after that it crashed again.
- these crashes began after the upgrade to gluster 10.3; it was working
just fine with former versions of gluster (from 3.X to 9.X) and from
Proxmox 5.X to Proxmox 7.1, which is when all the issues began. Now I'm on
Proxmox 7.2.
- the underlying ZFS where gluster sits has no dedicated ZIL/SLOG device
(it had one before the upgrade to gluster 10.3, but as I had to re-create
the gluster I decided not to add it back, since all my disks are SSD and
there is no need for one). I've added them again to test whether the LXC
container caused the same issues; it did, so they don't seem to make any
difference.
- there are more loop0 I/O errors in dmesg besides the days of the crashes,
but just "one" error per day, and not every day; on the days the gluster
mountpoint became inaccessible, there are tens of errors per millisecond
just before the crash

I'm going to get rid of that LXC; as I'm now migrating from VMs to K8s
(living in a VM cluster inside Proxmox), I was ready to convert this one as
well, and now it's a must.

I don't know if anyone at gluster can replicate this scenario (Proxmox +
gluster distributed disperse + LXC on a gluster directory) to see if it is
reproducible. I know this must be a corner case; I'm just wondering why it
stopped working and whether it is a bug in GlusterFS 10.3, in LXC, or in
Proxmox 7.1 upwards (where I'm going to post this now, though Proxmox
probably won't be interested, as they explicitly suggest mounting glusterfs
with the gluster client rather than mapping a directory where gluster is
mounted via fstab).

Thanks a lot, Xavi. I will monitor dmesg to make sure all those loop errors
disappear, and hopefully I won't have a crash next Tuesday. :)

*Angel Docampo*

<+34-93-1592929>


On Fri, Nov 25, 2022 at 13:25, Xavi Hernandez ()
wrote:

> What is "loop0" it seems it's having some issue. Does it point to a
> Gluster file ?
>
> I also see that there's an io_uring thread in D state. If that one belongs
> to Gluster, it may explain why systemd was unable to generate a core dump
> (all threads need to be stopped to generate a core dump, but a thread
> blocked inside the kernel cannot be stopped).
>
> If you are using io_uring in Gluster, maybe you can disable it to see if
> it's related.
>
> Xavi
>
> On Fri, Nov 25, 2022 at 11:39 AM Angel Docampo <
> angel.doca...@eoniantec.com> wrote:
>
>> Well, just happened again, the same server, the same mountpoint.
>>
>> I'm unable to get the core dumps, coredumpctl says there are no core
>> dumps, it would be funny if I wasn't the one suffering it, but
>> systemd-coredump service crashed as well
>> ● systemd-coredump@0-3199871-0.service - Process Core Dump (PID
>> 3199871/UID 0)
>> Loaded: loaded (/lib/systemd/system/systemd-coredump@.service;
>> static)
>> Active: failed (Result: timeout) since Fri 2022-11-25 10:54:59 CET;
>> 39min ago
>> TriggeredBy: ● systemd-coredump.socket
>>   Docs: man:systemd-coredump(8)
>>Process: 3199873 

Re: [Gluster-users] GlusterFS mount crash

2022-11-25 Thread Xavi Hernandez
What is "loop0" it seems it's having some issue. Does it point to a Gluster
file ?

I also see that there's an io_uring thread in D state. If that one belongs
to Gluster, it may explain why systemd was unable to generate a core dump
(all threads need to be stopped to generate a core dump, but a thread
blocked inside the kernel cannot be stopped).

If you are using io_uring in Gluster, maybe you can disable it to see if
it's related.
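(For completeness, the knob I have in mind on the Gluster side; the option
name is from memory and worth double-checking on 10.3 before relying on it.
If the io_uring threads actually belong to QEMU rather than Gluster, the
equivalent change would be the VM disk's aio setting in Proxmox instead.)

gluster volume set vmdata storage.linux-io_uring off   # brick-side option, assuming this name; may need a brick restart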

Xavi

On Fri, Nov 25, 2022 at 11:39 AM Angel Docampo 
wrote:

> Well, just happened again, the same server, the same mountpoint.
>
> I'm unable to get the core dumps, coredumpctl says there are no core
> dumps, it would be funny if I wasn't the one suffering it, but
> systemd-coredump service crashed as well
> ● systemd-coredump@0-3199871-0.service - Process Core Dump (PID
> 3199871/UID 0)
> Loaded: loaded (/lib/systemd/system/systemd-coredump@.service;
> static)
> Active: failed (Result: timeout) since Fri 2022-11-25 10:54:59 CET;
> 39min ago
> TriggeredBy: ● systemd-coredump.socket
>   Docs: man:systemd-coredump(8)
>Process: 3199873 ExecStart=/lib/systemd/systemd-coredump (code=killed,
> signal=TERM)
>   Main PID: 3199873 (code=killed, signal=TERM)
>CPU: 15ms
>
> Nov 25 10:49:59 pve02 systemd[1]: Started Process Core Dump (PID
> 3199871/UID 0).
> Nov 25 10:54:59 pve02 systemd[1]: systemd-coredump@0-3199871-0.service:
> Service reached runtime time limit. Stopping.
> Nov 25 10:54:59 pve02 systemd[1]: systemd-coredump@0-3199871-0.service:
> Failed with result 'timeout'.
>
>
> I just saw the exception on dmesg,
> [2022-11-25 10:50:08]  INFO: task kmmpd-loop0:681644 blocked for more than
> 120 seconds.
> [2022-11-25 10:50:08]Tainted: P  IO  5.15.60-2-pve #1
> [2022-11-25 10:50:08]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [2022-11-25 10:50:08]  task:kmmpd-loop0 state:D stack:0 pid:681644
> ppid: 2 flags:0x4000
> [2022-11-25 10:50:08]  Call Trace:
> [2022-11-25 10:50:08]   
> [2022-11-25 10:50:08]   __schedule+0x33d/0x1750
> [2022-11-25 10:50:08]   ? bit_wait+0x70/0x70
> [2022-11-25 10:50:08]   schedule+0x4e/0xc0
> [2022-11-25 10:50:08]   io_schedule+0x46/0x80
> [2022-11-25 10:50:08]   bit_wait_io+0x11/0x70
> [2022-11-25 10:50:08]   __wait_on_bit+0x31/0xa0
> [2022-11-25 10:50:08]   out_of_line_wait_on_bit+0x8d/0xb0
> [2022-11-25 10:50:08]   ? var_wake_function+0x30/0x30
> [2022-11-25 10:50:08]   __wait_on_buffer+0x34/0x40
> [2022-11-25 10:50:08]   write_mmp_block+0x127/0x180
> [2022-11-25 10:50:08]   kmmpd+0x1b9/0x430
> [2022-11-25 10:50:08]   ? write_mmp_block+0x180/0x180
> [2022-11-25 10:50:08]   kthread+0x127/0x150
> [2022-11-25 10:50:08]   ? set_kthread_struct+0x50/0x50
> [2022-11-25 10:50:08]   ret_from_fork+0x1f/0x30
> [2022-11-25 10:50:08]   
> [2022-11-25 10:50:08]  INFO: task iou-wrk-1511979:3200401 blocked for
> more than 120 seconds.
> [2022-11-25 10:50:08]Tainted: P  IO  5.15.60-2-pve #1
> [2022-11-25 10:50:08]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [2022-11-25 10:50:08]  task:iou-wrk-1511979 state:D stack:0
> pid:3200401 ppid: 1 flags:0x4000
> [2022-11-25 10:50:08]  Call Trace:
> [2022-11-25 10:50:08]   
> [2022-11-25 10:50:08]   __schedule+0x33d/0x1750
> [2022-11-25 10:50:08]   schedule+0x4e/0xc0
> [2022-11-25 10:50:08]   rwsem_down_write_slowpath+0x231/0x4f0
> [2022-11-25 10:50:08]   down_write+0x47/0x60
> [2022-11-25 10:50:08]   fuse_file_write_iter+0x1a3/0x430
> [2022-11-25 10:50:08]   ? apparmor_file_permission+0x70/0x170
> [2022-11-25 10:50:08]   io_write+0xfb/0x320
> [2022-11-25 10:50:08]   ? put_dec+0x1c/0xa0
> [2022-11-25 10:50:08]   io_issue_sqe+0x401/0x1fc0
> [2022-11-25 10:50:08]   io_wq_submit_work+0x76/0xd0
> [2022-11-25 10:50:08]   io_worker_handle_work+0x1a7/0x5f0
> [2022-11-25 10:50:08]   io_wqe_worker+0x2c0/0x360
> [2022-11-25 10:50:08]   ? finish_task_switch.isra.0+0x7e/0x2b0
> [2022-11-25 10:50:08]   ? io_worker_handle_work+0x5f0/0x5f0
> [2022-11-25 10:50:08]   ? io_worker_handle_work+0x5f0/0x5f0
> [2022-11-25 10:50:08]   ret_from_fork+0x1f/0x30
> [2022-11-25 10:50:08]  RIP: 0033:0x0
> [2022-11-25 10:50:08]  RSP: 002b: EFLAGS: 0216
> ORIG_RAX: 01aa
> [2022-11-25 10:50:08]  RAX:  RBX: 7fdb1efef640 RCX:
> 7fdd59f872e9
> [2022-11-25 10:50:08]  RDX:  RSI: 0001 RDI:
> 0011
> [2022-11-25 10:50:08]  RBP:  R08:  R09:
> 0008
> [2022-11-25 10:50:08]  R10:  R11: 0216 R12:
> 55662e5bd268
> [2022-11-25 10:50:08]  R13: 55662e5bd320 R14: 55662e5bd260 R15:
> 
> [2022-11-25 10:50:08]   
> [2022-11-25 10:52:08]  INFO: task kmmpd-loop0:681644 blocked for more than
> 241 seconds.
> [2022-11-25 10:52:08]Tainted: P  IO  5.15.60-2-pve #1
> [2022-11-25 10:52:08]  "echo 0 > 

Re: [Gluster-users] GlusterFS mount crash

2022-11-25 Thread Angel Docampo
Well, it just happened again, the same server, the same mountpoint.

I'm unable to get the core dumps; coredumpctl says there are no core dumps.
It would be funny if I weren't the one suffering it, but the
systemd-coredump service crashed as well:
● systemd-coredump@0-3199871-0.service - Process Core Dump (PID 3199871/UID
0)
Loaded: loaded (/lib/systemd/system/systemd-coredump@.service; static)
Active: failed (Result: timeout) since Fri 2022-11-25 10:54:59 CET;
39min ago
TriggeredBy: ● systemd-coredump.socket
      Docs: man:systemd-coredump(8)
   Process: 3199873 ExecStart=/lib/systemd/systemd-coredump (code=killed,
signal=TERM)
  Main PID: 3199873 (code=killed, signal=TERM)
       CPU: 15ms

Nov 25 10:49:59 pve02 systemd[1]: Started Process Core Dump (PID
3199871/UID 0).
Nov 25 10:54:59 pve02 systemd[1]: systemd-coredump@0-3199871-0.service:
Service reached runtime time limit. Stopping.
Nov 25 10:54:59 pve02 systemd[1]: systemd-coredump@0-3199871-0.service:
Failed with result 'timeout'.
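If the dump was simply taking too long to write (these glusterfs processes
can have a big address space), maybe the collection window can be widened
with a drop-in; a sketch, assuming the default 5-minute RuntimeMaxSec in
systemd-coredump@.service is what cut it off here:

mkdir -p /etc/systemd/system/systemd-coredump@.service.d
printf '[Service]\nRuntimeMaxSec=30min\n' > /etc/systemd/system/systemd-coredump@.service.d/override.conf
systemctl daemon-reload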


I just saw the exception on dmesg,
[2022-11-25 10:50:08]  INFO: task kmmpd-loop0:681644 blocked for more than
120 seconds.
[2022-11-25 10:50:08]Tainted: P  IO  5.15.60-2-pve #1
[2022-11-25 10:50:08]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[2022-11-25 10:50:08]  task:kmmpd-loop0 state:D stack:0 pid:681644
ppid: 2 flags:0x4000
[2022-11-25 10:50:08]  Call Trace:
[2022-11-25 10:50:08]   
[2022-11-25 10:50:08]   __schedule+0x33d/0x1750
[2022-11-25 10:50:08]   ? bit_wait+0x70/0x70
[2022-11-25 10:50:08]   schedule+0x4e/0xc0
[2022-11-25 10:50:08]   io_schedule+0x46/0x80
[2022-11-25 10:50:08]   bit_wait_io+0x11/0x70
[2022-11-25 10:50:08]   __wait_on_bit+0x31/0xa0
[2022-11-25 10:50:08]   out_of_line_wait_on_bit+0x8d/0xb0
[2022-11-25 10:50:08]   ? var_wake_function+0x30/0x30
[2022-11-25 10:50:08]   __wait_on_buffer+0x34/0x40
[2022-11-25 10:50:08]   write_mmp_block+0x127/0x180
[2022-11-25 10:50:08]   kmmpd+0x1b9/0x430
[2022-11-25 10:50:08]   ? write_mmp_block+0x180/0x180
[2022-11-25 10:50:08]   kthread+0x127/0x150
[2022-11-25 10:50:08]   ? set_kthread_struct+0x50/0x50
[2022-11-25 10:50:08]   ret_from_fork+0x1f/0x30
[2022-11-25 10:50:08]   
[2022-11-25 10:50:08]  INFO: task iou-wrk-1511979:3200401 blocked for more
than 120 seconds.
[2022-11-25 10:50:08]Tainted: P  IO  5.15.60-2-pve #1
[2022-11-25 10:50:08]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[2022-11-25 10:50:08]  task:iou-wrk-1511979 state:D stack:0 pid:3200401
ppid: 1 flags:0x4000
[2022-11-25 10:50:08]  Call Trace:
[2022-11-25 10:50:08]   
[2022-11-25 10:50:08]   __schedule+0x33d/0x1750
[2022-11-25 10:50:08]   schedule+0x4e/0xc0
[2022-11-25 10:50:08]   rwsem_down_write_slowpath+0x231/0x4f0
[2022-11-25 10:50:08]   down_write+0x47/0x60
[2022-11-25 10:50:08]   fuse_file_write_iter+0x1a3/0x430
[2022-11-25 10:50:08]   ? apparmor_file_permission+0x70/0x170
[2022-11-25 10:50:08]   io_write+0xfb/0x320
[2022-11-25 10:50:08]   ? put_dec+0x1c/0xa0
[2022-11-25 10:50:08]   io_issue_sqe+0x401/0x1fc0
[2022-11-25 10:50:08]   io_wq_submit_work+0x76/0xd0
[2022-11-25 10:50:08]   io_worker_handle_work+0x1a7/0x5f0
[2022-11-25 10:50:08]   io_wqe_worker+0x2c0/0x360
[2022-11-25 10:50:08]   ? finish_task_switch.isra.0+0x7e/0x2b0
[2022-11-25 10:50:08]   ? io_worker_handle_work+0x5f0/0x5f0
[2022-11-25 10:50:08]   ? io_worker_handle_work+0x5f0/0x5f0
[2022-11-25 10:50:08]   ret_from_fork+0x1f/0x30
[2022-11-25 10:50:08]  RIP: 0033:0x0
[2022-11-25 10:50:08]  RSP: 002b: EFLAGS: 0216
ORIG_RAX: 01aa
[2022-11-25 10:50:08]  RAX:  RBX: 7fdb1efef640 RCX:
7fdd59f872e9
[2022-11-25 10:50:08]  RDX:  RSI: 0001 RDI:
0011
[2022-11-25 10:50:08]  RBP:  R08:  R09:
0008
[2022-11-25 10:50:08]  R10:  R11: 0216 R12:
55662e5bd268
[2022-11-25 10:50:08]  R13: 55662e5bd320 R14: 55662e5bd260 R15:

[2022-11-25 10:50:08]   
[2022-11-25 10:52:08]  INFO: task kmmpd-loop0:681644 blocked for more than
241 seconds.
[2022-11-25 10:52:08]Tainted: P  IO  5.15.60-2-pve #1
[2022-11-25 10:52:08]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[2022-11-25 10:52:08]  task:kmmpd-loop0 state:D stack:0 pid:681644
ppid: 2 flags:0x4000
[2022-11-25 10:52:08]  Call Trace:
[2022-11-25 10:52:08]   
[2022-11-25 10:52:08]   __schedule+0x33d/0x1750
[2022-11-25 10:52:08]   ? bit_wait+0x70/0x70
[2022-11-25 10:52:08]   schedule+0x4e/0xc0
[2022-11-25 10:52:08]   io_schedule+0x46/0x80
[2022-11-25 10:52:08]   bit_wait_io+0x11/0x70
[2022-11-25 10:52:08]   __wait_on_bit+0x31/0xa0
[2022-11-25 10:52:08]   out_of_line_wait_on_bit+0x8d/0xb0
[2022-11-25 10:52:08]   ? var_wake_function+0x30/0x30
[2022-11-25 10:52:08]   __wait_on_buffer+0x34/0x40
[2022-11-25 10:52:08]   

Re: [Gluster-users] GlusterFS mount crash

2022-11-22 Thread Angel Docampo
I've looked in all the places they should be, and I couldn't find one
anywhere. Some people say the dump file is generated where the application
is running... well, I don't know where to look then, and I hope they
haven't been generated on the failed mountpoint.

As Debian 11 has systemd, I've installed systemd-coredump, so in case a new
crash happens, at least I will have the exact location and a tool
(coredumpctl) to find them, and I will then install the debug symbols,
which is particularly tricky on Debian. But I need to wait for it to happen
again; right now the tool says there isn't any core dump on the system.
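For the record, this is what I plan to run when it happens again; the PID
is whatever coredumpctl lists for the crashed glusterfs client:

coredumpctl list /usr/sbin/glusterfs
coredumpctl info <PID>
coredumpctl gdb <PID>   # opens the stored dump in gdb, then "bt full"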

Thank you, Xavi, if this happens again (let's hope it won't), I will report
back.

Best regards!

*Angel Docampo*

<+34-93-1592929>


On Tue, Nov 22, 2022 at 10:45, Xavi Hernandez ()
wrote:

> The crash seems related to some problem in ec xlator, but I don't have
> enough information to determine what it is. The crash should have generated
> a core dump somewhere in the system (I don't know where Debian keeps the
> core dumps). If you find it, you should be able to open it using this
> command (make sure debug symbols package is also installed before running
> it):
>
> # gdb /usr/sbin/glusterfs 
>
> And then run this command:
>
> # bt -full
>
> Regards,
>
> Xavi
>
> On Tue, Nov 22, 2022 at 9:41 AM Angel Docampo 
> wrote:
>
>> Hi Xavi,
>>
>> The OS is Debian 11 with the proxmox kernel. Gluster packages are the
>> official from gluster.org (
>> https://download.gluster.org/pub/gluster/glusterfs/10/10.3/Debian/bullseye/
>> )
>>
>> The system logs showed no other issues by the time of the crash, no OOM
>> kill or whatsoever, and no other process was interacting with the gluster
>> mountpoint besides proxmox.
>>
>> I wasn't running gdb when it crashed, so I don't really know if I can
>> obtain a more detailed trace from logs or if there is a simple way to let
>> it running in the background to see if it happens again (or there is a flag
>> to start the systemd daemon in debug mode).
>>
>> Best,
>>
>> *Angel Docampo*
>>
>> 
>> <+34-93-1592929>
>>
>>
>> On Mon, Nov 21, 2022 at 15:16, Xavi Hernandez ()
>> wrote:
>>
>>> Hi Angel,
>>>
>>> On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo <
>>> angel.doca...@eoniantec.com> wrote:
>>>
 Sorry for necrobumping this, but this morning I've suffered this on my
 Proxmox  + GlusterFS cluster. In the log I can see this

 [2022-11-21 07:38:00.213620 +] I [MSGID: 133017]
 [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on
 fbc063cb-874e-475d-b585-f89
 f7518acdd. [Operation not supported]
 pending frames:
 frame : type(1) op(WRITE)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 frame : type(0) op(0)
 ...
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 frame : type(1) op(FSYNC)
 patchset: git://git.gluster.org/glusterfs.git
 signal received: 11
 time of crash:
 2022-11-21 07:38:00 +
 configuration details:
 argp 1
 backtrace 1
 dlfcn 1
 libpthread 1
 llistxattr 1
 setfsid 1
 epoll.h 1
 xattr.h 1
 st_atim.tv_nsec 1
 package-string: glusterfs 10.3
 /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
 /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]

 /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
 /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]

 /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]

 

Re: [Gluster-users] GlusterFS mount crash

2022-11-22 Thread Xavi Hernandez
The crash seems related to some problem in ec xlator, but I don't have
enough information to determine what it is. The crash should have generated
a core dump somewhere in the system (I don't know where Debian keeps the
core dumps). If you find it, you should be able to open it using this
command (make sure debug symbols package is also installed before running
it):

# gdb /usr/sbin/glusterfs 

And then run this command:

# bt -full
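(Since glusterfs is heavily multi-threaded, it can also be worth dumping
every thread's stack while in gdb; this is plain gdb, nothing
Gluster-specific:)

(gdb) thread apply all bt full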

Regards,

Xavi

On Tue, Nov 22, 2022 at 9:41 AM Angel Docampo 
wrote:

> Hi Xavi,
>
> The OS is Debian 11 with the proxmox kernel. Gluster packages are the
> official from gluster.org (
> https://download.gluster.org/pub/gluster/glusterfs/10/10.3/Debian/bullseye/
> )
>
> The system logs showed no other issues by the time of the crash, no OOM
> kill or whatsoever, and no other process was interacting with the gluster
> mountpoint besides proxmox.
>
> I wasn't running gdb when it crashed, so I don't really know if I can
> obtain a more detailed trace from logs or if there is a simple way to let
> it running in the background to see if it happens again (or there is a flag
> to start the systemd daemon in debug mode).
>
> Best,
>
> *Angel Docampo*
>
> 
> <+34-93-1592929>
>
>
> On Mon, Nov 21, 2022 at 15:16, Xavi Hernandez ()
> wrote:
>
>> Hi Angel,
>>
>> On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo <
>> angel.doca...@eoniantec.com> wrote:
>>
>>> Sorry for necrobumping this, but this morning I've suffered this on my
>>> Proxmox  + GlusterFS cluster. In the log I can see this
>>>
>>> [2022-11-21 07:38:00.213620 +] I [MSGID: 133017]
>>> [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on
>>> fbc063cb-874e-475d-b585-f89
>>> f7518acdd. [Operation not supported]
>>> pending frames:
>>> frame : type(1) op(WRITE)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> ...
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> patchset: git://git.gluster.org/glusterfs.git
>>> signal received: 11
>>> time of crash:
>>> 2022-11-21 07:38:00 +
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 10.3
>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
>>>
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
>>>
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
>>>
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
>>>
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
>>>
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
>>>
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
>>>
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
>>>
>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
>>>
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
>>>
>>> 

Re: [Gluster-users] GlusterFS mount crash

2022-11-22 Thread Angel Docampo
Hi Xavi,

The OS is Debian 11 with the Proxmox kernel. The Gluster packages are the
official ones from gluster.org (
https://download.gluster.org/pub/gluster/glusterfs/10/10.3/Debian/bullseye/)

The system logs showed no other issues around the time of the crash, no OOM
kill or anything similar, and no other process was interacting with the
gluster mountpoint besides Proxmox.

I wasn't running gdb when it crashed, so I don't really know if I can
obtain a more detailed trace from the logs, or if there is a simple way to
leave it running in the background to see if it happens again (or whether
there is a flag to start the systemd daemon in debug mode).
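What I might try in the meantime is raising the client log level to get
more detail in the logs; both knobs below are my assumptions about the
right option names and worth verifying:

gluster volume set vmdata diagnostics.client-log-level DEBUG
mount -t glusterfs -o log-level=DEBUG g01:/vmdata /mnt/test   # alternative when mounting by hand; /mnt/test is just an example path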

Best,

*Angel Docampo*

<+34-93-1592929>


On Mon, Nov 21, 2022 at 15:16, Xavi Hernandez ()
wrote:

> Hi Angel,
>
> On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo 
> wrote:
>
>> Sorry for necrobumping this, but this morning I've suffered this on my
>> Proxmox  + GlusterFS cluster. In the log I can see this
>>
>> [2022-11-21 07:38:00.213620 +] I [MSGID: 133017]
>> [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on
>> fbc063cb-874e-475d-b585-f89
>> f7518acdd. [Operation not supported]
>> pending frames:
>> frame : type(1) op(WRITE)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> ...
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> patchset: git://git.gluster.org/glusterfs.git
>> signal received: 11
>> time of crash:
>> 2022-11-21 07:38:00 +
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 10.3
>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
>> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
>>
>> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
>>
>> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
>> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
>>
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
>>
>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
>> -
>> The mount point wasn't accessible with the "Transport endpoint is not
>> connected" message and it was shown like this.
>> d?   ? ???? vmdata
>>
>> I had to stop all the VMs on that proxmox node, then stop the gluster
>> daemon to unmount the directory, and after starting the daemon and
>> re-mounting, all was working again.
>>
>> My gluster volume info returns this
>>
>> Volume Name: vmdata
>> 

Re: [Gluster-users] GlusterFS mount crash

2022-11-21 Thread Xavi Hernandez
Hi Angel,

On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo 
wrote:

> Sorry for necrobumping this, but this morning I've suffered this on my
> Proxmox  + GlusterFS cluster. In the log I can see this
>
> [2022-11-21 07:38:00.213620 +] I [MSGID: 133017]
> [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on
> fbc063cb-874e-475d-b585-f89
> f7518acdd. [Operation not supported]
> pending frames:
> frame : type(1) op(WRITE)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> ...
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> frame : type(1) op(FSYNC)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 11
> time of crash:
> 2022-11-21 07:38:00 +
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 10.3
> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
>
> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
>
> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
>
> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
> -
> The mount point wasn't accessible with the "Transport endpoint is not
> connected" message and it was shown like this.
> d?   ? ???? vmdata
>
> I had to stop all the VMs on that proxmox node, then stop the gluster
> daemon to unmount the directory, and after starting the daemon and
> re-mounting, all was working again.
>
> My gluster volume info returns this
>
> Volume Name: vmdata
> Type: Distributed-Disperse
> Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (2 + 1) = 6
> Transport-type: tcp
> Bricks:
> Brick1: g01:/data/brick1/brick
> Brick2: g02:/data/brick2/brick
> Brick3: g03:/data/brick1/brick
> Brick4: g01:/data/brick2/brick
> Brick5: g02:/data/brick1/brick
> Brick6: g03:/data/brick2/brick
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> storage.fips-mode-rchecksum: on
> features.shard: enable
> features.shard-block-size: 256MB
> performance.read-ahead: off
> performance.quick-read: off
> performance.io-cache: off
> server.event-threads: 2
> client.event-threads: 3
> performance.client-io-threads: on
> performance.stat-prefetch: off
> dht.force-readdirp: off
> performance.force-readdirp: off
> network.remote-dio: on
> features.cache-invalidation: on
> performance.parallel-readdir: on
> performance.readdir-ahead: on
>
> Xavi, do you think the open-behind off setting can help somehow? I did try
> to understand what it does (with no luck), and if 

Re: [Gluster-users] GlusterFS mount crash

2022-11-21 Thread Angel Docampo
Sorry for necrobumping this, but this morning I suffered this on my
Proxmox + GlusterFS cluster. In the log I can see this:

[2022-11-21 07:38:00.213620 +] I [MSGID: 133017]
[shard.c:7275:shard_seek] 11-vmdata-shard: seek called on
fbc063cb-874e-475d-b585-f89
f7518acdd. [Operation not supported]
pending frames:
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
...
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2022-11-21 07:38:00 +
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.3
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]

/lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]

/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]

/usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]

/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
-
The mount point wasn't accessible with the "Transport endpoint is not
connected" message and it was shown like this.
d?   ? ???? vmdata

I had to stop all the VMs on that proxmox node, then stop the gluster
daemon to unmount the directory, and after starting the daemon and
re-mounting, all was working again.
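Roughly what that recovery looked like, command-wise; the mount path here
is only an assumption (Proxmox normally keeps its storages under
/mnt/pve/<storage>, but my setup maps a directory), so adjust accordingly:

umount -l /mnt/pve/vmdata                         # lazy unmount of the dead FUSE mount
systemctl restart glusterd
mount -t glusterfs g01:/vmdata /mnt/pve/vmdata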

My gluster volume info returns this

Volume Name: vmdata
Type: Distributed-Disperse
Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: g01:/data/brick1/brick
Brick2: g02:/data/brick2/brick
Brick3: g03:/data/brick1/brick
Brick4: g01:/data/brick2/brick
Brick5: g02:/data/brick1/brick
Brick6: g03:/data/brick2/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
features.shard: enable
features.shard-block-size: 256MB
performance.read-ahead: off
performance.quick-read: off
performance.io-cache: off
server.event-threads: 2
client.event-threads: 3
performance.client-io-threads: on
performance.stat-prefetch: off
dht.force-readdirp: off
performance.force-readdirp: off
network.remote-dio: on
features.cache-invalidation: on
performance.parallel-readdir: on
performance.readdir-ahead: on

Xavi, do you think the open-behind off setting could help somehow? I did
try to understand what it does (with no luck) and whether it could impact
the performance of my VMs (I have the setup you know so well ;)).
I would like to avoid more crashes like this; gluster 10.3 had been working
quite well since two weeks ago, until this morning.
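(For reference, the setting in question; I'm assuming the full option name
is performance.open-behind, with "vmdata" being my volume:)

gluster volume set vmdata performance.open-behind off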

*Angel Docampo*

Re: [Gluster-users] GlusterFS mount crash

2021-03-18 Thread David Cunningham
Hi Xavi,

Thank you for that information. We'll look at upgrading it.


On Fri, 12 Mar 2021 at 05:20, Xavi Hernandez  wrote:

> Hi David,
>
> with so little information it's hard to tell, but given that there are
> several OPEN and UNLINK operations, it could be related to an already fixed
> bug (in recent versions) in open-behind.
>
> You can try disabling open-behind with this command:
>
> # gluster volume set  open-behind off
>
> But given the version you are using is very old and unmaintained, I would
> recommend you to upgrade to 8.x at least.
>
> Regards,
>
> Xavi
>
>
> On Wed, Mar 10, 2021 at 5:10 AM David Cunningham <
> dcunning...@voisonics.com> wrote:
>
>> Hello,
>>
>> We have a GlusterFS 5.13 server which also mounts itself with the native
>> FUSE client. Recently the FUSE mount crashed and we found the following in
>> the syslog. There isn't anything logged in mnt-glusterfs.log for that time.
>> After killing all processes with a file handle open on the filesystem we
>> were able to unmount and then remount the filesystem successfully.
>>
>> Would anyone have advice on how to debug this crash? Thank you in advance!
>>
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355 times: [
>> frame : type(1) op(OPEN)]
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965 times: [
>> frame : type(1) op(OPEN)]
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095 times: [
>> frame : type(1) op(OPEN)]
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: patchset: git://
>> git.gluster.org/glusterfs.git
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: package-string: glusterfs 5.13
>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: -
>> ...
>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Main
>> process exited, code=killed, status=11/SEGV
>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Failed
>> with result 'signal'.
>> ...
>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Service
>> hold-off time over, scheduling restart.
>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service:
>> Scheduled restart job, restart counter is at 2.
>> Mar 9 05:13:54 voip1 systemd[1]: Stopped Mount glusterfs sharedstorage.
>> Mar 9 05:13:54 voip1 systemd[1]: Starting Mount glusterfs sharedstorage...
>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: ERROR: Mount point
>> does not exist
>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Please specify a
>> mount point
>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Usage:
>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: man 8
>> /sbin/mount.glusterfs
>>
>> --
>> David Cunningham, Voisonics Limited
>> http://voisonics.com/
>> USA: +1 213 221 1092
>> New Zealand: +64 (0)28 2558 3782
>> 
>>
>>
>>
>>
>

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS mount crash

2021-03-11 Thread Xavi Hernandez
Hi David,

with so little information it's hard to tell, but given that there are
several OPEN and UNLINK operations, it could be related to an already fixed
bug (in recent versions) in open-behind.

You can try disabling open-behind with this command:

# gluster volume set  open-behind off

But given the version you are using is very old and unmaintained, I would
recommend you to upgrade to 8.x at least.

Regards,

Xavi


On Wed, Mar 10, 2021 at 5:10 AM David Cunningham 
wrote:

> Hello,
>
> We have a GlusterFS 5.13 server which also mounts itself with the native
> FUSE client. Recently the FUSE mount crashed and we found the following in
> the syslog. There isn't anything logged in mnt-glusterfs.log for that time.
> After killing all processes with a file handle open on the filesystem we
> were able to unmount and then remount the filesystem successfully.
>
> Would anyone have advice on how to debug this crash? Thank you in advance!
>
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355 times: [
> frame : type(1) op(OPEN)]
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965 times: [
> frame : type(1) op(OPEN)]
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095 times: [
> frame : type(1) op(OPEN)]
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: patchset: git://
> git.gluster.org/glusterfs.git
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: package-string: glusterfs 5.13
> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: -
> ...
> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Main
> process exited, code=killed, status=11/SEGV
> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Failed
> with result 'signal'.
> ...
> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Service
> hold-off time over, scheduling restart.
> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Scheduled
> restart job, restart counter is at 2.
> Mar 9 05:13:54 voip1 systemd[1]: Stopped Mount glusterfs sharedstorage.
> Mar 9 05:13:54 voip1 systemd[1]: Starting Mount glusterfs sharedstorage...
> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: ERROR: Mount point
> does not exist
> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Please specify a
> mount point
> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Usage:
> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: man 8
> /sbin/mount.glusterfs
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
> 
>
>
>
>




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users