Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-06-07 Thread Jan Pekař - Imatic
and was not successful, or the monitors did not react correctly in this situation and didn't complete the key exchange with the OSDs. After replacing the system disk on the problematic mon, the verify_authorizer problem no longer appeared in the log. With regards Jan Pekar On 01/05/2019 13.58, Jan Pekař - Imatic wrote: Today problem

Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-05-01 Thread Jan Pekař - Imatic
Today the problem reappeared. Restarting the mon helps, but it does not solve the issue. Is there any way to debug that? Can I dump these keys from the MON, the OSDs or other components? Can I debug the key exchange? Thank you On 27/04/2019 10.56, Jan Pekař - Imatic wrote: On 26/04/2019 21.50, Gregory
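A minimal sketch of how the key check and cephx debugging could be approached (standard Ceph CLI; the OSD id, mon target and keyring path are placeholders):

    # Compare the key the monitors hold with the key stored on the OSD host
    ceph auth get osd.1
    cat /var/lib/ceph/osd/ceph-1/keyring

    # Temporarily raise auth/messenger debug levels to trace the cephx exchange
    ceph tell osd.1 injectargs '--debug_auth 20 --debug_ms 1'
    ceph tell mon.* injectargs '--debug_auth 20'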

Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-04-27 Thread Jan Pekař - Imatic
On 26/04/2019 21.50, Gregory Farnum wrote: On Fri, Apr 26, 2019 at 10:55 AM Jan Pekař - Imatic wrote: Hi, yesterday my cluster reported slow requests for minutes, and after restarting the OSDs (those reporting slow requests) it got stuck with peering PGs. The whole cluster was not responding and IO stopped. I

[ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-04-26 Thread Jan Pekař - Imatic
there some timeout or grace period for old key usage before the keys are invalidated? Thank you With regards Jan Pekar -- ==== Ing. Jan Pekař jan.pe...@imatic.cz Imatic | Jagellonská 14 | Praha 3 | 130 00 http://www.imatic.cz
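For reference, cephx service tickets do have a TTL after which they are rotated; a quick way to check the configured value on a monitor (a sketch, assuming access to the mon admin socket and the default option name):

    # Service tickets are refreshed after auth_service_ticket_ttl seconds (default 3600)
    ceph daemon mon.$(hostname -s) config get auth_service_ticket_ttl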

Re: [ceph-users] Unfound object on erasure when recovering

2018-10-04 Thread Jan Pekař - Imatic
m appeared before trying to re-balance my cluster and was invisible to me. But it never happened before, and scrub and deep-scrub are running regularly. I don't know where to continue debugging this problem. JP On 3.10.2018 08:47, Jan Pekař - Imatic wrote: Hi all, I'm playing with my testi

[ceph-users] Unfound object on erasure when recovering

2018-10-03 Thread Jan Pekař - Imatic
"snapid": -2, "hash": 586898362, "max": 0, "pool": 10, "namespace": "" }, "need": "13528'6795", …
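A sketch of the usual unfound-object workflow (the PG id 10.2a is a placeholder; pool 10 comes from the snippet above):

    ceph health detail                       # lists the PGs that report unfound objects
    ceph pg 10.2a list_unfound               # show which objects are unfound and which OSDs were probed
    # last resort, once no OSD can still supply the data:
    ceph pg 10.2a mark_unfound_lost revert   # or 'delete' when no prior version exists (e.g. EC pools)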

Re: [ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-07 Thread Jan Pekař - Imatic
On 6.3.2018 22:28, Gregory Farnum wrote: On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote: Hi all, I have a few problems on my cluster that may be linked together and have now caused an OSD to go down during pg repair.

[ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-03 Thread Jan Pekař - Imatic
Hi all, I have a few problems on my cluster that may be linked together and have now caused an OSD to go down during pg repair. First, a few notes about my cluster: 4 nodes, 15 OSDs installed on Luminous (no upgrade). Replicated pools, with 1 pool (pool 6) cached by SSD disks. I don't detect any hardware
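When a repair trips over clone/snapset metadata, the scrub inconsistency listings are usually the first thing to inspect (a sketch; the pool name and the PG id 6.0 are placeholders):

    rados list-inconsistent-pg mypool                       # PGs in the pool with scrub errors
    rados list-inconsistent-obj 6.0 --format=json-pretty
    rados list-inconsistent-snapset 6.0 --format=json-pretty
    ceph pg repair 6.0                                      # only after understanding what is wrong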

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-03-03 Thread Jan Pekař - Imatic
On 3.3.2018 11:12, Yan, Zheng wrote: On Tue, Feb 27, 2018 at 2:29 PM, Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote: I think I hit the same issue. I have corrupted data on cephfs and I don't remember the same issue before Luminous (I did the same tests before). It is on my test 1 node c
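A hypothetical check along the lines of the tests being described, writing a file, dropping caches and re-verifying the checksum (paths and sizes are made up for illustration):

    dd if=/dev/urandom of=/mnt/cephfs/testfile bs=4M count=256
    sha256sum /mnt/cephfs/testfile > /tmp/before.sum
    sync; echo 3 > /proc/sys/vm/drop_caches      # force the next read to come from the cluster
    sha256sum -c /tmp/before.sum                 # a mismatch here would confirm the corruption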

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-03-03 Thread Jan Pekař - Imatic
and let you know. With regards Jan Pekar On 28.2.2018 15:14, David C wrote: On 27 Feb 2018 06:46, "Jan Pekař - Imatic" <jan.pe...@imatic.cz> wrote: I think I hit the same issue. I have corrupted data on cephfs and I don't r

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-02-26 Thread Jan Pekař - Imatic
there is any change. -- ==== Ing. Jan Pekař jan.pe...@imatic.cz | +420603811737 Imatic | Jagellonská 14 | Praha 3 | 130 00

[ceph-users] Problem with OSD down and problematic rbd object

2018-01-05 Thread Jan Pekař - Imatic
Hi all, yesterday I got an OSD down with this error: 2018-01-04 06:47:25.304513 7fe6eda51700 -1 log_channel(cluster) log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects 2018-01-04 06:47:25.312861 7fe6eda51700 -1 log_channel(cluster) log [ERR] : 6.20 repair 3 errors, 2 fixed 2018-01-04
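To see which OSDs carry the affected PG and what state it is in, something like this could help (standard commands; 6.20 is taken from the log above):

    ceph pg map 6.20       # acting/up OSD set for the PG
    ceph pg 6.20 query     # peering state, missing objects, recovery details
    ceph pg repair 6.20    # re-run the repair once the cause is understood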

Re: [ceph-users] rbd-nbd timeout and crash

2018-01-04 Thread Jan Pekař - Imatic
egards Jan Pekar On 6.12.2017 23:58, David Turner wrote: Do you have the FS mounted with a trimming ability? What are your mount options? On Wed, Dec 6, 2017 at 5:30 PM Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote: Hi, On 6.12.2017 15:24
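For context, a trim-capable setup on top of an nbd device would look roughly like this (a sketch; device, filesystem and mountpoint are placeholders):

    mount -o discard /dev/nbd0 /mnt/rbd   # online discard passed down to rbd-nbd
    # or leave discard off and trim periodically instead:
    fstrim -v /mnt/rbd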

Re: [ceph-users] Cluster stuck in failed state after power failure - please help

2017-12-11 Thread Jan Pekař - Imatic
at setting up an mgr daemon. On Mon, Dec 11, 2017, 2:07 PM Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote: Hi, thank you for the response. I started the mds manually and accessed cephfs; I'm not running mgr yet, it is not necessary. I just responde

Re: [ceph-users] Cluster stuck in failed state after power failure - please help

2017-12-11 Thread Jan Pekař - Imatic
c 11, 2017 at 1:08 PM Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote: Hi all, I hope that somebody can help me. I have a home ceph installation. After a power failure (it can happen in a datacenter too) my ceph booted into an inconsistent state.

Re: [ceph-users] Cluster stuck in failed state after power failure - please help

2017-12-11 Thread Jan Pekař - Imatic
that pg data from the OSDs? In the OSD logs I can see that backfilling is continuing etc., so either they have correct information or they are replaying operations from before the power failure. With regards Jan Pekar On 11.12.2017 19:07, Jan Pekař - Imatic wrote: Hi all, hope that somebody can help me. I have

[ceph-users] Cluster stuck in failed state after power failure - please help

2017-12-11 Thread Jan Pekař - Imatic
Hi all, I hope that somebody can help me. I have a home ceph installation. After a power failure (it can happen in a datacenter too) my ceph booted into an inconsistent state. I was backfilling data onto one new disk during the power failure. The first time it booted without some OSDs, but I fixed that. Now I
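A minimal triage sequence for this situation (nothing cluster-specific, just the standard status commands):

    ceph -s                       # overall health and PG state summary
    ceph health detail            # which PGs/OSDs are the actual problem
    ceph osd tree                 # confirm every OSD came back up and in
    ceph pg dump_stuck inactive   # PGs that never finished peering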

Re: [ceph-users] rbd-nbd timeout and crash

2017-12-06 Thread Jan Pekař - Imatic
Hi, On 6.12.2017 15:24, Jason Dillaman wrote: On Wed, Dec 6, 2017 at 3:46 AM, Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote: Hi, I ran into an overloaded cluster (deep-scrub running) for a few seconds and the rbd-nbd client timed out, and the device became unavailable. block nbd0: Connection tim

[ceph-users] rbd-nbd timeout and crash

2017-12-06 Thread Jan Pekař - Imatic
Hi, I ran into an overloaded cluster (deep-scrub running) for a few seconds and the rbd-nbd client timed out, and the device became unavailable. block nbd0: Connection timed out block nbd0: shutting down sockets block nbd0: Connection timed out print_req_error: I/O error, dev nbd0, sector 2131833856
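If the rbd-nbd build in use supports it, the nbd timeout can be raised at map time so a brief cluster stall does not detach the device (a sketch; the pool/image name is a placeholder):

    rbd-nbd map --timeout 120 rbd/myimage   # seconds the kernel waits before failing nbd I/O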

Re: [ceph-users] RBD corruption when removing tier cache

2017-12-02 Thread Jan Pekař - Imatic
ays to flush all objects (like turning off VMs, setting a short evict time or a target size) and remove the overlay after that. With regards Jan Pekar On 1.12.2017 03:43, Jan Pekař - Imatic wrote: Hi all, today I tested adding an SSD cache tier to a pool. Everything worked, but when I tried to remove it and run rados
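The removal sequence being described would look roughly like this (a sketch; hot-pool matches the command quoted below, cold-pool is a hypothetical base pool name):

    ceph osd tier cache-mode hot-pool forward --yes-i-really-mean-it   # stop accepting new dirty objects
    rados -p hot-pool cache-flush-evict-all                            # flush/evict everything (VMs stopped)
    ceph osd tier remove-overlay cold-pool                             # point clients back at the base pool
    ceph osd tier remove cold-pool hot-pool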

[ceph-users] RBD corruption when removing tier cache

2017-11-30 Thread Jan Pekař - Imatic
Hi all, today I tested adding an SSD cache tier to a pool. Everything worked, but when I tried to remove it and ran rados -p hot-pool cache-flush-evict-all I got rbd_data.9c000238e1f29. failed to flush /rbd_data.9c000238e1f29.: (2) No such file or directory

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-08 Thread Jan Pekař - Imatic
was deadlocked, the worst case that I would expect would be your guest OS complaining about hung kernel tasks related to disk IO (since the disk wouldn't be responding). On Mon, Nov 6, 2017 at 6:02 PM, Jan Pekař - Imatic <jan.pe...@imatic.cz> wro

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-07 Thread Jan Pekař - Imatic
den Hollander wrote: On 7 November 2017 at 10:14, Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote: Additional info - it is not librbd related, I mapped the disk through rbd map and it was the same - the virtuals were stuck/frozen. It happened exactly when in my log appeared Why aren't you

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-07 Thread Jan Pekař - Imatic
st trying a different version of QEMU and/or different host OS since loss of a disk shouldn't hang it -- only potentially the guest OS. On Tue, Nov 7, 2017 at 5:17 AM, Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote: I'm calling kill -STOP to simulat
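The simulation mentioned here can be reproduced with plain signals against the ceph daemons on one node (a sketch; the PIDs are placeholders):

    kill -STOP <osd_pid> <mon_pid>   # daemons stay registered as up but stop responding
    sleep 300                        # leave the cluster in that state for a while
    kill -CONT <osd_pid> <mon_pid>   # resume them and watch whether the guests recover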

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-07 Thread Jan Pekař - Imatic
attached inside QEMU/KVM virtuals. JP On 7.11.2017 10:57, Piotr Dałek wrote: On 17-11-07 12:02 AM, Jan Pekař - Imatic wrote: Hi, I'm using Debian Stretch with ceph 12.2.1-1~bpo80+1 and qemu 1:2.8+dfsg-6+deb9u3. I'm running 3 nodes with 3 monitors and 8 OSDs on my nodes, all on IPv6. When I

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-07 Thread Jan Pekař - Imatic
was deadlocked, the worst case that I would expect would be your guest OS complaining about hung kernel tasks related to disk IO (since the disk wouldn't be responding). On Mon, Nov 6, 2017 at 6:02 PM, Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote:

[ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-06 Thread Jan Pekař - Imatic
Hi, I'm using Debian Stretch with ceph 12.2.1-1~bpo80+1 and qemu 1:2.8+dfsg-6+deb9u3. I'm running 3 nodes with 3 monitors and 8 OSDs on my nodes, all on IPv6. When I tested the cluster, I detected a strange and severe problem. On the first node I'm running qemu hosts with a librados disk connection to

Re: [ceph-users] CephFS kernel client reboots on write

2015-07-13 Thread Jan Pekař
On 2015-07-13 12:01, Gregory Farnum wrote: On Mon, Jul 13, 2015 at 9:49 AM, Ilya Dryomov idryo...@gmail.com wrote: On Fri, Jul 10, 2015 at 9:36 PM, Jan Pekař jan.pe...@imatic.cz wrote: Hi all, I think I found a bug in the cephfs kernel client. When I create a directory in cephfs and set the layout

[ceph-users] CephFS kernel client reboots on write

2015-07-10 Thread Jan Pekař
Hi all, I think I found a bug in the cephfs kernel client. When I create a directory in cephfs and set its layout to ceph.dir.layout=stripe_unit=1073741824 stripe_count=1 object_size=1073741824 pool=somepool, attempts to write a larger file will cause a kernel hang or reboot. When I'm using the cephfs client
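For reference, that layout would typically be applied through the virtual xattr, e.g. (a sketch; the mountpoint and directory are placeholders, the values come from the report above):

    mkdir /mnt/cephfs/bigobjects
    setfattr -n ceph.dir.layout \
             -v "stripe_unit=1073741824 stripe_count=1 object_size=1073741824 pool=somepool" \
             /mnt/cephfs/bigobjects
    getfattr -n ceph.dir.layout /mnt/cephfs/bigobjects   # verify what the MDS accepted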

Re: [ceph-users] Pg's stuck in inactive/unclean state + Association from PG-OSD does not seem to be happenning.

2014-11-10 Thread Jan Pekař
-- Ing. Jan Pekař jan.pe...@imatic.cz | +420603811737 Imatic | Jagellonská 14 | Praha 3 | 130 00 http://www.imatic.cz

Re: [ceph-users] Stuck in stale state

2014-11-10 Thread Jan Pekař
On 2014-11-10 20:53, Craig Lewis wrote: "nothing to send, going to standby" isn't necessarily bad; I see it from time to time. It shouldn't stay like that for long, though. If it's been 5 minutes and the cluster still isn't doing anything, I'd restart that osd. On Fri, Nov 7, 2014 at 1:55 PM, Jan Pekař
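Restarting a single OSD, depending on the init system of that era (both forms are standard; the OSD id is a placeholder):

    systemctl restart ceph-osd@0    # systemd-based installs
    service ceph restart osd.0      # sysvinit-based installs, common in 2014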

[ceph-users] Erasure coding parameters change

2014-11-09 Thread Jan Pekař
Hi, is there any possibility to change erasure-coded pool parameters, i.e. the k and m values, on the fly? I want to add more disks to an existing erasure pool and change the redundancy level. I cannot find it in the docs. Changing the erasure-code-profile is not working, so I assume that it is only a template for
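Since k and m are fixed when a pool is created, the usual route is a new profile plus a new pool and a data migration (a sketch; profile name, pool names, k/m and PG counts are placeholders, and rados cppool has known limitations, e.g. with snapshots):

    ceph osd erasure-code-profile set profile-k4m2 k=4 m=2
    ceph osd pool create ecpool-new 128 128 erasure profile-k4m2
    rados cppool ecpool-old ecpool-new   # simple one-shot copy; stop clients first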

[ceph-users] Stuck in stale state

2014-11-09 Thread Jan Pekař
Hi, I was testing ceph cluster map changes and I got into a stuck state which seems to be indefinite. First, a description of what I have done. I'm testing a special case with only one copy of the PGs (pool size = 1). All PGs were on one osd.0. I created a second osd.1 and modified the cluster map to
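The single-copy test setup described here can be reproduced with something like the following (a sketch; the pool name is a placeholder, size 1 only makes sense on a throwaway cluster, and recent releases additionally ask for --yes-i-really-mean-it):

    ceph osd pool set testpool size 1
    ceph osd pool set testpool min_size 1
    ceph pg dump_stuck stale   # the state this thread is about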