Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-06-07 Thread Jan Pekař - Imatic
successful, or the monitors did not react correctly in this situation and did not complete the key exchange with the OSDs. After replacing the system disk on the problematic mon, the verify_authorizer problem no longer appeared in the log. With regards Jan Pekar On 01/05/2019 13.58, Jan Pekař - Imatic wrote: Tod

Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-05-01 Thread Jan Pekař - Imatic
Today the problem reappeared. Restarting the mon helps, but it does not solve the issue. Is there any way to debug this? Can I dump these keys from the MON, the OSDs, or other components? Can I debug the key exchange? Thank you On 27/04/2019 10.56, Jan Pekař - Imatic wrote: On 26/04/2019 21.50, Gregory
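
As a rough sketch of how those keys could be dumped and the exchange traced (osd.0 and the mon id below are illustrative placeholders, not values from the thread):

    # Compare the key the monitors hold for an OSD with the keyring stored on the OSD host
    ceph auth get osd.0
    cat /var/lib/ceph/osd/ceph-0/keyring

    # Temporarily raise auth/messenger logging to trace the authorizer exchange
    ceph tell osd.* injectargs '--debug_auth 20 --debug_ms 1'
    ceph daemon mon.$(hostname -s) config set debug_auth 20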

Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-04-27 Thread Jan Pekař - Imatic
On 26/04/2019 21.50, Gregory Farnum wrote: On Fri, Apr 26, 2019 at 10:55 AM Jan Pekař - Imatic wrote: Hi, yesterday my cluster reported slow requests for minutes, and after restarting the OSDs (the ones reporting slow requests) it got stuck with peering PGs. The whole cluster was not responding and IO stopped. I

[ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-04-26 Thread Jan Pekař - Imatic
Hi, yesterday my cluster reported slow requests for minutes, and after restarting the OSDs (the ones reporting slow requests) it got stuck with peering PGs. The whole cluster was not responding and IO stopped. I also noticed that the problem was with cephx - all OSDs were reporting the same (even the same number of sec
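
For readers hitting a similar state, a minimal sketch of the usual peering triage commands (the pg id is a placeholder):

    ceph health detail              # lists the stuck PGs and slow requests
    ceph pg dump_stuck inactive     # PGs that never got past peering
    ceph pg <pgid> query            # see "recovery_state" for what peering is waiting on
    ceph osd blocked-by             # OSDs that other OSDs are waiting for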

Re: [ceph-users] Unfound object on erasure when recovering

2018-10-04 Thread Jan Pekař - Imatic
The problem appeared before trying to re-balance my cluster and was invisible to me. But it had never happened before, and scrub and deep-scrub run regularly. I don't know where to continue with debugging this problem. JP On 3.10.2018 08:47, Jan Pekař - Imatic wrote: Hi all, I'm playin

[ceph-users] Unfound object on erasure when recovering

2018-10-02 Thread Jan Pekař - Imatic
Hi all, I'm playing with my testing cluster with ceph 12.2.8 installed. It has happened to me for the second time that I have 1 unfound object on an erasure coded pool. I use erasure coding with a 3+1 configuration. The first time, I was adding an additional disk. During the cluster rebalance I noticed one unfound ob
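
A generic sketch of how an unfound object can be located and, as a last resort, declared lost (the pg id is a placeholder; on an erasure coded pool "delete" is usually the only viable mode):

    ceph health detail                        # names the PG holding the unfound object
    ceph pg <pgid> list_missing               # shows the object and the OSDs already probed
    ceph pg <pgid> query                      # "might_have_unfound" lists candidate OSDs still to probe
    ceph pg <pgid> mark_unfound_lost delete   # only after all candidates have been probed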

Re: [ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-07 Thread Jan Pekař - Imatic
On 6.3.2018 22:28, Gregory Farnum wrote: On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic wrote: Hi all, I have a few problems on my cluster that are maybe linked together and have now caused an OSD to go down during pg repair. First, a few notes about m

[ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-03 Thread Jan Pekař - Imatic
Hi all, I have a few problems on my cluster that are maybe linked together and have now caused an OSD to go down during pg repair. First, a few notes about my cluster: 4 nodes, 15 OSDs, installed on Luminous (no upgrade). Replicated pools, with 1 pool (pool 6) cached by SSD disks. I don't detect any hardware fai
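
Not specific to this cluster, but the usual way to inspect an inconsistency before attempting a repair looks roughly like this (pool and pg id are placeholders):

    rados list-inconsistent-pg <pool>
    rados list-inconsistent-obj <pgid> --format=json-pretty   # which shard/replica is bad and why
    ceph pg deep-scrub <pgid>
    ceph pg repair <pgid>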

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-03-03 Thread Jan Pekař - Imatic
On 3.3.2018 11:12, Yan, Zheng wrote: On Tue, Feb 27, 2018 at 2:29 PM, Jan Pekař - Imatic wrote: I think I hit the same issue. I have corrupted data on cephfs and I don't remember the same issue before Luminous (I did the same tests before). It is on my test 1-node cluster with lower m

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-03-03 Thread Jan Pekař - Imatic
Luminous and let you know. With regards Jan Pekar On 28.2.2018 15:14, David C wrote: On 27 Feb 2018 06:46, "Jan Pekař - Imatic" wrote: I think I hit the same issue. I have corrupted data on cephfs and I don't remember the same issue

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-02-26 Thread Jan Pekař - Imatic
I think I hit the same issue. I have corrupted data on cephfs and I don't remember the same issue before Luminous (I did the same tests before). It is on my test 1-node cluster with lower memory than recommended (so the server is swapping), but it shouldn't lose data (it never did before). So slow
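
The kind of write-then-verify test being described could look roughly like this (mount point, file count and sizes are arbitrary):

    cd /mnt/cephfs/test
    for i in $(seq 1 100); do dd if=/dev/urandom of=file.$i bs=1M count=4 2>/dev/null; done
    md5sum file.* > /tmp/before.md5
    sync && echo 3 > /proc/sys/vm/drop_caches     # force re-reads from the cluster (needs root)
    md5sum -c /tmp/before.md5 | grep -v ': OK$'   # prints only files whose content changed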

[ceph-users] Problem with OSD down and problematic rbd object

2018-01-05 Thread Jan Pekař - Imatic
Hi all, yesterday an OSD went down with the error 2018-01-04 06:47:25.304513 7fe6eda51700 -1 log_channel(cluster) log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects 2018-01-04 06:47:25.312861 7fe6eda51700 -1 log_channel(cluster) log [ERR] : 6.20 repair 3 errors, 2 fixed 2018-01-04 06:47:26.79
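
One generic way to find out which image a problematic rbd_data object belongs to (pool name, prefix and suffix are placeholders):

    rados -p <pool> stat rbd_data.<prefix>.<suffix>
    for img in $(rbd ls <pool>); do
        rbd info <pool>/$img | grep -q 'block_name_prefix: rbd_data.<prefix>' && echo $img
    done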

Re: [ceph-users] rbd-nbd timeout and crash

2018-01-04 Thread Jan Pekař - Imatic
With regards Jan Pekar On 6.12.2017 23:58, David Turner wrote: Do you have the FS mounted with a trimming ability? What are your mount options? On Wed, Dec 6, 2017 at 5:30 PM Jan Pekař - Imatic wrote: Hi, On 6.12.2017 15:24, Jason Dillaman wrote:

Re: [ceph-users] Cluster stuck in failed state after power failure - please help

2017-12-11 Thread Jan Pekař - Imatic
cluster, look at setting up an mgr daemon. On Mon, Dec 11, 2017, 2:07 PM Jan Pekař - Imatic wrote: Hi, thank you for the response. I started the mds manually and accessed cephfs; I'm not running mgr yet, it is not necessary. I just responded to
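
For reference, a manual ceph-mgr deployment on a Luminous node is roughly the following (the id "foo" is illustrative):

    mkdir -p /var/lib/ceph/mgr/ceph-foo
    ceph auth get-or-create mgr.foo mon 'allow profile mgr' osd 'allow *' mds 'allow *' \
        -o /var/lib/ceph/mgr/ceph-foo/keyring
    chown -R ceph:ceph /var/lib/ceph/mgr/ceph-foo
    systemctl enable --now ceph-mgr@foo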

Re: [ceph-users] Cluster stuck in failed state after power failure - please help

2017-12-11 Thread Jan Pekař - Imatic
Dec 11, 2017 at 1:08 PM Jan Pekař - Imatic wrote: Hi all, I hope that somebody can help me. I have a home ceph installation. After a power failure (it can happen in a datacenter too) my ceph booted into an inconsistent state. I was backfilling dat

Re: [ceph-users] Cluster stuck in failed state after power failure - please help

2017-12-11 Thread Jan Pekař - Imatic
pg data from the OSDs? In the OSD logs I can see that backfilling is continuing etc., so either they have the correct information or they are still running operations from before the power failure. With regards Jan Pekar On 11.12.2017 19:07, Jan Pekař - Imatic wrote: Hi all, I hope that somebody can help me. I

[ceph-users] Cluster stuck in failed state after power failure - please help

2017-12-11 Thread Jan Pekař - Imatic
Hi all, I hope that somebody can help me. I have a home ceph installation. After a power failure (it can happen in a datacenter too) my ceph booted into an inconsistent state. I was backfilling data onto one new disk during the power failure. The first time it booted without some OSDs, but I fixed that. Now I ha
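
A minimal triage sketch after an unclean shutdown, assuming the goal is just to get the missing OSDs back and let backfill resume (the osd id is a placeholder):

    ceph -s
    ceph osd tree                     # spot OSDs that did not come back up
    ceph osd set noout                # avoid rebalancing while daemons are restarted
    systemctl start ceph-osd@<id>
    ceph -w                           # watch recovery/backfill resume
    ceph osd unset noout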

Re: [ceph-users] rbd-nbd timeout and crash

2017-12-06 Thread Jan Pekař - Imatic
Hi, On 6.12.2017 15:24, Jason Dillaman wrote: On Wed, Dec 6, 2017 at 3:46 AM, Jan Pekař - Imatic wrote: Hi, I ran into an overloaded cluster (deep-scrub running) for a few seconds, the rbd-nbd client timed out, and the device became unavailable. block nbd0: Connection timed out block nbd0: shutting down

[ceph-users] rbd-nbd timeout and crash

2017-12-06 Thread Jan Pekař - Imatic
Hi, I ran into an overloaded cluster (deep-scrub running) for a few seconds, the rbd-nbd client timed out, and the device became unavailable. block nbd0: Connection timed out block nbd0: shutting down sockets block nbd0: Connection timed out print_req_error: I/O error, dev nbd0, sector 2131833856 print_req
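
If the rbd-nbd build includes the --timeout option, mapping with a longer nbd timeout gives the cluster room to ride out short stalls such as deep-scrub load (pool/image and the value are illustrative):

    rbd-nbd map --timeout 120 rbd/myimage
    rbd-nbd list-mapped
    rbd-nbd unmap /dev/nbd0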

Re: [ceph-users] RBD corruption when removing tier cache

2017-12-02 Thread Jan Pekař - Imatic
some other ways to flush all objects (like turning off the VMs, or setting a short eviction time or target size) and remove the overlay after that. With regards Jan Pekar On 1.12.2017 03:43, Jan Pekař - Imatic wrote: Hi all, today I tested adding an SSD cache tier to a pool. Everything worked, but when I tried to remove it
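
The removal sequence for a writeback tier, roughly as documented (hot-pool is the name used in the thread; base-pool is a placeholder for the backing pool):

    ceph osd tier cache-mode hot-pool forward --yes-i-really-mean-it
    rados -p hot-pool cache-flush-evict-all
    ceph osd tier remove-overlay base-pool
    ceph osd tier remove base-pool hot-pool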

[ceph-users] RBD corruption when removing tier cache

2017-11-30 Thread Jan Pekař - Imatic
Hi all, today I tested adding an SSD cache tier to a pool. Everything worked, but when I tried to remove it and ran rados -p hot-pool cache-flush-evict-all I got rbd_data.9c000238e1f29. failed to flush /rbd_data.9c000238e1f29.: (2) No such file or directory

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-08 Thread Jan Pekař - Imatic
Even if librbd was deadlocked, the worst case that I would expect would be your guest OS complaining about hung kernel tasks related to disk IO (since the disk wouldn't be responding). On Mon, Nov 6, 2017 at 6:02 PM, Jan Pekař - Imatic wrote: Hi,

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-07 Thread Jan Pekař - Imatic
den Hollander wrote: On 7 November 2017 at 10:14, Jan Pekař - Imatic wrote: Additional info - it is not librbd related; I mapped the disk through rbd map and it was the same - the virtuals were stuck/frozen. It happened exactly when the following appeared in my log Why aren't you using librbd? Is th

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-07 Thread Jan Pekař - Imatic
I would suggest trying a different version of QEMU and/or a different host OS, since loss of a disk shouldn't hang it -- only potentially the guest OS. On Tue, Nov 7, 2017 at 5:17 AM, Jan Pekař - Imatic wrote: I'm calling kill -STOP to simulate
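
The simulation described can be reproduced roughly like this (osd.3 is an illustrative id; per the expectation discussed above, only the guest's IO should stall, not the qemu process itself):

    systemctl kill -s SIGSTOP ceph-osd@3    # freeze the OSD process without killing it
    ceph -s                                 # watch for slow requests while IO hangs
    systemctl kill -s SIGCONT ceph-osd@3    # resume the OSD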

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-07 Thread Jan Pekař - Imatic
map device attached inside QEMU/KVM virtuals. JP On 7.11.2017 10:57, Piotr Dałek wrote: On 17-11-07 12:02 AM, Jan Pekař - Imatic wrote: Hi, I'm using Debian stretch with ceph 12.2.1-1~bpo80+1 and qemu 1:2.8+dfsg-6+deb9u3. I'm running 3 nodes with 3 monitors and 8 OSDs on my nodes,

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-07 Thread Jan Pekař - Imatic
Even if librbd was deadlocked, the worst case that I would expect would be your guest OS complaining about hung kernel tasks related to disk IO (since the disk wouldn't be responding). On Mon, Nov 6, 2017 at 6:02 PM, Jan Pekař - Imatic wrote: Hi,

[ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-06 Thread Jan Pekař - Imatic
Hi, I'm using Debian stretch with ceph 12.2.1-1~bpo80+1 and qemu 1:2.8+dfsg-6+deb9u3. I'm running 3 nodes with 3 monitors and 8 OSDs, all on IPv6. When I tested the cluster, I detected a strange and severe problem. On the first node I'm running qemu hosts with a librados disk connection to
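
A minimal sketch of the librbd attachment being described (pool/image name, auth id and memory size are illustrative):

    # confirm qemu's librbd driver can open the image at all
    qemu-img info rbd:rbd/vm-disk:id=admin:conf=/etc/ceph/ceph.conf

    # the same image attached to a guest as a virtio disk
    qemu-system-x86_64 -m 1024 -drive format=raw,if=virtio,cache=writeback,file=rbd:rbd/vm-disk:id=admin:conf=/etc/ceph/ceph.conf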