successful or the monitors did not react correctly in this situation and didn't complete the key exchange with the OSDs.
After replacing the system disk on the problematic mon, the verify_authorizer problem no longer appeared in the log.
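For reference, a sketch of how this can be checked on a mon host (assuming default log paths and that the mon id equals the short hostname):

grep verify_authorizer /var/log/ceph/ceph-mon.$(hostname -s).log
# temporarily raise auth/messenger logging on the running mon to watch the exchange
ceph daemon mon.$(hostname -s) config set debug_auth 20
ceph daemon mon.$(hostname -s) config set debug_ms 1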
With regards
Jan Pekar
On 01/05/2019 13.58, Jan Pekař - Imatic wrote:
Today problem reappeared.
Restarting the mon helps, but it does not solve the issue.
Is there any way to debug that? Can I dump these keys from the MON, the OSDs or
other components? Can I debug the key exchange?
Thank you
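A sketch of the standard cephx commands that can be used for this (assuming a working admin keyring; osd.0 and the keyring path are only placeholders):

ceph auth list                          # every entity and key as the monitors see them
ceph auth get osd.0                     # key and caps for a single OSD
cat /var/lib/ceph/osd/ceph-0/keyring    # the key that this OSD actually presents
# trace the key exchange itself by raising auth/messenger debug on the OSD side
ceph tell osd.0 injectargs '--debug_auth 20 --debug_ms 1'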
On 27/04/2019 10.56, Jan Pekař - Imatic wrote:
On 26/04/2019 21.50, Gregory Farnum wrote:
On Fri, Apr 26, 2019 at 10:55 AM Jan Pekař - Imatic wrote:
Hi,
yesterday my cluster reported slow requests for minutes, and after restarting the OSDs (the ones reporting slow requests) it got stuck with peering PGs. The whole
cluster was not responding and IO stopped.
I also noticed that the problem was with cephx - all OSDs were reporting the same
(even the same number of sec
Perhaps the problem appeared before I tried to re-balance my cluster and was invisible to me. But it never happened before, and scrub and
deep-scrub run regularly.
I don't know where to continue debugging this problem.
JP
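A sketch of the usual starting points for debugging slow requests and stuck peering on Luminous (osd.N is a placeholder for an implicated OSD; the daemon commands are run on that OSD's host):

ceph health detail                       # which OSDs and PGs are implicated
ceph daemon osd.N dump_ops_in_flight     # what the blocked ops are waiting on
ceph daemon osd.N dump_historic_ops      # recently completed slow ops with timings
ceph pg dump_stuck                       # PGs stuck peering / inactive / unclean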
On 3.10.2018 08:47, Jan Pekař - Imatic wrote:
Hi all,
I'm playing with my testing cluster with ceph 12.2.8 installed.
It happened to me for the second time that I have 1 unfound object on an erasure
coded pool.
I am using an erasure coding profile of 3+1.
The first time, I was adding an additional disk. During the cluster rebalance I noticed one unfound ob
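A sketch of the standard commands for investigating an unfound object (pg 2.5 is only a placeholder; on Luminous the second command is list_missing, newer releases call it list_unfound):

ceph health detail              # shows which PG holds the unfound object
ceph pg 2.5 list_missing        # which object is unfound and which OSDs were probed
ceph pg 2.5 query               # peering state, might_have_unfound, down OSDs
# only as a last resort, after every candidate OSD has been ruled out:
# ceph pg 2.5 mark_unfound_lost delete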
On 6.3.2018 22:28, Gregory Farnum wrote:
On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote:
Hi all,
I have a few problems on my cluster that are maybe linked together and
have now caused an OSD to go down during pg repair.
First, a few notes about my cluster:
4 nodes, 15 OSDs, installed on Luminous (no upgrade).
Replicated pools, with 1 pool (pool 6) cached by SSD disks.
I don't detect any hardware fai
On 3.3.2018 11:12, Yan, Zheng wrote:
On Tue, Feb 27, 2018 at 2:29 PM, Jan Pekař - Imatic wrote:
I think I hit the same issue.
I have corrupted data on cephfs and I don't remember the same issue before
Luminous (I did the same tests before).
It is on my test 1 node cluster with lower m
inous and let you
know.
With regards
Jan Pekar
On 28.2.2018 15:14, David C wrote:
On 27 Feb 2018 06:46, "Jan Pekař - Imatic" <jan.pe...@imatic.cz> wrote:
I think I hit the same issue.
I have corrupted data on cephfs and I don't remember the same issue
before Luminous (I did the same tests before).
It is on my test 1-node cluster with lower memory than recommended (so the
server is swapping), but it shouldn't lose data (it never did before).
So slow
Hi all,
yesterday I got an OSD down with this error:
2018-01-04 06:47:25.304513 7fe6eda51700 -1 log_channel(cluster) log
[ERR] : 6.20 repair 1 missing, 0 inconsistent objects
2018-01-04 06:47:25.312861 7fe6eda51700 -1 log_channel(cluster) log
[ERR] : 6.20 repair 3 errors, 2 fixed
2018-01-04 06:47:26.79
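A sketch of how the inconsistency on pg 6.20 can be inspected before (re)running a repair - these are the stock commands, nothing specific to this cluster:

ceph health detail                                      # lists the inconsistent PG(s)
rados list-inconsistent-obj 6.20 --format=json-pretty   # which object/shard is bad and why
ceph pg repair 6.20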
With regards
Jan Pekar
On 6.12.2017 23:58, David Turner wrote:
Do you have the FS mounted with a trimming ability? What are your mount
options?
On Wed, Dec 6, 2017 at 5:30 PM Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote:
Hi,
On 6.12.2017 15:24, Jason Dillaman wrote:
cluster, look at setting up an mgr daemon.
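For reference, a sketch of the manual mgr setup on Luminous ($name is a placeholder for the daemon id, e.g. the hostname):

mkdir -p /var/lib/ceph/mgr/ceph-$name
ceph auth get-or-create mgr.$name mon 'allow profile mgr' osd 'allow *' mds 'allow *' \
    > /var/lib/ceph/mgr/ceph-$name/keyring
ceph-mgr -i $name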
On Mon, Dec 11, 2017, 2:07 PM Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote:
Hi,
thank you for the response. I started the mds manually and accessed cephfs; I'm
not running a mgr yet, it is not necessary.
I just responded to
Dec 11, 2017 at 1:08 PM Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote:
Hi all,
hope that somebody can help me. I have a home ceph installation.
After a power failure (it can happen in a datacenter too) my ceph booted in
an inconsistent state.
I was backfilling dat
pg data from OSDs?
In the OSD logs I can see that backfilling is continuing etc., so they have
the correct information, or they are resuming the operations they were running
before the power failure.
With regards
Jan Pekar
On 11.12.2017 19:07, Jan Pekař - Imatic wrote:
Hi all,
hope that somebody can help me. I have a home ceph installation.
After a power failure (it can happen in a datacenter too) my ceph booted in
an inconsistent state.
I was backfilling data onto one new disk during the power failure. The first time
it booted without some OSDs, but I fixed that. Now I ha
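A sketch of what can be checked right after such a boot (standard status commands, no cluster-specific assumptions):

ceph -s               # overall health and recovery/backfill progress
ceph health detail    # which PGs are degraded/inconsistent/down and why
ceph pg dump_stuck    # PGs that are not making progress
ceph osd tree         # confirm every OSD came back up and in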
Hi,
On 6.12.2017 15:24, Jason Dillaman wrote:
On Wed, Dec 6, 2017 at 3:46 AM, Jan Pekař - Imatic wrote:
Hi,
I ran into an overloaded cluster (deep-scrub running) for a few seconds and
the rbd-nbd client timed out, and the device became unavailable.
block nbd0: Connection timed out
block nbd0: shutting down sockets
block nbd0: Connection timed out
print_req_error: I/O error, dev nbd0, sector 2131833856
print_req
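A sketch of recovering the device and remapping it with a longer device timeout (pool/image names are placeholders; the --timeout option exists only in later rbd-nbd builds and is called --io-timeout on newer releases):

rbd-nbd unmap /dev/nbd0
rbd-nbd map --timeout 120 rbd/myimage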
some other ways to flush all objects (like turning off the VMs, or setting a
short eviction time or a target size) and remove the overlay after that.
With regards
Jan Pekar
On 1.12.2017 03:43, Jan Pekař - Imatic wrote:
Hi all,
today I tested adding an SSD cache tier to a pool.
Everything worked, but when I tried to remove it and ran
rados -p hot-pool cache-flush-evict-all
I got
rbd_data.9c000238e1f29.
failed to flush /rbd_data.9c000238e1f29.: (2) No such
file or directory
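For reference, a sketch of the documented removal sequence for a writeback cache tier (cold-pool stands in for the backing pool name, which isn't shown above):

ceph osd tier cache-mode hot-pool forward --yes-i-really-mean-it
rados -p hot-pool cache-flush-evict-all
ceph osd tier remove-overlay cold-pool
ceph osd tier remove cold-pool hot-pool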
Even if librbd was deadlocked, the worst case that I would expect would
be your guest OS complaining about hung kernel tasks related to disk IO
(since the disk wouldn't be responding).
On Mon, Nov 6, 2017 at 6:02 PM, Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote:
Hi,
Wido den Hollander wrote:
On 7 November 2017 at 10:14, Jan Pekař - Imatic wrote:
Additional info - it is not librbd related; I mapped the disk through
rbd map and it was the same - the virtual machines were stuck/frozen.
It happened exactly when my log showed
Why aren't you using librbd? Is th
I would suggest trying a
different version of QEMU and/or different host OS since loss of a disk
shouldn't hang it -- only potentially the guest OS.
On Tue, Nov 7, 2017 at 5:17 AM, Jan Pekař - Imatic <jan.pe...@imatic.cz> wrote:
I'm calling kill -STOP to simulate
map device attached inside
QEMU/KVM virtuals.
JP
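A sketch of that simulation (osd.3 and the pgrep pattern are placeholders; adjust the pattern to however the OSD processes are started on the node):

# pause one OSD so it keeps its sessions but stops answering
kill -STOP $(pgrep -f 'ceph-osd.*--id 3')
# ... observe the rbd/qemu behaviour, then resume it
kill -CONT $(pgrep -f 'ceph-osd.*--id 3')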
On 7.11.2017 10:57, Piotr Dałek wrote:
On 17-11-07 12:02 AM, Jan Pekař - Imatic wrote:
Hi,
I'm using debian stretch with ceph 12.2.1-1~bpo80+1 and qemu
1:2.8+dfsg-6+deb9u3.
I'm running 3 nodes with 3 monitors and 8 osds, all on IPv6.
When I tested the cluster, I detected a strange and severe problem.
On the first node I'm running qemu hosts with a librados disk connection to