Hi All,
Some feedback on my end. I managed to recover the "lost data" from one of
the other OSDs. It seems my initial summary was a bit off: the PGs were
replicated, and Ceph just wanted to confirm that the objects were still
relevant.
For future reference, I basically marked the OSD as
Hi,
My cluster has been showing me this message for the last two weeks.
Ceph Version (ceph -v):
root@heku1 ~ # ceph -v
ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic
(stable)
All pgs are active+clean:
root@heku1 ~ # ceph -s
cluster:
id: 0839c91a-f3ca-4119-853b-eb10904cf322
Hi Igor,
Many thanks for your reply. Here are the details about the cluster:
1. Ceph version - 13.2.5-1xenial (installed from Ceph repository for ubuntu
16.04)
2. Main devices for the radosgw pool - HDDs. We do use a few SSDs for the
other pool, but it is not used by radosgw.
3. We use
Hi Greg,
Can you please share the api details for COPY_FROM or any reference
document?
Thanks,
Muthu
On Wed, Jul 3, 2019 at 4:12 AM Brad Hubbard wrote:
> On Wed, Jul 3, 2019 at 4:25 AM Gregory Farnum wrote:
> >
> > I'm not sure how or why you'd get an object class involved in doing
> >
Hello,
I have strange problem with scrubbing.
When scrubbing starts on a PG that belongs to the default.rgw.buckets.index
pool, I can see that this OSD is very busy (see attachment) and starts
showing many slow requests; as soon as the scrubbing of this PG stops, the
slow requests stop immediately.
Hi,
We mounted a CephFS through the kernel module and through FUSE. Both work,
except that when we run "df -h", the "Avail" value shown is the MAX AVAIL of
the data pool from "ceph df". I was expecting it to match the max_bytes
quota of the data pool. An RBD mount doesn't show the same behavior.
Is this normal?
Thanks
Your cephfs was probably created with a buggy version that didn't set the
metadata tags on the data pools correctly. IIRC there still isn't any
automated migration of old broken pools.
See https://github.com/ceph/ceph/pull/24125
Fix:
ceph osd pool application set cephfs_data2 cephfs data <fs name>
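For completeness, a hedged verification sketch (the pool name cephfs_data2
is from this thread; a correctly tagged data pool reports the filesystem
name under the "cephfs" application's "data" key, as the `application get`
output in this thread shows):

```shell
# After setting the tag, check that the data pool carries the
# cephfs/data metadata tag:
ceph osd pool application get cephfs_data2
```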
Paul
Hi Andrei,
Additionally I'd like to see performance counters dump for a couple of
HDD OSDs (obtained through 'ceph daemon osd.N perf dump' command).
W.r.t. the average object size - I was thinking that you might know what
objects had been uploaded... If not, then you might want to estimate it
by
On Wed, Jul 3, 2019 at 5:41 AM, Bryan Henderson wrote:
> I may need to modify the above, though, now that I know how Ceph works,
> because I've seen storage server products that use Ceph inside. However,
> I'll
> bet the people who buy those are not aware that it's designed never to go
> down
>
Looks fine - comparing bluestore_allocated vs. bluestore_stored shows little
difference, so that's not the allocation overhead.
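The comparison above can be sketched against a perf dump like this. The
counter names bluestore_allocated and bluestore_stored are the real BlueStore
counters; the excerpt and its numbers are made up for illustration:

```python
import json

# Hypothetical excerpt of `ceph daemon osd.N perf dump` output;
# the real dump has many more sections and counters.
perf_dump = json.loads("""
{
  "bluestore": {
    "bluestore_allocated": 1099511627776,
    "bluestore_stored": 1086626725888
  }
}
""")

bs = perf_dump["bluestore"]
# Allocation overhead = space allocated on disk minus logical data stored.
overhead = bs["bluestore_allocated"] - bs["bluestore_stored"]
ratio = bs["bluestore_allocated"] / bs["bluestore_stored"]
print(f"allocation overhead: {overhead / 2**30:.2f} GiB (ratio {ratio:.3f})")
```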
What about comparing the object counts reported by the ceph and radosgw
tools?
Igor.
On 7/3/2019 3:25 PM, Andrei Mikhailovsky wrote:
Thanks Igor, Here is a link to
On Wed, Jul 3, 2019 at 9:01 AM, Luk wrote:
> Hello,
>
> I have strange problem with scrubbing.
>
> When scrubbing starts on PG which belong to default.rgw.buckets.index
> pool, I can see that this OSD is very busy (see attachment), and starts
> showing many
> slow request, after the
For anyone reading this in the future from a google search: please don't
set osd_find_best_info_ignore_history_les unless you know exactly what you
are doing.
That's a really dangerous option and should be a last resort. It will
almost definitely lead to some data loss or inconsistencies (lost
Are you running with auto repair enabled? There's a bug that sometimes
resets the scrub timestamps to 0 in this configuration.
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89
Hi,
With --debug-objecter=20, I found that the rados ls command hangs, looping
on laggy messages:
2019-07-03 13:33:24.913 7efc402f5700 10 client.21363886.objecter
_op_submit op 0x7efc3800dc10
2019-07-03 13:33:24.913 7efc402f5700 20 client.21363886.objecter
_calc_target epoch 13146
Hi everyone,
The target release date for Octopus is March 1, 2020.
The freeze will be January 1, 2020. As a practical matter, that means any
features need to land before people leave for the holidays, ensuring the
features get in on time and also that we can run tests over the holidays
Hi,
RocksDB in BlueStore should be opened like this with ceph-kvstore-tool:
ceph-kvstore-tool bluestore-kv
instead of just "rocksdb", which is for RocksDB on a plain file system.
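A hedged example of the full invocation (the OSD must be stopped first; the
path follows the usual /var/lib/ceph/osd layout and may differ on your
system):

```shell
# List all keys in the OSD's embedded RocksDB via the BlueStore wrapper:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-80 list
```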
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
On Wed, Jul 3, 2019 at 4:47 PM Luk wrote:
>
>
> this pool is that 'big' :
>
> [root@ceph-mon-01 ~]# rados df | grep -e index -e WR
> POOL_NAME USED OBJECTS CLONES COPIES
> MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
>
> default.rgw.buckets.index
Hi Igor.
The numbers are identical it seems:
.rgw.buckets 19 15 TiB 78.22 4.3 TiB 8786934
# cat /root/ceph-rgw.buckets-rados-ls-all |wc -l
8786934
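As a back-of-the-envelope estimate of the average object size Igor asked
about, assuming the 15 TiB figure above is the total user data stored in
.rgw.buckets:

```python
# Rough average object size: pool data volume divided by object count.
pool_bytes = 15 * 2**40      # 15 TiB of user data in .rgw.buckets
object_count = 8786934       # same count from ceph df and rados ls
avg_bytes = pool_bytes / object_count
print(f"average object size: {avg_bytes / 2**20:.2f} MiB")  # about 1.79 MiB
```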
Cheers
> From: "Igor Fedotov"
> To: "andrei"
> Cc: "ceph-users"
> Sent: Wednesday, 3 July, 2019 13:49:02
> Subject: Re: [ceph-users]
Thanks Igor, Here is a link to the ceph perf data on several osds.
https://paste.ee/p/IzDMy
In terms of the object sizes: we use rgw to back up data from various
workstations and servers, so the sizes range from a few KB to a few GB per
individual file.
Cheers
> From: "Igor
On Sun, 30 Jun 2019, Bryan Henderson wrote:
> > I'm not sure why the monitor did not mark it _out_ after 600 seconds
> > (default)
>
> Well, that part I understand. The monitor didn't mark the OSD out because the
> monitor still considered the OSD up. No reason to mark an up OSD out.
>
> I
On Wed, Jul 3, 2019 at 8:51 PM, Austin Workman wrote:
>
> But a very strange number shows up in the active sections of the PGs -
> roughly the same number as 2147483648. This seems very odd, and maybe the
> value got lodged somewhere it doesn't belong, which is causing an issue.
>
>
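An aside for future readers, under the assumption that the value in the pg
dump was actually 2147483647 rather than 2147483648: that number is
INT32_MAX (0x7fffffff), which Ceph uses as the "no OSD" placeholder for a
missing shard in an erasure-coded pool's up/acting sets, so it would not
have leaked in from the osd_memory_target setting. A quick sanity check:

```python
# 2147483647 = 2**31 - 1 = 0x7fffffff: the placeholder printed for a
# missing OSD ("NONE") in an EC pool's up/acting sets.
none_marker = 2**31 - 1
print(none_marker, hex(none_marker))  # 2147483647 0x7fffffff
```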
Well, the RADOS interface doesn't have a great deal of documentation
so I don't know if I can point you at much.
But if you look at Objecter.h, you see that the ObjectOperation has
this function:
void copy_from(object_t src, snapid_t snapid, object_locator_t
src_oloc, version_t src_version,
> I'm a bit confused about what happened here, though: that 600 second
> interval is only important if *every* OSD in the system is down. If you
> reboot the data center, why didn't *any* OSD daemons start? (And even if
> none did, having the ceph -s report all OSDs down instead of up isn't
That makes more sense.
Setting min_size = 4 on the EC pool allows data to flow again (kind of - not
really, because of the 22 other still-missing PGs). Maybe this was
automatically raised to 5 when I adjusted the EC pool originally? That is
outside of the 21 unknown and 1 down PGs, which are probably depending on
So several events unfolded that may have led to this situation. Some of
them, in hindsight, were probably not the smartest decisions around
adjusting the EC pool and restarting the OSDs several times during these
migrations.
1. Added a new 6th OSD with ceph-ansible
2. Hung during restart
Something very curious is that I was adjusting the configuration for
osd_memory_target via ceph-ansible and had at one point set 2147483648,
which is exactly 2 GiB.
Currently it's set to 1610612736, but strangely, in the config file it
wrote 1963336226.
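Just to make the three osd_memory_target values comparable, plain
arithmetic:

```python
# Convert the three byte values from the message above to GiB:
# 2.00, 1.50 and ~1.83 GiB respectively.
for raw in (2147483648, 1610612736, 1963336226):
    print(f"{raw} bytes = {raw / 2**30:.2f} GiB")
```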
But a very strange number shows up in the
Hi All! Today I've had 3 OSDs stop themselves, and they are unable to
restart, all with the same error. These OSDs are all on different hosts.
All are running 14.2.1.
I did try the following two commands
- ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-80 list > keys
## This failed with the
Thanks for the tip. I did wonder about that, checked it at one point, and
assumed it was OK.
root@cnx-11:~# ceph osd pool application get cephfs_data
{
"cephfs": {
"data": "cephfs"
}
}
root@cnx-11:~# ceph osd pool application get cephfs_data2
{
"cephfs": {
After some creative PG surgery, everything is coming back online cleanly.
I went through the PGs one at a time (80-90 PGs) on the least-filled OSD
(the new osd.5) and export-removed each PG that was causing the assertion
failures after testing whether the OSD would start. # tail -f
/var/log/ceph/ceph-osd.5.log | grep -A1
29 matches