What is the difference between services and daemons?
Specifically, what does it mean that "orch ps" lists cephadm daemons and
"orch ls" lists cephadm services?
This question will help me close this bug:
https://tracker.ceph.com/issues/47142
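Not part of the original mail; a minimal sketch of how the distinction shows up on the command line (the service and daemon names in the comments are made-up examples):
ceph orch ls   # one row per *service*: the declared spec plus placement/count, e.g. "mds.cephfs 2/2"
ceph orch ps   # one row per *daemon*: a concrete instance of a service on one host, e.g. "mds.cephfs.host1.abcdef"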
Zac Dover
Upstream Docs
Ceph
Figured it out:
admin/bucket?quota works, but it does not seem to be documented.
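For the archives, a hedged sketch of the same thing via the radosgw-admin CLI (the user name is a made-up example; the sizes match the JSON payload quoted later in the thread):
radosgw-admin quota set --uid=testuser --quota-scope=bucket --max-size=1099511627776 --max-objects=-1
radosgw-admin quota enable --uid=testuser --quota-scope=bucket
radosgw-admin user info --uid=testuser    # the bucket_quota section should now show the limits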
On Mon, Aug 31, 2020 at 4:16 PM Youzhong Yang wrote:
> Hi all,
>
> I tried to set bucket quota using admin API as shown below:
>
> admin/user?quota=bse=test=bucket
>
> with payload in json format:
> {
>
Dallas,
First, I should point out that you have an issue with your units. Your cluster
is reporting 81 TiB (1 TiB = 1024^4 bytes) of available space, not 81 TB (1 TB = 1000^4 bytes). Similarly,
it's reporting 22.8 TiB of free space in the pool, not 22.8 TB. For comparison,
your 5.5 TB drives (this is the correct unit here)
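Not from the original mail, just the plain arithmetic behind that unit distinction:
python3 -c "print(81 * 2**40 / 1e12)"     # 81 TiB is about 89.1 TB
python3 -c "print(22.8 * 2**40 / 1e12)"   # 22.8 TiB is about 25.1 TB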
Looks like the image attachment got removed. Please find it here:
https://imgur.com/a/3tabzCN
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Frank Schilder
Sent: 31 August 2020 14:42
To: Mark Nelson; Dan van der Ster;
On Mon, Aug 31, 2020 at 5:02 AM Stefan Kooman wrote:
>
> Hi list,
>
> We had some stuck ops on our MDS. In order to figure out why, we looked
> up the documentation. The first thing it mentions is the following:
>
> ceph daemon mds.<name> dump cache /tmp/dump.txt
>
> Our MDS had 170 GB in cache at that
Both the MDS maps and the keyrings are lost as a side effect of the monitor
recovery process I mentioned in my initial email, detailed here:
https://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-mon/#monitor-store-failures
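For reference, the central step of that linked procedure is rebuilding the mon store from the OSDs with something roughly like the following (OSD id and paths are examples; see the linked page for the full loop over all OSDs and the ceph-monstore-tool rebuild step):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op update-mon-db --mon-store-path /tmp/mon-store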
On Mon, 31 Aug 2020 at 21:10, Eugen Block wrote:
> I
Hi all,
I tried to set bucket quota using admin API as shown below:
admin/user?quota=bse=test=bucket
with payload in json format:
{
"enabled": true,
"max_size": 1099511627776,
"max_size_kb": 1073741824,
"max_objects": -1
}
it
I don't understand: what happened to the previous MDS? If there are
CephFS pools, there was also an old MDS, right? Can you explain that,
please?
Quoting cyclic3@gmail.com:
I added an MDS, but there was no change in either output (apart from
recognising the existence of an MDS)
This sounds rather risky; will this definitely not lose any of my data?
I was talking about on-disk cache, but, yes, the controller cache needs to be
disabled too. The first can be done with smartctl or hdparm. Check cache status
with something like 'smartctl -g wcache /dev/sda' and disable with something
like 'smartctl -s wcache=off /dev/sda'.
Controller cache
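Not from the truncated mail above; as a sketch, the hdparm equivalents for the per-disk cache (device name is an example, and smartctl versions differ on the exact -s syntax):
hdparm -W /dev/sda                 # query the volatile write cache state
hdparm -W 0 /dev/sda               # disable it
smartctl -s wcache,off /dev/sda    # same via smartctl (comma syntax on recent versions)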
Ilya,
Thank you for the quick response; this was very helpful toward getting a
resolution.
Restarting those 3 osds has allowed the rbd image to mount successfully.
I really appreciate all your help on this.
Shain
On 8/31/20, 12:41 PM, "Ilya Dryomov" wrote:
On Mon, Aug 31, 2020 at 6:21
Hi,
A few weeks ago several of our rbd images became unresponsive after a few of
our OSDs reached a near full state.
Another member of the team rebooted the server that the rbd images are mounted
on in an attempt to resolve the issue.
In the meantime I added several more nodes to the cluster in
Hi list,
We had some stuck ops on our MDS. In order to figure out why, we looked
up the documentation. The first thing it mentions is the following:
ceph daemon mds.<name> dump cache /tmp/dump.txt
Our MDS had 170 GB in cache at that moment.
Turns out that is a sure way to get your active MDS replaced
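A hedged sketch of a gentler first step (the daemon name is a placeholder): check how large the cache actually is before asking for a full dump, since the dump blocks the MDS and, as described above, can get it replaced.
ceph daemon mds.<name> cache status                 # reports the cache size in items/bytes
ceph daemon mds.<name> dump cache /tmp/dump.txt     # only if the cache is small enough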
Hi,
I have Ceph 15.2.4 running in Docker. How do I configure it to use a
specific data pool? I put the following line in ceph.conf, but the
change is not working:
[client.myclient]
rbd default data pool = Mydatapool
I need to configure this for an erasure-coded pool with CloudStack.
Can anyone help me?
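Not an authoritative answer, but a sketch of the usual erasure-coded RBD setup (pool and image names are made-up examples); note that RBD metadata must live in a replicated pool, and the EC pool is only used as the data pool:
ceph osd pool create rbd_data 64 64 erasure
ceph osd pool set rbd_data allow_ec_overwrites true
ceph osd pool create rbd_meta 64 64 replicated
rbd pool init rbd_meta
rbd create rbd_meta/vm-disk1 --size 100G --data-pool rbd_data   # per image
# or as a client default in ceph.conf, as in the question:
# [client.myclient]
# rbd default data pool = rbd_data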
Hi all,
can anyone help me with this? In mimic, for any of these commands:
ceph osd [deep-]scrub ID
ceph pg [deep-]scrub ID
ceph pg repair ID
an operation is scheduled asynchronously. How can I check the following states:
1) Operation is pending (scheduled, not started).
2)
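Not from the thread, but a rough sketch of how to see what is currently running versus already done (the PG id is an example):
ceph pg dump pgs_brief | grep scrubbing   # PGs whose state currently includes (deep-)scrubbing
ceph pg dump pgs | less -S                # LAST_SCRUB_STAMP / LAST_DEEP_SCRUB_STAMP columns per PG
ceph pg 2.1f query | grep -i scrub        # scrub-related fields for a single PG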
Yes, they can - if volatile write cache is not disabled. There are many threads
on this, also recent. Search for "disable write cache" and/or "disable volatile
write cache".
You will also find different methods of doing this automatically.
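One common way to automate it, as an untested sketch (device selection and tooling will differ per setup); run at boot, e.g. from a systemd unit:
for dev in /dev/sd?; do
    hdparm -W 0 "$dev" || smartctl -s wcache,off "$dev"   # disable the volatile write cache
done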
Best regards,
=
Frank Schilder
AIT
Here is the blog post I wrote about it:
https://yourcmc.ru/wiki/index.php?title=Ceph_performance
HDD for data + SSD for journal:
Filestore writes everything to the journal and only starts to flush
it to the data device when the journal fills up to the configured
percent. This is very
On Mon, 2020-08-31 at 14:36 +, Eugen Block wrote:
> > Disks are utilized roughly between 70 and 80 percent. Not sure why
> > would operations slow down when disks are getting more utilization.
> > If that would be the case, I'd expect Ceph to issue a warning.
>
> It is warning you, that's why
Disks are utilized roughly between 70 and 80 percent. Not sure why
would operations slow down when disks are getting more utilization.
If that would be the case, I'd expect Ceph to issue a warning.
It is warning you, that's why you see slow requests. ;-) But just to
be clear, by utilization I
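For completeness, a sketch of the two different "utilization" readings (not from the original mail):
iostat -x 1        # %util column: how busy each disk is
ceph osd df tree   # %USE column: how full each OSD is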
Could you please run: ceph daemon osd.<id> calc_objectstore_db_histogram
and share the output?
On 8/31/2020 4:33 PM, Wido den Hollander wrote:
On 31/08/2020 12:31, Igor Fedotov wrote:
Hi Wido,
The 'b' prefix relates to the free list manager, which keeps all the free
extents for the main device in a bitmap.
On 31/08/2020 11:00, Dennis Benndorf wrote:
Hi,
today I noticed bad performance in our cluster. Running "watch ceph
osd perf |sort -hk 2 -r" I found that all bluestore OSDs are slow on
commit and that the commit timings are equal to their apply timings:
For example
Every 2.0s: ceph osd
On 31/08/2020 15:44, Francois Legrand wrote:
Thanks Igor for your answer,
We could try do a compaction of RocksDB manually, but it's not clear to
me if we have to compact on the mon with something like
ceph-kvstore-tool rocksdb /var/lib/ceph/mon/mon01/store.db/ compact
or on the concerned
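Not from the thread; for the OSD case the compaction is usually done roughly like this (the OSD id is an example):
systemctl stop ceph-osd@12
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
systemctl start ceph-osd@12
# or online, via the admin socket, if the release supports it:
ceph daemon osd.12 compact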
The osd_memory_target of the failed OSD on one ceph-osd node was changed to 6G, but
the other OSDs' osd_memory_target is 3G; starting the failed OSD with a 6G
memory_target causes other OSDs on that node to go "down", and the failed OSD is still down.
On Mon, Aug 31, 2020 at 2:19 PM Eugen Block wrote:
> Can you try the opposite
On 31/08/2020 12:31, Igor Fedotov wrote:
Hi Wido,
The 'b' prefix relates to the free list manager, which keeps all the free extents
for the main device in a bitmap. Its records have a fixed size, hence you can
easily estimate the overall size of this type of data.
Yes, so I figured.
But I doubt it
I replaced the VMs taking care of routing between clients and MDSes with physical
machines. The problems below are solved. It seems to have been related to issues
with the virtual NIC. It seemed to work well with E1000 instead of VirtIO...
Kind regards,
William Edwards
- Original
We have an older LSI RAID controller with no HBA/JBOD option, so we expose the
single disks as raid0 devices. Shouldn't Ceph be unaware of the cache status?
But digging deeper into it, it seems that 1 out of 4 servers is performing a lot
better and has super low commit/apply rates while the others have a
The compaction of the bluestore-kv's indeed helped. The response is back to
acceptable levels.
Thanks for the help
> Thank you Stefan, I'm going to give that a try
>
> Kind Regards
>
> Marcel Kuiper
>
>> On 2020-08-27 13:29, Marcel Kuiper wrote:
>>> Sorry that had to be Wido/Stefan
>>
>> What does
Perhaps both clusters have the same bottleneck and you perceive them as
equally fast.
Can you provide as many details about your clusters as possible?
Also please show outputs of the tests that you've run.
On 8/31/20 1:02 PM, VELARTIS Philipp Dürhammer wrote:
I have a productive 60 osd's
I have a production cluster with 60 OSDs and no extra journals. It performs okay.
Now I added an extra SSD pool with 16 Micron 5100 MAX drives, and the performance is
a little slower than or equal to the 60-HDD pool, for 4K random as well as sequential reads.
All on a dedicated 2x 10G network. The HDDs are still on
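As a hedged example of the kind of test output that would help here (the image name is made up, and it must exist before the run):
rbd create ssdpool/bench --size 10G
fio --name=randread --ioengine=rbd --clientname=admin --pool=ssdpool --rbdname=bench --rw=randread --bs=4k --iodepth=32 --runtime=60 --time_based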
Hi Wido,
The 'b' prefix relates to the free list manager, which keeps all the free extents
for the main device in a bitmap. Its records have a fixed size, hence you can
easily estimate the overall size of this type of data.
But I doubt it takes that much. I presume that the DB just lacks the proper
Hi Francois,
given that slow operations are observed for collection listings you
might want to manually compact RocksDB using ceph-kvstore-tool.
The observed slowdown tends to happen after massive data removals. I've
seen multiple compains about this issue including some post in this
Hey Eugen,
On Wed, 2020-08-26 at 09:29 +, Eugen Block wrote:
> Hi,
>
> > > root@cephosd01:~# ceph config get mds.cephosd01 osd_op_queue
> > > wpq
> > > root@0cephosd01:~# ceph config get mds.cephosd01
> > > osd_op_queue_cut_off
> > > high
>
> just to make sure, I referred to OSD not MDS
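In command form, a hedged sketch of the check referred to there: read (and if needed set) these options on the OSDs rather than the MDS (the OSD id is an example):
ceph config get osd.0 osd_op_queue
ceph config get osd.0 osd_op_queue_cut_off
ceph config set osd osd_op_queue_cut_off high   # if it is still 'low'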
Can you try the opposite and turn up the memory_target and only try to
start a single OSD?
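A sketch of what that suggestion might look like in practice (the OSD id and size are examples):
ceph config set osd.7 osd_memory_target 8589934592   # 8 GiB, for this one OSD only
systemctl start ceph-osd@7
ceph daemon osd.7 config get osd_memory_target        # confirm the running value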
Quoting Vahideh Alinouri:
osd_memory_target was changed to 3G; starting the failed OSD causes the ceph-osd
nodes to crash, and the failed OSD is still "down".
On Fri, Aug 28, 2020 at 1:13 PM Vahideh Alinouri
Hi Dave,
On Tue, 2020-08-25 at 15:25 +0100, david.neal wrote:
> Hi Momo,
>
> This can be caused by many things apart from the ceph sw.
>
> For example I saw this once with the MTU in openvswitch not fully
> matching on a few nodes . We realised this using ping between nodes.
> For a 9000 MTU:
>
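The original command is cut off above; the usual probe for a 9000-byte MTU looks something like this (the peer name is a placeholder), and if it fails while a normal ping works, some hop is dropping jumbo frames:
ping -M do -s 8972 -c 3 <peer-node>   # 9000 minus 28 bytes of IP+ICMP headers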
We tried to raise the osd_memory_target from 4 to 8G, but the problem
still occurs (OSDs wrongly marked down a few times a day).
Does somebody have any clue ?
F.
On Fri, Aug 28, 2020 at 10:34 AM Francois Legrand wrote:
Hi all,
We have a ceph
Hi,
I have had a Rook cluster running with Ceph 12.2.7 for almost a year.
Recently some PVCs could not be attached, with errors as below:
Warning FailedMount 7m19s kubelet, 192.168.34.119
MountVolume.SetUp failed for volume "pvc-8f4ca7ac-42ab-11ea-99d7-005056b84936"
: mount command
Hi,
today I recognized bad performance in our cluster. Running "watch ceph
osd perf |sort -hk 2 -r" I found that all bluestore OSDs are slow on
commit and that the commit timings are equal to their apply timings:
For example
Every 2.0s: ceph osd perf |sort -hk 2
-r
Hi Max,
So, it seems that you prefer to use an image cache rather than allowing cross access
between Ceph users. That way, all communication is API based, and the snapshot
and CoW happen inside the same pool for a single Ceph client only, isn't
that right? I'll consider this approach and compare it with the cross-pool access
Hello,
Lately, we upgraded Ceph to version 15.2.4 and shortly after that we had
a blackout, which caused a restart of all servers at once (BTW, Ceph did
not come up well by itself). Since then we have been receiving lots of
complaints about problems with "object-maps" with every benji backup
(tool for
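Not from the mail; the usual way to inspect and repair object maps is roughly the following (pool/image names are examples):
rbd object-map check mypool/vm-disk1
rbd object-map rebuild mypool/vm-disk1
rbd object-map rebuild mypool/vm-disk1@snap1   # snapshots carry their own object maps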
Hello,
On a Nautilus 14.2.8 cluster I am seeing a large RocksDB database with
many slow DB bytes in use.
To investigate this further I marked one OSD as out and waited for the
all the backfilling to complete.
Once the backfilling was completed I exported BlueFS and investigated
the RocksDB
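For reference, a sketch of how such an inspection is typically done (the OSD id and paths are examples):
ceph daemon osd.3 perf dump bluefs   # db_used_bytes, slow_used_bytes, ...
systemctl stop ceph-osd@3
ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-3 --out-dir /tmp/osd3-bluefs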
osd_memory_target was changed to 3G; starting the failed OSD causes the ceph-osd
nodes to crash, and the failed OSD is still "down".
On Fri, Aug 28, 2020 at 1:13 PM Vahideh Alinouri
wrote:
> Yes, each osd node has 7 osds with 4 GB memory_target.
>
>
> On Fri, Aug 28, 2020, 12:48 PM Eugen Block wrote:
>
>> Just
On Sun, Aug 30, 2020 at 8:05 PM wrote:
>
> Hi,
> I've had a complete monitor failure, which I have recovered from with the
> steps here:
> https://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-mon/#monitor-store-failures
> The data and metadata pools are there and are
Hello!
Mon, Aug 31, 2020 at 01:06:13AM +0700, mrxlazuardin wrote:
> Hi Max,
>
> As far as I know, cross access of Ceph pools is needed for copy on write
> feature which enables fast cloning/snapshotting. For example, nova and
> cinder users need to read to images pool to do copy on write from