On Thu, 19 Oct 2017, Daniel Pryor wrote:
Hello Everyone,
We are currently running into two issues.
1) We are noticing huge pauses during directory creation, but our file
write times are super fast. The metadata and data pools are on the same
infrastructure.
- https://gist.github.com/pryorda/a0d5c37f119c4a320fa4ca9d48c8752b
Hi Cephers,
Brett Niver and Orit Wasserman are organizing a Ceph upstream meeting
next Thursday, October 25, in Prague.
The meeting will happen at The Pub from 5pm to 9pm (CEST):
http://www.thepub.cz/praha-1/?lng=en
At the moment we are working on the participant list; if you're
interested
I guess you have both read and followed
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests
What was the result?
On Fri, Oct 20, 2017 at 2:50 AM, J David wrote:
> On Wed, Oct 18, 2017 at 8:12 AM, Ольга
On Fri, Oct 20, 2017 at 6:32 AM, Josy wrote:
> Hi,
>
>>> have you checked the output of "ceph-disk list" on the nodes where the
>>> OSDs are not coming back on?
>
> Yes, it shows all the disks correctly mounted.
>
>>> And finally inspect /var/log/ceph/ceph-osd.${id}.log
Okay, you're going to need to explain in very clear terms exactly what
happened to your cluster, and *exactly* what operations you performed
manually.
The PG shards seem to have different views of the PG in question. The
primary has a different log_tail, last_user_version, and last_epoch_clean
Development versions of the RPMs can be found here [1]. We don't have
production signed builds in place for our ceph-iscsi-XYZ packages yet and
the other packages would eventually come from a distro (or third party
add-on) repo.
[1] https://shaman.ceph.com/repos/
On Thu, Oct 19, 2017 at 8:27 PM,
Where did you find the iSCSI RPMs etc.? I looked all through the repo and can't
find anything but the documentation.
--
Tyler Bishop
Founder EST 2007
O: 513-299-7108 x10
M: 513-646-5809
[ http://beyondhosting.net/ | http://BeyondHosting.net ]
Hello,
On Thu, 19 Oct 2017 17:14:17 -0500 Russell Glaue wrote:
> That is a good idea.
> However, a previous rebalancing process brought the performance of our
> Guest VMs to a slow drag.
>
Never mind that I'm not sure these SSDs are particularly well suited
for Ceph; your problem is
On Thu, Oct 19, 2017 at 12:59 AM, zhaomingyue wrote:
> Hi:
>
> When I analyzed the performance of Ceph, I found that rebuild_aligned was
> time-consuming, and further analysis found that rebuild operations were
> performed every time.
>
>
>
> Source code:
>
>
That is a good idea.
However, a previous rebalancing process brought the performance of our
Guest VMs to a slow drag.
On Thu, Oct 19, 2017 at 3:55 PM, Jean-Charles Lopez
wrote:
> Hi Russell,
>
> as you have 4 servers, assuming you are not doing EC pools, just stop all
>
Hi Richard,
Thanks a lot for sharing your experience. I have investigated further,
and it looks like export-diff is the most common tool used for backups,
as you suggested.
I will run some tests with export-diff and share my experience.
Again, thanks a lot!
2017-10-16 12:00
Unless your min_size is set to 3, you are not hitting the bug in the
tracker you linked. Most likely you are running with a min_size of 2, which
means that bug is not relevant to your cluster. If you wouldn't mind,
upload the output of `ceph osd pool get {pool_name} all`.
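The size/min_size arithmetic behind that statement can be sketched as follows (a minimal illustration, not Ceph code; the function name is made up):

```python
# Minimal sketch (not Ceph code): a PG keeps serving I/O while at least
# min_size replicas of its acting set are up.
def failures_before_io_blocks(size: int, min_size: int) -> int:
    """How many replicas can fail before the PG stops serving I/O."""
    return size - min_size

# size=3, min_size=2: one replica can be down without blocking I/O.
print(failures_before_io_blocks(3, 2))  # 1
# size=3, min_size=3: any failure in the acting set blocks I/O.
print(failures_before_io_blocks(3, 3))  # 0
```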
On Thu, Oct 19, 2017 at
I am using RGW, with an S3 bucket setup.
The live version also uses RBD.
On 19 Oct 2017 10:04 pm, "David Turner" wrote:
How are you uploading a file? RGW, librados, CephFS, or RBD? There are
multiple reasons that the space might not be updating or cleaning itself
up. The more information you can give us about how you're testing, the
more we can help you.
On Thu, Oct 19, 2017 at 5:00 PM nigel davies
Yes, I am trying it on Luminous.
Well, the bug has been open for 8 months and the fix hasn't been merged yet.
I don't know if that is what's preventing me from making it work. Tomorrow
I will try it again.
On 19/10/2017 at 23:00, David Turner wrote:
Running a cluster on various versions of Hammer and Jewel, I haven't had any
problems. I haven't upgraded to Luminous quite yet, but I'd be surprised
if there is that severe a regression, especially since they made so many
improvements to erasure coding.
On Thu, Oct 19, 2017 at 4:59 PM Jorge
Hey,
I somehow got the space back by tweaking the reweights.
But I am a tad confused: I uploaded a file (200MB), then removed the file,
and the space has not changed. I am not sure why that happens or what I can do
On Thu, Oct 19, 2017 at 6:42 PM, nigel davies wrote:
> PS
Well, I tried it a few days ago and it didn't work for me,
maybe because of this:
http://tracker.ceph.com/issues/18749
https://github.com/ceph/ceph/pull/17619
I don't know if it's actually working now.
On 19/10/2017 at 22:55, David Turner wrote:
> In a 3 node cluster with EC k=2 m=1,
I'm better off trying to solve the first hurdle.
This ceph cluster is in production serving 186 guest VMs.
-RG
On Thu, Oct 19, 2017 at 3:52 PM, David Turner wrote:
> Assuming the problem with swapping out hardware is having spare
> hardware... you could always switch
No, I have not ruled out the disk controller and backplane making the disks
slower.
Is there a way I could test that theory, other than swapping out hardware?
-RG
On Thu, Oct 19, 2017 at 3:44 PM, David Turner wrote:
> Have you ruled out the disk controller and backplane
Imagine we have a 3-OSD cluster and I make an erasure-coded pool with k=2, m=1.
If an OSD fails, we can rebuild the data, but (I think) the whole
cluster won't be able to perform I/O.
Wouldn't it be possible to make the cluster work in a degraded mode?
I think it would be a good idea to make the
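For what it's worth, the behaviour comes down to shard counting; a small sketch (illustrative only, not Ceph code, and whether I/O proceeds depends on the pool's min_size, which I treat as a parameter rather than assuming a default):

```python
# Illustrative sketch of EC pool availability with k data and m coding shards.
def surviving_shards(k: int, m: int, failed_osds: int) -> int:
    return k + m - failed_osds

def can_read(k: int, m: int, failed_osds: int) -> bool:
    # Any k surviving shards are enough to reconstruct the data.
    return surviving_shards(k, m, failed_osds) >= k

def can_serve_io(k: int, m: int, failed_osds: int, min_size: int) -> bool:
    # The PG only stays active (serving I/O) with at least min_size shards up.
    return surviving_shards(k, m, failed_osds) >= min_size

# k=2, m=1, one OSD down: the data is still recoverable...
print(can_read(2, 1, 1))                  # True
# ...but with min_size = k + 1 = 3 the PG goes inactive:
print(can_serve_io(2, 1, 1, min_size=3))  # False
# min_size = k would allow degraded I/O, with no redundancy left:
print(can_serve_io(2, 1, 1, min_size=2))  # True
```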
Have you ruled out the disk controller and backplane in the server running
slower?
On Thu, Oct 19, 2017 at 4:42 PM Russell Glaue wrote:
> I ran the test on the Ceph pool, and ran atop on all 4 storage servers, as
> suggested.
>
> Out of the 4 servers:
> 3 of them performed with
I ran the test on the Ceph pool and ran atop on all 4 storage servers, as
suggested.
Out of the 4 servers:
Three of them ran at 17% to 30% disk %busy and 11% CPU wait,
momentarily spiking to 50% on one server and 80% on another.
The second-newest server averaged almost 90% disk %busy
The most realistic backlog feature would be for adding support for
namespaces within RBD [1], but it's not being actively developed at
the moment. Of course, the usual caveat that "everyone with access to
the cluster network would be trusted" would still apply. It's because
of that assumption that
Hi,
>> have you checked the output of "ceph-disk list" on the nodes where
the OSDs are not coming back on?
Yes, it shows all the disks correctly mounted.
>> And finally inspect /var/log/ceph/ceph-osd.${id}.log to see messages
produced by the OSD itself when it starts.
This is the error
I want to give permissions to my clients, but only for reading/writing
a specific RBD image, not the whole pool.
If I give permissions to the whole pool, a client could delete all the
images in the pool or mount any other image, and I don't really want that.
I've read about using prefix
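The prefix approach I've seen discussed limits the client's OSD caps to the image's RADOS object prefixes. A hedged sketch (the pool, image name, and block-name prefix below are placeholders; verify the cap syntax against your release's docs):

```shell
# Find the image's block-name prefix (e.g. rbd_data.101274b0dc51) - placeholder names:
rbd info mypool/myimage | grep block_name_prefix

# Grant access only to that image's data/header/id objects (prefix is illustrative):
ceph auth get-or-create client.onlymyimage \
  mon 'allow r' \
  osd 'allow rwx pool=mypool object_prefix rbd_data.101274b0dc51, allow rwx pool=mypool object_prefix rbd_header.101274b0dc51, allow rx pool=mypool object_prefix rbd_id.myimage'
```

Note that `rbd_id.<name>` is keyed by the image name while the data/header objects use the internal id, which is part of what makes per-image caps awkward until namespaces land.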
Hello, I recently migrated to Bluestore on Luminous and have enabled
aggressive snappy compression on my CephFS data pool. I was wondering if
there was a way to see how much space was being saved. Also, are existing
files compressed at all, or do I have a bunch of resyncing ahead of me?
Sorry if
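Assuming the BlueStore compression counters I'm thinking of exist in your build (worth verifying), each OSD reports compressed bytes in its perf counters, e.g. via `ceph daemon osd.0 perf dump`. A sketch of deriving the savings from them, with invented values:

```python
import json

# Hypothetical excerpt of `ceph daemon osd.0 perf dump` output; the counter
# names are BlueStore's, but the values here are invented for illustration.
perf_dump = json.loads("""
{
  "bluestore": {
    "bluestore_compressed_original": 1073741824,
    "bluestore_compressed_allocated": 402653184
  }
}
""")

bs = perf_dump["bluestore"]
original = bs["bluestore_compressed_original"]    # bytes before compression
allocated = bs["bluestore_compressed_allocated"]  # bytes actually allocated
saved = original - allocated
print(f"saved {saved} of {original} bytes ({saved / original:.1%})")
```

As far as I know, compression is applied at write time only, so existing file data stays uncompressed until it is rewritten.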
Hi,
have you checked the output of "ceph-disk list" on the nodes where the OSDs are
not coming back on?
This should give you a hint on what's going on.
Also use dmesg to search for any error message
And finally inspect /var/log/ceph/ceph-osd.${id}.log to see messages produced
by the OSD
Hi,
I am not able to start some of the OSDs in the cluster.
This is a test cluster that had 8 OSDs. One node was taken out for
maintenance. I set the noout flag, and after the server came back up I
unset the noout flag.
Suddenly a couple of OSDs went down.
And now I can't start those OSDs, even manually
I have tried using ceph-disk directly and I'm running into all sorts of
trouble, but I'm trying my best. Currently I am using the following
cobbled-together script, which seems to be working:
https://github.com/seapasulli/CephScripts/blob/master/provision_storage.sh
I'm at 11 right now. I hope this works.
Nigel,
What method did you use to upload and delete the file? How did you check
the space utilization? I believe the reason you are still seeing the
space being utilized when you issue `ceph df` is that even after the
file is deleted, the file system doesn't actually delete the file,
Please ignore. I found the mistake.
On 19-10-2017 21:08, Josy wrote:
Hi,
I created a testprofile, but am not able to create a pool using it
==
$ ceph osd erasure-code-profile get testprofile1
crush-device-class=
crush-failure-domain=host
crush-root=default
Hey all,
I am looking at my small test Ceph cluster. I uploaded a 200MB ISO,
checked the space in "ceph status", and saw it increase.
But when I delete the file, the space used does not go down.
Have I missed a configuration somewhere?
On 19/10/17 11:00, Dennis Benndorf wrote:
> Hello @all,
>
> givin the following config:
>
> * ceph.conf:
>
> ...
> mon osd down out subtree limit = host
> osd_pool_default_size = 3
> osd_pool_default_min_size = 2
> ...
>
> * each OSD has its
Hi,
I created a testprofile, but am not able to create a pool using it
==
$ ceph osd erasure-code-profile get testprofile1
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=10
m=4
plugin=jerasure
technique=reed_sol_van
w=8
$ ceph osd pool
> On 19 October 2017 at 16:47, Caspar Smit wrote:
>
>
> Hi David,
>
> Thank you for your answer, but wouldn't scrub (deep-scrub) handle
> that? It will flag the unflushed journal PGs as inconsistent, and you
> would have to repair the PGs. Or am I overlooking
Hi,
If you want to split your data into 10 pieces (stripes) and hold 4 parity
pieces in addition (so your cluster can handle the loss of any 4 OSDs),
then you need a minimum of 14 OSDs to hold your data.
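Put as arithmetic (a trivial sketch, with the caveat that the minimum counts failure domains, whatever the profile sets them to):

```python
# Trivial sketch: minimum failure domains and raw-space overhead for EC k+m.
def ec_requirements(k: int, m: int):
    min_domains = k + m        # one shard per failure domain (osd or host)
    overhead = (k + m) / k     # raw bytes stored per logical byte
    return min_domains, overhead

domains, overhead = ec_requirements(10, 4)
print(domains, overhead)  # 14 1.4
```

With crush-failure-domain=host, those 14 shards must land on 14 distinct hosts, which is why 8 OSD servers with one disk each can't satisfy k=10, m=4.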
Denes.
On 10/19/2017 04:24 PM, Josy wrote:
Hi,
I would like to set up an erasure code
Hi David,
Thank you for your answer, but wouldn't scrub (deep-scrub) handle
that? It will flag the unflushed journal PGs as inconsistent, and you
would have to repair the PGs. Or am I overlooking something here? The
official blog doesn't say anything about this method being a bad
idea.
Caspar
Hi,
I would like to set up an erasure code profile with k=10 and m=4 settings.
Is there any minimum requirement on OSD nodes and OSDs to achieve this
setting?
Can I create a pool with 8 OSD servers, with one disk each?
I'm speaking to the method in general and don't know the specifics of
bluestore. Recovering from a failed journal in this way is only a good
idea if you were able to flush the journal before making a new one. If the
journal failed during operation and you couldn't cleanly flush the journal,
then
Hi all,
I'm testing some scenarios with the new Ceph Luminous/BlueStore combination.
I've created a demo setup with 3 nodes (each has 10 HDDs and 2 SSDs),
so I created 10 BlueStore OSDs with a separate 20GB block.db on the
SSDs (5 HDDs per block.db SSD).
I'm testing a failure of one of
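The layout described above works out as follows (just the arithmetic from the numbers in this post):

```python
# Arithmetic check of the demo layout: 3 nodes, 10 HDDs + 2 SSDs each,
# one 20GB block.db partition per HDD-backed OSD.
hdds_per_node, ssds_per_node, db_gb = 10, 2, 20
hdds_per_ssd = hdds_per_node // ssds_per_node  # 5 HDD OSDs share each SSD
db_used_per_ssd_gb = hdds_per_ssd * db_gb      # 100 GB of block.db per SSD
print(hdds_per_ssd, db_used_per_ssd_gb)  # 5 100
```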
Hi all,
I'm hoping some of you have some experience in dealing with this, as
unfortunately this is the first time we encountered this issue.
We currently have placement groups that are stuck unclean with
'active+remapped' as last state.
The rundown of what happened:
Yesterday morning, one of
Hi Greg,
Thanks for your findings! We've updated the issue with the log files of
osd.93 and osd.69 which corresponds to the period of the log we posted.
Also, we've recreated a new set of logs for that pair of OSDs. As we
explain in the issue, right now the OSDs fail on that other assert you
Hello @all,
given the following config:
* ceph.conf:
...
mon osd down out subtree limit = host
osd_pool_default_size = 3
osd_pool_default_min_size = 2
...
* each OSD has its journal on a 30GB partition on a PCIe-Flash-Card
* 3 hosts
What
Mostly I'm using Ceph as storage for my VMs in Proxmox. I have radosgw, but
only for tests; it doesn't seem to be the cause of the problem.
I've tuned these parameters. They should improve the speed of client requests
during recovery, but I receive the warnings anyway:
osd_client_op_priority = 63
osd_recovery_op_priority = 1
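If it helps, the same knobs can be applied at runtime, and recovery can be throttled further; a sketch (the values are examples, not recommendations):

```shell
# Inject the priorities at runtime (takes effect without restarting OSDs):
ceph tell osd.* injectargs '--osd_client_op_priority 63 --osd_recovery_op_priority 1'

# Throttle recovery/backfill concurrency as well - often a bigger lever:
ceph tell osd.* injectargs '--osd_recovery_max_active 1 --osd_max_backfills 1'
```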
Hi Greg,
I attached the gzipped output of the query and some more info below. If you
need more, let me know.
Stijn
> [root@mds01 ~]# ceph -s
> cluster 92beef0a-1239-4000-bacf-4453ab630e47
> health HEALTH_ERR
> 1 pgs inconsistent
> 40 requests are blocked > 512 sec
Are you using radosgw? I found this page useful when I had a similar issue:
http://www.osris.org/performance/rgw.html
Sean
On Wed, 18 Oct 2017, Ольга Ухина said:
> Hi!
>
> I have a problem with ceph luminous 12.2.1. It was upgraded from kraken,
> but I'm not sure if it was a problem in
> Memory usage is still quite high here even with a large onode cache!
> Are you using erasure coding? I recently was able to reproduce a bug in
> bluestore causing excessive memory usage during large writes with EC,
> but have not tracked down exactly what's going on yet.
>
> Mark
No, this
Hi:
When I analyzed the performance of Ceph, I found that rebuild_aligned was
time-consuming, and further analysis found that rebuild operations were performed
every time.
Source code:
FileStore::queue_transactions
  -> journal->prepare_entry(o->tls, );
  -> data_align = ((*p).get_data_alignment() -
What about not using ceph-deploy?
-Original Message-
From: Sean Sullivan [mailto:lookcr...@gmail.com]
Sent: donderdag 19 oktober 2017 2:28
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Luminous can't seem to provision more than 32 OSDs
per server
I am trying to install Ceph