[ceph-users] Backfilling on Luminous

2018-03-15 Thread David Turner
I upgraded a [1] cluster from Jewel 10.2.7 to Luminous 12.2.2 and last week I added 2 nodes to the cluster. The backfilling has been ATROCIOUS. I have OSDs consistently [2] segfaulting during recovery. There's no pattern of which OSDs are segfaulting, which hosts have segfaulting OSDs, etc...

Re: [ceph-users] Disk write cache - safe?

2018-03-14 Thread David Byte
on the RAID controller with each drive as its own RAID-0 has positive performance results. This is something to try and see if you can regain some of the performance, but as always in storage, YMMV. David Byte Sr. Technology Strategist SCE Enterprise Linux SCE Enterprise Storage Alliances and SUSE

Re: [ceph-users] Understanding/correcting sudden onslaught of unfound objects

2018-03-14 Thread David Zafman
attributes are getting corrupted.  All the errors are on shard 0.  My testing shows that repair will fix this scenario. David On 3/13/18 3:48 PM, Graham Allan wrote: Updated cluster now to 12.2.4 and the cycle of inconsistent->repair->unfound seems to continue, though possibly slightly differ

Re: [ceph-users] Cephfs MDS slow requests

2018-03-14 Thread David C
Thanks, John. I'm pretty sure the root of my slow OSD issues is filestore subfolder splitting. On Wed, Mar 14, 2018 at 2:17 PM, John Spray <jsp...@redhat.com> wrote: > On Tue, Mar 13, 2018 at 7:17 PM, David C <dcsysengin...@gmail.com> wrote: > > Hi All > >

Re: [ceph-users] Luminous | PG split causing slow requests

2018-03-14 Thread David C
On Mon, Feb 26, 2018 at 6:08 PM, David Turner <drakonst...@gmail.com> wrote: > The slow requests are absolutely expected on filestore subfolder > splitting. You can however stop an OSD, split its subfolders, and start > it back up. I perform this maintenance once/month. I chan

Re: [ceph-users] Cephfs MDS slow requests

2018-03-13 Thread David C
Thanks for the detailed response, Greg. A few follow ups inline: On 13 Mar 2018 20:52, "Gregory Farnum" <gfar...@redhat.com> wrote: On Tue, Mar 13, 2018 at 12:17 PM, David C <dcsysengin...@gmail.com> wrote: > Hi All > > I have a Samba server that is exporting d

[ceph-users] Cephfs MDS slow requests

2018-03-13 Thread David C
ended highest MDS debug setting before performance starts to be adversely affected (I'm aware log files will get huge)? 4) What's the best way of matching inodes in the MDS log to the file names in cephfs? Hardware/Versions: Luminous 12.1.1 Cephfs client 3.10.0-514.2.2.el7.x86_64 Samba 4.4.4 4 node

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-12 Thread David Disseldorp
kes sense to change the current behaviour of blocking the TMF ABORT response until the cluster I/O completes. Cheers, David

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-12 Thread David Disseldorp
existing abort_request() codepath only cancels the I/O on the client/gw side. A TMF ABORT successful response should only be sent if we can guarantee that the I/O is terminated at all layers below, so I think this would have to be implemented via an additional OSD epoch barrier or similar. Chee

Re: [ceph-users] Civetweb log format

2018-03-09 Thread David Turner
I'll have to do some processing to correlate > the key id with the rest of the request info. > > > Aaron > > On Mar 8, 2018, at 8:18 PM, Matt Benjamin <mbenj...@redhat.com> wrote: > > Hi Yehuda, > > I did add support for logging arbitrary headers, but not a >

Re: [ceph-users] Uneven pg distribution cause high fs_apply_latency on osds with more pgs

2018-03-08 Thread David Turner
PGs being unevenly distributed is a common occurrence in Ceph. Luminous started making some steps towards correcting this, but you're in Jewel. There are a lot of threads in the ML archives about fixing PG distribution. Generally every method comes down to increasing the weight on OSDs with too

Re: [ceph-users] Civetweb log format

2018-03-08 Thread David Turner
at makes me sad. > > Aaron > > > On Mar 8, 2018, at 12:36 PM, David Turner <drakonst...@gmail.com> wrote: > > Setting radosgw debug logging to 10/10 is the only way I've been able to > get the access key in the logs for requests. It's very unfortunate as it > DRASTICALLY i

Re: [ceph-users] Civetweb log format

2018-03-08 Thread David Turner
Setting radosgw debug logging to 10/10 is the only way I've been able to get the access key in the logs for requests. It's very unfortunate as it DRASTICALLY increases the amount of log per request, but it's what we needed to do to be able to have the access key in the logs along with the

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-07 Thread David Disseldorp
the same region via the alternate path. It's not something that we've observed in the wild, but is nevertheless a bug that is being worked on, with a resolution that should also be usable for active/active tcmu-runner. Cheers, David ___ ceph-users mailing

Re: [ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?

2018-03-06 Thread David Turner
if the mon can detect slow/blocked request from certain osd > why can't mon mark an osd with blocked request down if the request is > blocked for a certain time. > > 2018-03-07 > -- > shadow_lin > -- > > *From:* David Tu

Re: [ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?

2018-03-06 Thread David Turner
There are multiple settings that affect this. osd_heartbeat_grace is probably the most apt. If an OSD is not getting a response from another OSD for more than the heartbeat_grace period, then it will tell the mons that the OSD is down. Once mon_osd_min_down_reporters have told the mons that an
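
For reference, a hedged sketch of where those knobs live in ceph.conf (the values shown are the usual defaults; check your release's documentation before changing them):

    [osd]
    osd heartbeat grace = 20            # seconds without a heartbeat reply before a peer reports the OSD
    [mon]
    mon osd min down reporters = 2      # distinct reporters required before the mons mark the OSD down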

Re: [ceph-users] Deep Scrub distribution

2018-03-06 Thread David Turner
I'm pretty sure I put up one of those scripts in the past. Basically what we did was we set our scrub cycle to something like 40 days, we then sort all PGs by the last time they were deep scrubbed. We grab the oldest 1/30 of those PGs and tell them to deep-scrub manually, the next day we do it
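
A minimal sketch of that daily pass, assuming a Luminous-era `ceph pg dump --format json` layout with a top-level pg_stats array and jq on the box (field names can differ between releases):

    # deep-scrub the roughly 1/30 of PGs with the oldest last_deep_scrub_stamp
    ceph pg dump --format json 2>/dev/null | jq -r '
        (.pg_stats | length / 30 | floor) as $n
        | .pg_stats
        | sort_by(.last_deep_scrub_stamp)
        | limit($n; .[])
        | .pgid' |
    while read -r pg; do
        ceph pg deep-scrub "$pg"
    done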

Re: [ceph-users] how is iops from ceph -s client io section caculated?

2018-03-03 Thread David Turner
I would guess that the higher iops in ceph status are from iops calculated from replication. fio isn't aware of the backend replication iops, only what it's doing to the rbd On Fri, Mar 2, 2018, 11:53 PM shadow_lin wrote: > Hi list, > There is a client io section from the

Re: [ceph-users] Cluster is empty but it still use 1Gb of data

2018-03-02 Thread David Turner
[1] Here is a ceph status on a brand new cluster that has never had any pools created or data put into it at all. 323GB used out of 2.3PB. That's 0.01% overhead, but we're using 10TB disks for this cluster, and the overhead is more so per osd than per TB. It is 1.1GB overhead per osd. 34 of the

Re: [ceph-users] Ceph and multiple RDMA NICs

2018-03-01 Thread David Turner
> Hi David, > > Thank you for your reply. As I understand your experience with multiple > subnets > suggests sticking to a single device. However, I have a powerful RDMA NIC > (100Gbps) with two ports and I have seen recommendations from Mellanox to > separate the > two

Re: [ceph-users] Slow requests troubleshooting in Luminous - details missing

2018-03-01 Thread David Turner
Blocked requests and slow requests are synonyms in ceph. They are 2 names for the exact same thing. On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev <a...@iss-integration.com> wrote: > On Thu, Mar 1, 2018 at 2:47 PM, David Turner <drakonst...@gmail.com> > wrote: > > `ceph he

Re: [ceph-users] Slow requests troubleshooting in Luminous - details missing

2018-03-01 Thread David Turner
`ceph health detail` should show you more information about the slow requests. If the output is too much stuff, you can grep out for blocked or something. It should tell you which OSDs are involved, how long they've been slow, etc. The default is for them to show '> 32 sec' but that may very
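
For example (treat the grep pattern as a sketch; the exact wording of the health lines varies between releases):

    ceph health detail | grep -iE 'blocked|slow|stuck'
    # example output line: osds 4,8,10 have stuck requests > 268435 sec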

Re: [ceph-users] Cannot delete a pool

2018-03-01 Thread David Turner
When dealing with the admin socket you need to be an admin. `sudo` or `sudo -u ceph` ought to get you around that. I was able to delete a pool just by using the injectargs that you showed above. ceph tell mon.\* injectargs '--mon-allow-pool-delete=true' ceph osd pool rm pool_name pool_name
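
Put together, the full sequence looks roughly like this (pool_name is a placeholder; Luminous also wants the confirmation flag):

    ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
    ceph osd pool rm pool_name pool_name --yes-i-really-really-mean-it
    ceph tell mon.\* injectargs '--mon-allow-pool-delete=false'    # put the guard back afterwards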

Re: [ceph-users] Case where a separate Bluestore WAL/DB device crashes...

2018-03-01 Thread David Turner
This aspect of osds has not changed from filestore with SSD journals to bluestore with DB and WAL on SSDs. If the SSD fails, all osds using it are lost and need to be removed from the cluster and recreated with a new drive. You can never guarantee data integrity on bluestore or filestore if

Re: [ceph-users] Slow clients after git pull

2018-03-01 Thread David Turner
with vim I notice that is a bit > slower while is updating the repository, but after the update it works as > fast as before. > > It fails even on Jewel so I think that maybe the only way to do it is to > create a task to remount the FS when I deploy. > > Greetings and thanks!! &g

Re: [ceph-users] force scrubbing

2018-03-01 Thread David Turner
They added `ceph pg force-backfill ` but there is nothing to force scrubbing yet aside from the previously mentioned tricks. You should be able to change osd_max_scrubs around until the PGs you want to scrub are going. On Thu, Mar 1, 2018 at 9:30 AM Kenneth Waegeman
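
As a sketch of both knobs mentioned above (the PG id is made up, and injectargs changes do not survive an OSD restart):

    ceph pg force-backfill 1.2f                        # Luminous+; PG id is a placeholder
    ceph tell osd.\* injectargs '--osd_max_scrubs 2'   # raise concurrent scrubs until the target PGs have run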

Re: [ceph-users] Ceph iSCSI is a prank?

2018-03-01 Thread David Disseldorp
s too intrusive to make it upstream, so we need to work on a proper upstreamable solution, with tcmu-runner or otherwise. Cheers, David

Re: [ceph-users] Slow clients after git pull

2018-03-01 Thread David Turner
Using CephFS for something like this is about the last thing I would do. Does it need to be on a networked posix filesystem that can be mounted on multiple machines at the same time? If so, then you're kinda stuck and we can start looking at your MDS hardware and see if there are any MDS settings

Re: [ceph-users] Memory leak in Ceph OSD?

2018-03-01 Thread David Turner
With default memory settings, the general rule is 1GB ram/1TB OSD. If you have a 4TB OSD, you should plan to have at least 4GB ram. This was the recommendation for filestore OSDs, but it was a bit much memory for the OSDs. From what I've seen, this rule is a little more appropriate with

Re: [ceph-users] Luminous Watch Live Cluster Changes Problem

2018-03-01 Thread David Turner
`ceph pg stat` might be cleaner to watch than the `ceph status | grep pgs`. I also like watching `ceph osd pool stats` which breaks down all IO by pool. You also have the option of the dashboard mgr service which has a lot of useful information including the pool IO breakdown. On Thu, Mar 1,
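
The commands referenced above, sketched out (the watch interval is arbitrary):

    watch -n 5 ceph pg stat
    watch -n 5 ceph osd pool stats       # per-pool client and recovery IO
    ceph mgr module enable dashboard     # Luminous mgr dashboard, served by the active mgr (port 7000 by default)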

Re: [ceph-users] Ceph and multiple RDMA NICs

2018-03-01 Thread David Turner
There has been some chatter on the ML questioning the need to separate out the public and private subnets for Ceph. The trend seems to be in simplifying your configuration which for some is not specifying multiple subnets here. I haven't heard of anyone complaining about network problems with

Re: [ceph-users] ceph-deploy won't install luminous (but Jewel instead)

2018-03-01 Thread David Turner
o repo-luminous > ... got always Jewel. > > Only force the release in the ceph-deploy command allow me to install > luminous. > > Probably yum-plugin-priorities should not be installed after ceph-deploy > even if I didn't run still any command. > But what is so strange is

Re: [ceph-users] ceph-deploy won't install luminous (but Jewel instead)

2018-02-28 Thread David Turner
Which version of ceph-deploy are you using? On Wed, Feb 28, 2018 at 4:37 AM Massimiliano Cuttini wrote: > This worked. > > However somebody should investigate why default is still jewel on Centos > 7.4 > > Il 28/02/2018 00:53, jorpilo ha scritto: > > Try using: > ceph-deploy

Re: [ceph-users] mirror OSD configuration

2018-02-28 Thread David Turner
A more common search term for this might be Rack failure domain. The premise is the same for room as it is for rack, both can hold hosts and be set as the failure domain. There is a fair bit of discussion on how to achieve multi-rack/room/datacenter setups. Datacenter setups are more likely to

Re: [ceph-users] Ceph SNMP hooks?

2018-02-28 Thread David Turner
You could probably write an SNMP module for the new ceph-mgr daemon. What do you want to use to monitor Ceph that requires SNMP? On Wed, Feb 28, 2018 at 1:13 PM Andre Goree wrote: > I've looked and haven't found much information besides custom 3rd-party > plugins so I figured

Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread David Turner
My thought is that in 4 years you could have migrated to a hypervisor that will have better performance into ceph than an added iSCSI layer. I won't deploy VMs for ceph on anything that won't allow librbd to work. Anything else is added complexity and reduced performance. On Wed, Feb 28, 2018,

Re: [ceph-users] Ceph-Fuse and mount namespaces

2018-02-28 Thread David Turner
If you run your container in privileged mode you can mount ceph-fuse inside of the VMs instead of from the shared resource on the host. I used a configuration like this to run multi-tenancy speed tests of CephFS using ceph-fuse. The more mount points I used (1 per container), the more bandwidth I

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-02-28 Thread David C
On 27 Feb 2018 06:46, "Jan Pekař - Imatic" wrote: I think I hit the same issue. I have corrupted data on cephfs and I don't remember the same issue before Luminous (I did the same tests before). It is on my test 1 node cluster with lower memory than recommended (so server

Re: [ceph-users] OSD maintenance (ceph osd set noout)

2018-02-28 Thread David Turner
Like John says, noout prevents an osd being marked out in the cluster. It does not impede it from being marked down and back up which is the desired behavior when restarting a server. What are you seeing with your osds becoming unusable and needing to rebuild them? When rebooting a server if it

Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-27 Thread David C
This is super helpful, thanks for sharing, David. I need to do a bit more reading into this. On 26 Feb 2018 6:08 p.m., "David Turner" <drakonst...@gmail.com> wrote: The slow requests are absolutely expected on filestore subfolder splitting. You can however stop an OSD, split

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-27 Thread David Turner
Smit <caspars...@supernas.eu> wrote: > David, > > Yes i know, i use 20GB partitions for 2TB disks as journal. It was just to > inform other people that Ceph's default of 1GB is pretty low. > Now that i read my own sentence it indeed looks as if i was using 1GB > partitions,

Re: [ceph-users] cannot reboot one of 3 nodes without locking a cluster OSDs stay in...

2018-02-27 Thread David Turner
nt > work. > > > > 2018-02-27 14:24 GMT+01:00 David Turner <drakonst...@gmail.com>: > >> `systemctl list-dependencies ceph.target` >> >> I'm guessing that you might need to enable your osds to be managed by >> systemctl so that they can be stopped when the serve

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-27 Thread David Turner
>>> I believe you can override the setting (I'm not sure how), > but you really want to correct that flag at the OS layer. Generally when we > see this there's a RAID card or something between the solid-state device > and the host which is lying about the s

Re: [ceph-users] cannot reboot one of 3 nodes without locking a cluster OSDs stay in...

2018-02-27 Thread David Turner
`systemctl list-dependencies ceph.target` I'm guessing that you might need to enable your osds to be managed by systemctl so that they can be stopped when the server goes down. `systemctl enable ceph-osd@{osd number}` On Tue, Feb 27, 2018, 4:13 AM Philip Schroth
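
For example (the OSD id is a placeholder; repeat per OSD on the host):

    systemctl list-dependencies ceph.target    # should pull in ceph-osd.target and the ceph-osd@<id> units
    systemctl enable ceph-osd@12               # let systemd manage and stop the OSD cleanly on shutdown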

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-27 Thread David Turner
<caspars...@supernas.eu> >> wrote: >> >>> 2018-02-24 7:10 GMT+01:00 David Turner <drakonst...@gmail.com>: >>> >>>> Caspar, it looks like your idea should work. Worst case scenario seems >>>> like the osd wouldn't start, y

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
ure (assuming http endpoint and not https). > > Yehuda > > On Mon, Feb 26, 2018 at 1:21 PM, David Turner <drakonst...@gmail.com> > wrote: > > I set it to that for randomness. I don't have a zonegroup named 'us' > > either, but that works fine. I don't see why 'cn'

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
t you set in the config file, I assume that's what passed > in. Why did you set that in your config file? You don't have a > zonegroup named 'cn', right? > > On Mon, Feb 26, 2018 at 1:10 PM, David Turner <drakonst...@gmail.com> > wrote: > > I'm also not certain how to do th

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
I'm also not certain how to do the tcpdump for this. Do you have any pointers to how to capture that for you? On Mon, Feb 26, 2018 at 4:09 PM David Turner <drakonst...@gmail.com> wrote: > That's what I set it to in the config file. I probably should have > mentioned that. > &

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
tcpdump, see if that's actually > what's passed in? > > On Mon, Feb 26, 2018 at 12:02 PM, David Turner <drakonst...@gmail.com> > wrote: > > I run with `debug rgw = 10` and was able to find these lines at the end > of a > > request to create the bucket. > >

Re: [ceph-users] planning a new cluster

2018-02-26 Thread David Turner
Depending on what your security requirements are, you may not have a choice. If your OpenStack deployment shouldn't be able to load the Kubernetes RBDs (or vice versa), then you need to keep them separate and maintain different keyrings for the 2 services. If that is going to be how you go about

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
the specific failed > request might shed some light (would be interesting to look at the > generated LocationConstraint). > > Yehuda > > On Mon, Feb 26, 2018 at 11:29 AM, David Turner <drakonst...@gmail.com> > wrote: > > Our problem only appeared to be present in bucke

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
Our problem only appeared to be present in bucket creation. Listing, putting, etc objects in a bucket work just fine regardless of the bucket_location setting. I ran this test on a few different realms to see what would happen and only 1 of them had a problem. There isn't an obvious thing that

Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David Turner
are being split. [1] filestore_merge_threshold = -16 filestore_split_multiple = 256 [2] https://gist.github.com/drakonstein/cb76c7696e65522ab0e699b7ea1ab1c4 [3] filestore_merge_threshold = -1 filestore_split_multiple = 1 On Mon, Feb 26, 2018 at 12:18 PM David C <dcsysengin...@gmail.
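
A hedged sketch of how [1] lands in ceph.conf and how the per-OSD offline split in [2] is typically invoked (the data path, journal path, and pool name here are assumptions; test on one stopped OSD first):

    [osd]
    filestore merge threshold = -16
    filestore split multiple = 256

    # with the OSD stopped:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --op apply-layout-settings --pool rbd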

Re: [ceph-users] How to correctly purge a "ceph-volume lvm" OSD

2018-02-26 Thread David Turner
Deza <ad...@redhat.com> wrote: > On Mon, Feb 26, 2018 at 11:24 AM, David Turner <drakonst...@gmail.com> > wrote: > > If we're asking for documentation updates, the man page for ceph-volume > is > > incredibly outdated. In 12.2.3 it still says that bluestore is not

Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David C
Thanks, David. I think I've probably used the wrong terminology here, I'm not splitting PGs to create more PGs. This is the PG folder splitting that happens automatically, I believe it's controlled by the "filestore_split_multiple" setting (which is 8 on my OSDs, I believe that's th

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-26 Thread David Turner
picked up these changes here. On Mon, Feb 26, 2018 at 6:23 AM Caspar Smit <caspars...@supernas.eu> wrote: > 2018-02-24 7:10 GMT+01:00 David Turner <drakonst...@gmail.com>: > >> Caspar, it looks like your idea should work. Worst case scenario seems >> like the osd wou

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread David Turner
). > > Using that you can extrapolate how much space the data pool needs > > based on your file system usage. (If all you're doing is filling the > > file system with empty files, of course you're going to need an > > unusually large metadata pool.) > > > Many th

Re: [ceph-users] How to correctly purge a "ceph-volume lvm" OSD

2018-02-26 Thread David Turner
If we're asking for documentation updates, the man page for ceph-volume is incredibly outdated. In 12.2.3 it still says that bluestore is not yet implemented and that it's planned to be supported. '[--bluestore] filestore objectstore (not yet implemented)' 'using a filestore setup (bluestore

Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David Turner
much priority to the recovery operations so that client IO can still happen. On Mon, Feb 26, 2018 at 11:10 AM David C <dcsysengin...@gmail.com> wrote: > Hi All > > I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals > on NVME. Cluster primarily used for Ce

[ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David C
crash issues which I think could be related. Is there anything I can do to mitigate the slow requests problem? The rest of the time the cluster is performing pretty well. Thanks, David

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread David Turner
Patrick's answer supersedes what I said about RocksDB usage. My knowledge was more general for actually storing objects, not the metadata inside of MDS. Thank you for sharing Patrick. On Mon, Feb 26, 2018 at 11:00 AM Patrick Donnelly wrote: > On Sun, Feb 25, 2018 at 10:26

Re: [ceph-users] reweight-by-utilization reverse weight after adding new nodes?

2018-02-26 Thread David Turner
I would recommend continuing from where you are now and running `ceph osd reweight-by-utilization` again. Your weights might be a little more odd, but your data distribution should be the same. If you were to reset the weights for the previous OSDs, you would only incur an additional round of
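
A dry run first is cheap (the 110% threshold is just an example value):

    ceph osd test-reweight-by-utilization 110    # dry run: report what would change
    ceph osd reweight-by-utilization 110         # apply, then let backfill finish before the next round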

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread David Turner
When a Ceph system is in recovery, it uses much more RAM than it does while running healthy. This increase is often on the order of 4x more memory (at least back in the days of filestore, I'm not 100% certain about bluestore, but I would assume the same applies). You have another thread on the

Re: [ceph-users] Install previous version of Ceph

2018-02-26 Thread David Turner
In the past I downloaded the packages for a version and configured it as a local repo on the server. basically it was a tar.gz that I would extract that would place the ceph packages in a folder for me and swap out the repo config file to a version that points to the local folder. I haven't

Re: [ceph-users] 【mon】Problem with mon leveldb

2018-02-26 Thread David Turner
Mons won't compact and clean up old maps while any PG is in a non-clean state. What is your `ceph status`? I would guess this isn't your problem, but thought I'd throw it out there just in case. Also in Hammer, OSDs started telling each other when they clean up maps and this caused a map
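
If the cluster is otherwise healthy and the store is still large, a manual compaction can be triggered (sketch only; the mon id is a placeholder):

    ceph tell mon.a compact
    # or set "mon compact on start = true" in ceph.conf and restart the mon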

Re: [ceph-users] MDS crash Luminous

2018-02-26 Thread David C
Thanks for the tips, John. I'll increase the debug level as suggested. On 25 Feb 2018 20:56, "John Spray" <jsp...@redhat.com> wrote: > On Sat, Feb 24, 2018 at 10:13 AM, David C <dcsysengin...@gmail.com> wrote: > > Hi All > > > > I had an MDS go down

[ceph-users] MDS crash Luminous

2018-02-24 Thread David C
Hi All I had an MDS go down on a 12.2.1 cluster, the standby took over but I don't know what caused the issue. Scrubs are scheduled to start at 23:00 on this cluster but this appears to have started a minute before. Can anyone help me with diagnosing this please. Here's the relevant bit from the

Re: [ceph-users] PG overdose protection causing PG unavailability

2018-02-23 Thread David Turner
There was another part to my suggestion which was to set the initial crush weight to 0 in ceph.conf. After you add all of your osds, you could download the crush map, weight the new osds to what they should be, and upload the crush map to give them all the ability to take PGs at the same time.
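
That crush-map round trip, sketched (file names and the example weight are placeholders):

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # edit crush.txt: give the new osds their real weights (e.g. ~9.1 for a 10TB disk)
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new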

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-23 Thread David Turner
Caspar, it looks like your idea should work. Worst case scenario seems like the osd wouldn't start, you'd put the old SSD back in and go back to the idea to weight them to 0, backfilling, then recreate the osds. Definitely worth a try in my opinion, and I'd love to hear your experience after.

Re: [ceph-users] Ceph auth caps - make it more user error proof

2018-02-23 Thread David Turner
+1 for this. I messed up a cap on a cluster I was configuring doing this same thing. Luckily it wasn't production and I could fix it quickly. On Thu, Feb 22, 2018, 8:09 PM Gregory Farnum wrote: > On Wed, Feb 21, 2018 at 10:54 AM, Enrico Kern >

Re: [ceph-users] Ceph Bluestore performance question

2018-02-23 Thread David Turner
Your 6.7GB of DB partition for each 4TB osd is on the very small side of things. It's been discussed a few times in the ML and the general use case seems to be about 10GB DB per 1TB of osd. That would be about 40GB DB partition for each of your osds. This general rule covers most things except for
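
Worked through for this case, with a hedged provisioning sketch (device names and the resulting partition number are assumptions; sgdisk picks the next free partition slot):

    # ~10GB of DB per 1TB of OSD  =>  a 4TB OSD wants roughly a 40GB DB partition
    sgdisk --new=0:0:+40G --change-name=0:'ceph block.db' /dev/nvme0n1
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1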

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-23 Thread David Turner
Here is a [1] link to a ML thread tracking some slow backfilling on bluestore. It came down to the backfill sleep setting for them. Maybe it will help. [1] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg40256.html On Fri, Feb 23, 2018 at 10:46 AM Reed Dier

Re: [ceph-users] Min Size equal to Replicated Size Risks

2018-02-22 Thread David Turner
The pool will not actually go read only. All read and write requests will block until both osds are back up. If I were you, I would use min_size=2 and change it to 1 temporarily if needed to do maintenance or troubleshooting where down time is not an option. On Thu, Feb 22, 2018, 5:31 PM Georgios
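
The temporary change is a single pool setting (my_pool is a placeholder; don't forget to set it back):

    ceph osd pool set my_pool min_size 1    # only for the duration of the maintenance window
    ceph osd pool set my_pool min_size 2    # restore as soon as both osds are back up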

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-02-22 Thread David Herselman
their Data Centre reliability stamp. I returned the lot and am done with Intel SSDs, will advise as many customers and peers to do the same… Regards David Herselman From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mike Lovell Sent: Thursday, 22 February 2018 11:19 PM

Re: [ceph-users] mon service failed to start

2018-02-22 Thread David Turner
Did you remove and recreate the OSDs that used the SSD for their WAL/DB? Or did you try to do something to not have to do that? That is an integral part of the OSD and changing the SSD would destroy the OSDs involved unless you attempted some sort of dd. If you did that, then any corruption for

Re: [ceph-users] PG overdose protection causing PG unavailability

2018-02-21 Thread David Turner
You could set the flag noin to prevent the new osds from being calculated by crush until you are ready for all of them in the host to be marked in. You can also set initial crush weight to 0 for new osds so that they won't receive any PGs until you're ready for it. On Wed, Feb 21, 2018, 5:46 PM
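
Both options sketched (the osd id and weight are placeholders for the real disk sizes):

    ceph osd set noin                     # newly started osds stay out until you unset the flag
    # or, in ceph.conf on the new hosts before creating the osds:
    [osd]
    osd crush initial weight = 0
    # bring everything in together later:
    ceph osd unset noin
    ceph osd crush reweight osd.51 9.09560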

Re: [ceph-users] How to really change public network in ceph

2018-02-21 Thread David Turner
Osds can change their IP every time they start. When they start and check in with the mons, they tell the mons where they are. Changing your public network requires restarting every daemon. Likely you will want to schedule downtime for this. Clients can be routed and on whatever subnet you want,

Re: [ceph-users] Help with Bluestore WAL

2018-02-21 Thread David Turner
The WAL is a required part of the osd. If you remove that, then the osd is missing a crucial part of itself and it will be unable to start until the WAL is back online. If the SSD were to fail, then all osds using it would need to be removed and recreated on the cluster. On Tue, Feb 20, 2018,

Re: [ceph-users] Migrating to new pools

2018-02-21 Thread David Turner
I recently migrated several VMs from an HDD pool to an SSD pool without any downtime with proxmox. It is definitely possible with qemu to do no downtime migrations between pools. On Wed, Feb 21, 2018, 8:32 PM Alexandre DERUMIER wrote: > Hi, > > if you use qemu, it's also

Re: [ceph-users] Upgrading inconvenience for Luminous

2018-02-21 Thread David Turner
Having all of the daemons in your cluster able to restart themselves at will sounds terrifying. What's preventing every osd from restarting at the same time? Also, ceph dot releases have been known to break environments. It's the nature of such a widely used software. I would recommend pinning the

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-21 Thread David Turner
r Ceph cluster" from Ceph Days Germany earlier this month for > other things to watch out for: > > > > https://ceph.com/cephdays/germany/ > > > > Bryan > > > > *From: *ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Bryan > Banister <bbani

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-19 Thread David Turner
Specifically my issue was having problems without this set in the .s3cfg file. `bucket_location = US` On Mon, Feb 19, 2018 at 5:04 PM David Turner <drakonst...@gmail.com> wrote: > I wasn't using the Go SDK. I was using s3cmd when I came across this. > > On Mon, Feb 19, 2018 at
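
The relevant part of ~/.s3cfg might look like this (endpoint and keys are placeholders; path-style bucket addressing assumed):

    access_key = <ACCESS_KEY>
    secret_key = <SECRET_KEY>
    host_base = rgw.example.com:7480
    host_bucket = rgw.example.com:7480
    bucket_location = US
    use_https = False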

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-19 Thread David Turner
I wasn't using the Go SDK. I was using s3cmd when I came across this. On Mon, Feb 19, 2018 at 4:42 PM Yehuda Sadeh-Weinraub wrote: > Sounds like the go sdk adds a location constraint to requests that > don't go to us-east-1. RGW itself is definitely isn't tied to >

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-18 Thread David Turner
I recently I came across this as well. It is an odd requirement. On Sun, Feb 18, 2018, 4:54 PM F21 wrote: > I am using the AWS Go SDK v2 (https://github.com/aws/aws-sdk-go-v2) to > talk to my RGW instance using the s3 interface. I am running ceph in > docker using the

Re: [ceph-users] Understanding/correcting sudden onslaught of unfound objects

2018-02-17 Thread David Zafman
hung before due to a bug or if recovery stopped (as designed) because of the unfound object.  The new recovery_unfound and backfill_unfound states indicates that recovery has stopped due to unfound objects. commit 64047e1bac2e775a06423a03cfab69b88462538c Author: David Zafman <d

Re: [ceph-users] Orphaned entries in Crush map

2018-02-16 Thread David Turner
20; do ceph osd rm > ${n}; done > > I assume that I did the right steps... > > > > > > On 16.02.2018 21:56, David Turner wrote: > > What is the output of `ceph osd stat`? My guess is that they are still > > considered to be part of the cluster and going throug

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-16 Thread David Turner
a couple OSDs holding everything up. On Fri, Feb 16, 2018 at 4:15 PM Bryan Banister <bbanis...@jumptrading.com> wrote: > Thanks David, > > > > Taking the list of all OSDs that are stuck reports that a little over 50% > of all OSDs are in this condition. There isn’t any disc

Re: [ceph-users] Orphaned entries in Crush map

2018-02-16 Thread David Turner
What is the output of `ceph osd stat`? My guess is that they are still considered to be part of the cluster and going through the process of removing OSDs from your cluster is what you need to do. In particular `ceph osd rm 19`. On Fri, Feb 16, 2018 at 2:31 PM Karsten Becker
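
The usual removal sequence for a leftover entry such as osd.19 (only once you're sure no data lives on it):

    ceph osd out 19
    systemctl stop ceph-osd@19       # if a daemon is somehow still running for it
    ceph osd crush remove osd.19
    ceph auth del osd.19
    ceph osd rm 19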

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-16 Thread David Turner
osds 4,8,10,15,16,20,27,29,30,31,34,37,38,42,43,44,47,48,49,51,52,57,66,68,73,81,84,85,87,90,95,97,99,100,102,105,106,107,108,111,112,113,121,124,127,130,132 have stuck requests > 268435 sec On Fri, Feb 16, 2018 at 2:53 PM Bryan Banister <bbanis...@jumptrading.com> wrote: > Thanks David, > > > > I have set the nobackfill,

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-16 Thread David Turner
Your problem might have been creating too many PGs at once. I generally increase pg_num and pgp_num by no more than 256 at a time, making sure that all PGs are created, peered, and healthy (other than backfilling). To help you get back to a healthy state, let's start off by getting all of your

Re: [ceph-users] Monitor won't upgrade

2018-02-16 Thread David Turner
Can you send us a `ceph status` and `ceph health detail`? Something is still weird. Also can you query the running daemon for its version instead of asking the cluster? You should also be able to find it in the logs when it starts. On Fri, Feb 16, 2018, 4:24 AM Mark Schouten

Re: [ceph-users] Efficient deletion of large radosgw buckets

2018-02-15 Thread David Turner
Which is more important to you? Deleting the bucket fast or having the used space become available? If deleting the bucket fast is the priority, then you can swamp the GC by multithreading object deletion from the bucket with python or something. If having everything deleted and cleaned up from

Re: [ceph-users] Uneven OSD data distribution

2018-02-15 Thread David Turner
There are a lot of threads in the ML about rebalancing the data distribution in a cluster. The CRUSH algorithm is far from perfect when it comes to evenly distributing PGs, but it's fairly simple to work around and there are ceph tools that help with it. reweight-by-utilization being one of

Re: [ceph-users] Deployment with Xen

2018-02-15 Thread David Turner
15, 2018, 7:01 AM Egoitz Aurrekoetxea <ego...@sarenet.es> wrote: > Good morning David!! > > > First all I wanted to hugely thank the mail you sent yesterday. You don't > receive all the days these kind of advises from an expert in the area. I > printed the mail and read it

Re: [ceph-users] Monitor won't upgrade

2018-02-14 Thread David Turner
From the mon.0 server run `ceph --version`. If you've restarted the mon daemon and it is still showing 0.94.5, it is most likely because that is the version of the packages on that server. On Wed, Feb 14, 2018 at 10:56 AM Mark Schouten wrote: > Hi, > > > > I have a (Proxmox)
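
To compare the installed package against the running daemon (the mon id derived from the hostname is an assumption; use whatever id your mon actually has):

    ceph --version                             # version of the packages installed on this host
    ceph daemon mon.$(hostname -s) version     # what the running mon daemon actually reports
    ceph tell mon.\* version                   # ask every mon over the network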

Re: [ceph-users] Shutting down half / full cluster

2018-02-14 Thread David Turner
. On Wed, Feb 14, 2018 at 11:08 AM <dhils...@performair.com> wrote: > All; > > This might be a noob type question, but this thread is interesting, and > there's one thing I would like clarified. > > David Turner mentions setting 3 flags on OSDs, Götz has mentioned 5 flags

Re: [ceph-users] removing cache of ec pool (bluestore) with ec_overwrites enabled

2018-02-14 Thread David Turner
http://tracker.ceph.com/issues/22754 This is a bug in Luminous for cephfs volumes. This is not anything you're doing wrong. The mon check for removing a cache tier only checks that it's EC on CephFS and says no. The above tracker has a PR marked for backporting into Luminous to respond yes if

Re: [ceph-users] Shutting down half / full cluster

2018-02-14 Thread David Turner
ceph osd set noout ceph osd set nobackfill ceph osd set norecover Noout will prevent OSDs from being marked out during the maintenance and no PGs will be able to shift data around with the other 2 flags. After everything is done, unset the 3 flags and you're good to go. On Wed, Feb 14, 2018 at
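
In one go, including the unset afterwards (the order of the unsets is a sketch, not a requirement):

    ceph osd set noout && ceph osd set nobackfill && ceph osd set norecover
    # ... maintenance / reboots ...
    ceph osd unset norecover && ceph osd unset nobackfill && ceph osd unset noout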

Re: [ceph-users] Deployment with Xen

2018-02-14 Thread David Turner
that use librbd. On Tue, Feb 13, 2018 at 6:13 PM Egoitz Aurrekoetxea <ego...@sarenet.es> wrote: > Hi David!! > > Thanks a lot for your answer. But what happens when you have... imagine > two monitors or more and one of them becomes unreponsive?. Another one is > used after

Re: [ceph-users] osd crush reweight 0 on "out" OSD causes backfilling?

2018-02-13 Thread David Turner
/master/rados/operations/add-or-rm-osds/#removing-osds-manual > > On 13/02/18 14:38, David Turner wrote: > > An out osd still has a crush weight. Removing that osd or weighting it > > to 0 will change the weight of the host that it's in. That is why data > > moves again. Th
