Re: [ceph-users] XFS attempt to access beyond end of device

2017-03-22 Thread Dan van der Ster
On Wed, Mar 22, 2017 at 8:24 AM, Marcus Furlong wrote: > Hi, > > I'm experiencing the same issue as outlined in this post: > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013330.html > > I have also deployed this jewel cluster using ceph-deploy. > > This

Re: [ceph-users] Mon not starting after upgrading to 10.2.7

2017-04-12 Thread Dan van der Ster
Can't help, but just wanted to say that the upgrade worked for us: # ceph health HEALTH_OK # ceph tell mon.* version mon.p01001532077488: ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185) mon.p01001532149022: ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)

[ceph-users] fsping, why you no work no mo?

2017-04-13 Thread Dan van der Ster
Dear ceph-*, A couple weeks ago I wrote this simple tool to measure the round-trip latency of a shared filesystem. https://github.com/dvanders/fsping In our case, the tool is to be run from two clients who mount the same CephFS. First, start the server (a.k.a. the ping reflector) on one

Re: [ceph-users] [ceph-fuse] Quota size change does not notify another ceph-fuse client.

2017-03-14 Thread Dan van der Ster
Hi, This sounds familiar: http://tracker.ceph.com/issues/17939 I found that you can get the updated quota on node2 by touching the base dir. In your case: touch /shares/share0 -- Dan On Tue, Mar 14, 2017 at 10:52 AM, yu2xiangyang wrote: > Dear cephers, > I met
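
For reference, CephFS quotas are set and read through virtual xattrs, and the touch is what forces the second ceph-fuse client to refresh its cached copy -- a quick illustration, reusing the path from the thread:

    setfattr -n ceph.quota.max_bytes -v 100000000000 /shares/share0   # set quota to ~100 GB
    getfattr -n ceph.quota.max_bytes /shares/share0                   # read it back
    touch /shares/share0        # workaround: nudges the other client to pick up the new value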

Re: [ceph-users] osd_disk_thread_ioprio_priority help

2017-03-13 Thread Dan van der Ster
On Mon, Mar 13, 2017 at 10:35 AM, Florian Haas wrote: > On Sun, Mar 12, 2017 at 9:07 PM, Laszlo Budai wrote: >> Hi Florian, >> >> thank you for your answer. >> >> We have already set the IO scheduler to cfq in order to be able to lower the >>

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-13 Thread Dan van der Ster
On Sat, Mar 11, 2017 at 12:21 PM, wrote: > > The next and biggest problem we encountered had to do with the CRC errors on > the OSD map. On every map update, the OSDs that were not upgraded yet, got > that CRC error and asked the monitor for a full OSD map instead of

[ceph-users] cephfs deep scrub error:

2017-03-13 Thread Dan van der Ster
Hi John, Last week we updated our prod CephFS cluster to 10.2.6 (clients and server side), and for the first time today we've got an object info size mismatch: I found this ticket you created in the tracker, which is why I've emailed you: http://tracker.ceph.com/issues/18240 Here's the detail

Re: [ceph-users] cephfs deep scrub error:

2017-03-13 Thread Dan van der Ster
On Mon, Mar 13, 2017 at 1:35 PM, John Spray <jsp...@redhat.com> wrote: > On Mon, Mar 13, 2017 at 10:28 AM, Dan van der Ster <d...@vanderster.com> > wrote: >> Hi John, >> >> Last week we updated our prod CephFS cluster to 10.2.6 (clients and >> server sid

[ceph-users] ceph osd safe to remove

2017-07-28 Thread Dan van der Ster
Hi all, We are trying to outsource the disk replacement process for our ceph clusters to some non-expert sysadmins. We could really use a tool that reports if a Ceph OSD *would* or *would not* be safe to stop, e.g. # ceph-osd-safe-to-stop osd.X Yes it would be OK to stop osd.X (which of course
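
A minimal sketch of such a check, assuming jq is available and that `ceph pg ls-by-osd ... --format=json` on the running release returns an array of PG records with a "state" field (the JSON layout differs between releases, so treat the field names as assumptions). It only verifies that every PG mapped to the OSD is active+clean -- the usual practical criterion that stopping one more replica will not leave a PG inactive -- and does not account for pools whose remaining copies would fall below min_size:

    #!/bin/bash
    # ceph-osd-safe-to-stop (sketch): pass an OSD name such as osd.42
    OSD="$1"
    # count the PGs mapped to this OSD that are not active+clean
    NOT_CLEAN=$(ceph pg ls-by-osd "$OSD" --format=json | \
                jq '[.[] | select(.state != "active+clean")] | length')
    if [ "$NOT_CLEAN" -eq 0 ]; then
        echo "Yes, it would be OK to stop $OSD"
    else
        echo "NO: $NOT_CLEAN PGs on $OSD are not active+clean"
        exit 1
    fi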

Re: [ceph-users] ceph osd safe to remove

2017-08-03 Thread Dan van der Ster
r > recovery is complete, respectively. (the magic that made my reweight script > efficient compared to the official reweight script) > > And I have not used such a method in the past... my cluster is small, so I > have always just let recovery completely finish instead. I hope y

Re: [ceph-users] ceph osd safe to remove

2017-08-03 Thread Dan van der Ster
On Fri, Jul 28, 2017 at 9:39 PM, Alexandre Germain <germain.alexan...@gmail.com> wrote: > Hello Dan, > > Something like this maybe? > > https://github.com/CanonicalLtd/ceph_safe_disk > > Cheers, > > Alex > > 2017-07-28 9:36 GMT-04:00 Dan van der Ster <d.

Re: [ceph-users] ceph osd safe to remove

2017-08-03 Thread Dan van der Ster
On Thu, Aug 3, 2017 at 11:42 AM, Peter Maloney <peter.malo...@brockmann-consult.de> wrote: > On 08/03/17 11:05, Dan van der Ster wrote: > > On Fri, Jul 28, 2017 at 9:42 PM, Peter Maloney > <peter.malo...@brockmann-consult.de> wrote: > > Hello Dan, > > Based on

Re: [ceph-users] expanding cluster with minimal impact

2017-08-04 Thread Dan van der Ster
minimal impact. Reading previous > threads on this topic from the list I've found the ceph-gentle-reweight > script > (https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight) > created by Dan van der Ster (Thank you Dan for sharing the script with us!). >

Re: [ceph-users] ceph-fuse mounting and returning 255

2017-08-10 Thread Dan van der Ster
Hi, I also noticed this and finally tracked it down: http://tracker.ceph.com/issues/20972 Cheers, Dan On Mon, Jul 10, 2017 at 3:58 PM, Florent B wrote: > Hi, > > Since 10.2.8 Jewel update, when ceph-fuse is mounting a file system, it > returns 255 instead of 0 ! > > $

Re: [ceph-users] PG stuck inconsistent, but appears ok?

2017-07-13 Thread Dan van der Ster
On Thu, Jul 13, 2017 at 4:23 PM, Aaron Bassett wrote: > Because it was a read error I check SMART stats for that osd's disk and sure > enough, it had some uncorrected read errors. In order to stop it from causing > more problems > I stopped the daemon to let ceph

[ceph-users] how to list and reset the scrub schedules

2017-07-14 Thread Dan van der Ster
Hi, Occasionally we want to change the scrub schedule for a pool or whole cluster, but we want to do this by injecting new settings without restarting every daemon. I've noticed that in jewel, changes to scrub_min/max_interval and deep_scrub_interval do not take immediate effect, presumably
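
For reference, a hedged example of pushing new scrub intervals to the running OSDs without a restart (values in seconds; as noted above, in jewel the already-scheduled scrub stamps may keep following the old intervals until they are regenerated):

    ceph tell osd.* injectargs \
        '--osd_scrub_min_interval 86400 --osd_scrub_max_interval 1209600 --osd_deep_scrub_interval 2419200'
    # verify on one daemon, from its host:
    ceph daemon osd.0 config get osd_deep_scrub_interval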

Re: [ceph-users] autoconfigured haproxy service?

2017-07-11 Thread Dan van der Ster
On Tue, Jul 11, 2017 at 5:40 PM, Sage Weil wrote: > On Tue, 11 Jul 2017, Haomai Wang wrote: >> On Tue, Jul 11, 2017 at 11:11 PM, Sage Weil wrote: >> > On Tue, 11 Jul 2017, Sage Weil wrote: >> >> Hi all, >> >> >> >> Luminous features a new 'service map' that

Re: [ceph-users] Stealth Jewel release?

2017-07-12 Thread Dan van der Ster
On Wed, Jul 12, 2017 at 5:51 PM, Abhishek L wrote: > On Wed, Jul 12, 2017 at 9:13 PM, Xiaoxi Chen wrote: >> +However, it also introduced a regression that could cause MDS damage. >> +Therefore, we do *not* recommend that Jewel users upgrade

Re: [ceph-users] PG stuck inconsistent, but appears ok?

2017-07-14 Thread Dan van der Ster
Jul 13, 2017, at 10:29 AM, Aaron Bassett >> > <aaron.bass...@nantomics.com> wrote: >> > >> > Ok good to hear, I just kicked one off on the acting primary so I guess >> > I'll be patient now... >> > >> > Thanks, >> > Aaron >&g

Re: [ceph-users] hammer -> jewel 10.2.8 upgrade and setting sortbitwise

2017-07-14 Thread Dan van der Ster
orry about. (Btw, we just upgraded our biggest prod clusters to jewel -- that also went totally smooth!) -- Dan > sage > > >> >> >> >> On Mon, Jul 10, 2017 at 3:17 PM, Dan van der Ster <d...@vanderster.com> >> wrote: >> > Hi all, >> > >&g

Re: [ceph-users] how to list and reset the scrub schedules

2017-07-18 Thread Dan van der Ster
On Fri, Jul 14, 2017 at 10:40 PM, Gregory Farnum <gfar...@redhat.com> wrote: > On Fri, Jul 14, 2017 at 5:41 AM Dan van der Ster <d...@vanderster.com> wrote: >> >> Hi, >> >> Occasionally we want to change the scrub schedule for a pool or whole >> clu

Re: [ceph-users] XFS attempt to access beyond end of device

2017-07-18 Thread Dan van der Ster
On Tue, Jul 18, 2017 at 6:08 AM, Marcus Furlong <furlo...@gmail.com> wrote: > On 22 March 2017 at 05:51, Dan van der Ster <d...@vanderster.com> wrote: >> On Wed, Mar 22, 2017 at 8:24 AM, Marcus Furlong <furlo...@gmail.com> >> wrote: >>> Hi, >>&

Re: [ceph-users] hammer -> jewel 10.2.8 upgrade and setting sortbitwise

2017-07-18 Thread Dan van der Ster
o be set if we have a cluster running OSDs on > 10.2.6 and some OSDs on 10.2.9? Or should we wait that all OSDs are on > 10.2.9? > > Monitor nodes are already on 10.2.9. > > Best, > Martin > > On Fri, Jul 14, 2017 at 1:16 PM, Dan van der Ster <d...@vanderster.com>

[ceph-users] hammer -> jewel 10.2.8 upgrade and setting sortbitwise

2017-07-10 Thread Dan van der Ster
Hi all, With 10.2.8, ceph will now warn if you didn't yet set sortbitwise. I just updated a test cluster, saw that warning, then did the necessary ceph osd set sortbitwise I noticed a short re-peering which took around 10s on this small cluster with very little data. Has anyone done this

[ceph-users] ipv6 monclient

2017-07-19 Thread Dan van der Ster
Hi Wido, Quick question about IPv6 clusters which you may have already noticed. We have an IPv6 cluster and clients use this as the ceph.conf: [global] mon host = cephv6.cern.ch cephv6 is an alias to our three mons, which are listening on their v6 addrs (ms bind ipv6 = true). But those mon

Re: [ceph-users] v12.0.2 Luminous (dev) released

2017-04-25 Thread Dan van der Ster
Hi, The mons on my test luminous cluster do not start after upgrading from 12.0.1 to 12.0.2. Here is the backtrace: 0> 2017-04-25 11:06:02.897941 7f467ddd7880 -1 *** Caught signal (Aborted) ** in thread 7f467ddd7880 thread_name:ceph-mon ceph version 12.0.2

Re: [ceph-users] v12.0.2 Luminous (dev) released

2017-04-25 Thread Dan van der Ster
out(7) << __func__ << " loading creating_pgs e" << creating_pgs.last_scan_epoch << dendl; } ... Cheers, Dan On Tue, Apr 25, 2017 at 11:15 AM, Dan van der Ster <d...@vanderster.com> wrote: > Hi, > > The mon's on my test luminous cluster do not start

Re: [ceph-users] v12.0.2 Luminous (dev) released

2017-04-25 Thread Dan van der Ster
Created ticket to follow up: http://tracker.ceph.com/issues/19769 On Tue, Apr 25, 2017 at 11:34 AM, Dan van der Ster <d...@vanderster.com> wrote: > Could this change be the culprit? > > commit 973829132bf7206eff6c2cf30dd0aa32fb0ce706 > Author: Sage Weil <s...@redhat.com>

Re: [ceph-users] expanding cluster with minimal impact

2017-08-08 Thread Dan van der Ster
SH weight by 1.0 each time which > seemed to reduce the extra data movement we were seeing with smaller weight > increases. Maybe something to try out next time? > > Bryan > > From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Dan van der > Ster <d...@

Re: [ceph-users] Changing SSD Landscape

2017-05-18 Thread Dan van der Ster
On Thu, May 18, 2017 at 3:11 AM, Christian Balzer wrote: > On Wed, 17 May 2017 18:02:06 -0700 Ben Hines wrote: > >> Well, ceph journals are of course going away with the imminent bluestore. > Not really, in many senses. > But we should expect far fewer writes to pass through the

Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Dan van der Ster
On Wed, May 17, 2017 at 11:29 AM, Dan van der Ster <d...@vanderster.com> wrote: > I am currently pricing out some DCS3520's, for OSDs. Word is that the > price is going up, but I don't have specifics, yet. > > I'm curious, does your real usage show that the 3500 series do

Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Dan van der Ster
I am currently pricing out some DCS3520's, for OSDs. Word is that the price is going up, but I don't have specifics, yet. I'm curious, does your real usage show that the 3500 series don't offer enough endurance? Here's one of our DCS3700's after 2.5 years of RBD + a bit of S3: Model Family:
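
A hedged example of the kind of endurance check being discussed here, via smartctl; the attribute names below (Media_Wearout_Indicator, Host_Writes_32MiB) are Intel-specific and differ on other vendors' drives:

    smartctl -A /dev/sda | egrep -i 'Media_Wearout_Indicator|Host_Writes|Power_On_Hours'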

Re: [ceph-users] Living with huge bucket sizes

2017-06-09 Thread Dan van der Ster
Hi Bryan, On Fri, Jun 9, 2017 at 1:55 AM, Bryan Stillwell wrote: > This has come up quite a few times before, but since I was only working with > RBD before I didn't pay too close attention to the conversation. I'm > looking > for the best way to handle existing clusters

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-19 Thread Dan van der Ster
On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley <cbod...@redhat.com> wrote: > > On 06/14/2017 05:59 AM, Dan van der Ster wrote: >> >> Dear ceph users, >> >> Today we had O(100) slow requests which were caused by deep-scrubbing >> of the metadata log: >>

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-21 Thread Dan van der Ster
osd > config and restarting. > > Casey > > > On 06/19/2017 11:01 AM, Dan van der Ster wrote: >> >> On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley <cbod...@redhat.com> wrote: >>> >>> On 06/14/2017 05:59 AM, Dan van der Ster wrote: >>>>

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Dan van der Ster
at osd in order to trim more at a time. > > > On 06/21/2017 09:27 AM, Dan van der Ster wrote: >> >> Hi Casey, >> >> I managed to trim up all shards except for that big #54. The others >> all trimmed within a few seconds. >> >> But 54 is provin

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Dan van der Ster
On Wed, Jun 21, 2017 at 4:16 PM, Peter Maloney <peter.malo...@brockmann-consult.de> wrote: > On 06/14/17 11:59, Dan van der Ster wrote: >> Dear ceph users, >> >> Today we had O(100) slow requests which were caused by deep-scrubbing >> of the metadata log: >>

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-23 Thread Dan van der Ster
On Thu, Jun 22, 2017 at 5:31 PM, Casey Bodley <cbod...@redhat.com> wrote: > > On 06/22/2017 10:40 AM, Dan van der Ster wrote: >> >> On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley <cbod...@redhat.com> wrote: >>> >>> On 06/22/2017 04:00 AM, Dan van der

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Dan van der Ster
On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley <cbod...@redhat.com> wrote: > > On 06/22/2017 04:00 AM, Dan van der Ster wrote: >> >> I'm now running the three relevant OSDs with that patch. (Recompiled, >> replaced /usr/lib64/rados-classes/libcls_log.so with the

Re: [ceph-users] removing cluster name support

2017-06-08 Thread Dan van der Ster
Hi Sage, We need named clusters on the client side. RBD or CephFS clients, or monitoring/admin machines all need to be able to access several clusters. Internally, each cluster is indeed called "ceph", but the clients use distinct names to differentiate their configs/keyrings. Cheers, Dan On
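
As an illustration of that client-side usage (the cluster name "flax" is hypothetical): the --cluster flag only selects which /etc/ceph/<name>.conf and /etc/ceph/<name>.client.admin.keyring the client reads, while the cluster itself can still be called "ceph" internally:

    rbd --cluster flax -p images ls     # reads /etc/ceph/flax.conf and flax.client.admin.keyring
    ceph --cluster flax status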

Re: [ceph-users] removing cluster name support

2017-06-09 Thread Dan van der Ster
On Fri, Jun 9, 2017 at 5:58 PM, Vasu Kulkarni wrote: > On Fri, Jun 9, 2017 at 6:11 AM, Wes Dillingham > wrote: >> Similar to Dan's situation we utilize the --cluster name concept for our >> operations. Primarily for "datamover" nodes which do

Re: [ceph-users] Help build a drive reliability service!

2017-06-14 Thread Dan van der Ster
Hi Patrick, We've just discussed this internally and I wanted to share some notes. First, there are at least three separate efforts in our IT dept to collect and analyse SMART data -- it's clearly a popular idea and simple to implement, but this leads to repetition and begs for a common, good

[ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-14 Thread Dan van der Ster
Dear ceph users, Today we had O(100) slow requests which were caused by deep-scrubbing of the metadata log: 2017-06-14 11:07:55.373184 osd.155 [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d deep-scrub starts ... 2017-06-14 11:22:04.143903 osd.155

Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Dan van der Ster
On Wed, May 3, 2017 at 10:32 AM, Blair Bethwaite <blair.bethwa...@gmail.com> wrote: > On 3 May 2017 at 18:15, Dan van der Ster <d...@vanderster.com> wrote: >> It looks like el7's tuned natively supports the pmqos interface in >> plugins/plugin_cpu.py. > > Ahha, you

Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Dan van der Ster
On Wed, May 3, 2017 at 9:13 AM, Blair Bethwaite wrote: > We did the latter using the pmqos_static.py, which was previously part of > the RHEL6 tuned latency-performance profile, but seems to have been dropped > in RHEL7 (don't yet know why), It looks like el7's tuned

Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Dan van der Ster
On Wed, May 3, 2017 at 10:52 AM, Blair Bethwaite <blair.bethwa...@gmail.com> wrote: > On 3 May 2017 at 18:38, Dan van der Ster <d...@vanderster.com> wrote: >> Seems to work for me, or? > > Yeah now that I read the code more I see it is opening and > manipulating /d

Re: [ceph-users] TRIM/Discard on SSDs with BlueStore

2017-06-27 Thread Dan van der Ster
On Tue, Jun 27, 2017 at 1:56 PM, Christian Balzer wrote: > On Tue, 27 Jun 2017 13:24:45 +0200 (CEST) Wido den Hollander wrote: > >> > Op 27 juni 2017 om 13:05 schreef Christian Balzer : >> > >> > >> > On Tue, 27 Jun 2017 11:24:54 +0200 (CEST) Wido den Hollander

Re: [ceph-users] why sudden (and brief) HEALTH_ERR

2017-10-04 Thread Dan van der Ster
On Wed, Oct 4, 2017 at 9:08 AM, Piotr Dałek wrote: > On 17-10-04 08:51 AM, lists wrote: >> >> Hi, >> >> Yesterday I chowned our /var/lib/ceph ceph, to completely finalize our >> jewel migration, and noticed something interesting. >> >> After I brought back up the OSDs I

Re: [ceph-users] ceph-volume: migration and disk partition support

2017-10-10 Thread Dan van der Ster
On Fri, Oct 6, 2017 at 6:56 PM, Alfredo Deza wrote: > Hi, > > Now that ceph-volume is part of the Luminous release, we've been able > to provide filestore support for LVM-based OSDs. We are making use of > LVM's powerful mechanisms to store metadata which allows the process > to

Re: [ceph-users] Reaching aio-max-nr on Ubuntu 16.04 with Luminous

2017-08-30 Thread Dan van der Ster
Hi Thomas, Yes we set it to a million. From our puppet manifest: # need to increase aio-max-nr to allow many bluestore devs sysctl { 'fs.aio-max-nr': val => '1048576' } Cheers, Dan On Aug 30, 2017 9:53 AM, "Thomas Bennett" wrote: > > Hi, > > I've
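
Without puppet, the equivalent would be roughly the following (the sysctl.d file name is arbitrary):

    echo 'fs.aio-max-nr = 1048576' > /etc/sysctl.d/90-ceph-aio-max-nr.conf
    sysctl -p /etc/sysctl.d/90-ceph-aio-max-nr.conf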

Re: [ceph-users] moving mons across networks

2017-09-13 Thread Dan van der Ster
Hi Blair, You can add/remove mons on the fly -- connected clients will learn about all of the mons as the monmap changes and there won't be any downtime as long as the quorum is maintained. There is one catch when it comes to OpenStack, however. Unfortunately, OpenStack persists the mon IP

Re: [ceph-users] moving mons across networks

2017-09-13 Thread Dan van der Ster
On Wed, Sep 13, 2017 at 11:04 AM, Dan van der Ster <d...@vanderster.com> wrote: > On Wed, Sep 13, 2017 at 10:54 AM, Wido den Hollander <w...@42on.com> wrote: >> >>> Op 13 september 2017 om 10:38 schreef Dan van der Ster >>> <d...@vanderster.com>: >

Re: [ceph-users] moving mons across networks

2017-09-13 Thread Dan van der Ster
On Wed, Sep 13, 2017 at 10:54 AM, Wido den Hollander <w...@42on.com> wrote: > >> Op 13 september 2017 om 10:38 schreef Dan van der Ster <d...@vanderster.com>: >> >> >> Hi Blair, >> >> You can add/remove mons on the fly -- connected clients will l

Re: [ceph-users] tunable question

2017-09-28 Thread Dan van der Ster
Hi, How big is your cluster and what is your use case? For us, we'll likely never enable the recent tunables that need to remap *all* PGs -- it would simply be too disruptive for marginal benefit. Cheers, Dan On Thu, Sep 28, 2017 at 9:21 AM, mj wrote: > Hi, > > We have
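
To see which tunables a cluster currently runs before deciding whether a change is worth the data movement:

    ceph osd crush show-tunables      # prints the active profile and individual tunable values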

Re: [ceph-users] Very slow start of osds after reboot

2017-08-31 Thread Dan van der Ster
Hi, I see the same with jewel on el7 -- it started in one of the recent point releases, around 10.2.5 IIRC. Problem seems to be the same -- daemon is started before the osd is mounted... then the service waits several seconds before trying again. Aug 31 15:41:47 ceph-osd: 2017-08-31

Re: [ceph-users] Very slow start of osds after reboot

2017-08-31 Thread Dan van der Ster
eph-osd@84.service ● │ ├─ceph-osd@89.service ● │ ├─ceph-osd@90.service ● │ ├─ceph-osd@91.service ● │ └─ceph-osd@92.service ● ├─getty.target ... On Thu, Aug 31, 2017 at 4:57 PM, Dan van der Ster <d...@vanderster.com> wrote: > Hi, > > I see the same with jewel on el7

[ceph-users] Any RGW admin frontends?

2017-12-15 Thread Dan van der Ster
Hi all, As we are starting to ramp up our internal rgw service, I am wondering if someone already developed some "open source" high-level admin tools for rgw. On the one hand, we're looking for a web UI for users to create and see their credentials, quota, usage, and maybe a web bucket browser.

Re: [ceph-users] Cephfs snapshot work

2017-11-07 Thread Dan van der Ster
On Tue, Nov 7, 2017 at 12:57 PM, John Spray wrote: > On Sun, Nov 5, 2017 at 4:19 PM, Brady Deetz wrote: >> My organization has a production cluster primarily used for cephfs upgraded >> from jewel to luminous. We would very much like to have snapshots on

Re: [ceph-users] Cephfs snapshot work

2017-11-07 Thread Dan van der Ster
On Tue, Nov 7, 2017 at 4:15 PM, John Spray <jsp...@redhat.com> wrote: > On Tue, Nov 7, 2017 at 3:01 PM, Dan van der Ster <d...@vanderster.com> wrote: >> On Tue, Nov 7, 2017 at 12:57 PM, John Spray <jsp...@redhat.com> wrote: >>> On Sun, Nov 5, 2017 at 4:19 PM,

[ceph-users] mgr dashboard and cull Removing data for x

2017-12-11 Thread Dan Van Der Ster
Hi all, I'm playing with the dashboard module in 12.2.2 (and it's very cool!) but I noticed that some OSDs do not have metadata, e.g. this page: http://xxx:7000/osd/perf/74 Has empty metadata. I *am* able to see all the info with `ceph osd metadata 74`. I noticed in the mgr log we have:

Re: [ceph-users] ceph-volume lvm activate could not find osd..0

2017-12-12 Thread Dan van der Ster
Doh! The activate command needs the *osd* fsid, not the cluster fsid. So this works: ceph-volume lvm activate 0 6608c0cf-3827-4967-94fd-5a3336f604c3 Is an "activate-all" equivalent planned? -- Dan On Tue, Dec 12, 2017 at 11:35 AM, Dan van der Ster <d...@vanderster.com>
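
If the `ceph-volume lvm list` output is not handy, the osd id / osd fsid pair can usually be recovered from the LVM tags that ceph-volume writes at prepare time -- a hedged example (tag names as observed on 12.2.x):

    lvs --noheadings -o lv_name,lv_tags | tr ',' '\n' | grep -E 'ceph\.osd_id=|ceph\.osd_fsid='
    # then:
    ceph-volume lvm activate <osd-id> <osd-fsid>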

[ceph-users] ceph-volume lvm activate could not find osd..0

2017-12-12 Thread Dan van der Ster
Hi all, Did anyone successfully prepare a new OSD with ceph-volume in 12.2.2? We are trying the simplest thing possible and not succeeding :( # ceph-volume lvm prepare --bluestore --data /dev/sdb # ceph-volume lvm list == osd.0 === [block]

Re: [ceph-users] Luminous radosgw S3/Keystone integration issues

2018-05-07 Thread Dan van der Ster
: > Hi Dan, > > We agreed in upstream RGW to make this change. Do you intend to > submit this as a PR? > > regards > > Matt > > On Fri, May 4, 2018 at 10:57 AM, Dan van der Ster <d...@vanderster.com> wrote: >> Hi Valery, >> >> Did you eventual

Re: [ceph-users] What is the meaning of size and min_size for erasure-coded pools?

2018-05-08 Thread Dan van der Ster
On Tue, May 8, 2018 at 7:35 PM, Vasu Kulkarni wrote: > On Mon, May 7, 2018 at 2:26 PM, Maciej Puzio wrote: >> I am an admin in a research lab looking for a cluster storage >> solution, and a newbie to ceph. I have setup a mini toy cluster on >> some VMs,

Re: [ceph-users] jewel to luminous upgrade, chooseleaf_vary_r and chooseleaf_stable

2018-05-14 Thread Dan van der Ster
Hi Adrian, Is there a strict reason why you *must* upgrade the tunables? It is normally OK to run with old (e.g. hammer) tunables on a luminous cluster. The crush placement won't be state of the art, but that's not a huge problem. We have a lot of data in a jewel cluster with hammer tunables.

Re: [ceph-users] *** SPAM *** Re: Multi-MDS Failover

2018-04-27 Thread Dan van der Ster
Hi Scott, Multi MDS just assigns different parts of the namespace to different "ranks". Each rank (0, 1, 2, ...) is handled by one of the active MDSs. (You can query which parts of the name space are assigned to each rank using the jq tricks in [1]). If a rank is down and there are no more

Re: [ceph-users] Luminous radosgw S3/Keystone integration issues

2018-05-04 Thread Dan van der Ster
Hi Valery, Did you eventually find a workaround for this? I *think* we'd also prefer rgw to fallback to external plugins, rather than checking them before local. But I never understood the reasoning behind the change from jewel to luminous. I saw that there is work towards a cache for ldap [1]

Re: [ceph-users] Poor CentOS 7.5 client performance

2018-05-17 Thread Dan van der Ster
Hi, It still isn't clear if you're using the fuse or kernel client. Do you `mount -t ceph` or something else? -- Dan On Wed, May 16, 2018 at 8:28 PM Donald "Mac" McCarthy wrote: > CephFS. 8 core atom C2758, 16 GB ram, 256GB ssd, 2.5 GB NIC (supermicro microblade
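
A quick way to tell the two apart on the client:

    mount -l | grep -i ceph
    # "type ceph"            -> kernel client (mount -t ceph)
    # "type fuse.ceph-fuse"  -> userspace ceph-fuse client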

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 4:31 PM Alfredo Deza wrote: > > On Thu, Jun 7, 2018 at 10:23 AM, Dan van der Ster wrote: > > Hi all, > > > > We have an intermittent issue where bluestore osds sometimes fail to > > start after a reboot. > > The osds all fail t

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 4:33 PM Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > Hi all, > > > > We have an intermittent issue where bluestore osds sometimes fail to > > start after a reboot. > > The osds all fail the same way [see 2], fai

[ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
Hi all, We have an intermittent issue where bluestore osds sometimes fail to start after a reboot. The osds all fail the same way [see 2], failing to open the superblock. One one particular host, there are 24 osds and 4 SSDs partitioned for the block.db's. The affected non-starting OSDs all have

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 6:09 PM Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 5:36 PM Dan van der Ster wrote: > > > > > > On Thu, Jun 7, 2018 at 5:34 PM Sage Weil wrote: > > > > > > > > On Thu

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 6:01 PM Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 5:36 PM Dan van der Ster wrote: > > > > On Thu, Jun 7, 2018 at 5:34 PM Sage Weil wrote: > > > > > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > >

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 5:36 PM Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 5:34 PM Sage Weil wrote: > > > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > On Thu, Jun 7, 2018 at 4:41 PM Sage Weil wrote: > > > > > > > > On Thu, 7 Jun

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 5:16 PM Alfredo Deza wrote: > > On Thu, Jun 7, 2018 at 10:54 AM, Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 4:41 PM Sage Weil wrote: > >> > >> On Thu, 7 Jun 2018, Dan van der Ster wrote: > >> > On Th

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 4:41 PM Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 4:33 PM Sage Weil wrote: > > > > > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > > Hi all, > > > > > > >

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 5:34 PM Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 4:41 PM Sage Weil wrote: > > > > > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > > On Thu, Jun 7, 2018 at 4:33 PM Sage Weil

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 6:33 PM Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > > Wait, we found something!!! > > > > > > > > In the 1st 4k on the block we found the block.db pointing at the wrong > > > > device (/dev/sd

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 6:58 PM Alfredo Deza wrote: > > On Thu, Jun 7, 2018 at 12:09 PM, Sage Weil wrote: > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > >> On Thu, Jun 7, 2018 at 5:36 PM Dan van der Ster > >> wrote: > >> > > >>

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-07 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 8:58 PM Alfredo Deza wrote: > > On Thu, Jun 7, 2018 at 2:45 PM, Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 6:58 PM Alfredo Deza wrote: > >> > >> On Thu, Jun 7, 2018 at 12:09 PM, Sage Weil wrote: > >> > On Thu, 7 Jun 20

Re: [ceph-users] IO to OSD with librados

2018-06-18 Thread Dan van der Ster
8=128.55.xxx.xx:6789/0} > > election epoch 4, quorum 0,1 ngfdv076,ngfdv078 > > osdmap e280: 48 osds: 48 up, 48 in > > flags sortbitwise,require_jewel_osds > > pgmap v117283: 3136 pgs, 11 pools, 25600 MB data, 510 objects > >

Re: [ceph-users] *****SPAM***** Re: Add ssd's to hdd cluster, crush map class hdd update necessary?

2018-06-13 Thread Dan van der Ster
See this thread: http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000106.html http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-June/000113.html (Wido -- should we kill the ceph-large list??) -- dan On Wed, Jun 13, 2018 at 12:27 PM Marc Roos wrote: > > > Shit, I added

Re: [ceph-users] Add ssd's to hdd cluster, crush map class hdd update necessary?

2018-06-13 Thread Dan van der Ster
See this thread: http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000106.html http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-June/000113.html (Wido -- should we kill the ceph-large list??) On Wed, Jun 13, 2018 at 1:14 PM Marc Roos wrote: > > > I wonder if this is not a

Re: [ceph-users] IO to OSD with librados

2018-06-18 Thread Dan van der Ster
Hi, One way you can see exactly what is happening when you write an object is with --debug_ms=1. For example, I write a 100MB object to a test pool: rados --debug_ms=1 -p test put 100M.dat 100M.dat I pasted the output of this here: https://pastebin.com/Zg8rjaTV In this case, it first gets the

Re: [ceph-users] unfound blocks IO or gives IO error?

2018-06-25 Thread Dan van der Ster
irtio-blk vs virtio-scsi: the latter has a timeout but blk blocks forever. On 5000 attached volumes we saw around 12 of these IO errors, and this was the first time in 5 years of upgrades that an IO error happened... -- dan > -Greg > >> >> >> On 22.06.2018, at 16:16, Dan

Re: [ceph-users] Bluestore on HDD+SSD sync write latency experiences

2018-05-03 Thread Dan van der Ster
Hi Nick, Our latency probe results (4kB rados bench) didn't change noticeably after converting a test cluster from FileStore (sata SSD journal) to BlueStore (sata SSD db). Those 4kB writes take 3-4ms on average from a random VM in our data centre. (So bluestore DB seems equivalent to FileStore
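
For anyone wanting to reproduce a similar probe, a single-threaded 4 kB write bench from a client node looks roughly like this (pool name "test" is assumed); the average latency it reports is the figure being compared above:

    rados -p test bench 30 write -b 4096 -t 1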

Re: [ceph-users] Linux Meltdown (KPTI) fix and how it affects performance?

2018-01-11 Thread Dan van der Ster
Hi all, Is anyone getting useful results with your benchmarking? I've prepared two test machines/pools and don't see any definitive slowdown with patched kernels from CentOS [1]. I wonder if Ceph will be somewhat tolerant of these patches, similarly to what's described here:

Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2018-01-08 Thread Dan van der Ster
On Mon, Jan 8, 2018 at 4:37 PM, Alfredo Deza <ad...@redhat.com> wrote: > On Thu, Dec 21, 2017 at 11:35 AM, Stefan Kooman <ste...@bit.nl> wrote: >> Quoting Dan van der Ster (d...@vanderster.com): >>> Thanks Stefan. But isn't there also some vgremove or lvremove magi

[ceph-users] Luminous: example of a single down osd taking out a cluster

2018-01-22 Thread Dan van der Ster
Hi all, We just saw an example of one single down OSD taking down a whole (small) luminous 12.2.2 cluster. The cluster has only 5 OSDs, on 5 different servers. Three of those servers also run a mon/mgr combo. First, we had one server (mon+osd) go down legitimately [1] -- I can tell when it went

Re: [ceph-users] Luminous: example of a single down osd taking out a cluster

2018-01-22 Thread Dan van der Ster
r the help solving this puzzle, Dan On Mon, Jan 22, 2018 at 8:07 PM, Dan van der Ster <d...@vanderster.com> wrote: > Hi all, > > We just saw an example of one single down OSD taking down a whole > (small) luminous 12.2.2 cluster. > > The cluster has only 5 OSDs, on 5 diff

Re: [ceph-users] balancer mgr module

2018-02-16 Thread Dan van der Ster
Hi Caspar, I've been trying the mgr balancer for a couple weeks now and can share some experience. Currently there are two modes implemented: upmap and crush-compat. Upmap requires all clients to be running luminous -- it uses this new pg-upmap mechanism to precisely move PGs one by one to a
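
For context, switching the balancer to upmap mode on luminous goes roughly like this; the min-compat step is what enforces the "all clients must be luminous" requirement mentioned above:

    ceph osd set-require-min-compat-client luminous
    ceph balancer mode upmap
    ceph balancer on
    ceph balancer eval      # lower score = more even PG/data distribution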

Re: [ceph-users] ceph-volume activation

2018-02-21 Thread Dan van der Ster
On Wed, Feb 21, 2018 at 2:24 PM, Alfredo Deza wrote: > On Tue, Feb 20, 2018 at 9:05 PM, Oliver Freyermuth > wrote: >> Many thanks for your replies! >> >> Are there plans to have something like >> "ceph-volume discover-and-activate" >> which would

Re: [ceph-users] ceph-volume activation

2018-02-22 Thread Dan van der Ster
On Wed, Feb 21, 2018 at 11:56 PM, Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > Am 21.02.2018 um 15:58 schrieb Alfredo Deza: >> On Wed, Feb 21, 2018 at 9:40 AM, Dan van der Ster <d...@vanderster.com> >> wrote: >>> On Wed, Feb 21, 2018 at 2:24 PM, A

Re: [ceph-users] Open Compute (OCP) servers for Ceph

2017-12-22 Thread Dan van der Ster
Hi Wido, We have used a few racks of Wiwynn OCP servers in a Ceph cluster for a couple of years. The machines are dual Xeon [1] and use some of those 2U 30-disk "Knox" enclosures. Other than that, I have nothing particularly interesting to say about these. Our data centre procurement team have

[ceph-users] ceph-volume lvm deactivate/destroy/zap

2017-12-21 Thread Dan van der Ster
Hi, For someone who is not an lvm expert, does anyone have a recipe for destroying a ceph-volume lvm osd? (I have a failed disk which I want to deactivate / wipe before physically removing from the host, and the tooling for this doesn't exist yet http://tracker.ceph.com/issues/22287) >
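
In the absence of a ceph-volume destroy/deactivate subcommand at the time, a hedged manual recipe might look like the following (osd.123 and /dev/sdx are placeholders; `ceph osd purge` is luminous-only -- on jewel use the separate crush remove / auth del / osd rm steps):

    systemctl stop ceph-osd@123
    ceph osd purge 123 --yes-i-really-mean-it
    vgremove -f $(pvs --noheadings -o vg_name /dev/sdx)   # drops the ceph-<uuid> VG and its LVs
    wipefs -a /dev/sdx                                    # clear remaining signatures before pulling the disk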

Re: [ceph-users] MDS behind on trimming

2017-12-21 Thread Dan van der Ster
Hi, We've used double the defaults for around 6 months now and haven't had any behind on trimming errors in that time. mds log max segments = 60 mds log max expiring = 40 Should be simple to try. -- dan On Thu, Dec 21, 2017 at 2:32 PM, Stefan Kooman wrote: > Hi, > >

Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2017-12-21 Thread Dan van der Ster
On Thu, Dec 21, 2017 at 3:59 PM, Stefan Kooman <ste...@bit.nl> wrote: > Quoting Dan van der Ster (d...@vanderster.com): >> Hi, >> >> For someone who is not an lvm expert, does anyone have a recipe for >> destroying a ceph-volume lvm osd? >> (I have a failed

Re: [ceph-users] CentOS Dojo at CERN

2018-06-21 Thread Dan van der Ster
On Thu, Jun 21, 2018 at 2:41 PM Kai Wagner wrote: > > On 20.06.2018 17:39, Dan van der Ster wrote: > > And BTW, if you can't make it to this event we're in the early days of > > planning a dedicated Ceph + OpenStack Days at CERN around May/June > > 2019. >

Re: [ceph-users] CentOS Dojo at CERN

2018-06-20 Thread Dan van der Ster
And BTW, if you can't make it to this event we're in the early days of planning a dedicated Ceph + OpenStack Days at CERN around May/June 2019. More news on that later... -- Dan @ CERN On Tue, Jun 19, 2018 at 10:23 PM Leonardo Vaz wrote: > > Hey Cephers, > > We will join our friends from

Re: [ceph-users] Slack-IRC integration

2018-07-28 Thread Dan van der Ster
It's here https://ceph-storage.slack.com/ but for some reason the list of accepted email domains is limited. I have no idea who is maintaining this. Anyway, the slack is just mirroring #ceph and #ceph-devel on IRC so better to connect there directly. Cheers, Dan On Sat, Jul 28, 2018, 6:59 PM
