Re: [ceph-users] Ceph @ OpenStack Sydney Summit

2017-07-06 Thread Blair Bethwaite
Oops, this time plain text... On 7 July 2017 at 13:47, Blair Bethwaite wrote: > > Hi all, > > Are there any "official" plans to have Ceph events co-hosted with OpenStack > Summit Sydney, like in Boston? > > The call for presentations closes in a week. The Forum will

[ceph-users] Ceph @ OpenStack Sydney Summit

2017-07-06 Thread Blair Bethwaite
Hi all, Are there any "official" plans to have Ceph events co-hosted with OpenStack Summit Sydney, like in Boston? The call for presentations closes in a week. The Forum will be organised throughout September and (I think) that is the most likely place to have e.g. Ceph ops sessions like we have

Re: [ceph-users] How to set Ceph client operation priority (ionice)

2017-07-06 Thread Christian Balzer
Hello, On Thu, 6 Jul 2017 14:34:41 -0700 Su, Zhan wrote: > Hi, > > We are running a Ceph cluster serving both batch workload (e.g. data import > / export, offline processing) and latency-sensitive workload. Currently > batch traffic causes a huge slow down in serving latency-sensitive requests

Re: [ceph-users] Removing very large buckets

2017-07-06 Thread Blair Bethwaite
How did you even get 60M objects into the bucket...?! The stuck requests are only likely to be impacting the PG in which the bucket index is stored. Hopefully you are not running other pools on those OSDs? You'll need to upgrade to Jewel to gain the --bypass-gc radosgw-admin flag, which speeds up
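For reference, whole-bucket deletion with that flag is a single command; the bucket name is a placeholder, and you should confirm your radosgw-admin build actually supports --bypass-gc (Jewel or later) before relying on it:
$ radosgw-admin bucket rm --bucket=<bucket-name> --purge-objects --bypass-gc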

Re: [ceph-users] Speeding up backfill after increasing PGs and or adding OSDs

2017-07-06 Thread Christian Balzer
Hello, On Thu, 6 Jul 2017 17:57:06 + george.vasilaka...@stfc.ac.uk wrote: > Thanks for your response David. > > What you've described has been what I've been thinking about too. We have > 1401 OSDs in the cluster currently and this output is from the tail end of > the backfill for +64 PG

Re: [ceph-users] Watch for fstrim running on your Ubuntu systems

2017-07-06 Thread Reed Dier
I could easily see that being the case, especially with Micron as a common thread, but it appears that I am on the latest FW for both the SATA and the NVMe: > $ sudo ./msecli -L | egrep 'Device|FW' > Device Name : /dev/sda > FW-Rev : D0MU027 > Device Name :

[ceph-users] How to set Ceph client operation priority (ionice)

2017-07-06 Thread Su, Zhan
Hi, We are running a Ceph cluster serving both batch workload (e.g. data import / export, offline processing) and latency-sensitive workload. Currently batch traffic causes a huge slow down in serving latency-sensitive requests (e.g. streaming). When that happens, network is not the bottleneck

Re: [ceph-users] krbd journal support

2017-07-06 Thread Jason Dillaman
There are no immediate plans to support RBD journaling in krbd. The journaling feature requires a lot of code and, with limited resources, the priority has been to provide alternative block device options that pass through to librbd for such use-cases and to optimize the performance of librbd
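One such librbd-backed alternative is rbd-nbd, which supports journaling because it goes through librbd (assuming rbd-nbd is packaged for your release). A minimal sketch, with pool/image names as placeholders:
$ rbd feature enable mypool/myimage journaling
$ sudo rbd-nbd map mypool/myimage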

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Jason Dillaman
On Thu, Jul 6, 2017 at 3:25 PM, Piotr Dałek wrote: > Is that deep copy an equivalent of what > Jewel librbd did at an unspecified point in time, or an extra one? It's an equivalent / replacement -- not an additional copy. This was changed to support scatter/gather IO API methods

Re: [ceph-users] How to set up bluestore manually?

2017-07-06 Thread Vasu Kulkarni
I recommend you file a tracker issue at http://tracker.ceph.com/ with all details (ceph version, steps you ran, and the output, redacting anything you don't want to share). I doubt it's a ceph-deploy issue, but we can try to replicate it in our lab. On Thu, Jul 6, 2017 at 5:25 AM, Martin Emrich

Re: [ceph-users] Degraded objects while OSD is being added/filled

2017-07-06 Thread Andras Pataki
Hi Greg, At the moment our cluster is all in balance. We have one failed drive that will be replaced in a few days (the OSD has been removed from ceph and will be re-added with the replacement drive). I'll document the state of the PGs before the addition of the drive and during the
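A minimal way to capture that before/during state for later comparison (standard commands, assumed available on this cluster):
$ ceph health detail > health_before.txt
$ ceph pg dump pgs_brief > pgs_before.txt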

Re: [ceph-users] Adding storage to exiting clusters with minimal impact

2017-07-06 Thread Peter Maloney
Here's my possibly unique method... I had 3 nodes with 12 disks each, and when adding 2 more nodes, I had issues with the common method you describe, totally blocking clients for minutes, but this worked great for me: > my own method > - osd max backfills = 1 and osd recovery max active = 1 > -
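The two settings quoted in that method correspond to this ceph.conf fragment (a sketch showing only the two values that appear in the message):
[osd]
osd max backfills = 1
osd recovery max active = 1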

Re: [ceph-users] Watch for fstrim running on your Ubuntu systems

2017-07-06 Thread Peter Maloney
Hey, I have some SAS Micron S630DC-400 which came with firmware M013 which did the same or worse (takes very long... 100% blocked for about 5min for 16GB trimmed), and works just fine with firmware M017 (4s for 32GB trimmed). So maybe you just need an update. Peter On 07/06/17 18:39, Reed

Re: [ceph-users] Speeding up backfill after increasing PGs and or adding OSDs

2017-07-06 Thread David Turner
ceph pg dump | grep backfill Look through the output of that command and see the acting set (the OSDs the PG is on / moving off of) and the current set (where the PG will end up). All it takes is a single OSD being listed on a PG that is currently backfilling, and any other PGs it's listed on will be backfill+wait and
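A quick way to count PGs per backfill-related state from that dump (state names vary slightly between releases, e.g. wait_backfill vs backfill_wait):
$ ceph pg dump pgs_brief 2>/dev/null | awk '/backfill/ {print $2}' | sort | uniq -c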

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Jason Dillaman
On Thu, Jul 6, 2017 at 11:46 AM, Piotr Dałek wrote: > How about a hybrid solution? Keep the old rbd_aio_write contract (don't copy > the buffer with the assumption that it won't change) and instead of > constructing a bufferlist containing a bufferptr to the copied data,

Re: [ceph-users] Speeding up backfill after increasing PGs and or adding OSDs

2017-07-06 Thread george.vasilakakos
Thanks for your response David. What you've described has been what I've been thinking about too. We have 1401 OSDs in the cluster currently and this output is from the tail end of the backfill for +64 PG increase on the biggest pool. The problem is we see this cluster do at most 20 backfills

[ceph-users] Removing very large buckets

2017-07-06 Thread Eric Beerman
Hello, We have a bucket that has 60 million + objects in it, and are trying to delete it. To do so, we have tried doing: radosgw-admin bucket list --bucket= and then cycling through the list of object names and deleting them, 1,000 at a time. However, after ~3-4k objects deleted, the list

Re: [ceph-users] Adding storage to exiting clusters with minimal impact

2017-07-06 Thread Brian Andrus
On Thu, Jul 6, 2017 at 9:18 AM, Gregory Farnum wrote: > On Thu, Jul 6, 2017 at 7:04 AM wrote: > >> Hi Ceph Users, >> >> >> >> We plan to add 20 storage nodes to our existing cluster of 40 nodes, each >> node has 36 x 5.458 TiB drives. We plan to add

[ceph-users] krbd journal support

2017-07-06 Thread Maged Mokhtar
Hi all, Are there any plans to support the rbd journaling feature in kernel krbd? Cheers /Maged

Re: [ceph-users] Watch for fstrim running on your Ubuntu systems

2017-07-06 Thread Reed Dier
Hi Wido, I came across this ancient ML entry with no responses and wanted to follow up with you to see if you recalled any solution to this. Copying the ceph-users list to preserve any replies that may result for archival. I have a couple of boxes with 10x Micron 5100 SATA SSDs, journaled on
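Depending on the Ubuntu release, the weekly trim is triggered either by /etc/cron.weekly/fstrim or by the systemd fstrim.timer; a quick way to see when it runs, and to pause it while investigating, might be:
$ cat /etc/cron.weekly/fstrim                  # older releases
$ systemctl list-timers fstrim.timer           # releases using the systemd timer
$ sudo systemctl disable --now fstrim.timer    # optional: pause it while testing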

Re: [ceph-users] New cluster - configuration tips and reccomendation - NVMe

2017-07-06 Thread Wido den Hollander
> On 6 July 2017 at 18:27, Massimiliano Cuttini wrote: > > > WOW! > > Thanks to everybody! > A ton of suggestions and good tips! > > At the moment we are already using 100Gb/s cards and we have already > adopted a 100Gb/s switch, so we can go with 40Gb/s cards that are fully >

Re: [ceph-users] New cluster - configuration tips and reccomendation - NVMe

2017-07-06 Thread Massimiliano Cuttini
WOW! Thanks to everybody! A ton of suggestions and good tips! At the moment we are already using 100Gb/s cards and we have already adopted a 100Gb/s switch, so we can go with 40Gb/s cards that are fully compatible with our switch. About the CPU, I was wrong: the model we are looking at is not the 2603 but the 2630

Re: [ceph-users] Degraded objects while OSD is being added/filled

2017-07-06 Thread Gregory Farnum
On Tue, Jul 4, 2017 at 10:47 PM Eino Tuominen wrote: > ​Hello, > > > I noticed the same behaviour in our cluster. > > > ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185) > > > > cluster 0a9f2d69-5905-4369-81ae-e36e4a791831 > > health HEALTH_WARN > >

Re: [ceph-users] Adding storage to exiting clusters with minimal impact

2017-07-06 Thread Gregory Farnum
On Thu, Jul 6, 2017 at 7:04 AM wrote: > Hi Ceph Users, > > > > We plan to add 20 storage nodes to our existing cluster of 40 nodes, each > node has 36 x 5.458 TiB drives. We plan to add the storage such that all > new OSDs are prepared, activated and ready to take data

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Piotr Dałek
On 17-07-06 04:40 PM, Jason Dillaman wrote: On Thu, Jul 6, 2017 at 10:22 AM, Piotr Dałek wrote: So I really see two problems here: lack of API docs and backwards-incompatible change in API behavior. Docs are always in need of update, so any pull requests would be

Re: [ceph-users] Speeding up backfill after increasing PGs and or adding OSDs

2017-07-06 Thread David Turner
Just a quick place to start is osd_max_backfills. You have this set to 1. Each PG is on 11 OSDs. When you have a PG moving, it is on the original 11 OSDs and the new X number of OSDs that it is going to. For each of your PGs that is moving, an OSD can only move 1 at a time (your
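If you do decide to raise it, the runtime change looks like this (the value here is purely illustrative, not a recommendation from the thread):
$ ceph tell osd.* injectargs '--osd-max-backfills 2'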

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Jason Dillaman
On Thu, Jul 6, 2017 at 10:22 AM, Piotr Dałek wrote: > So I really see two problems here: lack of API docs and > backwards-incompatible change in API behavior. Docs are always in need of update, so any pull requests would be greatly appreciated. However, I disagree that

Re: [ceph-users] ceph-mon leader election problem, should it be improved ?

2017-07-06 Thread Sage Weil
On Thu, 6 Jul 2017, Z Will wrote: > Hi Joao: > > Thanks for the thorough analysis. My initial concern is that, I think, > in some cases a network failure will leave a low-rank monitor seeing only a few > siblings (not enough to form a quorum), while some high-rank monitor > can see more siblings, so I want

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Piotr Dałek
On 17-07-06 03:43 PM, Jason Dillaman wrote: I've learned the hard way that pre-luminous, even if it copies the buffer, it does so too late. In my specific case, my FUSE module does enter the write call and issues rbd_aio_write there, then exits the write - expecting the buffer provided by FUSE

[ceph-users] Speeding up backfill after increasing PGs and or adding OSDs

2017-07-06 Thread george.vasilakakos
Hey folks, We have a cluster that's currently backfilling from increasing PG counts. We have tuned recovery and backfill way down as a "precaution" and would like to start tuning them back up to strike a good balance between recovery and client I/O. At the moment we're in the process of bumping up PG
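For reference, the PG bump itself is done per pool with standard commands; the pool name and count below are placeholders, not values from this cluster:
$ ceph osd pool set <pool> pg_num 2048
$ ceph osd pool set <pool> pgp_num 2048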

[ceph-users] Adding storage to exiting clusters with minimal impact

2017-07-06 Thread bruno.canning
Hi Ceph Users, We plan to add 20 storage nodes to our existing cluster of 40 nodes; each node has 36 x 5.458 TiB drives. We plan to add the storage such that all new OSDs are prepared, activated and ready to take data, but they will not receive any until we start slowly increasing their weightings. We also expect
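One common way to get that behaviour is to bring new OSDs up with zero CRUSH weight and then ramp them up; a hedged sketch, with the OSD id and weight step as placeholders:
# ceph.conf on the new nodes, so freshly created OSDs start with zero CRUSH weight
[osd]
osd crush initial weight = 0
# then ramp each new OSD up in small steps once it is in
$ ceph osd crush reweight osd.1234 0.5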

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Jason Dillaman
The correct (POSIX-style) program behavior should treat the buffer as immutable until the IO operation completes. It is never safe to assume the buffer can be re-used while the IO is in-flight. You should not add any logic to assume the buffer is safely copied prior to the completion of the IO.

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Piotr Dałek
On 17-07-06 03:03 PM, Jason Dillaman wrote: On Thu, Jul 6, 2017 at 8:26 AM, Piotr Dałek wrote: Hi, If you're using "rbd_aio_write()" in your code, be aware that before the Luminous release this function expects the buffer to remain unchanged until the write op ends,

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Jason Dillaman
Pre-Luminous also copies the provided buffer when using the C API -- it just copies it at a later point and not immediately. The eventual goal is to eliminate the copy completely, but that requires some additional plumbing work deep down within the librados messenger layer. On Thu, Jul 6, 2017 at

Re: [ceph-users] How to force "rbd unmap"

2017-07-06 Thread Ilya Dryomov
On Thu, Jul 6, 2017 at 2:23 PM, Stanislav Kopp wrote: > 2017-07-06 14:16 GMT+02:00 Ilya Dryomov : >> On Thu, Jul 6, 2017 at 1:28 PM, Stanislav Kopp wrote: >>> Hi, >>> >>> 2017-07-05 20:31 GMT+02:00 Ilya Dryomov :

[ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Piotr Dałek
Hi, If you're using "rbd_aio_write()" in your code, be aware that before the Luminous release this function expects the buffer to remain unchanged until the write op ends, while on Luminous and later it internally copies the buffer, allocating memory where needed and freeing it once

Re: [ceph-users] How to set up bluestore manually?

2017-07-06 Thread Martin Emrich
Hi! I changed the partitioning scheme to use a "real" primary partition instead of a logical volume. Ceph-deploy seems to run fine now, but the OSD does not start. I see lots of these in the journal: Jul 06 13:53:42 sh[9768]: 0> 2017-07-06 13:53:42.794027 7fcf9918fb80 -1 *** Caught signal
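For anyone wanting the manual (non-ceph-deploy) route in this era, BlueStore OSDs were normally created with ceph-disk; a hedged sketch, with device names as placeholders (newer Luminous point releases use ceph-volume instead):
$ sudo ceph-disk prepare --bluestore /dev/sdb
$ sudo ceph-disk activate /dev/sdb1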

Re: [ceph-users] How to force "rbd unmap"

2017-07-06 Thread Ilya Dryomov
On Thu, Jul 6, 2017 at 1:28 PM, Stanislav Kopp wrote: > Hi, > > 2017-07-05 20:31 GMT+02:00 Ilya Dryomov : >> On Wed, Jul 5, 2017 at 7:55 PM, Stanislav Kopp wrote: >>> Hello, >>> >>> I have problem that sometimes I can't unmap rbd
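On the thread subject itself: recent krbd/rbd CLI versions support a force unmap option, which may be what is needed here; the device path is a placeholder, and forcing loses any in-flight I/O:
$ sudo rbd unmap -o force /dev/rbd0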

Re: [ceph-users] OSD Full Ratio Luminous - Unset

2017-07-06 Thread Ashley Merrick
Anyone have some feedback on this? Happy to log a bug ticket if it is one, but I want to make sure I'm not missing something related to a Luminous change. Ashley (sent from my iPhone) On 4 Jul 2017, at 3:30 PM, Ashley Merrick wrote: Okie noticed
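If this is the Luminous change that moved the full ratios into the OSDMap, they are now read and set with dedicated commands rather than mon config options; a sketch using the usual default values (not values from this thread):
$ ceph osd dump | grep ratio
$ ceph osd set-nearfull-ratio 0.85
$ ceph osd set-backfillfull-ratio 0.90
$ ceph osd set-full-ratio 0.95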

Re: [ceph-users] ceph-mon leader election problem, should it be improved ?

2017-07-06 Thread Z Will
Hi Joao: Thanks for the thorough analysis. My initial concern is that, I think, in some cases a network failure will leave a low-rank monitor seeing only a few siblings (not enough to form a quorum), while some high-rank monitor can see more siblings, so I want to try to choose the one who can see the
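For anyone following the rank discussion, the current monitor ranks and quorum membership can be inspected with standard commands (not specific to the proposal being discussed):
$ ceph mon dump
$ ceph quorum_status --format json-pretty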