[ceph-users] resize wal/db

2018-07-11 Thread Shunde Zhang
Hi Ceph Gurus, I have installed Ceph Luminous with Bluestore using ceph-ansible. However, when I did the install I didn't set the wal/db size, so it ended up using the default values, which are quite small: 1G db and 576MB wal. Note that each OSD node has 12 OSDs and each OSD has a 1.8T
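For anyone hitting the same defaults: on Luminous the db/wal sizes are read from ceph.conf at OSD creation time, so changing them only affects OSDs that are (re)created afterwards. A rough sketch of the relevant settings (the sizes below are illustrative assumptions, in bytes):
    [osd]
    # example values only: ~30 GiB DB and 2 GiB WAL per OSD
    bluestore_block_db_size = 32212254720
    bluestore_block_wal_size = 2147483648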

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Konstantin Shalygin
In a recent thread the Samsung SM863a was recommended as a journal SSD. Are there any recommendations for data SSDs, for people who want to use just SSDs in a new Ceph cluster? Take a look at the HGST SN260; these are MLC NVMe drives [1]

Re: [ceph-users] Luminous 12.2.6 release date?

2018-07-11 Thread Linh Vu
Going by http://tracker.ceph.com/issues/24597, does this only affect FileStore OSDs or are BlueStore ones affected too? Cheers, Linh From: ceph-users on behalf of Sage Weil Sent: Thursday, 12 July 2018 3:48:10 AM To: Ken Dreyer Cc: ceph-users;

Re: [ceph-users] unfound blocks IO or gives IO error?

2018-07-11 Thread Gregory Farnum
On Mon, Jun 25, 2018 at 12:34 AM Dan van der Ster wrote: > On Fri, Jun 22, 2018 at 10:44 PM Gregory Farnum > wrote: > > > > On Fri, Jun 22, 2018 at 6:22 AM Sergey Malinin wrote: > >> > >> From > http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/ > : > >> > >> "Now 1

Re: [ceph-users] Ceph 12.2.5 - FAILED assert(0 == "put on missing extent (nothing before)")

2018-07-11 Thread Gregory Farnum
A bit delayed, but Radoslaw looked at this some and has a diagnosis on the tracker ticket: http://tracker.ceph.com/issues/24715 So it looks like a symptom of a bug that was already fixed for unrelated reasons. :) -Greg On Wed, Jun 27, 2018 at 4:51 AM Dyweni - Ceph-Users <6exbab4fy...@dyweni.com>

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-11 Thread Magnus Grönlund
Hi Kevin, Unfortunately restarting the OSDs doesn't appear to help; instead it seems to make things worse, with PGs getting stuck degraded. Best regards /Magnus 2018-07-11 20:46 GMT+02:00 Kevin Olbrich : > Sounds a little bit like the problem I had on OSDs: > > [ceph-users] Blocked requests

Re: [ceph-users] Snaptrim_error

2018-07-11 Thread Gregory Farnum
Ah sadly those logs don't look like they have enough debugging to be of much use. But from what I'm seeing here, I don't think this state should actually hurt anything. It ought to go away the next time you delete a snapshot (maybe only if it has data in those PGs? not sure) and otherwise be

Re: [ceph-users] MDS damaged

2018-07-11 Thread Gregory Farnum
On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo < alessandro.desa...@roma1.infn.it> wrote: > OK, I found where the object is: > > > ceph osd map cephfs_metadata 200. > osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg > 10.844f3494 (10.14) -> up ([23,35,18], p23)

Re: [ceph-users] Add filestore based osd to a luminous cluster

2018-07-11 Thread Alfredo Deza
On Wed, Jul 11, 2018 at 3:33 PM, Huseyin Cotuk wrote: > Dear Paul and Alfredo, > > Downgrading to ceph-deploy 1.5.38 did not work either. I labeled the journal > partition (i.e. /dev/nvme0n1p12) with parted, and it added a gpt partuuid to > this specific partition: > > /dev/nvme0n1p12:

Re: [ceph-users] Add filestore based osd to a luminous cluster

2018-07-11 Thread Huseyin Cotuk
Dear Paul and Alfredo, Downgrading to ceph-deploy 1.5.38 did not work either. I labeled the journal partition (i.e. /dev/nvme0n1p12) with parted, and it added a gpt partuuid to this specific partition: /dev/nvme0n1p12: PTUUID="8a775205-1364-43d9-820e-c4d3a0d9f9e3" PTTYPE="gpt" PARTLABEL="ceph

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-11 Thread Magnus Grönlund
Hi Paul, No, all the OSDs are still on Jewel; the issue started before I had even started to upgrade the first OSD, and they don't appear to be flapping. ceph -w shows a lot of slow requests etc., but nothing unexpected as far as I can tell considering the state the cluster is in. 2018-07-11

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-11 Thread Kevin Olbrich
Sounds a little bit like the problem I had on OSDs: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num *Kevin Olbrich*

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-11 Thread Paul Emmerich
Did you finish the upgrade of the OSDs? Are OSDs flapping? (ceph -w) Is there anything weird in the OSDs' log files? Paul 2018-07-11 20:30 GMT+02:00 Magnus Grönlund : > Hi, > > Started to upgrade a ceph-cluster from Jewel (10.2.10) to Luminous (12.2.6) > > After upgrading and restarting the
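For the record, a quick sketch of checking those things from the CLI (the PG ID below is a placeholder):
    ceph -w                        # live cluster log; look for OSDs repeatedly marked down/up
    ceph pg dump_stuck inactive    # list PGs stuck peering/inactive and their acting sets
    ceph pg <pgid> query           # detailed state of one PG, e.g. what peering is blocked on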

[ceph-users] Ceph-ansible issue with libselinux-python

2018-07-11 Thread Satish Patel
I am installing a Ceph cluster using openstack-ansible and am hitting this strange issue. I did some googling; some people say it is a bug and some say it can be solved by a hack. This is my error: TASK [ceph-config : create ceph conf directory and assemble directory]
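That task commonly fails when the target hosts are missing the Python SELinux bindings that Ansible needs; a frequently suggested workaround (an assumption here, not a verified fix for this exact error) is to install them on every Ceph host and re-run the playbook:
    # on each target host (RHEL/CentOS 7)
    yum install -y libselinux-python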

Re: [ceph-users] Luminous 12.2.6 release date?

2018-07-11 Thread Sage Weil
On Wed, 11 Jul 2018, Ken Dreyer wrote: > Sage, does http://tracker.ceph.com/issues/24597 cover the full problem > you're describing? Yeah. I've added some detail to that bug. Working on fixing our rados test suite to reproduce the issue. sage > > - Ken > > On Wed, Jul 11, 2018 at 9:40 AM,

Re: [ceph-users] Add filestore based osd to a luminous cluster

2018-07-11 Thread Alfredo Deza
On Wed, Jul 11, 2018 at 12:57 PM, Huseyin Cotuk wrote: > Hi Paul, > > Thanks for your reply. I did not mention any special parameter while > upgrading to luminous. So this ceph-deploy version is the one coming from > the official debian luminous repo. That is because ceph-volume came out in

Re: [ceph-users] v10.2.11 Jewel released

2018-07-11 Thread Webert de Souza Lima
Cheers! Thanks for all the backports and fixes. Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Wed, Jul 11, 2018 at 1:46 PM Abhishek Lekshmanan wrote: > > We're glad to announce v10.2.11 release of the Jewel stable release >

Re: [ceph-users] Add filestore based osd to a luminous cluster

2018-07-11 Thread Paul Emmerich
The Luminous repo has shipped ceph-deploy 2.0.0 since 12.2.2, I believe. Paul 2018-07-11 18:57 GMT+02:00 Huseyin Cotuk : > Hi Paul, > > Thanks for your reply. I did not mention any special parameter while > upgrading to luminous. So this ceph-deploy version is the one coming from > the official

Re: [ceph-users] Add filestore based osd to a luminous cluster

2018-07-11 Thread Huseyin Cotuk
Hi Paul, Thanks for your reply. I did not mention any special parameter while upgrading to luminous. So this ceph-deploy version is the one coming from the official debian luminous repo. I will try to downgrade ceph-deploy and add the OSD again. To prevent any inconsistency, maybe you can

Re: [ceph-users] Add filestore based osd to a luminous cluster

2018-07-11 Thread Paul Emmerich
BlueStore is really stable and mature nowadays. You seem to be using ceph-deploy 2.0.0, which I would not call mature and stable at the moment ;) Anyway, it uses ceph-volume instead of ceph-disk, and I think you have to specify the actual partition here. But I'd just downgrade to ceph-deploy
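For what it's worth, a sketch of the ceph-deploy 2.0.0 call if you stay on it instead of downgrading (device names and hostname are assumptions):
    ceph-deploy osd create --filestore --data /dev/sdb --journal /dev/nvme0n1p12 osd-host-1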

[ceph-users] v10.2.11 Jewel released

2018-07-11 Thread Abhishek Lekshmanan
We're glad to announce the v10.2.11 release of the Jewel stable release series. This point release brings a number of important bugfixes and a few important security fixes. This is most likely going to be the final Jewel release (shine on you crazy diamond). We thank everyone in the community

Re: [ceph-users] KPIs for Ceph/OSD client latency / deepscrub latency overhead

2018-07-11 Thread Paul Emmerich
Hi, from experience: commit/apply_latency are not good metrics; the only good thing about them is that they are really easy to track, but we have found them to be almost completely useless in the real world. We track the op_*_latency metrics from perf dump and have found them to be very helpful; they
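For reference, a minimal sketch of pulling those counters from an OSD's admin socket (osd.0 and the jq filter are illustrative; the exact counter set varies by release):
    ceph daemon osd.0 perf dump
    # narrow the output down to the op latency counters, if jq is available
    ceph daemon osd.0 perf dump | jq '.osd | {op_r_latency, op_w_latency, op_latency}'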

[ceph-users] Add filestore based osd to a luminous cluster

2018-07-11 Thread Huseyin Cotuk
Hello everybody, I have just upgraded my ceph cluster from Kraken to Luminous. I want to keep using a filestore-based objectstore for the OSDs until Red Hat announces BlueStore as stable; it is still in technical preview. So my question is: “What is the right procedure for adding a filestore
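As a rough sketch (device names are assumptions), on Luminous a filestore OSD can be created directly with ceph-volume, which is also what newer ceph-deploy wraps:
    # /dev/sdb = data disk, /dev/nvme0n1p12 = pre-created journal partition
    ceph-volume lvm create --filestore --data /dev/sdb --journal /dev/nvme0n1p12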

Re: [ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo
OK, I found where the object is: ceph osd map cephfs_metadata 200. osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23) So, looking at the osds 23, 35 and 18 logs in fact I see: osd.23:

Re: [ceph-users] MDS damaged

2018-07-11 Thread John Spray
On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo wrote: > > Hi John, > > in fact I get an I/O error by hand too: > > > rados get -p cephfs_metadata 200. 200. > error getting cephfs_metadata/200.: (5) Input/output error Next step would be to go look for corresponding

[ceph-users] KPIs for Ceph/OSD client latency / deepscrub latency overhead

2018-07-11 Thread Marc Schöchlin
Hello ceph-users and ceph-devel list, we have gone into production with our new shiny Luminous (12.2.5) cluster. This cluster runs SSD- and HDD-based OSD pools. To ensure the service quality of the cluster and to have a baseline for client latency optimization (i.e. in the area of deepscrub optimization)

Re: [ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo
Hi John, in fact I get an I/O error by hand too: rados get -p cephfs_metadata 200. 200. error getting cephfs_metadata/200.: (5) Input/output error Can this be recovered somehow? Thanks, Alessandro On 11/07/18 18:33, John Spray wrote: On Wed, Jul 11,

Re: [ceph-users] Luminous 12.2.6 release date?

2018-07-11 Thread Ken Dreyer
Sage, does http://tracker.ceph.com/issues/24597 cover the full problem you're describing? - Ken On Wed, Jul 11, 2018 at 9:40 AM, Sage Weil wrote: > Please hold off on upgrading. We discovered a regression (in 12.2.5 > actually) but the triggering event is OSD restarts or other peering >

Re: [ceph-users] Luminous 12.2.6 release date?

2018-07-11 Thread Sage Weil
Please hold off on upgrading. We discovered a regression (in 12.2.5 actually), but the triggering event is OSD restarts or other peering combined with RGW workloads on EC pools, so unnecessary OSD restarts should be avoided with 12.2.5 until we have this sorted out. sage On Wed, 11 Jul 2018,
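As a general precaution while waiting (standard practice, not specific guidance from this thread), the noout flag at least avoids extra data movement if an OSD does drop out briefly; it does not prevent the peering caused by a restart itself:
    ceph osd set noout
    # once the fixed release is installed and the cluster is healthy again
    ceph osd unset noout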

Re: [ceph-users] MDS damaged

2018-07-11 Thread John Spray
On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo wrote: > > Hi, > > after the upgrade to luminous 12.2.6 today, all our MDSes have been > marked as damaged. Trying to restart the instances only result in > standby MDSes. We currently have 2 filesystems active and 2 MDSes each. > > I found the

Re: [ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo
Hi Gregory, thanks for the reply. I have the dump of the metadata pool, but I'm not sure what to check there. Is that what you mean? The cluster was operational until today at noon, when a full restart of the daemons was issued, like many other times in the past. I was trying to issue the

Re: [ceph-users] MDS damaged

2018-07-11 Thread Gregory Farnum
Have you checked the actual journal objects as the "journal export" suggested? Did you identify any actual source of the damage before issuing the "repaired" command? What is the history of the filesystems on this cluster? On Wed, Jul 11, 2018 at 8:10 AM Alessandro De Salvo <
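For anyone following along, the journal inspection/export mentioned here is done with cephfs-journal-tool; the basic form below operates on rank 0 of the default filesystem (with two filesystems, use the rank/filesystem selection options of your release):
    cephfs-journal-tool journal inspect              # check the MDS journal for corruption
    cephfs-journal-tool journal export backup.bin    # take a backup before any repair attempt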

Re: [ceph-users] Snaptrim_error

2018-07-11 Thread Gregory Farnum
On Wed, Jul 11, 2018 at 8:07 AM Flash wrote: > Hi there. > > Yesterday I caught that error: > PG_DAMAGED Possible data damage: 2 pgs snaptrim_error > pg 11.9 is active+clean+snaptrim_error, acting [196,167,32] > pg 11.127 is active+clean+snaptrim_error, acting [184,138,1] > May it be

[ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo
Hi, after the upgrade to luminous 12.2.6 today, all our MDSes have been marked as damaged. Trying to restart the instances only results in standby MDSes. We currently have 2 active filesystems with 2 MDSes each. I found the following error messages in the mon: mds.0 :6800/2412911269

[ceph-users] Snaptrim_error

2018-07-11 Thread Flash
Hi there. Yesterday I caught this error: PG_DAMAGED Possible data damage: 2 pgs snaptrim_error pg 11.9 is active+clean+snaptrim_error, acting [196,167,32] pg 11.127 is active+clean+snaptrim_error, acting [184,138,1] Could it be because a scrub was running while the snapshots were being cleaned up?
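For anyone hitting the same state, it can be inspected with the usual PG tooling (PG ID taken from the health output above):
    ceph health detail    # lists the PGs currently flagged snaptrim_error
    ceph pg 11.9 query    # per-PG detail, including the snap trim queue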

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread leo David
I am using S3510 for both filestore and bluestore. Performance seems pretty good. On Wed, Jul 11, 2018 at 5:44 PM, Robert Stanford wrote: > > Any opinions on the Dell DC S3520 (for journals)? That's what I have, > stock and I wonder if I should replace them. > > On Wed, Jul 11, 2018 at 8:34

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Robert Stanford
Any opinions on the Dell DC S3520 (for journals)? That's what I have (stock), and I wonder if I should replace them. On Wed, Jul 11, 2018 at 8:34 AM, Simon Ironside wrote: > > On 11/07/18 14:26, Simon Ironside wrote: > > The 2TB Samsung 850 EVO for example is only rated for 300TBW (terabytes >>

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Simon Ironside
On 11/07/18 14:26, Simon Ironside wrote: The 2TB Samsung 850 EVO for example is only rated for 300TBW (terabytes written). Over the 5 year warranty period that's only 165GB/day, not even 0.1 full drive writes per day (165GB/day on a 2TB drive is roughly 0.08 DWPD). The SM863a part of the same size is rated for 12,320TBW, over 3 DWPD.

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Simon Ironside
On 11/07/18 13:49, Satish Patel wrote: Prices go way up if I am picking the Samsung SM863a for all data drives. We have many servers running on consumer grade SSD drives and we have never noticed any performance problems or any faults so far (but we have never used ceph before). I thought that is the whole

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Piotr Dałek
On 18-07-11 02:35 PM, David Blundell wrote: Hi, I’m looking at 4TB Intel DC P4510 for data drives running BlueStore with WAL, DB and data on the same drives. Has anyone had any good / bad experiences with them? As Intel’s new data centre NVMe SSD it should be fast and reliable but then I

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Mart van Santen
Hi, We started with consumer grade SSDs. In normal operation this was no problem, but it caused terrible performance during recovery or other platform adjustments that involved data movement. We finally decided to replace everything with SM863 disks, which after a few years still perform

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Satish Patel
Prices go way up if I am picking the Samsung SM863a for all data drives. We have many servers running on consumer grade SSD drives and we have never noticed any performance problems or any faults so far (but we have never used ceph before). I thought that is the whole point of ceph, to provide high availability if
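As an aside for anyone comparing consumer and enterprise SSDs: the gap usually shows up under synchronous small writes, which Ceph journals/WAL depend on, and can be checked up front with fio (this is destructive on the target device; /dev/sdX is a placeholder):
    # single-job, queue-depth-1 sync writes -- roughly the journal write pattern
    fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based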

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread David Blundell
Hi, I’m looking at 4TB Intel DC P4510 for data drives running BlueStore with WAL, DB and data on the same drives. Has anyone had any good / bad experiences with them? As Intel’s new data centre NVMe SSD it should be fast and reliable but then I would have thought the same about the DC S4600

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Paul Emmerich
Hi, we've no long-term data for the SM variant. Performance is fine as far as we can tell, but the main difference between these two models should be endurance. Also, I forgot to mention that my experiences are only for the 1, 2, and 4 TB variants. Smaller SSDs are often proportionally slower

Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-11 Thread Linh Vu
Thanks John :) Yeah I did the `ceph fs reset` as well, because we did have 2 active MDSes. Currently running on just one until we have completely cleared all these issues. Our original problem started with a partial network outage a few weeks ago around the weekend. After it came back, post-DR and

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Robert Stanford
Paul - That's extremely helpful, thanks. I do have another cluster that uses Samsung SM863a just for journal (spinning disks for data). Do you happen to have an opinion on those as well? On Wed, Jul 11, 2018 at 4:03 AM, Paul Emmerich wrote: > PM/SM863a are usually great disks and should be

Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-11 Thread John Spray
On Wed, Jul 11, 2018 at 2:23 AM Linh Vu wrote: > > Hi John, > > > Thanks for the explanation, that command has a lot more impact than I > thought! I hope the change of name for the verb "reset" comes through in the > next version, because that is very easy to misunderstand. > > "The first

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Paul Emmerich
PM/SM863a are usually great disks and should be the default go-to option; they outperform even the more expensive PM1633 in our experience. (But that really doesn't matter if it's for the full OSD and not as a dedicated WAL/journal.) We have a cluster with a few hundred SanDisk Ultra II

Re: [ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"

2018-07-11 Thread Wido den Hollander
On 07/11/2018 10:22 AM, ceph.nov...@habmalnefrage.de wrote: > at about the same time we also updated the Linux OS via "YUM" to: > > # more /etc/redhat-release > Red Hat Enterprise Linux Server release 7.5 (Maipo) > > > > from the given error message, it seems like there are 32 "measure

Re: [ceph-users] Mimic 13.2.1 release date

2018-07-11 Thread Paul Emmerich
We are also waiting for 13.2.1 before we can release our croit management software for Mimic. Paul 2018-07-09 17:11 GMT+02:00 Wido den Hollander : > Hi, > > Is there a release date for Mimic 13.2.1 yet? > > There are a few issues which currently make deploying with Mimic 13.2.0 > a bit

Re: [ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"

2018-07-11 Thread ceph . novice
at about the same time we also updated the Linux OS via "YUM" to: # more /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) from the given error message, it seems like there are 32 "measure points", which are to be sent but 3 of them are somehow failing: >>>

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Robert Stanford
Wido - You're using the same SATA drives for both journals and data? I want to make sure my question was understood, since you mention BlueStore (maybe you were just using them for journals; I want to make sure I understood). Thanks On Wed, Jul 11, 2018 at 3:14 AM, Wido den Hollander

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Wido den Hollander
On 07/11/2018 10:10 AM, Robert Stanford wrote: > >  In a recent thread the Samsung SM863a was recommended as a journal > SSD.  Are there any recommendations for data SSDs, for people who want > to use just SSDs in a new Ceph cluster? > Depends on what you are looking for, SATA, SAS3 or NVMe?

[ceph-users] SSDs for data drives

2018-07-11 Thread Robert Stanford
In a recent thread the Samsung SM863a was recommended as a journal SSD. Are there any recommendations for data SSDs, for people who want to use just SSDs in a new Ceph cluster? Thank you ___ ceph-users mailing list ceph-users@lists.ceph.com

Re: [ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"

2018-07-11 Thread Wido den Hollander
On 07/11/2018 10:02 AM, ceph.nov...@habmalnefrage.de wrote: > Has anyone with "mgr Zabbix enabled" migrated from Luminous (12.2.5 or 5) and hit > the same problem in Mimic now? > If I disable and re-enable the "zabbix" module, the status is "HEALTH_OK" for > a few seconds and then changes to "HEALTH_WARN"

[ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"

2018-07-11 Thread ceph . novice
Has anyone with "mgr Zabbix enabled" migrated from Luminous (12.2.5 or 5) and hit the same problem in Mimic now? If I disable and re-enable the "zabbix" module, the status is "HEALTH_OK" for a few seconds and then changes to "HEALTH_WARN" again... --- # ceph -s cluster: id:
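For debugging, the module can be toggled and poked by hand (commands as provided by the zabbix mgr module; verify the settings against your zabbix_sender setup):
    ceph mgr module disable zabbix
    ceph mgr module enable zabbix
    ceph zabbix config-show    # check zabbix_host, identifier, interval
    ceph zabbix send           # trigger an immediate send and watch the mgr log for errors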

Re: [ceph-users] Luminous 12.2.6 release date?

2018-07-11 Thread Dan van der Ster
And voilà, I see the 12.2.6 rpms were released overnight. Waiting here for an announcement before upgrading. -- dan On Tue, Jul 10, 2018 at 10:08 AM Sean Purdy wrote: > > While we're at it, is there a release date for 12.2.6? It fixes a > reshard/versioning bug for us. > > Sean >

Re: [ceph-users] Mimic 13.2.1 release date

2018-07-11 Thread ceph . novice
- adding ceph-devel - Same here. An estimated date would already help for internal planning :|   Sent: Tuesday, 10 July 2018 at 11:59 From: "Martin Overgaard Hansen" To: ceph-users Subject: Re: [ceph-users] Mimic 13.2.1 release date > On 9 Jul 2018, at 17:12, Wido den

Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-11 Thread Linh Vu
For this cluster, we currently don't build our own ceph packages (although we just had to do that for one other cluster recently). Is it safe to comment out that particular assert, in the event that the full fix isn't coming really soon? From: Wido den Hollander

Re: [ceph-users] radosgw multizone not syncing large bucket completly to other zone

2018-07-11 Thread Enrico Kern
I changed the endpoints to bypass the load balancers for sync, but the problem still remains. I will probably re-create the bucket and recopy the data to see if that changes anything. I can't make anything out of all the log messages; I need to dig deeper into that. On Sun, Jul 8, 2018 at 4:55 PM Enrico
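A minimal sketch of the commands usually used to see where a single bucket's replication is stuck (bucket and zone names are placeholders):
    radosgw-admin sync status                             # overall metadata/data sync state
    radosgw-admin bucket sync status --bucket=<bucket>    # per-shard status for one bucket
    radosgw-admin data sync status --source-zone=<zone>   # data sync detail from one source zone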

Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-11 Thread Wido den Hollander
On 07/11/2018 01:47 AM, Linh Vu wrote: > Thanks John :) Has it - asserting out on dupe inode - already been > logged as a bug yet? I could put one in if needed. > Did you just comment out the assert? And indeed, my next question would be, do we have an issue tracker for this? Wido > >