[ceph-users] Mixed Bluestore and Filestore NVMe OSDs for RGW metadata both running out of space
An osd daemon perf dump for one of my bluestore NVMe OSDs has this excerpt [1]. I grabbed those stats based on Wido's script [2] to determine how much DB overhead you have per object. My calculations [3] for this particular OSD are staggering: 99% of the space used on this OSD is being consumed by the DB. This particular OSD is sitting between 90%-97% disk usage with an occasional drop to 80%, but then back up; it's fluctuating wildly from one minute to the next. One of my filestore NVMe OSDs in the same cluster has 99% of its used space in ./current/omap/. This is causing IO stalls as well as OSDs flapping on the cluster. Does anyone have any ideas of anything I can try? It's definitely not the actual PGs on the OSDs. I tried balancing the weights of the OSDs to better distribute the data, but moving the PGs around seemed to make things worse. Thank you. [1] "bluestore_onodes": 167, "stat_bytes_used": 143855271936, "db_used_bytes": 142656667648, [2] https://gist.github.com/wido/b1328dd45aae07c45cb8075a24de9f1f [3] Average object size = 821MB DB overhead per object = 814MB ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
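For reference, the per-object figures in [3] can be reproduced from the counters in [1] with a quick bit of shell arithmetic (the numbers below are just the ones from this excerpt; on a live OSD you would pull them from the perf dump yourself):

```shell
# Derive per-object DB overhead from the perf dump counters in [1].
# On a live OSD the counters could be pulled with something like:
#   ceph daemon osd.<id> perf dump
onodes=167
stat_bytes_used=143855271936
db_used_bytes=142656667648

echo "avg object size (MB):        $(( stat_bytes_used / onodes / 1024 / 1024 ))"
echo "db overhead per object (MB): $(( db_used_bytes / onodes / 1024 / 1024 ))"
```

This reproduces the ~821 MB average object size and ~814 MB DB overhead per object quoted in [3].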
Re: [ceph-users] Hammer and a (little) disk/partition shrink...
Mandi! David Turner wrote: > Replace the raid controller in the chassis with an HBA before moving into the > new hardware? ;) Eh... any hints on a controller I can buy? > If you do move to the HP controller, make sure you're monitoring the health of > the cache battery in the controller. I have no battery in the controller... ;-) -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
Re: [ceph-users] Ceph-Deploy error on 15/71 stage
Hi Eugen. Sorry for the delay in answering. I just looked in the /var/log/ceph/ directory. It only contains the following files (for example on node01): ### # ls -lart total 3864 -rw--- 1 ceph ceph 904 ago 24 13:11 ceph.audit.log-20180829.xz drwxr-xr-x 1 root root 898 ago 28 10:07 .. -rw-r--r-- 1 ceph ceph 189464 ago 28 23:59 ceph-mon.node01.log-20180829.xz -rw--- 1 ceph ceph 24360 ago 28 23:59 ceph.log-20180829.xz -rw-r--r-- 1 ceph ceph 48584 ago 29 00:00 ceph-mgr.node01.log-20180829.xz -rw--- 1 ceph ceph 0 ago 29 00:00 ceph.audit.log drwxrws--T 1 ceph ceph 352 ago 29 00:00 . -rw-r--r-- 1 ceph ceph 1908122 ago 29 12:46 ceph-mon.node01.log -rw--- 1 ceph ceph 175229 ago 29 12:48 ceph.log -rw-r--r-- 1 ceph ceph 1599920 ago 29 12:49 ceph-mgr.node01.log ### So it only contains logs concerning the node itself (is that correct? Since node01 is also the master, I was expecting it to have logs from the others too) and, moreover, no ceph-osd* files. Also, I'm looking at the logs I have available, and nothing stands out as a possible error. Any suggestion on how to proceed? Thanks a lot in advance, Jones On Mon, Aug 27, 2018 at 5:29 AM Eugen Block wrote: > Hi Jones, > > all ceph logs are in the directory /var/log/ceph/, each daemon has its > own log file, e.g. OSD logs are named ceph-osd.*. > > I haven't tried it but I don't think SUSE Enterprise Storage deploys > OSDs on partitioned disks. Is there a way to attach a second disk to > the OSD nodes, maybe via USB or something? > > Although this thread is ceph related it is referring to a specific > product, so I would recommend to post your question in the SUSE forum > [1]. > > Regards, > Eugen > > [1] https://forums.suse.com/forumdisplay.php?99-SUSE-Enterprise-Storage > > Quoting Jones de Andrade: > > > Hi Eugen. > > > > Thanks for the suggestion. I'll look for the logs (since it's our first > > attempt with ceph, I'll have to discover where they are, but no problem). 
> > > > One thing called my attention on your response however: > > > > I haven't made myself clear, but one of the failures we encountered were > > that the files now containing: > > > > node02: > >-- > >storage: > >-- > >osds: > >-- > >/dev/sda4: > >-- > >format: > >bluestore > >standalone: > >True > > > > Were originally empty, and we filled them by hand following a model found > > elsewhere on the web. It was necessary, so that we could continue, but > the > > model indicated that, for example, it should have the path for /dev/sda > > here, not /dev/sda4. We chosen to include the specific partition > > identification because we won't have dedicated disks here, rather just > the > > very same partition as all disks were partitioned exactly the same. > > > > While that was enough for the procedure to continue at that point, now I > > wonder if it was the right call and, if it indeed was, if it was done > > properly. As such, I wonder: what you mean by "wipe" the partition here? > > /dev/sda4 is created, but is both empty and unmounted: Should a different > > operation be performed on it, should I remove it first, should I have > > written the files above with only /dev/sda as target? > > > > I know that probably I wouldn't run in this issues with dedicated discks, > > but unfortunately that is absolutely not an option. > > > > Thanks a lot in advance for any comments and/or extra suggestions. > > > > Sincerely yours, > > > > Jones > > > > On Sat, Aug 25, 2018 at 5:46 PM Eugen Block wrote: > > > >> Hi, > >> > >> take a look into the logs, they should point you in the right direction. > >> Since the deployment stage fails at the OSD level, start with the OSD > >> logs. Something's not right with the disks/partitions, did you wipe > >> the partition from previous attempts? 
> >> > >> Regards, > >> Eugen > >> > >> Zitat von Jones de Andrade : > >> > >>> (Please forgive my previous email: I was using another message and > >>> completely forget to update the subject) > >>> > >>> Hi all. > >>> > >>> I'm new to ceph, and after having serious problems in ceph stages 0, 1 > >> and > >>> 2 that I could solve myself, now it seems that I have hit a w
Re: [ceph-users] Hammer and a (little) disk/partition shrink...
Replace the raid controller in the chassis with an HBA before moving into the new hardware? ;) If you do move to the HP controller, make sure you're monitoring the health of the cache battery in the controller. We notice a significant increase to await on our OSD nodes behind these when the cache battery fails. We've replaced over 10 batteries on HP raid controllers for our OSD nodes and the first time we noticed it, there were 6 of them failed across multiple clusters causing the OSDs to be slower in those nodes. On Wed, Aug 29, 2018 at 7:21 AM Marco Gaiarin wrote: > > Probably a complex question, with a simple answer: NO. ;-) > > > I need to move disks from a ceph node (still on hammer) from an > hardware to another one. The source hardware have a simple SATA/SAS > controller, the 'new' server have a RAID controller with no JBOD mode > (the infamous HP P410i), so i need to create some 'RAID 0 with a single > disk' fake raid. > > These controller, seems to ''eat'' some space at the end of the disk, > so (doing some tests) the disk does not get corrupted with the > 'raid0-ification', but lost some bytes at the end, and linux then > complain that the (last) partition are corrupted. > > hammer use filestore, so practically i need to shrunk an xfs > filesystem, that is not supported by XFS. > Clearly i can do 'xfsdump' of disks in some scratch space and rebuild > filesystem but... > > > I've some escape path? > > > Thanks. > > -- > dott. Marco Gaiarin GNUPG Key ID: > 240A3D66 > Associazione ``La Nostra Famiglia'' > http://www.lanostrafamiglia.it/ > Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento > (PN) > marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 > <+39%200434%20842711> f +39-0434-842797 <+39%200434%20842797> > > Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! 
> http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 > (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] cephfs mount on osd node
The problem with mounting an RBD or CephFS on an OSD node is if you're doing so with the kernel client. In a previous message on the ML John Spray explained this wonderfully. "This is not a Ceph-specific thing -- it can also affect similar systems like Lustre. The classic case is when under some memory pressure, the kernel tries to free memory by flushing the client's page cache, but doing the flush means allocating more memory on the server, making the memory pressure worse, until the whole thing just seizes up." If you're using ceph-fuse to mount cephfs, then you only have resource contention as a problem, but nothing as severe as deadlocking. Settings like Jake mentioned can help you work around resource contention if that is an issue for you. Don't change the settings unless you notice a problem, though. Ceph is pretty good at having sane defaults. On Wed, Aug 29, 2018 at 6:35 AM Jake Grimmett wrote: > Hi Marc, > > We mount cephfs using FUSE on all 10 nodes of our cluster, and provided > that we limit bluestore memory use, find it to be reliable*. > > bluestore_cache_size = 209715200 > bluestore_cache_kv_max = 134217728 > > Without the above tuning, we get OOM errors. > > As others will confirm, the FUSE client is more stable than the kernel > client, but slower. > > ta ta > > Jake > > * We have 128GB of ram per 45 x 8TB Drive OSD node, way below > recommendations (1GB RAM per TB storage); our OOM issues are completely > predictable... > > On 29/08/18 13:25, Marc Roos wrote: > > > > > > I have 3 node test cluster and I would like to expand this with a 4th > > node that is currently mounting the cephfs and rsync's backups to it. I > > can remember reading something about that you could create a deadlock > > situation doing this. > > > > What are the risks I would be taking if I would be doing this? 
> > > > > > > > > > > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Error EINVAL: (22) Invalid argument While using ceph osd safe-to-destroy
I am addressing the doc bug at https://github.com/ceph/ceph/pull/23801 On Mon, Aug 27, 2018 at 2:08 AM, Eugen Block wrote: > Hi, > > could you please paste your osd tree and the exact command you try to > execute? > >> Extra note, the while loop in the instructions look like it's bad. I had >> to change it to make it work in bash. > > > The documented command didn't work for me either. > > Regards, > Eugen > > Zitat von Robert Stanford : > > >> I am following the procedure here: >> http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/ >> >> When I get to the part to run "ceph osd safe-to-destroy $ID" in a while >> loop, I get a EINVAL error. I get this error when I run "ceph osd >> safe-to-destroy 0" on the command line by itself, too. (Extra note, the >> while loop in the instructions look like it's bad. I had to change it to >> make it work in bash.) >> >> I know my ID is correct because I was able to use it in the previous step >> (ceph osd out $ID). I also substituted $ID for the number on the command >> line and got the same error. Why isn't this working? >> >> Error: Error EINVAL: (22) Invalid argument While using ceph osd >> safe-to-destroy >> >> Thank you >> R > > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
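Until the doc fix lands, a bash-compatible version of the wait loop could look like the sketch below (wait_safe_to_destroy is just a hypothetical helper name, and it assumes the command exits non-zero while the OSD is still unsafe to remove — not the literal documented command):

```shell
# Hypothetical helper: poll `ceph osd safe-to-destroy` until it succeeds.
wait_safe_to_destroy() {
  local id=$1
  while ! ceph osd safe-to-destroy "$id"; do
    echo "osd.$id not yet safe to destroy; retrying in 10s..."
    sleep 10
  done
}

# usage: wait_safe_to_destroy 0
```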
[ceph-users] Hammer and a (little) disk/partition shrink...
Probably a complex question, with a simple answer: NO. ;-) I need to move disks from a ceph node (still on hammer) from one piece of hardware to another. The source hardware has a simple SATA/SAS controller; the 'new' server has a RAID controller with no JBOD mode (the infamous HP P410i), so I need to create some 'RAID 0 with a single disk' fake raid. These controllers seem to ''eat'' some space at the end of the disk, so (doing some tests) the disk does not get corrupted by the 'raid0-ification', but loses some bytes at the end, and Linux then complains that the (last) partition is corrupted. Hammer uses filestore, so in practice I need to shrink an XFS filesystem, which is not supported by XFS. Clearly I can do an 'xfsdump' of the disks to some scratch space and rebuild the filesystem, but... Do I have some escape path? Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
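The xfsdump escape path mentioned above could be sketched roughly as follows. This is only an outline under stated assumptions — the device, mount point, and scratch paths are made up, mkfs.xfs destroys the existing filesystem, and the OSD must be stopped first — so treat it as a starting point, not a recipe:

```shell
# Rough sketch: level-0 dump of the OSD filesystem to scratch space,
# recreate the (now slightly smaller) filesystem on the raid0-ified
# disk, then restore the dump into it.
shrink_via_xfsdump() {
  local dev=$1 mnt=$2 scratch=$3
  xfsdump -l 0 -f "$scratch/osd.dump" "$mnt" &&   # full dump to scratch file
  umount "$mnt" &&
  mkfs.xfs -f "$dev" &&                           # DESTRUCTIVE: rebuild fs at new size
  mount "$dev" "$mnt" &&
  xfsrestore -f "$scratch/osd.dump" "$mnt"
}

# e.g.: shrink_via_xfsdump /dev/sda1 /var/lib/ceph/osd/ceph-12 /mnt/scratch
```

Alternatively, with hammer's filestore, simply letting the cluster rebuild the OSD from its peers avoids the dump/restore entirely, at the cost of recovery traffic.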
[ceph-users] Looking for information on full SSD deployments
Hello fellow Ceph users, We have been using a small cluster (6 data nodes with 12 disks each, 3 monitors) with OSDs on spinners and journals on SATA SSD-s for a while now. We still haven't upgraded to Luminous, and are going to test it now, as we also need to switch some projects on a shared file system and cephFS seems to fit the bill. What I'm mostly looking for is to get in contact with someone with experience in running Ceph as a full SSD cluster, or full SSD pool(s) on the main cluster. Main interest is in performance centric workloads generated by web applications that work directly with files, heavily both in read and write capacity, with low latency being very important. As mentioned above, the other question is about viability of cephFS in production environment right now, for web applications with several nodes, using a shared file system for certain read and write operations. I will not go into more detail here, if you have some experience and would be willing to share it, please write to val...@eenet.ee Also thanks to everyone in this list for the insights other people's random problems have given us. We have probably managed to prevent some problems in the current cluster just by skimming through these e-mails. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] SSD OSDs crashing after upgrade to 12.2.7
On Wed, Aug 29, 2018 at 2:06 AM, Wolfgang Lendl wrote: > Hi, > > after upgrading my ceph clusters from 12.2.5 to 12.2.7 I'm experiencing > random crashes from SSD OSDs (bluestore) - it seems that HDD OSDs are not > affected. > I destroyed and recreated some of the SSD OSDs which seemed to help. > > this happens on centos 7.5 (different kernels tested) > > /var/log/messages: > Aug 29 10:24:08 ceph-osd: *** Caught signal (Segmentation fault) ** > Aug 29 10:24:08 ceph-osd: in thread 7f8a8e69e700 thread_name:bstore_kv_final > Aug 29 10:24:08 kernel: traps: bstore_kv_final[187470] general protection > ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in > libtcmalloc.so.4.4.5[7f8a997a8000+46000] > Aug 29 10:24:08 systemd: ceph-osd@2.service: main process exited, > code=killed, status=11/SEGV > Aug 29 10:24:08 systemd: Unit ceph-osd@2.service entered failed state. > Aug 29 10:24:08 systemd: ceph-osd@2.service failed. > Aug 29 10:24:28 systemd: ceph-osd@2.service holdoff time over, scheduling > restart. > Aug 29 10:24:28 systemd: Starting Ceph object storage daemon osd.2... > Aug 29 10:24:28 systemd: Started Ceph object storage daemon osd.2. > Aug 29 10:24:28 ceph-osd: starting osd.2 at - osd_data > /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal > Aug 29 10:24:35 ceph-osd: *** Caught signal (Segmentation fault) ** > Aug 29 10:24:35 ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp > Aug 29 10:24:35 kernel: traps: tp_osd_tp[186933] general protection > ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in > libtcmalloc.so.4.4.5[7f5f430cd000+46000] > Aug 29 10:24:35 systemd: ceph-osd@0.service: main process exited, > code=killed, status=11/SEGV > Aug 29 10:24:35 systemd: Unit ceph-osd@0.service entered failed state. > Aug 29 10:24:35 systemd: ceph-osd@0.service failed These systemd messages aren't usually helpful, try poking around /var/log/ceph/ for the output on that one OSD. 
If those logs aren't useful either, try bumping up the verbosity (see http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#boot-time ) > > did I hit a known issue? > any suggestions are highly appreciated > > > br > wolfgang > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] cephfs mount on osd node
Hi Marc, We mount cephfs using FUSE on all 10 nodes of our cluster, and provided that we limit bluestore memory use, find it to be reliable*. bluestore_cache_size = 209715200 bluestore_cache_kv_max = 134217728 Without the above tuning, we get OOM errors. As others will confirm, the FUSE client is more stable than the kernel client, but slower. ta ta Jake * We have 128GB of ram per 45 x 8TB Drive OSD node, way below recommendations (1GB RAM per TB storage); our OOM issues are completely predictable... On 29/08/18 13:25, Marc Roos wrote: > > > I have 3 node test cluster and I would like to expand this with a 4th > node that is currently mounting the cephfs and rsync's backups to it. I > can remember reading something about that you could create a deadlock > situation doing this. > > What are the risks I would be taking if I would be doing this? > > > > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
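Jake's two limits above, expressed as a ceph.conf fragment (the values are just the ones from this message; the right numbers depend on your RAM per OSD node):

```
[osd]
# Cap bluestore's cache at ~200 MB total, ~128 MB of it for rocksdb,
# to avoid OOM kills on RAM-starved OSD nodes (values from this thread).
bluestore_cache_size = 209715200
bluestore_cache_kv_max = 134217728
```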
Re: [ceph-users] New Ceph community manager: Mike Perez
Correction: Mike's new email is actually mipe...@redhat.com (sorry, mperez!). sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] cephfs mount on osd node
I have a 3-node test cluster and I would like to expand it with a 4th node that is currently mounting the cephfs and rsyncing backups to it. I can remember reading something about how you could create a deadlock situation by doing this. What are the risks I would be taking if I did this? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] prevent unnecessary MON leader re-election
On 08/29/2018 11:02 AM, William Lawton wrote: > > We have a 5 node Ceph cluster, status output copied below. During our > cluster resiliency tests we have noted that a MON leader election takes > place when we fail one member of the MON quorum, even though the failed > instance is not the current MON leader. We speculate that this > re-election process may be contributing to short periods of cluster > unavailability when one or more cluster instances fail. Is there a way > to configure the cluster so that there is only a MON leader election if > the existing MON leader fails but not when some other member of the MON > quorum fails? Not at the moment, and this hasn't been in our plans. My reasoning, at least, has been that if a monitor failed, an election is the best way we have to ensure the remaining monitors are alive and communicative. And the election itself should be a quick process anyway, so this never became a particularly pressing feature. I'd suggest opening a feature request in the tracker, asking for this. And, if possible, attach logs to the ticket showing that the election is taking too long, or evidence that you're getting I/O stalls during this period. (for the mon logs, I'd suggest 'debug mon = 10', 'debug paxos = 10', and 'debug ms = 1') -Joao ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
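The debug settings Joao suggests, as a ceph.conf fragment for the monitor section (just a sketch of where they would go; remember to drop them back to defaults after capturing an election, since these levels are chatty):

```
[mon]
# Verbose election/paxos logging for attaching mon logs to a tracker ticket
debug mon = 10
debug paxos = 10
debug ms = 1
```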
Re: [ceph-users] Installing ceph 12.2.4 via Ubuntu apt
The root cause is a restriction in reprepro used to manage the repository: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=570623 Paul 2018-08-29 8:50 GMT+02:00 Thomas Bennett : > Hi David, > > Thanks for your reply. That's how I'm currently handling it. > > Kind regards, > Tom > > On Tue, Aug 28, 2018 at 4:36 PM David Turner wrote: >> >> That is the expected behavior of the ceph repo. In the past when I needed >> a specific version I would download the packages for the version to a folder >> and you can create a repo file that reads from a local directory. That's how >> I would re-install my test lab after testing an upgrade procedure to try it >> over again. >> >> On Tue, Aug 28, 2018, 1:01 AM Thomas Bennett wrote: >>> >>> Hi, >>> >>> I'm wanting to pin to an older version of Ceph Luminous (12.2.4) and I've >>> noticed that https://download.ceph.com/debian-luminous/ does not support >>> this via apt install: >>> apt install ceph works for 12.2.7 but >>> apt install ceph=12.2.4-1xenial does not work >>> >>> The deb file are there, they're just not included in the package >>> distribution. Is this the desired behaviour or a misconfiguration? >>> >>> Cheers, >>> Tom >>> >>> -- >>> Thomas Bennett >>> >>> SARAO >>> Science Data Processing >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > -- > Thomas Bennett > > SARAO > Science Data Processing > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
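David's local-directory workaround can be scripted roughly like this. A sketch, not a blessed procedure — the directory and file names are made up, and dpkg-scanpackages comes from the dpkg-dev package:

```shell
# Build a flat local apt repo from manually downloaded 12.2.4 .debs, then
# point apt at it so a pinned install like `apt install ceph=12.2.4-1xenial`
# can resolve even though download.ceph.com only serves the latest version.
make_local_repo() {
  local pkgdir=$1 listfile=$2
  ( cd "$pkgdir" && dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz ) &&
  echo "deb [trusted=yes] file:$pkgdir ./" > "$listfile"
}

# e.g.: make_local_repo /opt/ceph-12.2.4 /etc/apt/sources.list.d/ceph-local.list
#       apt update && apt install ceph=12.2.4-1xenial
```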
[ceph-users] prevent unnecessary MON leader re-election
Hi. We have a 5 node Ceph cluster, status output copied below. During our cluster resiliency tests we have noted that a MON leader election takes place when we fail one member of the MON quorum, even though the failed instance is not the current MON leader. We speculate that this re-election process may be contributing to short periods of cluster unavailability when one or more cluster instances fail. Is there a way to configure the cluster so that there is only a MON leader election if the existing MON leader fails but not when some other member of the MON quorum fails? cluster: id: f774b9b2-d514-40d9-85ab-d0389724b6c0 health: HEALTH_OK services: mon: 3 daemons, quorum dub-sitv-ceph-03,dub-sitv-ceph-04,dub-sitv-ceph-05 mgr: dub-sitv-ceph-04(active), standbys: dub-sitv-ceph-03, dub-sitv-ceph-05 mds: cephfs-1/1/1 up {0=dub-sitv-ceph-02=up:active}, 1 up:standby-replay osd: 4 osds: 4 up, 4 in data: pools: 2 pools, 200 pgs objects: 554 objects, 980 MiB usage: 7.9 GiB used, 1.9 TiB / 2.0 TiB avail pgs: 200 active+clean io: client: 1.5 MiB/s rd, 810 KiB/s wr, 286 op/s rd, 218 op/s wr William Lawton ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] New Ceph community manager: Mike Perez
On 2018-08-29T01:13:24, Sage Weil wrote: Most excellent! Welcome, Mike! I look forward to working with you. Regards, Lars -- Architect SDS, Distinguished Engineer SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) "Architects should open possibilities and not determine everything." (Ueli Zbinden) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] New Ceph community manager: Mike Perez
On 08/29/2018 02:13 AM, Sage Weil wrote: > Hi everyone, > > Please help me welcome Mike Perez, the new Ceph community manager! Very happy to have you with us! Let us know if there's anything we can help you with, and don't hesitate to get in touch :) -Joao ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] New Ceph community manager: Mike Perez
Great news. Welcome Mike! I look forward to working with you, let me know if there is anything I can help you with. Lenz On 08/29/2018 03:13 AM, Sage Weil wrote: > Please help me welcome Mike Perez, the new Ceph community manager! > > Mike has a long history with Ceph: he started at DreamHost working on > OpenStack and Ceph back in the early days, including work on the original > RBD integration. He went on to work in several roles in the OpenStack > project, doing a mix of infrastructure, cross-project and community > related initiatives, including serving as the Project Technical Lead for > Cinder. > > Mike lives in Pasadena, CA, and can be reached at mpe...@redhat.com, on > IRC as thingee, or twitter as @thingee. > > I am very excited to welcome Mike back to Ceph, and look forward to > working together on building the Ceph developer and user communities! -- SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany) GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg) signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] SSD OSDs crashing after upgrade to 12.2.7
Hi, after upgrading my ceph clusters from 12.2.5 to 12.2.7 I'm experiencing random crashes from SSD OSDs (bluestore) - it seems that HDD OSDs are not affected. I destroyed and recreated some of the SSD OSDs which seemed to help. this happens on centos 7.5 (different kernels tested) /var/log/messages: Aug 29 10:24:08 ceph-osd: *** Caught signal (Segmentation fault) ** Aug 29 10:24:08 ceph-osd: in thread 7f8a8e69e700 thread_name:bstore_kv_final Aug 29 10:24:08 kernel: traps: bstore_kv_final[187470] general protection ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in libtcmalloc.so.4.4.5[7f8a997a8000+46000] Aug 29 10:24:08 systemd: ceph-osd@2.service: main process exited, code=killed, status=11/SEGV Aug 29 10:24:08 systemd: Unit ceph-osd@2.service entered failed state. Aug 29 10:24:08 systemd: ceph-osd@2.service failed. Aug 29 10:24:28 systemd: ceph-osd@2.service holdoff time over, scheduling restart. Aug 29 10:24:28 systemd: Starting Ceph object storage daemon osd.2... Aug 29 10:24:28 systemd: Started Ceph object storage daemon osd.2. Aug 29 10:24:28 ceph-osd: starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal Aug 29 10:24:35 ceph-osd: *** Caught signal (Segmentation fault) ** Aug 29 10:24:35 ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp Aug 29 10:24:35 kernel: traps: tp_osd_tp[186933] general protection ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in libtcmalloc.so.4.4.5[7f5f430cd000+46000] Aug 29 10:24:35 systemd: ceph-osd@0.service: main process exited, code=killed, status=11/SEGV Aug 29 10:24:35 systemd: Unit ceph-osd@0.service entered failed state. Aug 29 10:24:35 systemd: ceph-osd@0.service failed. did I hit a known issue? any suggestions are highly appreciated br wolfgang signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph cluster "hung" after node failure
Hi All. I have a ceph cluster that's partially upgraded to Luminous. Last night a host died and since then the cluster is failing to recover. It finished backfilling, but was left with thousands of requests degraded, inactive, or stale. In order to move past the issue, I put the cluster in noout,noscrub,nodeep-scrub and restarted all services one by one. Here is the current state of the cluster, any idea how to get past the stale and stuck pgs? Any help would be very appreciated. Thanks. -Brett ## ceph -s output ### $ sudo ceph -s cluster: id: health: HEALTH_ERR 165 pgs are stuck inactive for more than 60 seconds 243 pgs backfill_wait 144 pgs backfilling 332 pgs degraded 5 pgs peering 1 pgs recovery_wait 22 pgs stale 332 pgs stuck degraded 143 pgs stuck inactive 22 pgs stuck stale 531 pgs stuck unclean 330 pgs stuck undersized 330 pgs undersized 671 requests are blocked > 32 sec 603 requests are blocked > 4096 sec recovery 3524906/412016682 objects degraded (0.856%) recovery 2462252/412016682 objects misplaced (0.598%) noout,noscrub,nodeep-scrub flag(s) set mon.ceph0rdi-mon1-1-prd store is getting too big! 17612 MB >= 15360 MB mon.ceph0rdi-mon2-1-prd store is getting too big! 17669 MB >= 15360 MB mon.ceph0rdi-mon3-1-prd store is getting too big! 
17586 MB >= 15360 MB services: mon: 3 daemons, quorum ceph0rdi-mon1-1-prd,ceph0rdi-mon2-1-prd,ceph0rdi-mon3-1-prd mgr: ceph0rdi-mon3-1-prd(active), standbys: ceph0rdi-mon2-1-prd, ceph0rdi-mon1-1-prd osd: 222 osds: 218 up, 218 in; 428 remapped pgs flags noout,noscrub,nodeep-scrub data: pools: 35 pools, 38144 pgs objects: 130M objects, 172 TB usage: 538 TB used, 337 TB / 875 TB avail pgs: 0.375% pgs not active 3524906/412016682 objects degraded (0.856%) 2462252/412016682 objects misplaced (0.598%) 37599 active+clean 173 active+undersized+degraded+remapped+backfill_wait 133 active+undersized+degraded+remapped+backfilling 93 activating 68 active+remapped+backfill_wait 22 activating+undersized+degraded+remapped 13 stale+active+clean 11 active+remapped+backfilling 9 activating+remapped 5 remapped 5 stale+activating+remapped 3 remapped+peering 2 stale+remapped 2 stale+remapped+peering 1 activating+degraded+remapped 1 active+clean+remapped 1 active+degraded+remapped+backfill_wait 1 active+undersized+remapped+backfill_wait 1 activating+degraded 1 active+recovery_wait+undersized+degraded+remapped io: client: 187 kB/s rd, 2595 kB/s wr, 99 op/s rd, 343 op/s wr recovery: 1509 MB/s, 1541 objects/s ## ceph pg dump_stuck stale (this number doesn't seem to decrease) $ sudo ceph pg dump_stuck stale ok PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY 17.6d7 stale+remapped [5,223,96] 5 [223,96,148] 223 2.5c5 stale+active+clean [224,48,179] 224 [224,48,179] 224 17.64e stale+active+clean [224,84,109] 224 [224,84,109] 224 19.5b4 stale+activating+remapped [124,130,20] 124 [124,20,11] 124 17.4c6 stale+active+clean [224,216,95] 224 [224,216,95] 224 73.413 stale+activating+remapped [117,130,189] 117 [117,189,137] 117 2.431 stale+remapped+peering [5,180,142] 5 [180,142,40] 180 69.1dc stale+active+clean [62,36,54] 62 [62,36,54] 62 14.790 stale+active+clean [81,114,19] 81 [81,114,19] 81 2.78e stale+active+clean [224,143,124] 224 [224,143,124] 224 73.37a stale+active+clean [224,84,38] 224 [224,84,38] 224
17.42d stale+activating+remapped [220,130,25] 220 [220,25,137] 220 72.263 stale+active+clean [224,148,117] 224 [224,148,117] 224 67.40 stale+active+clean [62,170,71] 62 [62,170,71] 62 67.16d stale+remapped+peering [3,147,22] 3 [147,22,29] 147 20.3de stale+active+clean [224,103,126] 224 [224,103,126] 224 19.721 stale+remapped [3,34,179] 3 [34,179,128] 34 19.2f1 stale+activating+remapped [126,130,178] 126 [126,178,72] 126 74.28b stale+active+clean [224,95,56] 224
Re: [ceph-users] SAN or DAS for Production ceph
Thanks, Tom and John, both of your inputs were really helpful and put things into perspective. Much appreciated.

@John, I am based out of Dubai.

On Wed, Aug 29, 2018 at 2:06 AM John Hearns wrote:
> James, you also use the words enterprise and production ready.
> Is Redhat support important to you?
>
> On Tue, 28 Aug 2018 at 23:56, John Hearns wrote:
>
>> James, well for a start don't use a SAN. I speak as someone who managed a
>> SAN with Brocade switches and multipathing for an F1 team. Ceph is Software
>> Defined Storage. You want discrete storage servers with a high-bandwidth
>> Ethernet (or maybe Infiniband) fabric.
>>
>> Fibrechannel still has its place here though, if you want servers with FC
>> attached JBODs.
>>
>> Also, you ask about the choice between spinning disks, SSDs and NVMe
>> drives. Think about the COST for your petabyte archive.
>> True, these days you can argue that all-SSD could be comparable to
>> spinning disks. But NVMe? Yes, you get the best performance... but do you
>> really want all that video data on $$$ NVMe? You need tiering.
>>
>> Also don't forget low and slow archive tiers - shingled archive disks and
>> perhaps tape.
>>
>> Me, I would start from the building blocks of Supermicro 36-bay storage
>> servers. Fill them with 12 TByte helium drives.
>> Two slots in the back for SSDs for your journaling.
>> For a higher-performance tier, look at the 'double double' storage
>> servers from Supermicro. Or, even nicer, the new 'ruler' form factor servers.
>> For a higher-density archiving tier, the 90-bay Supermicro servers.
>>
>> Please get in touch with someone for advice. If you are in the UK I am
>> happy to help and point you in the right direction.
>>
>> On Tue, 28 Aug 2018 at 21:05, James Watson wrote:
>>
>>> Dear cephers,
>>>
>>> I am new to the storage domain.
>>> Trying to get my head around the enterprise, production-ready setup.
>>>
>>> The following article helps a lot here (Yahoo's Ceph implementation):
>>> https://yahooeng.tumblr.com/tagged/object-storage
>>>
>>> But a couple of questions:
>>>
>>> What HDDs would they have used here? NVMe / SATA / SAS etc. (with just 52
>>> storage nodes they got 3.2 PB of capacity!!)
>>> I tried to calculate a similar setup with the HGST Ultrastar He12 (12 TB,
>>> and more recent) and would need 86 HDDs, which adds up to only 1 PB!!
>>>
>>> How are the HDDs attached? Is it DAS, or a SAN (using Fibre Channel
>>> switches, host bus adapters, etc.)?
>>>
>>> Do we need a proprietary hashing algorithm to implement a multi-cluster
>>> setup of Ceph, to contain CPU/memory usage within each cluster when
>>> rebuilding happens during a device failure?
>>>
>>> If a proprietary hashing algorithm is required to set up multi-cluster
>>> Ceph using a load balancer, what could be an alternative setup to
>>> address the same issue?
>>>
>>> The aim is to design a similar architecture but with upgraded products
>>> and higher performance. Any suggestions or thoughts are welcome.
>>>
>>> Thanks in advance
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
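For what it's worth, the capacity arithmetic quoted above can be checked directly (the node count, drive size and totals are taken from the message; the per-node figure is my own derivation):

```shell
# Yahoo: 52 storage nodes, 3.2 PB total.
# Proposed: 86 x 12 TB HGST Ultrastar He12 drives.
awk 'BEGIN {
    printf "86 x 12 TB = %.2f PB\n", 86 * 12 / 1000    # ~1 PB, as noted
    printf "Yahoo/node ~ %.0f TB\n", 3.2 * 1000 / 52   # ~62 TB per node
}'
```

The ~62 TB/node figure suggests Yahoo packed many drives per chassis, which is why 52 nodes reach 3.2 PB.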
Re: [ceph-users] Installing ceph 12.2.4 via Ubuntu apt
Hi David,

Thanks for your reply. That's how I'm currently handling it.

Kind regards,
Tom

On Tue, Aug 28, 2018 at 4:36 PM David Turner wrote:
> That is the expected behavior of the ceph repo. In the past, when I needed
> a specific version, I would download the packages for that version to a
> folder, and you can create a repo file that reads from a local directory.
> That's how I would re-install my test lab after testing an upgrade
> procedure, to try it over again.
>
> On Tue, Aug 28, 2018, 1:01 AM Thomas Bennett wrote:
>
>> Hi,
>>
>> I want to pin to an older version of Ceph Luminous (12.2.4), and I've
>> noticed that https://download.ceph.com/debian-luminous/ does not support
>> this via apt install:
>> apt install ceph works for 12.2.7, but
>> apt install ceph=12.2.4-1xenial does not work.
>>
>> The deb files are there; they're just not included in the package
>> index. Is this the desired behaviour or a misconfiguration?
>>
>> Cheers,
>> Tom
>>
>> --
>> Thomas Bennett
>>
>> SARAO
>> Science Data Processing
>
--
Thomas Bennett
SARAO
Science Data Processing
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
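A rough sketch of the local-repo approach David describes (the directory, package names and pool URL below are illustrative assumptions, not taken from the thread; `dpkg-scanpackages` comes from the dpkg-dev package):

```shell
# Mirror the 12.2.4 debs locally, since the upstream Packages index
# only advertises the latest point release.
mkdir -p /opt/ceph-12.2.4 && cd /opt/ceph-12.2.4

# Fetch the 12.2.4 .deb files by hand from the repo's pool directory
# (they remain published even after the index moves on), e.g.:
#   wget https://download.ceph.com/debian-luminous/pool/main/c/ceph/ceph_12.2.4-1xenial_amd64.deb
#   ... plus the other ceph packages you need ...

# Build a Packages index over the directory.
dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz

# Point apt at the local directory, then install the pinned version.
echo 'deb [trusted=yes] file:/opt/ceph-12.2.4 ./' > /etc/apt/sources.list.d/ceph-local.list
apt update
apt install ceph=12.2.4-1xenial
```

This is a config/setup fragment rather than something runnable in isolation; adjust paths and package lists to your environment.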