Re: [ceph-users] low power single disk nodes
Rather expensive option: Applied Micro X-Gene. Overkill for a single disk, and only really available in a development-kit format right now. https://www.apm.com/products/data-center/x-gene-family/x-c1-development-kits/

Better option: Ambedded CY7 - 7 nodes in 1U half depth, 6 positions for SATA disks, and one node with an mSATA SSD. http://www.ambedded.com.tw/pt_list.php?CM_ID=20140214001

--phil

On 09 April 2015 at 15:57 Quentin Hartman qhart...@direwolfdigital.com wrote:

I'm skeptical about how well this would work, but a Banana Pi might be a place to start. Like a Raspberry Pi, but it has a SATA connector: http://www.bananapi.org/

On Thu, Apr 9, 2015 at 3:18 AM, Jerker Nyberg jer...@update.uu.se wrote:

Hello ceph users,

Is anyone running any low-powered single-disk nodes with Ceph now? Calxeda seems to be no more, according to Wikipedia. I do not think HP Moonshot is what I am looking for - I want stand-alone nodes, not server cartridges integrated into a server chassis. And I do not want to be locked in to a single vendor.

I was playing with a Raspberry Pi 2 for signage when I thought of my old experiments with Ceph. I am thinking of, for example, the Odroid-C1 or Odroid-XU3 Lite, or maybe something with a low-power Intel x86/x64 processor. Together with one SSD or one low-power HDD, the node could get all its power via PoE (via a splitter, or integrated into the board if such boards exist). PoE provides remote power-on/power-off even for consumer-grade nodes.

The cost for a single low-power node should be able to compete with a traditional PC-server's price per disk. Ceph takes care of redundancy. I think simple custom casing should be good enough - maybe just strap or velcro everything onto trays in the rack, at least for the nodes with SSDs.

Kind regards,
--
Jerker Nyberg, Uppsala, Sweden.
Re: [ceph-users] low power single disk nodes
Notice that this is under their emerging-technologies section. I don't think you can buy them yet. Hopefully we'll know more as time goes on. :)

Mark

On 04/09/2015 10:52 AM, Stillwell, Bryan wrote:

These are really interesting to me, but how can you buy them? What's the performance like in Ceph? Are they using the keyvaluestore backend, or something specific to these drives? Also, what kind of chassis do they go into (some kind of Ethernet JBOD)?

Bryan

On 4/9/15, 9:43 AM, Mark Nelson mnel...@redhat.com wrote:

How about drives that run Linux with an ARM processor, RAM, and an Ethernet port right on the drive? Notice the Ceph logo. :) https://www.hgst.com/science-of-storage/emerging-technologies/open-ethernet-drive-architecture

Mark

On 04/09/2015 10:37 AM, Scott Laird wrote: Minnowboard Max? 2 Atom cores, 1 SATA port, and a real (non-USB) Ethernet port. [snip - remainder of quoted thread appears in full above]
Re: [ceph-users] low power single disk nodes
How about drives that run Linux with an ARM processor, RAM, and an Ethernet port right on the drive? Notice the Ceph logo. :) https://www.hgst.com/science-of-storage/emerging-technologies/open-ethernet-drive-architecture

Mark

On 04/09/2015 10:37 AM, Scott Laird wrote:

Minnowboard Max? 2 Atom cores, 1 SATA port, and a real (non-USB) Ethernet port. [snip - remainder of quoted thread appears in full above]
Re: [ceph-users] long blocking with writes on rbds
On Wed, Apr 8, 2015 at 7:36 PM, Lionel Bouton lionel+c...@bouton.name wrote:

On 04/08/15 18:24, Jeff Epstein wrote:

Hi, I'm having sporadic, very poor performance running Ceph. Right now mkfs, even with nodiscard, takes 30 minutes or more. These kinds of delays happen often but irregularly. There seems to be no common denominator. Clearly, however, they make it impossible to deploy Ceph in production. I reported this problem earlier on Ceph's IRC and was told to add nodiscard to mkfs. That didn't help. Here is the command that I'm using to format an rbd:

    mkfs.ext4 -t ext4 -m0 -b4096 -E nodiscard /dev/rbd1

I probably won't be able to help much, but people knowing more will need at least:
- your Ceph version,
- the kernel version of the host on which you are trying to format /dev/rbd1,
- which hardware and network you are using for this cluster (CPU, RAM, HDD or SSD models, network cards, jumbo frames, ...).

Ceph says everything is okay:

    cluster e96e10d3-ad2b-467f-9fe4-ab5269b70206
     health HEALTH_OK
     monmap e1: 3 mons at {a=192.168.224.4:6789/0,b=192.168.232.4:6789/0,c=192.168.240.4:6789/0}, election epoch 12, quorum 0,1,2 a,b,c
     osdmap e972: 6 osds: 6 up, 6 in
     pgmap v4821: 4400 pgs, 44 pools, 5157 MB data, 1654 objects
           46138 MB used, 1459 GB / 1504 GB avail
           4400 active+clean

Are there any slow request warnings in the logs? Assuming a 30-minute mkfs is somewhat reproducible, can you bump osd and ms log levels and try to capture it?

Thanks,
Ilya
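As a rough sketch of what "bump osd and ms log levels" can look like at runtime (injectargs applies to running daemons without a restart; the osd.* target hits every OSD):

    # raise OSD and messenger debug levels while reproducing the slow mkfs
    ceph tell osd.* injectargs '--debug-osd 20 --debug-ms 1'
    # watch for slow/blocked request warnings while it runs
    ceph health detail
    # restore the defaults afterwards
    ceph tell osd.* injectargs '--debug-osd 0/5 --debug-ms 0/5'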
[ceph-users] Motherboard recommendation?
Hi,

I have a backup storage with Ceph 0.93. As with every backup system, it is only ever written to and hopefully never read. The hardware is 3 Supermicro SC847 cases with 30 SATA HDDs each (2- and 4-TB WD disks) = 250TB. I have realized that the motherboards and CPUs are totally undersized, so I want to install new boards. I'm thinking of the following: 3 Supermicro X10DRH-CT or X10DRC-T4+ with 128GB memory each.

What do you think about these boards? Will they fit into the SC847? They have SAS and 10GBase-T onboard, so no extra controller seems to be necessary. What Xeon v3 should I take, and how many cores? Does anyone know if M.2 SSDs are supported in their PCIe slots?

Thank you very much,
Markus
--
Markus Goldberg, Universität Hildesheim, Rechenzentrum
Universitätsplatz 1, D-31141 Hildesheim, Germany
Tel +49 5121 88392822, Fax +49 5121 88392823
email goldb...@uni-hildesheim.de
Re: [ceph-users] Motherboard recommendation?
Hi Mohamed,

thank you for your reply. I thought there is a SAS expander on the backplanes of the SC847, so all drives can be run. Am I wrong?

thanks,
Markus

Am 09.04.2015 um 10:24 schrieb Mohamed Pakkeer:

Hi Markus,

The X10DRH-CT can support only 16 drives by default. If you want to connect more drives, there is a special SKU with more-drive support from Supermicro, or you need an additional SAS controller. We are using two 2630 v3 (8-core, 2.4GHz) for 30 drives on an SM X10DRI-T. It is working perfectly on a replication-based cluster. If you are planning to use erasure coding, you have to think about a higher spec. Does anyone know the exact processor requirement of a 30-drive node for erasure coding? I can't find a suitable hardware recommendation for erasure coding.

Cheers
K. Mohamed Pakkeer

On Thu, Apr 9, 2015 at 1:30 PM, Markus Goldberg goldb...@uni-hildesheim.de wrote: [snip - original message quoted in full above]
Re: [ceph-users] SSD Hardware recommendation
Hi all,

just an update - but an important one - of the previous benchmark, with 2 new 10-DWPD-class contenders:
- Seagate 1200 - ST200FM0053 - SAS 12Gb/s
- Intel DC S3700 - SATA 6Gb/s

The graph: http://www.4shared.com/download/yaeJgJiFce/Perf-SSDs-Toshiba-Seagate-Inte.png?lgfp=3000

It speaks for itself: the Seagate is clearly a massive improvement over our best SSD so far (the Toshiba M2). That's a 430MB/s write bandwidth reached with blocks as small as 4KB, written with the SYNC and DIRECT flags. This was somewhat expected after reading this review: http://www.tweaktown.com/reviews/6075/seagate-1200-stx00fm-12gb-s-sas-enterprise-ssd-review/index.html

An impressive result that should make the Seagate an SSD of choice for journals on hosts with SAS controllers. I also had access to an Intel DC S3700, an unavoidable reference as a Ceph journal. Indeed not bad on 4k blocks for the price. The benchmarks were made on a Dell R730xd with an H730P SAS controller (LSI 3108 12Gb/s SAS).

Frederic

f...@univ-lr.fr a écrit le 31/03/15 14:09 :

Hi,

in our quest to get the right SSD for OSD journals, I managed to benchmark two kinds of 10-DWPD SSDs:
- Toshiba M2 PX02SMF020
- Samsung 845DC PRO

I want to determine whether a disk is appropriate considering its absolute performance, and the optimal number of ceph-osd processes using the SSD as a journal. The benchmark consists of a fio command with the SYNC and DIRECT access options and 4k-block write accesses:

    fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --runtime=60 --time_based --group_reporting --name=journal-test --iodepth=1 or 16 --numjobs= ranging from 1 to 16

I think numjobs can represent the number of concurrent OSDs served by this SSD. Am I right on this?

http://www.4shared.com/download/WOvooKVXce/Fio-Direct-Sync-ToshibaM2-Sams.png?lgfp=3000

My understanding of the data is that the 845DC Pro cannot be used for more than 4 OSDs. The M2 is very consistent in its behavior. The iodepth has almost no impact on performance here. Could someone with other SSD types run the same test to consolidate the data? Among the shortlist that could be considered for this task (for their price/performance/DWPD/...):
- Seagate 1200 SSD 200GB, SAS 12Gb/s, ST200FM0053
- Hitachi SSD800MM MLC HUSMM8020ASS200
- Intel DC S3700

I've not yet considered the write amplification mentioned in other posts.

Frederic

Josef Johansson jose...@gmail.com a écrit le 20/03/15 10:29 :

The 845DC Pro does look really nice, comparable with the S3700 even in TBW. The price is what really does it, as it's almost a third compared with the S3700.
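For anyone wanting to reproduce the sweep, a minimal sketch of the benchmark loop as described above - the device name is an assumption, and note this writes directly to the raw device, destroying its contents:

    #!/bin/bash
    # 4k sync+direct sequential writes, numjobs swept from 1 to 16
    # (the posts above also repeat the sweep with --iodepth=16)
    DEV=/dev/sda
    for jobs in 1 2 4 8 16; do
        fio --filename=$DEV --direct=1 --sync=1 --rw=write --bs=4k \
            --runtime=60 --time_based --group_reporting \
            --name=journal-test --iodepth=1 --numjobs=$jobs
    done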
[ceph-users] protocol feature mismatch after upgrading to Hammer
I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly repeating this message:

2015-04-09 08:50:26.318042 7f95dbf86700 0 -- 10.5.38.1:0/2037478 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1 c=0x7f95e0023670).connect protocol feature mismatch, my 3fff peer 13fff missing 1

It isn't always the same IP for the destination - here's another:

2015-04-09 08:50:20.322059 7f95dc087700 0 -- 10.5.38.1:0/2037478 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1 c=0x7f95e002b480).connect protocol feature mismatch, my 3fff peer 13fff missing 1

Some details about our install: We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an erasure-coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts are mds. We are running cephfs with a client trying to write data over cephfs when we're seeing these messages.

Any ideas?
Re: [ceph-users] MDS unmatched rstat after upgrade hammer
Alright, sounds good. Only one comment then: from an IT/ops perspective, all I see is ERR, and that raises red flags. So the exposure of the message might need some tweaking. In production I like to be notified of an issue but have reassurance it was fixed within the system.

Best Regards

On Wed, Apr 8, 2015 at 8:10 PM Yan, Zheng uker...@gmail.com wrote:

On Thu, Apr 9, 2015 at 7:09 AM, Scottix scot...@gmail.com wrote:

I was testing the upgrade on our dev environment, and after I restarted the mds I got the following errors:

2015-04-08 15:58:34.056470 mds.0 [ERR] unmatched rstat on 605, inode has n(v70 rc2015-03-16 09:11:34.390905), dirfrags have n(v0 rc2015-03-16 09:11:34.390905 1=0+1)
2015-04-08 15:58:34.056530 mds.0 [ERR] unmatched rstat on 604, inode has n(v69 rc2015-03-31 08:07:09.265241), dirfrags have n(v0 rc2015-03-31 08:07:09.265241 1=0+1)
2015-04-08 15:58:34.056581 mds.0 [ERR] unmatched rstat on 606, inode has n(v67 rc2015-03-16 08:54:36.314790), dirfrags have n(v0 rc2015-03-16 08:54:36.314790 1=0+1)
2015-04-08 15:58:34.056633 mds.0 [ERR] unmatched rstat on 607, inode has n(v57 rc2015-03-16 08:54:46.797240), dirfrags have n(v0 rc2015-03-16 08:54:46.797240 1=0+1)
2015-04-08 15:58:34.056687 mds.0 [ERR] unmatched rstat on 608, inode has n(v23 rc2015-03-16 08:54:59.634299), dirfrags have n(v0 rc2015-03-16 08:54:59.634299 1=0+1)
2015-04-08 15:58:34.056737 mds.0 [ERR] unmatched rstat on 609, inode has n(v62 rc2015-03-16 08:55:06.598286), dirfrags have n(v0 rc2015-03-16 08:55:06.598286 1=0+1)
2015-04-08 15:58:34.056789 mds.0 [ERR] unmatched rstat on 600, inode has n(v101 rc2015-03-16 08:55:16.153175), dirfrags have n(v0 rc2015-03-16 08:55:16.153175 1=0+1)

These errors are likely caused by the bug that rstats are not set to correct values when creating a new fs. Nothing to worry about; the MDS automatically fixes rstat errors.

I am not sure if this is an issue, or got fixed, or something I should worry about. But I would just like some context around it, since it came up in ceph -w and other users might see it as well. I have done a lot of unsafe stuff on this mds, so not to freak anyone out if that is the issue.
Re: [ceph-users] low power single disk nodes
Minnowboard Max? 2 Atom cores, 1 SATA port, and a real (non-USB) Ethernet port.

On Thu, Apr 9, 2015, 8:03 AM p...@philw.com wrote:

Rather expensive option: Applied Micro X-Gene. Overkill for a single disk, and only really available in a development-kit format right now. https://www.apm.com/products/data-center/x-gene-family/x-c1-development-kits/

Better option: Ambedded CY7 - 7 nodes in 1U half depth, 6 positions for SATA disks, and one node with an mSATA SSD. http://www.ambedded.com.tw/pt_list.php?CM_ID=20140214001

--phil [snip - remainder of quoted thread appears in full above]
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
Can you dump your crush map and post it on pastebin or something?

On Thu, Apr 9, 2015 at 7:26 AM, Kyle Hutson kylehut...@ksu.edu wrote:

Nope - it's 64-bit. (Sorry, I missed the reply-all last time.)

On Thu, Apr 9, 2015 at 9:24 AM, Gregory Farnum g...@gregs42.com wrote:

[Re-added the list] Hmm, I'm checking the code and that shouldn't be possible. What's your client? (In particular, is it 32-bit? That's the only thing I can think of that might have slipped through our QA.)

On Thu, Apr 9, 2015 at 7:17 AM, Kyle Hutson kylehut...@ksu.edu wrote:

I did nothing to enable anything else. Just changed my ceph repo from 'giant' to 'hammer', then did 'yum update' and restarted services.

On Thu, Apr 9, 2015 at 9:15 AM, Gregory Farnum g...@gregs42.com wrote:

Did you enable the straw2 stuff? CRUSH_V4 shouldn't be required by the cluster unless you made changes to the layout requiring it. If you did, the clients have to be upgraded to understand it. You could disable all the v4 features; that should let them connect again.
-Greg

On Thu, Apr 9, 2015 at 7:07 AM, Kyle Hutson kylehut...@ksu.edu wrote:

This particular problem I just figured out myself ('ceph -w' was still running from before the upgrade, and ctrl-c and restarting solved that issue), but I'm still having a similar problem on the ceph client:

libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca server's 102b84a042aca, missing 1

It appears that even the latest kernel doesn't have support for CEPH_FEATURE_CRUSH_V4. How do I make my ceph cluster backward-compatible with the old cephfs client? [snip - original post quoted in full above]
Re: [ceph-users] Motherboard recommendation?
Hi Markus,

I think if you connect more than 16 drives on the backplane, the X10DRH-CT will detect and show only 16 drives in the BIOS. I am not sure about that. If you test this motherboard, please let me know the result.

Message from the Supermicro site: LSI 3108 SAS3 (12Gbps) controller; 2GB cache; HW RAID 0, 1, 5, 6, 10, 50, 60; supports up to 16 devices as default, more HDD device support is also available as an option. * For special SKU, please contact your Supermicro sales.

Thanks
K. Mohamed Pakkeer

On Thu, Apr 9, 2015 at 5:05 PM, Markus Goldberg goldb...@uni-hildesheim.de wrote: [snip - earlier messages quoted in full above]
Re: [ceph-users] Recovering incomplete PGs with ceph_objectstore_tool
Congrats Chris, and nice save on that RBD!
-- Paul

On Apr 9, 2015, at 11:11 AM, Chris Kitzmiller ckitzmil...@hampshire.edu wrote:

Success! Hopefully my notes from the process will help:

In the event of multiple disk failures the cluster could lose PGs. Should this occur, it is best to attempt to restart the OSD process and have the drive marked as up+out. Marking the drive as out will cause data to flow off the drive to elsewhere in the cluster. In the event that the ceph-osd process is unable to keep running, you could try using the ceph_objectstore_tool program to extract just the damaged PGs and import them into working PGs.

Fixing Journals

In this particular scenario things were complicated by the fact that ceph_objectstore_tool came out in Giant but we were running Firefly. Not wanting to upgrade the cluster in a degraded state, this required that the OSD drives be moved to a different physical machine for repair. This added a lot of steps related to the journals, but it wasn't a big deal. That process looks like this, on Storage1:

    stop ceph-osd id=15
    ceph-osd -i 15 --flush-journal
    ls -l /var/lib/ceph/osd/ceph-15/journal

Note the journal device UUID, then pull the disk and move it to Ithome:

    rm /var/lib/ceph/osd/ceph-15/journal
    ceph-osd -i 15 --mkjournal

That creates a colocated journal to use during the ceph_objectstore_tool commands. Once done, then:

    ceph-osd -i 15 --flush-journal
    rm /var/lib/ceph/osd/ceph-15/journal

Pull the disk and bring it back to Storage1. Then:

    ln -s /dev/disk/by-partuuid/b4f8d911-5ac9-4bf0-a06a-b8492e25a00f /var/lib/ceph/osd/ceph-15/journal
    ceph-osd -i 15 --mkjournal
    start ceph-osd id=15

This all won't be needed once the cluster is running Hammer, because then there will be an available version of ceph_objectstore_tool on the local machine and you can keep the journals throughout the process.

Recovery Process

We were missing two PGs, 3.c7 and 3.102. These PGs were hosted on OSD.0 and OSD.15, the two disks which failed out of Storage1. The disk for OSD.0 seemed to be a total loss, while the disk for OSD.15 was somewhat more cooperative, but not in a state to be up and running in the cluster. I took the dying OSD.15 drive and placed it into a new physical machine with a fresh install of Ceph Giant. Using Giant's ceph_objectstore_tool I was able to extract the PGs with a command like:

    for i in 3.c7 3.102 ; do ceph_objectstore_tool --data /var/lib/ceph/osd/ceph-15 --journal /var/lib/ceph/osd/ceph-15/journal --op export --pgid $i --file ~/${i}.export ; done

Once both PGs were successfully exported, I attempted to import them into a new temporary OSD, following instructions from here. For some reason that didn't work: the OSD was up+in but wasn't backfilling the PGs into the cluster. If you find yourself in this process I would try that first, just in case it provides a cleaner process.
Considering the above didn't work, and we were looking at the possibility of losing the RBD volume (or perhaps worse, the potential of fruitlessly fscking 35TB), I took what I might describe as heroic measures.

Running ceph pg dump | grep incomplete showed:

    3.c7  0 0 0 0 0 0 0 incomplete 2015-04-02 20:49:32.968841 0'0 15730:17 [15,0] 15 [15,0] 15 13985'54076 2015-03-31 19:14:22.721695 13985'54076 2015-03-31 19:14:22.721695
    3.102 0 0 0 0 0 0 0 incomplete 2015-04-02 20:49:32.529594 0'0 15730:21 [0,15] 0 [0,15] 0 13985'53107 2015-03-29 21:17:15.568125 13985'49195 2015-03-24 18:38:08.244769

Then I stopped all OSDs, which blocked all I/O to the cluster, with:

    stop ceph-osd-all

Then I looked for all copies of the PGs on all OSDs with:

    for i in 3.c7 3.102 ; do find /var/lib/ceph/osd/ -maxdepth 3 -type d -name $i ; done | sort -V
    /var/lib/ceph/osd/ceph-0/current/3.c7_head
    /var/lib/ceph/osd/ceph-0/current/3.102_head
    /var/lib/ceph/osd/ceph-3/current/3.c7_head
    /var/lib/ceph/osd/ceph-13/current/3.102_head
    /var/lib/ceph/osd/ceph-15/current/3.c7_head
    /var/lib/ceph/osd/ceph-15/current/3.102_head

Then I flushed the journals for all of those OSDs with:

    for i in 0 3 13 15 ; do ceph-osd -i $i --flush-journal ; done

Then I removed all of those drives and moved them (using Journal Fixing above) to Ithome, where I used ceph_objectstore_tool to remove all traces of 3.102 and 3.c7:

    for i in 0 3 13 15 ; do for j in 3.c7 3.102 ; do ceph_objectstore_tool --data /var/lib/ceph/osd/ceph-$i --journal /var/lib/ceph/osd/ceph-$i/journal --op remove --pgid $j ; done ; done

Then I imported the PGs onto OSD.0 and OSD.15 with:

    for i in 0 15 ; do for j in 3.c7 3.102 ; do ceph_objectstore_tool --data /var/lib/ceph/osd/ceph-$i --journal /var/lib/ceph/osd/ceph-$i/journal --op import --file ~/${j}.export ; done ; done
    for i in 0 15 ; do ceph-osd -i $i --flush-journal ; rm /var/lib/ceph/osd/ceph-$i/journal ; done

Then I moved the disks back
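One step the notes gloss over is verifying which OSD actually holds the most complete copy of a PG before exporting; a hedged sketch of how one might check (pg query works while the monitors are up; the --op info call assumes the Giant-era ceph_objectstore_tool supports it, and the OSD must be stopped first):

    ceph pg 3.c7 query | less      # peering info, e.g. the probing_osds list
    ceph pg map 3.c7               # current up/acting sets
    # on each candidate disk, compare PG metadata such as last_update:
    ceph_objectstore_tool --data /var/lib/ceph/osd/ceph-15 \
        --journal /var/lib/ceph/osd/ceph-15/journal \
        --op info --pgid 3.c7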
Re: [ceph-users] cache-tier do not evict
Hi,

ceph version 0.87.1

thanks
best regards

-----Original message-----
From: Chu Duc Minh chu.ducm...@gmail.com
Sent: Thursday 9th April 2015 15:03
To: Patrik Plank pat...@plank.me
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] cache-tier do not evict

What ceph version do you use?

Regards,

On 9 Apr 2015 18:58, Patrik Plank pat...@plank.me wrote: [snip - original message quoted in full below]
Re: [ceph-users] OSDs not coming up on one host
On Wed, Apr 08, 2015 at 03:42:29PM +0000, Gregory Farnum wrote:

I'm on my phone so I can't check exactly what those threads are trying to do, but the osd has several threads which are stuck. The FileStore threads are certainly trying to access the disk/local filesystem. You may not have a hardware fault, but it looks like something in your stack is not behaving when the osd asks the filesystem to do something. Check dmesg, etc.
-Greg

Noticed a bit in dmesg that seems to be controller-related (HP Smart Array P420i) where I/O was hanging in some cases [1]; fixed by updating the firmware from 5.42 to 6.00.

[1] http://h20564.www2.hp.com/hpsc/doc/public/display?docId=emr_na-c03555882

In dmesg:

[11775.779477] hpsa :08:00.0: ABORT REQUEST on C1:B0:T0:L0 Tag:0x:0010 Command:0x2a SN:0x49fb REQUEST SUCCEEDED.
[11812.170350] hpsa :08:00.0: Abort request on C1:B0:T0:L0
[11817.386773] hpsa :08:00.0: cp 880522bff000 is reported invalid (probably means target device no longer present)
[11817.386784] hpsa :08:00.0: ABORT REQUEST on C1:B0:T0:L0 Tag:0x:0010 Command:0x2a SN:0x4a13 REQUEST SUCCEEDED.

The problem still appears to be persisting in the cluster. Although I am no longer seeing the disk-related errors in dmesg, I am still getting errors in the osd logs:

2015-04-08 17:24:15.024820 7f0f21e9f700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f21e9f700' had timed out after 4
2015-04-08 17:24:15.025043 7f0f2169e700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f2169e700' had timed out after 4
2015-04-08 17:48:33.146399 7f0f21e9f700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f21e9f700' had timed out after 4
2015-04-08 17:48:33.146439 7f0f2169e700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f2169e700' had timed out after 4
2015-04-08 18:55:31.107727 7f0f16740700 1 heartbeat_map reset_timeout 'OSD::recovery_tp thread 0x7f0f16740700' had timed out after 4
2015-04-08 18:55:31.107774 7f0f2169e700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f2169e700' had timed out after 4
2015-04-08 18:55:31.107789 7f0f21e9f700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f21e9f700' had timed out after 4
2015-04-08 18:55:31.108225 7f0f29eaf700 1 heartbeat_map is_healthy 'OSD::command_tp thread 0x7f0f1573e700' had timed out after 4
2015-04-08 18:55:31.108268 7f0f15f3f700 1 heartbeat_map reset_timeout 'OSD::disk_tp thread 0x7f0f15f3f700' had timed out after 4
2015-04-08 18:55:31.108272 7f0f29eaf700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f0f17742700' had timed out after 4
2015-04-08 18:55:31.108281 7f0f29eaf700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f0f16f41700' had timed out after 4
2015-04-08 18:55:31.108285 7f0f1573e700 1 heartbeat_map reset_timeout 'OSD::command_tp thread 0x7f0f1573e700' had timed out after 4
2015-04-08 18:55:31.108345 7f0f16f41700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f0f16f41700' had timed out after 4
2015-04-08 18:55:31.108378 7f0f17742700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f0f17742700' had timed out after 4
2015-04-08 19:01:20.694897 7f0f15f3f700 1 heartbeat_map reset_timeout 'OSD::disk_tp thread 0x7f0f15f3f700' had timed out after 4
2015-04-08 19:01:20.694928 7f0f17742700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f0f17742700' had timed out after 4
2015-04-08 19:01:20.694970 7f0f16f41700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f0f16f41700' had timed out after 4
2015-04-08 19:01:20.695544 7f0f1573e700 1 heartbeat_map reset_timeout 'OSD::command_tp thread 0x7f0f1573e700' had timed out after 4
2015-04-08 19:01:20.695665 7f0f16740700 1 heartbeat_map reset_timeout 'OSD::recovery_tp thread 0x7f0f16740700' had timed out after 4
2015-04-08 19:01:34.979288 7f0f1573e700 1 heartbeat_map reset_timeout 'OSD::command_tp thread 0x7f0f1573e700' had timed out after 4
2015-04-08 19:01:34.979498 7f0f21e9f700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f21e9f700' had timed out after 4
2015-04-08 19:01:34.979513 7f0f16f41700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f0f16f41700' had timed out after 4
2015-04-08 19:01:34.979535 7f0f2169e700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f2169e700' had timed out after 4
2015-04-08 19:01:34.980021 7f0f15f3f700 1 heartbeat_map reset_timeout 'OSD::disk_tp thread 0x7f0f15f3f700' had timed out after 4
2015-04-08 19:01:34.980051 7f0f17742700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f0f17742700' had timed out after 4
2015-04-08 19:01:34.980392 7f0f16740700 1 heartbeat_map reset_timeout 'OSD::recovery_tp thread 0x7f0f16740700' had timed out after 4
2015-04-08 19:03:34.731872 7f0f1573e700 1 heartbeat_map reset_timeout 'OSD::command_tp thread 0x7f0f1573e700' had timed out after 4
2015-04-08 19:03:34.731972 7f0f21e9f700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f21e9f700'
Re: [ceph-users] long blocking with writes on rbds
On Thu, 09 Apr 2015 00:25:08 -0400 Jeff Epstein wrote:

Running Ceph on AWS is, as was mentioned before, certainly not going to improve things when compared to real HW. At the very least it will make performance unpredictable. Your 6 OSDs are on a single VM, from what I gather? Aside from being a very small number for something that you seem to be using in some sort of production environment (Ceph gets faster the more OSDs you add), where is the redundancy and HA in that?

The number of your PGs and PGPs needs to have at least a semblance of being correctly sized, as others mentioned before. You want to re-read the Ceph docs about that and check out the PG calculator: http://ceph.com/pgcalc/

Our workload involves creating and destroying a lot of pools. Each pool has 100 pgs, so it adds up. Could this be causing the problem? What would you suggest instead?

...this is most likely the cause. Deleting a pool causes the data and pgs associated with it to be deleted asynchronously, which can be a lot of background work for the osds. If you're using the cfq scheduler you can try decreasing the priority of these operations with the osd disk thread ioprio... options: http://ceph.com/docs/master/rados/configuration/osd-config-ref/#operations

If that doesn't help enough, deleting data from pools before deleting the pools might help, since you can control the rate more finely. And of course not creating/deleting so many pools would eliminate the hidden background cost of deleting the pools.

Thanks for your answer. Some follow-up questions:

- I wouldn't expect that pool deletion is the problem, since our pools, although many, don't contain much data. Typically we will have one rbd per pool, several GB in size, but in practice containing little data. Would you expect the performance penalty from deleting a pool to be relative to the requested size of the rbd, or relative to the quantity of data actually stored in it?

Since RBDs are sparsely allocated, the actual data used is the key factor. But you're adding the pool-removal overhead to this.

- Rather than creating and deleting multiple pools, each containing a single rbd, do you think we would see a speed-up if we were to instead have one pool containing multiple (frequently created and deleted) rbds? Does the performance penalty stem only from deleting pools themselves, or from deleting objects within the pool as well?

Both, and the fact that you have overloaded the PGs by nearly a factor of 10 (or 20 if you're actually using a replica of 3 and not 1) doesn't help one bit. And let's clarify what objects are in the Ceph/RBD context: they're the (by default) 4MB blobs that make up an RBD image.

- Somewhat off-topic, but for my own curiosity: why is deleting data so slow, in terms of Ceph's architecture? Shouldn't it just be a matter of flagging a region as available and allowing it to be overwritten, as a traditional file system would?

Apples and oranges, as RBD is block storage, not a FS. That said, a traditional FS is local and updates an inode or equivalent bit. For Ceph to delete an RBD image, it has to go to all cluster nodes with OSDs that have PGs that contain objects of that image. Then those objects have to be deleted on the local filesystem of the OSD, and various maps updated cluster-wide. Rinse and repeat until all objects have been dealt with. Quite a bit more involved, but that's the price you have to pay when you have a DISTRIBUTED storage architecture that doesn't rely on a single item (like an inode) to reflect things for the whole system.
Christian

Jeff

--
Christian Balzer    Network/Systems Engineer
ch...@gol.com    Global OnLine Japan/Fusion Communications
http://www.gol.com/
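For reference, the "osd disk thread ioprio" options mentioned above would look roughly like this in ceph.conf - a sketch, not a tested recommendation; note they only take effect when the OSD disks use the CFQ I/O scheduler:

    [osd]
    ; push the disk thread's background work down to idle priority
    osd disk thread ioprio class = idle
    osd disk thread ioprio priority = 7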
Re: [ceph-users] long blocking with writes on rbds
On 04/09/2015 03:14 AM, Christian Balzer wrote:

Your 6 OSDs are on a single VM, from what I gather? Aside from being a very small number for something that you seem to be using in some sort of production environment (Ceph gets faster the more OSDs you add), where is the redundancy and HA in that?

We are running one OSD per VM. All data is replicated across three VMs.

The number of your PGs and PGPs needs to have at least a semblance of being correctly sized, as others mentioned before. You want to re-read the Ceph docs about that and check out the PG calculator: http://ceph.com/pgcalc/

My choice of pgs is based on this page. Since each pool is spread across 3 OSDs, 100 seemed like a good number. Am I misinterpreting this documentation? http://ceph.com/docs/master/rados/operations/placement-groups/

Since RBDs are sparsely allocated, the actual data used is the key factor. But you're adding the pool-removal overhead to this.

How much overhead does pool removal add?

Both, and the fact that you have overloaded the PGs by nearly a factor of 10 (or 20 if you're actually using a replica of 3 and not 1) doesn't help one bit.

I'm curious how you reached your estimation of overloading. According to the PG calculator you linked to, given that each pool occupies only 3 OSDs, the suggested number of pgs is around 100. Can you explain?

Apples and oranges, as RBD is block storage, not a FS. [snip - full explanation quoted in the previous message]

Thank you for explaining.

Jeff
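For reference, a sketch of the rule-of-thumb arithmetic behind the "factor of 10 (or 20)" estimate above, assuming the (OSDs x 100) / replicas guideline from the PG calculator - which applies to the cluster as a whole, not per pool:

    guideline: (6 OSDs * 100) / 3 replicas = 200 PGs total, across ALL pools
    actual:    44 pools * 100 PGs = 4400 PGs, i.e. roughly 20x the guideline
               (with replica 1 the guideline is 600 PGs, still ~7x over)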
Re: [ceph-users] Ceph Hammer : Ceph-deploy 1.5.23-0 : RGW civetweb :: Not getting installed
Hi Vickey,

The keyring gets created as part of the initial deployment, so it should be on your admin node right alongside the admin keyring etc. FWIW, I tried this quickly yesterday and it failed because the RGW directory didn't exist on the node that I was attempting to deploy to ... but I didn't actually look that deeply into it, as it's not critical for what I wanted to complete today. The keyring was definitely there following a successful deployment, though.

Kind regards
Iain

On Thu, Apr 9, 2015 at 7:41 PM, Vickey Singh vickey.singh22...@gmail.com wrote:

Hello Cephers,

I am trying to set up RGW using ceph-deploy, as described here: http://docs.ceph.com/docs/master/start/quick-ceph-deploy/#add-an-rgw-instance

But unfortunately it doesn't seem to be working. Is there something I am missing, or do you know a fix for this?

[root@ceph-node1 yum.repos.d]# ceph -v
ceph version 0.94 (e61c4f093f88e44961d157f65091733580cea79a)

[root@ceph-node1 yum.repos.d]# yum update ceph-deploy
SKIPPED
Verifying : ceph-deploy-1.5.22-0.noarch 2/2
Updated: ceph-deploy.noarch 0:1.5.23-0
Complete!

[root@ceph-node1 ceph]# ceph-deploy rgw create rgw-node1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.23): /usr/bin/ceph-deploy rgw create rgw-node1
[ceph_deploy.rgw][DEBUG ] Deploying rgw, cluster ceph hosts rgw-node1:rgw.rgw-node1
[ceph_deploy][ERROR ] RuntimeError: bootstrap-rgw keyring not found; run 'gatherkeys'

[root@ceph-node1 ceph]# ceph-deploy --overwrite-conf mon create-initial
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.23): /usr/bin/ceph-deploy --overwrite-conf mon create-initial
SKIPPED
[ceph_deploy.mon][INFO ] mon.ceph-node1 monitor has reached quorum!
[ceph_deploy.mon][INFO ] all initial monitors are running and have formed quorum
[ceph_deploy.mon][INFO ] Running gatherkeys...
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.client.admin.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-osd.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-mds.keyring
[ceph_deploy.gatherkeys][DEBUG ] Checking ceph-node1 for /var/lib/ceph/bootstrap-rgw/ceph.keyring
[ceph-node1][DEBUG ] connected to host: ceph-node1
[ceph-node1][DEBUG ] detect platform information from remote host
[ceph-node1][DEBUG ] detect machine type
[ceph-node1][DEBUG ] fetch remote file
[ceph_deploy.gatherkeys][WARNIN] Unable to find /var/lib/ceph/bootstrap-rgw/ceph.keyring on ceph-node1
[ceph_deploy.gatherkeys][WARNIN] No RGW bootstrap key found. Will not be able to deploy RGW daemons

[root@ceph-node1 ceph]# ceph-deploy gatherkeys ceph-node1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.23): /usr/bin/ceph-deploy gatherkeys ceph-node1
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.client.admin.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-osd.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-mds.keyring
[ceph_deploy.gatherkeys][DEBUG ] Checking ceph-node1 for /var/lib/ceph/bootstrap-rgw/ceph.keyring
[ceph-node1][DEBUG ] connected to host: ceph-node1
[ceph-node1][DEBUG ] detect platform information from remote host
[ceph-node1][DEBUG ] detect machine type
[ceph-node1][DEBUG ] fetch remote file
[ceph_deploy.gatherkeys][WARNIN] Unable to find /var/lib/ceph/bootstrap-rgw/ceph.keyring on ceph-node1
[ceph_deploy.gatherkeys][WARNIN] No RGW bootstrap key found. Will not be able to deploy RGW daemons

Regards
VS

--
Iain Geddes, Application Engineer, Cyan
1383 North McDowell Blvd, Petaluma, CA 94954
M +353 89 432 6811, iain.ged...@cyaninc.com, www.cyaninc.com
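If the cluster was originally deployed before Hammer, the bootstrap-rgw key may simply never have been created; one workaround (a sketch, assuming Hammer's bootstrap-rgw auth profile and the default paths) is to create it by hand on a monitor node and then re-run gatherkeys:

    mkdir -p /var/lib/ceph/bootstrap-rgw
    ceph auth get-or-create client.bootstrap-rgw mon 'allow profile bootstrap-rgw' \
        -o /var/lib/ceph/bootstrap-rgw/ceph.keyring
    ceph-deploy gatherkeys ceph-node1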
Re: [ceph-users] Cascading Failure of OSDs
Francois Lafont wrote:

Just in case it could be useful, I have noticed the -s option (on my Ubuntu) that offers an output probably easier to parse:

    # column -t is just to make it nice for human eyes
    ifconfig -s | column -t

Since ifconfig is deprecated, one should use iproute2 instead:

    ip -s link show p2p1 | awk '/(RX|TX):/{getline; print $3;}'

However, the sysfs interface is probably a better alternative. See https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net-statistics and https://www.kernel.org/doc/Documentation/ABI/README.

--
Carl-Johan Schenström, Driftansvarig / System Administrator
Språkbanken & Svensk nationell datatjänst / The Swedish Language Bank & Swedish National Data Service
Göteborgs universitet / University of Gothenburg
carl-johan.schenst...@gu.se / +46 709 116769
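For example, the same counters can be read straight from sysfs - interface name p2p1 taken from the post above; every file in the statistics directory holds a single integer:

    cat /sys/class/net/p2p1/statistics/rx_bytes
    cat /sys/class/net/p2p1/statistics/tx_errors
    # or dump all counters at once, one 'name:value' per line:
    grep . /sys/class/net/p2p1/statistics/*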
[ceph-users] cache-tier do not evict
Hi,

I have built a cache-tier pool (replica 2) with 3 x 512GB SSDs for my kvm pool. These are my settings:

    ceph osd tier add kvm cache-pool
    ceph osd tier cache-mode cache-pool writeback
    ceph osd tier set-overlay kvm cache-pool
    ceph osd pool set cache-pool hit_set_type bloom
    ceph osd pool set cache-pool hit_set_count 1
    ceph osd pool set cache-pool hit set period 3600
    ceph osd pool set cache-pool target_max_bytes 751619276800
    ceph osd pool set cache-pool target_max_objects 100
    ceph osd pool set cache-pool cache_min_flush_age 1800
    ceph osd pool set cache-pool cache_min_evict_age 600
    ceph osd pool cache-pool cache_target_dirty_ratio .4
    ceph osd pool cache-pool cache target_full_ratio .8

So the problem is, the cache tier does not evict automatically. If I copy some kvm images to the ceph cluster, the cache OSDs always run full. Is that normal? Is there a misconfiguration?

thanks
best regards
Patrik
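As a side note, a few of the commands above would be rejected by the CLI as transcribed ('hit set period' instead of hit_set_period, and two lines missing the 'set' verb); assuming the standard 'ceph osd pool set' syntax was intended, the corrected forms would be:

    ceph osd pool set cache-pool hit_set_period 3600
    ceph osd pool set cache-pool cache_target_dirty_ratio .4
    ceph osd pool set cache-pool cache_target_full_ratio .8

If those three never actually took effect, the missing dirty/full ratios alone could explain a cache pool that fills up without flushing or evicting.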
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
[Re-added the list]

Hmm, I'm checking the code and that shouldn't be possible. What's your client? (In particular, is it 32-bit? That's the only thing I can think of that might have slipped through our QA.)

On Thu, Apr 9, 2015 at 7:17 AM, Kyle Hutson kylehut...@ksu.edu wrote:

I did nothing to enable anything else. Just changed my ceph repo from 'giant' to 'hammer', then did 'yum update' and restarted services.

On Thu, Apr 9, 2015 at 9:15 AM, Gregory Farnum g...@gregs42.com wrote:

Did you enable the straw2 stuff? CRUSH_V4 shouldn't be required by the cluster unless you made changes to the layout requiring it. If you did, the clients have to be upgraded to understand it. You could disable all the v4 features; that should let them connect again.
-Greg [snip - earlier messages quoted in full above]
Re: [ceph-users] cache-tier do not evict
Hi,

I set the cache-tier size to 644245094400. This should work. But it is the same.

thanks
regards

-----Original message-----
From: Gregory Farnum g...@gregs42.com
Sent: Thursday 9th April 2015 15:44
To: Patrik Plank pat...@plank.me
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] cache-tier do not evict

On Thu, Apr 9, 2015 at 4:56 AM, Patrik Plank pat...@plank.me wrote:

[snip - original message quoted in full above]

ceph osd pool set cache-pool target_max_bytes 751619276800

^ 750 GB. For 3*512GB disks that's too large a target value.

[snip]
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
http://people.beocat.cis.ksu.edu/~kylehutson/crushmap On Thu, Apr 9, 2015 at 11:25 AM, Gregory Farnum g...@gregs42.com wrote: Hmmm. That does look right and neither I nor Sage can come up with anything via code inspection. Can you post the actual binary crush map somewhere for download so that we can inspect it with our tools? -Greg On Thu, Apr 9, 2015 at 7:57 AM, Kyle Hutson kylehut...@ksu.edu wrote: Here 'tis: https://dpaste.de/POr1 On Thu, Apr 9, 2015 at 9:49 AM, Gregory Farnum g...@gregs42.com wrote: Can you dump your crush map and post it on pastebin or something? On Thu, Apr 9, 2015 at 7:26 AM, Kyle Hutson kylehut...@ksu.edu wrote: Nope - it's 64-bit. (Sorry, I missed the reply-all last time.) On Thu, Apr 9, 2015 at 9:24 AM, Gregory Farnum g...@gregs42.com wrote: [Re-added the list] Hmm, I'm checking the code and that shouldn't be possible. What's your client? (In particular, is it 32-bit? That's the only thing I can think of that might have slipped through our QA.) On Thu, Apr 9, 2015 at 7:17 AM, Kyle Hutson kylehut...@ksu.edu wrote: I did nothing to enable anything else. Just changed my ceph repo from 'giant' to 'hammer', then did 'yum update' and restarted services. On Thu, Apr 9, 2015 at 9:15 AM, Gregory Farnum g...@gregs42.com wrote: Did you enable the straw2 stuff? CRUSH_V4 shouldn't be required by the cluster unless you made changes to the layout requiring it. If you did, the clients have to be upgraded to understand it. You could disable all the v4 features; that should let them connect again. -Greg On Thu, Apr 9, 2015 at 7:07 AM, Kyle Hutson kylehut...@ksu.edu wrote: This particular problem I just figured out myself ('ceph -w' was still running from before the upgrade, and ctrl-c and restarting solved that issue), but I'm still having a similar problem on the ceph client: libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca server's 102b84a042aca, missing 1 It appears that even the latest kernel doesn't have support for CEPH_FEATURE_CRUSH_V4. How do I make my ceph cluster backward-compatible with the old cephfs client? On Thu, Apr 9, 2015 at 8:58 AM, Kyle Hutson kylehut...@ksu.edu wrote: I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly repeating this message: 2015-04-09 08:50:26.318042 7f95dbf86700 0 -- 10.5.38.1:0/2037478 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1 c=0x7f95e0023670).connect protocol feature mismatch, my 3fff peer 13fff missing 1 It isn't always the same IP for the destination - here's another: 2015-04-09 08:50:20.322059 7f95dc087700 0 -- 10.5.38.1:0/2037478 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1 c=0x7f95e002b480).connect protocol feature mismatch, my 3fff peer 13fff missing 1 Some details about our install: We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an erasure coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts are mds. We are running cephfs with a client trying to write data over cephfs when we're seeing these messages. Any ideas? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] OSDs not coming up on one host
On Thu, Apr 09, 2015 at 08:46:07AM -0700, Gregory Farnum wrote: On Thu, Apr 9, 2015 at 8:14 AM, Jacob Reid lists-c...@jacob-reid.co.uk wrote: On Thu, Apr 09, 2015 at 06:43:45AM -0700, Gregory Farnum wrote: You can turn up debugging (debug osd = 10 and debug filestore = 10 are probably enough, or maybe 20 each) and see what comes out to get more information about why the threads are stuck. But just from the log my answer is the same as before, and now I don't trust that controller (or maybe its disks), regardless of what it's admitting to. ;) -Greg Ran with osd and filestore debug both at 20; still nothing jumping out at me. Logfile attached as it got huge fairly quickly, but mostly seems to be the same extra lines. I tried running some test I/O on the drives in question to try and provoke some kind of problem, but they seem fine now... Okay, this is strange. Something very wonky is happening with your scheduler — it looks like these threads are all idle, and they're scheduling wakeups that happen an appreciable amount of time after they're supposed to. For instance: 2015-04-09 15:56:55.953116 7f70a7963700 20 filestore(/var/lib/ceph/osd/osd.15) sync_entry woke after 5.416704 2015-04-09 15:56:55.953153 7f70a7963700 20 filestore(/var/lib/ceph/osd/osd.15) sync_entry waiting for max_interval 5.00 This is the thread that syncs your backing store, and it always sets itself to get woken up at 5-second intervals — but here it took 5.4 seconds, and later on in your log it takes more than 6 seconds. It looks like all the threads which are getting timed out are also idle, but are taking so much longer to wake up than they're set for that they get a timeout warning. There might be some bugs in here where we're expecting wakeups to be more precise than they can be, but these sorts of misses are definitely not normal. Is this server overloaded on the CPU? Have you done something to make the scheduler or wakeups wonky? -Greg CPU load is minimal - the host does nothing but run OSDs and has 8 cores that are all sitting idle with a load average of 0.1. I haven't done anything to scheduling. That was with the debug logging on, if that could be the cause of any delays. A scheduler issue seems possible - I haven't done anything to it, but `time sleep 5` run a few times returns times spread randomly from 5.002 to 7.1(!) seconds, mostly in the 5.5-6.0 region, whereas it managed a fairly consistent 5.2 on the other servers in the cluster and 5.02 on my desktop. I have disabled the CPU power saving mode as the only thing I could think of that might be having an effect on this, and running the same test again gives more sane results... we'll see if this reflects in the OSD logs or not, I guess. If this is the cause, it's probably something that the next version might want to make a specific warning case of detecting. I will keep you updated as to their behaviour now... ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
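A hedged sketch of the checks described above, for anyone chasing the same symptom; the sysfs path is standard, but governor names depend on the cpufreq driver, and cpupower may need installing:

for i in 1 2 3; do time sleep 5; done    # real time should sit very close to 5.0s
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
cpupower frequency-set -g performance    # pin cores to full clock while testing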
Re: [ceph-users] Recovering incomplete PGs with ceph_objectstore_tool
Success! Hopefully my notes from the process will help: In the event of multiple disk failures the cluster could lose PGs. Should this occur it is best to attempt to restart the OSD process and have the drive marked as up+out. Marking the drive as out will cause data to flow off the drive to elsewhere in the cluster. In the event that the ceph-osd process is unable to keep running, you can try using the ceph_objectstore_tool program to extract just the damaged PGs and import them into a working OSD. Fixing Journals In this particular scenario things were complicated by the fact that ceph_objectstore_tool came out in Giant but we were running Firefly. Not wanting to upgrade the cluster in a degraded state, this required that the OSD drives be moved to a different physical machine for repair. This added a lot of steps related to the journals, but it wasn't a big deal. That process looks like: On Storage1: stop ceph-osd id=15 ceph-osd -i 15 --flush-journal ls -l /var/lib/ceph/osd/ceph-15/journal Note the journal device UUID, then pull the disk and move it to Ithome: rm /var/lib/ceph/osd/ceph-15/journal ceph-osd -i 15 --mkjournal That creates a colocated journal to use during the ceph_objectstore_tool commands. Once done then: ceph-osd -i 15 --flush-journal rm /var/lib/ceph/osd/ceph-15/journal Pull the disk and bring it back to Storage1. Then: ln -s /dev/disk/by-partuuid/b4f8d911-5ac9-4bf0-a06a-b8492e25a00f /var/lib/ceph/osd/ceph-15/journal ceph-osd -i 15 --mkjournal start ceph-osd id=15 This all won't be needed once the cluster is running Hammer, because then there will be an available version of ceph_objectstore_tool on the local machine and you can keep the journals throughout the process. Recovery Process We were missing two PGs, 3.c7 and 3.102. These PGs were hosted on OSD.0 and OSD.15, which were the two disks which failed out of Storage1. The disk for OSD.0 seemed to be a total loss while the disk for OSD.15 was somewhat more cooperative, but not in a place to be up and running in the cluster. I took the dying OSD.15 drive and placed it into a new physical machine with a fresh install of Ceph Giant. Using Giant's ceph_objectstore_tool I was able to extract the PGs with a command like: for i in 3.c7 3.102 ; do ceph_objectstore_tool --data /var/lib/ceph/osd/ceph-15 --journal /var/lib/ceph/osd/ceph-15/journal --op export --pgid $i --file ~/${i}.export ; done Once both PGs were successfully exported I attempted to import them into a new temporary OSD following instructions from here. For some reason that didn't work. The OSD was up+in but wasn't backfilling the PGs into the cluster. If you find yourself in this process I would try that first just in case it provides a cleaner process.
Considering the above didn't work and we were looking at the possibility of losing the RBD volume (or perhaps worse, the potential of fruitlessly fscking 35TB), I took what I might describe as heroic measures: Running ceph pg dump | grep incomplete 3.c7 0 0 0 0 0 0 0 incomplete 2015-04-02 20:49:32.968841 0'0 15730:17 [15,0] 15 [15,0] 15 13985'54076 2015-03-31 19:14:22.721695 13985'54076 2015-03-31 19:14:22.721695 3.102 0 0 0 0 0 0 0 incomplete 2015-04-02 20:49:32.529594 0'0 15730:21 [0,15] 0 [0,15] 0 13985'53107 2015-03-29 21:17:15.568125 13985'49195 2015-03-24 18:38:08.244769 Then I stopped all OSDs, which blocked all I/O to the cluster, with: stop ceph-osd-all Then I looked for all copies of the PG on all OSDs with: for i in 3.c7 3.102 ; do find /var/lib/ceph/osd/ -maxdepth 3 -type d -name $i ; done | sort -V /var/lib/ceph/osd/ceph-0/current/3.c7_head /var/lib/ceph/osd/ceph-0/current/3.102_head /var/lib/ceph/osd/ceph-3/current/3.c7_head /var/lib/ceph/osd/ceph-13/current/3.102_head /var/lib/ceph/osd/ceph-15/current/3.c7_head /var/lib/ceph/osd/ceph-15/current/3.102_head Then I flushed the journals for all of those OSDs with: for i in 0 3 13 15 ; do ceph-osd -i $i --flush-journal ; done Then I removed all of those drives and moved them (using Journal Fixing above) to Ithome where I used ceph_objectstore_tool to remove all traces of 3.102 and 3.c7: for i in 0 3 13 15 ; do for j in 3.c7 3.102 ; do ceph_objectstore_tool --data /var/lib/ceph/osd/ceph-$i --journal /var/lib/ceph/osd/ceph-$i/journal --op remove --pgid $j ; done ; done Then I imported the PGs onto OSD.0 and OSD.15 with: for i in 0 15 ; do for j in 3.c7 3.102 ; do ceph_objectstore_tool --data /var/lib/ceph/osd/ceph-$i --journal /var/lib/ceph/osd/ceph-$i/journal --op import --file ~/${j}.export ; done ; done for i in 0 15 ; do ceph-osd -i $i --flush-journal ; rm /var/lib/ceph/osd/ceph-$i/journal ; done Then I moved the disks back to Storage1 and started them all back up again. I think that this should have worked but what happened in this case was that OSD.0 didn't start up for some reason. I initially thought that that wouldn't matter because OSD.15 did start and
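Between the export and the remove/import steps above, it is worth confirming the exports are sane before destroying anything. A minimal sketch, assuming your ceph_objectstore_tool build supports --op list (op names vary a little between releases):

# confirm the PG's objects are readable on the source OSD
ceph_objectstore_tool --data /var/lib/ceph/osd/ceph-15 \
  --journal /var/lib/ceph/osd/ceph-15/journal --op list --pgid 3.c7
# and that the export files are non-empty
ls -lh ~/3.c7.export ~/3.102.export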
Re: [ceph-users] low power single disk nodes
I'm skeptical about how well this would work, but a Banana Pi might be a place to start. Like a Raspberry Pi, but it has a SATA connector: http://www.bananapi.org/ On Thu, Apr 9, 2015 at 3:18 AM, Jerker Nyberg jer...@update.uu.se wrote: Hello ceph users, Is anyone running any low-powered single disk nodes with Ceph now? Calxeda seems to be no more according to Wikipedia. I do not think HP Moonshot is what I am looking for - I want stand-alone nodes, not server cartridges integrated into server chassis. And I do not want to be locked to a single vendor. I was playing with Raspberry Pi 2 for signage when I thought of my old experiments with Ceph. I am thinking of for example Odroid-C1 or Odroid-XU3 Lite, or maybe something with a low-power Intel x64/x86 processor. Together with one SSD or one low power HDD the node could get all power via PoE (via a splitter, or integrated into the board if such boards exist). PoE provides remote power-on/power-off even for consumer grade nodes. The cost for a single low power node should be able to compete with a traditional PC-server's price per disk. Ceph takes care of redundancy. I think simple custom casing should be good enough - maybe just strap or velcro everything on trays in the rack, at least for the nodes with SSD. Kind regards, -- Jerker Nyberg, Uppsala, Sweden. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
Here 'tis: https://dpaste.de/POr1 On Thu, Apr 9, 2015 at 9:49 AM, Gregory Farnum g...@gregs42.com wrote: Can you dump your crush map and post it on pastebin or something? On Thu, Apr 9, 2015 at 7:26 AM, Kyle Hutson kylehut...@ksu.edu wrote: Nope - it's 64-bit. (Sorry, I missed the reply-all last time.) On Thu, Apr 9, 2015 at 9:24 AM, Gregory Farnum g...@gregs42.com wrote: [Re-added the list] Hmm, I'm checking the code and that shouldn't be possible. What's your client? (In particular, is it 32-bit? That's the only thing I can think of that might have slipped through our QA.) On Thu, Apr 9, 2015 at 7:17 AM, Kyle Hutson kylehut...@ksu.edu wrote: I did nothing to enable anything else. Just changed my ceph repo from 'giant' to 'hammer', then did 'yum update' and restarted services. On Thu, Apr 9, 2015 at 9:15 AM, Gregory Farnum g...@gregs42.com wrote: Did you enable the straw2 stuff? CRUSH_V4 shouldn't be required by the cluster unless you made changes to the layout requiring it. If you did, the clients have to be upgraded to understand it. You could disable all the v4 features; that should let them connect again. -Greg On Thu, Apr 9, 2015 at 7:07 AM, Kyle Hutson kylehut...@ksu.edu wrote: This particular problem I just figured out myself ('ceph -w' was still running from before the upgrade, and ctrl-c and restarting solved that issue), but I'm still having a similar problem on the ceph client: libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca server's 102b84a042aca, missing 1 It appears that even the latest kernel doesn't have support for CEPH_FEATURE_CRUSH_V4. How do I make my ceph cluster backward-compatible with the old cephfs client? On Thu, Apr 9, 2015 at 8:58 AM, Kyle Hutson kylehut...@ksu.edu wrote: I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly repeating this message: 2015-04-09 08:50:26.318042 7f95dbf86700 0 -- 10.5.38.1:0/2037478 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1 c=0x7f95e0023670).connect protocol feature mismatch, my 3fff peer 13fff missing 1 It isn't always the same IP for the destination - here's another: 2015-04-09 08:50:20.322059 7f95dc087700 0 -- 10.5.38.1:0/2037478 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1 c=0x7f95e002b480).connect protocol feature mismatch, my 3fff peer 13fff missing 1 Some details about our install: We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an erasure coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts are mds. We are running cephfs with a client trying to write data over cephfs when we're seeing these messages. Any ideas? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Rebuild bucket index
Hello ceph users, Do you know a way to rebuild a bucket index? I would like to change num_shards for an existing bucket. If I change this value in the bucket meta, the new index objects are indeed created, but empty (bucket listing returns null). It would be nice to be able to recreate the index from the objects. Does anyone have an idea for doing this? Thanks. Laurent Barbe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
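No answer appears in this digest, but for reference a hedged sketch of the closest existing tooling; "mybucket" is a placeholder. On these releases radosgw-admin can rebuild index entries from the objects it can find, though changing num_shards on an existing bucket is not supported:

radosgw-admin bucket check --bucket=mybucket                        # report inconsistencies
radosgw-admin bucket check --bucket=mybucket --fix --check-objects  # rebuild index entries from objects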
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
Nope - it's 64-bit. (Sorry, I missed the reply-all last time.) On Thu, Apr 9, 2015 at 9:24 AM, Gregory Farnum g...@gregs42.com wrote: [Re-added the list] Hmm, I'm checking the code and that shouldn't be possible. What's your client? (In particular, is it 32-bit? That's the only thing I can think of that might have slipped through our QA.) On Thu, Apr 9, 2015 at 7:17 AM, Kyle Hutson kylehut...@ksu.edu wrote: I did nothing to enable anything else. Just changed my ceph repo from 'giant' to 'hammer', then did 'yum update' and restarted services. On Thu, Apr 9, 2015 at 9:15 AM, Gregory Farnum g...@gregs42.com wrote: Did you enable the straw2 stuff? CRUSH_V4 shouldn't be required by the cluster unless you made changes to the layout requiring it. If you did, the clients have to be upgraded to understand it. You could disable all the v4 features; that should let them connect again. -Greg On Thu, Apr 9, 2015 at 7:07 AM, Kyle Hutson kylehut...@ksu.edu wrote: This particular problem I just figured out myself ('ceph -w' was still running from before the upgrade, and ctrl-c and restarting solved that issue), but I'm still having a similar problem on the ceph client: libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca server's 102b84a042aca, missing 1 It appears that even the latest kernel doesn't have support for CEPH_FEATURE_CRUSH_V4. How do I make my ceph cluster backward-compatible with the old cephfs client? On Thu, Apr 9, 2015 at 8:58 AM, Kyle Hutson kylehut...@ksu.edu wrote: I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly repeating this message: 2015-04-09 08:50:26.318042 7f95dbf86700 0 -- 10.5.38.1:0/2037478 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1 c=0x7f95e0023670).connect protocol feature mismatch, my 3fff peer 13fff missing 1 It isn't always the same IP for the destination - here's another: 2015-04-09 08:50:20.322059 7f95dc087700 0 -- 10.5.38.1:0/2037478 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1 c=0x7f95e002b480).connect protocol feature mismatch, my 3fff peer 13fff missing 1 Some details about our install: We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an erasure coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts are mds. We are running cephfs with a client trying to write data over cephfs when we're seeing these messages. Any ideas? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] installing and updating while leaving osd drive data intact
Referencing this old thread below, I am wondering what the proper way is to install, say, new versions of ceph and start up daemons while keeping all the data on the OSD drives. I had been using ceph-deploy new, which I guess creates a new cluster fsid. Normally for my testing I had been starting with clean OSD drives, but I would also like to be able to restart and leave the OSD drives as is. -- Tom Hi, I have faced a similar issue. This happens if the ceph disks aren't purged/cleaned completely. Clear the contents of the /dev/sdb1 device. There is a file named ceph_fsid on the disk which would have the old cluster's fsid. This needs to be deleted for it to work. Hope it helps. Sharmila On Mon, May 26, 2014 at 2:52 PM, JinHwan Hwang calanchue at gmail.com wrote: I'm trying to install ceph 0.80.1 on ubuntu 14.04. All other things go well except the 'activate osd' phase. It tells me it can't find the proper fsid when I do 'activate osd'. This is not my first time installing ceph, and all the steps I did were ok when I did them before (though that was on ubuntu 12.04, virtual machines, ceph-emperor) ceph at ceph-mon:~$ ceph-deploy osd activate ceph-osd0:/dev/sdb1 ceph-osd0:/dev/sdc1 ceph-osd1:/dev/sdb1 ceph-osd1:/dev/sdc1 ... [ceph-osd0][WARNIN] ceph-disk: Error: No cluster conf found in /etc/ceph with fsid 05b994a0-20f9-48d7-8d34-107ffcb39e5b .. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
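A hedged sketch of the fsid check implied above; the OSD path and fsid shown are examples. The key point is that the fsid stamped on the disk must match the cluster the daemons join:

cat /var/lib/ceph/osd/ceph-0/ceph_fsid   # fsid the OSD data was created under
ceph fsid                                # fsid of the running cluster
# to keep the old data, reuse the old fsid in ceph.conf before deploying,
# instead of letting 'ceph-deploy new' generate a fresh one:
#   [global]
#   fsid = 05b994a0-20f9-48d7-8d34-107ffcb39e5b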
Re: [ceph-users] OSDs not coming up on one host
On Thu, Apr 9, 2015 at 8:14 AM, Jacob Reid lists-c...@jacob-reid.co.uk wrote: On Thu, Apr 09, 2015 at 06:43:45AM -0700, Gregory Farnum wrote: You can turn up debugging (debug osd = 10 and debug filestore = 10 are probably enough, or maybe 20 each) and see what comes out to get more information about why the threads are stuck. But just from the log my answer is the same as before, and now I don't trust that controller (or maybe its disks), regardless of what it's admitting to. ;) -Greg Ran with osd and filestore debug both at 20; still nothing jumping out at me. Logfile attached as it got huge fairly quickly, but mostly seems to be the same extra lines. I tried running some test I/O on the drives in question to try and provoke some kind of problem, but they seem fine now... Okay, this is strange. Something very wonky is happening with your scheduler — it looks like these threads are all idle, and they're scheduling wakeups that happen an appreciable amount of time after they're supposed to. For instance: 2015-04-09 15:56:55.953116 7f70a7963700 20 filestore(/var/lib/ceph/osd/osd.15) sync_entry woke after 5.416704 2015-04-09 15:56:55.953153 7f70a7963700 20 filestore(/var/lib/ceph/osd/osd.15) sync_entry waiting for max_interval 5.00 This is the thread that syncs your backing store, and it always sets itself to get woken up at 5-second intervals — but here it took 5.4 seconds, and later on in your log it takes more than 6 seconds. It looks like all the threads which are getting timed out are also idle, but are taking so much longer to wake up than they're set for that they get a timeout warning. There might be some bugs in here where we're expecting wakeups to be more precise than they can be, but these sorts of misses are definitely not normal. Is this server overloaded on the CPU? Have you done something to make the scheduler or wakeups wonky? -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] low power single disk nodes
Where's the take my money button? On Thu, Apr 9, 2015 at 9:43 AM, Mark Nelson mnel...@redhat.com wrote: How about drives that run Linux with an ARM processor, RAM, and an ethernet port right on the drive? Notice the Ceph logo. :) https://www.hgst.com/science-of-storage/emerging-technologies/open-ethernet-drive-architecture Mark On 04/09/2015 10:37 AM, Scott Laird wrote: Minnowboard Max? 2 atom cores, 1 SATA port, and a real (non-USB) Ethernet port. On Thu, Apr 9, 2015, 8:03 AM p...@philw.com wrote: Rather expensive option: Applied Micro X-Gene, overkill for a single disk, and only really available in a development kit format right now. https://www.apm.com/products/data-center/x-gene-family/x-c1-development-kits/ Better Option: Ambedded CY7 - 7 nodes in 1U half depth, 6 positions for SATA disks, and one node with mSATA SSD http://www.ambedded.com.tw/pt_list.php?CM_ID=20140214001 --phil On 09 April 2015 at 15:57 Quentin Hartman qhart...@direwolfdigital.com wrote: I'm skeptical about how well this would work, but a Banana Pi might be a place to start. Like a Raspberry Pi, but it has a SATA connector: http://www.bananapi.org/ On Thu, Apr 9, 2015 at 3:18 AM, Jerker Nyberg jer...@update.uu.se wrote: Hello ceph users, Is anyone running any low-powered single disk nodes with Ceph now? Calxeda seems to be no more according to Wikipedia. I do not think HP Moonshot is what I am looking for - I want stand-alone nodes, not server cartridges integrated into server chassis. And I do not want to be locked to a single vendor. I was playing with Raspberry Pi 2 for signage when I thought of my old experiments with Ceph. I am thinking of for example Odroid-C1 or Odroid-XU3 Lite, or maybe something with a low-power Intel x64/x86 processor. Together with one SSD or one low power HDD the node could get all power via PoE (via a splitter, or integrated into the board if such boards exist). PoE provides remote power-on/power-off even for consumer grade nodes. The cost for a single low power node should be able to compete with a traditional PC-server's price per disk. Ceph takes care of redundancy. I think simple custom casing should be good enough - maybe just strap or velcro everything on trays in the rack, at least for the nodes with SSD. Kind regards, -- Jerker Nyberg, Uppsala, Sweden.
Re: [ceph-users] cache-tier do not evict
On Thu, Apr 9, 2015 at 4:56 AM, Patrik Plank pat...@plank.me wrote: Hi, I have built a cache-tier pool (replica 2) with 3 x 512gb ssd for my kvm pool. These are my settings: ceph osd tier add kvm cache-pool ceph osd tier cache-mode cache-pool writeback ceph osd tier set-overlay kvm cache-pool ceph osd pool set cache-pool hit_set_type bloom ceph osd pool set cache-pool hit_set_count 1 ceph osd pool set cache-pool hit_set_period 3600 ceph osd pool set cache-pool target_max_bytes 751619276800 ^ 750 GB. For 3*512GB disks that's too large a target value. ceph osd pool set cache-pool target_max_objects 100 ceph osd pool set cache-pool cache_min_flush_age 1800 ceph osd pool set cache-pool cache_min_evict_age 600 ceph osd pool set cache-pool cache_target_dirty_ratio .4 ceph osd pool set cache-pool cache_target_full_ratio .8 So the problem is, the cache-tier does not evict automatically. If I copy some kvm images to the ceph cluster, the cache osds always run full. Is that normal? Is there a misconfiguration? thanks best regards Patrik ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] use ZFS for OSDs
I had surgery and have been off for a while. Had to rebuild the test ceph+openstack cluster with whatever spare parts I had. I apologize for the delay to anyone who's been interested. Here are the results: == Hardware/Software 3 node CEPH cluster, 3 OSDs (one OSD per node) -- CPU = 1x E5-2670 v1 RAM = 8GB OS Disk = 500GB SATA OSD = 900GB 10k SAS (sdc - whole device) Journal = Shared Intel SSD DC3500 80GB (sdb1 - 10GB partition) ZFS log = Shared Intel SSD DC3500 80GB (sdb2 - 4GB partition) ZFS L2ARC = Intel SSD 320 40GB (sdd - whole device) - ceph 0.87 ZoL 0.6.3 CentOS 7.0 2 node KVM/OpenStack cluster CPU = 2x Xeon X5650 RAM = 24 GB OS Disk = 500GB SATA - Ubuntu 14.04 OpenStack Juno The rough performance of this oddball-sized test ceph cluster is 1000-1500 IOPS at 8k. == Compression; (cut out unneeded details) Various Debian and CentOS images, with lots of test SVN and GIT data KVM/OpenStack [root@ceph03 ~]# zfs get all SAS1 NAME PROPERTY VALUE SOURCE SAS1 used 586G - SAS1 compressratio 1.50x - SAS1 recordsize 32K local SAS1 checksum on default SAS1 compression lz4 local SAS1 refcompressratio 1.50x - SAS1 written 586G - SAS1 logicalused 877G - == Dedupe; (dedupe is enabled at the dataset level, but dedupe space savings can only be viewed at the pool level - a bit odd, I know) Various Debian and CentOS images, with lots of test SVN and GIT data KVM/OpenStack [root@ceph01 ~]# zpool get all SAS1 NAME PROPERTY VALUE SOURCE SAS1 size 836G - SAS1 capacity 70% - SAS1 dedupratio 1.02x - SAS1 free 250G - SAS1 allocated 586G - == Bitrot/Corruption; Injected random data at random locations (changed seek to a random value) of sdc with: dd if=/dev/urandom of=/dev/sdc seek=54356 bs=4k count=1 Results: 1. ZFS detects the error on disk affecting PG files; as this is a single vdev (no raidz or mirror) it cannot automatically fix it. It blocks all (but delete) access to the affected files (they become inaccessible). *note: I ran this status after already repairing 2 PGs (5.15 and 5.25); ZFS status will no longer list a filename after it has been repaired/deleted/cleared* [root@ceph01 ~]# zpool status -v pool: SAS1 state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://zfsonlinux.org/msg/ZFS-8000-8A scan: scrub in progress since Thu Apr 9 13:04:54 2015 153G scanned out of 586G at 40.3M/s, 3h3m to go 0 repaired, 26.05% done config: NAME STATE READ WRITE CKSUM SAS1 ONLINE 0 0 35 sdc ONLINE 0 0 70 logs sdb2 ONLINE 0 0 0 cache sdd ONLINE 0 0 0 errors: Permanent errors have been detected in the following files: /SAS1/current/5.e_head/DIR_E/DIR_0/DIR_6/rbd\udata.2ba762ae8944a.24cc__head_6153260E__5 2. CEPH-OSD cannot read PG file.
Kicks off scrub/deep-scrub /var/log/ceph/ceph-osd.2.log 2015-04-09 13:10:18.319312 7fcbb163a700 -1 log_channel(default) log [ERR] : 5.18 shard 1: soid cd635018/rbd_data.93d1f74b0dc51.18ee/head//5 candidate had a read error, digest 1835988768 != known digest 473354757 2015-04-09 13:11:38.587014 7fcbb1e3b700 -1 log_channel(default) log [ERR] : 5.18 deep-scrub 0 missing, 1 inconsistent objects 2015-04-09 13:11:38.587020 7fcbb1e3b700 -1 log_channel(default) log [ERR] : 5.18 deep-scrub 1 errors /var/log/ceph/ceph-osd.1.log 2015-04-09 13:11:43.640499 7fe10b3c5700 -1 log_channel(default) log [ERR] : 5.25 shard 1: soid 73eb0125/rbd_data.5315b2ae8944a.5348/head//5 candidate had a read error, digest 1522345897 != known digest 1180025616 2015-04-09 13:12:44.781546 7fe10abc4700 -1 log_channel(default) log [ERR] : 5.25 deep-scrub 0 missing, 1 inconsistent objects 2015-04-09 13:12:44.781553 7fe10abc4700 -1 log_channel(default) log [ERR] : 5.25 deep-scrub 1 errors --- 3. CEPH STATUS reports an error --- [root@client01 ~]# ceph status cluster e93ce4d3-3a46-4082-9ec5-e23c82ca616e health HEALTH_WARN 2
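For anyone wanting to reproduce the test layout above, a minimal sketch from the reported settings (single-vdev pool with an SSD slog and L2ARC; device names are the ones from the test):

zpool create SAS1 sdc log sdb2 cache sdd
zfs set compression=lz4 SAS1
zfs set recordsize=32K SAS1
zfs set dedup=on SAS1   # dedupe was per-dataset in the original test; the pool root is shown here for brevity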
Re: [ceph-users] ceph-osd failure following 0.92 -> 0.94 upgrade
On Thu, Apr 9, 2015 at 2:05 PM, Dirk Grunwald dirk.grunw...@colorado.edu wrote: Ceph cluster, U14.10 base system, OSDs using BTRFS, journal on a partition of the same disk (done using ceph-deploy). I had been running 0.92 without (significant) issue. I upgraded to Hammer (0.94) by modifying /etc/apt/sources.list, apt-get update, apt-get upgrade. Upgraded and restarted ceph-mon and then ceph-osd. Most of the 50 OSDs are in a failure cycle with the error os/Transaction.cc: 504: FAILED assert(ops == data.ops) Right now, the entire cluster is useless because of this. Any suggestions? It looks like maybe it's under the v0.80.x section instead of general upgrading, but the release notes include: * If you are upgrading specifically from v0.92, you must stop all OSD daemons and flush their journals (``ceph-osd -i NNN --flush-journal``) before upgrading. There was a transaction encoding bug in v0.92 that broke compatibility. Upgrading from v0.93, v0.91, or anything earlier is safe. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
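A minimal sketch of the required v0.92 upgrade path quoted above, repeated for every OSD id on the host (upstart-style service commands, matching the rest of this digest):

stop ceph-osd id=0
ceph-osd -i 0 --flush-journal
# upgrade the packages, then:
start ceph-osd id=0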
Re: [ceph-users] Motherboard recommendation?
Hi Markus, The X10DRH-CT can support only 16 drives by default. If you want to connect more drives, there is a special SKU from Supermicro with more drive support, or you need an additional SAS controller. We are using 2x E5-2630 v3 (8-core, 2.4GHz) for 30 drives on an SM X10DRI-T. It is working perfectly on a replication-based cluster. If you are planning to use erasure coding, you have to think about a higher spec. Does anyone know the exact processor requirement of a 30-drive node for erasure coding? I can't find a suitable hardware recommendation for erasure coding. Cheers K.Mohamed Pakkeer On Thu, Apr 9, 2015 at 1:30 PM, Markus Goldberg goldb...@uni-hildesheim.de wrote: Hi, I have a backup storage with ceph 0.93. Like every backup system, it is only ever written to and hopefully never read. The hardware is 3 Supermicro SC847 cases with 30 SATA HDDs each (2- and 4-TB WD disks) = 250TB. I have realized that the motherboards and CPUs are totally undersized, so I want to install new boards. I'm thinking of the following: 3 Supermicro X10DRH-CT or X10DRC-T4+ with 128GB memory each. What do you think about these boards? Will they fit into the SC847? They have SAS and 10G-Base-T onboard, so no extra controller seems to be necessary. What Xeon v3 should I take, how many cores? Does anyone know if M.2 SSDs are supported in their PCIe slots? Thank you very much, Markus -- Markus Goldberg Universität Hildesheim Rechenzentrum Tel +49 5121 88392822 Universitätsplatz 1, D-31141 Hildesheim, Germany Fax +49 5121 88392823 email goldb...@uni-hildesheim.de -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Thanks Regards K.Mohamed Pakkeer Mobile- 0091-8754410114 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy
On 04/08/2015 03:00 PM, Travis Rhoden wrote: Hi Vickey, The easiest way I know of to get around this right now is to add the following line in the section for epel in /etc/yum.repos.d/epel.repo: exclude=python-rados python-rbd So this is what my epel.repo file looks like: http://fpaste.org/208681/ It is those two packages in EPEL that are causing problems. I also tried enabling epel-testing, but that didn't work either. My wild guess is that enabling epel-testing is not enough, because the offending 0.80.7-0.4.el7 build in the stable EPEL repository is still visible to yum. When you set that exclude= parameter in /etc/yum.repos.d/epel.repo, like exclude=python-rados python-rbd python-cephfs, *and* also try --enablerepo=epel-testing, does it work? - Ken ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
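A minimal sketch of the workaround as a one-liner, assuming the stock CentOS 7 repo file path and GNU sed:

# append the exclude line directly under the [epel] section header
sed -i '/^\[epel\]/a exclude=python-rados python-rbd python-cephfs' /etc/yum.repos.d/epel.repo
yum install ceph ceph-common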
Re: [ceph-users] OSDs not coming up on one host
You can turn up debugging (debug osd = 10 and debug filestore = 10 are probably enough, or maybe 20 each) and see what comes out to get more information about why the threads are stuck. But just from the log my answer is the same as before, and now I don't trust that controller (or maybe its disks), regardless of what it's admitting to. ;) -Greg On Thu, Apr 9, 2015 at 1:28 AM, Jacob Reid lists-c...@jacob-reid.co.uk wrote: On Wed, Apr 08, 2015 at 03:42:29PM +0000, Gregory Farnum wrote: I'm on my phone so I can't check exactly what those threads are trying to do, but the osd has several threads which are stuck. The FileStore threads are certainly trying to access the disk/local filesystem. You may not have a hardware fault, but it looks like something in your stack is not behaving when the osd asks the filesystem to do something. Check dmesg, etc. -Greg Noticed a bit in dmesg that seems to be controller-related (HP Smart Array P420i) where I/O was hanging in some cases[1]; fixed by updating from 5.42 to 6.00 [1] http://h20564.www2.hp.com/hpsc/doc/public/display?docId=emr_na-c03555882 In dmesg: [11775.779477] hpsa :08:00.0: ABORT REQUEST on C1:B0:T0:L0 Tag:0x:0010 Command:0x2a SN:0x49fb REQUEST SUCCEEDED. [11812.170350] hpsa :08:00.0: Abort request on C1:B0:T0:L0 [11817.386773] hpsa :08:00.0: cp 880522bff000 is reported invalid (probably means target device no longer present) [11817.386784] hpsa :08:00.0: ABORT REQUEST on C1:B0:T0:L0 Tag:0x:0010 Command:0x2a SN:0x4a13 REQUEST SUCCEEDED. The problem still appears to be persisting in the cluster: although I am no longer seeing the disk-related errors in dmesg, I am still getting errors in the osd logs: 2015-04-08 17:24:15.024820 7f0f21e9f700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f21e9f700' had timed out after 4 2015-04-08 17:24:15.025043 7f0f2169e700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f2169e700' had timed out after 4 2015-04-08 17:48:33.146399 7f0f21e9f700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f21e9f700' had timed out after 4 2015-04-08 17:48:33.146439 7f0f2169e700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f2169e700' had timed out after 4 2015-04-08 18:55:31.107727 7f0f16740700 1 heartbeat_map reset_timeout 'OSD::recovery_tp thread 0x7f0f16740700' had timed out after 4 2015-04-08 18:55:31.107774 7f0f2169e700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f2169e700' had timed out after 4 2015-04-08 18:55:31.107789 7f0f21e9f700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f21e9f700' had timed out after 4 2015-04-08 18:55:31.108225 7f0f29eaf700 1 heartbeat_map is_healthy 'OSD::command_tp thread 0x7f0f1573e700' had timed out after 4 2015-04-08 18:55:31.108268 7f0f15f3f700 1 heartbeat_map reset_timeout 'OSD::disk_tp thread 0x7f0f15f3f700' had timed out after 4 2015-04-08 18:55:31.108272 7f0f29eaf700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f0f17742700' had timed out after 4 2015-04-08 18:55:31.108281 7f0f29eaf700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f0f16f41700' had timed out after 4 2015-04-08 18:55:31.108285 7f0f1573e700 1 heartbeat_map reset_timeout 'OSD::command_tp thread 0x7f0f1573e700' had timed out after 4 2015-04-08 18:55:31.108345 7f0f16f41700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f0f16f41700' had timed out after 4 2015-04-08 18:55:31.108378 7f0f17742700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f0f17742700' had timed out after 4 2015-04-08 19:01:20.694897 7f0f15f3f700 1 heartbeat_map
reset_timeout 'OSD::disk_tp thread 0x7f0f15f3f700' had timed out after 4 2015-04-08 19:01:20.694928 7f0f17742700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f0f17742700' had timed out after 4 2015-04-08 19:01:20.694970 7f0f16f41700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f0f16f41700' had timed out after 4 2015-04-08 19:01:20.695544 7f0f1573e700 1 heartbeat_map reset_timeout 'OSD::command_tp thread 0x7f0f1573e700' had timed out after 4 2015-04-08 19:01:20.695665 7f0f16740700 1 heartbeat_map reset_timeout 'OSD::recovery_tp thread 0x7f0f16740700' had timed out after 4 2015-04-08 19:01:34.979288 7f0f1573e700 1 heartbeat_map reset_timeout 'OSD::command_tp thread 0x7f0f1573e700' had timed out after 4 2015-04-08 19:01:34.979498 7f0f21e9f700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f21e9f700' had timed out after 4 2015-04-08 19:01:34.979513 7f0f16f41700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f0f16f41700' had timed out after 4 2015-04-08 19:01:34.979535 7f0f2169e700 1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f0f2169e700' had timed out after 4 2015-04-08 19:01:34.980021 7f0f15f3f700 1 heartbeat_map reset_timeout 'OSD::disk_tp thread 0x7f0f15f3f700' had timed out after 4 2015-04-08
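A hedged sketch of turning the suggested debugging on without restarting the daemon (the config keys are the ones Greg suggests earlier in the thread):

ceph tell osd.15 injectargs '--debug-osd 20 --debug-filestore 20'
# or persistently, in ceph.conf under [osd]:
#   debug osd = 20
#   debug filestore = 20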
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
This particular problem I just figured out myself ('ceph -w' was still running from before the upgrade, and ctrl-c and restarting solved that issue), but I'm still having a similar problem on the ceph client: libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca server's 102b84a042aca, missing 1 It appears that even the latest kernel doesn't have support for CEPH_FEATURE_CRUSH_V4. How do I make my ceph cluster backward-compatible with the old cephfs client? On Thu, Apr 9, 2015 at 8:58 AM, Kyle Hutson kylehut...@ksu.edu wrote: I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly repeating this message: 2015-04-09 08:50:26.318042 7f95dbf86700 0 -- 10.5.38.1:0/2037478 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1 c=0x7f95e0023670).connect protocol feature mismatch, my 3fff peer 13fff missing 1 It isn't always the same IP for the destination - here's another: 2015-04-09 08:50:20.322059 7f95dc087700 0 -- 10.5.38.1:0/2037478 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1 c=0x7f95e002b480).connect protocol feature mismatch, my 3fff peer 13fff missing 1 Some details about our install: We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an erasure coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts are mds. We are running cephfs with a client trying to write data over cephfs when we're seeing these messages. Any ideas? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] cache-tier do not evict
What ceph version do you use? Regards, On 9 Apr 2015 18:58, Patrik Plank pat...@plank.me wrote: Hi, I have built a cache-tier pool (replica 2) with 3 x 512gb ssd for my kvm pool. These are my settings: ceph osd tier add kvm cache-pool ceph osd tier cache-mode cache-pool writeback ceph osd tier set-overlay kvm cache-pool ceph osd pool set cache-pool hit_set_type bloom ceph osd pool set cache-pool hit_set_count 1 ceph osd pool set cache-pool hit_set_period 3600 ceph osd pool set cache-pool target_max_bytes 751619276800 ceph osd pool set cache-pool target_max_objects 100 ceph osd pool set cache-pool cache_min_flush_age 1800 ceph osd pool set cache-pool cache_min_evict_age 600 ceph osd pool set cache-pool cache_target_dirty_ratio .4 ceph osd pool set cache-pool cache_target_full_ratio .8 So the problem is, the cache-tier does not evict automatically. If I copy some kvm images to the ceph cluster, the cache osds always run full. Is that normal? Is there a misconfiguration? thanks best regards Patrik ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] low power single disk nodes
Hello ceph users, Is anyone running any low-powered single disk nodes with Ceph now? Calxeda seems to be no more according to Wikipedia. I do not think HP Moonshot is what I am looking for - I want stand-alone nodes, not server cartridges integrated into server chassis. And I do not want to be locked to a single vendor. I was playing with Raspberry Pi 2 for signage when I thought of my old experiments with Ceph. I am thinking of for example Odroid-C1 or Odroid-XU3 Lite, or maybe something with a low-power Intel x64/x86 processor. Together with one SSD or one low power HDD the node could get all power via PoE (via a splitter, or integrated into the board if such boards exist). PoE provides remote power-on/power-off even for consumer grade nodes. The cost for a single low power node should be able to compete with a traditional PC-server's price per disk. Ceph takes care of redundancy. I think simple custom casing should be good enough - maybe just strap or velcro everything on trays in the rack, at least for the nodes with SSD. Kind regards, -- Jerker Nyberg, Uppsala, Sweden. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph Hammer : Ceph-deploy 1.5.23-0 : RGW civetweb :: Not getting installed
Hello Cephers, I am trying to set up RGW using ceph-deploy, which is described here: http://docs.ceph.com/docs/master/start/quick-ceph-deploy/#add-an-rgw-instance But unfortunately it doesn't seem to be working. Is there something I am missing, or do you know of a fix for this? [root@ceph-node1 yum.repos.d]# ceph -v *ceph version 0.94* (e61c4f093f88e44961d157f65091733580cea79a) [root@ceph-node1 yum.repos.d]# # yum update ceph-deploy SKIPPED Verifying : ceph-deploy-1.5.22-0.noarch 2/2 Updated: * ceph-deploy.noarch 0:1.5.23-0* Complete! [root@ceph-node1 ceph]# [root@ceph-node1 ceph]# ceph-deploy rgw create rgw-node1 [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf [ceph_deploy.cli][INFO ] Invoked (1.5.23): /usr/bin/ceph-deploy rgw create rgw-node1 [ceph_deploy.rgw][DEBUG ] Deploying rgw, cluster ceph hosts rgw-node1:rgw.rgw-node1 *[ceph_deploy][ERROR ] RuntimeError: bootstrap-rgw keyring not found; run 'gatherkeys'* [root@ceph-node1 ceph]# ceph-deploy --overwrite-conf mon create-initial [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf [ceph_deploy.cli][INFO ] Invoked (1.5.23): /usr/bin/ceph-deploy --overwrite-conf mon create-initial SKIPPED [ceph_deploy.mon][INFO ] mon.ceph-node1 monitor has reached quorum! [ceph_deploy.mon][INFO ] all initial monitors are running and have formed quorum [ceph_deploy.mon][INFO ] Running gatherkeys... [ceph_deploy.gatherkeys][DEBUG ] Have ceph.client.admin.keyring [ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring [ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-osd.keyring [ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-mds.keyring [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-node1 for /var/lib/ceph/bootstrap-rgw/ceph.keyring [ceph-node1][DEBUG ] connected to host: ceph-node1 [ceph-node1][DEBUG ] detect platform information from remote host [ceph-node1][DEBUG ] detect machine type [ceph-node1][DEBUG ] fetch remote file *[ceph_deploy.gatherkeys][WARNIN] Unable to find /var/lib/ceph/bootstrap-rgw/ceph.keyring on ceph-node1* *[ceph_deploy.gatherkeys][WARNIN] No RGW bootstrap key found. Will not be able to deploy RGW daemons* [root@ceph-node1 ceph]# [root@ceph-node1 ceph]# ceph-deploy gatherkeys ceph-node1 [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf [ceph_deploy.cli][INFO ] Invoked (1.5.23): /usr/bin/ceph-deploy gatherkeys ceph-node1 [ceph_deploy.gatherkeys][DEBUG ] Have ceph.client.admin.keyring [ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring [ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-osd.keyring [ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-mds.keyring [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-node1 for /var/lib/ceph/bootstrap-rgw/ceph.keyring [ceph-node1][DEBUG ] connected to host: ceph-node1 [ceph-node1][DEBUG ] detect platform information from remote host [ceph-node1][DEBUG ] detect machine type [ceph-node1][DEBUG ] fetch remote file *[ceph_deploy.gatherkeys][WARNIN] Unable to find /var/lib/ceph/bootstrap-rgw/ceph.keyring on ceph-node1* *[ceph_deploy.gatherkeys][WARNIN] No RGW bootstrap key found. Will not be able to deploy RGW daemons* [root@ceph-node1 ceph]# Regards VS ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
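No reply appears in this digest, but a hedged sketch of the usual workaround for this symptom: clusters whose monitors were bootstrapped before the RGW bootstrap key existed lack it, and it can be created by hand. The capability string is an assumption based on what newer deployments generate:

mkdir -p /var/lib/ceph/bootstrap-rgw
ceph auth get-or-create client.bootstrap-rgw mon 'allow profile bootstrap-rgw' \
  -o /var/lib/ceph/bootstrap-rgw/ceph.keyring
# then re-run:
#   ceph-deploy gatherkeys ceph-node1
#   ceph-deploy rgw create rgw-node1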
Re: [ceph-users] low power single disk nodes
These are really interesting to me, but how can you buy them? What's the performance like in ceph? Are they using the keyvaluestore backend, or something specific to these drives? Also what kind of chassis do they go into (some kind of ethernet JBOD)? Bryan On 4/9/15, 9:43 AM, Mark Nelson mnel...@redhat.com wrote: How about drives that run Linux with an ARM processor, RAM, and an ethernet port right on the drive? Notice the Ceph logo. :) https://www.hgst.com/science-of-storage/emerging-technologies/open-ethernet-drive-architecture Mark On 04/09/2015 10:37 AM, Scott Laird wrote: Minnowboard Max? 2 atom cores, 1 SATA port, and a real (non-USB) Ethernet port. On Thu, Apr 9, 2015, 8:03 AM p...@philw.com wrote: Rather expensive option: Applied Micro X-Gene, overkill for a single disk, and only really available in a development kit format right now. https://www.apm.com/products/data-center/x-gene-family/x-c1-development-kits/ Better Option: Ambedded CY7 - 7 nodes in 1U half depth, 6 positions for SATA disks, and one node with mSATA SSD http://www.ambedded.com.tw/pt_list.php?CM_ID=20140214001 --phil On 09 April 2015 at 15:57 Quentin Hartman qhart...@direwolfdigital.com wrote: I'm skeptical about how well this would work, but a Banana Pi might be a place to start. Like a Raspberry Pi, but it has a SATA connector: http://www.bananapi.org/ On Thu, Apr 9, 2015 at 3:18 AM, Jerker Nyberg jer...@update.uu.se wrote: Hello ceph users, Is anyone running any low-powered single disk nodes with Ceph now? Calxeda seems to be no more according to Wikipedia. I do not think HP Moonshot is what I am looking for - I want stand-alone nodes, not server cartridges integrated into server chassis. And I do not want to be locked to a single vendor. I was playing with Raspberry Pi 2 for signage when I thought of my old experiments with Ceph. I am thinking of for example Odroid-C1 or Odroid-XU3 Lite, or maybe something with a low-power Intel x64/x86 processor. Together with one SSD or one low power HDD the node could get all power via PoE (via a splitter, or integrated into the board if such boards exist). PoE provides remote power-on/power-off even for consumer grade nodes. The cost for a single low power node should be able to compete with a traditional PC-server's price per disk. Ceph takes care of redundancy. I think simple custom casing should be good enough - maybe just strap or velcro everything on trays in the rack, at least for the nodes with SSD. Kind regards, -- Jerker Nyberg, Uppsala, Sweden.
Re: [ceph-users] RBD hard crash on kernel 3.10
Thanks for the pointer to the patched kernel. I'll give that a shot. On Thu, Apr 9, 2015, 5:56 AM Ilya Dryomov idryo...@gmail.com wrote: On Wed, Apr 8, 2015 at 5:25 PM, Shawn Edwards lesser.e...@gmail.com wrote: We've been working on a storage repository for xenserver 6.5, which uses the 3.10 kernel (ug). I got the xenserver guys to include the rbd and libceph kernel modules into the 6.5 release, so that's at least available. Where things go bad is when we have many (10 or so) VMs on one host, all using RBD clones for the storage mapped using the rbd kernel module. The Xenserver crashes so badly that it doesn't even get a chance to kernel panic. The whole box just hangs. I'm not very familiar with Xen and ways to debug it but if the problem lies in libceph or rbd kernel modules we'd like to fix it. Perhaps try grabbing a vmcore? If it just hangs and doesn't panic you can normally induce a crash with a sysrq. Has anyone else seen this sort of behavior? We have a lot of ways to try to work around this, but none of them are very pretty: * move the code to user space, ditch the kernel driver: The build tools for Xenserver are all CentOS5 based, and it is painful to get all of the deps built to get the ceph user space libs built. * backport the ceph and rbd kernel modules to 3.10. Has proven painful, as the block device code changed somewhere in the 3.14-3.16 timeframe. https://github.com/ceph/ceph-client/commits/rhel7-3.10.0-123.9.3 branch would be a good start - it has libceph.ko and rbd.ko as of 3.18-rc5 backported to rhel7 (which is based on 3.10) and may be updated in the future as well, although no promises on that. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RBD hard crash on kernel 3.10
On Wed, Apr 8, 2015 at 5:25 PM, Shawn Edwards lesser.e...@gmail.com wrote: We've been working on a storage repository for xenserver 6.5, which uses the 3.10 kernel (ug). I got the xenserver guys to include the rbd and libceph kernel modules into the 6.5 release, so that's at least available. Where things go bad is when we have many (10 or so) VMs on one host, all using RBD clones for the storage mapped using the rbd kernel module. The Xenserver crashes so badly that it doesn't even get a chance to kernel panic. The whole box just hangs. I'm not very familiar with Xen and ways to debug it but if the problem lies in libceph or rbd kernel modules we'd like to fix it. Perhaps try grabbing a vmcore? If it just hangs and doesn't panic you can normally induce a crash with a sysrq. Has anyone else seen this sort of behavior? We have a lot of ways to try to work around this, but none of them are very pretty: * move the code to user space, ditch the kernel driver: The build tools for Xenserver are all CentOS5 based, and it is painful to get all of the deps built to get the ceph user space libs built. * backport the ceph and rbd kernel modules to 3.10. Has proven painful, as the block device code changed somewhere in the 3.14-3.16 timeframe. https://github.com/ceph/ceph-client/commits/rhel7-3.10.0-123.9.3 branch would be a good start - it has libceph.ko and rbd.ko as of 3.18-rc5 backported to rhel7 (which is based on 3.10) and may be updated in the future as well, although no promises on that. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
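A minimal sketch of the sysrq route Ilya mentions, assuming kdump is configured on the host so the forced crash actually produces a vmcore to inspect:

echo 1 > /proc/sys/kernel/sysrq     # enable all magic sysrq functions
echo c > /proc/sysrq-trigger        # force an immediate crash (and a dump, via kdump)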
Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy
Thanks for the help, guys; here is my feedback from the tests. @Michael Kidd: yum install ceph ceph-common --disablerepo=base --disablerepo=epel Did not work; here are the logs: http://fpaste.org/208828/56448714/ @Travis Rhoden: Yep, *exclude=python-rados python-rbd* under epel.repo did the trick and I can install Firefly / Giant without errors. Thanks. Any idea when this will be fixed once and for all (so I no longer need to patch epel.repo to exclude python-r*)? - VS - On Thu, Apr 9, 2015 at 4:26 AM, Michael Kidd linuxk...@redhat.com wrote: I don't think this came through the first time.. resending.. If it's a dupe, my apologies.. For Firefly / Giant installs, I've had success with the following: yum install ceph ceph-common --disablerepo=base --disablerepo=epel Let us know if this works for you as well. Thanks, Michael J. Kidd Sr. Storage Consultant Inktank Professional Services - by Red Hat On Wed, Apr 8, 2015 at 9:07 PM, Michael Kidd linuxk...@redhat.com wrote: For Firefly / Giant installs, I've had success with the following: yum install ceph ceph-common --disablerepo=base --disablerepo=epel Let us know if this works for you as well. Thanks, Michael J. Kidd Sr. Storage Consultant Inktank Professional Services - by Red Hat On Wed, Apr 8, 2015 at 8:55 PM, Travis Rhoden trho...@gmail.com wrote: I did also confirm that, as Ken mentioned, this is not a problem on Hammer since Hammer includes the package split (python-ceph became python-rados and python-rbd). - Travis On Wed, Apr 8, 2015 at 5:00 PM, Travis Rhoden trho...@gmail.com wrote: Hi Vickey, The easiest way I know of to get around this right now is to add the following line in the section for epel in /etc/yum.repos.d/epel.repo: exclude=python-rados python-rbd So this is what my epel.repo file looks like: http://fpaste.org/208681/ It is those two packages in EPEL that are causing problems. I also tried enabling epel-testing, but that didn't work either. Unfortunately you would need to add this line on each node where Ceph Giant is being installed. - Travis On Wed, Apr 8, 2015 at 4:11 PM, Vickey Singh vickey.singh22...@gmail.com wrote: Community, need help. -VS- On Wed, Apr 8, 2015 at 4:36 PM, Vickey Singh vickey.singh22...@gmail.com wrote: Any suggestions, geeks? VS On Wed, Apr 8, 2015 at 2:15 PM, Vickey Singh vickey.singh22...@gmail.com wrote: Hi, The below suggestion also didn't work. Full logs here: http://paste.ubuntu.com/10771939/ [root@rgw-node1 yum.repos.d]# yum --showduplicates list ceph Loaded plugins: fastestmirror, priorities Loading mirror speeds from cached hostfile * base: mirror.zetup.net * epel: ftp.fi.muni.cz * extras: mirror.zetup.net * updates: mirror.zetup.net 25 packages excluded due to repository priority protections Available Packages ceph.x86_64 0.80.6-0.el7.centos Ceph ceph.x86_64 0.80.7-0.el7.centos Ceph ceph.x86_64 0.80.8-0.el7.centos Ceph ceph.x86_64 0.80.9-0.el7.centos Ceph [root@rgw-node1 yum.repos.d]# It's not able to install the latest available package; yum is getting confused with the other DOT releases. Any other suggestion to fix this ???
--> Processing Dependency: libboost_system-mt.so.1.53.0()(64bit) for package: librbd1-0.80.9-0.el7.centos.x86_64
--> Processing Dependency: libboost_thread-mt.so.1.53.0()(64bit) for package: librbd1-0.80.9-0.el7.centos.x86_64
--> Finished Dependency Resolution
Error: Package: librbd1-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_system-mt.so.1.53.0()(64bit)
Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_system-mt.so.1.53.0()(64bit)
Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libaio.so.1(LIBAIO_0.4)(64bit)
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_thread-mt.so.1.53.0()(64bit)
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: librados2 = 0.80.7-0.el7.centos
       Available: librados2-0.80.6-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.6-0.el7.centos
       Available: librados2-0.80.7-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.7-0.el7.centos
       Available: librados2-0.80.8-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.8-0.el7.centos
       Installing: librados2-0.80.9-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.9-0.el7.centos
Error: Package: libcephfs1-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_thread-mt.so.1.53.0()(64bit)
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: python-requests
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: librbd1 = 0.80.7-0.el7.centos
       Available: librbd1-0.80.6-0.el7.centos.x86_64 (Ceph)
           librbd1 =
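The mixed 0.80.7/0.80.9 resolution in the errors above suggests yum is selecting different dot releases for different packages. Pinning every package to one explicit version is one way to test that theory - a sketch only, untested against this exact repo layout:

    yum install ceph-0.80.9-0.el7.centos ceph-common-0.80.9-0.el7.centos \
        librados2-0.80.9-0.el7.centos librbd1-0.80.9-0.el7.centos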
Re: [ceph-users] Cascading Failure of OSDs
I use the following:

    cat /sys/class/net/em1/statistics/rx_bytes

for the em1 interface; all the other stats are available there as well.

Paul Hewlett
Senior Systems Engineer
Velocix, Cambridge
Alcatel-Lucent
t: +44 1223 435893  m: +44 7985327353

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Carl-Johan Schenström [carl-johan.schenst...@gu.se]
Sent: 09 April 2015 07:34
To: Francois Lafont; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cascading Failure of OSDs

Francois Lafont wrote: Just in case it could be useful, I have noticed the -s option (on my Ubuntu) that offers output probably easier to parse:

    # column -t is just to make it nice for human eyes
    ifconfig -s | column -t

Since ifconfig is deprecated, one should use iproute2 instead:

    ip -s link show p2p1 | awk '/(RX|TX):/{getline; print $3;}'

However, the sysfs interface is probably a better alternative. See https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net-statistics and https://www.kernel.org/doc/Documentation/ABI/README.

--
Carl-Johan Schenström
Driftansvarig / System Administrator
Språkbanken / The Swedish Language Bank
Svensk nationell datatjänst / Swedish National Data Service
Göteborgs universitet / University of Gothenburg
carl-johan.schenst...@gu.se / +46 709 116769
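Building on the sysfs suggestion, a small sketch that dumps every counter the kernel exposes for an interface (em1 is just an example name; substitute your own):

    for f in /sys/class/net/em1/statistics/*; do
        printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
    done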
Re: [ceph-users] MDS unmatched rstat after upgrade hammer
On 09/04/2015 17:09, Scottix wrote: Alright, sounds good. Only one comment then: from an IT/ops perspective all I see is ERR, and that raises red flags, so the exposure of the message might need some tweaking. In production I like to be notified of an issue but have reassurance that it was fixed within the system.

Fair point. Unfortunately, in general we can't distinguish between inconsistencies we're fixing up due to a known software bug and inconsistencies that we're encountering for unknown reasons. The reason this is an error rather than a warning is that we handle this case by arbitrarily trusting one statistic when it disagrees with another, so we don't *know* that we've correctly repaired it - we just hope.

Anyway, the solution is the forthcoming scrub functionality, which will be able to unambiguously repair things like this and give you a clearer statement about what happened.

Cheers,
John