Re: [ceph-users] Intel SSD D3-S4510 and Intel SSD D3-S4610 firmware advisory notice
Wow!!! On Fri, 19 Apr 2019 at 10:16, Stefan Kooman wrote: > Hi List, > > TL;DR: > > For those of you who are running a Ceph cluster with Intel SSD D3-S4510 > and/or Intel SSD D3-S4610 with firmware version XCV10100, please upgrade > to firmware XCV10110 ASAP, at least before ~1700 power-on hours. > > More information here: > > > https://support.microsoft.com/en-us/help/4499612/intel-ssd-drives-unresponsive-after-1700-idle-hours > > > https://downloadcenter.intel.com/download/28673/SSD-S4510-S4610-2-5-non-searchable-firmware-links/ > > Gr. Stefan > > P.s. Thanks to Frank Dennis (@jedisct1) for retweeting @NerdPyle: > https://twitter.com/jedisct1/status/1118623635072258049 > > > -- > | BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351 > | GPG: 0xD14839C6 +31 318 648 688 / i...@bit.nl > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
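A quick way to spot affected drives is to parse smartctl-style output for the model/firmware pair the advisory names. A minimal sketch; the sample text and field names mimic typical `smartctl -i` output and are assumptions, so adjust to your own tooling:

```shell
# Hypothetical helper: given `smartctl -i`-style text, flag D3-S4510/D3-S4610
# drives still on the affected XCV10100 firmware.
check_drive() {
  model=$(echo "$1" | awk -F': *' '/Device Model/{print $2}')
  fw=$(echo "$1" | awk -F': *' '/Firmware Version/{print $2}')
  case "$model" in
    *D3-S4510*|*D3-S4610*)
      if [ "$fw" = "XCV10100" ]; then
        echo "WARN: $model firmware $fw, upgrade to XCV10110"
      else
        echo "OK: $model firmware $fw"
      fi ;;
    *) echo "OK: $model not affected" ;;
  esac
}

# Sample input standing in for `smartctl -i /dev/sdX` (model string assumed)
sample='Device Model:     INTEL SSDSC2KB960G8 D3-S4510
Firmware Version: XCV10100'
check_drive "$sample"
```

On a real host you would loop over `/dev/sd?` and feed each drive's `smartctl -i` output through the helper.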
Re: [ceph-users] Urgent: Reduced data availability / All pgs inactive
Hi, You have a problem with the MGR. http://docs.ceph.com/docs/master/rados/operations/pg-states/ *The ceph-mgr hasn’t yet received any information about the PG’s state from an OSD since mgr started up.* On Thu, 21 Feb 2019 at 09:04, Irek Fasikhov wrote: > Hi, > > You have a problem with the MGR. > http://docs.ceph.com/docs/master/rados/operations/pg-states/ > *The ceph-mgr hasn’t yet received any information about the PG’s state > from an OSD since mgr started up.* > > > On Wed, 20 Feb 2019 at 23:10, Ranjan Ghosh wrote: > >> Hi all, >> >> hope someone can help me. After restarting a node of my 2-node cluster >> suddenly I get this: >> >> root@yak2 /var/www/projects # ceph -s >> cluster: >> id: 749b2473-9300-4535-97a6-ee6d55008a1b >> health: HEALTH_WARN >> Reduced data availability: 200 pgs inactive >> >> services: >> mon: 3 daemons, quorum yak1,yak2,yak0 >> mgr: yak0.planwerk6.de(active), standbys: yak1.planwerk6.de, >> yak2.planwerk6.de >> mds: cephfs-1/1/1 up {0=yak1.planwerk6.de=up:active}, 1 up:standby >> osd: 2 osds: 2 up, 2 in >> >> data: >> pools: 2 pools, 200 pgs >> objects: 0 objects, 0 B >> usage: 0 B used, 0 B / 0 B avail >> pgs: 100.000% pgs unknown >> 200 unknown >> >> And this: >> >> >> root@yak2 /var/www/projects # ceph health detail >> HEALTH_WARN Reduced data availability: 200 pgs inactive >> PG_AVAILABILITY Reduced data availability: 200 pgs inactive >> pg 1.34 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.35 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.36 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.37 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.38 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.39 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.3a is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.3b is 
stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.3c is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.3d is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.3e is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.3f is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.40 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.41 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.42 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.43 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.44 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.45 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.46 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.47 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.48 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.49 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.4a is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.4b is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.4c is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 1.4d is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 2.34 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 2.35 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 2.36 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 2.38 is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 2.39 
is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 2.3a is stuck inactive for 3506.815664, current state unknown, >> last acting [] >> pg 2.3b is st
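Since the mgr simply hasn't reported any PG state yet, the usual remedy is to restart the active ceph-mgr rather than touch the OSDs. A hedged sketch; the systemd unit name follows the mgr id shown in the `ceph -s` output above (`yak0.planwerk6.de`), and your init system may differ:

```shell
# On the host running the active mgr (per `ceph -s`: yak0.planwerk6.de):
systemctl restart ceph-mgr@yak0.planwerk6.de

# PGs should move from "unknown" to real states shortly afterwards:
ceph -s
```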
Re: [ceph-users] How to speed up backfill
ceph tell osd.* injectargs '--osd_recovery_delay_start 30' 2018-01-11 10:31 GMT+03:00 shadow_lin: > Hi, > Mine is purely backfilling (removed an osd from the cluster) and it > started at 600MB/s and ended at about 3MB/s. > How is your recovery made up? Is it backfill, log-replay pg recovery, > or both? > > 2018-01-11 > -- > shadow_lin > -- > > *From:* Josef Zelenka > *Sent:* 2018-01-11 15:26 > *Subject:* Re: [ceph-users] How to speed up backfill > *To:* "shadow_lin" > *Cc:* "ceph-users" > > > Hi, our recovery slowed down significantly towards the end, however it was > still about five times faster than the original speed. We suspected that > this is caused somehow by threading (more objects transferred - more > threads used), but this is only an assumption. > > On 11/01/18 05:02, shadow_lin wrote: > > Hi, > I had tried these two methods and for backfilling it seems only > osd-max-backfills works. > How was your recovery speed when it comes to the last few pgs or objects? > > 2018-01-11 > -- > shadow_lin > -- > > *From:* Josef Zelenka > > *Sent:* 2018-01-11 04:53 > *Subject:* Re: [ceph-users] How to speed up backfill > *To:* "shadow_lin" > *Cc:* > > > Hi, i had the same issue a few days back, i tried playing around with > these two: > > ceph tell 'osd.*' injectargs '--osd-max-backfills ' > ceph tell 'osd.*' injectargs '--osd-recovery-max-active ' > and it helped greatly (increased our recovery speed 20x), but be careful to > not overload your systems. > > > On 10/01/18 17:50, shadow_lin wrote: > > Hi all, > I am playing with settings for backfill to try to find how to control the > speed of backfill. > > So far I have only found that "osd max backfills" has an effect on the backfill speed. > But after all pgs that need to be backfilled begin backfilling I can't find any > way to speed up backfills. > > Especially when it comes to the last pg to recover, the speed is only a > few MB/s (when multiple pgs are being backfilled the speed could be more > than 600MB/s in my test). > > I am a little confused about the settings for backfill and recovery. Though > backfilling is a kind of recovery, it seems the recovery settings are only > about replaying pg logs to recover pgs. > > Would changing "osd recovery max active" or other recovery settings have any > effect on backfilling? > > I tried "osd recovery op priority" and "osd recovery max active" with > no luck. > > Any advice would be greatly appreciated. Thanks > > 2018-01-11 > -- > lin.yunfan > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
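The knobs discussed in the thread can be sketched as follows. The numeric values here are examples only (the thread's originals were stripped by the archive); raise them gradually and watch client latency, since over-aggressive settings can starve client I/O:

```shell
# Hedged example values -- tune for your hardware, not gospel.
ceph tell 'osd.*' injectargs '--osd-max-backfills 4'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 8'

# Start recovery sooner after a peering change (from the reply above):
ceph tell osd.* injectargs '--osd_recovery_delay_start 30'
```

The tail-end slowdown the thread describes is expected to some degree: the last PGs backfill from only a few source OSDs, so the parallelism (and aggregate throughput) collapses regardless of these settings.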
Re: [ceph-users] Ceph Luminous release_type "rc"
Hi, no cause for concern: https://github.com/ceph/ceph/pull/17348/commits/2b5f84586ec4d20ebb5aacd6f3c71776c621bf3b 2017-09-26 11:23 GMT+03:00 Stefan Kooman: > Hi, > > I noticed that ceph version still reports "rc" although we are using the > latest Ceph packages: 12.2.0-1xenial > (https://download.ceph.com/debian-luminous xenial/main amd64 Packages): > > ceph daemon mon.mon5 version > {"version":"12.2.0","release":"luminous","release_type":"rc"} > > Why is this important (to me)? I want to make a monitoring check that > ensures we are running identical, "stable" packages, instead of "beta" / "rc" in > production. > > Gr. Stefan > > -- > | BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351 > | GPG: 0xD14839C6 +31 318 648 688 / i...@bit.nl > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
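The monitoring check Stefan describes can be sketched like this. The sample string stands in for live `ceph daemon mon.mon5 version` output (taken verbatim from the thread); note that per the linked commit, 12.2.0 reporting "rc" was a build-tagging issue, not an actual release-candidate binary:

```shell
# Sample JSON from the thread; in practice: out=$(ceph daemon mon.mon5 version)
out='{"version":"12.2.0","release":"luminous","release_type":"rc"}'

# Crude JSON field extraction with sed (enough for this flat structure)
rtype=$(echo "$out" | sed -n 's/.*"release_type":"\([^"]*\)".*/\1/p')

if [ "$rtype" != "stable" ]; then
  echo "ALERT: release_type is '$rtype'"
else
  echo "OK"
fi
```

Run the same extraction across all daemons and additionally compare the `version` fields to catch mixed-version clusters.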
Re: [ceph-users] Long OSD restart after upgrade to 10.2.9
Hi, Anton. You need to run the OSD with debug_ms = 1/1 and debug_osd = 20/20 for detailed information. 2017-07-17 8:26 GMT+03:00 Anton Dmitriev: > Hi, all! > > After upgrading from 10.2.7 to 10.2.9 I see that restarting osds by > 'restart ceph-osd id=N' or 'restart ceph-osd-all' takes about 10 minutes > to get an OSD from DOWN to UP. The same situation on all 208 OSDs on 7 > servers. > > OSD start is also very long after rebooting servers. > > Before the upgrade it took no more than 2 minutes. > > Does anyone have the same situation as mine? > > > 2017-07-17 08:07:26.895600 7fac2d656840 0 set uid:gid to 4402:4402 > (ceph:ceph) > 2017-07-17 08:07:26.895615 7fac2d656840 0 ceph version 10.2.9 > (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0), process ceph-osd, pid 197542 > 2017-07-17 08:07:26.897018 7fac2d656840 0 pidfile_write: ignore empty > --pid-file > 2017-07-17 08:07:26.906489 7fac2d656840 0 filestore(/var/lib/ceph/osd/ceph-0) > backend xfs (magic 0x58465342) > 2017-07-17 08:07:26.917074 7fac2d656840 0 > genericfilestorebackend(/var/lib/ceph/osd/ceph-0) > detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config > option > 2017-07-17 08:07:26.917092 7fac2d656840 0 > genericfilestorebackend(/var/lib/ceph/osd/ceph-0) > detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data > hole' config option > 2017-07-17 08:07:26.917112 7fac2d656840 0 > genericfilestorebackend(/var/lib/ceph/osd/ceph-0) > detect_features: splice is supported > 2017-07-17 08:07:27.037031 7fac2d656840 0 > genericfilestorebackend(/var/lib/ceph/osd/ceph-0) > detect_features: syncfs(2) syscall fully supported (by glibc and kernel) > 2017-07-17 08:07:27.037154 7fac2d656840 0 > xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) > detect_feature: extsize is disabled by conf > 2017-07-17 08:15:17.839072 7fac2d656840 0 filestore(/var/lib/ceph/osd/ceph-0) > mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled > 2017-07-17 08:15:20.150446 7fac2d656840 0 > 
cls/hello/cls_hello.cc:305: loading cls_hello > 2017-07-17 08:15:20.152483 7fac2d656840 0 > cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan > 2017-07-17 08:15:20.210428 7fac2d656840 0 osd.0 224167 crush map has > features 2200130813952, adjusting msgr requires for clients > 2017-07-17 08:15:20.210443 7fac2d656840 0 osd.0 224167 crush map has > features 2200130813952 was 8705, adjusting msgr requires for mons > 2017-07-17 08:15:20.210448 7fac2d656840 0 osd.0 224167 crush map has > features 2200130813952, adjusting msgr requires for osds > 2017-07-17 08:15:58.902173 7fac2d656840 0 osd.0 224167 load_pgs > 2017-07-17 08:16:19.083406 7fac2d656840 0 osd.0 224167 load_pgs opened > 242 pgs > 2017-07-17 08:16:19.083969 7fac2d656840 0 osd.0 224167 using 0 op queue > with priority op cut off at 64. > 2017-07-17 08:16:19.109547 7fac2d656840 -1 osd.0 224167 log_to_monitors > {default=true} > 2017-07-17 08:16:19.522448 7fac2d656840 0 osd.0 224167 done with init, > starting boot process > > -- > Dmitriev Anton > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
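Note where the time goes in the log above: the gap is between 08:07:27 and 08:15:17, i.e. inside the FileStore mount phase, with another pause around load_pgs. The debug flags Irek suggests can be set like this (a sketch; raising debug_osd to 20/20 produces very large logs, so turn it back down afterwards):

```shell
# Transient, on a running daemon (lost on restart):
ceph tell osd.0 injectargs '--debug_ms 1/1 --debug_osd 20/20'

# Or persistently in ceph.conf before the next restart, so the slow
# startup itself is captured:
#   [osd]
#   debug ms = 1/1
#   debug osd = 20/20
```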
Re: [ceph-users] qemu-img convert vs rbd import performance
Hi. You need to add to the ceph.conf [client] rbd cache = true rbd readahead trigger requests = 5 rbd readahead max bytes = 419430400 *rbd readahead disable after bytes = 0* rbd_concurrent_management_ops = 50 2017-07-13 15:29 GMT+03:00 Mahesh Jambhulkar: > Seeing some performance issues on my ceph cluster with *qemu-img convert > *directly > writing to ceph against normal rbd import command. > > *Direct data copy (without qemu-img convert) took 5 hours 43 minutes for > 465GB data.* > > > [root@cephlarge vm_res_id_24291e4b-93d2-47ad-80a8-bf3c395319b9_vdb]# time > rbd import 66582225-6539-4e5e-9b7a-59aa16739df1 -p volumes > 66582225-6539-4e5e-9b7a-59aa16739df1_directCopy --image-format 2 > rbd: --pool is deprecated for import, use --dest-pool > Importing image: 100% complete...done. > > real*343m38.028s* > user4m40.779s > sys 7m18.916s > [root@cephlarge vm_res_id_24291e4b-93d2-47ad-80a8-bf3c395319b9_vdb]# rbd > info volumes/66582225-6539-4e5e-9b7a-59aa16739df1_directCopy > rbd image '66582225-6539-4e5e-9b7a-59aa16739df1_directCopy': > size 465 GB in 119081 objects > order 22 (4096 kB objects) > block_name_prefix: rbd_data.373174b0dc51 > format: 2 > features: layering, exclusive-lock, object-map, fast-diff, > deep-flatten > flags: > [root@cephlarge vm_res_id_24291e4b-93d2-47ad-80a8-bf3c395319b9_vdb]# > > > *Qemu-img convert is still in progress and completed merely 10% in more > than 40 hours. 
(for 465GB data)* > > [root@cephlarge mnt]# time qemu-img convert -p -t none -O raw > /mnt/data/workload_326e8a43-a90a-4fe9-8aab-6d33bcdf5a05/snap > shot_9f0cee13-8200-4562-82ec-1fb9f234bcd8/vm_id_05e9534e- > 5c84-4487-9613-1e0e227e4c1a/vm_res_id_24291e4b-93d2-47ad- > 80a8-bf3c395319b9_vdb/66582225-6539-4e5e-9b7a-59aa16739df1 > rbd:volumes/24291e4b-93d2-47ad-80a8-bf3c395319b9 > (0.00/100%) > > > (10.00/100%) > > > *Rbd bench-write shows speed of ~21MB/s.* > > [root@cephlarge ~]# rbd bench-write image01 --pool=rbdbench > bench-write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential > SEC OPS OPS/SEC BYTES/SEC > 2 6780 3133.53 12834946.35 > 3 6831 1920.65 7866998.17 > 4 8896 2040.50 8357871.83 > 5 13058 2562.61 10496432.34 > 6 17225 2836.78 11619432.99 > 7 20345 2736.84 11210076.25 > 8 23534 3761.57 15407392.94 > 9 25689 3601.35 14751109.98 >10 29670 3391.53 13891695.57 >11 33169 3218.29 13182107.64 >12 36356 3135.34 12842344.21 >13 38431 2972.62 12175863.99 >14 47780 4389.77 17980497.11 >15 55452 5156.40 21120627.26 >16 59298 4772.32 19547440.33 >17 61437 5151.20 21099315.94 >18 67702 5861.64 24009295.97 >19 77086 5895.03 24146032.34 >20 85474 5936.09 24314243.88 >21 93848 7499.73 30718898.25 >22100115 7783.39 31880760.34 >23105405 7524.76 30821410.70 >24111677 6797.12 27841003.78 >25116971 6274.51 25700386.48 >26121156 5468.77 22400087.81 >27126484 5345.83 21896515.02 >28137937 6412.41 26265239.30 >29143229 6347.28 25998461.13 >30149505 6548.76 26823729.97 >31159978 7815.37 32011752.09 >32171431 8821.65 36133479.15 >33181084 8795.28 36025472.27 >35182856 6322.41 25896605.75 >36186891 5592.25 22905872.73 >37190906 4876.30 19973339.07 >38190943 3076.87 12602853.89 >39190974 1536.79 6294701.64 >40195323 2344.75 9604081.07 >41198479 2703.00 11071492.89 >42208893 3918.55 16050365.70 >43214172 4702.42 19261091.89 >44215263 5167.53 21166212.98 >45219435 5392.57 22087961.94 >46225731 5242.85 21474728.85 >47234101 5009.43 20518607.70 >48243529 6326.00 
25911280.08 >49254058 7944.90 32542315.10 > elapsed:50 ops: 262144 ops/sec: 5215.19 bytes/sec: 21361431.86 > [root@cephlarge ~]# > > This CEPH deployment has 2 OSDs. > > It would be of great help if anyone can give me pointers. > > -- > Regards, > mahesh j > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
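One likely factor in the qemu-img slowness is destination caching: with `-t none` every write goes out synchronously at queue depth 1, which matches the ~200 IOPS iodepth=1 figure. With the `[client] rbd cache` settings from the reply in place, a hedged variant to try (paths and image name truncated as in the thread; `-W` allows out-of-order writes and exists only on newer qemu-img, so check `qemu-img --help` first):

```shell
# Sketch: let qemu-img batch writes through the RBD client cache
qemu-img convert -p -t writeback -O raw \
    /mnt/data/.../66582225-6539-4e5e-9b7a-59aa16739df1 \
    rbd:volumes/24291e4b-93d2-47ad-80a8-bf3c395319b9
```

This is an assumption-based tuning sketch, not a guaranteed fix: a 2-OSD cluster simply has very little parallelism to offer, as the bench-write numbers show.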
Re: [ceph-users] To backup or not to backup the classic way - How to backup hundreds of TB?
Hi. We use Ceph Rados GW S3, and we are very happy :). Each administrator is responsible for their own service. We use the following S3 clients: Linux - s3cmd, duply; Windows - CloudBerry. P.S. 500 TB data, 3x replication, 3 datacenters. Regards, Irek Fasikhov. Mobile: +79229045757 2017-02-14 12:15 GMT+03:00 Götz Reinicke: > Hi, > > I guess that's a question that pops up in different places, but I could > not find any answer which fits my thoughts. > > Currently we are starting to use ceph for file shares of the films produced by > our students and some xen/vmware VMs. The VM data is already backed up; the > films' original footage is stored in other places. > > We start with some 100TB rbd and mount smb/NFS shares from the clients. > Maybe we will look into cephfs soon. > > The question is: How would someone handle a backup of 100 TB of data? > Rsyncing that to another system or having a commercial backup solution > does not look that good, e.g. regarding the price. > > One thought: is there some sort of best practice in the ceph world, e.g. > replicating to another physically independent cluster? Or use more replicas, > osds, nodes and do snapshots in one cluster? > > Having production data and backup on the same hardware currently makes me > feel not that good too… But the world changes :) > > Long story short: How do you back up hundreds of TB? > > Curious for suggestions and thoughts. Thanks and regards, Götz > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
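For the RBD side of the question, one common Ceph-native approach to "replicate to a physically independent cluster" is snapshot-based incremental replication with export-diff/import-diff. A sketch only; pool, image, snapshot, and host names are placeholders:

```shell
# Day N: snapshot, then ship only the delta since yesterday's snapshot.
TODAY=$(date +%F)
YESTERDAY=backup-2017-02-13   # placeholder: yesterday's snapshot name

rbd snap create rbd/myimage@backup-$TODAY
rbd export-diff --from-snap "$YESTERDAY" rbd/myimage@backup-$TODAY - \
  | ssh backup-host rbd import-diff - rbd/myimage
```

The first run uses a plain `rbd export-diff` (no `--from-snap`) to seed the remote image; after that, each transfer is proportional to the change rate, not the 100 TB total.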
Re: [ceph-users] Migrating data from a Ceph clusters to another
Hi. I recommend using rbd export/import. Regards, Irek Fasikhov. Mobile: +79229045757 2017-02-09 11:13 GMT+03:00 林自均: > Hi, > > I have 2 Ceph clusters, cluster A and cluster B. I want to move all the > pools on A to B. The pool names don't conflict between clusters. I guess > it's like RBD mirroring, except that it's pool mirroring. Is there any > proper way to do it? > > Thanks for any suggestions. > > Best, > John Lin > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
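The export/import suggestion can be sketched as a per-image pipeline. This assumes both clusters' config/keyring files are reachable from one host (e.g. /etc/ceph/a.conf and /etc/ceph/b.conf, selected via `--cluster`); pool names are examples:

```shell
# Stream every image in mypool from cluster "a" into cluster "b"
for img in $(rbd --cluster a -p mypool ls); do
  rbd --cluster a export mypool/"$img" - \
    | rbd --cluster b import - mypool/"$img"
done
```

Caveat: plain export/import copies only the image data at a point in time; snapshots and sparseness metadata are not preserved, so images with snapshots need `export-diff`/`import-diff` per snapshot instead.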
Re: [ceph-users] Experience with 5k RPM/archive HDDs
Hi, Maxime. SMR support in Linux only starts with kernel 4.9. Regards, Irek Fasikhov. Mobile: +79229045757 2017-02-03 10:26 GMT+03:00 Maxime Guyot: > Hi everyone, > > > > I’m wondering if anyone on the ML is running a cluster with archive-type > HDDs, like the HGST Ultrastar Archive (10TB@7.2k RPM) or the Seagate > Enterprise Archive (8TB@5.9k RPM)? > > As far as I have read they both fall in the enterprise class of HDDs, so **might** > be suitable for a low-performance, low-cost cluster? > > > > Cheers, > > Maxime > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] XFS no space left on device
Hi, Vasily. Your inodes are exhausted; see "df -i". Regards, Irek Fasikhov. Mobile: +79229045757 2016-10-25 15:52 GMT+03:00 Vasily Angapov: > This is a bit more information about that XFS: > > root@ed-ds-c178:[~]:$ xfs_info /dev/mapper/disk23p1 > meta-data=/dev/mapper/disk23p1 isize=2048 agcount=6, agsize=268435455 > blks > = sectsz=4096 attr=2, projid32bit=1 > = crc=0 finobt=0 > data = bsize=4096 blocks=1465130385, imaxpct=5 > = sunit=0 swidth=0 blks > naming =version 2 bsize=4096 ascii-ci=0 ftype=0 > log =internal bsize=4096 blocks=521728, version=2 > = sectsz=4096 sunit=1 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > > root@ed-ds-c178:[~]:$ xfs_db /dev/mapper/disk23p1 > xfs_db> frag > actual 25205642, ideal 22794438, fragmentation factor 9.57% > > 2016-10-25 14:59 GMT+03:00 Vasily Angapov : > > Actually all OSDs are already mounted with the inode64 option. Otherwise I > > could not write beyond 1TB. > > > > 2016-10-25 14:53 GMT+03:00 Ashley Merrick : > >> Sounds like the 32bit inode limit; if you mount with -o inode64 (not 100% > how you would do it in ceph), it would allow data to continue to be written. > >> > >> ,Ashley > >> > >> -Original Message- > >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf > Of Vasily Angapov > >> Sent: 25 October 2016 12:38 > >> To: ceph-users > >> Subject: [ceph-users] XFS no space left on device > >> > >> Hello, > >> > >> I got a Ceph 10.2.1 cluster with 10 nodes, each having 29 * 6TB OSDs. > >> Yesterday I found that 3 OSDs were down and out with 89% space > utilization. 
> >> In logs there is: > >> 2016-10-24 22:36:37.599253 7f8309c5e800 0 ceph version 10.2.1 ( > 3a66dd4f30852819c1bdaa8ec23c795d4ad77269), process ceph-osd, pid > >> 2602081 > >> 2016-10-24 22:36:37.600129 7f8309c5e800 0 pidfile_write: ignore empty > --pid-file > >> 2016-10-24 22:36:37.635769 7f8309c5e800 0 > >> filestore(/var/lib/ceph/osd/ceph-123) backend xfs (magic 0x58465342) > >> 2016-10-24 22:36:37.635805 7f8309c5e800 -1 > >> genericfilestorebackend(/var/lib/ceph/osd/ceph-123) detect_features: > >> unable to create /var/lib/ceph/osd/ceph-123/fiemap_test: (28) No space > left on device > >> 2016-10-24 22:36:37.635814 7f8309c5e800 -1 > >> filestore(/var/lib/ceph/osd/ceph-123) _detect_fs: detect_features > >> error: (28) No space left on device > >> 2016-10-24 22:36:37.635818 7f8309c5e800 -1 > >> filestore(/var/lib/ceph/osd/ceph-123) FileStore::mount: error in > >> _detect_fs: (28) No space left on device > >> 2016-10-24 22:36:37.635824 7f8309c5e800 -1 osd.123 0 OSD:init: unable > to mount object store > >> 2016-10-24 22:36:37.635827 7f8309c5e800 -1 ESC[0;31m ** ERROR: osd init > failed: (28) No space left on deviceESC[0m > >> > >> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ df -h > /var/lib/ceph/osd/ceph-123 > >> FilesystemSize Used Avail Use% Mounted on > >> /dev/mapper/disk23p1 5.5T 4.9T 651G 89% /var/lib/ceph/osd/ceph-123 > >> > >> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ df -i > /var/lib/ceph/osd/ceph-123 > >> Filesystem InodesIUsed IFree IUse% Mounted on > >> /dev/mapper/disk23p1 146513024 22074752 124438272 16% > >> /var/lib/ceph/osd/ceph-123 > >> > >> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ touch 123 > >> touch: cannot touch ‘123’: No space left on device > >> > >> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ grep ceph-123 > /proc/mounts > >> /dev/mapper/disk23p1 /var/lib/ceph/osd/ceph-123 xfs > rw,noatime,attr2,inode64,noquota 0 0 > >> > >> The same situation is for all three down OSDs. 
OSD can be unmounted and > mounted without problem: > >> root@ed-ds-c178:[~]:$ umount /var/lib/ceph/osd/ceph-123 > >> root@ed-ds-c178:[~]:$ > root@ed-ds-c178:[~]:$ mount /var/lib/ceph/osd/ceph-123 root@ed-ds-c178:[~]:$ > touch /var/lib/ceph/osd/ceph-123/123 > >> touch: cannot touch ‘/var/lib/ceph/osd/ceph-123/123’: No space left on > device > >> > >> xfs_repair gives no error for FS. > >> > >> Kernel is > >> root@ed-ds-c178:[~]:$ uname -r > >> 4.7.0-1.el7.wg.x86_64 > >> > >> What else can I do to rectify that situation? > >> ___ > >> ceph-users mailing list > >> ceph-users@lists.ceph.com > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] data corruption with hammer
Hi, Nick. I switched between forward and writeback (forward -> writeback). Regards, Irek Fasikhov. Mobile: +79229045757 2016-03-17 16:10 GMT+03:00 Nick Fisk <n...@fisk.me.uk>: > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > > Irek Fasikhov > > Sent: 17 March 2016 13:00 > > To: Sage Weil <sw...@redhat.com> > > Cc: Robert LeBlanc <robert.lebl...@endurance.com>; ceph-users > us...@lists.ceph.com>; Nick Fisk <n...@fisk.me.uk>; William Perkins > > <william.perk...@endurance.com> > > Subject: Re: [ceph-users] data corruption with hammer > > > > Hi, All. > > > > I confirm the problem. When min_read_recency_for_promote > 1, data > > failure. > > But what scenario is this? Are you switching between forward and > writeback, or just running in writeback? > > > > > > > Regards, Irek Fasikhov > > Mobile: +79229045757 > > > > 2016-03-17 15:26 GMT+03:00 Sage Weil <sw...@redhat.com>: > > On Thu, 17 Mar 2016, Nick Fisk wrote: > > > There has got to be something else going on here. All that PR does is to > > > potentially delay the promotion to hit_set_period*recency instead of > > > just doing it on the 2nd read regardless, it's got to be uncovering > > > another bug. > > > > > > Do you see the same problem if the cache is in writeback mode before you > > > start the unpacking. Ie is it the switching mid operation which causes > > > the problem? If it only happens mid operation, does it still occur if > > > you pause IO when you make the switch? > > > > > > Do you also see this if you perform on a RBD mount, to rule out any > > > librbd/qemu weirdness? > > > > > > Do you know if it’s the actual data that is getting corrupted or if it's > > > the FS metadata? 
I'm only wondering as unpacking should really only be > > > writing to each object a couple of times, whereas FS metadata could > > > potentially be being updated+read back lots of times for the same group > > > of objects and ordering is very important. > > > > > > Thinking through it logically the only difference is that with > recency=1 > > > the object will be copied up to the cache tier, where recency=6 it will > > > be proxy read for a long time. If I had to guess I would say the issue > > > would lie somewhere in the proxy read + writeback<->forward logic. > > > > That seems reasonable. Was switching from writeback -> forward always > > part of the sequence that resulted in corruption? Not that there is a > > known ordering issue when switching to forward mode. I wouldn't really > > expect it to bite real users but it's possible.. > > > > http://tracker.ceph.com/issues/12814 > > > > I've opened a ticket to track this: > > > > http://tracker.ceph.com/issues/15171 > > > > What would be *really* great is if you could reproduce this with a > > ceph_test_rados workload (from ceph-tests). I.e., get ceph_test_rados > > running, and then find the sequence of operations that are sufficient to > > trigger a failure. > > > > sage > > > > > > > > > > > > > > > > > > > -Original Message- > > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On > > Behalf Of > > > > Mike Lovell > > > > Sent: 16 March 2016 23:23 > > > > To: ceph-users <ceph-users@lists.ceph.com>; sw...@redhat.com > > > > Cc: Robert LeBlanc <robert.lebl...@endurance.com>; William Perkins > > > > <william.perk...@endurance.com> > > > > Subject: Re: [ceph-users] data corruption with hammer > > > > > > > > just got done with a test against a build of 0.94.6 minus the two > commits > > that > > > > were backported in PR 7207. everything worked as it should with the > > cache- > > > > mode set to writeback and the min_read_recency_for_promote set to 2. 
> > > > assuming it works properly on master, there must be a commit that > we're > > > > missing on the backport to support this properly. > > > > > > > > sage, > > > > i'm adding you to the recipients on this so hopefully you see it. > the tl;dr > > > > version is that the backport of the cache recency fix to hammer > doesn't > > work > > > > right and potentially corrupts data when > > > > the mi
Re: [ceph-users] data corruption with hammer
Hi, All. I confirm the problem: with min_read_recency_for_promote > 1 we see data corruption. Regards, Irek Fasikhov. Mobile: +79229045757 2016-03-17 15:26 GMT+03:00 Sage Weil: > On Thu, 17 Mar 2016, Nick Fisk wrote: > > There has got to be something else going on here. All that PR does is to > > potentially delay the promotion to hit_set_period*recency instead of > > just doing it on the 2nd read regardless, it's got to be uncovering > > another bug. > > > > Do you see the same problem if the cache is in writeback mode before you > > start the unpacking. Ie is it the switching mid operation which causes > > the problem? If it only happens mid operation, does it still occur if > > you pause IO when you make the switch? > > > > Do you also see this if you perform on a RBD mount, to rule out any > > librbd/qemu weirdness? > > > > Do you know if it’s the actual data that is getting corrupted or if it's > > the FS metadata? I'm only wondering as unpacking should really only be > > writing to each object a couple of times, whereas FS metadata could > > potentially be being updated+read back lots of times for the same group > > of objects and ordering is very important. > > > > Thinking through it logically the only difference is that with recency=1 > > the object will be copied up to the cache tier, where recency=6 it will > > be proxy read for a long time. If I had to guess I would say the issue > > would lie somewhere in the proxy read + writeback<->forward logic. > > That seems reasonable. Was switching from writeback -> forward always > part of the sequence that resulted in corruption? Note that there is a > known ordering issue when switching to forward mode. I wouldn't really > expect it to bite real users but it's possible.. > > http://tracker.ceph.com/issues/12814 > > I've opened a ticket to track this: > > http://tracker.ceph.com/issues/15171 > > What would be *really* great is if you could reproduce this with a > ceph_test_rados workload (from ceph-tests). 
I.e., get ceph_test_rados > running, and then find the sequence of operations that are sufficient to > trigger a failure. > > sage > > > > > > > > > > > > -Original Message- > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf > Of > > > Mike Lovell > > > Sent: 16 March 2016 23:23 > > > To: ceph-users ; sw...@redhat.com > > > Cc: Robert LeBlanc ; William Perkins > > > > > > Subject: Re: [ceph-users] data corruption with hammer > > > > > > just got done with a test against a build of 0.94.6 minus the two > commits that > > > were backported in PR 7207. everything worked as it should with the > cache- > > > mode set to writeback and the min_read_recency_for_promote set to 2. > > > assuming it works properly on master, there must be a commit that we're > > > missing on the backport to support this properly. > > > > > > sage, > > > i'm adding you to the recipients on this so hopefully you see it. the > tl;dr > > > version is that the backport of the cache recency fix to hammer > doesn't work > > > right and potentially corrupts data when > > > the min_read_recency_for_promote is set to greater than 1. > > > > > > mike > > > > > > On Wed, Mar 16, 2016 at 4:41 PM, Mike Lovell > > > wrote: > > > robert and i have done some further investigation the past couple days > on > > > this. we have a test environment with a hard drive tier and an ssd > tier as a > > > cache. several vms were created with volumes from the ceph cluster. i > did a > > > test in each guest where i un-tarred the linux kernel source multiple > times > > > and then did a md5sum check against all of the files in the resulting > source > > > tree. i started off with the monitors and osds running 0.94.5 and > never saw > > > any problems. > > > > > > a single node was then upgraded to 0.94.6 which has osds in both the > ssd and > > > hard drive tier. 
i then proceeded to run the same test and, while the > untar > > > and md5sum operations were running, i changed the ssd tier cache-mode > > > from forward to writeback. almost immediately the vms started > reporting io > > > errors and odd data corruption. the remainder of the cluster was > updated to > > > 0.94.6, including the monitors, and the same thing happened. > > > > > > things were cleaned up and reset and then a test was run > > > where min_read_recency_for_promote for the ssd cache pool was set to 1. > > > we previously had it set to 6. there was never an error with the > recency > > > setting set to 1. i then tested with it set to 2 and it immediately > caused > > > failures. we are currently thinking that it is related to the backport > of the fix > > > for the recency promotion and are in progress of making a .6 build > without > > > that backport to see if we can cause corruption. is anyone using a > version > > > from after the original
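For readers who want to exercise the settings discussed in this thread on a test cluster, the two knobs involved can be sketched as below. The pool name is a placeholder, and the "safe" value of 1 is drawn from the reports above (corruption appeared on hammer 0.94.6 whenever recency was greater than 1); the sketch only prints the commands so nothing is executed by accident.

```shell
# Hypothetical pool name; substitute your own cache pool.
POOL="ssd-cache"

# Commands discussed in the thread: set the tier's cache mode before
# client I/O starts (switching mid-operation was part of the failing
# sequence), and keep min_read_recency_for_promote at 1 on affected
# hammer builds, since values > 1 triggered the corruption above.
CMD_MODE="ceph osd tier cache-mode $POOL writeback"
CMD_RECENCY="ceph osd pool set $POOL min_read_recency_for_promote 1"

# Print instead of execute -- this is a sketch, not a fix.
echo "$CMD_MODE"
echo "$CMD_RECENCY"
```

Running the printed commands by hand on a throwaway cluster mirrors the test sequence Mike describes, without risking production data.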
Re: [ceph-users] ceph 9.2.0 SAMSUNG ssd performance issue?
Hi. You need to read: https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ Best regards, Irek Fasikhov. Mob.: +79229045757 2016-02-12 10:41 GMT+03:00 Huan Zhang: > Hi, > > ceph is VERY SLOW with 24 osds (SAMSUNG ssd). > fio /dev/rbd0 iodepth=1 direct=1 IOPS only ~200 > fio /dev/rbd0 iodepth=32 direct=1 IOPS only ~3000 > > But testing a single ssd drive with fio: > fio iodepth=1 direct=1 IOPS ~15000 > fio iodepth=32 direct=1 IOPS ~3 > > Why is ceph SO SLOW? Could you give me some help? > Appreciated! > > > My Environment: > [root@szcrh-controller ~]# ceph -s > cluster eb26a8b9-e937-4e56-a273-7166ffaa832e > health HEALTH_WARN > 1 mons down, quorum 0,1,2,3,4 ceph01,ceph02,ceph03,ceph04, > ceph05 > monmap e1: 6 mons at {ceph01= > > 10.10.204.144:6789/0,ceph02=10.10.204.145:6789/0,ceph03=10.10.204.146:6789/0,ceph04=10.10.204.147:6789/0,ceph05=10.10.204.148:6789/0,ceph06=0.0.0.0:0/5 > } > election epoch 6, quorum 0,1,2,3,4 > ceph01,ceph02,ceph03,ceph04,ceph05 > osdmap e114: 24 osds: 24 up, 24 in > flags sortbitwise > pgmap v2213: 1864 pgs, 3 pools, 49181 MB data, 4485 objects > 144 GB used, 42638 GB / 42782 GB avail > 1864 active+clean > > [root@ceph03 ~]# lsscsi > [0:0:6:0] disk ATA SAMSUNG MZ7KM1T9 003Q /dev/sda > [0:0:7:0] disk ATA SAMSUNG MZ7KM1T9 003Q /dev/sdb > [0:0:8:0] disk ATA SAMSUNG MZ7KM1T9 003Q /dev/sdc > [0:0:9:0] disk ATA SAMSUNG MZ7KM1T9 003Q /dev/sdd > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
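The linked article tests whether an SSD can sustain synchronous (O_DSYNC) journal writes, which is the I/O pattern a Ceph journal generates. A sketch of that test follows; the device path is a placeholder, and since the run is destructive to data on the device, the command is only printed here rather than executed.

```shell
# Placeholder device -- never point this at a disk holding data you need.
DEV="/dev/sdX"

# Single-depth synchronous 4k write test, per the linked article. Many
# consumer SSDs that look fast in ordinary benchmarks collapse to a few
# hundred IOPS under this pattern, which is exactly the symptom above.
FIO_CMD="fio --filename=$DEV --direct=1 --sync=1 --rw=write --bs=4k \
--numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting \
--name=journal-test"

echo "$FIO_CMD"
```

A journal-class SSD should sustain thousands of such synchronous 4k writes per second; a result in the low hundreds explains cluster-wide slowness like that reported here.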
Re: [ceph-users] Undersized pgs problem
Is your time synchronized? Best regards, Irek Fasikhov. Mob.: +79229045757 2015-11-27 15:57 GMT+03:00 Vasiliy Angapov <anga...@gmail.com>: > > It seems that you played around with the crushmap, and did something wrong. > > Compare the output of 'ceph osd tree' and the crushmap. There are some 'osd' > devices renamed to 'device'; I think there is your problem. > Is this a mistake actually? What I did is removed a bunch of OSDs from > my cluster, that's why the numeration is sparse. But is it an issue to > have a sparse numeration of OSDs? > > > Hi. > > Vasiliy, yes, it is a problem with the crushmap. Look at the weights: > > -3 14.56000 host slpeah001 > > -2 14.56000 host slpeah002 > What exactly is wrong here? > > I also found out that my OSD logs are full of such records: > 2015-11-26 08:31:19.273268 7fe4f49b1700 0 cephx: verify_authorizer > could not get service secret for service osd secret_id=2924 > 2015-11-26 08:31:19.273276 7fe4f49b1700 0 -- > 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000 > sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a520).accept: got bad > authorizer > 2015-11-26 08:31:24.273207 7fe4f49b1700 0 auth: could not find > secret_id=2924 > 2015-11-26 08:31:24.273225 7fe4f49b1700 0 cephx: verify_authorizer > could not get service secret for service osd secret_id=2924 > 2015-11-26 08:31:24.273231 7fe4f49b1700 0 -- > 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000 > sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a3c0).accept: got bad > authorizer > 2015-11-26 08:31:29.273199 7fe4f49b1700 0 auth: could not find > secret_id=2924 > 2015-11-26 08:31:29.273215 7fe4f49b1700 0 cephx: verify_authorizer > could not get service secret for service osd secret_id=2924 > 2015-11-26 08:31:29.273222 7fe4f49b1700 0 -- > 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000 > sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a260).accept: got bad > authorizer > 2015-11-26 08:31:34.273469 7fe4f49b1700 0 auth: could not find >
secret_id=2924 > 2015-11-26 08:31:34.273482 7fe4f49b1700 0 cephx: verify_authorizer > could not get service secret for service osd secret_id=2924 > 2015-11-26 08:31:34.273486 7fe4f49b1700 0 -- > 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000 > sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a100).accept: got bad > authorizer > 2015-11-26 08:31:39.273310 7fe4f49b1700 0 auth: could not find > secret_id=2924 > 2015-11-26 08:31:39.273331 7fe4f49b1700 0 cephx: verify_authorizer > could not get service secret for service osd secret_id=2924 > 2015-11-26 08:31:39.273342 7fe4f49b1700 0 -- > 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000 > sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19fa0).accept: got bad > authorizer > 2015-11-26 08:31:44.273753 7fe4f49b1700 0 auth: could not find > secret_id=2924 > 2015-11-26 08:31:44.273769 7fe4f49b1700 0 cephx: verify_authorizer > could not get service secret for service osd secret_id=2924 > 2015-11-26 08:31:44.273776 7fe4f49b1700 0 -- > 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000 > sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee189a0).accept: got bad > authorizer > 2015-11-26 08:31:49.273412 7fe4f49b1700 0 auth: could not find > secret_id=2924 > 2015-11-26 08:31:49.273431 7fe4f49b1700 0 cephx: verify_authorizer > could not get service secret for service osd secret_id=2924 > 2015-11-26 08:31:49.273455 7fe4f49b1700 0 -- > 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000 > sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19080).accept: got bad > authorizer > 2015-11-26 08:31:54.273293 7fe4f49b1700 0 auth: could not find > secret_id=2924 > > What does it mean? Google sais it might be a time sync issue, but my > clocks are perfectly synchronized... > > 2015-11-26 21:05 GMT+08:00 Irek Fasikhov <malm...@gmail.com>: > > Hi. > > Vasiliy, Yes it is a problem with crusmap. 
Look at the weights: > > " -3 14.56000 host slpeah001 > > -2 14.56000 host slpeah002 > > " > > > > Best regards, Irek Fasikhov > > Mob.: +79229045757 > > > > 2015-11-26 13:16 GMT+03:00 ЦИТ РТ-Курамшин Камиль Фидаилевич > > <kamil.kurams...@tatar.ru>: > >> > >> It seems that you played around with the crushmap, and did something wrong. > >> Compare the output of 'ceph osd tree' and the crushmap. There are some 'osd' > >> devices renamed to 'device'; I think there is your problem. > >> > >> Sent from a mobile device. > >> > >> > >> -Original Message- > >> From: Vasiliy Angapov <anga...
Re: [ceph-users] Undersized pgs problem
Hi. Vasiliy, yes, it is a problem with the crushmap. Look at the weights: " -3 14.56000 host slpeah001 -2 14.56000 host slpeah002 " Best regards, Irek Fasikhov. Mob.: +79229045757 2015-11-26 13:16 GMT+03:00 ЦИТ РТ-Курамшин Камиль Фидаилевич < kamil.kurams...@tatar.ru>: > It seems that you played around with the crushmap, and did something wrong. > Compare the output of 'ceph osd tree' and the crushmap. There are some 'osd' > devices renamed to 'device'; I think there is your problem. > > Sent from a mobile device. > > > -Original Message- > From: Vasiliy Angapov> To: ceph-users > Sent: Thu, 26 Nov 2015 7:53 > Subject: [ceph-users] Undersized pgs problem > > Hi, colleagues! > > I have a small 4-node CEPH cluster (0.94.2); all pools have size 3, min_size > 1. > Last night one host failed and the cluster was unable to rebalance, saying > there are a lot of undersized pgs. > > root@slpeah002:[~]:# ceph -s > cluster 78eef61a-3e9c-447c-a3ec-ce84c617d728 > health HEALTH_WARN > 1486 pgs degraded > 1486 pgs stuck degraded > 2257 pgs stuck unclean > 1486 pgs stuck undersized > 1486 pgs undersized > recovery 80429/555185 objects degraded > (14.487%) > recovery 40079/555185 objects misplaced (7.219%) > 4/20 in osds are down > 1 mons down, quorum 1,2 slpeah002,slpeah007 > monmap e7: 3 mons at > {slpeah001= > 192.168.254.11:6780/0,slpeah002=192.168.254.12:6780/0,slpeah007=172.31.252.46:6789/0} > > election epoch 710, quorum 1,2 slpeah002,slpeah007 > osdmap e14062: 20 osds: 16 up, 20 in; 771 remapped pgs > pgmap v7021316: 4160 pgs, 5 pools, 1045 GB data, 180 kobjects > 3366 GB used, 93471 GB / 96838 GB avail > 80429/555185 objects degraded (14.487%) > 40079/555185 objects misplaced (7.219%) > 1903 active+clean > 1486 active+undersized+degraded > 771 active+remapped > client io 0 B/s rd, 246 kB/s wr, 67 op/s > > root@slpeah002:[~]:# ceph osd tree > ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY > -1 94.63998 root default > -9 32.75999 host slpeah007 > 72
5.45999 osd.72 up 1.0 1.0 > 73 5.45999 osd.73 up 1.0 1.0 > 74 5.45999 osd.74 up 1.0 1.0 > 75 5.45999 osd.75 up 1.0 1.0 > 76 5.45999 osd.76 up 1.0 1.0 > 77 5.45999 osd.77 up 1.0 1.0 > -10 32.75999 host slpeah008 > 78 5.45999 osd.78 up 1.0 1.0 > 79 5.45999 osd.79 up 1.0 1.0 > 80 5.45999 osd.80 up 1.0 1.0 > 81 5.45999 osd.81 up 1.0 1.0 > 82 5.45999 osd.82 up 1.0 1.0 > 83 5.45999 osd.83 up 1.0 1.0 > -3 14.56000 host slpeah001 > 1 3.64000 osd.1 down 1.0 1.0 > 33 3.64000 osd.33down 1.0 1.0 > 34 3.64000 osd.34down 1.0 1.0 > 35 3.64000 osd.35down 1.0 1.0 > -2 14.56000 host slpeah002 > 0 3.64000 osd.0 up 1.0 1.0 > 36 3.64000 osd.36 up 1.0 1.0 > 37 3.64000 osd.37 up 1.0 1.0 > 38 3.64000 osd.38 up 1.0 1.0 > > Crushmap: > > # begin crush map > tunable choose_local_tries 0 > tunable choose_local_fallback_tries 0 > tunable choose_total_tries 50 > tunable chooseleaf_descend_once 1 > tunable chooseleaf_vary_r 1 > tunable straw_calc_version 1 > tunable allowed_bucket_algs 54 > > # devices > device 0 osd.0 > device 1 osd.1 > device 2 device2 > device 3 device3 > device 4 device4 > device 5 device5 > device 6 device6 > device 7 device7 > device 8 device8 > device 9 device9 > device 10 device10 > device 11 device11 > device 12 device12 > device 13 device13 > device 14 device14 > device 15 device15 > device 16 device16 > device 17 device17 > device 18 device18 > device 19 device19 > device 20 device20 > device 21 device21 > device 22 device22 > device 23 device23 > device 24 device24 > device 25 device25 > device 26 device26 > device 27 device27 > device 28 device28 > device 29 device29 > device 30 device30 > device 31 device31 > device 32 device32 > device 33 osd.33 > device 34 osd.34 > device 35 osd.35 > device 36 osd.36 > device 37 osd.37 > device 38 osd.38 > device 39 device39 > device 40 device40 > device 41 device41 > device 42 device42 > device 43 device43 > device 44 device44 > device 45 device45 >
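The usual workflow for fixing a CRUSH map like the one quoted above (stray `device N` placeholder entries after OSD removal, hosts whose weights no longer match reality) is to decompile it, edit the text form, and compile it back. A sketch follows, with arbitrary file names; the commands are only printed here, since applying a bad map can take a cluster down, and `crushtool --test` should be run before any `setcrushmap`.

```shell
# Decompile/edit/recompile cycle for a CRUSH map. File names are
# placeholders. Printed, not executed -- this is a sketch.
STEPS="ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
crushtool -c crush.txt -o crush.new
crushtool -i crush.new --test --show-bad-mappings
ceph osd setcrushmap -i crush.new"

printf '%s\n' "$STEPS"
```

Between the decompile and compile steps you edit `crush.txt` by hand; note that the `device N` placeholder lines left behind by removed OSDs are normal and keep the numbering stable, so a sparse OSD numeration by itself is not an error.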
Re: [ceph-users] proxmox 4.0 release : lxc with krbd support and qemu librbd improvements
Hi, Alexandre. Very, very good! Thank you for your work! :) Best regards, Irek Fasikhov. Mob.: +79229045757 2015-10-07 7:25 GMT+03:00 Alexandre DERUMIER: > Hi, > > proxmox 4.0 has been released: > > http://forum.proxmox.com/threads/23780-Proxmox-VE-4-0-released! > > > Some ceph improvements: > > - lxc containers with krbd support (multiple disks + snapshots) > - qemu with jemalloc support (improves librbd performance) > - qemu iothread option per disk (improves rbd scaling with multiple disks) > - librbd hammer version > > Regards, > > Alexandre > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Repair inconsistent pgs..
Hi, Igor. You need to repair the PGs: for i in `ceph pg dump | grep inconsistent | grep -v 'inconsistent+repair' | awk '{print $1}'`; do ceph pg repair $i; done Best regards, Irek Fasikhov. Mob.: +79229045757 2015-08-18 8:27 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: Hi all, at our production cluster, due to heavy rebalancing ((( we have 2 pgs in an inconsistent state... root@temp:~# ceph health detail | grep inc HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] From the OSD logs, after a recovery attempt: root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done dumped all in format plain instructing pg 2.490 on osd.56 to repair instructing pg 2.c4 on osd.56 to repair /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.7a65/141//2 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.04da/141//2 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.522f/141//2 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.032e/141//2
/var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0807/141//2 /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2 /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone 28809490/rbd_data.edea7460fe42b.01d9/141//2 /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors So, how i can solve expected clone situation by hand? Thank in advance! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
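The repair loop in this thread can also be driven from `ceph health detail` output, and the parsing can be exercised without a cluster. The sketch below uses the two PG ids from the message as sample input and prints the repair commands instead of running them.

```shell
# Sample lines taken verbatim from the `ceph health detail` output above;
# no cluster is needed to exercise the extraction itself.
sample='pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,42]'

# Field 2 of each matching line is the PG id.
pgs=$(printf '%s\n' "$sample" | awk '/inconsistent/ {print $2}')

for pg in $pgs; do
  # Print instead of execute, for safety; drop the echo on a real cluster.
  echo "ceph pg repair $pg"
done
```

Note that `pg repair` only addresses scrub inconsistencies; the "expected clone" errors in the log excerpt indicate missing snapshot clones, which repair alone may not resolve.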
Re: [ceph-users] Geographical Replication and Disaster Recovery Support
Hi. This document applies only to RadosGW. You need to read this document instead: https://wiki.ceph.com/Planning/Blueprints/Hammer/RBD%3A_Mirroring Best regards, Irek Fasikhov. Mob.: +79229045757 2015-08-13 11:40 GMT+03:00 Özhan Rüzgar Karaman oruzgarkara...@gmail.com: Hi; I'd like to learn about Ceph's geographical replication and disaster recovery options. I know that currently we do not have a built-in official geo-replication or disaster recovery; there are some third-party tools like drbd, but they are not the kind of solution a business needs. I also read the RGW document on the Ceph wiki site. https://wiki.ceph.com/Planning/Blueprints/Dumpling/RGW_Geo-Replication_and_Disaster_Recovery The document is from the Dumpling release, around 2013. Is there any active work or effort to bring disaster recovery or geographical replication features to Ceph; is it on the current road map? Thanks Özhan KARAMAN ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CEPH cache layer. Very slow
Hi, Igor. Try applying the patch from here: http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov P.S. I no longer track changes in this direction (kernel), because we already use the recommended SSDs. Best regards, Irek Fasikhov. Mob.: +79229045757 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: So, after testing the SSD (I wiped 1 SSD and used it for tests): root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1 fio-2.1.3 Starting 1 process Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta 00m:00s] journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13 10:46:42 2015 write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08 lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08 clat percentiles (usec): | 1.00th=[ 2704], 5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928], | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408], | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016], | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048], | 99.99th=[14912] bw (KB /s): min= 1064, max= 1213, per=100.00%, avg=1150.07, stdev=34.31 lat (msec) : 4=94.99%, 10=4.96%, 20=0.05% cpu : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7 IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued: total=r=0/w=17243/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s, mint=60001msec, maxt=60001msec Disk stats (read/write): sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576,
util=99.30% So, it's painful... the SSD does only 287 iops at 4K... 1.1 MB/s. I tried to change the cache mode: echo "temporary write through" > /sys/class/scsi_disk/2:0:0:0/cache_type echo "temporary write through" > /sys/class/scsi_disk/3:0:0:0/cache_type No luck, still the same poor results. I also found this article: https://lkml.org/lkml/2013/11/20/264 It points to an old, very simple patch which disables CMD_FLUSH: https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba Has anybody better ideas on how to improve this? (Or how to disable CMD_FLUSH without recompiling the kernel? I use ubuntu and 4.0.4 for now (the 4.x branch, because the SSD 850 Pro has an issue with NCQ TRIM, and before 4.0.4 this exception was not included in libata.c).) 2015-08-12 19:17 GMT+03:00 Pieter Koorts pieter.koo...@me.com: Hi Igor I suspect you have very much the same problem as me. https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html Basically Samsung drives (like many SATA SSDs) are very much hit and miss, so you will need to test them as described here to see if they are any good: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ To give you an idea, my average write performance went from 11MB/s (with Samsung SSD) to 30MB/s (without any SSD). This is a very small cluster.
Pieter On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, we have setup CEPH cluster with 60 OSD (2 diff types) (5 nodes, 12 disks on each, 10 HDD, 2 SSD) Also we cover this with custom crushmap with 2 root leaf ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -100 5.0 root ssd -102 1.0 host ix-s2-ssd 2 1.0 osd.2 up 1.0 1.0 9 1.0 osd.9 up 1.0 1.0 -103 1.0 host ix-s3-ssd 3 1.0 osd.3 up 1.0 1.0 7 1.0 osd.7 up 1.0 1.0 -104 1.0 host ix-s5-ssd 1 1.0 osd.1 up 1.0 1.0 6 1.0 osd.6 up 1.0 1.0 -105 1.0 host ix-s6-ssd 4 1.0 osd.4 up 1.0 1.0 8 1.0 osd.8 up 1.0 1.0 -106 1.0 host ix-s7-ssd 0 1.0 osd.0 up 1.0 1.0 5 1.0 osd.5 up 1.0 1.0 -1 5.0 root platter -2 1.0 host ix-s2-platter 13 1.0 osd.13 up 1.0 1.0 17 1.0 osd.17 up 1.0 1.0 21 1.0 osd.21 up 1.0 1.0
Re: [ceph-users] RBD performance slowly degrades :-(
Hi. Read this thread: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg17360.html Best regards, Irek Fasikhov. Mob.: +79229045757 2015-08-12 14:52 GMT+03:00 Pieter Koorts pieter.koo...@me.com: Hi Something that's been bugging me for a while is that I am trying to diagnose iowait time within KVM guests. Guests doing reads or writes tend to do about 50% to 90% iowait, but the host itself is only doing about 1% to 2% iowait. So the result is the guests are extremely slow. I currently run 3x hosts, each with a single SSD and single HDD OSD in cache-tier writeback mode. Although the SSD (Samsung 850 EVO 120GB) is not a great one, it should at least perform reasonably compared to a hard disk, and doing some direct SSD tests I get approximately 100MB/s write and 200MB/s read on each SSD. When I run rados bench though, the benchmark starts with a not great but okay speed, and as the benchmark progresses it just gets slower and slower till it's worse than a USB hard drive. The SSD cache pool is 120GB in size (360GB RAW) and in use at about 90GB. I have tried tuning the XFS mount options as well but it has had little effect. Understandably the server spec is not great, but I don't expect performance to be that bad. *OSD config:* [osd] osd crush update on start = false osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M *Server spec:* Dual Quad Core XEON E5410 and 32GB RAM in each server 10GBE @ 10G speed with 8000byte Jumbo Frames.
*Rados bench result:* (starts at 50MB/s average and plummets down to 11MB/s) sudo rados bench -p rbd 50 write --no-cleanup -t 1 Maintaining 1 concurrent writes of 4194304 bytes for up to 50 seconds or 0 objects Object prefix: benchmark_data_osc-mgmt-1_10007 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 0 0 0 0 0 0 - 0 1 11413 51.990652 0.0671911 0.074661 2 12726 51.990852 0.0631836 0.0751152 3 13736 47.992140 0.0691167 0.0802425 4 15150 49.992256 0.0816432 0.0795869 5 15655 43.993420 0.208393 0.088523 6 1616039.99420 0.241164 0.0999179 7 16463 35.993412 0.239001 0.106577 8 16665 32.4942 8 0.214354 0.122767 9 17271 31.5524 0.132588 0.125438 10 17776 30.394820 0.256474 0.128548 11 17978 28.3589 8 0.183564 0.138354 12 18281 26.995612 0.345809 0.145523 13 1858425.84212 0.373247 0.151291 14 18685 24.2819 4 0.950586 0.160694 15 18685 22.6632 0 - 0.160694 16 19089 22.2466 8 0.204714 0.178352 17 19493 21.879116 0.282236 0.180571 18 19897 21.552416 0.262566 0.183742 19 1 101 100 21.049512 0.357659 0.187477 20 1 104 10320.59712 0.369327 0.192479 21 1 105 104 19.8066 4 0.373233 0.194217 22 1 105 104 18.9064 0 - 0.194217 23 1 106 105 18.2582 2 2.35078 0.214756 24 1 107 106 17.6642 4 0.680246 0.219147 25 1 109 108 17.2776 8 0.677688 0.229222 26 1 113 112 17.228316 0.29171 0.230487 27 1 117 116 17.182816 0.255915 0.231101 28 1 120 119 16.997612 0.412411 0.235122 29 1 120 119 16.4115 0 - 0.235122 30 1 120 119 15.8645 0 - 0.235122 31 1 120 119 15.3527 0 - 0.235122 32 1 122 121 15.1229 2 0.319309 0.262822 33 1 124 123 14.9071 8 0.344094 0.266201 34 1 127 126 14.821512 0.33534 0.267913 35 1 129 128 14.6266 8 0.355403 0.269241 36 1 132 131 14.553612 0.581528 0.274327 37 1 132 131 14.1603 0 - 0.274327 38 1 133 132 13.8929 2 1.43621 0.28313 39 1 134 133 13.6392 4 0.894817 0.287729 40 1 134 133 13.2982 0 - 0.287729 41 1
Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
It is already possible to do in proxmox 3.4 (with the latest updates qemu-kvm 2.2.x). But it is necessary to register in the conf file iothread:1. For single drives the ambiguous behavior of productivity. 2015-06-22 10:12 GMT+03:00 Stefan Priebe - Profihost AG s.pri...@profihost.ag: Am 22.06.2015 um 09:08 schrieb Alexandre DERUMIER aderum...@odiso.com: Just an update, there seems to be no proper way to pass iothread parameter from openstack-nova (not at least in Juno release). So a default single iothread per VM is what all we have. So in conclusion a nova instance max iops on ceph rbd will be limited to 30-40K. Thanks for the update. For proxmox users, I have added iothread option to gui for proxmox 4.0 Can we make iothread the default? Does it also help for single disks or only multiple disks? and added jemalloc as default memory allocator I have also send a jemmaloc patch to qemu dev mailing https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05265.html (Help is welcome to push it in qemu upstream ! ) - Mail original - De: pushpesh sharma pushpesh@gmail.com À: aderumier aderum...@odiso.com Cc: Somnath Roy somnath@sandisk.com, Irek Fasikhov malm...@gmail.com, ceph-devel ceph-de...@vger.kernel.org, ceph-users ceph-users@lists.ceph.com Envoyé: Lundi 22 Juin 2015 07:58:47 Objet: Re: rbd_cache, limiting read on high iops around 40k Just an update, there seems to be no proper way to pass iothread parameter from openstack-nova (not at least in Juno release). So a default single iothread per VM is what all we have. So in conclusion a nova instance max iops on ceph rbd will be limited to 30-40K. On Tue, Jun 16, 2015 at 10:08 PM, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, some news about qemu with tcmalloc vs jemmaloc. I'm testing with multiple disks (with iothreads) in 1 qemu guest. And if tcmalloc is a little faster than jemmaloc, I have hit a lot of time the tcmalloc::ThreadCache::ReleaseToCentralCache bug. 
increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, don't help. with multiple disk, I'm around 200k iops with tcmalloc (before hitting the bug) and 350kiops with jemmaloc. The problem is that when I hit malloc bug, I'm around 4000-1 iops, and only way to fix is is to restart qemu ... - Mail original - De: pushpesh sharma pushpesh@gmail.com À: aderumier aderum...@odiso.com Cc: Somnath Roy somnath@sandisk.com, Irek Fasikhov malm...@gmail.com, ceph-devel ceph-de...@vger.kernel.org, ceph-users ceph-users@lists.ceph.com Envoyé: Vendredi 12 Juin 2015 08:58:21 Objet: Re: rbd_cache, limiting read on high iops around 40k Thanks, posted the question in openstack list. Hopefully will get some expert opinion. On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, here a libvirt xml sample from libvirt src (you need to define iothreads number, then assign then in disks). I don't use openstack, so I really don't known how it's working with it. domain type='qemu' nameQEMUGuest1/name uuidc7a5fdbd-edaf-9455-926a-d65c16db1809/uuid memory unit='KiB'219136/memory currentMemory unit='KiB'219136/currentMemory vcpu placement='static'2/vcpu iothreads2/iothreads os type arch='i686' machine='pc'hvm/type boot dev='hd'/ /os clock offset='utc'/ on_poweroffdestroy/on_poweroff on_rebootrestart/on_reboot on_crashdestroy/on_crash devices emulator/usr/bin/qemu/emulator disk type='file' device='disk' driver name='qemu' type='raw' iothread='1'/ source file='/var/lib/libvirt/images/iothrtest1.img'/ target dev='vdb' bus='virtio'/ address type='pci' domain='0x' bus='0x00' slot='0x04' function='0x0'/ /disk disk type='file' device='disk' driver name='qemu' type='raw' iothread='2'/ source file='/var/lib/libvirt/images/iothrtest2.img'/ target dev='vdc' bus='virtio'/ /disk controller type='usb' index='0'/ controller type='ide' index='0'/ controller type='pci' index='0' model='pci-root'/ memballoon model='none'/ /devices /domain - Mail original - De: pushpesh sharma 
pushpesh@gmail.com À: aderumier aderum...@odiso.com Cc: Somnath Roy somnath@sandisk.com, Irek Fasikhov malm...@gmail.com, ceph-devel ceph-de...@vger.kernel.org, ceph-users ceph-users@lists.ceph.com Envoyé: Vendredi 12 Juin 2015 07:52:41 Objet: Re: rbd_cache, limiting read on high iops around 40k Hi Alexandre, I agree with your rational, of one iothread per disk. CPU consumed in IOwait is pretty high in each VM. But I am not finding a way to set the same on a nova instance. I am using openstack Juno with QEMU+KVM. As per libvirt documentation for setting iothreads, I can edit domain.xml directly and achieve the same effect. However in as in openstack env domain xml is created by nova with some additional metadata, so editing
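The libvirt snippet quoted in this thread lost its angle brackets in the archive. Below is a reconstruction of the iothread-related parts, with tag names, iothread count, and file paths taken from the quoted text; it is a sketch of the relevant elements, not a complete, bootable domain definition.

```xml
<!-- Reconstruction of the garbled libvirt example above: two iothreads
     declared at domain level, each disk's driver pinned to one of them. -->
<domain type='qemu'>
  <name>QEMUGuest1</name>
  <vcpu placement='static'>2</vcpu>
  <iothreads>2</iothreads>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' iothread='1'/>
      <source file='/var/lib/libvirt/images/iothrtest1.img'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' iothread='2'/>
      <source file='/var/lib/libvirt/images/iothrtest2.img'/>
      <target dev='vdc' bus='virtio'/>
    </disk>
  </devices>
</domain>
```

The key points, per the thread, are the `<iothreads>` count at domain level and the `iothread='N'` attribute on each `<driver>` element, which is what OpenStack Juno's nova had no way to pass through.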
Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
| Proxmox 4.0 will allow to enable|disable 1 iothread by disk. Alexandre, Useful option! In proxmox 3.4 will it be possible to add at least in the configuration file? Or it entails a change in the source code KVM? Thanks. 2015-06-22 11:54 GMT+03:00 Alexandre DERUMIER aderum...@odiso.com: It is already possible to do in proxmox 3.4 (with the latest updates qemu-kvm 2.2.x). But it is necessary to register in the conf file iothread:1. For single drives the ambiguous behavior of productivity. Yes and no ;) Currently in proxmox 3.4, iothread:1 generate only 1 iothread for all disks. So, you'll have a small extra boost, but it'll not scale with multiple disks. Proxmox 4.0 will allow to enable|disable 1 iothread by disk. Does it also help for single disks or only multiple disks? Iothread can also help for single disk, because by default qemu use a main thread for disk but also other things(don't remember what exactly) - Mail original - De: Irek Fasikhov malm...@gmail.com À: Stefan Priebe s.pri...@profihost.ag Cc: aderumier aderum...@odiso.com, pushpesh sharma pushpesh@gmail.com, Somnath Roy somnath@sandisk.com, ceph-devel ceph-de...@vger.kernel.org, ceph-users ceph-users@lists.ceph.com Envoyé: Lundi 22 Juin 2015 09:22:13 Objet: Re: rbd_cache, limiting read on high iops around 40k It is already possible to do in proxmox 3.4 (with the latest updates qemu-kvm 2.2.x). But it is necessary to register in the conf file iothread:1. For single drives the ambiguous behavior of productivity. 2015-06-22 10:12 GMT+03:00 Stefan Priebe - Profihost AG s.pri...@profihost.ag : Am 22.06.2015 um 09:08 schrieb Alexandre DERUMIER aderum...@odiso.com : Just an update, there seems to be no proper way to pass iothread parameter from openstack-nova (not at least in Juno release). So a default single iothread per VM is what all we have. So in conclusion a nova instance max iops on ceph rbd will be limited to 30-40K. Thanks for the update. 
For proxmox users, I have added iothread option to gui for proxmox 4.0 Can we make iothread the default? Does it also help for single disks or only multiple disks? and added jemalloc as default memory allocator I have also send a jemmaloc patch to qemu dev mailing https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05265.html (Help is welcome to push it in qemu upstream ! ) - Mail original - De: pushpesh sharma pushpesh@gmail.com À: aderumier aderum...@odiso.com Cc: Somnath Roy somnath@sandisk.com , Irek Fasikhov malm...@gmail.com , ceph-devel ceph-de...@vger.kernel.org , ceph-users ceph-users@lists.ceph.com Envoyé: Lundi 22 Juin 2015 07:58:47 Objet: Re: rbd_cache, limiting read on high iops around 40k Just an update, there seems to be no proper way to pass iothread parameter from openstack-nova (not at least in Juno release). So a default single iothread per VM is what all we have. So in conclusion a nova instance max iops on ceph rbd will be limited to 30-40K. On Tue, Jun 16, 2015 at 10:08 PM, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, some news about qemu with tcmalloc vs jemmaloc. I'm testing with multiple disks (with iothreads) in 1 qemu guest. And if tcmalloc is a little faster than jemmaloc, I have hit a lot of time the tcmalloc::ThreadCache::ReleaseToCentralCache bug. increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, don't help. with multiple disk, I'm around 200k iops with tcmalloc (before hitting the bug) and 350kiops with jemmaloc. The problem is that when I hit malloc bug, I'm around 4000-1 iops, and only way to fix is is to restart qemu ... - Mail original - De: pushpesh sharma pushpesh@gmail.com À: aderumier aderum...@odiso.com Cc: Somnath Roy somnath@sandisk.com , Irek Fasikhov malm...@gmail.com , ceph-devel ceph-de...@vger.kernel.org , ceph-users ceph-users@lists.ceph.com Envoyé: Vendredi 12 Juin 2015 08:58:21 Objet: Re: rbd_cache, limiting read on high iops around 40k Thanks, posted the question in openstack list. 
Hopefully will get some expert opinion. On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, here is a libvirt xml sample from the libvirt src (you need to define the number of iothreads, then assign them to disks). I don't use openstack, so I really don't know how it works with it.

  <domain type='qemu'>
    <name>QEMUGuest1</name>
    <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
    <memory unit='KiB'>219136</memory>
    <currentMemory unit='KiB'>219136</currentMemory>
    <vcpu placement='static'>2</vcpu>
    <iothreads>2</iothreads>
    <os>
      <type arch='i686' machine='pc'>hvm</type>
      <boot dev='hd'/>
    </os>
    <clock offset='utc'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <devices>
      <emulator>/usr/bin/qemu</emulator>
      <disk type='file' device='disk'>
        <driver
Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
If necessary, there are RPM files for CentOS 7:

gperftools.spec https://drive.google.com/file/d/0BxoNLVWxzOJWaVVmWTA3Z18zbUE/edit?usp=drive_web
pprof-2.4-1.el7.centos.noarch.rpm https://drive.google.com/file/d/0BxoNLVWxzOJWRmQ2ZEt6a1pnSVk/edit?usp=drive_web
gperftools-libs-2.4-1.el7.centos.x86_64.rpm https://drive.google.com/file/d/0BxoNLVWxzOJWcVByNUZHWWJqRXc/edit?usp=drive_web
gperftools-devel-2.4-1.el7.centos.x86_64.rpm https://drive.google.com/file/d/0BxoNLVWxzOJWYTUzQTNha3J3NEU/edit?usp=drive_web
gperftools-debuginfo-2.4-1.el7.centos.x86_64.rpm https://drive.google.com/file/d/0BxoNLVWxzOJWVzBic043YUk2LWM/edit?usp=drive_web
gperftools-2.4-1.el7.centos.x86_64.rpm https://drive.google.com/file/d/0BxoNLVWxzOJWNm81QWdQYU9ZaG8/edit?usp=drive_web

2015-06-17 8:01 GMT+03:00 Alexandre DERUMIER aderum...@odiso.com: Hi, I finally fixed it with tcmalloc, using:

TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 qemu

I got almost the same result as jemalloc in this case, maybe a little bit faster. Here are the iops results for 1 qemu vm with an iothread per disk (iodepth=32, 4k randread, nocache):

qemu randread 4k nocache, libc6:
  1 disk    29052 iops
  2 disks   55878
  4 disks  127899
  8 disks  240566
  15 disks 269976

qemu randread 4k nocache, jemalloc:
  1 disk    41278 iops
  2 disks   75781
  4 disks  195351
  8 disks  294241
  15 disks 298199

qemu randread 4k nocache, tcmalloc 16M cache:
  1 disk    37911 iops
  2 disks   67698
  4 disks   41076
  8 disks   43312
  15 disks  37569

qemu randread 4k nocache, tcmalloc patched 256M:
  1 disk no-iothread
  1 disk    42160 iops
  2 disks   83135
  4 disks  194591
  8 disks  306038
  15 disks 302278

- Original message - From: aderumier aderum...@odiso.com To: Mark Nelson mnel...@redhat.com Cc: ceph-users ceph-users@lists.ceph.com Sent: Tuesday 16 June 2015 20:27:54 Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k I forgot to ask, is this with the patched version of tcmalloc that theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue?
Yes, the patched version of tcmalloc, but also the latest version from gperftools git. (I'm talking about qemu here, not osds). I have tried increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, but it doesn't help. For osds, increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES does help. (Benchs are still running, I try to overload them as much as possible) - Original message - From: Mark Nelson mnel...@redhat.com To: ceph-users ceph-users@lists.ceph.com Sent: Tuesday 16 June 2015 19:04:27 Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k I forgot to ask, is this with the patched version of tcmalloc that theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue? Mark On 06/16/2015 11:46 AM, Mark Nelson wrote: Hi Alexandre, Excellent find! Have you also informed the QEMU developers of your discovery? Mark On 06/16/2015 11:38 AM, Alexandre DERUMIER wrote: Hi, some news about qemu with tcmalloc vs jemalloc. I'm testing with multiple disks (with iothreads) in one qemu guest. And while tcmalloc is a little faster than jemalloc, I have hit the tcmalloc::ThreadCache::ReleaseToCentralCache bug a lot of times. Increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES doesn't help. With multiple disks, I'm around 200k iops with tcmalloc (before hitting the bug) and 350k iops with jemalloc. The problem is that when I hit the malloc bug, I'm around 4000-1 iops, and the only way to fix it is to restart qemu ... - Original message - From: pushpesh sharma pushpesh@gmail.com To: aderumier aderum...@odiso.com Cc: Somnath Roy somnath@sandisk.com, Irek Fasikhov malm...@gmail.com, ceph-devel ceph-de...@vger.kernel.org, ceph-users ceph-users@lists.ceph.com Sent: Friday 12 June 2015 08:58:21 Subject: Re: rbd_cache, limiting read on high iops around 40k Thanks, posted the question in the openstack list. Hopefully will get some expert opinion.
On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, here is a libvirt xml sample from the libvirt src (you need to define the number of iothreads, then assign them to disks). I don't use openstack, so I really don't know how it works with it.

  <domain type='qemu'>
    <name>QEMUGuest1</name>
    <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
    <memory unit='KiB'>219136</memory>
    <currentMemory unit='KiB'>219136</currentMemory>
    <vcpu placement='static'>2</vcpu>
    <iothreads>2</iothreads>
    <os>
      <type arch='i686' machine='pc'>hvm</type>
      <boot dev='hd'/>
    </os>
    <clock offset='utc'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <devices>
      <emulator>/usr/bin/qemu</emulator>
      <disk type='file' device='disk'>
        <driver name='qemu' type='raw' iothread='1'/>
        <source file='/var/lib/libvirt/images/iothrtest1
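Earlier in this thread, the qemu-side fix was to preload tcmalloc with a larger thread cache. A minimal sketch of that environment setup (the 256 MiB value and the minimal-tcmalloc library come from the messages above; the wrapper shape and library path are assumptions):

```shell
# Sketch: raise tcmalloc's thread-cache ceiling to 256 MiB for a qemu
# process. The path to libtcmalloc_minimal is an assumption; adjust
# for your distribution.
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=$((256 * 1024 * 1024))
TCMALLOC_LIB=/usr/lib/libtcmalloc_minimal.so.4
echo "TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=$TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES"
# To actually launch qemu with these, one would export them, e.g.:
#   LD_PRELOAD="$TCMALLOC_LIB" \
#   TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=$TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES \
#   qemu-system-x86_64 ...
```
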
Re: [ceph-users] [Fwd: adding a a monitor wil result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]
It is necessary to synchronize time. 2015-06-11 11:09 GMT+03:00 Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com: I'm trying to add an extra monitor to my already existing cluster. I do this with ceph-deploy, using the following command: ceph-deploy mon add mynewhost ceph-deploy says it's all finished, but when I take a look at the logs on my new monitor host I see the following error: cephx: verify_reply couldn't decrypt with error: error decoding block for decryption and when I take a look in my existing monitor logs I see this error: cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190 I tried gathering keys, copying keys, and reinstalling/purging the new monitor node. greetz Ramon For information, services and offers, please visit our web site: http://www.klm.com. This e-mail and any attachment may contain confidential and privileged material intended for the addressee only. If you are not the addressee, you are notified that no part of the e-mail or any attachment may be disclosed, copied or distributed, and that any other action related to this e-mail or attachment is strictly prohibited, and may be unlawful. If you have received this e-mail by error, please notify the sender immediately by return e-mail, and delete this message. Koninklijke Luchtvaart Maatschappij NV (KLM), its subsidiaries and/or its employees shall not be liable for the incorrect or incomplete transmission of this e-mail or any attachments, nor responsible for any delay in receipt. Koninklijke Luchtvaart Maatschappij N.V. (also known as KLM Royal Dutch Airlines) is registered in Amstelveen, The Netherlands, with registered number 33014286 -- Best regards, Irek Fasikhov, mobile: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [Fwd: adding a a monitor wil result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]
Run the following command by hand: ntpdate NTPADDRESS 2015-06-11 12:36 GMT+03:00 Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com: all ceph-related servers have the same NTP server, and I double-checked the times and timezones; they are all correct. -Original Message- *From*: Irek Fasikhov malm...@gmail.com *To*: Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com *Cc*: ceph-users@lists.ceph.com *Subject*: Re: [ceph-users] [Fwd: adding a a monitor wil result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption] *Date*: Thu, 11 Jun 2015 12:16:53 +0300 It is necessary to synchronize time. 2015-06-11 11:09 GMT+03:00 Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com: I'm trying to add an extra monitor to my already existing cluster. I do this with ceph-deploy, using the following command: ceph-deploy mon add mynewhost ceph-deploy says it's all finished, but when I take a look at the logs on my new monitor host I see the following error: cephx: verify_reply couldn't decrypt with error: error decoding block for decryption and when I take a look in my existing monitor logs I see this error: cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190 I tried gathering keys, copying keys, and reinstalling/purging the new monitor node. greetz Ramon For information, services and offers, please visit our web site: http://www.klm.com. This e-mail and any attachment may contain confidential and privileged material intended for the addressee only. If you are not the addressee, you are notified that no part of the e-mail or any attachment may be disclosed, copied or distributed, and that any other action related to this e-mail or attachment is strictly prohibited, and may be unlawful.
If you have received this e-mail by error, please notify the sender immediately by return e-mail, and delete this message. -- Best regards, Irek Fasikhov, mobile: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
Hi, Alexandre. Very good work! Do you have an rpm file? Thanks. 2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER aderum...@odiso.com: Hi, I have tested qemu with the latest tcmalloc 2.4, and the improvement is huge with iothread: 50k iops (+45%)!

qemu : no-iothread : glibc          : iops=33395
qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%)
qemu : no-iothread : jemalloc       : iops=42226 (+26%)
qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%)
qemu : iothread    : glibc          : iops=34516
qemu : iothread    : tcmalloc       : iops=38676 (+12%)
qemu : iothread    : jemalloc       : iops=28023 (-19%)
qemu : iothread    : tcmalloc (2.4) : iops=50276 (+45%)

--
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops] [eta 00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10 05:54:24 2015
  read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec
  slat (usec): min=1, max=1136, avg= 3.54, stdev= 3.58
  clat (usec): min=128, max=6262, avg=631.41, stdev=197.71
  lat (usec): min=149, max=6265, avg=635.27, stdev=197.40
  clat percentiles (usec):
   | 1.00th=[ 318], 5.00th=[ 378], 10.00th=[ 418], 20.00th=[ 474],
   | 30.00th=[ 516], 40.00th=[ 564], 50.00th=[ 612], 60.00th=[ 652],
   | 70.00th=[ 700], 80.00th=[ 756], 90.00th=[ 860], 95.00th=[ 980],
   | 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896],
   | 99.99th=[ 3760]
  bw (KB /s): min=145608, max=249688, per=100.00%, avg=201108.00, stdev=21718.87
  lat (usec) : 250=0.04%, 500=25.84%, 750=53.00%, 1000=16.63%
  lat (msec) : 2=4.46%, 4=0.03%, 10=0.01%
  cpu : usr=9.73%, sys=24.93%, ctx=66417, majf=0, minf=38
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
  submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
  issued : total=r=1310720/w=0/d=0,
short=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=32 Run status group 0 (all jobs): READ: io=5120.0MB, aggrb=201107KB/s, minb=201107KB/s, maxb=201107KB/s, mint=26070msec, maxt=26070msec Disk stats (read/write): vdb: ios=1302555/0, merge=0/0, ticks=715176/0, in_queue=714840, util=99.73% rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 fio-2.1.11 Starting 1 process Jobs: 1 (f=1): [r(1)] [100.0% done] [158.7MB/0KB/0KB /s] [40.6K/0/0 iops] [eta 00m:00s] rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=889: Wed Jun 10 06:05:06 2015 read : io=5120.0MB, bw=143897KB/s, iops=35974, runt= 36435msec slat (usec): min=1, max=710, avg= 3.31, stdev= 3.35 clat (usec): min=191, max=4740, avg=884.66, stdev=315.65 lat (usec): min=289, max=4743, avg=888.31, stdev=315.51 clat percentiles (usec): | 1.00th=[ 462], 5.00th=[ 516], 10.00th=[ 548], 20.00th=[ 596], | 30.00th=[ 652], 40.00th=[ 764], 50.00th=[ 868], 60.00th=[ 940], | 70.00th=[ 1004], 80.00th=[ 1096], 90.00th=[ 1256], 95.00th=[ 1416], | 99.00th=[ 2024], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2640], | 99.99th=[ 3632] bw (KB /s): min=98352, max=177328, per=99.91%, avg=143772.11, stdev=21782.39 lat (usec) : 250=0.01%, 500=3.48%, 750=35.69%, 1000=30.01% lat (msec) : 2=29.74%, 4=1.07%, 10=0.01% cpu : usr=7.10%, sys=16.90%, ctx=54855, majf=0, minf=38 IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, =64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, =64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, =64=0.0% issued: total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=32 Run status group 0 (all jobs): READ: io=5120.0MB, aggrb=143896KB/s, minb=143896KB/s, maxb=143896KB/s, mint=36435msec, maxt=36435msec Disk stats (read/write): vdb: ios=1301357/0, merge=0/0, ticks=1033036/0, in_queue=1032716, util=99.85% - Mail original - De: aderumier aderum...@odiso.com À: Robert 
LeBlanc rob...@leblancnet.us Cc: Mark Nelson mnel...@redhat.com, ceph-devel ceph-de...@vger.kernel.org, pushpesh sharma pushpesh@gmail.com, ceph-users ceph-users@lists.ceph.com Envoyé: Mardi 9 Juin 2015 18:47:27 Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k Hi Robert, What I found was that Ceph OSDs performed well with either tcmalloc or jemalloc (except when RocksDB was built with jemalloc instead of tcmalloc, I'm still working to dig into why that might be the case). yes,from my test, for osd tcmalloc is a little faster (but very
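For reference, the fio headers quoted in this thread (4k random reads, libaio, iodepth 32, 5120 MB total) correspond to a job file along these lines; the target filename and the direct-I/O flag are assumptions, not taken from the thread:

```ini
; Sketch of the job implied by the fio output above.
; filename is an assumption (the guest's rbd-backed virtio disk).
[rbd_iodepth32-test]
rw=randread
bs=4k
ioengine=libaio
iodepth=32
direct=1
filename=/dev/vdb
size=5g
```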
Re: [ceph-users] HEALTH_WARN 6 requests are blocked
Patrik, at the moment you do not have any problems related to slow requests. 2015-05-12 8:56 GMT+03:00 Patrik Plank pat...@plank.me: OK, understood. But what can I do if the scrubbing process has been stuck on one pg since last night:

root@ceph01:~# ceph health detail
HEALTH_OK
root@ceph01:~# ceph pg dump | grep scrub
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
2.5cb 101 0 0 0 0 423620608 324 324 active+clean+scrubbing+deep 2015-05-11 23:01:37.056747 4749'324 4749:6524 [14,10] 14 [14,10] 14 4749'318 2015-05-10 22:05:29.252876 3423'309 2015-05-04 21:44:46.609791

Perhaps an idea? best regards -Original message- *From:* Irek Fasikhov malm...@gmail.com *Sent:* Tuesday 12th May 2015 7:49 *To:* Patrik Plank pat...@plank.me; ceph-users@lists.ceph.com *Subject:* Re: [ceph-users] HEALTH_WARN 6 requests are blocked Scrubbing greatly affects I/O and can slow requests on the OSD. For more information, look at 'ceph health detail' and 'ceph pg dump | grep scrub' 2015-05-12 8:42 GMT+03:00 Patrik Plank pat...@plank.me: Hi, is that the reason for the Health Warn or the scrubbing notification? thanks regards -Original message- *From:* Irek Fasikhov malm...@gmail.com *Sent:* Tuesday 12th May 2015 7:33 *To:* Patrik Plank pat...@plank.me *Cc:* ceph-users@lists.ceph.com *Subject:* Re: [ceph-users] HEALTH_WARN 6 requests are blocked Hi, Patrik. You need to configure the I/O priority for scrubbing. http://dachary.org/?p=3268 2015-05-12 8:03 GMT+03:00 Patrik Plank pat...@plank.me: Hi, the ceph cluster always shows the scrubbing notifications, although it is not scrubbing. And what does the Health Warn mean? Does anybody have an idea why the warning is displayed? How can I solve this?
cluster 78227661-3a1b-4e56-addc-c2a272933ac2
 health HEALTH_WARN 6 requests are blocked 32 sec
 monmap e3: 3 mons at {ceph01=10.0.0.20:6789/0,ceph02=10.0.0.21:6789/0,ceph03=10.0.0.22:6789/0}, election epoch 92, quorum 0,1,2 ceph01,ceph02,ceph03
 osdmap e4749: 30 osds: 30 up, 30 in
 pgmap v2321129: 4608 pgs, 2 pools, 1712 GB data, 440 kobjects
       3425 GB used, 6708 GB / 10134 GB avail
       1 active+clean+scrubbing+deep
       4607 active+clean
 client io 3282 kB/s rd, 10742 kB/s wr, 182 op/s

thanks best regards -- Best regards, Irek Fasikhov, mobile: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] HEALTH_WARN 6 requests are blocked
Hi, Patrik. You need to configure the I/O priority for scrubbing. http://dachary.org/?p=3268 2015-05-12 8:03 GMT+03:00 Patrik Plank pat...@plank.me: Hi, the ceph cluster always shows the scrubbing notifications, although it is not scrubbing. And what does the Health Warn mean? Does anybody have an idea why the warning is displayed? How can I solve this?

cluster 78227661-3a1b-4e56-addc-c2a272933ac2
 health HEALTH_WARN 6 requests are blocked 32 sec
 monmap e3: 3 mons at {ceph01=10.0.0.20:6789/0,ceph02=10.0.0.21:6789/0,ceph03=10.0.0.22:6789/0}, election epoch 92, quorum 0,1,2 ceph01,ceph02,ceph03
 osdmap e4749: 30 osds: 30 up, 30 in
 pgmap v2321129: 4608 pgs, 2 pools, 1712 GB data, 440 kobjects
       3425 GB used, 6708 GB / 10134 GB avail
       1 active+clean+scrubbing+deep
       4607 active+clean
 client io 3282 kB/s rd, 10742 kB/s wr, 182 op/s

thanks best regards -- Best regards, Irek Fasikhov, mobile: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] HEALTH_WARN 6 requests are blocked
Scrubbing greatly affects I/O and can slow requests on the OSD. For more information, look at 'ceph health detail' and 'ceph pg dump | grep scrub' 2015-05-12 8:42 GMT+03:00 Patrik Plank pat...@plank.me: Hi, is that the reason for the Health Warn or the scrubbing notification? thanks regards -Original message- *From:* Irek Fasikhov malm...@gmail.com *Sent:* Tuesday 12th May 2015 7:33 *To:* Patrik Plank pat...@plank.me *Cc:* ceph-users@lists.ceph.com *Subject:* Re: [ceph-users] HEALTH_WARN 6 requests are blocked Hi, Patrik. You need to configure the I/O priority for scrubbing. http://dachary.org/?p=3268 2015-05-12 8:03 GMT+03:00 Patrik Plank pat...@plank.me: Hi, the ceph cluster always shows the scrubbing notifications, although it is not scrubbing. And what does the Health Warn mean? Does anybody have an idea why the warning is displayed? How can I solve this?

cluster 78227661-3a1b-4e56-addc-c2a272933ac2
 health HEALTH_WARN 6 requests are blocked 32 sec
 monmap e3: 3 mons at {ceph01=10.0.0.20:6789/0,ceph02=10.0.0.21:6789/0,ceph03=10.0.0.22:6789/0}, election epoch 92, quorum 0,1,2 ceph01,ceph02,ceph03
 osdmap e4749: 30 osds: 30 up, 30 in
 pgmap v2321129: 4608 pgs, 2 pools, 1712 GB data, 440 kobjects
       3425 GB used, 6708 GB / 10134 GB avail
       1 active+clean+scrubbing+deep
       4607 active+clean
 client io 3282 kB/s rd, 10742 kB/s wr, 182 op/s

thanks best regards -- Best regards, Irek Fasikhov, mobile: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
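The linked post describes lowering the I/O priority of the OSD disk thread, which performs scrubbing. A ceph.conf sketch of that approach (option names as found in Firefly-era releases; verify against your version's documentation before relying on them):

```ini
; Sketch: run the OSD disk thread (scrubbing) in the idle I/O class so
; client requests take precedence. Requires the CFQ disk scheduler on
; the OSD data disks; priority 7 is the lowest within a class.
[osd]
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7
```

The same values can be applied to a running cluster with `ceph tell osd.* injectargs`, again assuming a release that supports these options.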
Re: [ceph-users] very different performance on two volumes in the same pool
Hi, Nikola. https://www.mail-archive.com/ceph-users@lists.ceph.com/msg19152.html 2015-04-27 14:17 GMT+03:00 Nikola Ciprich nikola.cipr...@linuxbox.cz: Hello Somnath, Thanks for the perf data. It seems innocuous. I am not seeing a single tcmalloc trace; are you running with tcmalloc, by the way? According to ldd, it seems I have it compiled in, yes: [root@vfnphav1a ~]# ldd /usr/bin/ceph-osd . . libtcmalloc.so.4 => /usr/lib64/libtcmalloc.so.4 (0x7f7a3756e000) . . What about my other question: does the performance of the slow volume increase if you stop IO on the other volume? I don't have any other ceph users; actually the whole cluster is idle. Are you using the default ceph.conf? You probably want to try a different osd_op_num_shards (maybe = 10, based on your osd server config) and osd_op_num_threads_per_shard (maybe = 1). Also, you may want to see the effect of setting osd_enable_op_tracker = false. I guess I'm using pretty default settings; a few changes, probably not much related:

[osd]
osd crush update on start = false
[client]
rbd cache = true
rbd cache writethrough until flush = true
[mon]
debug paxos = 0

I now tried setting

throttler perf counter = false
osd enable op tracker = false
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 10

and restarting all ceph servers.. but it seems to make no big difference. Are you seeing similar resource consumption on both servers while IO is going on? Yes, on all three nodes ceph-osd seems to be consuming lots of CPU during the benchmark. Need some information about your client: are the volumes exposed with krbd or running in a librbd environment? If krbd and on the same physical box, hopefully you mapped the images with 'noshare' enabled. I'm using fio with the ceph (rbd) engine, so I guess no kernel rbd stuff is in use here? Too many questions :-) But this may give some indication of what is going on there. :-) hopefully my answers are not too confused, I'm still pretty new to ceph..
BR nik Thanks Regards Somnath -Original Message- From: Nikola Ciprich [mailto:nikola.cipr...@linuxbox.cz] Sent: Sunday, April 26, 2015 7:32 AM To: Somnath Roy Cc: ceph-users@lists.ceph.com; n...@linuxbox.cz Subject: Re: [ceph-users] very different performance on two volumes in the same pool Hello Somnath, On Fri, Apr 24, 2015 at 04:23:19PM +, Somnath Roy wrote: This could be again because of tcmalloc issue I reported earlier. Two things to observe. 1. Is the performance improving if you stop IO on other volume ? If so, it could be different issue. there is no other IO.. only cephfs mounted, but no users of it. 2. Run perf top in the OSD node and see if tcmalloc traces are popping up. don't see anything special: 3.34% libc-2.12.so [.] _int_malloc 2.87% libc-2.12.so [.] _int_free 2.79% [vdso][.] __vdso_gettimeofday 2.67% libsoftokn3.so[.] 0x0001fad9 2.34% libfreeblpriv3.so [.] 0x000355e6 2.33% libpthread-2.12.so[.] pthread_mutex_unlock 2.19% libpthread-2.12.so[.] pthread_mutex_lock 1.80% libc-2.12.so [.] malloc 1.43% [kernel] [k] do_raw_spin_lock 1.42% libc-2.12.so [.] memcpy 1.23% [kernel] [k] __switch_to 1.19% [kernel] [k] acpi_processor_ffh_cstate_enter 1.09% libc-2.12.so [.] malloc_consolidate 1.08% [kernel] [k] __schedule 1.05% libtcmalloc.so.4.1.0 [.] 0x00017e6f 0.98% libc-2.12.so [.] vfprintf 0.83% libstdc++.so.6.0.13 [.] std::basic_ostreamchar, std::char_traitschar std::__ostream_insertchar, std::char_traitschar (std::basic_ostreamchar, 0.76% libstdc++.so.6.0.13 [.] 0x0008092a 0.73% libc-2.12.so [.] __memset_sse2 0.72% libc-2.12.so [.] __strlen_sse42 0.70% libstdc++.so.6.0.13 [.] std::basic_streambufchar, std::char_traitschar ::xsputn(char const*, long) 0.68% libpthread-2.12.so[.] pthread_mutex_trylock 0.67% librados.so.2.0.0 [.] ceph_crc32c_sctp 0.63% libpython2.6.so.1.0 [.] 0x0007d823 0.55% libnss3.so[.] 0x00056d2a 0.52% libc-2.12.so [.] free 0.50% libstdc++.so.6.0.13 [.] 
std::basic_stringchar, std::char_traitschar, std::allocatorchar ::basic_string(std::string const) should I check anything else? BR nik Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nikola Ciprich Sent: Friday, April 24, 2015 7:10 AM To:
Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
Hi, Alexandre! Have you tried changing the vm.min_free_kbytes parameter? 2015-04-23 19:24 GMT+03:00 Somnath Roy somnath@sandisk.com: Alexandre, You can configure with --with-jemalloc or ./do_autogen -J to build ceph with jemalloc. Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alexandre DERUMIER Sent: Thursday, April 23, 2015 4:56 AM To: Mark Nelson Cc: ceph-users; ceph-devel; Milosz Tanski Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops If you have the means to compile the same version of ceph with jemalloc, I would be very interested to see how it does. Yes, sure. (I have around 3-4 weeks to do all the benchs) But I don't know how to do it? I'm running the cluster on centos7.1; maybe it is easy to patch the srpms to rebuild the package with jemalloc. - Original message - From: Mark Nelson mnel...@redhat.com To: aderumier aderum...@odiso.com, Srinivasula Maram srinivasula.ma...@sandisk.com Cc: ceph-users ceph-users@lists.ceph.com, ceph-devel ceph-de...@vger.kernel.org, Milosz Tanski mil...@adfin.com Sent: Thursday 23 April 2015 13:33:00 Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops Thanks for the testing Alexandre! If you have the means to compile the same version of ceph with jemalloc, I would be very interested to see how it does. In some ways I'm glad it turned out not to be NUMA. I still suspect we will have to deal with it at some point, but perhaps not today. ;) Mark On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: Maybe it's tcmalloc related. I thought I had patched it correctly, but perf shows a lot of tcmalloc::ThreadCache::ReleaseToCentralCache before the osd restart (100k) -- 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 8.51% ceph-osd libtcmalloc.so.4.1.2 [.]
tcmalloc::CentralFreeList::FetchFromSpans 3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans 2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 1.63% swapper [kernel.kallsyms] [k] intel_idle 1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans 1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 1.07% ceph-osd libstdc++.so.6.0.19 [.] std::basic_stringchar, std::char_traitschar, std::allocatorchar ::basic_string 0.91% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock 0.88% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 0.81% ceph-osd ceph-osd [.] Mutex::Lock 0.79% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string 0.74% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock 0.67% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock 0.63% swapper [kernel.kallsyms] [k] native_write_msr_safe 0.62% ceph-osd [kernel.kallsyms] [k] avc_has_perm_noaudit 0.58% ceph-osd ceph-osd [.] operator 0.57% ceph-osd [kernel.kallsyms] [k] __schedule 0.57% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu 0.54% swapper [kernel.kallsyms] [k] __schedule after osd restart (300k iops) -- 3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 1.86% swapper [kernel.kallsyms] [k] intel_idle 1.52% ceph-osd libstdc++.so.6.0.19 [.] std::basic_stringchar, std::char_traitschar, std::allocatorchar ::basic_string 1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 1.23% ceph-osd ceph-osd [.] Mutex::Lock 1.21% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock 1.11% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string 0.95% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock 0.94% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock 0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu 0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg 0.70% ceph-osd ceph-osd [.] 
Message::Message 0.68% ceph-osd [kernel.kallsyms] [k] __schedule 0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu 0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans 0.64% swapper [kernel.kallsyms] [k] native_write_msr_safe 0.61% ceph-osd ceph-osd [.] std::tr1::_Sp_counted_base(__gnu_cxx::_Lock_policy)2::_M_release 0.60% swapper [kernel.kallsyms] [k] __schedule 0.60% ceph-osd libstdc++.so.6.0.19 [.] 0x000bdd2b 0.57% ceph-osd ceph-osd [.] operator 0.57% ceph-osd ceph-osd [.] crc32_iscsi_00 0.56% ceph-osd libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose 0.55% ceph-osd [kernel.kallsyms] [k] __switch_to 0.54% ceph-osd libc-2.17.so [.] vfprintf 0.52% ceph-osd [kernel.kallsyms] [k] fget_light - Mail original - De: aderumier aderum...@odiso.com À: Srinivasula Maram
Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy
I use CentOS 7.1. The problem is that the base package repository also has ceph-common.

[root@ceph01p24 cluster]# yum --showduplicates list ceph-common
Loaded plugins: dellsysid, etckeeper, fastestmirror, priorities
Loading mirror speeds from cached hostfile
 * base: centos-mirror.rbc.ru
 * epel: be.mirror.eurid.eu
 * extras: ftp.funet.fi
 * updates: centos-mirror.rbc.ru
Installed Packages
ceph-common.x86_64 0.80.7-0.el7.centos @Ceph
Available Packages
ceph-common.x86_64 0.80.6-0.el7.centos Ceph
ceph-common.x86_64 0.80.7-0.el7.centos Ceph
ceph-common.x86_64 0.80.8-0.el7.centos Ceph
ceph-common.x86_64 0.80.9-0.el7.centos Ceph
ceph-common.x86_64 1:0.80.7-0.4.el7 epel
ceph-common.x86_64 1:0.80.7-2.el7 base

I do the installation as follows:

rpm -ivh http://ceph.com/rpm-firefly/el7/noarch/ceph-release-1-0.el7.noarch.rpm
yum install redhat-lsb-core-4.1-27.el7.centos.1.x86_64 gperftools-libs.x86_64 yum-plugin-priorities.noarch ntp -y
yum install librbd1-0.80.7-0.el7.centos librados2-0.80.7-0.el7.centos.x86_64.rpm -y
yum install gdisk cryptsetup leveldb python-jinja2 hdparm -y
yum install --disablerepo=base --disablerepo=epel ceph-common-0.80.7-0.el7.centos.x86_64 -y
yum install --disablerepo=base --disablerepo=epel ceph-0.80.7-0.el7.centos -y

2015-04-08 12:40 GMT+03:00 Vickey Singh vickey.singh22...@gmail.com: Hello Everyone, I also tried setting a higher priority as suggested by Sam, but no luck. Please see the full logs here: http://paste.ubuntu.com/10771358/ While installing, yum searches the correct Ceph repository but finds 3 versions of python-ceph under http://ceph.com/rpm-giant/el7/x86_64/ How can I instruct yum to install the latest version of ceph from the giant repository?? FYI I have this setting already: [root@rgw-node1 yum.repos.d]# cat /etc/yum/pluginconf.d/priorities.conf [main] enabled = 1 check_obsoletes = 1 [root@rgw-node1 yum.repos.d]# This issue can be easily reproduced; just now I tried on a fresh server with centos 7.0.1406 but it still fails. Please help.
# cat /etc/redhat-release
CentOS Linux release 7.0.1406 (Core)
# uname -r
3.10.0-123.20.1.el7.x86_64

Regards, VS

On Wed, Apr 8, 2015 at 11:10 AM, Sam Wouters s...@ericom.be wrote: Hi Vickey, we had a similar issue and we resolved it by giving the centos base and update repos a higher priority (e.g. 10) than the epel repo. The ceph-deploy tool only sets a priority of 1 for the ceph repos, but the centos and epel repos stay on the default of 99. regards, Sam

On 08-04-15 09:32, Vickey Singh wrote: Hi Ken, as per your suggestion I tried enabling the epel-testing repository, but still no luck. Please check the output below. I would really appreciate any help here.

# yum install ceph --enablerepo=epel-testing
---> Package python-rbd.x86_64 1:0.80.7-0.5.el7 will be installed
--> Processing Dependency: librbd1 = 1:0.80.7 for package: 1:python-rbd-0.80.7-0.5.el7.x86_64
--> Finished Dependency Resolution
Error: Package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 (epel)
       Requires: libcephfs1 = 1:0.80.7
       Available: 1:libcephfs1-0.86-0.el7.centos.x86_64 (Ceph)
           libcephfs1 = 1:0.86-0.el7.centos
       Available: 1:libcephfs1-0.87-0.el7.centos.x86_64 (Ceph)
           libcephfs1 = 1:0.87-0.el7.centos
       Installing: 1:libcephfs1-0.87.1-0.el7.centos.x86_64 (Ceph)
           libcephfs1 = 1:0.87.1-0.el7.centos
Error: Package: 1:python-rbd-0.80.7-0.5.el7.x86_64 (epel-testing)
       Requires: librbd1 = 1:0.80.7
       Removing: librbd1-0.80.9-0.el7.centos.x86_64 (@Ceph)
           librbd1 = 0.80.9-0.el7.centos
       Updated By: 1:librbd1-0.87.1-0.el7.centos.x86_64 (Ceph)
           librbd1 = 1:0.87.1-0.el7.centos
       Available: 1:librbd1-0.86-0.el7.centos.x86_64 (Ceph)
           librbd1 = 1:0.86-0.el7.centos
       Available: 1:librbd1-0.87-0.el7.centos.x86_64 (Ceph)
           librbd1 = 1:0.87-0.el7.centos
Error: Package: 1:python-rados-0.80.7-0.5.el7.x86_64 (epel-testing)
       Requires: librados2 = 1:0.80.7
       Removing: librados2-0.80.9-0.el7.centos.x86_64 (@Ceph)
           librados2 = 0.80.9-0.el7.centos
       Updated By: 1:librados2-0.87.1-0.el7.centos.x86_64 (Ceph)
           librados2 =
1:0.87.1-0.el7.centos Available: 1:librados2-0.86-0.el7.centos.x86_64 (Ceph)
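For what it's worth, much of the dependency tangle above comes down to RPM's Epoch field: epel ships its ceph packages with Epoch 1 (the `1:` prefix in `1:0.80.7`), and a non-zero Epoch outranks ceph.com's epoch-less `0.80.9` no matter how new the version number looks. A simplified sketch of the epoch-first comparison (illustrative only; real yum/RPM use the full rpmvercmp algorithm, which also handles letters and release tags):

```python
def evr_key(evr: str):
    """Split 'epoch:version' into a sortable key; a missing epoch means 0.

    Simplified sketch: real rpmvercmp also compares release fields and
    non-numeric version segments.
    """
    epoch, _, version = evr.rpartition(":")
    return (int(epoch or 0), tuple(int(p) for p in version.split(".")))

# ceph.com's newest firefly build vs. epel's epoch-1 build, as in the yum output
candidates = ["0.80.9", "1:0.80.7"]
winner = max(candidates, key=evr_key)
print(winner)   # -> 1:0.80.7 — epoch 1 wins despite the older version number
```

This is why disabling base/epel (or giving them a lower yum priority together with `check_obsoletes = 1`) is needed to keep the ceph.com packages installed.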
Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?
What replication factor do you have? 2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Hi Irek, yes, stopping the OSD (or setting it OUT) resulted in only 3% of data degraded and moved/recovered. When I afterwards removed it from the CRUSH map (ceph osd crush rm id), that's when the 37% happened. And thanks, Irek, for the help - could you kindly let me know the preferred steps when removing a whole node? Do you mean I should first stop all OSDs again, or just remove each OSD from the CRUSH map, or perhaps decompile the CRUSH map, delete the node completely, compile it back in, and let it heal/recover? Do you think this would result in less data being misplaced and moved around? Sorry for bugging you, I really appreciate your help. Thanks On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote: A large percentage of the cluster map was rebuilt (but a low percentage degraded). If you had not done ceph osd crush rm id, the percentage would be low. In your case, the correct option is to remove the entire node rather than each disk individually 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Another question - I mentioned here 37% of objects being moved around - these are MISPLACED objects (degraded objects were 0.001%), after I removed 1 OSD from the CRUSH map (out of 44 OSDs or so). Can anybody confirm this is normal behaviour - and are there any workarounds? I understand this is because of CEPH's object placement algorithm, but still, 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH map makes me wonder why the percentage is so large. It seems not good to me, and I have to remove another 7 OSDs (we are demoting some old hardware nodes). This means I could potentially see 7 x the same number of misplaced objects...? Any thoughts? Thanks On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote: Thanks Irek.
Does this mean that after peering, for each PG there will be a delay of 10 sec - meaning that every once in a while I will have 10 sec of the cluster NOT being stressed/overloaded, then recovery takes place for that PG, then for another 10 sec the cluster is fine, and then it is stressed again? I'm trying to understand the process before actually doing stuff (the config reference is there on ceph.com but I don't fully understand the process). Thanks, Andrija On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote: Hi. Use the value osd_recovery_delay_start. Example: [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start osd_recovery_delay_start: 10 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Hi guys, Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over 37% of the data to rebalance - let's say this is fine (this is when I removed it from the CRUSH map). I'm wondering - I had previously set some throttling mechanisms, but during the first 1h of rebalancing my recovery rate was going up to 1500 MB/s - and VMs were completely unusable - and then for the last 4h of the recovery this rate went down to, say, 100-200 MB/s, and while VM performance was still pretty impacted, at least I could work more or less. So my question: is this behaviour expected, and is throttling here working as expected, since during the first 1h almost no throttling was applied, judging by the 1500 MB/s recovery rate and the impact on VMs, while the last 4h seemed pretty fine (although still a lot of impact in general)? I changed the throttling on the fly with: ceph tell osd.* injectargs '--osd_recovery_max_active 1' ceph tell osd.* injectargs '--osd_recovery_op_priority 1' ceph tell osd.* injectargs '--osd_max_backfills 1' My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD and 6 on another) - I have 3 of these hosts. Any thoughts are welcome.
-- Andrija Panić

-- With regards, Irek Fasikhov (Фасихов Ирек Нургаязович) Mob.: +79229045757

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
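For context on the 37% question in this thread: under an ideal consistent-placement scheme, removing one of 44 devices would relocate only the data that actually lived on it, about 1/44 ≈ 2.3% of objects. The toy rendezvous-hashing sketch below (not CRUSH itself; OSD and object names are made up) shows that minimal-movement baseline — the much larger movement reported here is an effect of editing the CRUSH map, which reshuffles placements well beyond the removed device:

```python
import hashlib

# Toy rendezvous (highest-random-weight) hashing — purely illustrative,
# NOT the CRUSH algorithm.
def place(obj: str, osds: list) -> str:
    # the object goes to the OSD with the highest hash score
    return max(osds, key=lambda o: hashlib.md5(f"{obj}/{o}".encode()).hexdigest())

osds = [f"osd.{i}" for i in range(44)]
objects = [f"obj-{i}" for i in range(5000)]

before = {o: place(o, osds) for o in objects}
after = {o: place(o, osds[:-1]) for o in objects}   # drop one OSD from the map

moved = sum(before[o] != after[o] for o in objects) / len(objects)
print(f"objects moved: {moved:.1%}")   # close to 1/44 ~ 2.3%, far below 37%
```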
Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?
Since you have only three nodes in the cluster, I recommend you add the new nodes to the cluster first, and then delete the old ones. 2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com: What replication factor do you have? 2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Hi Irek, yes, stopping the OSD (or setting it OUT) resulted in only 3% of data degraded and moved/recovered. When I afterwards removed it from the CRUSH map (ceph osd crush rm id), that's when the 37% happened. And thanks, Irek, for the help - could you kindly let me know the preferred steps when removing a whole node? Do you mean I should first stop all OSDs again, or just remove each OSD from the CRUSH map, or perhaps decompile the CRUSH map, delete the node completely, compile it back in, and let it heal/recover? Do you think this would result in less data being misplaced and moved around? Sorry for bugging you, I really appreciate your help. Thanks On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote: A large percentage of the cluster map was rebuilt (but a low percentage degraded). If you had not done ceph osd crush rm id, the percentage would be low. In your case, the correct option is to remove the entire node rather than each disk individually 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Another question - I mentioned here 37% of objects being moved around - these are MISPLACED objects (degraded objects were 0.001%), after I removed 1 OSD from the CRUSH map (out of 44 OSDs or so). Can anybody confirm this is normal behaviour - and are there any workarounds? I understand this is because of CEPH's object placement algorithm, but still, 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH map makes me wonder why the percentage is so large. It seems not good to me, and I have to remove another 7 OSDs (we are demoting some old hardware nodes). This means I could potentially see 7 x the same number of misplaced objects...? Any thoughts?
Thanks On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote: Thanks Irek. Does this mean that after peering, for each PG there will be a delay of 10 sec - meaning that every once in a while I will have 10 sec of the cluster NOT being stressed/overloaded, then recovery takes place for that PG, then for another 10 sec the cluster is fine, and then it is stressed again? I'm trying to understand the process before actually doing stuff (the config reference is there on ceph.com but I don't fully understand the process). Thanks, Andrija On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote: Hi. Use the value osd_recovery_delay_start. Example: [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start osd_recovery_delay_start: 10 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Hi guys, Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over 37% of the data to rebalance - let's say this is fine (this is when I removed it from the CRUSH map). I'm wondering - I had previously set some throttling mechanisms, but during the first 1h of rebalancing my recovery rate was going up to 1500 MB/s - and VMs were completely unusable - and then for the last 4h of the recovery this rate went down to, say, 100-200 MB/s, and while VM performance was still pretty impacted, at least I could work more or less. So my question: is this behaviour expected, and is throttling here working as expected, since during the first 1h almost no throttling was applied, judging by the 1500 MB/s recovery rate and the impact on VMs.
And the last 4h seemed pretty fine (although still a lot of impact in general). I changed the throttling on the fly with: ceph tell osd.* injectargs '--osd_recovery_max_active 1' ceph tell osd.* injectargs '--osd_recovery_op_priority 1' ceph tell osd.* injectargs '--osd_max_backfills 1' My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD and 6 on another) - I have 3 of these hosts. Any thoughts are welcome.

-- Andrija Panić

-- With regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?
Hi. Use the value osd_recovery_delay_start. Example: [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start osd_recovery_delay_start: 10 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Hi guys, Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over 37% of the data to rebalance - let's say this is fine (this is when I removed it from the CRUSH map). I'm wondering - I had previously set some throttling mechanisms, but during the first 1h of rebalancing my recovery rate was going up to 1500 MB/s - and VMs were completely unusable - and then for the last 4h of the recovery this rate went down to, say, 100-200 MB/s, and while VM performance was still pretty impacted, at least I could work more or less. So my question: is this behaviour expected, and is throttling here working as expected, since during the first 1h almost no throttling was applied, judging by the 1500 MB/s recovery rate and the impact on VMs, while the last 4h seemed pretty fine (although still a lot of impact in general)? I changed the throttling on the fly with: ceph tell osd.* injectargs '--osd_recovery_max_active 1' ceph tell osd.* injectargs '--osd_recovery_op_priority 1' ceph tell osd.* injectargs '--osd_max_backfills 1' My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD and 6 on another) - I have 3 of these hosts. Any thoughts are welcome.

-- Andrija Panić

-- With regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?
A large percentage of the cluster map was rebuilt (but a low percentage degraded). If you had not done ceph osd crush rm id, the percentage would be low. In your case, the correct option is to remove the entire node rather than each disk individually 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Another question - I mentioned here 37% of objects being moved around - these are MISPLACED objects (degraded objects were 0.001%), after I removed 1 OSD from the CRUSH map (out of 44 OSDs or so). Can anybody confirm this is normal behaviour - and are there any workarounds? I understand this is because of CEPH's object placement algorithm, but still, 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH map makes me wonder why the percentage is so large. It seems not good to me, and I have to remove another 7 OSDs (we are demoting some old hardware nodes). This means I could potentially see 7 x the same number of misplaced objects...? Any thoughts? Thanks On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote: Thanks Irek. Does this mean that after peering, for each PG there will be a delay of 10 sec - meaning that every once in a while I will have 10 sec of the cluster NOT being stressed/overloaded, then recovery takes place for that PG, then for another 10 sec the cluster is fine, and then it is stressed again? I'm trying to understand the process before actually doing stuff (the config reference is there on ceph.com but I don't fully understand the process). Thanks, Andrija On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote: Hi.
Use the value osd_recovery_delay_start. Example: [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start osd_recovery_delay_start: 10 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Hi guys, Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over 37% of the data to rebalance - let's say this is fine (this is when I removed it from the CRUSH map). I'm wondering - I had previously set some throttling mechanisms, but during the first 1h of rebalancing my recovery rate was going up to 1500 MB/s - and VMs were completely unusable - and then for the last 4h of the recovery this rate went down to, say, 100-200 MB/s, and while VM performance was still pretty impacted, at least I could work more or less. So my question: is this behaviour expected, and is throttling here working as expected, since during the first 1h almost no throttling was applied, judging by the 1500 MB/s recovery rate and the impact on VMs, while the last 4h seemed pretty fine (although still a lot of impact in general)? I changed the throttling on the fly with: ceph tell osd.* injectargs '--osd_recovery_max_active 1' ceph tell osd.* injectargs '--osd_recovery_op_priority 1' ceph tell osd.* injectargs '--osd_max_backfills 1' My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD and 6 on another) - I have 3 of these hosts. Any thoughts are welcome.

-- Andrija Panić

-- With regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?
osd_recovery_delay_start is the delay in seconds between recovery iterations (see osd_recovery_max_active). It is described here: https://github.com/ceph/ceph/search?q=osd_recovery_delay_start 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Another question - I mentioned here 37% of objects being moved around - these are MISPLACED objects (degraded objects were 0.001%), after I removed 1 OSD from the CRUSH map (out of 44 OSDs or so). Can anybody confirm this is normal behaviour - and are there any workarounds? I understand this is because of CEPH's object placement algorithm, but still, 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH map makes me wonder why the percentage is so large. It seems not good to me, and I have to remove another 7 OSDs (we are demoting some old hardware nodes). This means I could potentially see 7 x the same number of misplaced objects...? Any thoughts? Thanks On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote: Thanks Irek. Does this mean that after peering, for each PG there will be a delay of 10 sec - meaning that every once in a while I will have 10 sec of the cluster NOT being stressed/overloaded, then recovery takes place for that PG, then for another 10 sec the cluster is fine, and then it is stressed again? I'm trying to understand the process before actually doing stuff (the config reference is there on ceph.com but I don't fully understand the process). Thanks, Andrija On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote: Hi. Use the value osd_recovery_delay_start. Example: [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start osd_recovery_delay_start: 10 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Hi guys, Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over 37% of the data to rebalance - let's say this is fine (this is when I removed it from the CRUSH map).
I'm wondering - I had previously set some throttling mechanisms, but during the first 1h of rebalancing my recovery rate was going up to 1500 MB/s - and VMs were completely unusable - and then for the last 4h of the recovery this rate went down to, say, 100-200 MB/s, and while VM performance was still pretty impacted, at least I could work more or less. So my question: is this behaviour expected, and is throttling here working as expected, since during the first 1h almost no throttling was applied, judging by the 1500 MB/s recovery rate and the impact on VMs, while the last 4h seemed pretty fine (although still a lot of impact in general)? I changed the throttling on the fly with: ceph tell osd.* injectargs '--osd_recovery_max_active 1' ceph tell osd.* injectargs '--osd_recovery_op_priority 1' ceph tell osd.* injectargs '--osd_max_backfills 1' My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD and 6 on another) - I have 3 of these hosts. Any thoughts are welcome.

-- Andrija Panić

-- With regards, Irek Fasikhov Mob.: +79229045757
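To make the osd_max_backfills trade-off discussed in this thread concrete, here is a toy scheduling model (the PG and tick counts are made up for illustration; real backfill timing depends on object sizes and disk speed):

```python
import math

# Toy model of osd_max_backfills: at most max_backfills PG backfills run at
# once, so a lower cap stretches recovery over more "waves" but keeps the
# concurrent I/O load — and the impact on client VMs — down.
def schedule_backfills(n_pgs: int, max_backfills: int, ticks_per_pg: int = 3):
    """Return (total_ticks, peak_concurrency) for the toy schedule."""
    waves = math.ceil(n_pgs / max_backfills)
    return waves * ticks_per_pg, min(n_pgs, max_backfills)

print(schedule_backfills(8, 1))   # (24, 1): slow recovery, gentle on clients
print(schedule_backfills(8, 4))   # (6, 4): fast recovery, heavy client impact
```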
Re: [ceph-users] who is using radosgw with civetweb?
I fully support Wido. We also have no problems. OS: CentOS7 [root@s3backup etc]# ceph -v ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7) 2015-02-26 13:22 GMT+03:00 Dan van der Ster d...@vanderster.com: Hi Sage, We switched from apache+fastcgi to civetweb (+haproxy) around one month ago and so far it is working quite well. Just like GuangYang, we had seen many error 500's with fastcgi, but we never investigated it deeply. After moving to civetweb we don't get any errors at all no matter what load we send to the gateways. Here are some details: - the whole cluster, radosgw included, is firefly 0.80.8 and Scientific Linux 6.6 - we have 6 gateways, each running on a 2-core VM - civetweb is listening on 8080 - haproxy is listening on _each_ gateway VM on 80 and 443 and proxying to the radosgw's - so far we've written ~20 million objects (mostly very small) through civetweb. Our feedback is that the civetweb configuration is _much_ easier, much cleaner, and more reliable than what we had with apache+fastcgi. Before, we needed the non-standard apache (with 100-continue support) and the fastcgi config was always error-prone. The main goals we had for adding haproxy were for load balancing and to add SSL. Currently haproxy is configured to balance the http sessions evenly over all of our gateways -- one civetweb feature which would be nice to have would be a /health report (which returns e.g. some load metric for that gateway) that we could feed into haproxy so it would be able to better balance the load. In conclusion, +1 from us... AFAWCT civetweb is the way to go for Red Hat's future supported configuration. Best Regards, Dan (+Herve who did the work!) On Wed, Feb 25, 2015 at 8:31 PM, Sage Weil sw...@redhat.com wrote: Hey, We are considering switching to civetweb (the embedded/standalone rgw web server) as the primary supported RGW frontend instead of the current apache + mod-fastcgi or mod-proxy-fcgi approach. 
Supported here means both the primary platform that upstream development focuses on and what the downstream Red Hat product will officially support. How many people are using RGW standalone with the embedded civetweb server instead of apache? In production? At what scale? With what version(s)? (civetweb first appeared in firefly and we've backported most fixes.) Have you seen any problems? Any other feedback? The hope is to (vastly) simplify deployment. Thanks! sage

-- With regards, Irek Fasikhov Mob.: +79229045757
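On the health report Dan wishes civetweb had: civetweb does not provide one, but the idea is simple to sketch. The handler below is entirely hypothetical (the endpoint path, field names, and load metric are invented for illustration) — it shows the kind of per-gateway metric a balancer like haproxy could poll:

```python
import json, threading, urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical /health endpoint: reports that the gateway is up plus a
# load metric the load balancer could use for weighting.
class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps({"status": "up", "inflight_ops": 3}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the example quiet
        pass

server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_address[1]}/health"
with urllib.request.urlopen(url) as resp:
    status, payload = resp.status, json.load(resp)
server.shutdown()
print(status, payload)
```

With something like this in place, haproxy's http-check could mark a gateway down, and the reported load could feed weighted balancing.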
Re: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison
Mark, very, very good! 2015-02-17 20:37 GMT+03:00 Mark Nelson mnel...@redhat.com: Hi all, I wrote up a short document describing some tests I ran recently to look at how SSD-backed OSD performance has changed across our LTS releases. This only looks at RADOS performance, not RBD or RGW. It also doesn't offer any real explanations for the results. It's just a first high-level step toward understanding some of the behaviors folks on the mailing list have reported over the last couple of releases. I hope you find it useful. Mark

-- With regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] Introducing Learning Ceph : The First ever Book on Ceph
Karan, will the book also be published in Russian? Thanks. 2015-02-13 11:43 GMT+03:00 Karan Singh karan.si...@csc.fi: Here is the new link for the sample book: https://www.dropbox.com/s/2zcxawtv4q29fm9/Learning_Ceph_Sample.pdf?dl=0 Karan Singh Systems Specialist, Storage Platforms CSC - IT Center for Science, Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland mobile: +358 503 812758 tel. +358 9 4572001 fax +358 9 4572302 http://www.csc.fi/ On 13 Feb 2015, at 05:25, Frank Yu flyxia...@gmail.com wrote: Wow, congrats! BTW, I found the link to the sample copy is 404. 2015-02-06 6:53 GMT+08:00 Karan Singh karan.si...@csc.fi: Hello Community Members, I am happy to introduce the first book on Ceph, titled "*Learning Ceph*". Many folks from the publishing house and I, together with the technical reviewers, spent several months getting this book compiled and published. Finally the book is up for sale; I hope you will like it and will surely learn a lot from it. Amazon: http://www.amazon.com/Learning-Ceph-Karan-Singh/dp/1783985623/ref=sr_1_1?s=booksie=UTF8qid=1423174441sr=1-1keywords=ceph Packtpub: https://www.packtpub.com/application-development/learning-ceph You can grab the sample copy from here: https://www.dropbox.com/s/ek76r01r9prs6pb/Learning_Ceph_Packt.pdf?dl=0 *Finally, I would like to express my sincere thanks to* *Sage Weil* - for developing Ceph and everything around it, as well as for writing the foreword for "Learning Ceph". *Patrick McGarry* - for his usual off-the-track support, as always. Last but not least, to our great community members, who are also reviewers of the book: *Don Talton, Julien Recurt, Sebastien Han* and *Zihong Chen* - thank you guys for your efforts. Karan Singh Systems Specialist, Storage Platforms CSC - IT Center for Science, Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland mobile: +358 503 812758 tel.
+358 9 4572001 fax +358 9 4572302 http://www.csc.fi/ -- Regards, Frank Yu

-- With regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] ceph Performance with SSD journal
Hi. What version? 2015-02-13 6:04 GMT+03:00 Sumit Gaur sumitkg...@gmail.com: Hi Chris, please find my answers below in blue. On Thu, Feb 12, 2015 at 12:42 PM, Chris Hoy Poy ch...@gopc.net wrote: Hi Sumit, a couple of questions: What brand/model SSD? *Samsung 480GB SSD (PM853T), rated at 90K random-write IOPS (4K, 368MBps).* What brand/model HDD? *64GB memory, 300GB SAS HDD (Seagate), 10Gb NIC.* Also, how are they connected to the controller/motherboard? Are they sharing a bus (i.e. a SATA expander)? *No, they are connected via a local bus, not a SATA expander.* RAM? *64GB* Also look at the output of iostat -x or similar: are the SSDs hitting 100% utilisation? *No, the SSD was hitting only 2000 IOPS.* I suspect that the 5:1 ratio of HDDs to SSDs is not ideal; you now have 5x the write IO trying to fit into a single SSD. *I have not seen any documented reference for calculating the ratio - could you suggest one? Here I want to mention that the results for 1024K write improve a lot. The problem is with 1024K read and 4K write: SSD journal 810 IOPS and 810MBps; HDD journal 620 IOPS and 620MBps.* I'll take a punt on it being a SATA-connected SSD (most common); 5x ~130 megabytes/second gets very close to most SATA bus limits. If it's a shared bus, you possibly hit that limit even earlier (since all that data is now being written out twice over the bus). cheers; \Chris -- *From:* Sumit Gaur sumitkg...@gmail.com *To:* ceph-users@lists.ceph.com *Sent:* Thursday, 12 February, 2015 9:23:35 AM *Subject:* [ceph-users] ceph Performance with SSD journal Hi Ceph experts, I have a small ceph architecture-related question. Blogs and documents suggest that ceph performs much better if we put the journal on an SSD. I have built a ceph cluster with 30 HDDs + 6 SSDs across 6 OSD nodes: 5 HDDs + 1 SSD on each node, with each SSD holding 5 partitions journaling the node's 5 OSDs. Now I ran the same tests as I ran for the all-HDD setup.
What I saw: the two readings below go in the wrong direction from what I expected: 1) 4K write IOPS are lower for the SSD setup - not a major difference, but lower. 2) 1024K read IOPS are lower for the SSD setup than for the HDD setup. On the other hand, 4K read and 1024K write both show much better numbers for the SSD setup. Let me know if I am missing some obvious concept. Thanks, Sumit

-- With regards, Irek Fasikhov Mob.: +79229045757
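A quick back-of-envelope check of Chris's 5:1 journal math (the per-HDD streaming rate and SATA ceiling below are the assumptions from his estimate, not measured values from this cluster):

```python
# Shared-journal bottleneck estimate: every OSD write also hits its journal,
# so 5 HDDs funnel their combined write stream into one SSD.
HDD_STREAM_MBPS = 130     # assumed per-HDD sequential write rate (Chris's ~130 MB/s)
SATA3_LIMIT_MBPS = 550    # assumed usable SATA 3.0 bandwidth
hdds_per_ssd = 5          # journal ratio from the thread

journal_demand = hdds_per_ssd * HDD_STREAM_MBPS
print(f"aggregate journal write demand: {journal_demand} MB/s")   # 650 MB/s
print(f"exceeds SATA limit: {journal_demand > SATA3_LIMIT_MBPS}")  # True
```

So under these assumptions the single journal SSD saturates its link before the HDDs do, which would explain why large writes gain little from the SSD journal.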
Re: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 --the VM's read requestbecome too slow
Hi. Hmm... I had been wondering why I have such low read speed on another cluster. P.S. ceph 0.80.8 2015-02-12 14:33 GMT+03:00 Alexandre DERUMIER aderum...@odiso.com: Hi, Can you test with rbd_cache disabled? I remember a bug detected in giant; I'm not sure whether it also applies to firefly. This was the tracker: http://tracker.ceph.com/issues/9513 But it has been solved and backported to firefly. Also, can you test 0.80.6 and 0.80.7? - Original message - From: killingwolf killingw...@qq.com To: ceph-users ceph-users@lists.ceph.com Sent: Thursday, 12 February 2015 12:16:32 Subject: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 -- the VM's read requests become too slow I have this problem too. Help! -- Original message -- From: 杨万元 yangwanyuan8...@gmail.com Sent: Thursday, 12 February 2015, 11:14 AM To: ceph-users@lists.ceph.com Subject: [ceph-users] Upgrade 0.80.5 to 0.80.8 -- the VM's read requests become too slow Hello! We use Ceph + OpenStack in our private cloud. Recently we upgraded our CentOS 6.5-based cluster from Ceph Emperor to Ceph Firefly. At first we used the Red Hat epel yum repo to upgrade; that Ceph version is 0.80.5. We upgraded the monitors first, then the OSDs, and the clients last. When we completed this upgrade, we booted a VM on the cluster and used fio to test the IO performance. The IO performance was as good as before. Everything was OK! Then we upgraded the cluster from 0.80.5 to 0.80.8; when that was complete, we rebooted the VM to load the newest librbd. After that we again used fio to test the IO performance, and found that randwrite and write are as good as before, but randread and read have become worse: randread IOPS dropped from 4000-5000 to 300-400, the latency got worse, and the read bandwidth dropped from ~400MB/s to 115MB/s. When I downgraded the ceph client version from 0.80.8 to 0.80.5, the results became normal again. So I think it may be something in librbd.
I compared the 0.80.8 release notes with 0.80.5 ( http://ceph.com/docs/master/release-notes/#v0-80-8-firefly ) and found just one change in 0.80.8 related to read requests: librbd: cap memory utilization for read requests (Jason Dillaman). Who can explain this? My ceph cluster is 400 OSDs, 5 mons:

ceph -s
health HEALTH_OK
monmap e11: 5 mons at {BJ-M1-Cloud71=172.28.2.71:6789/0,BJ-M1-Cloud73=172.28.2.73:6789/0,BJ-M2-Cloud80=172.28.2.80:6789/0,BJ-M2-Cloud81=172.28.2.81:6789/0,BJ-M3-Cloud85=172.28.2.85:6789/0}, election epoch 198, quorum 0,1,2,3,4 BJ-M1-Cloud71,BJ-M1-Cloud73,BJ-M2-Cloud80,BJ-M2-Cloud81,BJ-M3-Cloud85
osdmap e120157: 400 osds: 400 up, 400 in
pgmap v26161895: 29288 pgs, 6 pools, 20862 GB data, 3014 kobjects
41084 GB used, 323 TB / 363 TB avail
29288 active+clean
client io 52640 kB/s rd, 32419 kB/s wr, 5193 op/s

The following is my ceph client conf:

[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 172.29.204.24,172.29.204.48,172.29.204.55,172.29.204.58,172.29.204.73
mon_initial_members = ZR-F5-Cloud24, ZR-F6-Cloud48, ZR-F7-Cloud55, ZR-F8-Cloud58, ZR-F9-Cloud73
fsid = c01c8e28-304e-47a4-b876-cb93acc2e980
mon osd full ratio = .85
mon osd nearfull ratio = .75
public network = 172.29.204.0/24
mon warn on legacy crush tunables = false

[osd]
osd op threads = 12
filestore journal writeahead = true
filestore merge threshold = 40
filestore split multiple = 8

[client]
rbd cache = true
rbd cache writethrough until flush = false
rbd cache size = 67108864
rbd cache max dirty = 50331648
rbd cache target dirty = 33554432

[client.cinder]
admin socket = /var/run/ceph/rbd-$pid.asok

My VM is 8 cores / 16GB; the fio scripts we use are:

fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randread -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200
fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200
fio
-ioengine=libaio -bs=4k -direct=1 -thread -rw=read -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200 fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=write -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200 The following is the io test result ceph client verison :0.80.5 read: bw= 430MB write: bw=420MB randread: iops= 4875 latency=65ms randwrite: iops=6844 latency=46ms ceph client verison :0.80.8 read: bw= 115MB write: bw=480MB randread: iops= 381 latency=83ms randwrite: iops=4843 latency=68ms ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 ___ ceph-users mailing list
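Alexandre's suggestion to test with rbd_cache disabled can be tried from the client side alone, without touching the cluster. A minimal sketch of the ceph.conf fragment for the test client — the section name matches the conf above; whether the regression is in the cache read path is exactly what the test is meant to find out:

```ini
[client]
; temporarily disable the librbd cache, restart the VM, rerun fio;
; if randread/read recover, the 0.80.8 regression is in the cached read path
rbd cache = false
```

Re-running the same four fio commands with and without this setting isolates librbd caching from the rest of the client stack.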
[ceph-users] 0.80.8 ReplicationPG Fail
In the morning I found that some OSDs had dropped out of the tier cache pool. Maybe it's a coincidence, but at that point a rollback was running.

2015-02-05 23:23:18.231723 7fd747ff1700 -1 *** Caught signal (Segmentation fault) ** in thread 7fd747ff1700
 ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
 1: /usr/bin/ceph-osd() [0x9bde51]
 2: (()+0xf710) [0x7fd766f97710]
 3: (std::_Rb_tree_decrement(std::_Rb_tree_node_base*)+0xa) [0x7fd7666c1eca]
 4: (ReplicatedPG::make_writeable(ReplicatedPG::OpContext*)+0x14c) [0x87cd5c]
 5: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x1db) [0x89d29b]
 6: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xcd4) [0x89e0f4]
 7: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x2ca5) [0x8a2a55]
 8: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5b1) [0x832251]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x37c) [0x61344c]
 10: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x63d) [0x6472ad]
 11: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x67dcde]
 12: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0xa2a181]
 13: (ThreadPool::WorkThread::entry()+0x10) [0xa2d260]
 14: (()+0x79d1) [0x7fd766f8f9d1]
 15: (clone()+0x6d) [0x7fd765f088fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Are there any ideas? Thanks.
http://tracker.ceph.com/issues/10778

--
Best regards, Irek Fasikhov
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RBD over cache tier over EC pool: rbd rm doesn't remove objects
Hi, Sage. Yes, Firefly.

[root@ceph05 ~]# ceph --version
ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)

Yes, I have seen this behavior.

[root@ceph08 ceph]# rbd info vm-160-disk-1
rbd image 'vm-160-disk-1':
 size 32768 MB in 8192 objects
 order 22 (4096 kB objects)
 block_name_prefix: rbd_data.179faf52eb141f2
 format: 2
 features: layering
 parent: rbd/base-145-disk-1@__base__
 overlap: 32768 MB
[root@ceph08 ceph]# rbd rm vm-160-disk-1
Removing image: 100% complete...done.
[root@ceph08 ceph]# rbd info vm-160-disk-1
2015-01-28 10:39:01.595785 7f1fbea9e760 -1 librbd::ImageCtx: error finding header: (2) No such file or directory
rbd: error opening image vm-160-disk-1: (2) No such file or directory
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
 5944 5944 249633
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
 5857 5857 245979
[root@ceph08 ceph]# rados -p rbd ls | grep 179faf52eb141f2 | wc
 4377 4377 183819
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
 5017 5017 210699
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
 5015 5015 210615
[root@ceph08 ceph]# rados -p rbd ls | grep 179faf52eb141f2 | wc
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
 1986 1986 83412
[root@ceph08 ceph]# rados -p rbd ls | grep 179faf52eb141f2 | wc
 981 981 41202
[root@ceph08 ceph]# rados -p rbd ls | grep 179faf52eb141f2 | wc
 802 802 33684
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
 1611 1611 67662

Thanks, Sage!

Tue Jan 27 2015 at 7:01:43 PM, Sage Weil s...@newdream.net:

On Tue, 27 Jan 2015, Irek Fasikhov wrote:
Hi, All. Indeed, there is a problem. I removed 1 TB of data and the space on the cluster is not reclaimed. Is this expected behavior or a bug? And how long until it is cleaned up?

Your subject says "cache tier" but I don't see it in the 'ceph df' output below. The cache tiers will store 'whiteout' objects that cache object non-existence, and that could be delaying some deletion.
You can wrangle the cluster into flushing those with

 ceph osd pool set cachepool cache_target_dirty_ratio .05

(though you'll probably want to change it back to the default .4 later). If there's no cache tier involved, there may be another problem. What version is this? Firefly?

sage

Sat Sep 20 2014 at 8:19:24 AM, Mikaël Cluseau mclus...@isi.nc:

Hi all, I have weird behaviour on my firefly test + convenience storage cluster. It consists of 2 nodes with a light imbalance in available space:

# id   weight  type name       up/down  reweight
-1     14.58   root default
-2     8.19      host store-1
1      2.73        osd.1        up       1
0      2.73        osd.0        up       1
5      2.73        osd.5        up       1
-3     6.39      host store-2
2      2.73        osd.2        up       1
3      2.73        osd.3        up       1
4      0.93        osd.4        up       1

I used to store ~8TB of rbd volumes, coming to a near-full state. There were some annoying stuck misplaced PGs, so I began to remove 4.5TB of data; the weird thing is: the space hasn't been reclaimed on the OSDs, they stayed stuck around 84% usage. I tried to move PGs around, and it happens that the space is correctly reclaimed if I take an OSD out, let it empty its XFS volume, and then take it in again. I'm currently applying this to each OSD in turn, but I thought it could be worth telling about this. The current ceph df output is:

GLOBAL:
 SIZE    AVAIL   RAW USED   %RAW USED
 12103G  5311G   6792G      56.12
POOLS:
 NAME         ID  USED    %USED  OBJECTS
 data         0   0       0      0
 metadata     1   0       0      0
 rbd          2   444G    3.67   117333
 [...]
 archives-ec  14  3628G   29.98  928902
 archives     15  37518M  0.30   273167

Before just moving data, AVAIL was around 3TB. I finished the process with the OSDs on store-1, which show the following space usage now:

/dev/sdb1 2.8T 1.4T 1.4T 50% /var/lib/ceph/osd/ceph-0
/dev/sdc1 2.8T 1.3T 1.5T 46% /var/lib/ceph/osd/ceph-1
/dev/sdd1 2.8T 1.3T 1.5T 48% /var/lib/ceph/osd/ceph-5

I'm currently fixing OSD 2; 3 will be the last one to be fixed. The df on store-2 shows the following:

/dev/sdb1 2.8T 1.9T 855G 70% /var/lib/ceph/osd/ceph-2
/dev/sdc1 2.8T 2.4T 417G
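Sage's knob works by lowering the dirty threshold so the tiering agent starts flushing almost immediately. A sketch of the full sequence, assuming the cache pool is really named "cachepool" (substitute your pool name); these commands act on a live cluster, so expect significant flush I/O on a large cache pool:

```
# temporarily make the agent consider almost everything flushable
ceph osd pool set cachepool cache_target_dirty_ratio .05

# optionally force a full flush/evict cycle of the cache pool right away
rados -p cachepool cache-flush-evict-all

# restore the default once space has been reclaimed
ceph osd pool set cachepool cache_target_dirty_ratio .4
```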
Re: [ceph-users] RBD over cache tier over EC pool: rbd rm doesn't remove objects
Hi, All. Indeed, there is a problem. I removed 1 TB of data and the space on the cluster is not reclaimed. Is this expected behavior or a bug? And how long until it is cleaned up?

Sat Sep 20 2014 at 8:19:24 AM, Mikaël Cluseau mclus...@isi.nc:

Hi all, I have weird behaviour on my firefly test + convenience storage cluster. It consists of 2 nodes with a light imbalance in available space:

# id   weight  type name       up/down  reweight
-1     14.58   root default
-2     8.19      host store-1
1      2.73        osd.1        up       1
0      2.73        osd.0        up       1
5      2.73        osd.5        up       1
-3     6.39      host store-2
2      2.73        osd.2        up       1
3      2.73        osd.3        up       1
4      0.93        osd.4        up       1

I used to store ~8TB of rbd volumes, coming to a near-full state. There were some annoying stuck misplaced PGs, so I began to remove 4.5TB of data; the weird thing is: the space hasn't been reclaimed on the OSDs, they stayed stuck around 84% usage. I tried to move PGs around, and it happens that the space is correctly reclaimed if I take an OSD out, let it empty its XFS volume, and then take it in again. I'm currently applying this to each OSD in turn, but I thought it could be worth telling about this. The current ceph df output is:

GLOBAL:
 SIZE    AVAIL   RAW USED   %RAW USED
 12103G  5311G   6792G      56.12
POOLS:
 NAME         ID  USED    %USED  OBJECTS
 data         0   0       0      0
 metadata     1   0       0      0
 rbd          2   444G    3.67   117333
 [...]
 archives-ec  14  3628G   29.98  928902
 archives     15  37518M  0.30   273167

Before just moving data, AVAIL was around 3TB. I finished the process with the OSDs on store-1, which show the following space usage now:

/dev/sdb1 2.8T 1.4T 1.4T 50% /var/lib/ceph/osd/ceph-0
/dev/sdc1 2.8T 1.3T 1.5T 46% /var/lib/ceph/osd/ceph-1
/dev/sdd1 2.8T 1.3T 1.5T 48% /var/lib/ceph/osd/ceph-5

I'm currently fixing OSD 2; 3 will be the last one to be fixed. The df on store-2 shows the following:

/dev/sdb1 2.8T 1.9T 855G *70%* /var/lib/ceph/osd/ceph-2
/dev/sdc1 2.8T 2.4T 417G *86%* /var/lib/ceph/osd/ceph-3
/dev/sdd1 932G 481G 451G 52% /var/lib/ceph/osd/ceph-4

OSD 2 was at 84% 3h ago, and OSD 3 was ~75%.
During rbd rm (which took a bit more than 3 days), the ceph log was showing things like this:

2014-09-03 16:17:38.831640 mon.0 192.168.1.71:6789/0 417194 : [INF] pgmap v14953987: 3196 pgs: 2882 active+clean, 314 active+remapped; 7647 GB data, 11067 GB used, 3828 GB / 14896 GB avail; 0 B/s rd, 6778 kB/s wr, 18 op/s; -5/5757286 objects degraded (-0.000%)
[...]
2014-09-05 03:09:59.895507 mon.0 192.168.1.71:6789/0 513976 : [INF] pgmap v15050766: 3196 pgs: 2882 active+clean, 314 active+remapped; 6010 GB data, 11156 GB used, 3740 GB / 14896 GB avail; 0 B/s rd, 0 B/s wr, 8 op/s; -388631/5247320 objects degraded (-7.406%)
[...]
2014-09-06 03:56:50.008109 mon.0 192.168.1.71:6789/0 580816 : [INF] pgmap v15117604: 3196 pgs: 2882 active+clean, 314 active+remapped; 4865 GB data, 11207 GB used, 3689 GB / 14896 GB avail; 0 B/s rd, 6117 kB/s wr, 22 op/s; -706519/3699415 objects degraded (-19.098%)
2014-09-06 03:56:44.476903 osd.0 192.168.1.71:6805/11793 729 : [WRN] 1 slow requests, 1 included below; oldest blocked for 30.058434 secs
2014-09-06 03:56:44.476909 osd.0 192.168.1.71:6805/11793 730 : [WRN] slow request 30.058434 seconds old, received at 2014-09-06 03:56:14.418429: osd_op(client.19843278.0:46081 rb.0.c7fd7f.238e1f29.b3fa [delete] 15.b8fb7551 ack+ondisk+write e38950) v4 currently waiting for blocked object
2014-09-06 03:56:49.477785 osd.0 192.168.1.71:6805/11793 731 : [WRN] 2 slow requests, 1 included below; oldest blocked for 35.059315 secs
[... stabilizes here:]
2014-09-06 22:13:48.771531 mon.0 192.168.1.71:6789/0 632527 : [INF] pgmap v15169313: 3196 pgs: 2882 active+clean, 314 active+remapped; 4139 GB data, 11215 GB used, 3681 GB / 14896 GB avail; 64 B/s rd, 64 B/s wr, 0 op/s; -883219/3420796 objects degraded (-25.819%)
[...]
2014-09-07 03:09:48.491325 mon.0 192.168.1.71:6789/0 633880 : [INF] pgmap v15170666: 3196 pgs: 2882 active+clean, 314 active+remapped; 4139 GB data, 11215 GB used, 3681 GB / 14896 GB avail; 18727 B/s wr, 2 op/s; -883219/3420796 objects degraded (-25.819%)

And now, during the data movement I described before:

2014-09-20 15:16:13.394694 mon.0 [INF] pgmap v15344707: 3196 pgs: 2132 active+clean, 432 active+remapped+wait_backfill, 621 active+remapped, 11 active+remapped+backfilling; 4139 GB data, 6831 GB used, 5271 GB / 12103 GB avail; 379097/3792969 objects degraded (9.995%)

If some ceph developer wants me to do something or to provide some data, please say so quickly, I will probably
Re: [ceph-users] Part 2: ssd osd fails often with FAILED assert(soid < scrubber.start || soid >= scrubber.end)
Hi All, Loic. I have exactly the same error. As I understand it, this will be fixed in 0.80.9? Thank you.

Sat Jan 17 2015 at 2:21:09 AM, Loic Dachary l...@dachary.org:

On 14/01/2015 18:33, Udo Lembke wrote:
Hi Loic, thanks for the answer. I hope it's not like in http://tracker.ceph.com/issues/8747 where the issue happens with a patched version, if I understand right.

http://tracker.ceph.com/issues/8747 is a duplicate of http://tracker.ceph.com/issues/8011 indeed :-)

So I must only wait a few months ;-) for a backport...
Udo

Am 14.01.2015 09:40, schrieb Loic Dachary:
Hi, This is http://tracker.ceph.com/issues/8011 which is being backported. Cheers

--
Loïc Dachary, Artisan Logiciel Libre
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PG num calculator live on Ceph.com
Very very good :)

Fri, Jan 9, 2015, 2:17, William Bloom (wibloom) wibl...@cisco.com:

Awesome, thanks Michael.
Regards, William

*From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of* Michael J. Kidd
*Sent:* Wednesday, January 07, 2015 2:09 PM
*To:* ceph-us...@ceph.com
*Subject:* [ceph-users] PG num calculator live on Ceph.com

Hello all, just a quick heads up that we now have a PG calculator to help determine the proper PG-per-pool numbers to achieve a target PG-per-OSD ratio.

http://ceph.com/pgcalc

Please check it out! Happy to answer any questions, and always welcome any feedback on the tool / verbiage, etc... As an aside, we're also working to update the documentation to reflect the best practices. See the Ceph.com tracker for this at: http://tracker.ceph.com/issues/9867

Thanks!
Michael J. Kidd
Sr. Storage Consultant
Inktank Professional Services - by Red Hat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
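For a rough offline approximation of what the calculator does — a sketch of the common rule of thumb ((OSDs × target PGs per OSD) / replica count, rounded up to a power of two), not the exact logic of the pgcalc tool:

```shell
# Suggest a pg_num: (osds * target_pgs_per_osd) / replicas,
# rounded up to the next power of two.
pg_suggest() {
  local osds=$1 replicas=$2 target=${3:-100}
  local raw=$(( osds * target / replicas ))
  local p=1
  # climb powers of two until we reach or exceed the raw estimate
  while [ "$p" -lt "$raw" ]; do p=$(( p * 2 )); done
  echo "$p"
}

pg_suggest 6 2      # a tiny 6-OSD cluster
pg_suggest 400 3    # the 400-OSD cluster from the earlier thread
```

Note this is per-pool guidance; with many pools the per-pool numbers should be scaled down so the total PG-per-OSD count stays near the target, which is exactly the case the web calculator handles better than a one-liner.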
Re: [ceph-users] system metrics monitoring
Hi. We use Zabbix.

2014-12-12 8:33 GMT+03:00 pragya jain prag_2...@yahoo.co.in:

hello sir! I need some open source monitoring tool for examining these metrics. Please suggest some open source monitoring software. Thanks. Regards, Pragya Jain

On Thursday, 11 December 2014 9:16 PM, Denish Patel den...@omniti.com wrote:
Try http://www.circonus.com

On Thu, Dec 11, 2014 at 1:22 AM, pragya jain prag_2...@yahoo.co.in wrote:
please somebody reply to my query. Regards, Pragya Jain

On Tuesday, 9 December 2014 11:53 AM, pragya jain prag_2...@yahoo.co.in wrote:
hello all! As mentioned on the statistics and monitoring page of Riak, "Systems Metrics To Graph" ( http://docs.basho.com/riak/latest/ops/running/stats-and-monitoring/#Systems-Metrics-To-Graph ) lists these metrics: Available Disk Space, IOWait, Read Operations, Write Operations, Network Throughput, Load Average. Can somebody suggest some monitoring tools that monitor these metrics? Regards, Pragya Jain
___
riak-users mailing list
riak-us...@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

--
Denish Patel, OmniTI Computer Consulting Inc.
Database Architect, http://omniti.com/does/data-management
http://www.pateldenish.com

--
Best regards, Irek Fasikhov
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] VM restore on Ceph *very* slow
Hi. For faster operation, use rbd export/export-diff and import/import-diff.

2014-12-11 17:17 GMT+03:00 Lindsay Mathieson lindsay.mathie...@gmail.com:

Anyone know why a VM live restore would be excessively slow on Ceph? Restoring a small VM with a 12GB disk / 2GB RAM is taking 18 *minutes*. Larger VMs can take over half an hour. The same VMs on the same disks, but native or on glusterfs, take less than 30 seconds. The VMs are KVM on Proxmox.

thanks,
-- Lindsay

--
Best regards, Irek Fasikhov
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] VM restore on Ceph *very* slow
Examples.

Backup:
/usr/bin/nice -n +20 /usr/bin/rbd -n client.backup export test/vm-105-disk-1@rbd_data.505392ae8944a - | /usr/bin/pv -s 40G -n -i 1 | /usr/bin/nice -n +20 /usr/bin/pbzip2 -c > /backup/vm-105-disk-1

Restore:
pbzip2 -dk /nfs/RBD/big-vm-268-disk-1-LyncV2-20140830-011308.pbzip2 -c | rbd -n client.rbdbackup -k /etc/ceph/big.keyring -c /etc/ceph/big.conf import --image-format 2 - rbd/Lyncolddisk1

2014-12-12 8:38 GMT+03:00 Irek Fasikhov malm...@gmail.com:

Hi. For faster operation, use rbd export/export-diff and import/import-diff.

2014-12-11 17:17 GMT+03:00 Lindsay Mathieson lindsay.mathie...@gmail.com:

Anyone know why a VM live restore would be excessively slow on Ceph? Restoring a small VM with a 12GB disk / 2GB RAM is taking 18 *minutes*. Larger VMs can take over half an hour. The same VMs on the same disks, but native or on glusterfs, take less than 30 seconds. The VMs are KVM on Proxmox.

thanks,
-- Lindsay

--
Best regards, Irek Fasikhov
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
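The full-export pipeline above extends naturally to incremental backups with export-diff/import-diff. A sketch under assumed names — the pool/image/snapshot names here are hypothetical, not from the thread:

```
# Day 0: create a baseline snapshot and take a full export
rbd snap create test/vm-105-disk-1@base
rbd export test/vm-105-disk-1@base - | pbzip2 -c > /backup/vm-105-disk-1-base

# Day N: snapshot again and ship only the delta since the previous snapshot
rbd snap create test/vm-105-disk-1@day1
rbd export-diff --from-snap base test/vm-105-disk-1@day1 - | pbzip2 -c > /backup/vm-105-disk-1-day1.diff

# Restore side: apply the delta onto an image that already carries @base
pbzip2 -dc /backup/vm-105-disk-1-day1.diff | rbd import-diff - rbd/restored-disk
```

Since the diff only contains blocks changed between the two snapshots, daily runs move far less data than repeated full exports.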
Re: [ceph-users] What's the difference between ceph-0.87-0.el6.x86_64.rpm and ceph-0.80.7-0.el6.x86_64.rpm
Hi, Cao.
https://github.com/ceph/ceph/commits/firefly

2014-12-11 5:00 GMT+03:00 Cao, Buddy buddy@intel.com:

Hi, I tried to download the firefly rpm package, but found two rpms in different folders. What is the difference between 0.87.0 and 0.80.7?
http://ceph.com/rpm/el6/x86_64/ceph-0.87-0.el6.x86_64.rpm
http://ceph.com/rpm-firefly/el6/x86_64/ceph-0.80.7-0.el6.x86_64.rpm

Wei Cao (Buddy)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Best regards, Irek Fasikhov
Mob.: +79229045757
Re: [ceph-users] active+degraded on an empty new cluster
Hi.
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
ceph pg force_create_pg <pgid>

2014-12-09 14:50 GMT+03:00 Giuseppe Civitella giuseppe.civite...@gmail.com:

Hi all, last week I installed a new ceph cluster on 3 VMs running Ubuntu 14.04 with the default kernel. There is one ceph monitor and two OSD hosts. Here are some details:

ceph -s
 cluster c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
 health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
 monmap e1: 1 mons at {ceph-mon1=10.1.1.83:6789/0}, election epoch 1, quorum 0 ceph-mon1
 osdmap e83: 6 osds: 6 up, 6 in
 pgmap v231: 192 pgs, 3 pools, 0 bytes data, 0 objects
 207 MB used, 30446 MB / 30653 MB avail
 192 active+degraded

root@ceph-mon1:/home/ceph# ceph osd dump
epoch 99
fsid c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
created 2014-12-06 13:15:06.418843
modified 2014-12-09 11:38:04.353279
flags
pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 18 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 19 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 20 flags hashpspool stripe_width 0
max_osd 6
osd.0 up in weight 1 up_from 90 up_thru 90 down_at 89 last_clean_interval [58,89) 10.1.1.84:6805/995 10.1.1.84:6806/4000995 10.1.1.84:6807/4000995 10.1.1.84:6808/4000995 exists,up e3895075-614d-48e2-b956-96e13dbd87fe
osd.1 up in weight 1 up_from 88 up_thru 0 down_at 87 last_clean_interval [8,87) 10.1.1.85:6800/23146 10.1.1.85:6815/7023146 10.1.1.85:6816/7023146 10.1.1.85:6817/7023146 exists,up 144bc6ee-2e3d-4118-a460-8cc2bb3ec3e8
osd.2 up in weight 1 up_from 61 up_thru 0 down_at 60 last_clean_interval [11,60) 10.1.1.85:6805/26784 10.1.1.85:6802/5026784 10.1.1.85:6811/5026784 10.1.1.85:6812/5026784 exists,up 8d5c7108-ef11-4947-b28c-8e20371d6d78
osd.3 up in weight 1 up_from 95 up_thru 0 down_at 94 last_clean_interval [57,94) 10.1.1.84:6800/810 10.1.1.84:6810/3000810 10.1.1.84:6811/3000810 10.1.1.84:6812/3000810 exists,up bd762b2d-f94c-4879-8865-cecd63895557
osd.4 up in weight 1 up_from 97 up_thru 0 down_at 96 last_clean_interval [74,96) 10.1.1.84:6801/9304 10.1.1.84:6802/2009304 10.1.1.84:6803/2009304 10.1.1.84:6813/2009304 exists,up 7d28a54b-b474-4369-b958-9e6bf6c856aa
osd.5 up in weight 1 up_from 99 up_thru 0 down_at 98 last_clean_interval [79,98) 10.1.1.85:6801/19513 10.1.1.85:6808/2019513 10.1.1.85:6810/2019513 10.1.1.85:6813/2019513 exists,up f4d76875-0e40-487c-a26d-320f8b8d60c5

root@ceph-mon1:/home/ceph# ceph osd tree
# id   weight  type name          up/down  reweight
-1     0       root default
-2     0         host ceph-osd1
0      0           osd.0           up       1
3      0           osd.3           up       1
4      0           osd.4           up       1
-3     0         host ceph-osd2
1      0           osd.1           up       1
2      0           osd.2           up       1
5      0           osd.5           up       1

The current HEALTH_WARN state has said "192 active+degraded" since I rebooted an OSD host. Previously it was "incomplete". It never reached a HEALTH_OK state. Any hint about what to do next to get a healthy cluster?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Best regards, Irek Fasikhov
Mob.: +79229045757
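Worth noting, independent of the force_create_pg suggestion: in the `ceph osd tree` output above every OSD has CRUSH weight 0 (common on small VM disks, where the size-derived weight rounds down to 0.00), and with all-zero weights CRUSH cannot map PGs to any OSD, which by itself leaves every PG inactive/degraded. A sketch of assigning non-zero weights — the 1.0 values are placeholders (weight is conventionally the disk size in TB):

```
# give each OSD a non-zero CRUSH weight so PGs can be mapped
for id in 0 1 2 3 4 5; do
  ceph osd crush reweight osd.$id 1.0
done
ceph -s    # the PGs should then peer and go active+clean
```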
Re: [ceph-users] Issue in renaming rbd
Hi. You can only rename within the same pool. To transfer to another pool, use rbd cp or rbd export/import.

2014-12-03 16:15 GMT+03:00 Mallikarjun Biradar mallikarjuna.bira...@gmail.com:

Hi all, is renaming an rbd allowed? I am getting this error:

ems@rack6-ramp-4:~$ sudo rbd rename rbdPool1 -p testPool2 rbdPoolTest1 -p testPool2
rbd: mv/rename across pools not supported
source pool: testPool2 dest pool: rbd
ems@rack6-ramp-4:~$
ems@rack6-ramp-4:~$ sudo rbd rename rbdPool1 rbdPoolTest1
rbd: rename error: (2) No such file or directory
2014-12-03 18:41:50.786397 7f73b4f75840 -1 librbd: error finding source object: (2) No such file or directory
ems@rack6-ramp-4:~$
ems@rack6-ramp-4:~$ sudo rbd ls -p testPool2
rbdPool1
ems@rack6-ramp-4:~$

Why is it taking rbd as the destination pool, though I have provided another pool as per the syntax? Syntax in rbd help:
 rbd (mv | rename) <src> <dest>   rename src image to dest
The rbd I am trying to rename is mounted and I/O is running on it.

-Thanks & regards, Mallikarjun Biradar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Best regards, Irek Fasikhov
Mob.: +79229045757
Re: [ceph-users] Issue in renaming rbd
root@backhb2:~# ceph osd pool -h | grep rename
 osd pool rename <poolname> <poolname>   rename <srcpool> to <destpool>

2014-12-03 16:23 GMT+03:00 Mallikarjun Biradar mallikarjuna.bira...@gmail.com:

Hi, I am trying to rename within the same pool:
sudo rbd rename rbdPool1 -p testPool2 rbdPoolTest1 -p testPool2
-Thanks & Regards, Mallikarjun Biradar

On Wed, Dec 3, 2014 at 6:50 PM, Irek Fasikhov malm...@gmail.com wrote:

Hi. You can only rename within the same pool. To transfer to another pool, use rbd cp or rbd export/import.

2014-12-03 16:15 GMT+03:00 Mallikarjun Biradar mallikarjuna.bira...@gmail.com:

Hi all, is renaming an rbd allowed? I am getting this error:

ems@rack6-ramp-4:~$ sudo rbd rename rbdPool1 -p testPool2 rbdPoolTest1 -p testPool2
rbd: mv/rename across pools not supported
source pool: testPool2 dest pool: rbd
ems@rack6-ramp-4:~$ sudo rbd rename rbdPool1 rbdPoolTest1
rbd: rename error: (2) No such file or directory
2014-12-03 18:41:50.786397 7f73b4f75840 -1 librbd: error finding source object: (2) No such file or directory
ems@rack6-ramp-4:~$ sudo rbd ls -p testPool2
rbdPool1

Why is it taking rbd as the destination pool, though I have provided another pool as per the syntax? Syntax in rbd help:
 rbd (mv | rename) <src> <dest>   rename src image to dest
The rbd I am trying to rename is mounted and I/O is running on it.

-Thanks & regards, Mallikarjun Biradar

--
Best regards, Irek Fasikhov
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
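Since `rbd rename` only works within one pool, moving the image from the question into another pool would look like this sketch (pool/image names taken from the thread; with the image mounted and I/O running, you would want to unmap or quiesce it first):

```
# copy within the cluster (simplest)
rbd cp testPool2/rbdPool1 rbd/rbdPoolTest1

# or stream it through export/import
rbd export testPool2/rbdPool1 - | rbd import --image-format 2 - rbd/rbdPoolTest1

# verify the copy, then remove the source
rbd ls rbd
rbd rm testPool2/rbdPool1
```

Note the `ceph osd pool rename` quoted above renames a whole *pool*, which is a different operation from moving a single rbd image.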
Re: [ceph-users] trouble starting second monitor
[celtic][DEBUG ] create the mon path if it does not exist: mkdir /var/lib/ceph/mon/

2014-12-01 4:32 GMT+03:00 K Richard Pixley r...@noir.com:

What does this mean, please? --rich

ceph@adriatic:~/my-cluster$ ceph status
 cluster 1023db58-982f-4b78-b507-481233747b13
 health HEALTH_OK
 monmap e1: 1 mons at {black=192.168.1.77:6789/0}, election epoch 2, quorum 0 black
 mdsmap e7: 1/1/1 up {0=adriatic=up:active}, 3 up:standby
 osdmap e17: 4 osds: 4 up, 4 in
 pgmap v48: 192 pgs, 3 pools, 1884 bytes data, 20 objects
 29134 MB used, 113 GB / 149 GB avail
 192 active+clean
ceph@adriatic:~/my-cluster$ ceph-deploy mon create celtic
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.20): /usr/bin/ceph-deploy mon create celtic
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts celtic
[ceph_deploy.mon][DEBUG ] detecting platform for host celtic ...
[celtic][DEBUG ] connection detected need for sudo
[celtic][DEBUG ] connected to host: celtic
[celtic][DEBUG ] detect platform information from remote host
[celtic][DEBUG ] detect machine type
[ceph_deploy.mon][INFO ] distro info: Ubuntu 14.04 trusty
[celtic][DEBUG ] determining if provided host has same hostname in remote
[celtic][DEBUG ] get remote short hostname
[celtic][DEBUG ] deploying mon to celtic
[celtic][DEBUG ] get remote short hostname
[celtic][DEBUG ] remote hostname: celtic
[celtic][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[celtic][DEBUG ] create the mon path if it does not exist
[celtic][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-celtic/done
[celtic][DEBUG ] create a done file to avoid re-doing the mon deployment
[celtic][DEBUG ] create the init path if it does not exist
[celtic][DEBUG ] locating the `service` executable...
[celtic][INFO ] Running command: sudo initctl emit ceph-mon cluster=ceph id=celtic
[celtic][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.celtic.asok mon_status
[celtic][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[celtic][WARNIN] monitor: mon.celtic, might not be running yet
[celtic][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.celtic.asok mon_status
[celtic][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[celtic][WARNIN] celtic is not defined in `mon initial members`
[celtic][WARNIN] monitor celtic does not exist in monmap
[celtic][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors
[celtic][WARNIN] monitors may not be able to form quorum
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Best regards, Irek Fasikhov
Mob.: +79229045757
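The last three warnings point at the likely fix: before running `ceph-deploy mon create celtic`, the new monitor should appear in `mon initial members` and a `public network` should be defined so the mon can pick an address. A hedged sketch of the relevant ceph.conf fragment — the 192.168.1.0/24 network is inferred from the mon address in the ceph status output above, so adjust it to the real setup — pushed out before retrying:

```ini
[global]
mon_initial_members = black, celtic
public_network = 192.168.1.0/24
```

Then redistribute the config and retry, e.g. `ceph-deploy --overwrite-conf config push celtic` followed by `ceph-deploy mon create celtic`.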
Re: [ceph-users] S3CMD and Ceph
Hi, Ben! You have a permissions problem. My configuration is fully operational.

2014-11-27 11:39 GMT+03:00 Ben b@benjackson.email:

Even with those settings it doesn't work. I still get "ERROR: Access to bucket 'BUCKET' was denied". radosgw-admin shows me as the owner of the bucket, and when I do 's3cmd ls' by itself, it lists all buckets. But when I do 's3cmd ls s3://BUCKET' it gives me a denied error.

On 27/11/14 19:32, Irek Fasikhov wrote:

This works for me:

[rbd@rbdbackup ~]$ cat .s3cfg
[default]
access_key = 2M4PRTYOGI3AXBZFAXFR
secret_key = LQYFttxRn+7bBJ5rD1Y7ckZCN8XjEInOFY3s9RUR
host_base = s3.X.ru
host_bucket = %(bucket)s.s3.X.ru
enable_multipart = True
multipart_chunk_size_mb = 30
use_https = True

2014-11-27 7:43 GMT+03:00 b b@benjackson.email:

I'm having some issues with a user in ceph using S3 Browser and s3cmd. It was previously working. I can no longer use s3cmd to list the contents of a bucket; I am getting 403 and 405 errors. When using S3 Browser, I can see the contents of the bucket and I can upload files, but I cannot create additional folders within the bucket (I get a 403 error). The bucket is owned by the user, I am using the correct keys, and I have checked the keys for escape characters, but there are no slashes in the key. I'm not sure what else I can do to get this to work.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Best regards, Irek Fasikhov
Mob.: +79229045757
Re: [ceph-users] S3CMD and Ceph
This works for me:

[rbd@rbdbackup ~]$ cat .s3cfg
[default]
access_key = 2M4PRTYOGI3AXBZFAXFR
secret_key = LQYFttxRn+7bBJ5rD1Y7ckZCN8XjEInOFY3s9RUR
host_base = s3.X.ru
host_bucket = %(bucket)s.s3.X.ru
enable_multipart = True
multipart_chunk_size_mb = 30
use_https = True

2014-11-27 7:43 GMT+03:00 b b@benjackson.email:

I'm having some issues with a user in ceph using S3 Browser and s3cmd. It was previously working. I can no longer use s3cmd to list the contents of a bucket; I am getting 403 and 405 errors. When using S3 Browser, I can see the contents of the bucket and I can upload files, but I cannot create additional folders within the bucket (I get a 403 error). The bucket is owned by the user, I am using the correct keys, and I have checked the keys for escape characters, but there are no slashes in the key. I'm not sure what else I can do to get this to work.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Best regards, Irek Fasikhov
Mob.: +79229045757
Re: [ceph-users] osds fails to start with mismatch in id
Hi, Ramakrishna. I think this shows what the problem is:

[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-56/whoami
56
[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-57/whoami
57

Tue Nov 11 2014 at 6:01:40, Ramakrishna Nishtala (rnishtal) rnish...@cisco.com:

Hi Greg, thanks for the pointer. I think you are right. The full story is like this. After installation, everything works fine until I reboot. I do observe udevadm getting triggered in the logs, but the devices do not come up after reboot. Exactly the same issue as http://tracker.ceph.com/issues/5194. But that has been fixed a while back, per the case details. As a workaround, I copied the contents of /proc/mounts to fstab, and that's where I ran into the issue. After your suggestion, I defined the mounts by UUID in fstab, but hit a similar problem. blkid.tab has now moved to tmpfs and also isn't consistent even after issuing blkid explicitly to get the UUIDs. That goes in line with the ceph-disk comments. I decided to reinstall, dd the partitions, zap the disks, etc. It did not help. Very weird that the links below change in /dev/disk/by-uuid and /dev/disk/by-partuuid etc.
*Before reboot*
lrwxrwxrwx 1 root root 10 Nov 10 06:31 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdb2

*After reboot*
lrwxrwxrwx 1 root root 10 Nov 10 09:50 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdh2

Essentially, the transformation here is sdb2 -> sdh2 and sdc2 -> sdb2. In fact I hadn't partitioned my sdh at all before the test. The only difference from the standard procedure is probably that I pre-created the partitions for the journal and data with parted. The /lib/udev/rules.d osd rules have four different partition GUID codes: 45b0969e-9b03-4f30-b4c6-5ec00ceff106, 45b0969e-9b03-4f30-b4c6-b4b80ceff106, 4fbd7e29-9d25-41b8-afd0-062c0ceff05d, 4fbd7e29-9d25-41b8-afd0-5ec00ceff05d. But all my partitions (journal and data) have ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 as the partition GUID code. Appreciate any help.

Regards, Rama

-----Original Message-----
From: Gregory Farnum [mailto:g...@gregs42.com]
Sent: Sunday, November 09, 2014 3:36 PM
To: Ramakrishna Nishtala (rnishtal)
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osds fails to start with mismatch in id

On Sun, Nov 9, 2014 at 3:21 PM, Ramakrishna Nishtala (rnishtal) rnish...@cisco.com wrote:
Hi, I am on ceph 0.87, RHEL 7. Out of 60, a few OSDs start and the rest complain about a mismatch in ids, as below.
2014-11-09 07:09:55.501177 7f4633e01880 -1 OSD id 56 != my id 53 2014-11-09 07:09:55.810048 7f636edf4880 -1 OSD id 57 != my id 54 2014-11-09 07:09:56.122957 7f459a766880 -1 OSD id 58 != my id 55 2014-11-09 07:09:56.429771 7f87f8e0c880 -1 OSD id 0 != my id 56 2014-11-09 07:09:56.741329 7fadd9b91880 -1 OSD id 2 != my id 57 Found one OSD ID in /var/lib/ceph/cluster-id/keyring. To check this out manually corrected it and turned authentication to none too, but did not help. Any clues, how it can be corrected? It sounds like maybe the symlinks to data and journal aren't matching up with where they're supposed to be. This is usually a result of using unstable /dev links that don't always match to the same physical disks. Have you checked that? -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
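Greg's diagnosis (unstable /dev links, so the wrong disk ends up under an OSD mount point) can be checked mechanically: each OSD data partition records its id in a whoami file, which should agree with the id in the mount-point name. A small sketch, assuming the standard /var/lib/ceph/osd/&lt;cluster&gt;-&lt;id&gt; layout (the helper name is mine):

```shell
# check_osd_ids BASEDIR: print a line for every OSD whose on-disk
# "whoami" id disagrees with the id in its mount-point directory name.
check_osd_ids() {
    local d dir_id disk_id
    for d in "$1"/ceph-*; do
        [ -r "$d/whoami" ] || continue
        dir_id=${d##*-}                 # id taken from the directory name
        disk_id=$(cat "$d/whoami")      # id recorded on the mounted disk
        [ "$dir_id" = "$disk_id" ] || \
            echo "MISMATCH: $d holds data for osd.$disk_id"
    done
}

# typical invocation on an OSD node:
check_osd_ids /var/lib/ceph/osd
```

Any MISMATCH line means a device was mounted under the wrong OSD directory after reboot, which is exactly the "OSD id X != my id Y" symptom above.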
Re: [ceph-users] PG inconsistency
Which version of Ceph are you running? If it is 0.80.0 - 0.80.3, see https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b Thu Nov 06 2014 at 16:24:21, GuangYang yguan...@outlook.com: Hello Cephers, Recently we observed a couple of inconsistencies in our Ceph cluster; there were two major patterns leading to inconsistency as I observed: 1) EIO when reading a file, 2) the digest is inconsistent (for EC) even though there is no read error. While Ceph has built-in tool sets to repair the inconsistencies, I would also like to check with the community what the best ways are to handle such issues (e.g. should we run fsck / xfs_repair when such an issue happens). In more detail, I have the following questions: 1. When an inconsistency is detected, what is the chance that there is a hardware issue which needs to be repaired physically, and should I run some disk/filesystem tools to check further? 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we rely solely on Ceph's repair tool sets? It would be great to hear your experience and suggestions. BTW, we are using XFS in the cluster. Thanks, Guang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PG inconsistency
Thu Nov 06 2014 at 16:44:09, GuangYang yguan...@outlook.com: Thanks Dan. By killed/formatted/replaced the OSD, did you replace the disk? Not an filesystem expert here, but would like to understand the underlying what happened behind the EIO and does that reveal something (e.g. hardware issue). In our case, we are using 6TB drive so that there are lot of data to migrate and as backfilling/recovering bring latency increasing, we hope to avoid that as much as we can.. For example, use the following parameters: osd_recovery_delay_start = 10 osd recovery op priority = 2 osd max backfills = 1 osd recovery max active =1 osd recovery threads = 1 Thanks, Guang From: daniel.vanders...@cern.ch Date: Thu, 6 Nov 2014 13:36:46 + Subject: Re: PG inconsistency To: yguan...@outlook.com; ceph-users@lists.ceph.com Hi, I've only ever seen (1), EIO to read a file. In this case I've always just killed / formatted / replaced that OSD completely -- that moves the PG to a new master and the new replication fixes the inconsistency. This way, I've never had to pg repair. I don't know if this is a best or even good practise, but it works for us. Cheers, Dan On Thu Nov 06 2014 at 2:24:32 PM GuangYang yguan...@outlook.commailto:yguan...@outlook.com wrote: Hello Cephers, Recently we observed a couple of inconsistencies in our Ceph cluster, there were two major patterns leading to inconsistency as I observed: 1) EIO to read the file, 2) the digest is inconsistent (for EC) even there is no read error). While ceph has built-in tool sets to repair the inconsistencies, I also would like to check with the community in terms of what is the best ways to handle such issues (e.g. should we run fsck / xfs_repair when such issue happens). In more details, I have the following questions: 1. When there is inconsistency detected, what is the chance there is some hardware issues which need to be repaired physically, or should I run some disk/filesystem tools to further check? 2. 
Should we use fsck / xfs_repair to fix the inconsistencies, or should we solely relay on Ceph's repair tool sets? It would be great to hear you experience and suggestions. BTW, we are using XFS in the cluster. Thanks, Guang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
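On the repair side, the usual workflow in this era of Ceph is to locate the inconsistent PGs first and only then decide between ceph pg repair and filesystem-level tools. A hedged sketch (the helper name is mine; pg repair should only be run after SMART/kernel logs have ruled out a failing disk, so a hardware fault is not papered over):

```shell
# list_inconsistent: read "ceph health detail" output on stdin and print
# the ids of PGs flagged inconsistent, one per line.
list_inconsistent() {
    awk '$1 == "pg" && /inconsistent/ { print $2 }'
}

# on a live cluster (not run here):
# ceph health detail | list_inconsistent | while read -r pgid; do
#     ceph pg repair "$pgid"
# done
```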
Re: [ceph-users] Typical 10GbE latency
Hi, Udo. Good values :) Did you apply any additional tuning on the hosts? Thanks. Thu Nov 06 2014 at 16:57:36, Udo Lembke ulem...@polarzone.de: Hi, from one host to five OSD-hosts. NIC Intel 82599EB; jumbo-frames; single Switch IBM G8124 (blade network). rtt min/avg/max/mdev = 0.075/0.114/0.231/0.037 ms rtt min/avg/max/mdev = 0.088/0.164/0.739/0.072 ms rtt min/avg/max/mdev = 0.081/0.141/0.229/0.030 ms rtt min/avg/max/mdev = 0.083/0.115/0.183/0.030 ms rtt min/avg/max/mdev = 0.087/0.144/0.190/0.028 ms Udo Am 06.11.2014 14:18, schrieb Wido den Hollander: Hello, While working at a customer I ran into 10GbE latency which seems high to me. I have access to a couple of Ceph clusters and I ran a simple ping test: $ ping -s 8192 -c 100 -n ip Two results I got: rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms Both these environments are running with Intel 82599ES 10Gbit cards in LACP. One with Extreme Networks switches, the other with Arista. Now, on an environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing: rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms As you can see, the Cisco Nexus network has high latency compared to the other setups. You would say the switches are to blame, but a direct TwinAx connection didn't help either. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering: could others with a Ceph cluster running on 10GbE perform a simple network latency test like this? I'd like to compare the results. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
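The summary line Wido asks for can be collected across hosts with a small wrapper; the only non-obvious part is the parse (field 5 of the '/'-separated rtt line is the average). The host names in the loop are placeholders:

```shell
# parse_avg_rtt: print the average round-trip time (ms) from the
# "rtt min/avg/max/mdev = ..." summary line of ping, read on stdin.
parse_avg_rtt() {
    awk -F/ '/rtt min/ { print $5 }'
}

# hypothetical invocation against a few OSD hosts (not run here):
# for h in osd1 osd2 osd3; do
#     echo "$h: $(ping -c 100 -s 8192 -n "$h" | parse_avg_rtt) ms avg"
# done
```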
Re: [ceph-users] Full backup/restore of Ceph cluster?
Hi. I modified the script and added a multithreaded archiver to it. See: http://www.theirek.com/blog/2014/10/26/primier-biekapa-rbd-ustroistva 2014-11-05 14:03 GMT+03:00 Alexandre DERUMIER aderum...@odiso.com: What if I just wanted to back up a running cluster without having another cluster to replicate to Yes, import is optional, you can simply export and pipe to tar rbd export-diff --from-snap snap1 pool/image@snap2 - | tar - Mail original - De: Christopher Armstrong ch...@opdemand.com À: Alexandre DERUMIER aderum...@odiso.com Cc: ceph-users@lists.ceph.com Envoyé: Mercredi 5 Novembre 2014 10:08:49 Objet: Re: [ceph-users] Full backup/restore of Ceph cluster? Hi Alexandre, Thanks for the link! Unless I'm misunderstanding, this is to replicate an RBD volume from one cluster to another, right? I.e. I'd ideally like a tarball of raw files that I could extract on a new host, start the Ceph daemons, and get up and running. Chris Armstrong Head of Services OpDemand / Deis.io GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/ On Wed, Nov 5, 2014 at 1:04 AM, Alexandre DERUMIER aderum...@odiso.com wrote: Is RBD snapshotting what I'm looking for? Is this even possible? Yes, you can use rbd snapshotting, export / import http://ceph.com/dev-notes/incremental-snapshots-with-rbd/ But you need to do it for each rbd volume. Here is a script to do it: http://www.rapide.nl/blog/item/ceph_-_rbd_replication (AFAIK it's not possible to do it at pool level) - Mail original - De: Christopher Armstrong ch...@opdemand.com À: ceph-users@lists.ceph.com Envoyé: Mercredi 5 Novembre 2014 08:52:31 Objet: [ceph-users] Full backup/restore of Ceph cluster? Hi folks, I was wondering if anyone has a solution for performing a complete backup and restore of a Ceph cluster. A Google search came up with some articles/blog posts, some of which are old, and I don't really have a great idea of the feasibility of this.
Here's what I've found: http://ceph.com/community/blog/tag/backup/ http://ceph.com/docs/giant/rbd/rbd-snapshot/ http://t3491.file-systems-ceph-user.file-systemstalk.us/backups-t3491.html Is RBD snapshotting what I'm looking for? Is this even possible? Any info is much appreciated! Thanks, Chris Chris Armstrong Head of Services OpDemand / Deis.io GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
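The export-diff approach described in this thread can be sketched as a daily cycle: one full export, then ship only the delta between consecutive snapshots. Image, pool, and backup paths below are hypothetical; the parsing helper is mine:

```shell
# latest_snap: from "rbd snap ls <image>" output on stdin, print the
# NAME column of the last listed snapshot (the most recent one).
latest_snap() {
    awk 'NR > 1 { name = $2 } END { if (name != "") print name }'
}

# hypothetical daily cycle for an image rbd/vm1 (not run here):
# prev=$(rbd snap ls rbd/vm1 | latest_snap)
# today=$(date +%Y%m%d)
# rbd snap create rbd/vm1@"$today"
# rbd export-diff --from-snap "$prev" rbd/vm1@"$today" - \
#     | gzip > /backup/vm1-"$today".diff.gz
```

Restoring replays the full export plus each diff in order with rbd import and rbd import-diff; as Alexandre notes, this is per-image, not per-pool.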
Re: [ceph-users] where to download 0.87 debs?
http://ceph.com/debian-giant/ :) 2014-10-30 12:45 GMT+03:00 Jon Kåre Hellan jon.kare.hel...@uninett.no: Will there be debs? On 30/10/14 10:37, Irek Fasikhov wrote: Hi. Use http://ceph.com/rpm-giant/ 2014-10-30 12:34 GMT+03:00 Kenneth Waegeman kenneth.waege...@ugent.be: Hi, Will http://ceph.com/rpm/ also be updated to have the giant packages? Thanks Kenneth - Message from Patrick McGarry patr...@inktank.com - Date: Wed, 29 Oct 2014 22:13:50 -0400 From: Patrick McGarry patr...@inktank.com Subject: Re: [ceph-users] where to download 0.87 RPMS? To: 廖建锋 de...@f-club.cn Cc: ceph-users ceph-users@lists.ceph.com I have updated the http://ceph.com/get page to reflect a more generic approach to linking. It's also worth noting that the new http://download.ceph.com/ infrastructure is available now. To get to the rpms specifically you can either crawl the download.ceph.com tree or use the symlink at http://ceph.com/rpm-giant/ Hope that (and the updated linkage on ceph.com/get) helps. Thanks! Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph On Wed, Oct 29, 2014 at 9:15 PM, 廖建锋 de...@f-club.cn wrote: ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com - End message from Patrick McGarry patr...@inktank.com - -- Met vriendelijke groeten, Kenneth Waegeman ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 ___ ceph-users mailing listceph-us...@lists.ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 ___ ceph-users mailing list 
ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] When will Ceph 0.72.3 be released?
Dear developers, we would very much like IO priorities ;) Slow requests appear during snap rollback. Thanks -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Use 2 osds to create cluster but health check display active+degraded
Hi. By default CRUSH places each replica on a different host, and the default replication count is 3, so with only two OSDs that placement cannot be satisfied. 2014-10-29 10:56 GMT+03:00 Vickie CH mika.leaf...@gmail.com: Hi all, I tried to use two OSDs to create a cluster. After the deploy finished, I found the health status is 88 active+degraded 104 active+remapped. Before using 2 osds to create a cluster the result was ok. I'm confused about why this situation happened. Do I need to change the crush map to fix this problem? --ceph.conf- [global] fsid = c404ded6-4086-4f0b-b479-89bc018af954 mon_initial_members = storage0 mon_host = 192.168.1.10 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true osd_pool_default_size = 2 osd_pool_default_min_size = 1 osd_pool_default_pg_num = 128 osd_journal_size = 2048 osd_pool_default_pgp_num = 128 osd_mkfs_type = xfs - ---ceph -s--- cluster c404ded6-4086-4f0b-b479-89bc018af954 health HEALTH_WARN 88 pgs degraded; 192 pgs stuck unclean monmap e1: 1 mons at {storage0=192.168.10.10:6789/0}, election epoch 2, quorum 0 storage0 osdmap e20: 2 osds: 2 up, 2 in pgmap v45: 192 pgs, 3 pools, 0 bytes data, 0 objects 79752 kB used, 1858 GB / 1858 GB avail 88 active+degraded 104 active+remapped Best wishes, Mika ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Use 2 osds to create cluster but health check display active+degraded
Hi. This parameter is not applied retroactively to pools that already exist, only to newly created ones. Run ceph osd dump | grep pool and check: what does size= show? 2014-10-29 11:40 GMT+03:00 Vickie CH mika.leaf...@gmail.com: Dear Irek: Thanks for your reply. Even with osd_pool_default_size = 2 already set, does the cluster still need 3 different hosts? Can this default be changed by the user and written into ceph.conf before deploying? Best wishes, Mika 2014-10-29 16:29 GMT+08:00 Irek Fasikhov malm...@gmail.com: Hi. By default CRUSH places each replica on a different host, and the default replication count is 3. 2014-10-29 10:56 GMT+03:00 Vickie CH mika.leaf...@gmail.com: Hi all, Try to use two OSDs to create a cluster. After the deply finished, I found the health status is 88 active+degraded 104 active+remapped. Before use 2 osds to create cluster the result is ok. I'm confuse why this situation happened. Do I need to set crush map to fix this problem? --ceph.conf- [global] fsid = c404ded6-4086-4f0b-b479-89bc018af954 mon_initial_members = storage0 mon_host = 192.168.1.10 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true osd_pool_default_size = 2 osd_pool_default_min_size = 1 osd_pool_default_pg_num = 128 osd_journal_size = 2048 osd_pool_default_pgp_num = 128 osd_mkfs_type = xfs - ---ceph -s--- cluster c404ded6-4086-4f0b-b479-89bc018af954 health HEALTH_WARN 88 pgs degraded; 192 pgs stuck unclean monmap e1: 1 mons at {storage0=192.168.10.10:6789/0}, election epoch 2, quorum 0 storage0 osdmap e20: 2 osds: 2 up, 2 in pgmap v45: 192 pgs, 3 pools, 0 bytes data, 0 objects 79752 kB used, 1858 GB / 1858 GB avail 88 active+degraded 104 active+remapped Best wishes, Mika ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Use 2 osds to create cluster but health check display active+degraded
Mark. I meant that the existing pools, this parameter is not used. I'm sure he pools DATA, METADATA, RDB(They are created by default) have size = 3. 2014-10-29 11:56 GMT+03:00 Mark Kirkwood mark.kirkw...@catalyst.net.nz: That is not my experience: $ ceph -v ceph version 0.86-579-g06a73c3 (06a73c39169f2f332dec760f56d3ec20455b1646) $ cat /etc/ceph/ceph.conf [global] ... osd pool default size = 2 $ ceph osd dump|grep size pool 2 'hot' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 47 flags hashpspool,incomplete_clones tier_of 1 cache_mode writeback target_bytes 20 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0 pool 10 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 102 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 11 '.rgw.control' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 104 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 12 '.rgw' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 106 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 13 '.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 107 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 14 '.users.uid' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 108 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 15 '.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 110 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 16 '.rgw.buckets' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 112 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 17 
'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 186 flags hashpspool stripe_width 0 On 29/10/14 21:46, Irek Fasikhov wrote: Hi. This parameter does not apply to pools by default. ceph osd dump | grep pool. see size=? 2014-10-29 11:40 GMT+03:00 Vickie CH mika.leaf...@gmail.com mailto:mika.leaf...@gmail.com: Der Irek: Thanks for your reply. Even already set osd_pool_default_size = 2 the cluster still need 3 different hosts right? Is this default number can be changed by user and write into ceph.conf before deploy? Best wishes, Mika 2014-10-29 16:29 GMT+08:00 Irek Fasikhov malm...@gmail.com mailto:malm...@gmail.com: Hi. Because the disc requires three different hosts, the default number of replications 3. 2014-10-29 10:56 GMT+03:00 Vickie CH mika.leaf...@gmail.com mailto:mika.leaf...@gmail.com: Hi all, Try to use two OSDs to create a cluster. After the deply finished, I found the health status is 88 active+degraded 104 active+remapped. Before use 2 osds to create cluster the result is ok. I'm confuse why this situation happened. Do I need to set crush map to fix this problem? --ceph.conf- [global] fsid = c404ded6-4086-4f0b-b479-89bc018af954 mon_initial_members = storage0 mon_host = 192.168.1.10 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true osd_pool_default_size = 2 osd_pool_default_min_size = 1 osd_pool_default_pg_num = 128 osd_journal_size = 2048 osd_pool_default_pgp_num = 128 osd_mkfs_type = xfs - ---ceph -s--- cluster c404ded6-4086-4f0b-b479-89bc018af954 health HEALTH_WARN 88 pgs degraded; 192 pgs stuck unclean monmap e1: 1 mons at {storage0=192.168.10.10:6789/0 http://192.168.10.10:6789/0}, election epoch 2, quorum 0 storage0 osdmap e20: 2 osds: 2 up, 2 in pgmap v45: 192 pgs, 3 pools, 0 bytes data, 0 objects 79752 kB used, 1858 GB / 1858 GB avail 88 active+degraded 104 active+remapped Best wishes, Mika ___ ceph-users mailing list
Re: [ceph-users] Use 2 osds to create cluster but health check display active+degraded
ceph osd tree please :) 2014-10-29 12:03 GMT+03:00 Vickie CH mika.leaf...@gmail.com: Dear all, Thanks for the reply. Pool replicated size is 2. Because the replicated size parameter already write into ceph.conf before deploy. Because not familiar crush map. I will according Mark's information to do a test that change the crush map to see the result. ---ceph.conf-- [global] fsid = c404ded6-4086-4f0b-b479- 89bc018af954 mon_initial_members = storage0 mon_host = 192.168.1.10 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true *osd_pool_default_size = 2osd_pool_default_min_size = 1* osd_pool_default_pg_num = 128 osd_journal_size = 2048 osd_pool_default_pgp_num = 128 osd_mkfs_type = xfs --- --ceph osd dump result - pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 14 flags hashpspool crash_replay_interval 45 stripe_width 0 pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 15 flags hashpspool stripe_width 0 pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 16 flags hashpspool stripe_width 0 max_osd 2 -- Best wishes, Mika Best wishes, Mika 2014-10-29 16:56 GMT+08:00 Mark Kirkwood mark.kirkw...@catalyst.net.nz: That is not my experience: $ ceph -v ceph version 0.86-579-g06a73c3 (06a73c39169f2f332dec760f56d3ec20455b1646) $ cat /etc/ceph/ceph.conf [global] ... 
osd pool default size = 2 $ ceph osd dump|grep size pool 2 'hot' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 47 flags hashpspool,incomplete_clones tier_of 1 cache_mode writeback target_bytes 20 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0 pool 10 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 102 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 11 '.rgw.control' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 104 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 12 '.rgw' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 106 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 13 '.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 107 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 14 '.users.uid' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 108 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 15 '.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 110 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 16 '.rgw.buckets' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 112 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 17 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 186 flags hashpspool stripe_width 0 On 29/10/14 21:46, Irek Fasikhov wrote: Hi. This parameter does not apply to pools by default. ceph osd dump | grep pool. see size=? 
2014-10-29 11:40 GMT+03:00 Vickie CH mika.leaf...@gmail.com mailto:mika.leaf...@gmail.com: Der Irek: Thanks for your reply. Even already set osd_pool_default_size = 2 the cluster still need 3 different hosts right? Is this default number can be changed by user and write into ceph.conf before deploy? Best wishes, Mika 2014-10-29 16:29 GMT+08:00 Irek Fasikhov malm...@gmail.com mailto:malm...@gmail.com: Hi. Because the disc requires three different hosts, the default number of replications 3. 2014-10-29 10:56 GMT+03:00 Vickie CH mika.leaf...@gmail.com mailto:mika.leaf...@gmail.com: Hi all, Try to use two OSDs to create a cluster. After the deply finished, I found the health status is 88 active+degraded 104 active+remapped. Before use 2 osds to create cluster the result is ok. I'm confuse why this situation happened. Do I need to set crush map to fix this problem? --ceph.conf- [global] fsid = c404ded6-4086-4f0b-b479-89bc018af954 mon_initial_members
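Two separate knobs are in play in this thread: the per-pool size, which osd_pool_default_size only seeds at pool-creation time, and CRUSH's placement rule, which by default spreads replicas across hosts, so two OSDs on a single host can stay degraded even at size 2 (setting osd crush chooseleaf type = 0 in ceph.conf before deploying relaxes placement to per-OSD). A small parser to audit the first knob (the helper name is mine):

```shell
# pool_sizes: from "ceph osd dump" output on stdin, print each pool's
# name and replica count, to see which existing pools did NOT pick up
# the osd_pool_default_size setting. "\047" is an octal escape for the
# single quotes that ceph prints around pool names.
pool_sizes() {
    awk '$1 == "pool" { gsub("\047", "", $3); print $3, $6 }'
}

# on a live cluster:
# ceph osd dump | pool_sizes
# ceph osd pool set rbd size 2   # fix a pool created before the default changed
```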
Re: [ceph-users] Scrub process, IO performance
No. These options appeared in 0.80.6, but there is a bug in them which is fixed in 0.80.8. See: http://tracker.ceph.com/issues/9677 2014-10-28 14:50 GMT+03:00 Mateusz Skała mateusz.sk...@budikom.net: Thanks for the reply; we are now using ceph 0.80.1 firefly, are these options available? *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Mateusz Skała *Sent:* Tuesday, October 28, 2014 9:27 AM *To:* ceph-us...@ceph.com *Subject:* [ceph-users] Scrub process, IO performance Hello, We are using Ceph as a storage backend for KVM, used for hosting MS Windows RDP, Linux for web applications with MySQL database and file sharing from Linux. When a scrub or deep-scrub process is active, RDP sessions freeze for a few seconds and web applications have high response latency. Now we have disabled scrubbing and deep-scrubbing between 6AM - 10PM, so it only runs when the majority of users don't work, but the user experience is still poor, as I wrote above. We are considering disabling the scrubbing process entirely. Is the new version 0.87, which addresses scrubbing priority, going to solve our problem (according to http://tracker.ceph.com/issues/6278)? Can we switch off scrubbing entirely? How can we change our configuration to lower the performance impact of scrubbing? Can changing the block size lower the scrubbing impact or increase performance? Our Ceph cluster configuration: * we are using ~216 RBD disks for KVM VM's * ~11TB used, 3.593TB data, replica count 3 * we have 5 mons, 32 OSD * 3 pools/ 4096pgs (only one - RBD in use) * 6 nodes (5osd+mon, 1 osd only) in two racks * 1 SATA disk for system, 1 SSD disk for journal and 4 or 6 SATA disk for OSD * 2 networks on 2 NIC 1Gbps (cluster + public) on all nodes.
* 2x 10GBps links between racks * without scrub max 45 iops * when scrub running 120 - 180 iops ceph.conf mon initial members = ceph35, ceph30, ceph20, ceph15, ceph10 mon host = 10.20.8.35, 10.20.8.30, 10.20.8.20, 10.20.8.15, 10.20.8.10 public network = 10.20.8.0/22 cluster network = 10.20.4.0/22 filestore xattr use omap = true filestore max sync interval = 15 osd journal size = 10240 osd pool default size = 3 osd pool default min size = 1 osd pool default pg num = 2048 osd pool default pgp num = 2048 osd crush chooseleaf type = 1 osd recovery max active = 1 osd recovery op priority = 1 osd max backfills = 1 auth cluster required = cephx auth service required = cephx auth client required = cephx rbd default format = 2 Regards, Mateusz ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
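While diagnosing, scrubbing can also be stopped cluster-wide with flags, and — once on >= 0.80.6, per Irek's note above — the OSD disk threads can be dropped to the idle IO class. The latter only takes effect when the data disks use the CFQ elevator, which is worth checking first (the helper name is mine; device name and commands below are a sketch, not run here):

```shell
# uses_cfq: succeed when cfq is the selected elevator in the scheduler
# string read from stdin, e.g. "noop deadline [cfq]".
uses_cfq() { grep -q '\[cfq\]'; }

# stop scrubbing cluster-wide while debugging (flags persist until unset):
# ceph osd set noscrub
# ceph osd set nodeep-scrub
# and, per OSD node, push the disk threads to the idle class if CFQ is active:
# cat /sys/class/block/sdb/queue/scheduler | uses_cfq && \
#     ceph tell 'osd.*' injectargs \
#         '--osd_disk_thread_ioprio_class idle --osd_disk_thread_ioprio_priority 7'
```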
Re: [ceph-users] osd_disk_thread_ioprio_class/_priority ignored?
Hi. Already have the necessary changes in git. https://github.com/ceph/ceph/commit/86926c6089d63014dd770b4bb61fc7aca3998542 2014-10-23 16:42 GMT+04:00 Paweł Sadowski c...@sadziu.pl: On 10/23/2014 09:10 AM, Paweł Sadowski wrote: Hi, I was trying to determine performance impact of deep-scrubbing with osd_disk_thread_ioprio_class option set but it looks like it's ignored. Performance (during deep-scrub) is the same with this options set or left with defaults (1/3 of normal performance). # ceph --admin-daemon /var/run/ceph/ceph-osd.26.asok config show | grep osd_disk_thread_ioprio osd_disk_thread_ioprio_class: idle, osd_disk_thread_ioprio_priority: 7, # ps -efL | grep 'ce[p]h-osd --cluster=ceph -i 26' | awk '{ print $4; }' | xargs --no-run-if-empty ionice -p | sort | uniq -c 18 unknown: prio 0 186 unknown: prio 4 # cat /sys/class/block/sdf/queue/scheduler noop deadline [cfq] And finallyGDB: Breakpoint 1, ceph_ioprio_string_to_class (s=...) at common/io_priority.cc:48 warning: Source file is more recent than executable. 48return IOPRIO_CLASS_IDLE; (gdb) cont Continuing. Breakpoint 2, OSD::set_disk_tp_priority (this=0x3398000) at osd/OSD.cc:8548 warning: Source file is more recent than executable. 8548 disk_tp.set_ioprio(cls, cct-_conf-osd_disk_thread_ioprio_priority); (gdb) print cls $1 = -22 So the IO priorities are *NOT*set (cls = 0). I'm not sure where this -22 came from.Any ideas? In the mean time I'll compile ceph from sources and check again. 
Ceph installed from Ceph repositories: # ceph-osd -v ceph version 0.86 (97dcc0539dfa7dac3de74852305d51580b7b1f82) # apt-cache policy ceph ceph: Installed: 0.86-1precise Candidate: 0.86-1precise Version table: *** 0.86-1precise 0 500 http://eu.ceph.com/debian-giant/ precise/main amd64 Packages 100 /var/lib/dpkg/status Following patch corrects problem: diff --git a/src/common/io_priority.cc b/src/common/io_priority index b9eeae8..4cd299a 100644 --- a/src/common/io_priority.cc +++ b/src/common/io_priority.cc @@ -41,7 +41,7 @@ int ceph_ioprio_set(int whence, int who, int int ceph_ioprio_string_to_class(const std::string s) { - std::string l; + std::string l(s); std::transform(s.begin(), s.end(), l.begin(), ::tolower); if (l == idle) # ps -efL | grep 'ce[p]h-osd --cluster=ceph -i 26' | awk '{ print $4; }' | xargs --no-run-if-empty ionice -p | sort | uniq -c 1 idle 4 unknown: prio 0 183 unknown: prio 4 Change to *best effort* (ceph tell osd.26 injectargs '--osd_disk_thread_ioprio_class be') # ps -efL | grep 'ce[p]h-osd --cluster=ceph -i 26' | awk '{ print $4; }' | xargs --no-run-if-empty ionice -p | sort | uniq -c 1 best-effort: prio 7 4 unknown: prio 0 183 unknown: prio 4 -- PS ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
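Paweł's ps/ionice pipeline for verifying that the injected class actually took effect can be kept around as a helper (the function name is mine; it tallies the IO class of every thread of the given process):

```shell
# ioprio_summary PID: count the IO scheduling classes across all threads
# of a process; after a successful injectargs one thread should report
# "idle" (or "best-effort") instead of the default class.
ioprio_summary() {
    ps -L -o lwp= -p "$1" | xargs -r ionice -p | sort | uniq -c
}

# e.g. on an OSD node (not run here):
# ioprio_summary "$(pidof ceph-osd | awk '{ print $1 }')"
```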
Re: [ceph-users] Why performance of benchmarks with small blocks is extremely small?
Timur, read this thread: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg12486.html 2014-10-01 16:24 GMT+04:00 Andrei Mikhailovsky and...@arhont.com: Timur, As far as I know, the latest master has a number of improvements for ssd disks. If you check the mailing list discussion from a couple of weeks back, you can see that the latest stable firefly is not that well optimised for ssd drives and IO is limited. However, changes are being made to address that. I am well surprised that you can get 10K IOPS, as in my tests I was not getting over 3K IOPS on ssd disks which are capable of doing 90K IOPS. P.S. Does anyone know if the ssd optimisation code will be added to the next maintenance release of firefly? Andrei -- *From: *Timur Nurlygayanov tnurlygaya...@mirantis.com *To: *Christian Balzer ch...@gol.com *Cc: *ceph-us...@ceph.com *Sent: *Wednesday, 1 October, 2014 1:11:25 PM *Subject: *Re: [ceph-users] Why performance of benchmarks with small blocks is extremely small? Hello Christian, Thank you for your detailed answer! I have another pre-production environment with 4 Ceph servers and 4 SSD disks per Ceph server (each Ceph OSD node on a separate SSD disk). Should I move the journals to other disks, or is that not required in my case?
[root@ceph-node ~]# mount | grep ceph /dev/sdb4 on /var/lib/ceph/osd/ceph-0 type xfs (rw,noexec,nodev,noatime,nodiratime,inode64,logbsize=256k,delaylog,user_xattr,data=writeback) /dev/sde4 on /var/lib/ceph/osd/ceph-5 type xfs (rw,noexec,nodev,noatime,nodiratime,inode64,logbsize=256k,delaylog,user_xattr,data=writeback) /dev/sdd4 on /var/lib/ceph/osd/ceph-2 type xfs (rw,noexec,nodev,noatime,nodiratime,inode64,logbsize=256k,delaylog,user_xattr,data=writeback) /dev/sdc4 on /var/lib/ceph/osd/ceph-1 type xfs (rw,noexec,nodev,noatime,nodiratime,inode64,logbsize=256k,delaylog,user_xattr,data=writeback) [root@ceph-node ~]# find /var/lib/ceph/osd/ | grep journal /var/lib/ceph/osd/ceph-0/journal /var/lib/ceph/osd/ceph-5/journal /var/lib/ceph/osd/ceph-1/journal /var/lib/ceph/osd/ceph-2/journal My SSD disks have ~ 40k IOPS per disk, but on the VM I can see only ~ 10k - 14k IOPS for disks operations. To check this I execute the following command on VM with root partition mounted on disk in Ceph storage: root@test-io:/home/ubuntu# rm -rf /tmp/test spew -d --write -r -b 4096 10M /tmp/test WTR:56506.22 KiB/s Transfer time: 00:00:00IOPS:14126.55 Is it expected result or I can improve the performance and get at least 30k-40k IOPS on the VM disks? (I have 2x 10Gb/s networks interfaces in LACP bonding for storage network, looks like network can't be the bottleneck). Thank you! On Wed, Oct 1, 2014 at 6:50 AM, Christian Balzer ch...@gol.com wrote: Hello, [reduced to ceph-users] On Sat, 27 Sep 2014 19:17:22 +0400 Timur Nurlygayanov wrote: Hello all, I installed OpenStack with Glance + Ceph OSD with replication factor 2 and now I can see the write operations are extremly slow. For example, I can see only 0.04 MB/s write speed when I run rados bench with 512b blocks: rados bench -p test 60 write --no-cleanup -t 1 -b 512 There are 2 things wrong with that this test: 1. You're using rados bench, when in fact you should be testing from within VMs. 
For starters a VM could make use of the rbd cache you enabled, rados bench won't. 2. Given the parameters of this test you're testing network latency more than anything else. If you monitor the Ceph nodes (atop is a good tool for that), you will probably see that neither CPU nor disks resources are being exhausted. With a single thread rados puts that tiny block of 512 bytes on the wire, the primary OSD for the PG has to write this to the journal (on your slow, non-SSD disks) and send it to the secondary OSD, which has to ACK the write to its journal back to the primary one, which in turn then ACKs it to the client (rados bench) and then rados bench can send the next packet. You get the drift. Using your parameters I can get 0.17MB/s on a pre-production cluster that uses 4xQDR Infiniband (IPoIB) connections, on my shitty test cluster with 1GB/s links I get similar results to you, unsurprisingly. Ceph excels only with lots of parallelism, so an individual thread might be slow (and in your case HAS to be slow, which has nothing to do with Ceph per se) but many parallel ones will utilize the resources available. Having data blocks that are adequately sized (4MB, the default rados size) will help for bandwidth and the rbd cache inside a properly configured VM should make that happen. Of course in most real life scenarios you will run out of IOPS long before you run out of bandwidth. Maintaining 1 concurrent writes of 512 bytes for up to 60 seconds or 0 objects Object prefix: benchmark_data_node-17.domain.tld_15862 sec Cur ops started finished
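Christian's point about latency-bound single-threaded throughput can be sanity-checked with simple arithmetic: with one outstanding 512-byte write, throughput is just block size divided by per-operation round-trip time. A minimal sketch (the latency figures are assumptions chosen to reproduce the ~0.17 MB/s and ~0.04 MB/s numbers quoted in this thread, not measured values):

```python
# Single-threaded sequential writes: each op must fully complete (journal
# write on the primary OSD plus the replication ACK from the secondary)
# before the next op starts, so throughput = block_size / round_trip_time.

def single_thread_throughput(block_size_bytes, rtt_seconds):
    """Return (IOPS, MB/s) for one outstanding op of the given size."""
    iops = 1.0 / rtt_seconds
    mbps = block_size_bytes * iops / 1_000_000
    return iops, mbps

if __name__ == "__main__":
    # Assumed ~3 ms per op (fast IPoIB network + journal writes):
    iops, mbps = single_thread_throughput(512, 0.003)
    print(f"{iops:.0f} IOPS, {mbps:.2f} MB/s")   # ~333 IOPS, ~0.17 MB/s
    # Assumed ~13 ms per op (slower network, spinning journals):
    iops, mbps = single_thread_throughput(512, 0.0128)
    print(f"{iops:.0f} IOPS, {mbps:.2f} MB/s")   # ~78 IOPS, ~0.04 MB/s
```

This is why parallelism (and 4 MB objects) matters so much here: with dozens of operations in flight the per-op round trip is amortized instead of serialized.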
[ceph-users] rbd export -> nc -> rbd import = memory leak
Hi, All. I see a memory leak when importing a raw device. Export scheme: [rbd@rbdbackup ~]$ rbd --no-progress -n client.rbdbackup -k /etc/ceph/big.keyring -c /etc/ceph/big.conf export rbdtest/vm-111-disk-1 - | nc 10.43.255.252 12345 [root@ct2 ~]# nc -l 12345 | rbd import --no-progress --image-format 2 - rbd/vm-111-disk-1 The same problem occurs with ssh. Memory usage, see the screenshots: https://drive.google.com/folderview?id=0BxoNLVWxzOJWSHlTSEZvM3lkQXM&usp=sharing -- Best regards, Irek Fasikhov Mob.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rbd export -> nc -> rbd import = memory leak
OS: CentOS 6.5 Kernel: 2.6.32-431.el6.x86_64 Ceph --version: ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60) 2014-09-26 15:44 GMT+04:00 Irek Fasikhov malm...@gmail.com: Hi, All. I see a memory leak when importing a raw device. Export scheme: [rbd@rbdbackup ~]$ rbd --no-progress -n client.rbdbackup -k /etc/ceph/big.keyring -c /etc/ceph/big.conf export rbdtest/vm-111-disk-1 - | nc 10.43.255.252 12345 [root@ct2 ~]# nc -l 12345 | rbd import --no-progress --image-format 2 - rbd/vm-111-disk-1 The same problem occurs with ssh. Memory usage, see the screenshots: https://drive.google.com/folderview?id=0BxoNLVWxzOJWSHlTSEZvM3lkQXM&usp=sharing -- Best regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] [PG] Slow request *** seconds old, v4 currently waiting for pg to exist locally
osd_op(client.4625.1:9005787) ... This is due to external factors, for example the network settings. 2014-09-25 10:05 GMT+04:00 Udo Lembke ulem...@polarzone.de: Hi again, sorry - forgot my post... see osdmap e421: 9 osds: 9 up, 9 in shows that all your 9 osds are up! Do you have trouble with your journal/filesystem? Udo On 25.09.2014 08:01, Udo Lembke wrote: Hi, it looks like some osds are down?! What is the output of ceph osd tree? Udo On 25.09.2014 04:29, Aegeaner wrote: The cluster health state is WARN: health HEALTH_WARN 118 pgs degraded; 8 pgs down; 59 pgs incomplete; 28 pgs peering; 292 pgs stale; 87 pgs stuck inactive; 292 pgs stuck stale; 205 pgs stuck unclean; 22 requests are blocked 32 sec; recovery 12474/46357 objects degraded (26.909%) monmap e3: 3 mons at {CVM-0-mon01= 172.18.117.146:6789/0,CVM-0-mon02=172.18.117.152:6789/0,CVM-0-mon03=172.18.117.153:6789/0 }, election epoch 24, quorum 0,1,2 CVM-0-mon01,CVM-0-mon02,CVM-0-mon03 osdmap e421: 9 osds: 9 up, 9 in pgmap v2261: 292 pgs, 4 pools, 91532 MB data, 23178 objects 330 MB used, 3363 GB / 3363 GB avail 12474/46357 objects degraded (26.909%) 20 stale+peering 87 stale+active+clean 8 stale+down+peering 59 stale+incomplete 118 stale+active+degraded What do these errors mean? Can these PGs be recovered? -- Best regards, Irek Fasikhov Mob.: +79229045757
[ceph-users] Rebalancing slow I/O.
Hi, All. DELL R720 x8, 96 OSDs, network 2x10Gbit LACP. When one of the nodes crashes, I get very slow I/O operations on the virtual machines. The cluster map is the default.
[ceph@ceph08 ~]$ ceph osd tree
# id  weight  type name  up/down  reweight
-1  262.1  root default
-2  32.76    host ceph01
0   2.73      osd.0   up  1
...
11  2.73      osd.11  up  1
-3  32.76    host ceph02
13  2.73      osd.13  up  1
..
12  2.73      osd.12  up  1
-4  32.76    host ceph03
24  2.73      osd.24  up  1
35  2.73      osd.35  up  1
-5  32.76    host ceph04
37  2.73      osd.37  up  1
.
47  2.73      osd.47  up  1
-6  32.76    host ceph05
48  2.73      osd.48  up  1
...
59  2.73      osd.59  up  1
-7  32.76    host ceph06
60  2.73      osd.60  down  0
...
71  2.73      osd.71  down  0
-8  32.76    host ceph07
72  2.73      osd.72  up  1
83  2.73      osd.83  up  1
-9  32.76    host ceph08
84  2.73      osd.84  up  1
95  2.73      osd.95  up  1
If I change the cluster map to the following:
root
|-- rack1
|     host ceph01
|     host ceph02
|     host ceph03
|     host ceph04
|-- rack2
      host ceph05
      host ceph06
      host ceph07
      host ceph08
How will the cluster behave during failover of one node? And how much will it affect performance? Thank you -- Best regards, Irek Fasikhov Mob.: +79229045757
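The rack-level layout sketched above corresponds to adding rack buckets to the decompiled CRUSH map and pointing the replication rule's failure domain at them. A rough fragment (bucket ids and weights here are illustrative placeholders, not taken from this cluster's map):

```
rack rack1 {
        id -10          # placeholder bucket id
        alg straw
        hash 0          # rjenkins1
        item ceph01 weight 32.76
        item ceph02 weight 32.76
        item ceph03 weight 32.76
        item ceph04 weight 32.76
}
rack rack2 {
        id -11
        alg straw
        hash 0
        item ceph05 weight 32.76
        item ceph06 weight 32.76
        item ceph07 weight 32.76
        item ceph08 weight 32.76
}
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type rack   # was: type host
        step emit
}
```

With the failure domain at rack level, replicas of a PG land in different racks, so a single node cannot hold two copies of the same PG; the trade-off is that recovery after a node failure is confined to the surviving hosts of that rack, which can concentrate the rebalancing load.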
Re: [ceph-users] enrich ceph test methods, what is your concern about ceph. thanks
Hi. I, like many people, use fio. For Ceph RBD there is a special engine: https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html 2014-08-26 12:15 GMT+04:00 yuelongguang fasts...@163.com: hi, all. I am planning to do a test on Ceph, covering performance, throughput, scalability and availability. In order to get a full test result, I hope you all can give me some advice; meanwhile, I can send the result to you if you like. For each test category (performance, throughput, scalability, availability), do you have some test ideas and test tools? Basically, I know some tools to test throughput and IOPS, but please tell me the tools you prefer and the results you expect. thanks very much -- Best regards, Irek Fasikhov Mob.: +79229045757
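The fio RBD engine referenced above takes a job file along these lines (the pool, image and client names are examples, not from this thread; the image must be created beforehand, e.g. with rbd create):

```ini
[global]
ioengine=rbd          ; requires fio built with RBD support
clientname=admin      ; cephx user, i.e. client.admin
pool=rbd              ; example pool name
rbdname=fio-test      ; example pre-created test image
invalidate=0
rw=randwrite
bs=4k
direct=1
time_based=1
runtime=60

[rbd_iodepth32]
iodepth=32            ; Ceph needs parallelism to show its real throughput
```

Run with `fio rbd.fio`; varying `iodepth` from 1 upwards makes the single-thread-vs-parallel behaviour discussed elsewhere in this digest directly visible.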
Re: [ceph-users] Ceph monitor load, low performance
Move the logs to SSD and you will immediately increase performance; you lose about 50% of performance on the logs. And for three replicas, more than 5 hosts are recommended. 2014-08-26 12:17 GMT+04:00 Mateusz Skała mateusz.sk...@budikom.net: Hi, thanks for the reply. From the top of my head, it is recommended to use 3 mons in production. Also, for the 22 osds your number of PGs looks a bit low, you should look at that. I get it from http://ceph.com/docs/master/rados/operations/placement-groups/ (22 osds * 100)/3 replicas = 733, ~1024 pgs. Please correct me if I'm wrong. It will be 5 mons (on 6 hosts), but now we must migrate some data from the used servers. The performance of the cluster is poor - this is too vague. What is your current performance, what benchmarks have you tried, what is your data workload and, most importantly, how is your cluster set up: what disks, ssds, network, ram, etc.? Please provide more information so that people can help you. Andrei Hardware information:
ceph15: RAM: 4GB, Network: 4x 1GB NIC, OSD disks: 2x SATA Seagate ST31000524NS, 2x SATA WDC WD1003FBYX-18Y7B0
ceph25: RAM: 16GB, Network: 4x 1GB NIC, OSD disks: 2x SATA WDC WD7500BPKX-7, 2x SATA WDC WD7500BPKX-2, 2x SATA SSHD ST1000LM014-1EJ164
ceph30: RAM: 16GB, Network: 4x 1GB NIC, OSD disks: 6x SATA SSHD ST1000LM014-1EJ164
ceph35: RAM: 16GB, Network: 4x 1GB NIC, OSD disks: 6x SATA SSHD ST1000LM014-1EJ164
All journals are on the OSDs. 2 NICs are for the backend network (10.20.4.0/22) and 2 NICs are for the frontend (10.20.8.0/22). We use this cluster as a storage backend for 100 VMs on KVM. I haven't run benchmarks, but all VMs were migrated from Xen+GlusterFS(NFS). Before the migration every VM ran fine; now each VM hangs for a few seconds from time to time, and apps installed on the VMs take much longer to load. GlusterFS was running on 2 servers with 1x 1GB NIC and 2x8 disks WDC WD7500BPKX-7. I made one test with recovery: if a disk is marked out, recovery IO is 150-200MB/s, but all VMs hang until recovery ends.
Biggest load is on ceph35, IOps on each disk are near 150, cpu load ~4-5. On other hosts cpu load 2, 120~130iops Our ceph.conf === [global] fsid=a9d17295-62f2-46f6-8325-1cad7724e97f mon initial members = ceph35, ceph30, ceph25, ceph15 mon host = 10.20.8.35, 10.20.8.30, 10.20.8.25, 10.20.8.15 public network = 10.20.8.0/22 cluster network = 10.20.4.0/22 osd journal size = 1024 filestore xattr use omap = true osd pool default size = 3 osd pool default min size = 1 osd pool default pg num = 1024 osd pool default pgp num = 1024 osd crush chooseleaf type = 1 auth cluster required = cephx auth service required = cephx auth client required = cephx rbd default format = 2 ##ceph35 osds [osd.0] cluster addr = 10.20.4.35 [osd.1] cluster addr = 10.20.4.35 [osd.2] cluster addr = 10.20.4.35 [osd.3] cluster addr = 10.20.4.36 [osd.4] cluster addr = 10.20.4.36 [osd.5] cluster addr = 10.20.4.36 ##ceph25 osds [osd.6] cluster addr = 10.20.4.25 public addr = 10.20.8.25 [osd.7] cluster addr = 10.20.4.25 public addr = 10.20.8.25 [osd.8] cluster addr = 10.20.4.25 public addr = 10.20.8.25 [osd.9] cluster addr = 10.20.4.26 public addr = 10.20.8.26 [osd.10] cluster addr = 10.20.4.26 public addr = 10.20.8.26 [osd.11] cluster addr = 10.20.4.26 public addr = 10.20.8.26 ##ceph15 osds [osd.12] cluster addr = 10.20.4.15 public addr = 10.20.8.15 [osd.13] cluster addr = 10.20.4.15 public addr = 10.20.8.15 [osd.14] cluster addr = 10.20.4.15 public addr = 10.20.8.15 [osd.15] cluster addr = 10.20.4.16 public addr = 10.20.8.16 ##ceph30 osds [osd.16] cluster addr = 10.20.4.30 public addr = 10.20.8.30 [osd.17] cluster addr = 10.20.4.30 public addr = 10.20.8.30 [osd.18] cluster addr = 10.20.4.30 public addr = 10.20.8.30 [osd.19] cluster addr = 10.20.4.31 public addr = 10.20.8.31 [osd.20] cluster addr = 10.20.4.31 public addr = 10.20.8.31 [osd.21] cluster addr = 10.20.4.31 public addr = 10.20.8.31 [mon.ceph35] host = ceph35 mon addr = 10.20.8.35:6789 [mon.ceph30] host = ceph30 mon addr = 10.20.8.30:6789 
[mon.ceph25] host = ceph25 mon addr = 10.20.8.25:6789 [mon.ceph15] host = ceph15 mon addr = 10.20.8.15:6789 Regards, Mateusz ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
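The PG-count rule of thumb quoted in this thread, (OSDs × 100) / replicas rounded up to the next power of two, can be written out as a small helper (the ×100 target is the commonly cited heuristic of roughly 100 PGs per OSD):

```python
def suggested_pg_count(num_osds, replicas, pgs_per_osd=100):
    """Heuristic PG count: (OSDs * pgs_per_osd) / replicas,
    rounded up to the next power of two."""
    raw = num_osds * pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power

if __name__ == "__main__":
    # The case from this thread: 22 OSDs, 3 replicas.
    print(suggested_pg_count(22, 3))  # (22*100)/3 ≈ 733 -> 1024
```

This matches Mateusz's own calculation of ~733 rounded to 1024; note it applies per cluster of pools, so very many small pools each at this size would over-allocate PGs.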
Re: [ceph-users] Ceph monitor load, low performance
I'm sorry, of course I meant the journals :) 2014-08-26 13:16 GMT+04:00 Mateusz Skała mateusz.sk...@budikom.net: You mean to move /var/log/ceph/* to the SSD disk? -- Best regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] enrich ceph test methods, what is your concern about ceph. thanks
For me, the bottleneck is single-threaded operation. Writes are more or less solved by enabling the rbd cache, but there are problems with reads. I think those can be solved with a cache pool, but I have not tested it. It follows that the more threads, the greater the read and write speed; but in reality it varies, and the speed and number of operations depend on many factors, such as network latency. Example tests, with special attention to the charts: https://software.intel.com/en-us/blogs/2013/10/25/measure-ceph-rbd-performance-in-a-quantitative-way-part-i and https://software.intel.com/en-us/blogs/2013/11/20/measure-ceph-rbd-performance-in-a-quantitative-way-part-ii 2014-08-26 15:11 GMT+04:00 yuelongguang fasts...@163.com: thanks Irek Fasikhov. Is it the only way to test ceph-rbd? An important aim of the test is to find where the bottleneck is: qemu/librbd/ceph. Could you share your test results with me? thanks On 2014-08-26 04:22:22, Irek Fasikhov malm...@gmail.com wrote: Hi. I, like many people, use fio. For Ceph RBD there is a special engine: https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html 2014-08-26 12:15 GMT+04:00 yuelongguang fasts...@163.com: hi, all. I am planning to do a test on Ceph, covering performance, throughput, scalability and availability. In order to get a full test result, I hope you all can give me some advice; meanwhile, I can send the result to you if you like. For each test category (performance, throughput, scalability, availability), do you have some test ideas and test tools? Basically, I know some tools to test throughput and IOPS, but please tell me the tools you prefer and the results you expect.
thanks very much -- Best regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] enrich ceph test methods, what is your concern about ceph. thanks
Sorry.. Enter pressed :) Continued... No, it's not the only way to test, but it depends on what you want to use Ceph for. 2014-08-26 15:22 GMT+04:00 Irek Fasikhov malm...@gmail.com: For me, the bottleneck is single-threaded operation. Writes are more or less solved by enabling the rbd cache, but there are problems with reads. I think those can be solved with a cache pool, but I have not tested it. It follows that the more threads, the greater the read and write speed; but in reality it varies, and the speed and number of operations depend on many factors, such as network latency. Example tests, with special attention to the charts: https://software.intel.com/en-us/blogs/2013/10/25/measure-ceph-rbd-performance-in-a-quantitative-way-part-i and https://software.intel.com/en-us/blogs/2013/11/20/measure-ceph-rbd-performance-in-a-quantitative-way-part-ii 2014-08-26 15:11 GMT+04:00 yuelongguang fasts...@163.com: thanks Irek Fasikhov. Is it the only way to test ceph-rbd? An important aim of the test is to find where the bottleneck is: qemu/librbd/ceph. Could you share your test results with me? thanks On 2014-08-26 04:22:22, Irek Fasikhov malm...@gmail.com wrote: Hi. I, like many people, use fio. For Ceph RBD there is a special engine: https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html 2014-08-26 12:15 GMT+04:00 yuelongguang fasts...@163.com: hi, all. I am planning to do a test on Ceph, covering performance, throughput, scalability and availability. In order to get a full test result, I hope you all can give me some advice; meanwhile, I can send the result to you if you like. For each test category (performance, throughput, scalability, availability), do you have some test ideas and test tools? Basically, I know some tools to test throughput and IOPS, but please tell me the tools you prefer and the results you expect.
thanks very much -- Best regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] How to calculate necessary disk amount
Hi. 10TB*2/0.85 ≈ 24 TB with two replicas: the total volume needed for the raw data.
Re: [ceph-users] How to calculate necessary disk amount
I recommend you use replication, because radosgw uses asynchronous replication. Yes, divided by the nearfull ratio. No, it's for the entire cluster. 2014-08-22 11:51 GMT+04:00 idzzy idez...@gmail.com: Hi, If I don't use replication, do I only divide by the nearfull_ratio? (Does only radosgw support replication?) 10T/0.85 = 11.8 TB for each node? # ceph pg dump | egrep full_ratio|nearfulll_ratio full_ratio 0.95 nearfull_ratio 0.85 Sorry, I'm not familiar with the Ceph architecture. Thanks for the reply. — idzzy On August 22, 2014 at 3:53:21 PM, Irek Fasikhov (malm...@gmail.com) wrote: Hi. 10TB*2/0.85 ≈ 24 TB with two replicas: the total volume needed for the raw data. -- Best regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] How to calculate necessary disk amount
node1: 4[TB], node2: 4[TB], node3: 4[TB] :) On 22 Aug 2014 at 12:53, idzzy idez...@gmail.com wrote: Hi Irek, Understood. Let me ask about only this: No, it's for the entire cluster. Does this mean that the total disk amount of all nodes should be more than 11.8 TB? e.g. node1: 4[TB], node2: 4[TB], node3: 4[TB], and not each node (e.g. node1: 11.8[TB], node2: 11.8[TB], node3: 11.8[TB])? Thank you. On August 22, 2014 at 5:06:02 PM, Irek Fasikhov (malm...@gmail.com) wrote: I recommend you use replication, because radosgw uses asynchronous replication. Yes, divided by the nearfull ratio. No, it's for the entire cluster. 2014-08-22 11:51 GMT+04:00 idzzy idez...@gmail.com: Hi, If I don't use replication, do I only divide by the nearfull_ratio? (Does only radosgw support replication?) 10T/0.85 = 11.8 TB for each node? # ceph pg dump | egrep full_ratio|nearfulll_ratio full_ratio 0.95 nearfull_ratio 0.85 Sorry, I'm not familiar with the Ceph architecture. Thanks for the reply. — idzzy On August 22, 2014 at 3:53:21 PM, Irek Fasikhov (malm...@gmail.com) wrote: Hi. 10TB*2/0.85 ≈ 24 TB with two replicas: the total volume needed for the raw data. -- Best regards, Irek Fasikhov Mob.: +79229045757
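The sizing arithmetic in this thread — usable data times replica count, divided by the nearfull ratio, and spread across the whole cluster rather than required per node — can be sketched as:

```python
def raw_capacity_needed(usable_tb, replicas, nearfull_ratio=0.85):
    """Total raw cluster capacity (TB) needed to hold `usable_tb` of data
    at the given replica count without crossing the nearfull threshold."""
    return usable_tb * replicas / nearfull_ratio

if __name__ == "__main__":
    total = raw_capacity_needed(10, 2)           # 10 TB * 2 / 0.85
    print(f"{total:.1f} TB raw across the cluster")   # ~23.5 TB, i.e. ~24 TB
    # Without replication, as in idzzy's question: 10 / 0.85
    print(f"{raw_capacity_needed(10, 1):.1f} TB")     # ~11.8 TB
```

As Irek notes, this total is for the entire cluster, so e.g. three nodes of 4 TB each would roughly cover the 11.8 TB unreplicated case.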
Re: [ceph-users] Fw: external monitoring tools for processes
Hi. I use Zabbix with the following script:
[ceph@ceph08 ~]$ cat /etc/zabbix/external/ceph
#!/usr/bin/python
import sys
import os
import commands
import json
import datetime
import time
# Check arguments. If the argument count equals 1, bail out.
if len(sys.argv) == 1:
    print "You will need arguments!"
    exit
def generate(data, type):
    # Note: builds Zabbix low-level-discovery JSON from the global `splits`.
    JSON = "{\"data\":["
    for js in range(len(splits)):
        JSON += "{\"{#" + type + "}\":\"" + splits[js] + "\"},"
    return JSON[:-1] + "]}"
if sys.argv[1] == "osd":
    if len(sys.argv) == 2:
        splits = commands.getoutput('df | grep osd | awk {\'print $6\'} | sed \'s/[^0-9]//g\' | sed \':a;N;$!ba;s/\\n/,/g\'').split(",")
        print generate(splits, "OSD")
    else:
        ID = sys.argv[2]
        LEVEL = sys.argv[3]
        PERF = sys.argv[4]
        CACHEFILE = "/tmp/zabbix.ceph.osd" + ID + ".cache"
        CACHETTL = 5
        TIME = int(round(float(datetime.datetime.now().strftime("%s"))))
        ## CACHE FOR PERFORMANCE OPTIMIZATION ##
        if os.path.isfile(CACHEFILE):
            CACHETIME = int(round(os.stat(CACHEFILE).st_mtime))
        else:
            CACHETIME = 0
        if TIME - CACHETIME > CACHETTL:
            if os.system('sudo ceph --admin-daemon /var/run/ceph/ceph-osd.' + ID + '.asok perfcounters_dump > ' + CACHEFILE) > 0:
                exit
        json_data = open(CACHEFILE)
        data = json.load(json_data)
        json_data.close()
        ## PARSING ##
        if LEVEL in data:
            if PERF in data[LEVEL]:
                try:
                    key = data[LEVEL][PERF].has_key("sum")
                    print (data[LEVEL][PERF]["sum"]) / (data[LEVEL][PERF]["avgcount"])
                except AttributeError:
                    print data[LEVEL][PERF]
and Zabbix templates: https://dl.dropboxusercontent.com/u/575018/zbx_export_templates.xml 2014-08-11 7:42 GMT+04:00 pragya jain prag_2...@yahoo.co.in: Please, somebody, reply to my question. On Saturday, 9 August 2014 3:34 PM, pragya jain prag_2...@yahoo.co.in wrote: hi all, can somebody suggest some external monitoring tools which can monitor whether the processes in Ceph (such as heartbeating, data scrubbing, authentication, backfilling, recovery, etc.) are working properly or not?
Regards, Pragya Jain -- Best regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] mounting RBD in linux containers
dmesg output please. 2014-08-11 2:16 GMT+04:00 Lorieri lori...@gmail.com: same here, did you manage to fix it ? On Mon, Oct 28, 2013 at 3:13 PM, Kevin Weiler kevin.wei...@imc-chicago.com wrote: Hi Josh, We did map it directly to the host, and it seems to work just fine. I think this is a problem with how the container is accessing the rbd module. -- Kevin Weiler IT IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/ Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.wei...@imc-chicago.com On 10/18/13 7:50 PM, Josh Durgin josh.dur...@inktank.com wrote: On 10/18/2013 10:04 AM, Kevin Weiler wrote: The kernel is 3.11.4-201.fc19.x86_64, and the image format is 1. I did, however, try a map with an RBD that was format 2. I got the same error. To rule out any capability drops as the culprit, can you map an rbd image on the same host outside of a container? Josh -- *Kevin Weiler* IT IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/ Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: _kevin.wei...@imc-chicago.com mailto:kevin.wei...@imc-chicago.com_ From: Gregory Farnum g...@inktank.com mailto:g...@inktank.com Date: Friday, October 18, 2013 10:26 AM To: Omar Marquez omar.marq...@imc-chicago.com mailto:omar.marq...@imc-chicago.com Cc: Kyle Bader kyle.ba...@gmail.com mailto:kyle.ba...@gmail.com, Kevin Weiler kevin.wei...@imc-chicago.com mailto:kevin.wei...@imc-chicago.com, ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com, Khalid Goudeaux khalid.goude...@imc-chicago.com mailto:khalid.goude...@imc-chicago.com Subject: Re: [ceph-users] mounting RBD in linux containers What kernel are you running, and which format is the RBD image? I thought we had a special return code for when the kernel doesn't support the features used by that image, but that could be the problem. 
-Greg On Thursday, October 17, 2013, Omar Marquez wrote: Strace produces below: … futex(0xb5637c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xb56378, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 futex(0xb562f8, FUTEX_WAKE_PRIVATE, 1) = 1 add_key(0x424408, 0x7fff82c4e210, 0x7fff82c4e140, 0x22, 0xfffe) = 607085216 stat(/sys/bus/rbd, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 *open(/sys/bus/rbd/add, O_WRONLY) = 3* *write(3, 10.198.41.6:6789,10.198.41.8:678..., 96) = -1 EINVAL (Invalid argument)* close(3) = 0 rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fbf8a7efa90}, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fbf8a7efa90}, {SIG_DFL, [], 0}, 8) = 0 rt_sigprocmask(SIG_BLOCK, [CHLD], [PIPE], 8) = 0 clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff82c4e040) = 22 wait4(22, [{WIFEXITED(s) WEXITSTATUS(s) == 0}], 0, NULL) = 22 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fbf8a7efa90}, NULL, 8) = 0 rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7fbf8a7efa90}, NULL, 8) = 0 rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0 write(2, rbd: add failed: , 17rbd: add failed: ) = 17 write(2, (22) Invalid argument, 21(22) Invalid argument) = 21 write(2, \n, 1 ) = 1 exit_group(1) = ? +++ exited with 1 +++ The app is run inside the container with setuid = 0 and the container is able to mount all required filesystems … could this still be a capability problem? Also, I do not see any call to capset() in the strace log … -- Om From: Kyle Bader kyle.ba...@gmail.com Date: Thursday, October 17, 2013 5:08 PM To: Kevin Weiler kevin.wei...@imc-chicago.com Cc: ceph-users@lists.ceph.com, Omar Marquez omar.marq...@imc-chicago.com, Khalid Goudeaux khalid.goude...@imc-chicago.com Subject: Re: [ceph-users] mounting RBD in linux containers My first guess would be that it's due to LXC dropping capabilities, I'd investigate whether CAP_SYS_ADMIN is being dropped.
You need CAP_SYS_ADMIN for mount and block ioctls, if the container doesn't have those privs a map will likely fail. Maybe try tracing the command with strace? On Thu, Oct 17, 2013 at 2:45 PM, Kevin Weiler kevin.wei...@imc-chicago.com wrote: Hi all, We're trying to mount an rbd image inside of a linux container
Re: [ceph-users] flashcache from fb and dm-cache??
Ceph has a cache pool, which can be created from SSDs. On 30 July 2014 at 18:41, German Anders gand...@despegar.com wrote: Also, has someone tried flashcache from Facebook on Ceph? Cons? Pros? Any perf improvement? And dm-cache? *German Anders*
[ceph-users] rbd rm. Error: trim_objectcould not find coid
Hi, All. I encountered such a problem. Was the status of one pg - inconsistent. RBD found this device and deleted it, now on the OSD get the following error: cod 0'0 active+inconsistent snaptrimq=[15~1,89~1]] exit Started/Primary/Active/Recovering 0.025609 1 0.53 -8 2014-07-23 12:03:13.386747 7f3617b02700 5 osd.94 pg_epoch: 35725 pg[80.3d6( v 35718'170614 (34929'167412,35718'170614] local-les=35724 n=2242 ec=9510 les/c 35724/35718 35723/35723/35723) [94,36] r=0 lpr=35723 pi=35713-35722/2 ml cod 0'0 active+inconsistent snaptrimq=[15~1,89~1]] enter Started/Primary/Active/Recovered -7 2014-07-23 12:03:13.386783 7f3617b02700 5 osd.94 pg_epoch: 35725 pg[80.3d6( v 35718'170614 (34929'167412,35718'170614] local-les=35724 n=2242 ec=9510 les/c 35724/35718 35723/35723/35723) [94,36] r=0 lpr=35723 pi=35713-35722/2 ml cod 0'0 active+inconsistent snaptrimq=[15~1,89~1]] exit Started/Primary/Active/Recovered 0.35 0 0.00 -6 2014-07-23 12:03:13.386795 7f3617b02700 5 osd.94 pg_epoch: 35725 pg[80.3d6( v 35718'170614 (34929'167412,35718'170614] local-les=35724 n=2242 ec=9510 les/c 35724/35718 35723/35723/35723) [94,36] r=0 lpr=35723 pi=35713-35722/2 ml cod 0'0 active+inconsistent snaptrimq=[15~1,89~1]] enter Started/Primary/Active/Clean -5 2014-07-23 12:03:13.386932 7f3617b02700 5 osd.94 pg_epoch: 35725 pg[2.772( v 35722'486163 lc 35716'486156 (35141'483132,35722'486163] local-les=35724 n=1328 ec=1 les/c 35724/35718 35723/35723/35723) [94,38,59] r=0 lpr=35723 pi=3 5713-35722/2 lcod 0'0 mlcod 0'0 active+recovery_wait m=4] exit Started/Primary/Active/WaitLocalRecoveryReserved 4.377808 7 0.96 -4 2014-07-23 12:03:13.386956 7f3617b02700 5 osd.94 pg_epoch: 35725 pg[2.772( v 35722'486163 lc 35716'486156 (35141'483132,35722'486163] local-les=35724 n=1328 ec=1 les/c 35724/35718 35723/35723/35723) [94,38,59] r=0 lpr=35723 pi=3 5713-35722/2 lcod 0'0 mlcod 0'0 active+recovery_wait m=4] enter Started/Primary/Active/WaitRemoteRecoveryReserved -3 2014-07-23 12:03:13.387282 7f36148fd700 -1 osd.94 
pg_epoch: 35725 pg[80.3d6( v 35718'170614 (34929'167412,35718'170614] local-les=35724 n=2242 ec=9510 les/c 35724/35725 35723/35723/35723) [94,36] r=0 lpr=35723 mlcod 0'0 active+cl ean+inconsistent snaptrimq=[15~1,89~1]] *trim_objectcould not find coid * f022c7d6/rbd_data.3ed9c72ae8944a.0717/15//80 -2 2014-07-23 12:03:13.388628 7f3617101700 5 osd.94 pg_epoch: 35725 pg[2.772( v 35722'486163 lc 35716'486156 (35141'483132,35722'486163] local-les=35724 n=1328 ec=1 les/c 35724/35718 35723/35723/35723) [94,38,59] r=0 lpr=35723 pi=3 5713-35722/2 lcod 0'0 mlcod 0'0 active+recovery_wait m=4] exit Started/Primary/Active/WaitRemoteRecoveryReserved 0.001672 2 0.79 -1 2014-07-23 12:03:13.388670 7f3617101700 5 osd.94 pg_epoch: 35725 pg[2.772( v 35722'486163 lc 35716'486156 (35141'483132,35722'486163] local-les=35724 n=1328 ec=1 les/c 35724/35718 35723/35723/35723) [94,38,59] r=0 lpr=35723 pi=3 5713-35722/2 lcod 0'0 mlcod 0'0 active+recovery_wait m=4] enter Started/Primary/Active/Recovering 0 2014-07-23 12:03:13.389138 7f36148fd700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t)' thread 7f36148fd700 time 2014-07-23 12:03:13.387304 osd/ReplicatedPG.cc: 1824: FAILED assert(0) [root@ceph08 DIR_7]# find /var/lib/ceph/osd/ceph-94/ -name '*3ed9c72ae8944a.0717*' -ls 10745283770 -rw-r--r-- 1 root root 1 Jul 23 11:30 /var/lib/ceph/osd/ceph-94/current/80.3d6_head/DIR_6/DIR_D/DIR_7/rbd\\udata.3ed9c72ae8944a.0717__15_F022C7D6__50 How can I make Ceph forget about the existence of this file? Ceph version 0.72.2. Thanks. -- Best regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] feature set mismatch after upgrade from Emperor to Firefly
Hi, Andrei.
ceph osd getcrushmap -o /tmp/crush
crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
ceph osd setcrushmap -i /tmp/crush.new
Or update to kernel 3.15. 2014-07-20 20:19 GMT+04:00 Andrei Mikhailovsky and...@arhont.com: Hello guys, I have noticed the following message/error after upgrading to Firefly. Does anyone know what needs doing to correct it? Thanks Andrei [ 25.911055] libceph: mon1 192.168.168.201:6789 feature set mismatch, my 40002 server's 20002040002, missing 2000200 [ 25.911698] libceph: mon1 192.168.168.201:6789 socket error on read [ 35.913049] libceph: mon2 192.168.168.13:6789 feature set mismatch, my 40002 server's 20002040002, missing 2000200 [ 35.913694] libceph: mon2 192.168.168.13:6789 socket error on read [ 45.909466] libceph: mon0 192.168.168.200:6789 feature set mismatch, my 40002 server's 20002040002, missing 2000200 [ 45.910104] libceph: mon0 192.168.168.200:6789 socket error on read -- Best regards, Irek Fasikhov Mob.: +79229045757
[ceph-users] Ceph RBD and Backup.
Hi, All. Dear community, how do you back up Ceph RBD? Thanks -- Fasihov Irek (aka Kataklysm). Best regards, Irek Fasikhov Mob.: +79229045757
Re: [ceph-users] Calamari Goes Open Source
Very Very Good! Thanks Inktank/RedHat. 2014-05-31 2:43 GMT+04:00 John Kinsella j...@stratosec.co: Cool! Looking forward to kicking the tires on that... On May 30, 2014, at 3:04 PM, Patrick McGarry patr...@inktank.com wrote: Hey cephers, Sorry to push this announcement so late on a Friday but... Calamari has arrived! The source code bits have been flipped, the ticket tracker has been moved, and we have even given you a little bit of background from both a technical and vision point of view: Technical (ceph.com): http://ceph.com/community/ceph-calamari-goes-open-source/ Vision (inktank.com): http://www.inktank.com/software/future-of-calamari/ The ceph.com link should give you everything you need to know about what tech comprises Calamari, where the source lives, and where the discussions will take place. If you have any questions feel free to hit the new ceph-calamari list or stop by IRC and we'll get you started. Hope you all enjoy the GUI! Best Regards, Patrick McGarry Director, Community || Inktank http://ceph.com || http://inktank.com @scuttlemonkey || @ceph || @inktank ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com Stratosec http://stratosec.co/ - Compliance as a Service o: 415.315.9385 @johnlkinsella http://twitter.com/johnlkinsella ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RAID implementation.
There is no sense in using RAID under Ceph: Ceph provides redundancy itself through replication across OSDs, so give it the disks individually (JBOD), one OSD per disk. 2014-04-29 15:58 GMT+04:00 yalla.gnan.ku...@accenture.com: Hi All, I have set up a three-node Ceph storage cluster on Ubuntu. I want to implement RAID 5, RAID 1 and RAID 1+0 volumes using Ceph. Any information or link providing information on this will help a lot. Thanks, Kumar -- This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. __ www.accenture.com
Re: [ceph-users] SSD journal overload?
What model of SSD do you have? Which kernel version? 2014-04-28 12:35 GMT+04:00 Udo Lembke ulem...@polarzone.de:

Hi, perhaps it is due to IOs from the journal? You can test with iostat (like iostat -dm 5 sdg); on Debian iostat is in the package sysstat. Udo

On 28.04.2014 07:38, Indra Pramana wrote:

Hi Craig, good day to you, and thank you for your enquiry. As per your suggestion, I have created a 3rd partition on the SSDs and did the dd test directly against the device, and the result is very slow:

root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdg3 conv=fdatasync oflag=direct
128+0 records in
128+0 records out
134217728 bytes (134 MB) copied, 19.5223 s, 6.9 MB/s

root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdf3 conv=fdatasync oflag=direct
128+0 records in
128+0 records out
134217728 bytes (134 MB) copied, 5.34405 s, 25.1 MB/s

I did a test on another server with exactly the same specification and the same SSD drive (Seagate SSD 100 GB), but not yet added into the cluster (thus no load), and the result is fast:

root@ceph-osd-09:/home/indra# dd bs=1M count=128 if=/dev/zero of=/dev/sdf1 conv=fdatasync oflag=direct
128+0 records in
128+0 records out
134217728 bytes (134 MB) copied, 0.742077 s, 181 MB/s

Does the Ceph journal load really take up that much of the SSD's resources? I don't understand how the performance can drop so significantly, especially since the two Ceph journals only occupy the first 20 GB of the SSD's 100 GB capacity. Any advice is greatly appreciated. Looking forward to your reply, thank you. Cheers.
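One thing worth checking alongside the tests above: oflag=direct bypasses the page cache but not the drive's volatile write cache, while a Ceph journal issues flushed (sync) writes. A hedged sketch; /dev/sdg3 is the spare test partition from the thread, and the test overwrites it:

```shell
# Hedged sketch: compare plain direct writes with per-write sync writes,
# which are closer to what a Ceph journal actually does.
# WARNING: destroys data on the target partition.
dd if=/dev/zero of=/dev/sdg3 bs=1M count=128 oflag=direct conv=fdatasync
dd if=/dev/zero of=/dev/sdg3 bs=1M count=128 oflag=direct,dsync
```

A large gap between the two usually points at an SSD that is slow to flush its volatile cache, which is the symptom the kernel patch discussed in this thread addresses.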
Re: [ceph-users] SSD journal overload?
Most likely you need to apply a patch to the kernel:
http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov

2014-04-28 15:20 GMT+04:00 Indra Pramana in...@sg.or.id:

Hi Udo and Irek, good day to you, and thank you for your emails.

> perhaps it is due to IOs from the journal? You can test with iostat (like iostat -dm 5 sdg).

Yes, I have shared the iostat result earlier in this same thread. At times the utilisation of the two journal drives hits 100%, especially when I simulate writes with the rados bench command. Any suggestions as to what could be the cause of the I/O issue?

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           1.85   0.00    1.65    3.14   0.00  93.36
Device: rrqm/s wrqm/s  r/s   w/s rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await svctm %util
sdg       0.00   0.00 0.00 55.00  0.00 25365.33   922.38    34.22 568.90    0.00  568.90 17.82 98.00
sdf       0.00   0.00 0.00 55.67  0.00 25022.67   899.02    29.76 500.57    0.00  500.57 17.60 98.00

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           2.10   0.00    1.37    2.07   0.00  94.46
Device: rrqm/s wrqm/s  r/s   w/s rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await svctm %util
sdg       0.00   0.00 0.00 56.67  0.00 25220.00   890.12    23.60 412.14    0.00  412.14 17.62 99.87
sdf       0.00   0.00 0.00 52.00  0.00 24637.33   947.59    33.65 587.41    0.00  587.41 19.23 100.00

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           2.21   0.00    1.77    6.75   0.00  89.27
Device: rrqm/s wrqm/s  r/s   w/s rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await svctm %util
sdg       0.00   0.00 0.00 54.33  0.00 24802.67   912.98    25.75 486.36    0.00  486.36 18.40 100.00
sdf       0.00   0.00 0.00 53.00  0.00 24716.00   932.68    35.26 669.89    0.00  669.89 18.87 100.00

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           1.87   0.00    1.67    5.25   0.00  91.21
Device: rrqm/s wrqm/s  r/s   w/s rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await svctm %util
sdg       0.00   0.00 0.00 94.33  0.00 26257.33   556.69    18.29 208.44    0.00  208.44 10.50 99.07
sdf       0.00   0.00 0.00 51.33  0.00 24470.67   953.40    32.75 684.62    0.00  684.62 19.51 100.13

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           1.51   0.00    1.34    7.25   0.00  89.89
Device: rrqm/s wrqm/s  r/s   w/s rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await svctm %util
sdg       0.00   0.00 0.00 52.00  0.00 22565.33   867.90    24.73 446.51    0.00  446.51 19.10 99.33
sdf       0.00   0.00 0.00 64.67  0.00 24892.00   769.86    19.50 330.02    0.00  330.02 15.32 99.07

> What model SSD?

For this one, I am using a Seagate 100 GB SSD, model: HDS-2TM-ST100FM0012

> Which version of the kernel?

Ubuntu 13.04, Linux kernel version: 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Looking forward to your reply, thank you. Cheers.
Re: [ceph-users] SSD journal overload?
This is my article :). Apply the patch to the kernel (http://www.theirek.com/downloads/code/CMD_FLUSH.diff). After rebooting, run the following command:

echo "temporary write through" > /sys/class/scsi_disk/<disk>/cache_type

2014-04-28 15:44 GMT+04:00 Indra Pramana in...@sg.or.id:

Hi Irek, thanks for the article. Do you have any other web sources pertaining to the same issue which are in English? Looking forward to your reply, thank you. Cheers.
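The cache_type command quoted above writes a sysfs attribute per SCSI disk. A hedged sketch applying it to all disks at once; the "temporary " prefix tells the kernel not to persist the change to the drive itself, so it reverts on reboot:

```shell
# Hedged sketch: switch every SCSI disk's cache mode to write-through.
# Run as root; paths assume Linux sysfs.
for attr in /sys/class/scsi_disk/*/cache_type; do
    echo "temporary write through" > "$attr"
done

# Verify the new mode:
cat /sys/class/scsi_disk/*/cache_type
```

This disables the volatile write cache behaviour that, per the thread, interacts badly with the journal's flush-heavy write pattern; measure before and after, since write-through can also cost throughput on some drives.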
Re: [ceph-users] Pool with empty name recreated
Hi.

radosgw-admin bucket list

2014-04-25 15:32 GMT+04:00 myk...@gmail.com:

Hi, all. Yesterday I managed to reproduce the bug on my test environment with a fresh installation of the Dumpling release. I've attached the link to an archive with debug logs:
http://lamcdn.net/pool_with_empty_name_bug_logs.tar.gz

The test cluster contains only one bucket, named test, with one file in it named README, with acl public-read. The pool with the empty name is created when RGW processes a request for a non-existent bucket name. For example:

$ curl -kIL http://rgw.test.lo/test/README
HTTP/1.1 200 OK - bucket exists, file exists
$ curl -kIL http://test.rgw.test.lo/README
HTTP/1.1 200 OK - bucket exists, file exists
$ curl -kIL http://rgw.test.lo/test/README2
HTTP/1.1 403 Forbidden - bucket exists, file does not exist
$ curl -kIL http://test.rgw.test.lo/README2
HTTP/1.1 403 Forbidden - bucket exists, file does not exist
$ curl -kIL http://rgw.test.lo/test2/README
HTTP/1.1 404 Not Found - bucket does not exist; pool with empty name is created
$ curl -kIL http://test2.rgw.test.lo/README
HTTP/1.1 404 Not Found - bucket does not exist; pool with empty name is created

If someone confirms this behaviour, we can file a bug and request a backport. -- Regards, Mikhail

On Thu, 24 Apr 2014 10:33:00 -0700 Gregory Farnum g...@inktank.com wrote: Yehuda says he's fixed several of these bugs in recent code, but if you're seeing it from a recent dev release, please file a bug! Likewise if you're on a named release and would like to see a backport. :) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com

On Thu, Apr 24, 2014 at 4:10 AM, Dan van der Ster daniel.vanders...@cern.ch wrote: Hi, We also get the '' pool from rgw, which is clearly a bug somewhere.
But we recently learned that you can prevent it from being recreated by removing the 'x' capability on the mon from your client.radosgw.* users, for example:

client.radosgw.cephrgw1
    key: xxx
    caps: [mon] allow r
    caps: [osd] allow rwx

Cheers, Dan

myk...@gmail.com wrote:

Hi, I can't delete the pool with the empty name:

$ sudo rados rmpool "" "" --yes-i-really-really-mean-it
successfully deleted pool

but after a few seconds it is recreated automatically.

$ sudo ceph osd dump | grep '^pool'
pool 3 '.rgw' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 9 owner 18446744073709551615
pool 4 '.rgw.gc' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 10 owner 18446744073709551615
pool 5 '.rgw.control' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 11 owner 18446744073709551615
pool 6 '.users.uid' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 13 owner 0
pool 7 '.users.email' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 15 owner 0
pool 8 '.users' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 17 owner 0
pool 9 '.rgw.buckets' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 38 owner 18446744073709551615
pool 10 '.rgw.root' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 100 owner 0
pool 17 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3347 owner 0

ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73)

How can I delete it forever?
-- Dan van der Ster || Data Storage Services || CERN IT Department
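Dan's workaround above can be applied with `ceph auth caps`. A hedged sketch; the key name is an example, and the empty-string arguments to `rados rmpool` are how the nameless pool is addressed (verify both against your Ceph version):

```shell
# Hedged sketch: drop the mon 'x' capability from the radosgw key so rgw
# can no longer auto-create pools (read-only mon access is kept).
ceph auth caps client.radosgw.cephrgw1 mon 'allow r' osd 'allow rwx'

# With creation blocked, the empty-named pool can be deleted for good:
rados rmpool "" "" --yes-i-really-really-mean-it
```

Restart the radosgw daemons after changing the caps so they pick up the restricted key.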
Re: [ceph-users] Pool with empty name recreated
You need to create a pool named .rgw.buckets.index

2014-04-24 14:05 GMT+04:00 myk...@gmail.com:

Hi, I can't delete the pool with the empty name:

$ sudo rados rmpool "" "" --yes-i-really-really-mean-it
successfully deleted pool

but after a few seconds it is recreated automatically (full `ceph osd dump` pool listing quoted earlier in the thread).

ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73)

How can I delete it forever? -- Regards, Mikhail
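A hedged sketch of the suggestion above; the pg counts simply match the other small rgw pools shown in this thread, not a recommendation:

```shell
# Create the missing index pool manually so rgw does not need to
# auto-create it (pg_num/pgp_num of 8 match the other rgw control
# pools in the listings quoted in this thread).
ceph osd pool create .rgw.buckets.index 8 8
```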
Re: [ceph-users] Pool with empty name recreated
These pools serve different purposes.

[root@ceph01 ~]# radosgw-admin zone list
{ "zones": [ "default" ]}

[root@ceph01 ~]# radosgw-admin zone get default
{ "domain_root": ".rgw",
  "control_pool": ".rgw.control",
  "gc_pool": ".rgw.gc",
  "log_pool": ".log",
  "intent_log_pool": ".intent-log",
  "usage_log_pool": ".usage",
  "user_keys_pool": ".users",
  "user_email_pool": ".users.email",
  "user_swift_pool": ".users.swift",
  "user_uid_pool": ".users.uid",
  "system_key": { "access_key": "", "secret_key": ""},
  "placement_pools": [
    { "key": "default-placement",
      "val": { "index_pool": ".rgw.buckets.index",
               "data_pool": ".rgw.buckets"}}]}

[root@ceph01 ~]# ceph osd dump | grep pool
pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 5156 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 5158 owner 0
pool 2 'rbd' rep size 3 min_size 2 crush_ruleset 2 object_hash rjenkins pg_num 3200 pgp_num 3200 last_change 11642 owner 0
pool 80 'rbdtest' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 11550 owner 0
pool 101 '.rgw.root' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 11476 owner 0
pool 102 '.rgw.control' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 11478 owner 0
pool 103 '.users.uid' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 11480 owner 18446744073709551615
pool 104 '.rgw' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 11482 owner 18446744073709551615
pool 105 '.rgw.gc' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 11484 owner 18446744073709551615
pool 106 '.users.email' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 11486 owner 0
pool 107 '.users' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 11488 owner 0
pool 108 '.rgw.buckets.index' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 11490 owner 0
pool 109 '.rgw.buckets' rep size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 11631 owner 0

2014-04-24 14:30 GMT+04:00 myk...@gmail.com:

> You need to create a pool named .rgw.buckets.index

I tried that before I sent the letter to the list. All of my buckets have index_pool: .rgw.buckets. -- Regards, Mikhail
Re: [ceph-users] Pool with empty name recreated
I do not use distributed replication across zones either. :)

2014-04-24 15:00 GMT+04:00 myk...@gmail.com:

I don't use distributed replication across zones.

$ sudo radosgw-admin zone list
{ "zones": [ "default" ]}

-- Regards, Mikhail