Re: [ceph-users] Move RGW bucket index
Thanks, Sean! BTW, is it a good idea to turn off scrub and deep-scrub on the bucket.index pool? We have something like 5 million objects in it, and when it is scrubbing, RGW just stops working until it's finished... Or would setting the "idle" IO priority for scrub help?

2016-06-12 16:07 GMT+03:00 Sean Redmond:
> Hi Vasily,
>
> You don't need to create a new pool and move the data to a new pool; you can
> just update the crush map rule set to tell the existing RGW index pool to
> use a different 'root'.
> (http://docs.ceph.com/docs/master/rados/operations/crush-map/#crushmaprules)
>
> This change can be done online, but I would advise you to do it at a quiet
> time and set sensible levels of backfill and recovery, as it will result
> in the movement of data.
>
> Thanks
>
> On Sun, Jun 12, 2016 at 1:43 PM, Василий Ангапов wrote:
>>
>> Hello!
>>
>> I did not find any information on how to move an existing RGW bucket
>> index pool to a new one.
>> I want to move my bucket indices to SSD disks; do I have to shut down
>> the whole RGW or not? I would be very grateful for any tip.
>>
>> Regards, Vasily.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
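For the archives, Sean's suggestion and the scrub question can be sketched as commands against a Jewel-era cluster. This is a hedged sketch, not a tested recipe: the rule name, CRUSH root name, and pool name are illustrative, and the rule id placeholder must be looked up after creating the rule.

```
# Create a rule that places data under an (assumed) 'ssd' CRUSH root
ceph osd crush rule create-simple rgw-index-ssd ssd host

# Point the existing index pool at the new rule; this works online but
# triggers data movement
ceph osd pool set ed-1.rgw.buckets.index crush_ruleset <rule-id>

# Keep backfill/recovery gentle while the data migrates
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

# Rather than disabling scrub on the index pool outright, try lowering
# the scrub I/O priority first (the disk-thread ioprio settings only
# take effect when the OSD disks use the CFQ scheduler)
ceph tell osd.* injectargs '--osd-disk-thread-ioprio-class idle --osd-disk-thread-ioprio-priority 7'
```

Once the data has migrated, backfill and recovery limits can be restored to their usual values.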
Re: [ceph-users] RGW pools type
.rgw.buckets is all we have as EC. The remainder are replicated.

Thanks,
CJ

On Sun, Jun 12, 2016 at 4:12 AM, Василий Ангапов wrote:
> Hello!
>
> I have a question regarding RGW pool types: which pools can be Erasure
> Coded?
> More exactly, I have the following pools:
>
> .rgw.root (EC)
> ed-1.rgw.control (EC)
> ed-1.rgw.data.root (EC)
> ed-1.rgw.gc (EC)
> ed-1.rgw.intent-log (EC)
> ed-1.rgw.buckets.data (EC)
> ed-1.rgw.meta (EC)
> ed-1.rgw.users.keys (REPL)
> ed-1.rgw.users.email (REPL)
> ed-1.rgw.users.uid (REPL)
> ed-1.rgw.users.swift (REPL)
> ed-1.rgw.users (REPL)
> ed-1.rgw.log (REPL)
> ed-1.rgw.buckets.index (REPL)
> ed-1.rgw.buckets.non-ec (REPL)
> ed-1.rgw.usage (REPL)
>
> Is that ok?
>
> Regards, Vasily

--
Best Regards,
Chris Jones
cjo...@cloudm2.com
(p) 770.655.0770
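The reasoning behind Chris's layout: erasure-coded pools cannot store omap data, and most RGW metadata pools (bucket index, gc, log, usage, the users.* pools) rely on omap, so in practice only the bucket data pool is a safe EC candidate. A hedged sketch of creating such a pair - profile name, pool names, k/m values, and PG counts are all illustrative:

```
# Illustrative only: names, k/m values, and PG counts are made up
ceph osd erasure-code-profile set rgw-ec k=4 m=2
ceph osd pool create ed-1.rgw.buckets.data 256 256 erasure rgw-ec

# omap-backed pools (index, gc, log, usage, users.*) stay replicated
ceph osd pool create ed-1.rgw.buckets.index 64 64 replicated
```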
Re: [ceph-users] librados and multithreading
I don't know; that is why I'm asking here.

2016-06-12 6:36 GMT+03:00 Ken Peng:
> Hi,
>
> We have experienced a similar error: when writing to an RBD block with
> multiple threads using fio, some OSDs got errors and went down.
> Are we talking about the same thing?
>
> 2016-06-11 0:37 GMT+08:00 Юрий Соколов:
>>
>> Good day, all.
>>
>> I found this issue: https://github.com/ceph/ceph/pull/5991
>>
>> Did this issue affect librados?
>> Was it safe to use a single rados_ioctx_t from multiple threads before
>> this fix?
>>
>> --
>> With regards,
>> Sokolov Yura aka funny_falcon

--
With regards,
Sokolov Yura aka funny_falcon
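If the concern is sharing one rados_ioctx_t across threads, a conservative workaround until the question is settled is simply not to share: give each thread its own handle. This generic Python sketch shows the shape of that pattern with a stand-in factory function - it deliberately does not use the real librados API, only illustrates the one-handle-per-thread idea:

```python
import threading

class PerThreadHandle:
    """Give each thread its own handle instead of sharing one.

    `factory` stands in for whatever opens a handle (e.g. a wrapper
    around rados_ioctx_create); it is a placeholder, not librados.
    """
    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        # Lazily open one handle per thread and cache it in thread-local
        # storage, so no handle is ever touched by two threads.
        if not hasattr(self._local, "handle"):
            self._local.handle = self._factory()
        return self._local.handle

# Demo with a counting factory: each thread gets a distinct handle.
handles = []
def fake_open():
    h = object()
    handles.append(h)
    return h

ctx = PerThreadHandle(fake_open)

results = []
def worker():
    # Within one thread, repeated get() calls return the same handle.
    results.append(ctx.get() is ctx.get())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(handles))  # one handle per thread: 4
print(all(results))  # True
```

The cost is one open context per thread, which is usually negligible compared to chasing races in a shared one.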
[ceph-users] Move RGW bucket index
Hello!

I did not find any information on how to move an existing RGW bucket index pool to a new one. I want to move my bucket indices to SSD disks; do I have to shut down the whole RGW or not? I would be very grateful for any tip.

Regards, Vasily.
Re: [ceph-users] RadosGW performance s3 many objects
Wade, I'm having the same problem as you. We currently have 5+ million objects in a bucket, and it is not even sharded, so we are seeing many problems with that. Did you manage to test RGW with tons of files?

2016-05-24 2:45 GMT+03:00 Wade Holler:
> We (my customer) are trying to test on Jewel now, but I can say that the
> above behavior was also observed by my customer on Infernalis. After 300
> million or so objects in a single bucket the cluster basically fell down as
> described above. Few hundred OSDs in this cluster. We are concerned that
> this may not be remedied by a hundreds-of-buckets approach as well. Testing
> continues.
>
> On Mon, May 23, 2016 at 7:35 PM Vickey Singh wrote:
>>
>> Hello Guys
>>
>> Is several millions of objects with Ceph (for the RGW use case) still an
>> issue? Or is it fixed?
>>
>> Thnx
>> Vickey
>>
>> On Thu, Jan 28, 2016 at 12:55 AM, Krzysztof Księżyk wrote:
>>>
>>> Stefan Rogge writes:
>>>
>>> > Hi,
>>> > we are using Ceph with RadosGW and an S3 setting.
>>> > With more and more objects in the storage, the writing speed slows down
>>> > significantly. With 5 million objects in the storage we had a writing
>>> > speed of 10 MB/s. With 10 million objects in the storage it's only 5 MB/s.
>>> > Is this a common issue?
>>> > Is the RadosGW suitable for a large amount of objects, or would you
>>> > recommend not using the RadosGW with these amounts of objects?
>>> >
>>> > Thank you.
>>> >
>>> > Stefan
>>> >
>>> > I also found a ticket at the ceph tracker with the same issue:
>>> >
>>> > http://tracker.ceph.com/projects/ceph/wiki/Rgw_-_bucket_index_scalability
>>>
>>> Hi,
>>>
>>> I'm struggling with the same issue on Ceph 9.2.0. Unfortunately I wasn't
>>> aware of it, and now the only way to improve things is to create a new
>>> bucket with bucket index sharding or change the way our apps store data
>>> into buckets. And of course copy tons of data :( In my case something also
>>> happened to the leveldb files, and now I cannot even run some
>>> radosgw-admin commands like:
>>>
>>> radosgw-admin bucket check -b
>>>
>>> which causes osd daemon flapping and process timeout messages in logs. PGs
>>> containing .rgw.bucket.index can't even be backfilled to other OSDs, as
>>> the osd process dies with messages:
>>>
>>> [...]
>>> > 2016-01-25 15:47:22.700737 7f79fc66d700 1 heartbeat_map is_healthy
>>> > 'OSD::osd_op_tp thread 0x7f7992c86700' had suicide timed out after 150
>>> > 2016-01-25 15:47:22.702619 7f79fc66d700 -1 common/HeartbeatMap.cc: In
>>> > function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*,
>>> > const char*, time_t)' thread 7f79fc66d700 time 2016-01-25 15:47:22.700751
>>> > common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")
>>> >
>>> > ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> > const*)+0x85) [0x7f7a019f4be5]
>>> > 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char
>>> > const*, long)+0x2d9) [0x7f7a019343b9]
>>> > 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f7a01934bf6]
>>> > 4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f7a019353bc]
>>> > 5: (CephContextServiceThread::entry()+0x15b) [0x7f7a01a10dcb]
>>> > 6: (()+0x7df5) [0x7f79ffa8fdf5]
>>> > 7: (clone()+0x6d) [0x7f79fe3381ad]
>>>
>>> I don't know - maybe it's because of the number of leveldb files in the
>>> omap folder (5.1 GB total). I read somewhere that things can be improved by
>>> setting 'leveldb_compression' to false and 'leveldb_compact_on_mount' to
>>> true, but I don't know if these options have any effect in 9.2.0, as they
>>> are not documented for this release. I tried 'leveldb_compression' but
>>> without visible effect, and wasn't brave enough to try
>>> 'leveldb_compact_on_mount' on a production env. But setting it to true on
>>> my test 0.94.5 makes the osd fail on restart.
>>>
>>> Kind regards -
>>> Krzysztof Księżyk
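On the sharding point raised throughout this thread: new buckets can be created pre-sharded (e.g. via the `rgw_override_bucket_index_max_shards` option in ceph.conf), and a commonly quoted guideline - not a figure from this thread - is to keep each shard around 100k objects. A quick estimator for sizing:

```python
import math

def recommended_shards(expected_objects, objects_per_shard=100_000):
    """Rule-of-thumb shard count for an RGW bucket index.

    The ~100k-objects-per-shard target is a widely used guideline,
    not a hard limit; tune objects_per_shard to your own testing.
    """
    return max(1, math.ceil(expected_objects / objects_per_shard))

# The bucket sizes mentioned in this thread:
print(recommended_shards(5_000_000))    # 50
print(recommended_shards(300_000_000))  # 3000
```

Reshaping an existing unsharded bucket still means copying the data, as Krzysztof notes above; the estimator only helps size the replacement bucket.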
Re: [ceph-users] ceph-deploy prepare journal on software raid ( md device )
Hi to myself =) Just in case others run into the same:

#1: You will have to update parted from version 3.1 to 3.2 (for example, simply take the Fedora package, it's newer, and replace it with that) - parted is responsible for partprobe.

#2: Software RAID will still not work, because of the GUID of the partition. ceph-deploy will recognize it as something different than expected. So ceph-deploy + software RAID will not work. Maybe it will work with a manual osd creation; I did not test it.

In any case: updating the parted package to make partprobe complain less is a very good idea if you work with any kind of RAID devices.

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107

Am 08.06.2016 um 19:55 schrieb Oliver Dzombic:
> Hi,
>
> I read that ceph-deploy does not support software RAID devices:
>
> http://tracker.ceph.com/issues/13084
>
> But that's already nearly 1 year ago, and the problem is different.
>
> As it seems to me, the "only" major problem is that the newly created
> journal partition remains in the "Device or resource busy" state, so
> that ceph-deploy gives up after some time.
>
> Does anyone know a workaround?
>
> [root@cephmon1 ceph-cluster-gen2]# ceph-deploy osd prepare
> cephosd1:/dev/sdf:/dev/md128
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /root/.cephdeploy.conf
> [ceph_deploy.cli][INFO ] Invoked (1.5.33): /usr/bin/ceph-deploy osd
> prepare cephosd1:/dev/sdf:/dev/md128
> [ceph_deploy.cli][INFO ] ceph-deploy options:
> [ceph_deploy.cli][INFO ] username : None
> [ceph_deploy.cli][INFO ] disk : [('cephosd1', '/dev/sdf', '/dev/md128')]
> [ceph_deploy.cli][INFO ] dmcrypt : False
> [ceph_deploy.cli][INFO ] verbose : False
> [ceph_deploy.cli][INFO ] bluestore : None
> [ceph_deploy.cli][INFO ] overwrite_conf: False
> [ceph_deploy.cli][INFO ] subcommand: prepare
> [ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
> [ceph_deploy.cli][INFO ] quiet : False
> [ceph_deploy.cli][INFO ] cd_conf :
> [ceph_deploy.cli][INFO ] cluster : ceph
> [ceph_deploy.cli][INFO ] fs_type : xfs
> [ceph_deploy.cli][INFO ] func : at 0x7f57abff9c08>
> [ceph_deploy.cli][INFO ] ceph_conf : None
> [ceph_deploy.cli][INFO ] default_release : False
> [ceph_deploy.cli][INFO ] zap_disk : False
> [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
> cephosd1:/dev/sdf:/dev/md128
> [cephosd1][DEBUG ] connected to host: cephosd1
> [cephosd1][DEBUG ] detect platform information from remote host
> [cephosd1][DEBUG ] detect machine type
> [cephosd1][DEBUG ] find the location of an executable
> [ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.2.1511 Core
> [ceph_deploy.osd][DEBUG ] Deploying osd to cephosd1
> [cephosd1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
> [cephosd1][WARNIN] osd keyring does not exist yet, creating one
> [cephosd1][DEBUG ] create a keyring file
> [ceph_deploy.osd][DEBUG ] Preparing host cephosd1 disk /dev/sdf journal
> /dev/md128 activate False
> [cephosd1][DEBUG ] find the location of an executable
> [cephosd1][INFO ] Running command: /usr/sbin/ceph-disk -v prepare
> --cluster ceph --fs-type xfs -- /dev/sdf /dev/md128
> [cephosd1][WARNIN] command: Running command: /usr/bin/ceph-osd
> --cluster=ceph --show-config-value=fsid
> [cephosd1][WARNIN] command: Running command: /usr/bin/ceph-osd
> --check-allows-journal -i 0 --cluster ceph
> [cephosd1][WARNIN] command: Running command: /usr/bin/ceph-osd
> --check-wants-journal -i 0 --cluster ceph
> [cephosd1][WARNIN] command: Running command: /usr/bin/ceph-osd
> --check-needs-journal -i 0 --cluster ceph
> [cephosd1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdf uuid path is
> /sys/dev/block/8:80/dm/uuid
> [cephosd1][WARNIN] command: Running command: /usr/bin/ceph-osd
> --cluster=ceph --show-config-value=osd_journal_size
> [cephosd1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdf uuid path is
> /sys/dev/block/8:80/dm/uuid
> [cephosd1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdf uuid path is
> /sys/dev/block/8:80/dm/uuid
> [cephosd1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdf uuid path is
> /sys/dev/block/8:80/dm/uuid
> [cephosd1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdf1 uuid path is
> /sys/dev/block/8:81/dm/uuid
> [cephosd1][WARNIN] command: Running command: /usr/bin/ceph-conf
> --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
> [cephosd1][WARNIN] command: Running command:
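For anyone wanting to try the manual path Oliver mentions above (untested, as he says), the ceph-disk invocation that ceph-deploy runs in the quoted log can be issued directly on the OSD host, taking ceph-deploy's partprobe handling out of the picture. Device paths are from his example:

```
# Untested sketch: run ceph-disk by hand on the OSD host
/usr/sbin/ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdf /dev/md128
# Activate the data partition created by prepare
/usr/sbin/ceph-disk activate /dev/sdf1
```

Whether this sidesteps the partition-GUID mismatch he describes for md devices is exactly the untested part.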
Re: [ceph-users] hdparm SG_IO: bad/missing sense data LSI 3108
Hi Brad,

thank you very much for your answer. After digging quite a while, I found the information about updating the parted version on CentOS, from version 3.1 to version 3.2. This caused, in the very end, partprobe to start running cleanly through with ceph-deploy.

So I could leave the path of trying to manually create the osd's.

Have a nice Sunday!

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107

Am 12.06.2016 um 02:08 schrieb Brad Hubbard:
> On Sat, Jun 11, 2016 at 9:48 PM, Oliver Dzombic wrote:
>> Hi,
>>
>> ceph-osd fails because:
>>
>> # ceph-osd -i 0 --mkfs --mkkey --osd-journal /dev/sde1
>>
>> SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0d 00 00 00
>> 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> $ sg_decode_sense 70 00 05 00 00 00 00 0d 00 00 00 00 20 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Fixed format, current; Sense key: Illegal Request
> Additional sense: Invalid command operation code
>
> Which is clearly the "5" and "20" from this table:
>
> http://ohlandl.ipv7.net/errors/adv_diags_SCSI_Errors.html
>
> This looks like an error at at least the driver level.
>
> I'd be inclined to double-check the configuration, crank up SCSI debugging
> (which may identify which command it doesn't like) and run as many
> diagnostics as possible.
>
> HTH,
> Brad
>
>> 2016-06-11 15:37:50.776385 7f64c5f4a800 -1 journal check: ondisk fsid
>> -0000--- doesn't match expected
>> 564ad954-6ec3-4638-8cc0-ea1f331ad1f9, invalid (someone else's?) journal
>> SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0d 00 00 00
>> 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0d 00 00 00
>> 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 2016-06-11 15:37:50.791534 7f64c5f4a800 -1
>> filestore(/var/lib/ceph/osd/ceph-0) could not find
>> #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory
>> 2016-06-11 15:37:50.795717 7f64c5f4a800 -1 created object store
>> /var/lib/ceph/osd/ceph-0 for osd.0 fsid a8171427-141c-4766-9e0f-533d86dd4ef8
>> 2016-06-11 15:37:50.795754 7f64c5f4a800 -1 auth: error reading file:
>> /var/lib/ceph/osd/ceph-0/keyring: can't open
>> /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
>> 2016-06-11 15:37:50.795885 7f64c5f4a800 -1 created new key in keyring
>> /var/lib/ceph/osd/ceph-0/keyring
>>
>> ---
>>
>> # hdparm -i /dev/sde
>>
>> /dev/sde:
>> SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0d 00 00 00
>> 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> HDIO_GET_IDENTITY failed: Invalid argument
>>
>> That's connected to an LSI 3108 controller, exported as RAID 1.
>> I also have some RAID 10 and RAID 0 devices exported.
>>
>> Everywhere the same.
>>
>> Kernel is 3.10.0-327.18.2.el7.x86_64
>>
>> Can anyone tell me how to fix this?
>>
>> Thank you !
Re: [ceph-users] Help recovering failed cluster
> Current cluster health:
>    cluster 537a3e12-95d8-48c3-9e82-91abbfdf62e0
>     health HEALTH_WARN
>            5 pgs degraded
>            8 pgs down
>            48 pgs incomplete
>            3 pgs recovering
>            1 pgs recovery_wait
>            76 pgs stale
>            5 pgs stuck degraded
>            48 pgs stuck inactive
>            76 pgs stuck stale
>            53 pgs stuck unclean
>            5 pgs stuck undersized
>            5 pgs undersized

First I have to remark on you having 7 mons. Your cluster is very small - many clusters with hundreds of OSDs are happy with 5. At the Vancouver OpenStack summit there was a discussion re the number of mons; there was consensus that 5 is generally plenty and that with 7+ the traffic among them really starts being excessive. YMMV of course.

Assuming you have size on your pools set to 3 and min_size set to 2, this might be one of those times where temporarily setting min_size on the pools to 1 does the trick, or at least helps. I suspect in your case it wouldn't completely heal the cluster, but it might improve it and allow recovery to proceed. Later you'd revert to the usual setting, for obvious reasons.

— Anthony
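Anthony's min_size suggestion as commands - the pool name is a placeholder, and the second command reverts to the usual setting once recovery has made progress:

```
# Temporarily allow I/O and recovery with a single surviving replica
ceph osd pool set <pool> min_size 1

# ... let recovery proceed, then revert:
ceph osd pool set <pool> min_size 2
```

Running with min_size 1 means writes are acknowledged with only one copy on disk, which is why reverting promptly matters.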
Re: [ceph-users] Journal partition owner's not change to ceph
> The GUID for a CEPH journal partition should be
> "45B0969E-9B03-4F30-B4C6-B4B80CEFF106"
> I haven't been able to find this info in the documentation on the ceph site

The GUID typecodes are listed in the /usr/sbin/ceph-disk script.

I had an issue a couple of years ago where a subset of OSDs in one cluster would not start at boot, but if they were mounted and manually started they would run. It turned out that some goof who predated me had messed with the typecodes; correcting them with sgdisk restored them to normal behavior. That was Dumpling, FWIW.

— Anthony
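A sketch of the sgdisk fix Anthony describes - the device and partition number are illustrative, so check the typecode table in /usr/sbin/ceph-disk and your actual partition layout first:

```
# Stamp the journal-partition type GUID onto partition 1 of /dev/sdb
sgdisk --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb

# Ask the kernel to re-read the partition table
partprobe /dev/sdb
```

The udev rules shipped with ceph key off this type GUID to set ownership and trigger activation at boot, which is why a wrong typecode shows up as OSDs that only start when mounted by hand.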
[ceph-users] RGW pools type
Hello!

I have a question regarding RGW pool types: which pools can be Erasure Coded?
More exactly, I have the following pools:

.rgw.root (EC)
ed-1.rgw.control (EC)
ed-1.rgw.data.root (EC)
ed-1.rgw.gc (EC)
ed-1.rgw.intent-log (EC)
ed-1.rgw.buckets.data (EC)
ed-1.rgw.meta (EC)
ed-1.rgw.users.keys (REPL)
ed-1.rgw.users.email (REPL)
ed-1.rgw.users.uid (REPL)
ed-1.rgw.users.swift (REPL)
ed-1.rgw.users (REPL)
ed-1.rgw.log (REPL)
ed-1.rgw.buckets.index (REPL)
ed-1.rgw.buckets.non-ec (REPL)
ed-1.rgw.usage (REPL)

Is that ok?

Regards, Vasily