Re: [ceph-users] Change Partition Schema on OSD Possible?
> On 17 Jan 2017, at 05:31, Hauke Homburg wrote the > following: > > On 16.01.2017 at 12:24, Wido den Hollander wrote: >>> On 14 January 2017 at 14:58, Hauke Homburg wrote: >>> >>> >>> On 14.01.2017 at 12:59, Wido den Hollander wrote: > On 14 January 2017 at 11:05, Hauke Homburg > wrote: > > > Hello, > > In our Ceph cluster the HDDs in the OSDs are configured with GPT > partitions using only 50% for data. Can we change this schema to have more data > storage? > How do you mean? > Our HDDs are 5TB so I hope to have more space when I grow the GPT > partition from 2TB to 3 or 4 TB. > On a 5TB disk only 50% is used for data? What is the other 50% being used for? >>> I think for the journal. We worked with ceph-deploy and with >>> data-path:journal-path on a device. >> Hmm, that's weird. ceph-deploy uses a 5GB partition by default for the >> journal. >> >> Are you sure about that? Can you post a partition scheme of a disk and a 'df >> -h' output? > sgdisk -p /dev/sdg > Disk /dev/sdg: 11721045168 sectors, 5.5 TiB > Logical sector size: 512 bytes > Disk identifier (GUID): BFC047BB-75D7-4F18-B8A6-0C538454FA43 > Partition table holds up to 128 entries > First usable sector is 34, last usable sector is 11721045134 > Partitions will be aligned on 2048-sector boundaries > Total free space is 2014 sectors (1007.0 KiB) > > Number  Start (sector)  End (sector)  Size     Code  Name > 1  10487808  11721045134  5.5 TiB  ceph data > 2  2048  10487807  5.0 GiB  ceph journal > Looks good. 5GB journal and the rest for the OSD's data. Nothing wrong there. Wido >> >> Wido >> > Can we modify the partitions without reinstalling the Server? > Sure! Just like changing any other GPT partition. Don't forget to resize XFS afterwards with xfs_growfs. However, test this on one OSD/disk first before doing it on all. Wido > What's the best way to do this? Boot the node with a rescue CD and change > the partition with gparted, and boot the server again? > > Thanks for the help > > Regards > > Hauke > > -- > www.w3-creative.de > > www.westchat.de > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >>> -- >>> www.w3-creative.de >>> >>> www.westchat.de >>> > > > -- > www.w3-creative.de > > www.westchat.de > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
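If anyone does want to grow a "ceph data" partition in place as Wido suggests, a rough, untested sketch could look like the following; the device, partition number, start sector and OSD id N are placeholders taken from the sgdisk output above, and the OSD has to be stopped first:

systemctl stop ceph-osd@N                      # stop the OSD that lives on /dev/sdg (id N is a placeholder)
umount /var/lib/ceph/osd/ceph-N
sgdisk --delete=1 /dev/sdg                     # drop the old "ceph data" entry
sgdisk --new=1:10487808:0 /dev/sdg             # recreate it with the same start sector; end 0 = grow to the last usable sector
sgdisk --change-name=1:"ceph data" /dev/sdg    # restore the partition name (the ceph typecode may also need restoring with --typecode for udev activation)
partprobe /dev/sdg
mount /dev/sdg1 /var/lib/ceph/osd/ceph-N
xfs_growfs /var/lib/ceph/osd/ceph-N            # grow XFS into the new space, as Wido notes

As the thread shows, nothing actually needed changing here (the journal is only 5GB), and testing this on a single OSD/disk first is essential.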
Re: [ceph-users] CephFS
> On 17 Jan 2017, at 03:47, Tu Holmes wrote the > following: > > I could use either one. I'm just trying to get a feel for how stable the > technology is in general. Stable. Multiple customers of mine run it in production with the kernel client and serious load on it. No major problems. Wido >> On Mon, Jan 16, 2017 at 3:19 PM Sean Redmond wrote: >> What's your use case? Do you plan on using kernel or fuse clients? >> >> On 16 Jan 2017 23:03, "Tu Holmes" wrote: >> So what's the consensus on CephFS? >> >> Is it ready for prime time or not? >> >> //Tu >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
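For context on the two client options discussed in this thread, minimal mount examples look roughly like this; the monitor address, user, secret file and mount point are placeholders, not values from the thread:

mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret   # kernel client
ceph-fuse -m 192.168.0.1:6789 /mnt/cephfs                                                      # FUSE client (ceph-fuse)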
Re: [ceph-users] CephFS
I could use either one. I'm just trying to get a feel for how stable the technology is in general. On Mon, Jan 16, 2017 at 3:19 PM Sean Redmond wrote: > What's your use case? Do you plan on using kernel or fuse clients? > > On 16 Jan 2017 23:03, "Tu Holmes" wrote: > > So what's the consensus on CephFS? > > Is it ready for prime time or not? > > //Tu > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS
What's your use case? Do you plan on using kernel or fuse clients? On 16 Jan 2017 23:03, "Tu Holmes" wrote: > So what's the consensus on CephFS? > > Is it ready for prime time or not? > > //Tu > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Re: Re: Re: Pipe "deadlock" in Hammer, 0.94.5
On Sat, Jan 14, 2017 at 7:54 PM, 许雪寒 wrote: > Thanks for your help:-) > > I checked the source code again, and in read_message, it does hold the > Connection::lock: You're correct of course; I wasn't looking and forgot about this bit. This was added to deal with client-allocated buffers and/or op cancellation in librados, IIRC, and unfortunately definitely does need to be synchronized — I'm not sure about with pipe lookups, but probably even that. :/ Unfortunately it looks like you're running a version that didn't come from upstream (I see hash 81d4ad40d0c2a4b73529ff0db3c8f22acd15c398 in another email, which I can't find), so there's not much we can do to help with the specifics of this case — it's fiddly and my guess would be the same as Sage's, which you say is not the case. -Greg > > while (left > 0) { > // wait for data > if (tcp_read_wait() < 0) > goto out_dethrottle; > > // get a buffer > connection_state->lock.Lock(); > map >::iterator p = > > connection_state->rx_buffers.find(header.tid); > if (p != connection_state->rx_buffers.end()) { > if (rxbuf.length() == 0 || p->second.second > != rxbuf_version) { > ldout(msgr->cct,10) > << "reader > seleting rx buffer v " > > << p->second.second << " at offset " > > << offset << " len " > > << p->second.first.length() << dendl; > rxbuf = p->second.first; > rxbuf_version = p->second.second; > // make sure it's big enough > if (rxbuf.length() < data_len) > rxbuf.push_back( > > buffer::create(data_len - rxbuf.length())); > blp = p->second.first.begin(); > blp.advance(offset); > } > } else { > if (!newbuf.length()) { > ldout(msgr->cct,20) > << "reader > allocating new rx buffer at offset " > > << offset << dendl; > alloc_aligned_buffer(newbuf, > data_len, data_off); > blp = newbuf.begin(); > blp.advance(offset); > } > } > bufferptr bp = blp.get_current_ptr(); > int read = MIN(bp.length(), left); > ldout(msgr->cct,20) > << "reader reading > nonblocking into " > << (void*) > bp.c_str() << " len " << bp.length() > << dendl; > int got = tcp_read_nonblocking(bp.c_str(), read); > ldout(msgr->cct,30) > << "reader read " << got << " > of " << read << dendl; > connection_state->lock.Unlock(); > if (got < 0) > goto out_dethrottle; > if (got > 0) { > blp.advance(got); > data.append(bp, 0, got); > offset += got; > left -= got; > } // else we got a signal or something; just loop. > } > > As shown in the above code, in the reading loop, it first locks > connection_state->lock and then does tcp_read_nonblocking. connection_state is > of type PipeConnectionRef, connection_state->lock is Connection::lock. > > On the other hand, I'll check whether there are a lot of messages to send > as you suggested. Thanks:-) > > > > From: Gregory Farnum [gfar...@redhat.com] > > Sent: 14 January 2017 9:39 > > To: 许雪寒 > > Cc: jiajia zhong; ceph-users@lists.ceph.com > > Subject: Re: [ceph-users] Re: Re: Pipe "deadlock" in Hammer, 0.94.5 > > > > > > > > > > On Thu, Jan 12, 2017 at 7:58 PM, 许雪寒
[ceph-users] CephFS
So what's the consensus on CephFS? Is it ready for prime time or not? //Tu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mkfs.ext4 hang on RBD volume
Can you ensure that you have the "admin socket" configured for your librbd-backed VM so that you can do the following when you hit that condition: ceph --admin-daemon objecter_requests That will dump out any hung IO requests between librbd and the OSDs. I would also check your librbd logs to see if you are seeing an error like "heartbeat_map is_healthy 'tp_librbd thread tp_librbd' had timed out after 60" being logged periodically, which would indicate a thread deadlock within librbd. On Mon, Jan 16, 2017 at 1:12 PM, Vincent Godin wrote: > We are using librbd on a host with CentOS 7.2 via virtio-blk. This server > hosts the VMs on which we are doing our tests. But we have exactly the same > behaviour than #9071. We try to follow the thread to the bug 8818 but we > didn't reproduce the issue with a lot of DD. Each time we try with > mkfs.ext4, there is always one process over the 16 (we have 16 volumes) > which hangs ! > > 2017-01-16 17:45 GMT+01:00 Jason Dillaman : >> >> Are you using krbd directly within the VM or librbd via >> virtio-blk/scsi? Ticket #9071 is against krbd. >> >> On Mon, Jan 16, 2017 at 11:34 AM, Vincent Godin >> wrote: >> > In fact, we can reproduce the problem from VM with CentOS 6.7, 7.2 or >> > 7.3. >> > We can reproduce it each time with this config : one VM (here in CentOS >> > 6.7) >> > with 16 RBD volumes of 100GB attached. When we launch in serial >> > mkfs.ext4 on >> > each of these volumes, we allways encounter the problem on one of them. >> > We >> > tried with the option -E nodiscard but we still have the problem. It' >> > look >> > exactly like the bug #9071 with the same dmesg message : >> > >> > vdh: unknown partition table >> > EXT4-fs (vdf): mounted filesystem with ordered data mode. Opts: >> > EXT4-fs (vdg): mounted filesystem with ordered data mode. Opts: >> > INFO: task flush-252:112:2903 blocked for more than 120 seconds. >> > Not tainted 2.6.32-573.18.1.el6.x86_64 #1 >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this >> > message. >> > flush-252:112 D 0 2903 2 0x0080 >> > 8808328bf6e0 0046 8808 3d697f73 >> > 88082fbd7ec0 00021454 a78356ec >> > 2b9db4fe 81aa6700 88082efc9ad8 8808328bffd8 >> > Call Trace: >> > [] io_schedule+0x73/0xc0 >> > [] get_request_wait+0x108/0x1d0 >> > [] ? autoremove_wake_function+0x0/0x40 >> > [] blk_queue_bio+0x99/0x610 >> > [] generic_make_request+0x240/0x5a0 >> > [] ? mempool_alloc_slab+0x15/0x20 >> > [] ? mempool_alloc+0x63/0x140 >> > [] submit_bio+0x70/0x120 >> > [] submit_bh+0x11d/0x1f0 >> > [] __block_write_full_page+0x1c8/0x330 >> > [] ? end_buffer_async_write+0x0/0x190 >> > [] ? blkdev_get_block+0x0/0x20 >> > [] ? blkdev_get_block+0x0/0x20 >> > [] block_write_full_page_endio+0xe0/0x120 >> > [] ? find_get_pages_tag+0x40/0x130 >> > [] block_write_full_page+0x15/0x20 >> > [] blkdev_writepage+0x18/0x20 >> > [] __writepage+0x17/0x40 >> > [] write_cache_pages+0x1fd/0x4c0 >> > [] ? __writepage+0x0/0x40 >> > [] generic_writepages+0x24/0x30 >> > [] do_writepages+0x21/0x40 >> > [] writeback_single_inode+0xdd/0x290 >> > [] writeback_sb_inodes+0xbd/0x170 >> > [] writeback_inodes_wb+0xab/0x1b0 >> > [] wb_writeback+0x2f3/0x410 >> > [] wb_do_writeback+0xbb/0x240 >> > [] bdi_writeback_task+0x63/0x1b0 >> > [] ? bit_waitqueue+0x17/0xd0 >> > [] ? bdi_start_fn+0x0/0x100 >> > [] bdi_start_fn+0x86/0x100 >> > [] ? bdi_start_fn+0x0/0x100 >> > [] kthread+0x9e/0xc0 >> > [] child_rip+0xa/0x20 >> > [] ? kthread+0x0/0xc0 >> > [] ? child_rip+0x0/0x20 >> > INFO: task mkfs.ext4:3040 blocked for more than 120 seconds. 
>> > Not tainted 2.6.32-573.18.1.el6.x86_64 #1 >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this >> > message. >> > mkfs.ext4 D 0002 0 3040 3038 0x0080 >> > 88075e79f4d8 0082 8808 3d697f73 >> > 88082fb73130 00021472 a78356ec >> > 2b9db4fe 81aa6700 88082e787068 88075e79ffd8 >> > Call Trace: >> > [] io_schedule+0x73/0xc0 >> > [] get_request_wait+0x108/0x1d0 >> > [] ? autoremove_wake_function+0x0/0x40 >> > [] blk_queue_bio+0x99/0x610 >> > >> > Ceph version is Jewel 10.2.3 >> > Ceph clients, mons and servers have the kernel >> > 3.10.0-327.36.3.el7.x86_64 >> > on CentOS 7.2 >> > >> > 2017-01-13 20:07 GMT+01:00 Jason Dillaman : >> >> >> >> You might be hitting this issue [1] where mkfs is issuing lots of >> >> discard operations. If you get a chance, can you retest w/ the "-E >> >> nodiscard" option? >> >> >> >> Thanks >> >> >> >> [1] http://tracker.ceph.com/issues/16689 >> >> >> >> On Fri, Jan 13, 2017 at 12:57 PM, Vincent Godin >> >> wrote: >> >> > Thanks Jason, >> >> > >> >> > We observed a curious behavior : we have some VMs on CentOS 6.x >> >> > hosted >> >> > on >> >> > our Openstack computes which are
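For reference, Jason's admin-socket check suggested above would be invoked roughly like this; the socket path is an assumption and depends on the "admin socket" setting configured for the librbd-backed VM process:

ceph --admin-daemon /var/run/ceph/ceph-client.admin.<pid>.asok objecter_requests   # dumps any in-flight/hung requests between librbd and the OSDs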
Re: [ceph-users] mkfs.ext4 hang on RBD volume
We are using librbd on a host with CentOS 7.2 via virtio-blk. This server hosts the VMs on which we are doing our tests. But we have exactly the same behaviour than #9071. We try to follow the thread to the bug 8818 but we didn't reproduce the issue with a lot of DD. Each time we try with mkfs.ext4, there is always one process over the 16 (we have 16 volumes) which hangs ! 2017-01-16 17:45 GMT+01:00 Jason Dillaman : > Are you using krbd directly within the VM or librbd via > virtio-blk/scsi? Ticket #9071 is against krbd. > > On Mon, Jan 16, 2017 at 11:34 AM, Vincent Godin > wrote: > > In fact, we can reproduce the problem from VM with CentOS 6.7, 7.2 or > 7.3. > > We can reproduce it each time with this config : one VM (here in CentOS > 6.7) > > with 16 RBD volumes of 100GB attached. When we launch in serial > mkfs.ext4 on > > each of these volumes, we allways encounter the problem on one of them. > We > > tried with the option -E nodiscard but we still have the problem. It' > look > > exactly like the bug #9071 with the same dmesg message : > > > > vdh: unknown partition table > > EXT4-fs (vdf): mounted filesystem with ordered data mode. Opts: > > EXT4-fs (vdg): mounted filesystem with ordered data mode. Opts: > > INFO: task flush-252:112:2903 blocked for more than 120 seconds. > > Not tainted 2.6.32-573.18.1.el6.x86_64 #1 > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this > message. > > flush-252:112 D 0 2903 2 0x0080 > > 8808328bf6e0 0046 8808 3d697f73 > > 88082fbd7ec0 00021454 a78356ec > > 2b9db4fe 81aa6700 88082efc9ad8 8808328bffd8 > > Call Trace: > > [] io_schedule+0x73/0xc0 > > [] get_request_wait+0x108/0x1d0 > > [] ? autoremove_wake_function+0x0/0x40 > > [] blk_queue_bio+0x99/0x610 > > [] generic_make_request+0x240/0x5a0 > > [] ? mempool_alloc_slab+0x15/0x20 > > [] ? mempool_alloc+0x63/0x140 > > [] submit_bio+0x70/0x120 > > [] submit_bh+0x11d/0x1f0 > > [] __block_write_full_page+0x1c8/0x330 > > [] ? end_buffer_async_write+0x0/0x190 > > [] ? blkdev_get_block+0x0/0x20 > > [] ? blkdev_get_block+0x0/0x20 > > [] block_write_full_page_endio+0xe0/0x120 > > [] ? find_get_pages_tag+0x40/0x130 > > [] block_write_full_page+0x15/0x20 > > [] blkdev_writepage+0x18/0x20 > > [] __writepage+0x17/0x40 > > [] write_cache_pages+0x1fd/0x4c0 > > [] ? __writepage+0x0/0x40 > > [] generic_writepages+0x24/0x30 > > [] do_writepages+0x21/0x40 > > [] writeback_single_inode+0xdd/0x290 > > [] writeback_sb_inodes+0xbd/0x170 > > [] writeback_inodes_wb+0xab/0x1b0 > > [] wb_writeback+0x2f3/0x410 > > [] wb_do_writeback+0xbb/0x240 > > [] bdi_writeback_task+0x63/0x1b0 > > [] ? bit_waitqueue+0x17/0xd0 > > [] ? bdi_start_fn+0x0/0x100 > > [] bdi_start_fn+0x86/0x100 > > [] ? bdi_start_fn+0x0/0x100 > > [] kthread+0x9e/0xc0 > > [] child_rip+0xa/0x20 > > [] ? kthread+0x0/0xc0 > > [] ? child_rip+0x0/0x20 > > INFO: task mkfs.ext4:3040 blocked for more than 120 seconds. > > Not tainted 2.6.32-573.18.1.el6.x86_64 #1 > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this > message. > > mkfs.ext4 D 0002 0 3040 3038 0x0080 > > 88075e79f4d8 0082 8808 3d697f73 > > 88082fb73130 00021472 a78356ec > > 2b9db4fe 81aa6700 88082e787068 88075e79ffd8 > > Call Trace: > > [] io_schedule+0x73/0xc0 > > [] get_request_wait+0x108/0x1d0 > > [] ? 
autoremove_wake_function+0x0/0x40 > > [] blk_queue_bio+0x99/0x610 > > > > Ceph version is Jewel 10.2.3 > > Ceph clients, mons and servers have the kernel > 3.10.0-327.36.3.el7.x86_64 > > on CentOS 7.2 > > > > 2017-01-13 20:07 GMT+01:00 Jason Dillaman : > >> > >> You might be hitting this issue [1] where mkfs is issuing lots of > >> discard operations. If you get a chance, can you retest w/ the "-E > >> nodiscard" option? > >> > >> Thanks > >> > >> [1] http://tracker.ceph.com/issues/16689 > >> > >> On Fri, Jan 13, 2017 at 12:57 PM, Vincent Godin > >> wrote: > >> > Thanks Jason, > >> > > >> > We observed a curious behavior : we have some VMs on CentOS 6.x hosted > >> > on > >> > our Openstack computes which are in CentOS 7.2. If we try to make a > >> > mkfs.ext4 on a volume create with the Jewel default (61) on the VM > it's > >> > hung > >> > and we have to reboot the VM to get a responsive system. This is > strange > >> > because the libvirt process is launched from the host which is in > CentOS > >> > 7.2. If a disable some features, the mkfs.ext4 succeed. If the VM is > in > >> > CentOS 7.x, there is no probleme at all. Maybe the kernel of the > CentOS > >> > 6.X > >> > is unable to use the exclusive-lock feature ? > >> > I think we will have to stay in a very conservative > rbd_default_features > >> > such 1 because we don't use stripping and the others features are not > >> > compa
Re: [ceph-users] ceph.com outages
Ignore that last post. After another try or 2 I got to the new site with the updates as described. Looks great! On 1/16/17, 9:12 AM, "ceph-devel-ow...@vger.kernel.org on behalf of McFarland, Bruce" wrote: >Patrick, >I’m probably overlooking something, but when I follow the ceph days link >there are no 2017 events only past. The cephalocon link goes to a 404 page >not found. > >Bruce > >On 1/16/17, 7:03 AM, "ceph-devel-ow...@vger.kernel.org on behalf of >Patrick McGarry" pmcga...@redhat.com> wrote: > >>Ok, the new website should be up and functional. Shout if you see >>anything that is still broken. >> >>As for the site itself, I'd like to highlight a few things worth checking >>out: >> >>* Ceph Days -- The first two Ceph Days have been posted, as well as >>the historical events for all of last year. >>http://ceph.com/cephdays/ >> >>* Cephalocon -- Our inaugural conference is happening in Aug. If you >>would like to know more, or are interested in sponsoring, please take >>a look. >>http://ceph.com/cephalocon2017/ >> >>* Resources -- A whole host of new resources have been aggregated on >>the new resources page. This includes links to use cases, performance, >>an updated pgcalc tool, ceph tech talks, publications and others. >>http://ceph.com/resources/ >> >>* Featured Developers -- This section will now feature a community >>developer for each of our named releases, starting with Kraken. >>Congratulations to Haomai on being our first featured developer! >>http://ceph.com/community/featured-developers/ >> >>* Logos -- Now we have reasonably-hig-res logos that you can use for >>your websites, presentations, and other assets. If you need something >>not featured here, drop us a line. >>http://ceph.com/logos/ >> >> >>Those may be the high points, but definitely take a few minutes to >>crawl through the site if you get a chance, lots of goodies await you. >>As always, if you see something amiss, please let me know. Thanks! >> >> >>On Mon, Jan 16, 2017 at 9:07 AM, Patrick McGarry >>wrote: >>> Hey cephers, >>> >>> Please bear with us as we migrate ceph.com as there may be some >>> outages. They should be quick and over soon. Thanks! >>> >>> >>> -- >>> >>> Best Regards, >>> >>> Patrick McGarry >>> Director Ceph Community || Red Hat >>> http://ceph.com || http://community.redhat.com >>> @scuttlemonkey || @ceph >> >> >> >>-- >> >>Best Regards, >> >>Patrick McGarry >>Director Ceph Community || Red Hat >>http://ceph.com || http://community.redhat.com >>@scuttlemonkey || @ceph >>-- >>To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>the body of a message to majord...@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph.com outages
Patrick, I’m probably overlooking something, but when I follow the ceph days link there are no 2017 events only past. The cephalocon link goes to a 404 page not found. Bruce On 1/16/17, 7:03 AM, "ceph-devel-ow...@vger.kernel.org on behalf of Patrick McGarry" wrote: >Ok, the new website should be up and functional. Shout if you see >anything that is still broken. > >As for the site itself, I'd like to highlight a few things worth checking >out: > >* Ceph Days -- The first two Ceph Days have been posted, as well as >the historical events for all of last year. >http://ceph.com/cephdays/ > >* Cephalocon -- Our inaugural conference is happening in Aug. If you >would like to know more, or are interested in sponsoring, please take >a look. >http://ceph.com/cephalocon2017/ > >* Resources -- A whole host of new resources have been aggregated on >the new resources page. This includes links to use cases, performance, >an updated pgcalc tool, ceph tech talks, publications and others. >http://ceph.com/resources/ > >* Featured Developers -- This section will now feature a community >developer for each of our named releases, starting with Kraken. >Congratulations to Haomai on being our first featured developer! >http://ceph.com/community/featured-developers/ > >* Logos -- Now we have reasonably-hig-res logos that you can use for >your websites, presentations, and other assets. If you need something >not featured here, drop us a line. >http://ceph.com/logos/ > > >Those may be the high points, but definitely take a few minutes to >crawl through the site if you get a chance, lots of goodies await you. >As always, if you see something amiss, please let me know. Thanks! > > >On Mon, Jan 16, 2017 at 9:07 AM, Patrick McGarry >wrote: >> Hey cephers, >> >> Please bear with us as we migrate ceph.com as there may be some >> outages. They should be quick and over soon. Thanks! >> >> >> -- >> >> Best Regards, >> >> Patrick McGarry >> Director Ceph Community || Red Hat >> http://ceph.com || http://community.redhat.com >> @scuttlemonkey || @ceph > > > >-- > >Best Regards, > >Patrick McGarry >Director Ceph Community || Red Hat >http://ceph.com || http://community.redhat.com >@scuttlemonkey || @ceph >-- >To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >the body of a message to majord...@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mkfs.ext4 hang on RBD volume
Are you using krbd directly within the VM or librbd via virtio-blk/scsi? Ticket #9071 is against krbd. On Mon, Jan 16, 2017 at 11:34 AM, Vincent Godin wrote: > In fact, we can reproduce the problem from VM with CentOS 6.7, 7.2 or 7.3. > We can reproduce it each time with this config : one VM (here in CentOS 6.7) > with 16 RBD volumes of 100GB attached. When we launch in serial mkfs.ext4 on > each of these volumes, we allways encounter the problem on one of them. We > tried with the option -E nodiscard but we still have the problem. It' look > exactly like the bug #9071 with the same dmesg message : > > vdh: unknown partition table > EXT4-fs (vdf): mounted filesystem with ordered data mode. Opts: > EXT4-fs (vdg): mounted filesystem with ordered data mode. Opts: > INFO: task flush-252:112:2903 blocked for more than 120 seconds. > Not tainted 2.6.32-573.18.1.el6.x86_64 #1 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > flush-252:112 D 0 2903 2 0x0080 > 8808328bf6e0 0046 8808 3d697f73 > 88082fbd7ec0 00021454 a78356ec > 2b9db4fe 81aa6700 88082efc9ad8 8808328bffd8 > Call Trace: > [] io_schedule+0x73/0xc0 > [] get_request_wait+0x108/0x1d0 > [] ? autoremove_wake_function+0x0/0x40 > [] blk_queue_bio+0x99/0x610 > [] generic_make_request+0x240/0x5a0 > [] ? mempool_alloc_slab+0x15/0x20 > [] ? mempool_alloc+0x63/0x140 > [] submit_bio+0x70/0x120 > [] submit_bh+0x11d/0x1f0 > [] __block_write_full_page+0x1c8/0x330 > [] ? end_buffer_async_write+0x0/0x190 > [] ? blkdev_get_block+0x0/0x20 > [] ? blkdev_get_block+0x0/0x20 > [] block_write_full_page_endio+0xe0/0x120 > [] ? find_get_pages_tag+0x40/0x130 > [] block_write_full_page+0x15/0x20 > [] blkdev_writepage+0x18/0x20 > [] __writepage+0x17/0x40 > [] write_cache_pages+0x1fd/0x4c0 > [] ? __writepage+0x0/0x40 > [] generic_writepages+0x24/0x30 > [] do_writepages+0x21/0x40 > [] writeback_single_inode+0xdd/0x290 > [] writeback_sb_inodes+0xbd/0x170 > [] writeback_inodes_wb+0xab/0x1b0 > [] wb_writeback+0x2f3/0x410 > [] wb_do_writeback+0xbb/0x240 > [] bdi_writeback_task+0x63/0x1b0 > [] ? bit_waitqueue+0x17/0xd0 > [] ? bdi_start_fn+0x0/0x100 > [] bdi_start_fn+0x86/0x100 > [] ? bdi_start_fn+0x0/0x100 > [] kthread+0x9e/0xc0 > [] child_rip+0xa/0x20 > [] ? kthread+0x0/0xc0 > [] ? child_rip+0x0/0x20 > INFO: task mkfs.ext4:3040 blocked for more than 120 seconds. > Not tainted 2.6.32-573.18.1.el6.x86_64 #1 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > mkfs.ext4 D 0002 0 3040 3038 0x0080 > 88075e79f4d8 0082 8808 3d697f73 > 88082fb73130 00021472 a78356ec > 2b9db4fe 81aa6700 88082e787068 88075e79ffd8 > Call Trace: > [] io_schedule+0x73/0xc0 > [] get_request_wait+0x108/0x1d0 > [] ? autoremove_wake_function+0x0/0x40 > [] blk_queue_bio+0x99/0x610 > > Ceph version is Jewel 10.2.3 > Ceph clients, mons and servers have the kernel 3.10.0-327.36.3.el7.x86_64 > on CentOS 7.2 > > 2017-01-13 20:07 GMT+01:00 Jason Dillaman : >> >> You might be hitting this issue [1] where mkfs is issuing lots of >> discard operations. If you get a chance, can you retest w/ the "-E >> nodiscard" option? >> >> Thanks >> >> [1] http://tracker.ceph.com/issues/16689 >> >> On Fri, Jan 13, 2017 at 12:57 PM, Vincent Godin >> wrote: >> > Thanks Jason, >> > >> > We observed a curious behavior : we have some VMs on CentOS 6.x hosted >> > on >> > our Openstack computes which are in CentOS 7.2. 
If we try to make a >> > mkfs.ext4 on a volume create with the Jewel default (61) on the VM it's >> > hung >> > and we have to reboot the VM to get a responsive system. This is strange >> > because the libvirt process is launched from the host which is in CentOS >> > 7.2. If a disable some features, the mkfs.ext4 succeed. If the VM is in >> > CentOS 7.x, there is no probleme at all. Maybe the kernel of the CentOS >> > 6.X >> > is unable to use the exclusive-lock feature ? >> > I think we will have to stay in a very conservative rbd_default_features >> > such 1 because we don't use stripping and the others features are not >> > compatible with our old CentOS 6.x VMs .. >> > >> > A last question : is the rbd object-map rebuild a long process ? in an >> > other >> > way, does it cost the same time as a delete (which read all the blocks >> > possible for an image without omap feature). Is it a good idea to enable >> > omap feature on an already used image ? (I know that during the rebuild >> > process, the VM will have to be stopped) >> > >> > >> > >> > 2017-01-13 15:09 GMT+01:00 Jason Dillaman : >> >> >> >> On Fri, Jan 13, 2017 at 5:11 AM, Vincent Godin >> >> wrote: >> >> > We are using a production cluster which started in Firefly, then >> >> > moved >> >> > to >> >> > Giant,
Re: [ceph-users] mkfs.ext4 hang on RBD volume
In fact, we can reproduce the problem from VM with CentOS 6.7, 7.2 or 7.3. We can reproduce it each time with this config : one VM (here in CentOS 6.7) with 16 RBD volumes of 100GB attached. When we launch in serial mkfs.ext4 on each of these volumes, we allways encounter the problem on one of them. We tried with the option -E nodiscard but we still have the problem. It' look exactly like the bug #9071 with the same dmesg message : vdh: unknown partition table EXT4-fs (vdf): mounted filesystem with ordered data mode. Opts: EXT4-fs (vdg): mounted filesystem with ordered data mode. Opts: INFO: task flush-252:112:2903 blocked for more than 120 seconds. Not tainted 2.6.32-573.18.1.el6.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. flush-252:112 D 0 2903 2 0x0080 8808328bf6e0 0046 8808 3d697f73 88082fbd7ec0 00021454 a78356ec 2b9db4fe 81aa6700 88082efc9ad8 8808328bffd8 Call Trace: [] io_schedule+0x73/0xc0 [] get_request_wait+0x108/0x1d0 [] ? autoremove_wake_function+0x0/0x40 [] blk_queue_bio+0x99/0x610 [] generic_make_request+0x240/0x5a0 [] ? mempool_alloc_slab+0x15/0x20 [] ? mempool_alloc+0x63/0x140 [] submit_bio+0x70/0x120 [] submit_bh+0x11d/0x1f0 [] __block_write_full_page+0x1c8/0x330 [] ? end_buffer_async_write+0x0/0x190 [] ? blkdev_get_block+0x0/0x20 [] ? blkdev_get_block+0x0/0x20 [] block_write_full_page_endio+0xe0/0x120 [] ? find_get_pages_tag+0x40/0x130 [] block_write_full_page+0x15/0x20 [] blkdev_writepage+0x18/0x20 [] __writepage+0x17/0x40 [] write_cache_pages+0x1fd/0x4c0 [] ? __writepage+0x0/0x40 [] generic_writepages+0x24/0x30 [] do_writepages+0x21/0x40 [] writeback_single_inode+0xdd/0x290 [] writeback_sb_inodes+0xbd/0x170 [] writeback_inodes_wb+0xab/0x1b0 [] wb_writeback+0x2f3/0x410 [] wb_do_writeback+0xbb/0x240 [] bdi_writeback_task+0x63/0x1b0 [] ? bit_waitqueue+0x17/0xd0 [] ? bdi_start_fn+0x0/0x100 [] bdi_start_fn+0x86/0x100 [] ? bdi_start_fn+0x0/0x100 [] kthread+0x9e/0xc0 [] child_rip+0xa/0x20 [] ? kthread+0x0/0xc0 [] ? child_rip+0x0/0x20 INFO: task mkfs.ext4:3040 blocked for more than 120 seconds. Not tainted 2.6.32-573.18.1.el6.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mkfs.ext4 D 0002 0 3040 3038 0x0080 88075e79f4d8 0082 8808 3d697f73 88082fb73130 00021472 a78356ec 2b9db4fe 81aa6700 88082e787068 88075e79ffd8 Call Trace: [] io_schedule+0x73/0xc0 [] get_request_wait+0x108/0x1d0 [] ? autoremove_wake_function+0x0/0x40 [] blk_queue_bio+0x99/0x610 Ceph version is Jewel 10.2.3 Ceph clients, mons and servers have the kernel 3.10.0-327.36.3.el7.x86_64 on CentOS 7.2 2017-01-13 20:07 GMT+01:00 Jason Dillaman : > You might be hitting this issue [1] where mkfs is issuing lots of > discard operations. If you get a chance, can you retest w/ the "-E > nodiscard" option? > > Thanks > > [1] http://tracker.ceph.com/issues/16689 > > On Fri, Jan 13, 2017 at 12:57 PM, Vincent Godin > wrote: > > Thanks Jason, > > > > We observed a curious behavior : we have some VMs on CentOS 6.x hosted on > > our Openstack computes which are in CentOS 7.2. If we try to make a > > mkfs.ext4 on a volume create with the Jewel default (61) on the VM it's > hung > > and we have to reboot the VM to get a responsive system. This is strange > > because the libvirt process is launched from the host which is in CentOS > > 7.2. If a disable some features, the mkfs.ext4 succeed. If the VM is in > > CentOS 7.x, there is no probleme at all. Maybe the kernel of the CentOS > 6.X > > is unable to use the exclusive-lock feature ? 
> > I think we will have to stay in a very conservative rbd_default_features > > such 1 because we don't use stripping and the others features are not > > compatible with our old CentOS 6.x VMs .. > > > > A last question : is the rbd object-map rebuild a long process ? in an > other > > way, does it cost the same time as a delete (which read all the blocks > > possible for an image without omap feature). Is it a good idea to enable > > omap feature on an already used image ? (I know that during the rebuild > > process, the VM will have to be stopped) > > > > > > > > 2017-01-13 15:09 GMT+01:00 Jason Dillaman : > >> > >> On Fri, Jan 13, 2017 at 5:11 AM, Vincent Godin > >> wrote: > >> > We are using a production cluster which started in Firefly, then moved > >> > to > >> > Giant, Hammer and finally Jewel. So our images have different features > >> > correspondind to the value of "rbd_default_features" of the version > when > >> > they were created. > >> > We have actually three pack of features activated : > >> > image with : > >> > - layering ~ 1 > >> > - layering, striping ~3 > >> > - layering, exclusive-lock, object-map, fast-diff, de
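As a side note on the feature values discussed in this thread, the rbd feature bits are layering=1, striping=2, exclusive-lock=4, object-map=8, fast-diff=16 and deep-flatten=32, so the Jewel default 61 = layering + exclusive-lock + object-map + fast-diff + deep-flatten, and the "conservative" value 1 is layering only. A hedged sketch of disabling the newer features on an existing image, or enabling and rebuilding the object map later ("rbd/myimage" is a placeholder):

rbd feature disable rbd/myimage deep-flatten fast-diff object-map exclusive-lock
# or, going the other way on an already used image:
rbd feature enable rbd/myimage exclusive-lock
rbd feature enable rbd/myimage object-map fast-diff
rbd object-map rebuild rbd/myimage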
[ceph-users] Ceph.com
The site looks great! Good job! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] librbd cache and clone awareness
On Mon, Jan 16, 2017 at 10:11 AM Jason Dillaman wrote: > On Sun, Jan 15, 2017 at 2:56 PM, Shawn Edwards > wrote: > > If I, say, have 10 rbd attached to the same box using librbd, all 10 of > the > > rbd are clones of the same snapshot, and I have caching turned on, will > each > > rbd be caching blocks from the parent snapshot individually, or will the > 10 > > rbd processes be building up their own cache, ignorant of the other > > process's caching? > > Each process will utilize its own, independent cache for the parent > image. There is no interprocess coordination to share the cache bits. > > That's what I figured. A shared, interprocess cache would be Hard, and not have many use cases. > > The use case I have is a lot of rbd-nbd mounted rbd used as VM boot disks > > all based on the same clone. It would be great if the parent clone > could be > > cached once and then only blocks which were copy-on-write were in each > > individual process's cache. > > If you are worried about the footprint, you could disable the use of > the cache within just the parent image by running "rbd metadata set > conf_rbd_cache false" against the parent image. Since you > are using a VM on top of librbd, the librbd cache is effectively a L2 > cache so you probably aren't gaining much by caching reads for the > parent image since the VM will most likely cache the reads as well. > > Ah, interesting. I'll give that a shot and see if that helps. Thanks for the suggestion. > -- > Jason > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] librbd cache and clone awareness
On Sun, Jan 15, 2017 at 2:56 PM, Shawn Edwards wrote: > If I, say, have 10 rbd attached to the same box using librbd, all 10 of the > rbd are clones of the same snapshot, and I have caching turned on, will each > rbd be caching blocks from the parent snapshot individually, or will the 10 > rbd processes be building up their own cache, ignorant of the other > process's caching? Each process will utilize its own, independent cache for the parent image. There is no interprocess coordination to share the cache bits. > The use case I have is a lot of rbd-nbd mounted rbd used as VM boot disks > all based on the same clone. It would be great if the parent clone could be > cached once and then only blocks which were copy-on-write were in each > individual process's cache. If you are worried about the footprint, you could disable the use of the cache within just the parent image by running "rbd metadata set conf_rbd_cache false" against the parent image. Since you are using a VM on top of librbd, the librbd cache is effectively a L2 cache so you probably aren't gaining much by caching reads for the parent image since the VM will most likely cache the reads as well. -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
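For completeness, the per-image override Jason describes takes the image spec as well; "rbd/parent-image" below is a placeholder for the actual parent image:

rbd metadata set rbd/parent-image conf_rbd_cache false
rbd metadata list rbd/parent-image    # verify the override is in place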
Re: [ceph-users] ceph.com outages
Hello, On 16/01/2017 at 16:03, Patrick McGarry wrote: > Ok, the new website should be up and functional. Shout if you see > anything that is still broken. Minor typos: "It replicates and re-balance data within the cluster dynamically—elminating this tedious task" -> re-balances -> eliminating http://ceph.com/ceph-storage/ > > As for the site itself, I'd like to highlight a few things worth checking out: > > * Ceph Days -- The first two Ceph Days have been posted, as well as > the historical events for all of last year. > http://ceph.com/cephdays/ > > * Cephalocon -- Our inaugural conference is happening in Aug. If you > would like to know more, or are interested in sponsoring, please take > a look. > http://ceph.com/cephalocon2017/ > > * Resources -- A whole host of new resources have been aggregated on > the new resources page. This includes links to use cases, performance, > an updated pgcalc tool, ceph tech talks, publications and others. > http://ceph.com/resources/ > > * Featured Developers -- This section will now feature a community > developer for each of our named releases, starting with Kraken. > Congratulations to Haomai on being our first featured developer! > http://ceph.com/community/featured-developers/ > > * Logos -- Now we have reasonably-hig-res logos that you can use for > your websites, presentations, and other assets. If you need something > not featured here, drop us a line. > http://ceph.com/logos/ > > > Those may be the high points, but definitely take a few minutes to > crawl through the site if you get a chance, lots of goodies await you. > As always, if you see something amiss, please let me know. Thanks! > > > On Mon, Jan 16, 2017 at 9:07 AM, Patrick McGarry wrote: >> Hey cephers, >> >> Please bear with us as we migrate ceph.com as there may be some >> outages. They should be quick and over soon. Thanks! >> >> >> -- >> >> Best Regards, >> >> Patrick McGarry >> Director Ceph Community || Red Hat >> http://ceph.com || http://community.redhat.com >> @scuttlemonkey || @ceph > > > -- Loris Cuoghi ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph.com outages
FYI, our ipv6 is lagging a bit behind ipv4 (and the red hat nameservers may take a bit to catch up), so you may see the old site for just a little bit longer. On Mon, Jan 16, 2017 at 10:03 AM, Patrick McGarry wrote: > Ok, the new website should be up and functional. Shout if you see > anything that is still broken. > > As for the site itself, I'd like to highlight a few things worth checking out: > > * Ceph Days -- The first two Ceph Days have been posted, as well as > the historical events for all of last year. > http://ceph.com/cephdays/ > > * Cephalocon -- Our inaugural conference is happening in Aug. If you > would like to know more, or are interested in sponsoring, please take > a look. > http://ceph.com/cephalocon2017/ > > * Resources -- A whole host of new resources have been aggregated on > the new resources page. This includes links to use cases, performance, > an updated pgcalc tool, ceph tech talks, publications and others. > http://ceph.com/resources/ > > * Featured Developers -- This section will now feature a community > developer for each of our named releases, starting with Kraken. > Congratulations to Haomai on being our first featured developer! > http://ceph.com/community/featured-developers/ > > * Logos -- Now we have reasonably-hig-res logos that you can use for > your websites, presentations, and other assets. If you need something > not featured here, drop us a line. > http://ceph.com/logos/ > > > Those may be the high points, but definitely take a few minutes to > crawl through the site if you get a chance, lots of goodies await you. > As always, if you see something amiss, please let me know. Thanks! > > > On Mon, Jan 16, 2017 at 9:07 AM, Patrick McGarry wrote: >> Hey cephers, >> >> Please bear with us as we migrate ceph.com as there may be some >> outages. They should be quick and over soon. Thanks! >> >> >> -- >> >> Best Regards, >> >> Patrick McGarry >> Director Ceph Community || Red Hat >> http://ceph.com || http://community.redhat.com >> @scuttlemonkey || @ceph > > > > -- > > Best Regards, > > Patrick McGarry > Director Ceph Community || Red Hat > http://ceph.com || http://community.redhat.com > @scuttlemonkey || @ceph -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph.com outages
Ok, the new website should be up and functional. Shout if you see anything that is still broken. As for the site itself, I'd like to highlight a few things worth checking out: * Ceph Days -- The first two Ceph Days have been posted, as well as the historical events for all of last year. http://ceph.com/cephdays/ * Cephalocon -- Our inaugural conference is happening in Aug. If you would like to know more, or are interested in sponsoring, please take a look. http://ceph.com/cephalocon2017/ * Resources -- A whole host of new resources have been aggregated on the new resources page. This includes links to use cases, performance, an updated pgcalc tool, ceph tech talks, publications and others. http://ceph.com/resources/ * Featured Developers -- This section will now feature a community developer for each of our named releases, starting with Kraken. Congratulations to Haomai on being our first featured developer! http://ceph.com/community/featured-developers/ * Logos -- Now we have reasonably-high-res logos that you can use for your websites, presentations, and other assets. If you need something not featured here, drop us a line. http://ceph.com/logos/ Those may be the high points, but definitely take a few minutes to crawl through the site if you get a chance, lots of goodies await you. As always, if you see something amiss, please let me know. Thanks! On Mon, Jan 16, 2017 at 9:07 AM, Patrick McGarry wrote: > Hey cephers, > > Please bear with us as we migrate ceph.com as there may be some > outages. They should be quick and over soon. Thanks! > > > -- > > Best Regards, > > Patrick McGarry > Director Ceph Community || Red Hat > http://ceph.com || http://community.redhat.com > @scuttlemonkey || @ceph -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Monitoring
On Mon, Jan 16, 2017 at 3:54 PM, Andre Forigato wrote: > Hello Marius Vaitiekunas, Chris Jones, > > Thank you for your contributions. > I was looking for this information. > > I'm starting to use Ceph, and my concern is about monitoring. > > Do you have any scripts for this monitoring? > If you can help me. I will be very grateful to you. > > (Excuse me if there is misinterpretation) > > Best Regards, > André Forigato > > Try prometheus exporter for monitoring: https://github.com/digitalocean/ceph_exporter And inkscope is a nice tool for management tasks - https://github.com/inkscope/inkscope -- Marius Vaitiekūnas ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
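A rough sketch of getting the DigitalOcean exporter running and checking that it serves metrics; the container flags and the port (9128) are assumptions, so check the project README rather than taking this verbatim:

docker run -d --net=host -v /etc/ceph:/etc/ceph digitalocean/ceph_exporter   # needs read access to ceph.conf and a client keyring
curl -s http://localhost:9128/metrics | head                                 # Prometheus-format metrics, prefixed with ceph_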
Re: [ceph-users] How to fix: HEALTH_ERR 45 pgs are stuck inactive for more than 300 seconds; 19 pgs degraded; 45 pgs stuck inactive; 19 pgs stuck unclean; 19 pgs undersized; recovery 2514/5028 objects
Give this a try: ceph osd set noout On Jan 16, 2017 9:08 AM, "Stéphane Klein" wrote: > I see my mistake: > > ``` > osdmap e57: 2 osds: 1 up, 1 in; 64 remapped pgs > flags sortbitwise,require_jewel_osds > ``` > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to fix: HEALTH_ERR 45 pgs are stuck inactive for more than 300 seconds; 19 pgs degraded; 45 pgs stuck inactive; 19 pgs stuck unclean; 19 pgs undersized; recovery 2514/5028 objects
I see my mistake: ``` osdmap e57: 2 osds: 1 up, 1 in; 64 remapped pgs flags sortbitwise,require_jewel_osds ``` ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph.com outages
Hey cephers, Please bear with us as we migrate ceph.com as there may be some outages. They should be quick and over soon. Thanks! -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Monitoring
Hello Marius Vaitiekunas, Chris Jones, Thank you for your contributions. I was looking for this information. I'm starting to use Ceph, and my concern is about monitoring. Do you have any scripts for this monitoring? If you can help me, I will be very grateful. (Excuse me if there is misinterpretation) Best Regards, André Forigato - Original Message - > From: "Marius Vaitiekunas" > To: "Chris Jones" , ceph-us...@ceph.com > Sent: Sunday, 15 January 2017 19:26:05 > Subject: Re: [ceph-users] Ceph Monitoring > On Fri, 13 Jan 2017 at 22:15, Chris Jones < cjo...@cloudm2.com > wrote: >> General question/survey: >> Those that have larger clusters, how are you doing alerting/monitoring? >> Meaning, >> do you trigger off of 'HEALTH_WARN', etc? Not really talking about collectd >> related but more on initial alerts of an issue or potential issue? What >> threshold do you use basically? Just trying to get a pulse of what others are >> doing. >> Thanks in advance. >> -- >> Best Regards, >> Chris Jones >> Bloomberg >> Hi, >> We monitor for 'low iops'. The number differs on our clusters. For example >> if we >> have only 3000 iops per second, there is something wrong going on. >> Another good check is for s3 api. We try to read an object from s3 api every >> 30 >> seconds. >> Also we have many checks like more than 10% osds are down, pg inactive, >> cluster >> has degraded capacity and similar. Some of these checks are not critical >> and >> we get only emails. >> One more important thing is disk latency monitoring. We've had huge >> slowdowns on >> our cluster when journalling ssd disks wear out. It's quite hard to >> understand >> what's going on, because all osds are up and running, but cluster is not >> performing at all. >> Network.errors on interfaces could be important. We had some issues, when >> physical cable was malfunctioning and cluster had many blocks. > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to fix: HEALTH_ERR 45 pgs are stuck inactive for more than 300 seconds; 19 pgs degraded; 45 pgs stuck inactive; 19 pgs stuck unclean; 19 pgs undersized; recovery 2514/5028 objects
2017-01-16 12:24 GMT+01:00 Loris Cuoghi : > Hello, > > Le 16/01/2017 à 11:50, Stéphane Klein a écrit : > >> Hi, >> >> I have two OSD and Mon nodes. >> >> I'm going to add third osd and mon on this cluster but before I want to >> fix this error: >> > > > > [SNIP SNAP] > > You've just created your cluster. > > With the standard CRUSH rules you need one OSD on three different hosts > for an active+clean cluster. > > With this parameters: ``` # cat /etc/ceph/ceph.conf [global] mon initial members = ceph-rbx-1,ceph-rbx-2 cluster network = 172.29.20.0/24 mon host = 172.29.20.10,172.29.20.11 osd_pool_default_size = 2 osd_pool_default_min_size = 1 public network = 172.29.20.0/24 max open files = 131072 fsid = [client.libvirt] admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor [osd] osd mkfs options xfs = -f -i size=2048 osd mkfs type = xfs osd journal size = 5120 osd mount options xfs = noatime,largeio,inode64,swalloc ``` ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] All SSD cluster performance
Hi Kees, Assuming 3 replicas and collocated journals, each RBD write will trigger 6 SSD writes (excluding FS overhead and occasional re-balance). Intel has 4 tiers of Data center SATA SSD (other manufacturers may have fewer): - S31xx: ~0.1 DWPD (rated over 3 years): Very read intensive - S35xx: ~1 DWPD: Read intensive - S36xx: ~3 DWPD: Mixed workloads - S37xx: ~10 DWPD: Write intensive (DWPD = Drive Writes Per Day) For example a cluster of 90x 960GB S3520 has a write endurance of 26.25 PB, so around 14 TB/day. IMO the S3610 (maybe soon the S3620 :D) is a good enough middle of the road option if you don’t know the write volume of the RBD backed VMs. Then after a few months in production you can use the SMART data and re-evaluate. I cannot highlight enough how important it is to monitor the SSD wear level. Cheers, Maxime On 16/01/17 11:36, "ceph-users on behalf of Kees Meijs" wrote: Hi Maxime, Given your remark below, what kind of SATA SSD do you recommend for OSD usage? Thanks! Regards, Kees On 15-01-17 21:33, Maxime Guyot wrote: > I don’t have firsthand experience with the S3520, as Christian pointed out their endurance doesn’t make them suitable for OSDs in most cases. I can only advise you to keep a close eye on the SMART status of the SSDs. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
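To make the wear-level monitoring and the budgeting above concrete, a hedged example; the SMART attribute names are the ones Intel DC SSDs usually expose and can differ per model/firmware, and the arithmetic simply reuses the figures quoted above:

smartctl -A /dev/sdX | egrep 'Media_Wearout_Indicator|Total_LBAs_Written'   # wear level and lifetime writes per drive
# ~14 TB/day of raw drive-write budget divided by the ~6x write amplification:
echo '14 / 6' | bc -l                                                       # roughly 2.3 TB/day of client (RBD) writes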
Re: [ceph-users] How to update osd pool default size at runtime?
2017-01-16 12:47 GMT+01:00 Jay Linux : > Hello Stephane, > > Try this . > > $ceph osd pool get size -->> it will prompt the " > osd_pool_default_size " > $ceph osd pool get min_size-->> it will prompt the " > osd_pool_default_min_size " > > if you want to change in runtime, trigger below command > > $ceph osd pool set size > $ceph osd pool set min_size > > Ok thanks, it's work: # ceph osd pool get rbd size size: 2 # ceph osd pool get rbd min_size min_size: 1 It's possible to add a href in the doc to redirect user in the good documentation section? Best regards, Stéphane ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to update osd pool default size at runtime?
Hello Stephane, Try this . $ceph osd pool get size -->> it will prompt the " osd_pool_default_size " $ceph osd pool get min_size-->> it will prompt the " osd_pool_default_min_size " if you want to change in runtime, trigger below command $ceph osd pool set size $ceph osd pool set min_size Cheers Jay On Mon, Jan 16, 2017 at 4:39 PM, Stéphane Klein wrote: > In documentation I read here: http://docs.ceph.com/docs/ > master/rados/troubleshooting/troubleshooting-pg/?highlight= > stuck%20inactive#fewer-osds-than-replicas > > « You can make the changes at runtime. If you make the changes in your > Ceph configuration file, you may need to restart your cluster. » > > but documentation don't explain how to make the changes at runtime. > > Best regards, > Stéphane > -- > Stéphane Klein > blog: http://stephane-klein.info > cv : http://cv.stephane-klein.info > Twitter: http://twitter.com/klein_stephane > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
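If the goal is really to change the default applied to newly created pools at runtime (rather than an existing pool), injecting the options into the monitors is a hedged alternative; note it does not persist across restarts unless ceph.conf is also updated:

ceph tell mon.* injectargs '--osd_pool_default_size 2 --osd_pool_default_min_size 1'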
Re: [ceph-users] How to fix: HEALTH_ERR 45 pgs are stuck inactive for more than 300 seconds; 19 pgs degraded; 45 pgs stuck inactive; 19 pgs stuck unclean; 19 pgs undersized; recovery 2514/5028 objects
Hello, On 16/01/2017 at 11:50, Stéphane Klein wrote: Hi, I have two OSD and Mon nodes. I'm going to add a third osd and mon on this cluster but before I want to fix this error: > > [SNIP SNAP] You've just created your cluster. With the standard CRUSH rules you need one OSD on three different hosts for an active+clean cluster. You've created two OSDs already, are they on different hosts? Also, your monitors. Keep in mind that an odd number of monitors is the only good choice. So, 1, 3 or 5. The reason for this? I'm sure you've already encountered this paragraph from the documentation: "Ceph uses the Paxos algorithm, which requires a majority of monitors (i.e., 1, 2:3, 3:4, 3:5, 4:6, etc.) to form a quorum." http://docs.ceph.com/docs/master/start/quick-ceph-deploy/#adding-monitors But that's all in the docs, happy reading! http://docs.ceph.com/docs/master/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
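Two quick checks along these lines (the pool name "rbd" is the default one used in the original post):

ceph osd tree                    # shows how many hosts actually carry up/in OSDs
ceph osd pool get rbd size       # replica count the CRUSH rule has to satisfy
ceph osd pool get rbd min_size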
[ceph-users] How to update osd pool default size at runtime?
In documentation I read here: http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/?highlight=stuck%20inactive#fewer-osds-than-replicas « You can make the changes at runtime. If you make the changes in your Ceph configuration file, you may need to restart your cluster. » but documentation don't explain how to make the changes at runtime. Best regards, Stéphane -- Stéphane Klein blog: http://stephane-klein.info cv : http://cv.stephane-klein.info Twitter: http://twitter.com/klein_stephane ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] How to fix: HEALTH_ERR 45 pgs are stuck inactive for more than 300 seconds; 19 pgs degraded; 45 pgs stuck inactive; 19 pgs stuck unclean; 19 pgs undersized; recovery 2514/5028 objects deg
Hi, I have two OSD and Mon nodes. I'm going to add third osd and mon on this cluster but before I want to fix this error: ``` # ceph -s cluster 8461e3b5-abda-4471-98c0-913e56aec890 health HEALTH_WARN 64 pgs degraded 64 pgs stuck unclean 64 pgs undersized recovery 8261/16522 objects degraded (50.000%) monmap e1: 2 mons at {ceph-rbx-1= 172.29.20.10:6789/0,ceph-rbx-2=172.29.20.11:6789/0} election epoch 22, quorum 0,1 ceph-rbx-1,ceph-rbx-2 osdmap e57: 2 osds: 1 up, 1 in; 64 remapped pgs flags sortbitwise,require_jewel_osds pgmap v784695: 64 pgs, 1 pools, 31719 MB data, 8261 objects 31539 MB used, 65692 MB / 97231 MB avail 8261/16522 objects degraded (50.000%) 64 active+undersized+degraded client io 22038 B/s wr, 0 op/s rd, 0 op/s wr ``` I have executed this command: ``` # ceph pg ls degraded | tail -n +2 | awk '{print $1}' | xargs -n 1 ceph pg force_create_pg ``` after which I have: ``` # ceph health HEALTH_ERR 45 pgs are stuck inactive for more than 300 seconds; 19 pgs degraded; 45 pgs stuck inactive; 19 pgs stuck unclean; 19 pgs undersized; recovery 2514/5028 objects degraded (50.000%) ``` If I look the pg detail like explain here http://docs.ceph.com/docs/infernalis/rados/troubleshooting/troubleshooting-pg/#placement-group-down-peering-failure I have: ``` # ceph pg 0.1 query { "state": "active+undersized+degraded", "snap_trimq": "[]", "epoch": 57, "up": [ 1 ], "acting": [ 1 ], "actingbackfill": [ "1" ], "info": { "pgid": "0.1", "last_update": "57'32353", "last_complete": "57'32353", "log_tail": "42'25917", "last_user_version": 32353, "last_backfill": "MAX", "last_backfill_bitwise": 0, "purged_snaps": "[1~3]", "history": { "epoch_created": 1, "last_epoch_started": 52, "last_epoch_clean": 52, "last_epoch_split": 0, "last_epoch_marked_full": 0, "same_up_since": 51, "same_interval_since": 51, "same_primary_since": 34, "last_scrub": "50'28863", "last_scrub_stamp": "2017-01-14 07:12:27.930427", "last_deep_scrub": "42'23417", "last_deep_scrub_stamp": "2017-01-10 20:31:12.351497", "last_clean_scrub_stamp": "2017-01-14 07:12:27.930427" }, "stats": { "version": "57'32353", "reported_seq": "31704", "reported_epoch": "57", "state": "active+undersized+degraded", "last_fresh": "2017-01-16 10:47:07.330850", "last_change": "2017-01-14 13:42:42.104820", "last_active": "2017-01-16 10:47:07.330850", "last_peered": "2017-01-16 10:47:07.330850", "last_clean": "2017-01-14 11:29:21.619183", "last_became_active": "2017-01-14 13:42:42.104820", "last_became_peered": "2017-01-14 13:42:42.104820", "last_unstale": "2017-01-16 10:47:07.330850", "last_undegraded": "2017-01-14 13:42:41.066061", "last_fullsized": "2017-01-14 13:42:41.066061", "mapping_epoch": 37, "log_start": "42'25917", "ondisk_log_start": "42'25917", "created": 1, "last_epoch_clean": 52, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "50'28863", "last_scrub_stamp": "2017-01-14 07:12:27.930427", "last_deep_scrub": "42'23417", "last_deep_scrub_stamp": "2017-01-10 20:31:12.351497", "last_clean_scrub_stamp": "2017-01-14 07:12:27.930427", "log_size": 6436, "ondisk_log_size": 6436, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "stat_sum": { "num_bytes": 567734272, "num_objects": 140, "num_object_clones": 0, "num_object_copies": 280, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 140, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 140, "num_whiteouts": 0, 
"num_read": 5801, "num_read_kb": 176032, "num_write": 64516, "num_write_kb": 1211660, "num_scrub_errors": 0, "num_shallow_scrub_errors": 0, "num_deep_scrub_errors": 0, "num_objects_recovered": 2, "num_bytes_recovered": 8388608, "nu
Re: [ceph-users] All SSD cluster performance
Hi Maxime, Given your remark below, what kind of SATA SSD do you recommend for OSD usage? Thanks! Regards, Kees On 15-01-17 21:33, Maxime Guyot wrote: > I don’t have firsthand experience with the S3520, as Christian pointed out > their endurance doesn’t make them suitable for OSDs in most cases. I can only > advise you to keep a close eye on the SMART status of the SSDs. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] unable to do regionmap update
Hi Orit Executing period update resolved issue. Thanks for help. Kind regards, Marko On 1/15/17 08:53, Orit Wasserman wrote: On Wed, Jan 11, 2017 at 2:53 PM, Marko Stojanovic wrote: Hello all, I have issue with radosgw-admin regionmap update . It doesn't update map. With zone configured like this: radosgw-admin zone get { "id": "fc12ac44-e27e-44e3-9b13-347162d3c1d2", "name": "oak-1", "domain_root": "oak-1.rgw.data.root", "control_pool": "oak-1.rgw.control", "gc_pool": "oak-1.rgw.gc", "log_pool": "oak-1.rgw.log", "intent_log_pool": "oak-1.rgw.intent-log", "usage_log_pool": "oak-1.rgw.usage", "user_keys_pool": "oak-1.rgw.users.keys", "user_email_pool": "oak-1.rgw.users.email", "user_swift_pool": "oak-1.rgw.users.swift", "user_uid_pool": "oak-1.rgw.users.uid", "system_key": { "access_key": "XX", "secret_key": "XX" }, "placement_pools": [ { "key": "default-placement", "val": { "index_pool": "oak-1.rgw.buckets.index", "data_pool": "oak-1.rgw.buckets.data", "data_extra_pool": "oak-1.rgw.buckets.non-ec", "index_type": 0 } }, { "key": "ssd-placement", "val": { "index_pool": "oak-1.rgw.buckets.index-ssd", "data_pool": "oak-1.rgw.buckets.data-ssd", "data_extra_pool": "oak-1.rgw.buckets.non-ec-ssd", "index_type": 0 } } ], "metadata_heap": "oak-1.rgw.meta", "realm_id": "67e26f6b-4774-4b14-9668-a5cf76b9e9ce" } And region radosgw-admin region get { "id": "dbec3557-87bb-4460-8546-b59b4fde7e10", "name": "oak", "api_name": "oak", "is_master": "true", "endpoints": [], "hostnames": [], "hostnames_s3website": [], "master_zone": "fc12ac44-e27e-44e3-9b13-347162d3c1d2", "zones": [ { "id": "fc12ac44-e27e-44e3-9b13-347162d3c1d2", "name": "oak-1", "endpoints": [ "http:\/\/ceph1.oak.vast.com:7480" ], "log_meta": "true", "log_data": "false", "bucket_index_max_shards": 0, "read_only": "false" } ], "placement_targets": [ { "name": "default-placement", "tags": [ "default-placement" ] }, { "name": "ssd-placement", "tags": [ "ssd-placement" ] } ], "default_placement": "default-placement", "realm_id": "67e26f6b-4774-4b14-9668-a5cf76b9e9ce" When I run radosgw-admin regionmap update I don't get ssd-placement as placement_target: { "zonegroups": [ { "key": "dbec3557-87bb-4460-8546-b59b4fde7e10", "val": { "id": "dbec3557-87bb-4460-8546-b59b4fde7e10", "name": "oak", "api_name": "oak", "is_master": "true", "endpoints": [], "hostnames": [], "hostnames_s3website": [], "master_zone": "fc12ac44-e27e-44e3-9b13-347162d3c1d2", "zones": [ { "id": "fc12ac44-e27e-44e3-9b13-347162d3c1d2", "name": "oak-1", "endpoints": [ "http:\/\/ceph1.oak.vast.com:7480" ], "log_meta": "true", "log_data": "false", "bucket_index_max_shards": 0, "read_only": "false" } ], "placement_targets": [ { "name": "default-placement", "tags": [] } ], "default_placement": "default-placement", "realm_id": "67e26f6b-4774-4b14-9668-a5cf76b9e9ce" } } ], "master_zonegroup": "dbec3557-87bb-4460-8546-b59b4fde7e10", "bucket_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 }, "user_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 } } Ceph version is: ceph --version ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367) Any advises? First I recommend using zonegroup for jewel as region was renamed to zonegroup. How did you create/update the zones and zonegroup? Did you executed period update? Orit Thanks in advance Marko Stojanovic ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users
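For readers hitting the same thing, the Jewel-style sequence after editing zones/zonegroups is roughly the following; the zonegroup and zone names are the ones from the output above:

radosgw-admin period update --commit
radosgw-admin zonegroup get --rgw-zonegroup=oak
radosgw-admin zone get --rgw-zone=oak-1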