Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9
Nikhil R wrote on Fri, Mar 29, 2019 at 1:44 PM:
> If I comment out filestore_split_multiple = 72 and filestore_merge_threshold = 480
> in ceph.conf, won't Ceph fall back to the default values of 2 and 10, leaving us
> with even more splits and crashes?

Yes. The aim was to make clear which operation causes the long start time: leveldb compaction, or filestore splitting?

> in.linkedin.com/in/nikhilravindra
>
> On Fri, Mar 29, 2019 at 6:55 AM huang jun wrote:
>>
>> It seems like the split settings caused the problem.
>> What about commenting out those settings, then seeing whether it still
>> takes that long to restart?
>> A quick search of the code shows these two
>> filestore_split_multiple = 72
>> filestore_merge_threshold = 480
>> don't support online change.
>>
>> Nikhil R wrote on Thu, Mar 28, 2019 at 6:33 PM:
>> >
>> > Thanks huang for the reply.
>> > It is the disk compaction taking more time;
>> > the disk I/O is completely utilized, up to 100%.
>> > It looks like both osd_compact_leveldb_on_mount = false and
>> > leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9.
>> > Is there a way to turn off compaction?
>> >
>> > Also, the reason why we are restarting OSDs is due to splitting: we
>> > increased the split multiple and merge threshold.
>> > Is there a way we could inject it? Are OSD restarts the only solution?
>> >
>> > Thanks in advance
>> >
>> > in.linkedin.com/in/nikhilravindra
>> >
>> > On Thu, Mar 28, 2019 at 3:58 PM huang jun wrote:
>> >>
>> >> Is the time really spent on the db compact operation?
>> >> You can turn on debug_osd=20 to see what happens.
>> >> What about the disk util during start?
>> >>
>> >> Nikhil R wrote on Thu, Mar 28, 2019 at 4:36 PM:
>> >> >
>> >> > CEPH OSD restarts are taking too long a time.
>> >> > Below is my ceph.conf:
>> >> > [osd]
>> >> > osd_compact_leveldb_on_mount = false
>> >> > leveldb_compact_on_mount = false
>> >> > leveldb_cache_size = 1073741824
>> >> > leveldb_compression = false
>> >> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
>> >> > osd_max_backfills = 1
>> >> > osd_recovery_max_active = 1
>> >> > osd_recovery_op_priority = 1
>> >> > filestore_split_multiple = 72
>> >> > filestore_merge_threshold = 480
>> >> > osd_max_scrubs = 1
>> >> > osd_scrub_begin_hour = 22
>> >> > osd_scrub_end_hour = 3
>> >> > osd_deep_scrub_interval = 2419200
>> >> > osd_scrub_sleep = 0.1
>> >> >
>> >> > It looks like both osd_compact_leveldb_on_mount = false and
>> >> > leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9.
>> >> >
>> >> > Any ideas on a fix would be appreciated asap.
>> >> > in.linkedin.com/in/nikhilravindra
>> >>
>> >> --
>> >> Thank you!
>> >> HuangJun

--
Thank you!
HuangJun
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
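The trade-off Nikhil is worried about can be made concrete. A minimal sketch, assuming the commonly documented FileStore rule that a PG subdirectory splits once it exceeds filestore_split_multiple * abs(filestore_merge_threshold) * 16 files (verify against your Ceph version's source before relying on it):

```python
# Hedged sketch of the FileStore directory-split rule discussed above.
# The formula is the commonly documented one; confirm it in your release.

def split_threshold(split_multiple: int, merge_threshold: int) -> int:
    """Max files a PG subdirectory may hold before FileStore splits it."""
    return split_multiple * abs(merge_threshold) * 16

default_limit = split_threshold(2, 10)    # defaults: 320 files per dir
tuned_limit = split_threshold(72, 480)    # this cluster: 552,960 files per dir

print(default_limit, tuned_limit)
```

This is why reverting to the defaults would trigger a storm of splitting: directories already holding tens of thousands of objects would all be far over the 320-file default limit on the next restart.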
Re: [ceph-users] Bluestore WAL/DB decisions
On Fri, 29 Mar 2019 01:22:06 -0400 Erik McCormick wrote:
> Hello all,
>
> Having dug through the documentation and read mailing list threads
> until my eyes rolled back in my head, I am left with a conundrum
> still: do I separate the DB / WAL or not?

You apparently didn't find this thread; the most significant post is linked here, but read it all:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033799.html

In short, a 30GB DB (and thus WAL) partition should do the trick for many use cases, and will still be better than nothing.

Christian

> I had a bunch of nodes running filestore with 8 x 8TB spinning OSDs
> and 2 x 240 GB SSDs. I had put the OS on the first SSD, and then split
> the journals on the remaining SSD space.
>
> My initial minimal understanding of Bluestore was that one should
> stick the DB and WAL on an SSD, and if it filled up it would just
> spill back onto the OSD itself where it otherwise would have been
> anyway.
>
> So now I start digging and see that the minimum recommended size is 4%
> of OSD size. For me that's ~2.6 TB of SSD. Clearly I do not have that
> available to me.
>
> I've also read that it's not so much the data size that matters but
> the number of objects and their size. Just looking at my current usage
> and extrapolating that to my maximum capacity, I get to ~1.44 million
> objects / OSD.
>
> So the question is, do I:
>
> 1) Put everything on the OSD and forget the SSDs exist.
>
> 2) Put just the WAL on the SSDs.
>
> 3) Put the DB (and therefore the WAL) on SSD, ignore the size
> recommendations, and just give each as much space as I can. Maybe 48GB
> / OSD.
>
> 4) Some scenario I haven't considered.
>
> Is the penalty for a too-small DB on an SSD partition so severe that
> it's not worth doing?
>
> Thanks,
> Erik

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Rakuten Communications
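The arithmetic behind both numbers in this exchange, as a sketch using the node layout from Erik's post (8 x 8TB OSDs and 2 x 240GB SSDs per node; the 30GB figure is Christian's suggestion from the linked thread):

```python
# Back-of-envelope DB sizing for the node layout described above.
osd_size_gb = 8000
num_osds = 8

# The 4% guideline Erik found in the docs:
recommended_db_gb = 0.04 * osd_size_gb                     # 320 GB per OSD
total_recommended_tb = recommended_db_gb * num_osds / 1000 # ~2.56 TB per node

# Christian's practical suggestion of ~30 GB per DB partition:
per_osd_db_gb = 30
ssd_needed_gb = per_osd_db_gb * num_osds                   # 240 GB per node

print(recommended_db_gb, total_recommended_tb, ssd_needed_gb)
```

The point of the comparison: the 4% guideline needs more SSD than the whole node has, while 30GB per OSD fits in roughly one of the two existing 240GB SSDs, leaving the other for the OS.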
Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9
If I comment out filestore_split_multiple = 72 and filestore_merge_threshold = 480 in ceph.conf, won't Ceph fall back to the default values of 2 and 10, leaving us with even more splits and crashes?

in.linkedin.com/in/nikhilravindra

On Fri, Mar 29, 2019 at 6:55 AM huang jun wrote:
> It seems like the split settings caused the problem.
> What about commenting out those settings, then seeing whether it still
> takes that long to restart?
> A quick search of the code shows these two
> filestore_split_multiple = 72
> filestore_merge_threshold = 480
> don't support online change.
>
> Nikhil R wrote on Thu, Mar 28, 2019 at 6:33 PM:
> >
> > Thanks huang for the reply.
> > It is the disk compaction taking more time;
> > the disk I/O is completely utilized, up to 100%.
> > It looks like both osd_compact_leveldb_on_mount = false and
> > leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9.
> > Is there a way to turn off compaction?
> >
> > Also, the reason why we are restarting OSDs is due to splitting: we
> > increased the split multiple and merge threshold.
> > Is there a way we could inject it? Are OSD restarts the only solution?
> >
> > Thanks in advance
> >
> > in.linkedin.com/in/nikhilravindra
> >
> > On Thu, Mar 28, 2019 at 3:58 PM huang jun wrote:
> >>
> >> Is the time really spent on the db compact operation?
> >> You can turn on debug_osd=20 to see what happens.
> >> What about the disk util during start?
> >>
> >> Nikhil R wrote on Thu, Mar 28, 2019 at 4:36 PM:
> >> >
> >> > CEPH OSD restarts are taking too long a time.
> >> > Below is my ceph.conf:
> >> > [osd]
> >> > osd_compact_leveldb_on_mount = false
> >> > leveldb_compact_on_mount = false
> >> > leveldb_cache_size = 1073741824
> >> > leveldb_compression = false
> >> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
> >> > osd_max_backfills = 1
> >> > osd_recovery_max_active = 1
> >> > osd_recovery_op_priority = 1
> >> > filestore_split_multiple = 72
> >> > filestore_merge_threshold = 480
> >> > osd_max_scrubs = 1
> >> > osd_scrub_begin_hour = 22
> >> > osd_scrub_end_hour = 3
> >> > osd_deep_scrub_interval = 2419200
> >> > osd_scrub_sleep = 0.1
> >> >
> >> > It looks like both osd_compact_leveldb_on_mount = false and
> >> > leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9.
> >> >
> >> > Any ideas on a fix would be appreciated asap.
> >> > in.linkedin.com/in/nikhilravindra
> >>
> >> --
> >> Thank you!
> >> HuangJun
>
> --
> Thank you!
> HuangJun
[ceph-users] Bluestore WAL/DB decisions
Hello all,

Having dug through the documentation and read mailing list threads until my eyes rolled back in my head, I am left with a conundrum still: do I separate the DB / WAL or not?

I had a bunch of nodes running filestore with 8 x 8TB spinning OSDs and 2 x 240 GB SSDs. I had put the OS on the first SSD, and then split the journals on the remaining SSD space.

My initial minimal understanding of Bluestore was that one should stick the DB and WAL on an SSD, and if it filled up it would just spill back onto the OSD itself where it otherwise would have been anyway.

So now I start digging and see that the minimum recommended size is 4% of OSD size. For me that's ~2.6 TB of SSD. Clearly I do not have that available to me.

I've also read that it's not so much the data size that matters but the number of objects and their size. Just looking at my current usage and extrapolating that to my maximum capacity, I get to ~1.44 million objects / OSD.

So the question is, do I:

1) Put everything on the OSD and forget the SSDs exist.

2) Put just the WAL on the SSDs.

3) Put the DB (and therefore the WAL) on SSD, ignore the size recommendations, and just give each as much space as I can. Maybe 48GB / OSD.

4) Some scenario I haven't considered.

Is the penalty for a too-small DB on an SSD partition so severe that it's not worth doing?

Thanks,
Erik
Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9
It seems like the split settings caused the problem. What about commenting out those settings, then seeing whether it still takes that long to restart? A quick search of the code shows these two

filestore_split_multiple = 72
filestore_merge_threshold = 480

don't support online change.

Nikhil R wrote on Thu, Mar 28, 2019 at 6:33 PM:
>
> Thanks huang for the reply.
> It is the disk compaction taking more time;
> the disk I/O is completely utilized, up to 100%.
> It looks like both osd_compact_leveldb_on_mount = false and
> leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9.
> Is there a way to turn off compaction?
>
> Also, the reason why we are restarting OSDs is due to splitting: we
> increased the split multiple and merge threshold.
> Is there a way we could inject it? Are OSD restarts the only solution?
>
> Thanks in advance
>
> in.linkedin.com/in/nikhilravindra
>
> On Thu, Mar 28, 2019 at 3:58 PM huang jun wrote:
>>
>> Is the time really spent on the db compact operation?
>> You can turn on debug_osd=20 to see what happens.
>> What about the disk util during start?
>>
>> Nikhil R wrote on Thu, Mar 28, 2019 at 4:36 PM:
>> >
>> > CEPH OSD restarts are taking too long a time.
>> > Below is my ceph.conf:
>> > [osd]
>> > osd_compact_leveldb_on_mount = false
>> > leveldb_compact_on_mount = false
>> > leveldb_cache_size = 1073741824
>> > leveldb_compression = false
>> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
>> > osd_max_backfills = 1
>> > osd_recovery_max_active = 1
>> > osd_recovery_op_priority = 1
>> > filestore_split_multiple = 72
>> > filestore_merge_threshold = 480
>> > osd_max_scrubs = 1
>> > osd_scrub_begin_hour = 22
>> > osd_scrub_end_hour = 3
>> > osd_deep_scrub_interval = 2419200
>> > osd_scrub_sleep = 0.1
>> >
>> > It looks like both osd_compact_leveldb_on_mount = false and
>> > leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9.
>> >
>> > Any ideas on a fix would be appreciated asap.
>> > in.linkedin.com/in/nikhilravindra

--
Thank you!
HuangJun
Re: [ceph-users] scrub errors
On Fri, Mar 29, 2019 at 7:54 AM solarflow99 wrote:
>
> ok, I tried doing ceph osd out on each of the 4 OSDs one by one. I got it out of
> backfill mode but still not sure if it'll fix anything. pg 10.2a still shows
> state active+clean+inconsistent. Peer 8 is now
> remapped+inconsistent+peering, and the other peer is active+clean+inconsistent.

Per the document I linked previously, if a pg remains remapped you likely have a problem with your configuration. Take a good look at your crushmap, pg distribution, pool configuration, etc.

> On Wed, Mar 27, 2019 at 4:13 PM Brad Hubbard wrote:
>>
>> On Thu, Mar 28, 2019 at 8:33 AM solarflow99 wrote:
>> >
>> > yes, but nothing seems to happen. I don't understand why it lists OSD 7
>> > in the "recovery_state" when I'm only using 3 replicas and it seems to
>> > use 41,38,8.
>>
>> Well, osd 8's state is listed as
>> "active+undersized+degraded+remapped+wait_backfill", so it seems to be
>> stuck waiting for backfill for some reason. One thing you could try is
>> restarting all of the osds, including 7 and 17, to see if forcing them
>> to peer again has any positive effect. Don't restart them all at once,
>> just one at a time, waiting until each has peered before moving on.
>> >> > >> > # ceph health detail >> > HEALTH_ERR 1 pgs inconsistent; 47 scrub errors >> > pg 10.2a is active+clean+inconsistent, acting [41,38,8] >> > 47 scrub errors >> > >> > >> > >> > As you can see all OSDs are up and in: >> > >> > # ceph osd stat >> > osdmap e23265: 49 osds: 49 up, 49 in >> > >> > >> > >> > >> > And this just stays the same: >> > >> > "up": [ >> > 41, >> > 38, >> > 8 >> > ], >> > "acting": [ >> > 41, >> > 38, >> > 8 >> > >> > "recovery_state": [ >> > { >> > "name": "Started\/Primary\/Active", >> > "enter_time": "2018-09-22 07:07:48.637248", >> > "might_have_unfound": [ >> > { >> > "osd": "7", >> > "status": "not queried" >> > }, >> > { >> > "osd": "8", >> > "status": "already probed" >> > }, >> > { >> > "osd": "17", >> > "status": "not queried" >> > }, >> > { >> > "osd": "38", >> > "status": "already probed" >> > } >> > ], >> > >> > >> > On Tue, Mar 26, 2019 at 4:53 PM Brad Hubbard wrote: >> >> >> >> http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/ >> >> >> >> Did you try repairing the pg? >> >> >> >> >> >> On Tue, Mar 26, 2019 at 9:08 AM solarflow99 wrote: >> >> > >> >> > yes, I know its old. I intend to have it replaced but thats a few >> >> > months away and was hoping to get past this. the other OSDs appear to >> >> > be ok, I see them up and in, why do you see something wrong? >> >> > >> >> > On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard >> >> > wrote: >> >> >> >> >> >> Hammer is no longer supported. >> >> >> >> >> >> What's the status of osds 7 and 17? >> >> >> >> >> >> On Tue, Mar 26, 2019 at 8:56 AM solarflow99 >> >> >> wrote: >> >> >> > >> >> >> > hi, thanks. Its still using Hammer. Here's the output from the pg >> >> >> > query, the last command you gave doesn't work at all but be too old. 
>> >> >> > >> >> >> > >> >> >> > # ceph pg 10.2a query >> >> >> > { >> >> >> > "state": "active+clean+inconsistent", >> >> >> > "snap_trimq": "[]", >> >> >> > "epoch": 23265, >> >> >> > "up": [ >> >> >> > 41, >> >> >> > 38, >> >> >> > 8 >> >> >> > ], >> >> >> > "acting": [ >> >> >> > 41, >> >> >> > 38, >> >> >> > 8 >> >> >> > ], >> >> >> > "actingbackfill": [ >> >> >> > "8", >> >> >> > "38", >> >> >> > "41" >> >> >> > ], >> >> >> > "info": { >> >> >> > "pgid": "10.2a", >> >> >> > "last_update": "23265'20886859", >> >> >> > "last_complete": "23265'20886859", >> >> >> > "log_tail": "23265'20883809", >> >> >> > "last_user_version": 20886859, >> >> >> > "last_backfill": "MAX", >> >> >> > "purged_snaps": "[]", >> >> >> > "history": { >> >> >> > "epoch_created": 8200, >> >> >> > "last_epoch_started": 21481, >> >> >> > "last_epoch_clean": 21487, >> >> >> > "last_epoch_split": 0, >> >> >> > "same_up_since": 21472, >> >> >> > "same_interval_since": 21474, >> >> >> > "same_primary_since": 8244, >> >> >> > "last_scrub": "23265'20864209", >> >> >> > "last_scrub_stamp": "2019-03-22 22:39:13.930673", >> >> >> > "last_deep_scrub": "23265'20864209", >> >> >> > "last_deep_scrub_stamp": "2019-03-22 22:39:13.930673", >> >> >> > "last_clea
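Before running `ceph pg repair` on a pg like 10.2a, it can help to see which replica actually holds the bad copies. A minimal sketch of triaging the output of `rados list-inconsistent-obj <pgid> --format=json`; the JSON document and error names below are illustrative assumptions, not output from this cluster, so check your own dump before acting on it:

```python
import json

# Hypothetical sample shaped like `rados list-inconsistent-obj` output.
SAMPLE = json.loads("""
{
  "epoch": 23265,
  "inconsistents": [
    {"object": {"name": "obj1"},
     "errors": ["data_digest_mismatch"],
     "shards": [{"osd": 41, "errors": []},
                {"osd": 38, "errors": []},
                {"osd": 8,  "errors": ["data_digest_mismatch_oi"]}]}
  ]
}
""")

def bad_shards(report):
    """Map each inconsistent object's name to the OSDs whose shard
    reported a per-shard error (the likely bad copies)."""
    out = {}
    for item in report["inconsistents"]:
        osds = [s["osd"] for s in item["shards"] if s["errors"]]
        if osds:
            out[item["object"]["name"]] = osds
    return out

print(bad_shards(SAMPLE))
```

Knowing which shard is bad matters because repair copies data between replicas; if the primary itself holds the bad copy, a blind repair can propagate it on very old releases such as Hammer.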
Re: [ceph-users] scrub errors
ok, I tried doing ceph osd out on each of the 4 OSDs one by one. I got it out of backfill mode, but I'm still not sure if it'll fix anything. pg 10.2a still shows state active+clean+inconsistent. Peer 8 is now remapped+inconsistent+peering, and the other peer is active+clean+inconsistent.

On Wed, Mar 27, 2019 at 4:13 PM Brad Hubbard wrote:
> On Thu, Mar 28, 2019 at 8:33 AM solarflow99 wrote:
> >
> > yes, but nothing seems to happen. I don't understand why it lists OSD 7
> > in the "recovery_state" when I'm only using 3 replicas and it seems to
> > use 41,38,8.
>
> Well, osd 8's state is listed as
> "active+undersized+degraded+remapped+wait_backfill", so it seems to be
> stuck waiting for backfill for some reason. One thing you could try is
> restarting all of the osds, including 7 and 17, to see if forcing them
> to peer again has any positive effect. Don't restart them all at once,
> just one at a time, waiting until each has peered before moving on.
>
> > # ceph health detail
> > HEALTH_ERR 1 pgs inconsistent; 47 scrub errors
> > pg 10.2a is active+clean+inconsistent, acting [41,38,8]
> > 47 scrub errors
> >
> > As you can see all OSDs are up and in:
> >
> > # ceph osd stat
> > osdmap e23265: 49 osds: 49 up, 49 in
> >
> > And this just stays the same:
> >
> >     "up": [ 41, 38, 8 ],
> >     "acting": [ 41, 38, 8 ],
> >
> >     "recovery_state": [
> >         {
> >             "name": "Started\/Primary\/Active",
> >             "enter_time": "2018-09-22 07:07:48.637248",
> >             "might_have_unfound": [
> >                 { "osd": "7",  "status": "not queried" },
> >                 { "osd": "8",  "status": "already probed" },
> >                 { "osd": "17", "status": "not queried" },
> >                 { "osd": "38", "status": "already probed" }
> >             ],
> >
> > On Tue, Mar 26, 2019 at 4:53 PM Brad Hubbard wrote:
> >>
> >> http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
> >>
> >> Did you try repairing the pg?
> >> > >> > >> On Tue, Mar 26, 2019 at 9:08 AM solarflow99 > wrote: > >> > > >> > yes, I know its old. I intend to have it replaced but thats a few > months away and was hoping to get past this. the other OSDs appear to be > ok, I see them up and in, why do you see something wrong? > >> > > >> > On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard > wrote: > >> >> > >> >> Hammer is no longer supported. > >> >> > >> >> What's the status of osds 7 and 17? > >> >> > >> >> On Tue, Mar 26, 2019 at 8:56 AM solarflow99 > wrote: > >> >> > > >> >> > hi, thanks. Its still using Hammer. Here's the output from the > pg query, the last command you gave doesn't work at all but be too old. > >> >> > > >> >> > > >> >> > # ceph pg 10.2a query > >> >> > { > >> >> > "state": "active+clean+inconsistent", > >> >> > "snap_trimq": "[]", > >> >> > "epoch": 23265, > >> >> > "up": [ > >> >> > 41, > >> >> > 38, > >> >> > 8 > >> >> > ], > >> >> > "acting": [ > >> >> > 41, > >> >> > 38, > >> >> > 8 > >> >> > ], > >> >> > "actingbackfill": [ > >> >> > "8", > >> >> > "38", > >> >> > "41" > >> >> > ], > >> >> > "info": { > >> >> > "pgid": "10.2a", > >> >> > "last_update": "23265'20886859", > >> >> > "last_complete": "23265'20886859", > >> >> > "log_tail": "23265'20883809", > >> >> > "last_user_version": 20886859, > >> >> > "last_backfill": "MAX", > >> >> > "purged_snaps": "[]", > >> >> > "history": { > >> >> > "epoch_created": 8200, > >> >> > "last_epoch_started": 21481, > >> >> > "last_epoch_clean": 21487, > >> >> > "last_epoch_split": 0, > >> >> > "same_up_since": 21472, > >> >> > "same_interval_since": 21474, > >> >> > "same_primary_since": 8244, > >> >> > "last_scrub": "23265'20864209", > >> >> > "last_scrub_stamp": "2019-03-22 22:39:13.930673", > >> >> > "last_deep_scrub": "23265'20864209", > >> >> > "last_deep_scrub_stamp": "2019-03-22 22:39:13.930673", > >> >> > "last_clean_scrub_stamp": "2019-03-15 01:33:21.447438" > >> >> > }, > >> >> > "stats": { > >> >> > "version": "23265'20886859", > >> >> > 
"reported_seq": "10109937", > >> >> > "reported_epoch": "23265", > >> >> > "state": "active+clean+inconsistent", > >> >> > "last_fresh": "2019-03-25 15:52:53.720768", > >> >> > "last_change": "2019-03-22 22:39:13.9
Re: [ceph-users] Blocked ops after change from filestore on HDD to bluestore on SDD
Hi Uwe,

Am 28. Februar 2019 11:02:09 MEZ schrieb Uwe Sauter:
>Am 28.02.19 um 10:42 schrieb Matthew H:
>> Have you made any changes to your ceph.conf? If so, would you mind
>> copying them into this thread?
>
>No, I just deleted an OSD, replaced the HDD with an SSD and created a new OSD
>(with bluestore). Once the cluster was healthy again, I repeated with the
>next OSD.
>
>[global]
>    auth client required = cephx
>    auth cluster required = cephx
>    auth service required = cephx
>    cluster network = 169.254.42.0/24
>    fsid = 753c9bbd-74bd-4fea-8c1e-88da775c5ad4
>    keyring = /etc/pve/priv/$cluster.$name.keyring
>    public network = 169.254.42.0/24
>
>[mon]
>    mon allow pool delete = true
>    mon data avail crit = 5
>    mon data avail warn = 15
>
>[osd]
>    keyring = /var/lib/ceph/osd/ceph-$id/keyring
>    osd journal size = 5120
>    osd pool default min size = 2
>    osd pool default size = 3
>    osd max backfills = 6
>    osd recovery max active = 12

I guess you should decrease these last two parameters to 1. This should help avoid putting too much pressure on your drives...

Hth
- Mehmet

>[mon.px-golf-cluster]
>    host = px-golf-cluster
>    mon addr = 169.254.42.54:6789
>
>[mon.px-hotel-cluster]
>    host = px-hotel-cluster
>    mon addr = 169.254.42.55:6789
>
>[mon.px-india-cluster]
>    host = px-india-cluster
>    mon addr = 169.254.42.56:6789
>
>> *From:* ceph-users on behalf of Vitaliy Filippov
>> *Sent:* Wednesday, February 27, 2019 4:21 PM
>> *To:* Ceph Users
>> *Subject:* Re: [ceph-users] Blocked ops after change from filestore on HDD
>> to bluestore on SDD
>>
>> I think this should not lead to blocked ops in any case, even if the
>> performance is low...
>>
>> --
>> With best regards,
>> Vitaliy Filippov
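Unlike the filestore split settings discussed elsewhere in this digest, the two recovery settings Mehmet points at can be changed at runtime. An untested sketch of the Luminous-era form (newer releases also accept `ceph config set osd ...`):

```shell
# Throttle recovery/backfill pressure on all OSDs without restarting them.
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

# To make the change survive restarts, also set it in the [osd] section
# of ceph.conf:
#   osd max backfills = 1
#   osd recovery max active = 1
```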
[ceph-users] Latest recommendations on sizing
Hi,

We are looking at extending one of our Ceph clusters, currently running Luminous. The cluster is all SSD, providing RBD to OpenStack, using 70 OSDs on 5 hosts.

We have a couple of projects kicking off that will need significantly more, albeit slower, storage, so I am looking at speccing out some new OSD nodes with higher-capacity spinning drives. We are deploying 25GbE these days, so I am not worried about network bandwidth (and have taken on board recent comments suggesting that there is no reason to run separate cluster/public networks).

What about CPUs - is it still worth 2x CPUs? Our current OSD hosts have 2x CPUs, but neither seems particularly busy. Would a single higher-spec CPU win out over dual lower-spec CPUs, taking on board previous discussion that GHz is king?

SSD/NVMe for WAL etc.? We're running Bluestore on all of our SSD OSDs with colocated WAL.

We are looking to provide ~500TB into a separate (non-default) storage pool, and so would appreciate suggestions about where my money should be going (or not going).
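The ~500TB target translates into a raw-capacity and drive-count estimate fairly directly. A rough sketch, where the 3x replication, the 12TB drive size, and the 75% fill target are all assumptions for illustration rather than anything stated in the thread:

```python
# Back-of-envelope capacity planning for the ~500 TB pool described above.
usable_tb = 500
replication = 3        # assumed replicated pool, size=3
drive_tb = 12          # hypothetical spinning-drive size
fill_target = 0.75     # stay well under the near-full/full ratios

raw_tb = usable_tb * replication / fill_target   # raw capacity to buy
drives = -(-raw_tb // drive_tb)                  # ceil division -> drive count

print(raw_tb, drives)
```

Erasure coding would change the multiplier (e.g. 1.5x for a 4+2 profile instead of 3x), which is often the deciding factor for large, slower pools like this one.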
Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9
Thanks huang for the reply.

It is the disk compaction taking more time; the disk I/O is completely utilized, up to 100%. It looks like both osd_compact_leveldb_on_mount = false and leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9. Is there a way to turn off compaction?

Also, the reason why we are restarting OSDs is due to splitting: we increased the split multiple and merge threshold. Is there a way we could inject it? Are OSD restarts the only solution?

Thanks in advance

in.linkedin.com/in/nikhilravindra

On Thu, Mar 28, 2019 at 3:58 PM huang jun wrote:
> Is the time really spent on the db compact operation?
> You can turn on debug_osd=20 to see what happens.
> What about the disk util during start?
>
> Nikhil R wrote on Thu, Mar 28, 2019 at 4:36 PM:
> >
> > CEPH OSD restarts are taking too long a time.
> > Below is my ceph.conf:
> > [osd]
> > osd_compact_leveldb_on_mount = false
> > leveldb_compact_on_mount = false
> > leveldb_cache_size = 1073741824
> > leveldb_compression = false
> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
> > osd_max_backfills = 1
> > osd_recovery_max_active = 1
> > osd_recovery_op_priority = 1
> > filestore_split_multiple = 72
> > filestore_merge_threshold = 480
> > osd_max_scrubs = 1
> > osd_scrub_begin_hour = 22
> > osd_scrub_end_hour = 3
> > osd_deep_scrub_interval = 2419200
> > osd_scrub_sleep = 0.1
> >
> > It looks like both osd_compact_leveldb_on_mount = false and
> > leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9.
> >
> > Any ideas on a fix would be appreciated asap.
> > in.linkedin.com/in/nikhilravindra
>
> --
> Thank you!
> HuangJun
Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9
Is the time really spent on the db compact operation? You can turn on debug_osd=20 to see what happens. What about the disk util during start?

Nikhil R wrote on Thu, Mar 28, 2019 at 4:36 PM:
>
> CEPH OSD restarts are taking too long a time.
> Below is my ceph.conf:
> [osd]
> osd_compact_leveldb_on_mount = false
> leveldb_compact_on_mount = false
> leveldb_cache_size = 1073741824
> leveldb_compression = false
> osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_recovery_op_priority = 1
> filestore_split_multiple = 72
> filestore_merge_threshold = 480
> osd_max_scrubs = 1
> osd_scrub_begin_hour = 22
> osd_scrub_end_hour = 3
> osd_deep_scrub_interval = 2419200
> osd_scrub_sleep = 0.1
>
> It looks like both osd_compact_leveldb_on_mount = false and
> leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9.
>
> Any ideas on a fix would be appreciated asap.
> in.linkedin.com/in/nikhilravindra

--
Thank you!
HuangJun
[ceph-users] "Failed to authpin" results in large number of blocked requests
We're running a Ceph mimic (13.2.4) cluster which is predominantly used for CephFS. We have recently switched to using multiple active MDSes to cope with load on the cluster, but are experiencing problems with large numbers of blocked requests when research staff run large experiments. The error associated with the block is:

2019-03-28 09:31:34.246326 [WRN] 6 slow requests, 0 included below; oldest blocked for > 423.987868 secs
2019-03-28 09:31:29.246202 [WRN] slow request 62.572806 seconds old, received at 2019-03-28 09:30:26.673298: client_request(client.5882168:1404749 lookup #0x1000441/run_output 2019-03-28 09:30:26.653089 caller_uid=0, caller_gid=0{}) currently failed to authpin, subtree is being exported

Eventually, many hundreds of requests are blocked for hours. It appears (as alluded to by the "subtree is being exported" error) that this is related to the MDSes remapping entries between ranks under load, as it is always accompanied by messages along the lines of "mds.0.migrator nicely exporting to mds.1". Migrations that occur when the cluster is not under heavy load complete OK, but under load it seems the operation is not completed, or enters deadlock, for some reason.

We can clear the immediate problem by restarting the affected MDS, and have a partial solution in subtree-pinning everything, but this is far from ideal. Does anyone have any pointers where else we should be looking to troubleshoot this?

Thanks,
Zoe.
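For reference, the subtree pinning Zoe mentions as a workaround is set via an extended attribute on the directory, which stops the balancer from migrating that subtree between ranks. An untested sketch with placeholder mount paths:

```shell
# Pin a busy experiment tree to MDS rank 1 and home directories to rank 0.
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/research/experiments
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/home

# Verify the pin on a directory.
getfattr -n ceph.dir.pin /mnt/cephfs/research/experiments

# A value of -1 removes the pin and returns the subtree to the balancer.
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/research/experiments
```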
[ceph-users] CEPH OSD Restarts taking too long v10.2.9
CEPH OSD restarts are taking too long a time. Below is my ceph.conf:

[osd]
osd_compact_leveldb_on_mount = false
leveldb_compact_on_mount = false
leveldb_cache_size = 1073741824
leveldb_compression = false
osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
filestore_split_multiple = 72
filestore_merge_threshold = 480
osd_max_scrubs = 1
osd_scrub_begin_hour = 22
osd_scrub_end_hour = 3
osd_deep_scrub_interval = 2419200
osd_scrub_sleep = 0.1

It looks like both osd_compact_leveldb_on_mount = false and leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9.

Any ideas on a fix would be appreciated asap.

in.linkedin.com/in/nikhilravindra
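One small sanity check on the config above: the scrub intervals are expressed in seconds, so it is easy to misread them. A quick conversion of the osd_deep_scrub_interval value:

```python
# osd_deep_scrub_interval from the ceph.conf above, in seconds.
osd_deep_scrub_interval = 2419200

days = osd_deep_scrub_interval / 86400   # seconds per day
weeks = days / 7

print(days, weeks)   # i.e. deep scrubs at most every 4 weeks
```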