Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-28 Thread huang jun
On Fri, Mar 29, 2019 at 1:44 PM Nikhil R  wrote:
>
> If I comment out filestore_split_multiple = 72 and filestore_merge_threshold = 480
> in the ceph.conf, won't Ceph take the default values of 2 and 10, leaving us
> with more splits and crashes?
>
Yes, the aim is to make clear what causes the long start time:
leveldb compaction or filestore splitting.
> in.linkedin.com/in/nikhilravindra
>
>
>
> On Fri, Mar 29, 2019 at 6:55 AM huang jun  wrote:
>>
>> It seems like the split settings result the problem,
>> what about comment out those settings then see it still used that long
>> time to restart?
>> As a fast search in code, these two
>> filestore_split_multiple = 72
>> filestore_merge_threshold = 480
>> doesn't support online change.
>>
>> On Thu, Mar 28, 2019 at 6:33 PM Nikhil R  wrote:
>> >
>> > Thanks huang for the reply.
>> > Its is the disk compaction taking more time
>> > the disk i/o is completely utilized upto 100%
>> > looks like both osd_compact_leveldb_on_mount = false & 
>> > leveldb_compact_on_mount = false isnt working as expected on ceph v10.2.9
>> > is there a way to turn off compaction?
>> >
>> > Also, the reason why we are restarting osd's is due to splitting and we 
>> > increased split multiple and merge_threshold.
>> > Is there a way we would inject it? Is osd restarts the only solution?
>> >
>> > Thanks In Advance
>> >
>> > in.linkedin.com/in/nikhilravindra
>> >
>> >
>> >
>> > On Thu, Mar 28, 2019 at 3:58 PM huang jun  wrote:
>> >>
>> >> Did the time really cost on db compact operation?
>> >> or you can turn on debug_osd=20 to see what happens,
>> >> what about the disk util during start?
>> >>
>> >> On Thu, Mar 28, 2019 at 4:36 PM Nikhil R  wrote:
>> >> >
>> >> > CEPH osd restarts are taking too long a time
>> >> > below is my ceph.conf
>> >> > [osd]
>> >> > osd_compact_leveldb_on_mount = false
>> >> > leveldb_compact_on_mount = false
>> >> > leveldb_cache_size=1073741824
>> >> > leveldb_compression = false
>> >> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
>> >> > osd_max_backfills = 1
>> >> > osd_recovery_max_active = 1
>> >> > osd_recovery_op_priority = 1
>> >> > filestore_split_multiple = 72
>> >> > filestore_merge_threshold = 480
>> >> > osd_max_scrubs = 1
>> >> > osd_scrub_begin_hour = 22
>> >> > osd_scrub_end_hour = 3
>> >> > osd_deep_scrub_interval = 2419200
>> >> > osd_scrub_sleep = 0.1
>> >> >
>> >> > looks like both osd_compact_leveldb_on_mount = false & 
>> >> > leveldb_compact_on_mount = false isnt working as expected on ceph 
>> >> > v10.2.9
>> >> >
>> >> > Any ideas on a fix would be appreciated asap
>> >> > in.linkedin.com/in/nikhilravindra
>> >> >
>> >> > ___
>> >> > ceph-users mailing list
>> >> > ceph-users@lists.ceph.com
>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>> >>
>> >>
>> >> --
>> >> Thank you!
>> >> HuangJun
>>
>>
>>
>> --
>> Thank you!
>> HuangJun



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore WAL/DB decisions

2019-03-28 Thread Christian Balzer
On Fri, 29 Mar 2019 01:22:06 -0400 Erik McCormick wrote:

> Hello all,
> 
> Having dug through the documentation and reading mailing list threads
> until my eyes rolled back in my head, I am still left with a conundrum:
> do I separate the DB/WAL or not?
> 
You clearly didn't find this thread; the most significant post is here, but read
it all:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033799.html

In short, a 30 GB DB (and thus WAL) partition should do the trick for many
use cases and will still be better than nothing.

Christian

> I had a bunch of nodes running filestore with 8 x 8TB spinning OSDs
> and 2 x 240 GB SSDs. I had put the OS on the first SSD, and then split
> the journals on the remaining SSD space.
> 
> My initial minimal understanding of Bluestore was that one should
> stick the DB and WAL on an SSD, and if it filled up it would just
> spill back onto the OSD itself where it otherwise would have been
> anyway.
> 
> So now I start digging and see that the minimum recommended size is 4%
> of OSD size. For me that's ~2.6 TB of SSD. Clearly I do not have that
> available to me.
> 
> I've also read that it's not so much the data size that matters but
> the number of objects and their size. Just looking at my current usage
> and extrapolating that to my maximum capacity, I get to ~1.44 million
> objects / OSD.
> 
> So the question is, do I:
> 
> 1) Put everything on the OSD and forget the SSDs exist.
> 
> 2) Put just the WAL on the SSDs
> 
> 3) Put the DB (and therefore the WAL) on SSD, ignore the size
> recommendations, and just give each as much space as I can. Maybe 48GB
> / OSD.
> 
> 4) Some scenario I haven't considered.
> 
> Is the penalty for a too small DB on an SSD partition so severe that
> it's not worth doing?
> 
> Thanks,
> Erik
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-28 Thread Nikhil R
If I comment out filestore_split_multiple = 72 and filestore_merge_threshold = 480
in the ceph.conf, won't Ceph take the default values of 2 and 10, leaving us
with more splits and crashes?

in.linkedin.com/in/nikhilravindra



On Fri, Mar 29, 2019 at 6:55 AM huang jun  wrote:

> It seems like the split settings cause the problem.
> What about commenting out those settings and then seeing whether the OSD
> still takes that long to restart?
> From a quick search in the code, these two settings
> filestore_split_multiple = 72
> filestore_merge_threshold = 480
> don't support online change.
>
> On Thu, Mar 28, 2019 at 6:33 PM Nikhil R  wrote:
> >
> > Thanks huang for the reply.
> > Its is the disk compaction taking more time
> > the disk i/o is completely utilized upto 100%
> > looks like both osd_compact_leveldb_on_mount = false &
> leveldb_compact_on_mount = false isnt working as expected on ceph v10.2.9
> > is there a way to turn off compaction?
> >
> > Also, the reason why we are restarting osd's is due to splitting and we
> increased split multiple and merge_threshold.
> > Is there a way we would inject it? Is osd restarts the only solution?
> >
> > Thanks In Advance
> >
> > in.linkedin.com/in/nikhilravindra
> >
> >
> >
> > On Thu, Mar 28, 2019 at 3:58 PM huang jun  wrote:
> >>
> >> Did the time really cost on db compact operation?
> >> or you can turn on debug_osd=20 to see what happens,
> >> what about the disk util during start?
> >>
> >> On Thu, Mar 28, 2019 at 4:36 PM Nikhil R  wrote:
> >> >
> >> > CEPH osd restarts are taking too long a time
> >> > below is my ceph.conf
> >> > [osd]
> >> > osd_compact_leveldb_on_mount = false
> >> > leveldb_compact_on_mount = false
> >> > leveldb_cache_size=1073741824
> >> > leveldb_compression = false
> >> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
> >> > osd_max_backfills = 1
> >> > osd_recovery_max_active = 1
> >> > osd_recovery_op_priority = 1
> >> > filestore_split_multiple = 72
> >> > filestore_merge_threshold = 480
> >> > osd_max_scrubs = 1
> >> > osd_scrub_begin_hour = 22
> >> > osd_scrub_end_hour = 3
> >> > osd_deep_scrub_interval = 2419200
> >> > osd_scrub_sleep = 0.1
> >> >
> >> > looks like both osd_compact_leveldb_on_mount = false &
> leveldb_compact_on_mount = false isnt working as expected on ceph v10.2.9
> >> >
> >> > Any ideas on a fix would be appreciated asap
> >> > in.linkedin.com/in/nikhilravindra
> >> >
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >>
> >>
> >> --
> >> Thank you!
> >> HuangJun
>
>
>
> --
> Thank you!
> HuangJun
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore WAL/DB decisions

2019-03-28 Thread Erik McCormick
Hello all,

Having dug through the documentation and reading mailing list threads
until my eyes rolled back in my head, I am still left with a conundrum:
do I separate the DB/WAL or not?

I had a bunch of nodes running filestore with 8 x 8TB spinning OSDs
and 2 x 240 GB SSDs. I had put the OS on the first SSD, and then split
the journals on the remaining SSD space.

My initial minimal understanding of Bluestore was that one should
stick the DB and WAL on an SSD, and if it filled up it would just
spill back onto the OSD itself where it otherwise would have been
anyway.

So now I start digging and see that the minimum recommended size is 4%
of OSD size. For me that's ~2.6 TB of SSD. Clearly I do not have that
available to me.

I've also read that it's not so much the data size that matters but
the number of objects and their size. Just looking at my current usage
and extrapolating that to my maximum capacity, I get to ~1.44 million
objects / OSD.

So the question is, do I:

1) Put everything on the OSD and forget the SSDs exist.

2) Put just the WAL on the SSDs.

3) Put the DB (and therefore the WAL) on SSD, ignore the size
recommendations, and just give each as much space as I can. Maybe 48GB
/ OSD.

4) Some scenario I haven't considered.

Is the penalty for a too-small DB on an SSD partition so severe that
it's not worth doing?

Thanks,
Erik
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-28 Thread huang jun
It seems like the split settings cause the problem.
What about commenting out those settings and then seeing whether the OSD
still takes that long to restart?
From a quick search in the code, these two settings
filestore_split_multiple = 72
filestore_merge_threshold = 480
don't support online change.
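As a hedged illustration (osd.0 is a placeholder id, and the behaviour is not
re-verified on v10.2.9), injectargs should report when a running daemon does not
pick up a change at runtime, and the admin socket shows the value the daemon is
actually using:

ceph tell osd.0 injectargs '--filestore_split_multiple 72 --filestore_merge_threshold 480'
# options that are not runtime-changeable are typically reported as needing a restart
ceph daemon osd.0 config get filestore_split_multiple
ceph daemon osd.0 config get filestore_merge_threshold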

On Thu, Mar 28, 2019 at 6:33 PM Nikhil R  wrote:
>
> Thanks, huang, for the reply.
> It is the disk compaction that is taking more time;
> the disk I/O is completely utilized, up to 100%.
> It looks like both osd_compact_leveldb_on_mount = false and
> leveldb_compact_on_mount = false aren't working as expected on Ceph v10.2.9.
> Is there a way to turn off compaction?
>
> Also, the reason we are restarting the OSDs is the splitting, and we
> increased the split multiple and merge threshold.
> Is there a way to inject these settings at runtime, or are OSD restarts the only solution?
>
> Thanks In Advance
>
> in.linkedin.com/in/nikhilravindra
>
>
>
> On Thu, Mar 28, 2019 at 3:58 PM huang jun  wrote:
>>
>> Did the time really cost on db compact operation?
>> or you can turn on debug_osd=20 to see what happens,
>> what about the disk util during start?
>>
>> On Thu, Mar 28, 2019 at 4:36 PM Nikhil R  wrote:
>> >
>> > CEPH osd restarts are taking too long a time
>> > below is my ceph.conf
>> > [osd]
>> > osd_compact_leveldb_on_mount = false
>> > leveldb_compact_on_mount = false
>> > leveldb_cache_size=1073741824
>> > leveldb_compression = false
>> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
>> > osd_max_backfills = 1
>> > osd_recovery_max_active = 1
>> > osd_recovery_op_priority = 1
>> > filestore_split_multiple = 72
>> > filestore_merge_threshold = 480
>> > osd_max_scrubs = 1
>> > osd_scrub_begin_hour = 22
>> > osd_scrub_end_hour = 3
>> > osd_deep_scrub_interval = 2419200
>> > osd_scrub_sleep = 0.1
>> >
>> > looks like both osd_compact_leveldb_on_mount = false & 
>> > leveldb_compact_on_mount = false isnt working as expected on ceph v10.2.9
>> >
>> > Any ideas on a fix would be appreciated asap
>> > in.linkedin.com/in/nikhilravindra
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> --
>> Thank you!
>> HuangJun



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scrub errors

2019-03-28 Thread Brad Hubbard
On Fri, Mar 29, 2019 at 7:54 AM solarflow99  wrote:
>
> OK, I tried doing ceph osd out on each of the 4 OSDs one by one. I got it out of
> backfill mode, but I'm still not sure whether it will fix anything. pg 10.2a still shows
> state active+clean+inconsistent. Peer 8 is now
> remapped+inconsistent+peering, and the other peer is active+clean+inconsistent.

Per the document I linked previously, if a PG remains remapped you
likely have a problem with your configuration. Take a good look at
your crushmap, PG distribution, pool configuration, etc.
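For example (a rough sketch; these commands should exist on a Hammer-era
cluster, and the pool id 10 is taken from the pg id 10.2a):

ceph osd tree                           # host layout and weights
ceph osd crush rule dump                # rules the pools actually use
ceph osd dump | grep '^pool 10 '        # size/min_size and crush ruleset for pool 10
ceph pg dump pgs_brief | grep remapped  # any PGs stuck remapped
ceph pg repair 10.2a                    # and, as suggested earlier, try repairing the inconsistent pg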

>
>
> On Wed, Mar 27, 2019 at 4:13 PM Brad Hubbard  wrote:
>>
>> On Thu, Mar 28, 2019 at 8:33 AM solarflow99  wrote:
>> >
>> > yes, but nothing seems to happen.  I don't understand why it lists OSDs 7 
>> > in the  "recovery_state": when i'm only using 3 replicas and it seems to 
>> > use 41,38,8
>>
>> Well, osd 8s state is listed as
>> "active+undersized+degraded+remapped+wait_backfill" so it seems to be
>> stuck waiting for backfill for some reason. One thing you could try is
>> restarting all of the osds including 7 and 17 to see if forcing them
>> to peer again has any positive effect. Don't restart them all at once,
>> just one at a time waiting until each has peered before moving on.
>>
>> >
>> > # ceph health detail
>> > HEALTH_ERR 1 pgs inconsistent; 47 scrub errors
>> > pg 10.2a is active+clean+inconsistent, acting [41,38,8]
>> > 47 scrub errors
>> >
>> >
>> >
>> > As you can see all OSDs are up and in:
>> >
>> > # ceph osd stat
>> >  osdmap e23265: 49 osds: 49 up, 49 in
>> >
>> >
>> >
>> >
>> > And this just stays the same:
>> >
>> > "up": [
>> > 41,
>> > 38,
>> > 8
>> > ],
>> > "acting": [
>> > 41,
>> > 38,
>> > 8
>> >
>> >  "recovery_state": [
>> > {
>> > "name": "Started\/Primary\/Active",
>> > "enter_time": "2018-09-22 07:07:48.637248",
>> > "might_have_unfound": [
>> > {
>> > "osd": "7",
>> > "status": "not queried"
>> > },
>> > {
>> > "osd": "8",
>> > "status": "already probed"
>> > },
>> > {
>> > "osd": "17",
>> > "status": "not queried"
>> > },
>> > {
>> > "osd": "38",
>> > "status": "already probed"
>> > }
>> > ],
>> >
>> >
>> > On Tue, Mar 26, 2019 at 4:53 PM Brad Hubbard  wrote:
>> >>
>> >> http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
>> >>
>> >> Did you try repairing the pg?
>> >>
>> >>
>> >> On Tue, Mar 26, 2019 at 9:08 AM solarflow99  wrote:
>> >> >
>> >> > yes, I know its old.  I intend to have it replaced but thats a few 
>> >> > months away and was hoping to get past this.  the other OSDs appear to 
>> >> > be ok, I see them up and in, why do you see something wrong?
>> >> >
>> >> > On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard  
>> >> > wrote:
>> >> >>
>> >> >> Hammer is no longer supported.
>> >> >>
>> >> >> What's the status of osds 7 and 17?
>> >> >>
>> >> >> On Tue, Mar 26, 2019 at 8:56 AM solarflow99  
>> >> >> wrote:
>> >> >> >
>> >> >> > hi, thanks.  Its still using Hammer.  Here's the output from the pg 
>> >> >> > query, the last command you gave doesn't work at all but be too old.
>> >> >> >
>> >> >> >
>> >> >> > # ceph pg 10.2a query
>> >> >> > {
>> >> >> > "state": "active+clean+inconsistent",
>> >> >> > "snap_trimq": "[]",
>> >> >> > "epoch": 23265,
>> >> >> > "up": [
>> >> >> > 41,
>> >> >> > 38,
>> >> >> > 8
>> >> >> > ],
>> >> >> > "acting": [
>> >> >> > 41,
>> >> >> > 38,
>> >> >> > 8
>> >> >> > ],
>> >> >> > "actingbackfill": [
>> >> >> > "8",
>> >> >> > "38",
>> >> >> > "41"
>> >> >> > ],
>> >> >> > "info": {
>> >> >> > "pgid": "10.2a",
>> >> >> > "last_update": "23265'20886859",
>> >> >> > "last_complete": "23265'20886859",
>> >> >> > "log_tail": "23265'20883809",
>> >> >> > "last_user_version": 20886859,
>> >> >> > "last_backfill": "MAX",
>> >> >> > "purged_snaps": "[]",
>> >> >> > "history": {
>> >> >> > "epoch_created": 8200,
>> >> >> > "last_epoch_started": 21481,
>> >> >> > "last_epoch_clean": 21487,
>> >> >> > "last_epoch_split": 0,
>> >> >> > "same_up_since": 21472,
>> >> >> > "same_interval_since": 21474,
>> >> >> > "same_primary_since": 8244,
>> >> >> > "last_scrub": "23265'20864209",
>> >> >> > "last_scrub_stamp": "2019-03-22 22:39:13.930673",
>> >> >> > "last_deep_scrub": "23265'20864209",
>> >> >> > "last_deep_scrub_stamp": "2019-03-22 22:39:13.930673",
>> >> >> > "last_clea

Re: [ceph-users] scrub errors

2019-03-28 Thread solarflow99
OK, I tried doing ceph osd out on each of the 4 OSDs one by one. I got it out
of backfill mode, but I'm still not sure whether it will fix anything. pg 10.2a still
shows state active+clean+inconsistent. Peer 8 is now
remapped+inconsistent+peering, and the other peer is
active+clean+inconsistent.


On Wed, Mar 27, 2019 at 4:13 PM Brad Hubbard  wrote:

> On Thu, Mar 28, 2019 at 8:33 AM solarflow99  wrote:
> >
> > yes, but nothing seems to happen.  I don't understand why it lists OSDs
> 7 in the  "recovery_state": when i'm only using 3 replicas and it seems to
> use 41,38,8
>
> Well, osd 8's state is listed as
> "active+undersized+degraded+remapped+wait_backfill", so it seems to be
> stuck waiting for backfill for some reason. One thing you could try is
> restarting all of the osds including 7 and 17 to see if forcing them
> to peer again has any positive effect. Don't restart them all at once,
> just one at a time waiting until each has peered before moving on.
>
> >
> > # ceph health detail
> > HEALTH_ERR 1 pgs inconsistent; 47 scrub errors
> > pg 10.2a is active+clean+inconsistent, acting [41,38,8]
> > 47 scrub errors
> >
> >
> >
> > As you can see all OSDs are up and in:
> >
> > # ceph osd stat
> >  osdmap e23265: 49 osds: 49 up, 49 in
> >
> >
> >
> >
> > And this just stays the same:
> >
> > "up": [
> > 41,
> > 38,
> > 8
> > ],
> > "acting": [
> > 41,
> > 38,
> > 8
> >
> >  "recovery_state": [
> > {
> > "name": "Started\/Primary\/Active",
> > "enter_time": "2018-09-22 07:07:48.637248",
> > "might_have_unfound": [
> > {
> > "osd": "7",
> > "status": "not queried"
> > },
> > {
> > "osd": "8",
> > "status": "already probed"
> > },
> > {
> > "osd": "17",
> > "status": "not queried"
> > },
> > {
> > "osd": "38",
> > "status": "already probed"
> > }
> > ],
> >
> >
> > On Tue, Mar 26, 2019 at 4:53 PM Brad Hubbard 
> wrote:
> >>
> >>
> http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
> >>
> >> Did you try repairing the pg?
> >>
> >>
> >> On Tue, Mar 26, 2019 at 9:08 AM solarflow99 
> wrote:
> >> >
> >> > yes, I know its old.  I intend to have it replaced but thats a few
> months away and was hoping to get past this.  the other OSDs appear to be
> ok, I see them up and in, why do you see something wrong?
> >> >
> >> > On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard 
> wrote:
> >> >>
> >> >> Hammer is no longer supported.
> >> >>
> >> >> What's the status of osds 7 and 17?
> >> >>
> >> >> On Tue, Mar 26, 2019 at 8:56 AM solarflow99 
> wrote:
> >> >> >
> >> >> > hi, thanks.  Its still using Hammer.  Here's the output from the
> pg query, the last command you gave doesn't work at all but be too old.
> >> >> >
> >> >> >
> >> >> > # ceph pg 10.2a query
> >> >> > {
> >> >> > "state": "active+clean+inconsistent",
> >> >> > "snap_trimq": "[]",
> >> >> > "epoch": 23265,
> >> >> > "up": [
> >> >> > 41,
> >> >> > 38,
> >> >> > 8
> >> >> > ],
> >> >> > "acting": [
> >> >> > 41,
> >> >> > 38,
> >> >> > 8
> >> >> > ],
> >> >> > "actingbackfill": [
> >> >> > "8",
> >> >> > "38",
> >> >> > "41"
> >> >> > ],
> >> >> > "info": {
> >> >> > "pgid": "10.2a",
> >> >> > "last_update": "23265'20886859",
> >> >> > "last_complete": "23265'20886859",
> >> >> > "log_tail": "23265'20883809",
> >> >> > "last_user_version": 20886859,
> >> >> > "last_backfill": "MAX",
> >> >> > "purged_snaps": "[]",
> >> >> > "history": {
> >> >> > "epoch_created": 8200,
> >> >> > "last_epoch_started": 21481,
> >> >> > "last_epoch_clean": 21487,
> >> >> > "last_epoch_split": 0,
> >> >> > "same_up_since": 21472,
> >> >> > "same_interval_since": 21474,
> >> >> > "same_primary_since": 8244,
> >> >> > "last_scrub": "23265'20864209",
> >> >> > "last_scrub_stamp": "2019-03-22 22:39:13.930673",
> >> >> > "last_deep_scrub": "23265'20864209",
> >> >> > "last_deep_scrub_stamp": "2019-03-22 22:39:13.930673",
> >> >> > "last_clean_scrub_stamp": "2019-03-15 01:33:21.447438"
> >> >> > },
> >> >> > "stats": {
> >> >> > "version": "23265'20886859",
> >> >> > "reported_seq": "10109937",
> >> >> > "reported_epoch": "23265",
> >> >> > "state": "active+clean+inconsistent",
> >> >> > "last_fresh": "2019-03-25 15:52:53.720768",
> >> >> > "last_change": "2019-03-22 22:39:13.9

Re: [ceph-users] Blocked ops after change from filestore on HDD to bluestore on SDD

2019-03-28 Thread ceph
Hi Uwe,

On 28 February 2019 at 11:02:09 CET, Uwe Sauter  wrote:
>Am 28.02.19 um 10:42 schrieb Matthew H:
>> Have you made any changes to your ceph.conf? If so, would you mind
>copying them into this thread?
>
>No, I just deleted an OSD, replaced the HDD with an SSD, and created a new OSD
>(with bluestore). Once the cluster was healthy again, I
>repeated with the next OSD.
>
>
>[global]
>  auth client required = cephx
>  auth cluster required = cephx
>  auth service required = cephx
>  cluster network = 169.254.42.0/24
>  fsid = 753c9bbd-74bd-4fea-8c1e-88da775c5ad4
>  keyring = /etc/pve/priv/$cluster.$name.keyring
>  public network = 169.254.42.0/24
>
>[mon]
>  mon allow pool delete = true
>  mon data avail crit = 5
>  mon data avail warn = 15
>
>[osd]
>  keyring = /var/lib/ceph/osd/ceph-$id/keyring
>  osd journal size = 5120
>  osd pool default min size = 2
>  osd pool default size = 3
>  osd max backfills = 6
>  osd recovery max active = 12

I guess you should decrease these last two parameters to 1. This should help
to avoid too much pressure on your drives...
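If it is useful, a sketch of applying that at runtime (both options are
runtime-changeable) and persisting it for the next restart:

ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

# and in ceph.conf so restarts keep the setting
[osd]
  osd max backfills = 1
  osd recovery max active = 1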

Hth
- Mehmet 

>
>[mon.px-golf-cluster]
>  host = px-golf-cluster
>  mon addr = 169.254.42.54:6789
>
>[mon.px-hotel-cluster]
>  host = px-hotel-cluster
>  mon addr = 169.254.42.55:6789
>
>[mon.px-india-cluster]
>  host = px-india-cluster
>  mon addr = 169.254.42.56:6789
>
>
>
>
>> 
>>
>--
>> *From:* ceph-users  on behalf of
>Vitaliy Filippov 
>> *Sent:* Wednesday, February 27, 2019 4:21 PM
>> *To:* Ceph Users
>> *Subject:* Re: [ceph-users] Blocked ops after change from filestore
>on HDD to bluestore on SDD
>>  
>> I think this should not lead to blocked ops in any case, even if the 
>> performance is low...
>> 
>> -- 
>> With best regards,
>>    Vitaliy Filippov
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Latest recommendations on sizing

2019-03-28 Thread Nathan Harper
Hi,

We are looking at extending one of our Ceph clusters, currently running
Luminous. The cluster is all SSD, providing RBD to OpenStack, using 70
OSDs on 5 hosts.

We have a couple of projects kicking off that will need significantly more,
albeit slower storage.  I am looking at speccing out some new OSD nodes
with higher capacity spinning drives.

We are deploying 25GbE these days, so I am not worried about network
bandwidth (and I have taken on board recent comments suggesting that there is
no reason to run separate cluster/public networks).

What about CPUs - is it still worth having 2x CPUs?  Our current OSD hosts have
2x CPUs, but neither seems particularly busy.  Would a single higher-spec CPU
win out over dual lower-spec CPUs, taking on board the previous discussion that
GHz is king?

SSD/NVMe for the WAL etc.?  We're running Bluestore on all of our SSD OSDs with
a colocated WAL.

We are looking to provide ~500TB into a separate (non-default) storage
pool, and so would appreciate suggestions about where my money should be
going (or not going).
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-28 Thread Nikhil R
Thanks, huang, for the reply.
It is the disk compaction that is taking more time;
the disk I/O is completely utilized, up to 100%.
It looks like both osd_compact_leveldb_on_mount = false and
leveldb_compact_on_mount = false aren't working as expected on Ceph v10.2.9.
Is there a way to turn off compaction?

Also, the reason we are restarting the OSDs is the splitting, and we
increased the split multiple and merge threshold.
Is there a way to inject these settings at runtime, or are OSD restarts the only
solution?

Thanks in advance

in.linkedin.com/in/nikhilravindra



On Thu, Mar 28, 2019 at 3:58 PM huang jun  wrote:

> Is the time really being spent on the DB compact operation?
> You can turn on debug_osd=20 to see what happens.
> What about the disk utilization during start?
>
> On Thu, Mar 28, 2019 at 4:36 PM Nikhil R  wrote:
> >
> > CEPH osd restarts are taking too long a time
> > below is my ceph.conf
> > [osd]
> > osd_compact_leveldb_on_mount = false
> > leveldb_compact_on_mount = false
> > leveldb_cache_size=1073741824
> > leveldb_compression = false
> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
> > osd_max_backfills = 1
> > osd_recovery_max_active = 1
> > osd_recovery_op_priority = 1
> > filestore_split_multiple = 72
> > filestore_merge_threshold = 480
> > osd_max_scrubs = 1
> > osd_scrub_begin_hour = 22
> > osd_scrub_end_hour = 3
> > osd_deep_scrub_interval = 2419200
> > osd_scrub_sleep = 0.1
> >
> > looks like both osd_compact_leveldb_on_mount = false &
> leveldb_compact_on_mount = false isnt working as expected on ceph v10.2.9
> >
> > Any ideas on a fix would be appreciated asap
> > in.linkedin.com/in/nikhilravindra
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Thank you!
> HuangJun
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-28 Thread huang jun
Is the time really being spent on the DB compact operation?
You can turn on debug_osd=20 to see what happens.
What about the disk utilization during start?
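For example (a rough sketch, not verified on v10.2.9; osd.0 and the log path are
placeholders for one OSD on the host), you could raise the logging in ceph.conf,
restart a single OSD, and watch its data disk while it comes up:

# ceph.conf on the OSD host (remove again once done)
[osd]
debug_osd = 20
debug_filestore = 20

# restart one OSD and watch its data disk during startup
systemctl restart ceph-osd@0
iostat -x 1                  # is %util pinned at 100% on the OSD's disk?
grep -iE 'compact|split' /var/log/ceph/ceph-osd.0.log | tail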

On Thu, Mar 28, 2019 at 4:36 PM Nikhil R  wrote:
>
> Ceph OSD restarts are taking too long.
> Below is my ceph.conf:
> [osd]
> osd_compact_leveldb_on_mount = false
> leveldb_compact_on_mount = false
> leveldb_cache_size=1073741824
> leveldb_compression = false
> osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_recovery_op_priority = 1
> filestore_split_multiple = 72
> filestore_merge_threshold = 480
> osd_max_scrubs = 1
> osd_scrub_begin_hour = 22
> osd_scrub_end_hour = 3
> osd_deep_scrub_interval = 2419200
> osd_scrub_sleep = 0.1
>
> It looks like both osd_compact_leveldb_on_mount = false and
> leveldb_compact_on_mount = false aren't working as expected on Ceph v10.2.9.
>
> Any ideas on a fix would be appreciated asap
> in.linkedin.com/in/nikhilravindra
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] "Failed to authpin" results in large number of blocked requests

2019-03-28 Thread Zoë O'Connell
We're running a Ceph mimic (13.2.4) cluster which is predominantly used 
for CephFS. We have recently switched to using multiple active MDSes to 
cope with load on the cluster, but are experiencing problems with large 
numbers of blocked requests when research staff run large experiments. 
The error associated with the block is:


2019-03-28 09:31:34.246326 [WRN]  6 slow requests, 0 included below; 
oldest blocked for > 423.987868 secs
2019-03-28 09:31:29.246202 [WRN]  slow request 62.572806 seconds old, 
received at 2019-03-28 09:30:26.673298: 
client_request(client.5882168:1404749 lookup #0x1000441/run_output 
2019-03-28 09:30:26.653089 caller_uid=0, caller_gid=0{}) currently 
failed to authpin, subtree is being exported


Eventually, many hundreds of requests are blocked for hours.

It appears (as alluded to by the "subtree is being exported" error) that 
this is related to the MDSes remapping entries between ranks under load, 
as it is always accompanied by messages along the lines of 
"mds.0.migrator nicely exporting to mds.1". Migrations that occur when 
the cluster is not under heavy load complete OK, but under load it seems 
the operation never completes or deadlocks for some reason.


We can clear the immediate problem by restarting the affected MDS, and 
we have a partial workaround in pinning subtrees for everything, but this 
is far from ideal.  Does anyone have any pointers on where else we should 
be looking to troubleshoot this?
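For reference, the pinning workaround we use looks roughly like this (the path
and rank number are placeholders for our layout):

# pin a project directory to MDS rank 1; -v -1 removes the pin again
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects/experiment_a
getfattr -n ceph.dir.pin /mnt/cephfs/projects/experiment_a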


Thanks,

Zoe.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-28 Thread Nikhil R
Ceph OSD restarts are taking too long.
Below is my ceph.conf:
[osd]
osd_compact_leveldb_on_mount = false
leveldb_compact_on_mount = false
leveldb_cache_size = 1073741824
leveldb_compression = false
osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
filestore_split_multiple = 72
filestore_merge_threshold = 480
osd_max_scrubs = 1
osd_scrub_begin_hour = 22
osd_scrub_end_hour = 3
osd_deep_scrub_interval = 2419200
osd_scrub_sleep = 0.1

It looks like both osd_compact_leveldb_on_mount = false and
leveldb_compact_on_mount = false aren't working as expected on Ceph v10.2.9.
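One way to confirm what the running daemon actually picked up (a sketch; osd.0
is a placeholder id and the commands must run on that OSD's host, against its
admin socket):

ceph daemon osd.0 config get osd_compact_leveldb_on_mount
ceph daemon osd.0 config get leveldb_compact_on_mount
ceph daemon osd.0 config show | grep -E 'leveldb|filestore_(split|merge)'
# an error on "config get" would mean the daemon does not know that option at all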

Any ideas on a fix would be appreciated ASAP.
in.linkedin.com/in/nikhilravindra
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com