Re: [ceph-users] RGW pools don't show up in luminous

2018-08-24 Thread Robert Stanford
 Casey - this was exactly it.  My ceph-mgr had issues.  I didn't know this
was necessary for ceph df to work.  Thank you
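
For anyone else who hits this, a couple of quick checks I could have run
(assuming a systemd deployment; unit names may differ):

ceph -s                            # the services section should list an active mgr
ceph mgr dump | grep active_name   # shows which mgr is currently active
systemctl list-units 'ceph-mgr@*'  # on the node(s) that should run a mgr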

R

On Fri, Aug 24, 2018 at 8:56 AM Casey Bodley  wrote:

>
>
> On 08/23/2018 01:22 PM, Robert Stanford wrote:
> >
> >  I installed a new Ceph cluster with Luminous, after a long time
> > working with Jewel.  I created my RGW pools the same as always (pool
> > create default.rgw.buckets.data etc.), but they don't show up in ceph
> > df with Luminous.  Has the command changed?
> >
> >  Thanks
> >  R
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> Hi Robert,
>
> Do you have a ceph-mgr running? I believe the accounting for 'ceph df'
> is performed by ceph-mgr in Luminous and beyond, rather than ceph-mon.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client hangs

2018-08-24 Thread Yan, Zheng
Are there hung requests in /sys/kernel/debug/ceph//osdc?
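
For example, roughly like this on the client host (the directory name under
/sys/kernel/debug/ceph is an assumption, it is normally <fsid>.client<id>, and
debugfs has to be mounted):

mount -t debugfs none /sys/kernel/debug 2>/dev/null || true
for f in /sys/kernel/debug/ceph/*/osdc /sys/kernel/debug/ceph/*/mdsc; do
    echo "== $f"; cat "$f"
done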

On Fri, Aug 24, 2018 at 9:32 PM Zhenshi Zhou  wrote:
>
> I'm afraid that the client hangs again... the log shows:
>
> 2018-08-24 21:27:54.714334 [WRN]  slow request 62.607608 seconds old, 
> received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811 
> getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:26:52.106425 caller_uid=0, 
> caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:27:54.714320 [WRN]  3 slow requests, 1 included below; oldest 
> blocked for > 843.556758 secs
> 2018-08-24 21:27:24.713740 [WRN]  slow request 32.606979 seconds old, 
> received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811 
> getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:26:52.106425 caller_uid=0, 
> caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:27:24.713729 [WRN]  3 slow requests, 1 included below; oldest 
> blocked for > 813.556129 secs
> 2018-08-24 21:25:49.711778 [WRN]  slow request 483.807963 seconds old, 
> received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 
> getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, 
> caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:25:49.711766 [WRN]  2 slow requests, 1 included below; oldest 
> blocked for > 718.554206 secs
> 2018-08-24 21:21:54.707536 [WRN]  client.213528 isn't responding to 
> mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued pAsLsXsFscr, 
> sent 483.548912 seconds ago
> 2018-08-24 21:21:54.706930 [WRN]  slow request 483.549363 seconds old, 
> received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 
> setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24 
> 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, 
> waiting
> 2018-08-24 21:21:54.706920 [WRN]  2 slow requests, 1 included below; oldest 
> blocked for > 483.549363 secs
> 2018-08-24 21:21:49.706838 [WRN]  slow request 243.803027 seconds old, 
> received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 
> getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, 
> caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:21:49.706828 [WRN]  2 slow requests, 1 included below; oldest 
> blocked for > 478.549269 secs
> 2018-08-24 21:19:49.704294 [WRN]  slow request 123.800486 seconds old, 
> received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 
> getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, 
> caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:19:49.704284 [WRN]  2 slow requests, 1 included below; oldest 
> blocked for > 358.546729 secs
> 2018-08-24 21:18:49.703073 [WRN]  slow request 63.799269 seconds old, 
> received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 
> getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, 
> caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:18:49.703062 [WRN]  2 slow requests, 1 included below; oldest 
> blocked for > 298.545511 secs
> 2018-08-24 21:18:19.702465 [WRN]  slow request 33.798637 seconds old, 
> received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 
> getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, 
> caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:18:19.702456 [WRN]  2 slow requests, 1 included below; oldest 
> blocked for > 268.544880 secs
> 2018-08-24 21:17:54.702517 [WRN]  client.213528 isn't responding to 
> mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued pAsLsXsFscr, 
> sent 243.543893 seconds ago
> 2018-08-24 21:17:54.701904 [WRN]  slow request 243.544331 seconds old, 
> received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 
> setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24 
> 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, 
> waiting
> 2018-08-24 21:17:54.701894 [WRN]  1 slow requests, 1 included below; oldest 
> blocked for > 243.544331 secs
> 2018-08-24 21:15:54.700034 [WRN]  client.213528 isn't responding to 
> mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued pAsLsXsFscr, 
> sent 123.541410 seconds ago
> 2018-08-24 21:15:54.699385 [WRN]  slow request 123.541822 seconds old, 
> received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 
> setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24 
> 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, 
> waiting
> 2018-08-24 21:15:54.699375 [WRN]  1 slow requests, 1 included below; oldest 
> blocked for > 123.541822 secs
> 2018-08-24 21:14:57.055183 [WRN]  Health check failed: 1 clients failing to 
> respond to capability release (MDS_CLIENT_LATE_RELEASE)
> 2018-08-24 21:14:56.167868 [WRN]  MDS health message (mds.0): Client docker39
> failing to respond to capability release

Re: [ceph-users] ceph-fuse slow cache?

2018-08-24 Thread Gregory Farnum
On Fri, Aug 24, 2018 at 1:20 AM Stefan Kooman  wrote:

> Hi Gregory,
>
> Quoting Gregory Farnum (gfar...@redhat.com):
> > This is quite strange. Given that you have a log, I think what you want
> to
> > do is find one request in the log, trace it through its lifetime, and see
> > where the time is elapsed. You may find a bifurcation, where some
> > categories of requests happen instantly but other categories take a
> second
> > or more; focus on the second, obviously.
>
> So that is what I did. Turns out it's not the (slow) cache at all, probably
> not to your surprise. The reads are quite fast actually; compared to the
> kernel client it's ~ 8 ms slower, or ~ 40%. It looks like a couple
> of writes / updates to, at least, a session file are slow:
>
> 2018-08-23 16:40:25.631 7f79156a8700 10 client.15158830 put_inode on
> 0x1693859.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={} open={3=1}
> mode=100600 size=0/4194304 nlink=1 btime=2018-08-23 16:40:25.632601
> mtime=2018-08-23 16:40:25.632601 ctime=2018-08-23 16:40:25.632601
> caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x1693859 ts 0/0
> objects 0 dirty_or_tx 0]
> parents=0x168547c.head["sess_ia0agoj01ul4rob7ji55ouca41"]
> 0x5646ff0e8000)
>
> 2018-08-23 16:40:28.547 7f79156a8700 10 client.15158830
> update_inode_file_time 0x1693859.head(faked_ino=0 ref=4 ll_ref=1
> cap_refs={} open={3=1} mode=100600 size=0/4194304 nlink=1
> btime=2018-08-23 16:40:25.632601 mtime=2018-08-23 16:40:25.632601
> ctime=2018-08-23 16:40:25.632601
> caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x1693859 ts 0/0
> objects 0 dirty_or_tx 0]
> parents=0x168547c.head["sess_ia0agoj01ul4rob7ji55ouca41"]
> 0x5646ff0e8000) pAsxLsXsxFsxcrwb ctime 2018-08-23 16:40:25.632601 mtime
> 2018-08-23 16:40:25.632601
>

Hmm, these aren't actually the start and end times to the same operation.
put_inode() is literally adjusting a refcount, which can happen for reasons
ranging from the VFS doing something that drops it to an internal operation
completing to a response coming back from the MDS. You should be able to
find requests coming in from the kernel and a response going back out (the
function names will be prefixed with "ll_", eg "ll_lookup").
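
For example, something as crude as this on the ceph-fuse log already narrows
down where the time goes (the log path and the inode below are just taken from
your snippet, adjust as needed):

grep -nE 'll_(lookup|getattr|setattr|create|write|fsync)' /var/log/ceph/ceph-client.*.log | less
# or follow a single inode through the log:
grep '0x1693859' /var/log/ceph/ceph-client.*.log | less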


>
> So, almost 3 seconds. Page is only served after this, and possibly, after
> some cache files have been written. Note though that ceph-fuse is in
> debug=20 mode. Apparently the kernel client is _much_ faster in writing
> than ceph-fuse. If I write a file with "dd" (from /dev/urandom) it's in
> the tens of milliseconds range, not seconds. atime / ctime changes are
> handled in < 5 ms.
>
> I wonder if tuning file-striping [1] with stripe units of 4KB would be
> beneficial in this case.
>
> Gr. Stefan
>
> [1]: http://docs.ceph.com/docs/master/dev/file-striping/
>
> --
> | BIT BV   http://www.bit.nl/   Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mimic - troubleshooting prometheus

2018-08-24 Thread Steven Vacaroaia
Hi,

Any idea/suggestions for troubleshooting prometheus ?

What logs / commands are available to find out why OSD-server-specific
data (IOPS, disk and network data) is not scraped, but cluster-specific
data (pools, capacity, etc.) is?

Increasing log level for MGR showed only the following

2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_r_latency_out_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_rw_latency_out_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_rw_latency_in_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_w_latency_in_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_r_latency_out_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_rw_latency_out_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_rw_latency_in_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_w_latency_in_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_r_latency_out_bytes_histogram, type
2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_rw_latency_out_bytes_histogram, type
2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_rw_latency_in_bytes_histogram, type
2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring
osd.op_w_latency_in_bytes_histogram, type
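
In case it helps anyone debugging the same thing, the checks I am planning to
run next (9283 is the default module port, the host name is a placeholder, and
I am not sure whether host disk/network stats come from the module at all
rather than from node_exporter):

curl -s http://mgr-host:9283/metrics | awk -F'{' '{print $1}' | sort -u | grep '^ceph_osd'
ceph mgr services        # confirm prometheus is served by the active mgr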
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mimic prometheus plugin -no socket could be created

2018-08-24 Thread Steven Vacaroaia
To have the prometheus plugin working you HAVE to tell it to listen on an IPv4
address... like this:

ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
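
A slightly fuller sketch of the same idea (9283 is just the default port,
adjust to your setup):

ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
ceph config set mgr mgr/prometheus/server_port 9283
ceph mgr module disable prometheus
ceph mgr module enable prometheus
# then verify from the active mgr host:
curl -s http://localhost:9283/metrics | head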

On Fri, 24 Aug 2018 at 12:44, Jones de Andrade  wrote:

> Hi all.
>
> I'm new to ceph, and after having serious problems in ceph stages 0, 1 and
> 2 that I could solve myself, now it seems that I have hit a wall harder
> than my head. :)
>
> When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it going
> up to here:
>
> ###
> [14/71]   ceph.sysctl on
>   node01... ✓ (0.5s)
>   node02 ✓ (0.7s)
>   node03... ✓ (0.6s)
>   node04. ✓ (0.5s)
>   node05... ✓ (0.6s)
>   node06.. ✓ (0.5s)
>
> [15/71]   ceph.osd on
>   node01.. ❌ (0.7s)
>   node02 ❌ (0.7s)
>   node03... ❌ (0.7s)
>   node04. ❌ (0.6s)
>   node05... ❌ (0.6s)
>   node06.. ❌ (0.7s)
>
> Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s
>
> Failures summary:
>
> ceph.osd (/srv/salt/ceph/osd):
>   node02:
> deploy OSDs: Module function osd.deploy threw an exception. Exception:
> Mine on node02 for cephdisks.list
>   node03:
> deploy OSDs: Module function osd.deploy threw an exception. Exception:
> Mine on node03 for cephdisks.list
>   node01:
> deploy OSDs: Module function osd.deploy threw an exception. Exception:
> Mine on node01 for cephdisks.list
>   node04:
> deploy OSDs: Module function osd.deploy threw an exception. Exception:
> Mine on node04 for cephdisks.list
>   node05:
> deploy OSDs: Module function osd.deploy threw an exception. Exception:
> Mine on node05 for cephdisks.list
>   node06:
> deploy OSDs: Module function osd.deploy threw an exception. Exception:
> Mine on node06 for cephdisks.list
> ###
>
> Since this is a first attempt in 6 simple test machines, we are going to
> put the mon, osds, etc, in all nodes at first. Only the master is left in a
> single machine (node01) by now.
>
> As they are simple machines, they have a single hdd, which is partitioned
> as follows (the sda4 partition is unmounted and left for the ceph system):
>
> ###
> # lsblk
> NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sda      8:0    0 465,8G  0 disk
> ├─sda1   8:1    0   500M  0 part /boot/efi
> ├─sda2   8:2    0    16G  0 part [SWAP]
> ├─sda3   8:3    0  49,3G  0 part /
> └─sda4   8:4    0   400G  0 part
> sr0     11:0    1   3,7G  0 rom
>
> # salt -I 'roles:storage' cephdisks.list
> node01:
> node02:
> node03:
> node04:
> node05:
> node06:
>
> # salt -I 'roles:storage' pillar.get ceph
> node02:
> --
> storage:
> --
> osds:
> --
> /dev/sda4:
> --
> format:
> bluestore
> standalone:
> True
> (and so on for all 6 machines)
> ##
>
> Finally and just in case, my policy.cfg file reads:
>
> #
> #cluster-unassigned/cluster/*.sls
> cluster-ceph/cluster/*.sls
> profile-default/cluster/*.sls
> profile-default/stack/default/ceph/minions/*yml
> config/stack/default/global.yml
> config/stack/default/ceph/cluster.yml
> role-master/cluster/node01.sls
> role-admin/cluster/*.sls
> role-mon/cluster/*.sls
> role-mgr/cluster/*.sls
> role-mds/cluster/*.sls
> role-ganesha/cluster/*.sls
> role-client-nfs/cluster/*.sls
> role-client-cephfs/cluster/*.sls
> ##
>
> Please, could someone help me and shed some light on this issue?
>
> Thanks a lot in advance,
>
> Regards,
>
> Jones
>
>
>
> On Thu, Aug 23, 2018 at 2:46 PM John Spray  wrote:
>
>> On Thu, Aug 23, 2018 at 5:18 PM Steven Vacaroaia 
>> wrote:
>> >
>> > Hi All,
>> >
>> > I am trying to enable prometheus plugin with no success due to "no
>> socket could be created"
>> >
>> > The instructions for enabling the plugin are very straightforward and
>> simple
>> >
>> > Note
>> > My ultimate goal is to use Prometheus with Cephmetrics
>> > Some of you suggested to deploy ceph-exporter but why do we need to do
>> that when there is a plugin already ?
>> >
>> >
>> > How can I troubleshoot this further ?
>> >
>> > Unhandled exception from module 'prometheus' while running on mgr.mon01:
>> error('No socket could be created',)
>> > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1
>> prometheus.serve:
>> > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1
>> Traceback (most recent call last):
>> > Aug 23 12:03:06 mon01 ceph-mgr: File
>> 

[ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-24 Thread Jones de Andrade
(Please forgive my previous email: I was using another message and
completely forget to update the subject)

Hi all.

I'm new to ceph, and after having serious problems in ceph stages 0, 1 and
2 that I could solve myself, now it seems that I have hit a wall harder
than my head. :)

When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it going
up to here:

###
[14/71]   ceph.sysctl on
  node01... ✓ (0.5s)
  node02 ✓ (0.7s)
  node03... ✓ (0.6s)
  node04. ✓ (0.5s)
  node05... ✓ (0.6s)
  node06.. ✓ (0.5s)

[15/71]   ceph.osd on
  node01.. ❌ (0.7s)
  node02 ❌ (0.7s)
  node03... ❌ (0.7s)
  node04. ❌ (0.6s)
  node05... ❌ (0.6s)
  node06.. ❌ (0.7s)

Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s

Failures summary:

ceph.osd (/srv/salt/ceph/osd):
  node02:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node02 for cephdisks.list
  node03:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node03 for cephdisks.list
  node01:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node01 for cephdisks.list
  node04:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node04 for cephdisks.list
  node05:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node05 for cephdisks.list
  node06:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node06 for cephdisks.list
###

Since this is a first attempt in 6 simple test machines, we are going to
put the mon, osds, etc, in all nodes at first. Only the master is left in a
single machine (node01) by now.

As they are simple machines, they have a single hdd, which is partitioned
as follows (the sda4 partition is unmounted and left for the ceph system):

###
# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 465,8G  0 disk
├─sda1   8:1    0   500M  0 part /boot/efi
├─sda2   8:2    0    16G  0 part [SWAP]
├─sda3   8:3    0  49,3G  0 part /
└─sda4   8:4    0   400G  0 part
sr0     11:0    1   3,7G  0 rom

# salt -I 'roles:storage' cephdisks.list
node01:
node02:
node03:
node04:
node05:
node06:

# salt -I 'roles:storage' pillar.get ceph
node02:
--
storage:
--
osds:
--
/dev/sda4:
--
format:
bluestore
standalone:
True
(and so on for all 6 machines)
##

Finally and just in case, my policy.cfg file reads:

#
#cluster-unassigned/cluster/*.sls
cluster-ceph/cluster/*.sls
profile-default/cluster/*.sls
profile-default/stack/default/ceph/minions/*yml
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
role-master/cluster/node01.sls
role-admin/cluster/*.sls
role-mon/cluster/*.sls
role-mgr/cluster/*.sls
role-mds/cluster/*.sls
role-ganesha/cluster/*.sls
role-client-nfs/cluster/*.sls
role-client-cephfs/cluster/*.sls
##

Please, could someone help me and shed some light on this issue?

Thanks a lot in advance,

Regards,

Jones
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mimic prometheus plugin -no socket could be created

2018-08-24 Thread Jones de Andrade
Hi all.

I'm new to ceph, and after having serious problems in ceph stages 0, 1 and
2 that I could solve myself, now it seems that I have hit a wall harder
than my head. :)

When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it going
up to here:

###
[14/71]   ceph.sysctl on
  node01... ✓ (0.5s)
  node02 ✓ (0.7s)
  node03... ✓ (0.6s)
  node04. ✓ (0.5s)
  node05... ✓ (0.6s)
  node06.. ✓ (0.5s)

[15/71]   ceph.osd on
  node01.. ❌ (0.7s)
  node02 ❌ (0.7s)
  node03... ❌ (0.7s)
  node04. ❌ (0.6s)
  node05... ❌ (0.6s)
  node06.. ❌ (0.7s)

Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s

Failures summary:

ceph.osd (/srv/salt/ceph/osd):
  node02:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node02 for cephdisks.list
  node03:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node03 for cephdisks.list
  node01:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node01 for cephdisks.list
  node04:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node04 for cephdisks.list
  node05:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node05 for cephdisks.list
  node06:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node06 for cephdisks.list
###

Since this is a first attempt in 6 simple test machines, we are going to
put the mon, osds, etc, in all nodes at first. Only the master is left in a
single machine (node01) by now.

As they are simple machines, they have a single hdd, which is partitioned
as follows (the sda4 partition is unmounted and left for the ceph system):

###
# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 465,8G  0 disk
├─sda1   8:1    0   500M  0 part /boot/efi
├─sda2   8:2    0    16G  0 part [SWAP]
├─sda3   8:3    0  49,3G  0 part /
└─sda4   8:4    0   400G  0 part
sr0     11:0    1   3,7G  0 rom

# salt -I 'roles:storage' cephdisks.list
node01:
node02:
node03:
node04:
node05:
node06:

# salt -I 'roles:storage' pillar.get ceph
node02:
--
storage:
--
osds:
--
/dev/sda4:
--
format:
bluestore
standalone:
True
(and so on for all 6 machines)
##

Finally and just in case, my policy.cfg file reads:

#
#cluster-unassigned/cluster/*.sls
cluster-ceph/cluster/*.sls
profile-default/cluster/*.sls
profile-default/stack/default/ceph/minions/*yml
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
role-master/cluster/node01.sls
role-admin/cluster/*.sls
role-mon/cluster/*.sls
role-mgr/cluster/*.sls
role-mds/cluster/*.sls
role-ganesha/cluster/*.sls
role-client-nfs/cluster/*.sls
role-client-cephfs/cluster/*.sls
##

Please, could someone help me and shed some light on this issue?

Thanks a lot in advance,

Regards,

Jones



On Thu, Aug 23, 2018 at 2:46 PM John Spray  wrote:

> On Thu, Aug 23, 2018 at 5:18 PM Steven Vacaroaia  wrote:
> >
> > Hi All,
> >
> > I am trying to enable prometheus plugin with no success due to "no
> socket could be created"
> >
> > The instructions for enabling the plugin are very straightforward and
> simple
> >
> > Note
> > My ultimate goal is to use Prometheus with Cephmetrics
> > Some of you suggested to deploy ceph-exporter but why do we need to do
> that when there is a plugin already ?
> >
> >
> > How can I troubleshoot this further ?
> >
> > Unhandled exception from module 'prometheus' while running on mgr.mon01:
> error('No socket could be created',)
> > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1
> prometheus.serve:
> > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1
> Traceback (most recent call last):
> > Aug 23 12:03:06 mon01 ceph-mgr: File
> "/usr/lib64/ceph/mgr/prometheus/module.py", line 720, in serve
> > Aug 23 12:03:06 mon01 ceph-mgr: cherrypy.engine.start()
> > Aug 23 12:03:06 mon01 ceph-mgr: File
> "/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 250, in
> start
> > Aug 23 12:03:06 mon01 ceph-mgr: raise e_info
> > Aug 23 12:03:06 mon01 ceph-mgr: ChannelFailures: error('No socket could
> be created',)
>
> The things I usually check if a process can't create a socket are:
>  - is 

Re: [ceph-users] ceph auto repair. What is wrong?

2018-08-24 Thread Fyodor Ustinov

Hi!

Did not help. :(

HEALTH_WARN 3 osds down; 1 host (3 osds) down; 1 rack (3 osds) down; 
Degraded data redundancy: 112 pgs undersized

OSD_DOWN 3 osds down
osd.24 (root=default,rack=R-26-7-1,host=S-26-7-1-1) is down
osd.25 (root=default,rack=R-26-7-1,host=S-26-7-1-1) is down
osd.26 (root=default,rack=R-26-7-1,host=S-26-7-1-1) is down
OSD_HOST_DOWN 1 host (3 osds) down
host S-26-7-1-1 (root=default,rack=R-26-7-1) (3 osds) is down
OSD_RACK_DOWN 1 rack (3 osds) down
rack R-26-7-1 (root=default) (3 osds) is down
PG_DEGRADED Degraded data redundancy: 112 pgs undersized
pg 2.0 is stuck undersized for 2466.145928, current state 
active+undersized, last acting [18,33]
pg 2.6 is stuck undersized for 2466.144061, current state 
active+undersized, last acting [15,18]
pg 2.1b is stuck undersized for 2466.143789, current state 
active+undersized, last acting [30,6]
pg 2.22 is stuck undersized for 2466.141138, current state 
active+undersized, last acting [15,21]

[]




[root@S-26-6-1-2 tmp]# ceph config dump
WHO  MASK  LEVEL     OPTION                          VALUE  RO
mon         advanced  mon_allow_pool_delete           true
mon         advanced  mon_osd_down_out_subtree_limit  pod    *



On 08/24/18 17:12, Fyodor Ustinov wrote:

Hi!

I.e. I have to do
ceph config set mon mon_osd_down_out_subtree_limit row
and restart every mon?

On 08/24/18 12:44, Paul Emmerich wrote:

Ceph doesn't mark out whole racks by default, set
mon_osd_down_out_subtree_limit to something higher like row or pod.


Paul

2018-08-24 10:50 GMT+02:00 Christian Balzer :

Hello,

On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote:


Hi!

I wait about hour.


Aside from verifying those timeout values in your cluster, what's your
mon_osd_down_out_subtree_limit set to?

Christian


- Original Message -
From: "Wido den Hollander" 
To: "Fyodor Ustinov" , ceph-users@lists.ceph.com
Sent: Friday, 24 August, 2018 09:52:23
Subject: Re: [ceph-users] ceph auto repair. What is wrong?

On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:

Hi!

I have fresh ceph cluster. 12 host and 3 osd on each host (one - 
hdd and two - ssd). Each host located in own rack.


I make such crush configuration on fresh ceph installation:

    sudo ceph osd crush add-bucket R-26-3-1 rack
    sudo ceph osd crush add-bucket R-26-3-2 rack
    sudo ceph osd crush add-bucket R-26-4-1 rack
    sudo ceph osd crush add-bucket R-26-4-2 rack
[...]
    sudo ceph osd crush add-bucket R-26-8-1 rack
    sudo ceph osd crush add-bucket R-26-8-2 rack

    sudo ceph osd crush move R-26-3-1 root=default
[...]
    sudo ceph osd crush move R-26-8-2 root=default

 sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1
[...]
 sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2

 sudo ceph osd crush rule create-replicated hddreplrule default 
rack hdd

 sudo ceph osd pool create rbd 256 256 replicated hddreplrule
 sudo ceph osd pool set rbd size 3
 sudo ceph osd pool set rbd min_size 2

osd tree look like:
ID  CLASS WEIGHT    TYPE NAME   STATUS REWEIGHT PRI-AFF
  -1   117.36346 root default
  -2 9.78029 rack R-26-3-1
-27 9.78029 host S-26-3-1-1
   0   hdd   9.32390 osd.0   up  1.0 1.0
   1   ssd   0.22820 osd.1   up  1.0 1.0
   2   ssd   0.22820 osd.2   up  1.0 1.0
  -3 9.78029 rack R-26-3-2
-43 9.78029 host S-26-3-2-1
   3   hdd   9.32390 osd.3   up  1.0 1.0
   4   ssd   0.22820 osd.4   up  1.0 1.0
   5   ssd   0.22820 osd.5   up  1.0 1.0
[...]


Now write some data to rbd pool and shutdown one node.
   cluster:
 id: 9000d700-8529-4d38-b9f5-24d6079429a2
 health: HEALTH_WARN
 3 osds down
 1 host (3 osds) down
 1 rack (3 osds) down
 Degraded data redundancy: 1223/12300 objects degraded 
(9.943%), 74 pgs degraded, 74 pgs undersized


And ceph does not try to repair pool. Why?


How long did you wait? The default timeout is 600 seconds before
recovery starts.

These OSDs are not marked as out yet.

Wido



WBR,
 Fyodor.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Christian Balzer    Network/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Clients report OSDs down/up (dmesg) nothing in Ceph logs (flapping OSDs)

2018-08-24 Thread Eugen Block

Update:
I changed the primary affinity of one OSD back to 1.0 to test if those  
metrics change, and indeed they do:

OSD.24 immediately shows values greater than 0.
I guess the metrics are completely unrelated to the flapping.

So the search goes on...


Zitat von Eugen Block :

An hour ago host5 started to report the OSDs on host4 as down (still  
no clue why), resulting in slow requests. This time no flapping  
occurred, the cluster recovered a couple of minutes later. No other
OSDs reported that, only those two on host5. There's nothing in the  
logs of the reporting or the affected OSDs.


Then I compared a perf dump of one healthy OSD with one on host4.  
There's something strange about the metrics (many of them are 0), I  
just can't tell if it's related to the fact that host4 has no  
primary OSDs. But even with no primary OSD I would expect different  
values for OSDs that are running for a week now.
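
For reference, the dumps below are simply the admin socket output from each
host, i.e. something like:

ceph daemon osd.1 perf dump > perfdump.osd1     # on host1
ceph daemon osd.24 perf dump > perfdump.osd24   # on host4
diff -u perfdump.osd1 perfdump.osd24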


---cut here---
host1:~ # diff -u perfdump.osd1 perfdump.osd24
--- perfdump.osd1   2018-08-23 11:03:03.695927316 +0200
+++ perfdump.osd24  2018-08-23 11:02:09.919927375 +0200
@@ -1,99 +1,99 @@
 {
 "osd": {
 "op_wip": 0,
-"op": 7878594,
-"op_in_bytes": 852767683202,
-"op_out_bytes": 1019871565411,
+"op": 0,
+"op_in_bytes": 0,
+"op_out_bytes": 0,
 "op_latency": {
-"avgcount": 7878594,
-"sum": 1018863.131206702,
-"avgtime": 0.129320425
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
 "op_process_latency": {
-"avgcount": 7878594,
-"sum": 879970.400440694,
-"avgtime": 0.111691299
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
 "op_prepare_latency": {
-"avgcount": 8321733,
-"sum": 41376.442963329,
-"avgtime": 0.004972094
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
-"op_r": 3574792,
-"op_r_out_bytes": 1019871565411,
+"op_r": 0,
+"op_r_out_bytes": 0,
 "op_r_latency": {
-"avgcount": 3574792,
-"sum": 54750.502669010,
-"avgtime": 0.015315717
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
 "op_r_process_latency": {
-"avgcount": 3574792,
-"sum": 34107.703579874,
-"avgtime": 0.009541171
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
 "op_r_prepare_latency": {
-"avgcount": 3574817,
-"sum": 34262.515884817,
-"avgtime": 0.009584411
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
-"op_w": 4249520,
-"op_w_in_bytes": 847518164870,
+"op_w": 0,
+"op_w_in_bytes": 0,
 "op_w_latency": {
-"avgcount": 4249520,
-"sum": 960898.540843217,
-"avgtime": 0.226119312
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
 "op_w_process_latency": {
-"avgcount": 4249520,
-"sum": 844398.804808119,
-"avgtime": 0.198704513
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
 "op_w_prepare_latency": {
-"avgcount": 4692618,
-"sum": 7032.358957948,
-"avgtime": 0.001498600
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
-"op_rw": 54282,
-"op_rw_in_bytes": 5249518332,
+"op_rw": 0,
+"op_rw_in_bytes": 0,
 "op_rw_out_bytes": 0,
 "op_rw_latency": {
-"avgcount": 54282,
-"sum": 3214.087694475,
-"avgtime": 0.059210929
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
 "op_rw_process_latency": {
-"avgcount": 54282,
-"sum": 1463.892052701,
-"avgtime": 0.026968277
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
 "op_rw_prepare_latency": {
-"avgcount": 54298,
-"sum": 81.568120564,
-"avgtime": 0.001502230
+"avgcount": 0,
+"sum": 0.0,
+"avgtime": 0.0
 },
 "op_before_queue_op_lat": {
-"avgcount": 25469574,
-"sum": 6654.779033909,
-"avgtime": 0.000261283
+"avgcount": 4307123,
+"sum": 2361.603323307,
+"avgtime": 0.000548301
 },
 "op_before_dequeue_op_lat": {
-   

Re: [ceph-users] Reminder: bi-weekly dashboard sync call today (15:00 CET)

2018-08-24 Thread Ernesto Puerta
My mistake, Lenz. That recording is just the 7 minutes of fun before
everyone joined.

This is the proper one (~1 hour): https://bluejeans.com/s/HUofE

Ernesto

ERNESTO PUERTA

SENIOR SOFTWARE ENGINEER, CEPH R&D

Red Hat



On Fri, Aug 24, 2018 at 4:38 PM Lenz Grimmer  wrote:
>
> On 08/24/2018 02:00 PM, Lenz Grimmer wrote:
>
> > On 08/24/2018 10:59 AM, Lenz Grimmer wrote:
> >
> >> JFYI, the team working on the Ceph Manager Dashboard has a bi-weekly
> >> conference call that discusses the ongoing development and gives an
> >> update on recent improvements/features.
> >>
> >> Today, we plan to give a demo of the new dashboard landing page (See
> >> https://tracker.ceph.com/issues/24573 and
> >> https://github.com/ceph/ceph/pull/23568 for details) and the
> >> implementation of the "RBD trash" functionality in the UI
> >> (http://tracker.ceph.com/issues/24272 and
> >> https://github.com/ceph/ceph/pull/23351)
> >>
> >> The meeting takes places every second Friday at 15:00 CET at this URL:
> >>
> >>   https://bluejeans.com/150063190
> >
> > My apologies, I picked an incorrect meeting URL - this is the correct one:
> >
> >   https://bluejeans.com/470119167/
> >
> > Sorry for the confusion.
>
> Thanks to everyone who participated. We actually moved to yet another
> different BlueJeans session in order to be able to record it...
>
> For those of you who missed it, here's a recording:
>
> https://bluejeans.com/s/HXnam
>
> Have a nice weekend!
>
> Lenz
>
> --
> SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
> GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd + openshift cause cpu stuck now and then

2018-08-24 Thread Jeffrey Zhang
I am testing openshift with ceph rbd, and it works as expected, except that
sometimes a container which has an rbd volume starts slowly.  The load on the
node the containers are running on gets pretty high, until the following error
shows up in dmesg.

After some googling, I found one similar issue at [0]. It seems to be a kernel
bug? But since I cannot reproduce this issue reliably, I would like to know
whether anyone can confirm this issue and whether there is a fix.

Btw, here is my env:

OS: centos 7.5
kernel: Linux ocm-74 3.10.0-862.6.3.el7.x86_64 #1 SMP Tue Jun 26 16:32:21
UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
ceph: ceph-12.2.5-0.el7.x86_64 from ceph offical repo
openshift: origin-3.9.0-1.el7.git.0.ba7faec.x86_64

[0] https://www.spinics.net/lists/ceph-users/msg46963.html
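
Some things I plan to collect the next time it happens, in case they help
(pool/image names below are placeholders; -107 on the watch is ENOTCONN):

ceph osd blacklist ls                # was the client blacklisted?
rbd status mypool/myimage            # who is currently watching the image
dmesg -T | grep -E 'libceph|rbd'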

4381870.921579] device veth816c5e2f entered promiscuous mode
[4381899.771170] NMI watchdog: BUG: soft lockup - CPU#14 stuck for
23s! [mount:2760216]
[4381899.772326] Modules linked in: vfat fat isofs ip_vs fuse ext4
mbcache jbd2 rbd libceph dns_resolver cfg80211 rfkill udp_diag
unix_diag tcp_diag inet_diag veth nf_conntrack_netlink nfnetlink
xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_mark
xt_comment ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat
xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc
overlay(T) scsi_transport_iscsi bonding vport_vxlan vxlan
ip6_udp_tunnel udp_tunnel openvswitch nf_conntrack_ipv6 nf_nat_ipv6
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat
nf_conntrack sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi
kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel
lrw gf128mul glue_helper ablk_helper cryptd sg ipmi_ssif joydev mei_me
mei iTCO_wdt iTCO_vendor_support
[4381899.772379]  pcspkr dcdbas ipmi_si ipmi_devintf ipmi_msghandler
shpchp lpc_ich acpi_pad acpi_power_meter wmi nfsd auth_rpcgss nfs_acl
lockd grace sunrpc ip_tables xfs sr_mod sd_mod cdrom mgag200
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops ahci ttm libahci drm ixgbe libata crc32c_intel tg3
megaraid_sas i2c_core mdio dca ptp pps_core dm_mirror dm_region_hash
dm_log dm_snapshot target_core_user uio target_core_mod crc_t10dif
crct10dif_generic crct10dif_pclmul crct10dif_common dm_multipath
dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio dm_mod
libcrc32c
[4381899.772421] CPU: 14 PID: 2760216 Comm: mount Kdump: loaded
Tainted: GWL  T 3.10.0-862.6.3.el7.x86_64 #1
[4381899.772423] Hardware name: Dell Inc. PowerEdge R720/0DCWD1, BIOS
2.6.1 02/12/2018
[4381899.772426] task: 93917178dee0 ti: 93a1cf22c000 task.ti:
93a1cf22c000
[4381899.772428] RIP: 0010:[]  []
__call_rcu+0x98/0x2c0
[4381899.772440] RSP: 0018:93a1cf22fd30  EFLAGS: 0246
[4381899.772441] RAX: 02e07679 RBX: 939c3f9dbb80 RCX:
acd41e20
[4381899.772443] RDX: acc73000 RSI: 00014340 RDI:
0246
[4381899.772445] RBP: 93a1cf22fd58 R08:  R09:

[4381899.772446] R10: 939c3f9dbb80 R11: db1b021f3800 R12:
2a7c93a8
[4381899.772448] R13: 93a1cf22fd58 R14: 93a1cf22fcb0 R15:
938cc7ce208f
[4381899.772450] FS:  7fc6c57db880() GS:939c3f9c()
knlGS:
[4381899.772452] CS:  0010 DS:  ES:  CR0: 80050033
[4381899.772465] CR2: 7fc6c499c15c CR3: 00108440 CR4:
001607e0
[4381899.772467] Call Trace:
[4381899.772474]  [] call_rcu_sched+0x1d/0x20
[4381899.772479]  [] d_free+0x4f/0x70
[4381899.772481]  [] __dentry_kill+0x16a/0x180
[4381899.772483]  [] shrink_dentry_list+0xde/0x230
[4381899.772485]  [] shrink_dcache_sb+0x9a/0xe0
[4381899.772491]  [] do_remount_sb+0x51/0x200
[4381899.772496]  [] do_mount+0x757/0xce0
[4381899.772501]  [] ? memdup_user+0x42/0x70
[4381899.772503]  [] SyS_mount+0x83/0xd0
[4381899.772512]  [] system_call_fastpath+0x1c/0x21
[4381899.772513] Code: 3c cd a0 53 d3 ac 80 3d 66 eb 02 01 00 8b 87 70
01 00 00 0f 85 3a 01 00 00 80 3d 3f ba bc 00 00 0f 84 bd 01 00 00 4c
89 ef 57 9d <0f> 1f 44 00 00 48 83 c4 10 5b 41 5c 41 5d 5d c3 0f 1f 84
00 00
[4381899.984579] rbd: rbd0: encountered watch error: -107
[4381927.770763] NMI watchdog: BUG: soft lockup - CPU#14 stuck for
23s! [mount:2760216]
[4381927.771950] Modules linked in: vfat fat isofs ip_vs fuse ext4
mbcache jbd2 rbd libceph dns_resolver cfg80211 rfkill udp_diag
unix_diag tcp_diag inet_diag veth nf_conntrack_netlink nfnetlink
xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_mark
xt_comment ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat
xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc
overlay(T) scsi_transport_iscsi bonding vport_vxlan vxlan
ip6_udp_tunnel udp_tunnel openvswitch nf_conntrack_ipv6 nf_nat_ipv6
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat
nf_conntrack sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi
kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel
lrw gf128mul 

Re: [ceph-users] Reminder: bi-weekly dashboard sync call today (15:00 CET)

2018-08-24 Thread Lenz Grimmer
On 08/24/2018 02:00 PM, Lenz Grimmer wrote:

> On 08/24/2018 10:59 AM, Lenz Grimmer wrote:
> 
>> JFYI, the team working on the Ceph Manager Dashboard has a bi-weekly
>> conference call that discusses the ongoing development and gives an
>> update on recent improvements/features.
>>
>> Today, we plan to give a demo of the new dashboard landing page (See
>> https://tracker.ceph.com/issues/24573 and
>> https://github.com/ceph/ceph/pull/23568 for details) and the
>> implementation of the "RBD trash" functionality in the UI
>> (http://tracker.ceph.com/issues/24272 and
>> https://github.com/ceph/ceph/pull/23351)
>>
>> The meeting takes places every second Friday at 15:00 CET at this URL:
>>
>>   https://bluejeans.com/150063190
> 
> My apologies, I picked an incorrect meeting URL - this is the correct one:
> 
>   https://bluejeans.com/470119167/
> 
> Sorry for the confusion.

Thanks to everyone who participated. We actually moved to yet another
different BlueJeans session in order to be able to record it...

For those of you who missed it, here's a recording:

https://bluejeans.com/s/HXnam

Have a nice weekend!

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating from pre-luminous multi-root crush hierachy

2018-08-24 Thread Paul Emmerich
The proper way would be to do this change atomically by adjusting the
crush hierarchy and rules at the same time by editing and setting the
crush map manually.


Paul

2018-08-24 9:40 GMT+02:00 Konstantin Shalygin :
> On 08/24/2018 01:57 PM, Buchberger, Carsten wrote:
>>
>> Hi Konstantin,
>>
>> sounds easy;-)  If i apply the new rule to the existing pools there won't
>> be any osds to satisfy the requirements of the rule - because the osds are
>> not in the new root yet.
>> Isn't that a problem ?
>>
>> Thank you
>
>
> Your IO will stall.
> You need to move the osds to the new root fast. Make a list of commands `ceph osd crush
> move  host=` and paste it after applying the crush rule.
>
>
>
>
>
> k
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph auto repair. What is wrong?

2018-08-24 Thread Fyodor Ustinov

Hi!

I.e. I have to do
ceph config set mon mon_osd_down_out_subtree_limit row
and restart every mon?

On 08/24/18 12:44, Paul Emmerich wrote:

Ceph doesn't mark out whole racks by default, set
mon_osd_down_out_subtree_limit to something higher like row or pod.


Paul

2018-08-24 10:50 GMT+02:00 Christian Balzer :

Hello,

On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote:


Hi!

I wait about hour.


Aside from verifying those timeout values in your cluster, what's your
mon_osd_down_out_subtree_limit set to?

Christian


- Original Message -
From: "Wido den Hollander" 
To: "Fyodor Ustinov" , ceph-users@lists.ceph.com
Sent: Friday, 24 August, 2018 09:52:23
Subject: Re: [ceph-users] ceph auto repair. What is wrong?

On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:

Hi!

I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and two - 
ssd). Each host located in own rack.

I make such crush configuration on fresh ceph installation:

sudo ceph osd crush add-bucket R-26-3-1 rack
sudo ceph osd crush add-bucket R-26-3-2 rack
sudo ceph osd crush add-bucket R-26-4-1 rack
sudo ceph osd crush add-bucket R-26-4-2 rack
[...]
sudo ceph osd crush add-bucket R-26-8-1 rack
sudo ceph osd crush add-bucket R-26-8-2 rack

sudo ceph osd crush move R-26-3-1 root=default
[...]
sudo ceph osd crush move R-26-8-2 root=default

 sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1
[...]
 sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2

 sudo ceph osd crush rule create-replicated hddreplrule default rack hdd
 sudo ceph osd pool create rbd 256 256 replicated hddreplrule
 sudo ceph osd pool set rbd size 3
 sudo ceph osd pool set rbd min_size 2

osd tree look like:
ID  CLASS WEIGHTTYPE NAME   STATUS REWEIGHT PRI-AFF
  -1   117.36346 root default
  -2 9.78029 rack R-26-3-1
-27 9.78029 host S-26-3-1-1
   0   hdd   9.32390 osd.0   up  1.0 1.0
   1   ssd   0.22820 osd.1   up  1.0 1.0
   2   ssd   0.22820 osd.2   up  1.0 1.0
  -3 9.78029 rack R-26-3-2
-43 9.78029 host S-26-3-2-1
   3   hdd   9.32390 osd.3   up  1.0 1.0
   4   ssd   0.22820 osd.4   up  1.0 1.0
   5   ssd   0.22820 osd.5   up  1.0 1.0
[...]


Now write some data to rbd pool and shutdown one node.
   cluster:
 id: 9000d700-8529-4d38-b9f5-24d6079429a2
 health: HEALTH_WARN
 3 osds down
 1 host (3 osds) down
 1 rack (3 osds) down
 Degraded data redundancy: 1223/12300 objects degraded (9.943%), 74 
pgs degraded, 74 pgs undersized

And ceph does not try to repair pool. Why?


How long did you wait? The default timeout is 600 seconds before
recovery starts.

These OSDs are not marked as out yet.

Wido



WBR,
 Fyodor.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw: need couple of blind (indexless) buckets, how-to?

2018-08-24 Thread Casey Bodley



On 08/24/2018 06:44 AM, Konstantin Shalygin wrote:


Answer to myself.

radosgw-admin realm create --rgw-realm=default --default
radosgw-admin zonegroup modify --rgw-zonegroup=default --rgw-realm=default
radosgw-admin period update --commit
radosgw-admin zonegroup placement add --rgw-zonegroup="default" \
  --placement-id="indexless-placement"
radosgw-admin zonegroup placement default 
--placement-id="default-placement"

radosgw-admin period update --commit
radosgw-admin zone placement add --rgw-zone="default" \
  --placement-id="indexless-placement" \
  --data-pool="default.rgw.buckets.data" \
  --index-pool="default.rgw.buckets.index" \
  --data_extra_pool="default.rgw.buckets.non-ec" \
  --placement-index-type="indexless"


Restart the rgw instances and it is now possible to create indexless buckets:

s3cmd mb s3://blindbucket --region=:indexless-placement


The documentation of the Object Storage Gateway is worse than that for rbd or
cephfs and has outdated strings (removed a year ago).


http://tracker.ceph.com/issues/18082

http://tracker.ceph.com/issues/24508

http://tracker.ceph.com/issues/8073

Hope this post will help somebody in future.



k



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Thank you very much! If anyone would like to help update these docs, I 
would be happy to help with guidance/review.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW pools don't show up in luminous

2018-08-24 Thread Casey Bodley



On 08/23/2018 01:22 PM, Robert Stanford wrote:


 I installed a new Ceph cluster with Luminous, after a long time 
working with Jewel.  I created my RGW pools the same as always (pool 
create default.rgw.buckets.data etc.), but they don't show up in ceph 
df with Luminous.  Has the command changed?


 Thanks
 R



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Robert,

Do you have a ceph-mgr running? I believe the accounting for 'ceph df' 
is performed by ceph-mgr in Luminous and beyond, rather than ceph-mon.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client hangs

2018-08-24 Thread Zhenshi Zhou
I'm afraid that the client hangs again... the log shows:

2018-08-24 21:27:54.714334 [WRN]  slow request 62.607608 seconds old,
received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811
getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:26:52.106425 caller_uid=0,
caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:27:54.714320 [WRN]  3 slow requests, 1 included below; oldest
blocked for > 843.556758 secs
2018-08-24 21:27:24.713740 [WRN]  slow request 32.606979 seconds old,
received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811
getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:26:52.106425 caller_uid=0,
caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:27:24.713729 [WRN]  3 slow requests, 1 included below; oldest
blocked for > 813.556129 secs
2018-08-24 21:25:49.711778 [WRN]  slow request 483.807963 seconds old,
received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810
getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0,
caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:25:49.711766 [WRN]  2 slow requests, 1 included below; oldest
blocked for > 718.554206 secs
2018-08-24 21:21:54.707536 [WRN]  client.213528 isn't responding to
mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued
pAsLsXsFscr, sent 483.548912 seconds ago
2018-08-24 21:21:54.706930 [WRN]  slow request 483.549363 seconds old,
received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065
setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24
21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock,
waiting
2018-08-24 21:21:54.706920 [WRN]  2 slow requests, 1 included below; oldest
blocked for > 483.549363 secs
2018-08-24 21:21:49.706838 [WRN]  slow request 243.803027 seconds old,
received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810
getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0,
caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:21:49.706828 [WRN]  2 slow requests, 1 included below; oldest
blocked for > 478.549269 secs
2018-08-24 21:19:49.704294 [WRN]  slow request 123.800486 seconds old,
received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810
getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0,
caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:19:49.704284 [WRN]  2 slow requests, 1 included below; oldest
blocked for > 358.546729 secs
2018-08-24 21:18:49.703073 [WRN]  slow request 63.799269 seconds old,
received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810
getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0,
caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:18:49.703062 [WRN]  2 slow requests, 1 included below; oldest
blocked for > 298.545511 secs
2018-08-24 21:18:19.702465 [WRN]  slow request 33.798637 seconds old,
received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810
getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0,
caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:18:19.702456 [WRN]  2 slow requests, 1 included below; oldest
blocked for > 268.544880 secs
2018-08-24 21:17:54.702517 [WRN]  client.213528 isn't responding to
mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued
pAsLsXsFscr, sent 243.543893 seconds ago
2018-08-24 21:17:54.701904 [WRN]  slow request 243.544331 seconds old,
received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065
setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24
21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock,
waiting
2018-08-24 21:17:54.701894 [WRN]  1 slow requests, 1 included below; oldest
blocked for > 243.544331 secs
2018-08-24 21:15:54.700034 [WRN]  client.213528 isn't responding to
mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued
pAsLsXsFscr, sent 123.541410 seconds ago
2018-08-24 21:15:54.699385 [WRN]  slow request 123.541822 seconds old,
received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065
setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24
21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock,
waiting
2018-08-24 21:15:54.699375 [WRN]  1 slow requests, 1 included below; oldest
blocked for > 123.541822 secs
2018-08-24 21:14:57.055183 [WRN]  Health check failed: 1 clients failing to
respond to capability release (MDS_CLIENT_LATE_RELEASE)
2018-08-24 21:14:56.167868 [WRN]  MDS health message (mds.0): Client
docker39 failing to respond to capability release
2018-08-24 21:14:54.698753 [WRN]  client.213528 isn't responding to
mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued
pAsLsXsFscr, sent 63.540127 seconds ago
2018-08-24 21:14:54.698104 [WRN]  slow request 63.540533 seconds old,
received at 2018-08-24 21:13:51.157483: 

Re: [ceph-users] Reminder: bi-weekly dashboard sync call today (15:00 CET)

2018-08-24 Thread Lenz Grimmer
On 08/24/2018 10:59 AM, Lenz Grimmer wrote:

> JFYI, the team working on the Ceph Manager Dashboard has a bi-weekly
> conference call that discusses the ongoing development and gives an
> update on recent improvements/features.
> 
> Today, we plan to give a demo of the new dashboard landing page (See
> https://tracker.ceph.com/issues/24573 and
> https://github.com/ceph/ceph/pull/23568 for details) and the
> implementation of the "RBD trash" functionality in the UI
> (http://tracker.ceph.com/issues/24272 and
> https://github.com/ceph/ceph/pull/23351)
> 
> The meeting takes places every second Friday at 15:00 CET at this URL:
> 
>   https://bluejeans.com/150063190

My apologies, I picked an incorrect meeting URL - this is the correct one:

  https://bluejeans.com/470119167/

Sorry for the confusion.

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw: need couple of blind (indexless) buckets, how-to?

2018-08-24 Thread Konstantin Shalygin

Answer to myself.

radosgw-admin realm create --rgw-realm=default --default
radosgw-admin zonegroup modify --rgw-zonegroup=default --rgw-realm=default
radosgw-admin period update --commit
radosgw-admin zonegroup placement add --rgw-zonegroup="default" \
  --placement-id="indexless-placement"
radosgw-admin zonegroup placement default --placement-id="default-placement"
radosgw-admin period update --commit
radosgw-admin zone placement add --rgw-zone="default" \
  --placement-id="indexless-placement" \
  --data-pool="default.rgw.buckets.data" \
  --index-pool="default.rgw.buckets.index" \
  --data_extra_pool="default.rgw.buckets.non-ec" \
  --placement-index-type="indexless"


Restart the rgw instances and it is now possible to create indexless buckets:

s3cmd mb s3://blindbucket --region=:indexless-placement
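
To double-check that a bucket really got the indexless placement (names are
the ones from the commands above; note that listing a blind bucket returns
nothing even after uploads, which is the expected behaviour):

radosgw-admin bucket stats --bucket=blindbucket | grep placement_rule
s3cmd put somefile s3://blindbucket/
s3cmd ls s3://blindbucket            # empty: there is no index to list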


The documentation of the Object Storage Gateway is worse than that for rbd or
cephfs and has outdated strings (removed a year ago).


http://tracker.ceph.com/issues/18082

http://tracker.ceph.com/issues/24508

http://tracker.ceph.com/issues/8073

Hope this post will help somebody in future.



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stability Issue with 52 OSD hosts

2018-08-24 Thread Andras Pataki
We pin half the OSDs to each socket (and to the corresponding memory).
Since the disk controller and the network card are connected only to one
socket, this still probably produces quite a bit of QPI traffic.
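
The pinning itself is nothing fancy, e.g. a systemd drop-in per OSD (the CPU
list below is a placeholder for one socket of a 2x14-core box; memory binding
would additionally need something like a numactl wrapper around ExecStart):

# /etc/systemd/system/ceph-osd@42.service.d/numa.conf
[Service]
CPUAffinity=0-13 28-41

# then: systemctl daemon-reload && systemctl restart ceph-osd@42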
It is also worth investigating how the network does under high load.  We 
did run into problems where 40Gbps cards dropped packets heavily under load.


Andras


On 08/24/2018 05:16 AM, Marc Roos wrote:
  
Can this be related to NUMA issues? I also have dual-processor nodes,
and was wondering if there is some guide on how to optimize for NUMA.




-Original Message-
From: Tyler Bishop [mailto:tyler.bis...@beyondhosting.net]
Sent: vrijdag 24 augustus 2018 3:11
To: Andras Pataki
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Stability Issue with 52 OSD hosts

Thanks for the info. I was investigating bluestore as well.  My hosts
don't go unresponsive but I do see parallel IO slow down.

On Thu, Aug 23, 2018, 8:02 PM Andras Pataki
 wrote:


We are also running some fairly dense nodes with CentOS 7.4 and ran
into
similar problems.  The nodes ran filestore OSDs (Jewel, then
Luminous).
Sometimes a node would be so unresponsive that one couldn't even
ssh to
it (even though the root disk was a physically separate drive on a
separate controller from the OSD drives).  Often these would
coincide
with kernel stack traces about hung tasks. Initially we did blame
high
load, etc. from all the OSDs.

But then we benchmarked the nodes independently of ceph (with
iozone and
such) and noticed problems there too.  When we started a few dozen
iozone processes on separate JBOD drives with xfs, some didn't even

start and write a single byte for minutes.  The conclusion we came
to
was that there is some interference among a lot of mounted xfs file

systems in the Red Hat 3.10 kernels.  Some kind of central lock
that
prevents dozens of xfs file systems from running in parallel.  When
we
do I/O directly to raw devices in parallel, we saw no problems (no
high
loads, etc.).  So we built a newer kernel, and the situation got
better.  4.4 is already much better, nowadays we are testing moving
to 4.14.

Also, migrating to bluestore significantly reduced the load on
these
nodes too.  At busy times, the filestore host loads were 20-30,
even
higher (on a 28 core node), while the bluestore nodes hummed along
at a
lot of perhaps 6 or 8.  This also confirms that somehow lots of xfs

mounts don't work in parallel.

Andras


On 08/23/2018 03:24 PM, Tyler Bishop wrote:
> Yes I've reviewed all the logs from monitor and host.   I am not
> getting useful errors (or any) in dmesg or general messages.
>
> I have 2 ceph clusters, the other cluster is 300 SSD and I never have
> issues like this.   That's why I'm looking for help.
>
> On Thu, Aug 23, 2018 at 3:22 PM Alex Gorbachev
 wrote:
>> On Wed, Aug 22, 2018 at 11:39 PM Tyler Bishop
>>  wrote:
>>> During high load testing I'm only seeing user and sys cpu load
around 60%... my load doesn't seem to be anything crazy on the host and
iowait stays between 6 and 10%.  I have very good `ceph osd perf`
numbers too.
>>>
>>> I am using 10.2.11 Jewel.
>>>
>>>
>>> On Wed, Aug 22, 2018 at 11:30 PM Christian Balzer
 wrote:
 Hello,

 On Wed, 22 Aug 2018 23:00:24 -0400 Tyler Bishop wrote:

> Hi,   I've been fighting to get good stability on my cluster
for about
> 3 weeks now.  I am running into intermittent issues with OSD
flapping
> marking other OSD down then going back to a stable state for
hours and
> days.
>
> The cluster is 4x Cisco UCS S3260 with dual E5-2660, 256GB
ram, 40G
> Network to 40G Brocade VDX Switches.  The OSD are 6TB HGST
SAS drives
> with 400GB HGST SAS 12G SSDs.   My configuration is 4
journals per
> host with 12 disk per journal for a total of 56 disk per
system and 52
> OSD.
>
 Any denser and you'd have a storage black hole.

 You already pointed your finger in the (or at least one) right
direction
 and everybody will agree that this setup is woefully
underpowered in the
 CPU department.

> I am using CentOS 7 with kernel 3.10 and the redhat tuned-adm
profile
> for throughput-performance enabled.
>
 Ceph version would be interesting as well...

> I have these sysctls set:
>
> kernel.pid_max = 4194303
> fs.file-max = 6553600
> vm.swappiness = 0
> 

Re: [ceph-users] ceph auto repair. What is wrong?

2018-08-24 Thread Paul Emmerich
Ceph doesn't mark out whole racks by default; set
mon_osd_down_out_subtree_limit to something higher, like row or pod.
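For example (a sketch; the option may need a mon restart to take effect
reliably):

   # ceph.conf on the mons
   [mon]
   mon osd down out subtree limit = row

   # or at runtime
   ceph tell mon.* injectargs '--mon_osd_down_out_subtree_limit=row'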


Paul

2018-08-24 10:50 GMT+02:00 Christian Balzer :
> Hello,
>
> On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote:
>
>> Hi!
>>
>> I wait about hour.
>>
> Aside from verifying those timeout values in your cluster, what's your
> mon_osd_down_out_subtree_limit set to?
>
> Christian
>
>> - Original Message -
>> From: "Wido den Hollander" 
>> To: "Fyodor Ustinov" , ceph-users@lists.ceph.com
>> Sent: Friday, 24 August, 2018 09:52:23
>> Subject: Re: [ceph-users] ceph auto repair. What is wrong?
>>
>> On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:
>> > Hi!
>> >
>> > I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and 
>> > two - ssd). Each host located in own rack.
>> >
>> > I make such crush configuration on fresh ceph installation:
>> >
>> >sudo ceph osd crush add-bucket R-26-3-1 rack
>> >sudo ceph osd crush add-bucket R-26-3-2 rack
>> >sudo ceph osd crush add-bucket R-26-4-1 rack
>> >sudo ceph osd crush add-bucket R-26-4-2 rack
>> > [...]
>> >sudo ceph osd crush add-bucket R-26-8-1 rack
>> >sudo ceph osd crush add-bucket R-26-8-2 rack
>> >
>> >sudo ceph osd crush move R-26-3-1 root=default
>> > [...]
>> >sudo ceph osd crush move R-26-8-2 root=default
>> >
>> > sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1
>> > [...]
>> > sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2
>> >
>> > sudo ceph osd crush rule create-replicated hddreplrule default rack hdd
>> > sudo ceph osd pool create rbd 256 256 replicated hddreplrule
>> > sudo ceph osd pool set rbd size 3
>> > sudo ceph osd pool set rbd min_size 2
>> >
>> > osd tree look like:
>> > ID  CLASS WEIGHTTYPE NAME   STATUS REWEIGHT PRI-AFF
>> >  -1   117.36346 root default
>> >  -2 9.78029 rack R-26-3-1
>> > -27 9.78029 host S-26-3-1-1
>> >   0   hdd   9.32390 osd.0   up  1.0 1.0
>> >   1   ssd   0.22820 osd.1   up  1.0 1.0
>> >   2   ssd   0.22820 osd.2   up  1.0 1.0
>> >  -3 9.78029 rack R-26-3-2
>> > -43 9.78029 host S-26-3-2-1
>> >   3   hdd   9.32390 osd.3   up  1.0 1.0
>> >   4   ssd   0.22820 osd.4   up  1.0 1.0
>> >   5   ssd   0.22820 osd.5   up  1.0 1.0
>> > [...]
>> >
>> >
>> > Now write some data to rbd pool and shutdown one node.
>> >   cluster:
>> > id: 9000d700-8529-4d38-b9f5-24d6079429a2
>> > health: HEALTH_WARN
>> > 3 osds down
>> > 1 host (3 osds) down
>> > 1 rack (3 osds) down
>> > Degraded data redundancy: 1223/12300 objects degraded 
>> > (9.943%), 74 pgs degraded, 74 pgs undersized
>> >
>> > And ceph does not try to repair pool. Why?
>>
>> How long did you wait? The default timeout is 600 seconds before
>> recovery starts.
>>
>> These OSDs are not marked as out yet.
>>
>> Wido
>>
>> >
>> > WBR,
>> > Fyodor.
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Rakuten Communications
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stability Issue with 52 OSD hosts

2018-08-24 Thread Marc Roos
 
Can this be related to NUMA issues? I also have dual-processor nodes, 
and was wondering if there is some guide on how to optimize for NUMA. 




-Original Message-
From: Tyler Bishop [mailto:tyler.bis...@beyondhosting.net] 
Sent: vrijdag 24 augustus 2018 3:11
To: Andras Pataki
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Stability Issue with 52 OSD hosts

Thanks for the info. I was investigating bluestore as well.  My host 
dont go unresponsive but I do see parallel io slow down.

On Thu, Aug 23, 2018, 8:02 PM Andras Pataki 
 wrote:


We are also running some fairly dense nodes with CentOS 7.4 and ran 
into 
similar problems.  The nodes ran filestore OSDs (Jewel, then 
Luminous).  
Sometimes a node would be so unresponsive that one couldn't even 
ssh to 
it (even though the root disk was a physically separate drive on a 
separate controller from the OSD drives).  Often these would 
coincide 
with kernel stack traces about hung tasks. Initially we did blame 
high 
load, etc. from all the OSDs.

But then we benchmarked the nodes independently of ceph (with 
iozone and 
such) and noticed problems there too.  When we started a few dozen 
iozone processes on separate JBOD drives with xfs, some didn't even 

start and write a single byte for minutes.  The conclusion we came 
to 
was that there is some interference among a lot of mounted xfs file 

systems in the Red Hat 3.10 kernels.  Some kind of central lock 
that 
prevents dozens of xfs file systems from running in parallel.  When 
we 
do I/O directly to raw devices in parallel, we saw no problems (no 
high 
loads, etc.).  So we built a newer kernel, and the situation got 
better.  4.4 is already much better, nowadays we are testing moving 
to 4.14.

Also, migrating to bluestore significantly reduced the load on 
these 
nodes too.  At busy times, the filestore host loads were 20-30, 
even 
higher (on a 28 core node), while the bluestore nodes hummed along 
at a 
lot of perhaps 6 or 8.  This also confirms that somehow lots of xfs 

mounts don't work in parallel.

Andras


On 08/23/2018 03:24 PM, Tyler Bishop wrote:
> Yes I've reviewed all the logs from monitor and host.   I am not
> getting useful errors (or any) in dmesg or general messages.
>
> I have 2 ceph clusters, the other cluster is 300 SSD and i never 
have
> issues like this.   That's why Im looking for help.
>
> On Thu, Aug 23, 2018 at 3:22 PM Alex Gorbachev 
 wrote:
>> On Wed, Aug 22, 2018 at 11:39 PM Tyler Bishop
>>  wrote:
>>> During high load testing I'm only seeing user and sys cpu load 
around 60%... my load doesn't seem to be anything crazy on the host and 
iowait stays between 6 and 10%.  I have very good `ceph osd perf` 
numbers too.
>>>
>>> I am using 10.2.11 Jewel.
>>>
>>>
>>> On Wed, Aug 22, 2018 at 11:30 PM Christian Balzer 
 wrote:
 Hello,

 On Wed, 22 Aug 2018 23:00:24 -0400 Tyler Bishop wrote:

> Hi,   I've been fighting to get good stability on my cluster 
for about
> 3 weeks now.  I am running into intermittent issues with OSD 
flapping
> marking other OSD down then going back to a stable state for 
hours and
> days.
>
> The cluster is 4x Cisco UCS S3260 with dual E5-2660, 256GB 
ram, 40G
> Network to 40G Brocade VDX Switches.  The OSD are 6TB HGST 
SAS drives
> with 400GB HGST SAS 12G SSDs.   My configuration is 4 
journals per
> host with 12 disk per journal for a total of 56 disk per 
system and 52
> OSD.
>
 Any denser and you'd have a storage black hole.

 You already pointed your finger in the (or at least one) right 
direction
 and everybody will agree that this setup is woefully 
underpowered in the
 CPU department.

> I am using CentOS 7 with kernel 3.10 and the redhat tuned-adm 
profile
> for throughput-performance enabled.
>
 Ceph version would be interesting as well...

> I have these sysctls set:
>
> kernel.pid_max = 4194303
> fs.file-max = 6553600
> vm.swappiness = 0
> vm.vfs_cache_pressure = 50
> vm.min_free_kbytes = 3145728
>
> I feel like my issue is directly related to the high number 
of OSD per
> host but I'm not sure what issue I'm really running into.   I 
believe
> that I have ruled out network issues, i am able to get 38Gbit
> consistently via 

Re: [ceph-users] Ceph RGW Index Sharding In Jewel

2018-08-24 Thread Alexandru Cucu
You should probably have a look at ceph-ansible as it has a
"take-over-existing-cluster" playbook. I think versions older than 2.0
support Ceph versions older than Jewel.
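Roughly like this (a sketch only; the playbook location and whether it has to
be copied to the repo root differ between ceph-ansible releases, and "hosts"
is just an example inventory name):

   cd ceph-ansible
   cp infrastructure-playbooks/take-over-existing-cluster.yml .
   ansible-playbook -i hosts take-over-existing-cluster.yml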

---
Alex Cucu

On Fri, Aug 24, 2018 at 4:31 AM Russell Holloway
 wrote:
>
> Thanks. Unfortunately even my version of hammer is too old on 0.94.5. I think 
> my only route to address this issue is to figure out the upgrade, at the very 
> least to 0.94.10. The biggest issue again is the deployment tool originally 
> used is set on 0.94.5 and pretty convoluted and no longer receiving updates, 
> but this isn't a ceph issue.
>
> -Russ
>
>
> 
> From: David Turner 
> Sent: Wednesday, August 22, 2018 11:48 PM
> To: Russell Holloway
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Ceph RGW Index Sharding In Jewel
>
> The release notes for 0.94.10 mention the introduction of the `radosgw-admin 
> bucket reshard` command. Redhat [1] documentation for their Enterprise 
> version of Jewel goes into detail for the procedure. You can also search the 
> ML archives for the command to find several conversations about the process 
> as well as problems.  Make sure that the procedure works on a test bucket for 
> Hammer before attempting it on your 12M object bucket.
>
>
> [1] 
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/object_gateway_guide_for_ubuntu/administration_cli#rados_gateway_user_management
>
>
> On Wed, Aug 22, 2018, 9:23 PM Russell Holloway  
> wrote:
>
> Did I say Jewel? I was too hopeful. I meant hammer. This particular cluster 
> is hammer :(
>
>
> -Russ
>
> 
> From: ceph-users  on behalf of Russell 
> Holloway 
> Sent: Wednesday, August 22, 2018 8:49:19 PM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Ceph RGW Index Sharding In Jewel
>
>
> So, I've finally journeyed deeper into the depths of ceph and discovered a 
> grand mistake that is likely the root cause of many woeful nights of blocked 
> requests. To start off, I'm running jewel, and I know that is dated and I 
> need to upgrade (if anyone knows if this is a seamless upgrade even though 
> several major versions behind, do let me know.
>
>
> My current issue is due to a rgw bucket index. I have just discovered I have 
> a bucket with about 12M objects in it. Sharding is not enabled on it. And 
> it's on a spinning disk, not SSD (journal is SSD though, so it could be 
> worse?). A bad combination as I just learned. From my recent understanding, 
> in jewel I could maybe update the rgw region to set max shards for buckets, 
> but it also sounds like this may or may not affect my existing bucket. 
> Furthermore, somewhere I saw mention that prior to luminous, resharding 
> needed to be done offline. I haven't found any documentation on this process 
> though. There is some mention around putting bucket indexes on SSD for 
> performance and latency reasons, which sounds great, but I get the feeling if 
> I modified crush map and tried to get the index pool on SSDs, and tried to 
> start moving things around involving this PG, it will fail in the same way I 
> can't even do a deep scrub on the PG.
>
>
> Does anyone have a good reference on how I could begin to clean this bucket 
> up or get it sharded while on jewel? Again, it sounds like in Luminous it may 
> just start resharding itself and fix itself right up, but I feel going to 
> luminous will require more work and testing (mostly due to my original 
> deployment tool Fuel 8 for openstack, bound to jewel, and no easy upgrade 
> path for fuel...I'll have to sort out how to transition away from that while 
> maintaining my existing nodes)
>
>
> The core issue was identified when I took finer grained control over deep 
> scrubs and trigger them manually. I eventually found out I could trigger my 
> entire ceph cluster to hang by triggering a deep scrub on a single PG, which 
> happens to be the one hosting this index. The OSD hosting it basically 
> becomes unresponsive for a very long time and begins blocking a lot of other 
> requests affecting all sorts of VMs using rbd. I could simply not deep scrub 
> this PG (ceph ends up marking OSD as down and deep scrub seems to fail, never 
> completes, and about 30 minutes after hung requests, cluster eventually 
> recovers), but I know I need to address this bucket sizing issue and then try 
> to work on upgrading ceph.
>
>
> Is it doable? For what it's worth, I tried to list the keys in ceph with 
> rados and that also hung requests. I'm not quite sure how to break the bucket 
> up at a software level especially if I cannot list the contents, so I hope 
> within ceph there is some route forward here...
>
>
> Thanks a bunch in advance for helping a naive ceph operator.
>
>
> -Russ
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> 

[ceph-users] Reminder: bi-weekly dashboard sync call today (15:00 CET)

2018-08-24 Thread Lenz Grimmer
Hi all,

JFYI, the team working on the Ceph Manager Dashboard has a bi-weekly
conference call that discusses the ongoing development and gives an
update on recent improvements/features.

Today, we plan to give a demo of the new dashboard landing page (See
https://tracker.ceph.com/issues/24573 and
https://github.com/ceph/ceph/pull/23568 for details) and the
implementation of the "RBD trash" functionality in the UI
(http://tracker.ceph.com/issues/24272 and
https://github.com/ceph/ceph/pull/23351)

The meeting takes place every second Friday at 15:00 CET at this URL:

  https://bluejeans.com/150063190

See you there!

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph auto repair. What is wrong?

2018-08-24 Thread Christian Balzer
Hello,

On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote:

> Hi!
> 
> I wait about hour.
>
Aside from verifying those timeout values in your cluster, what's your 
mon_osd_down_out_subtree_limit set to?
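(You can check both on a mon via the admin socket, e.g.:

   ceph daemon mon.$(hostname -s) config get mon_osd_down_out_subtree_limit
   ceph daemon mon.$(hostname -s) config get mon_osd_down_out_interval

assuming the mon id matches the short hostname, which it usually does.)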

Christian
 
> - Original Message -
> From: "Wido den Hollander" 
> To: "Fyodor Ustinov" , ceph-users@lists.ceph.com
> Sent: Friday, 24 August, 2018 09:52:23
> Subject: Re: [ceph-users] ceph auto repair. What is wrong?
> 
> On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:
> > Hi!
> > 
> > I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and 
> > two - ssd). Each host located in own rack.
> > 
> > I make such crush configuration on fresh ceph installation:
> > 
> >sudo ceph osd crush add-bucket R-26-3-1 rack
> >sudo ceph osd crush add-bucket R-26-3-2 rack
> >sudo ceph osd crush add-bucket R-26-4-1 rack
> >sudo ceph osd crush add-bucket R-26-4-2 rack
> > [...]
> >sudo ceph osd crush add-bucket R-26-8-1 rack
> >sudo ceph osd crush add-bucket R-26-8-2 rack
> > 
> >sudo ceph osd crush move R-26-3-1 root=default
> > [...]
> >sudo ceph osd crush move R-26-8-2 root=default
> > 
> > sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1
> > [...]
> > sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2
> > 
> > sudo ceph osd crush rule create-replicated hddreplrule default rack hdd
> > sudo ceph osd pool create rbd 256 256 replicated hddreplrule
> > sudo ceph osd pool set rbd size 3
> > sudo ceph osd pool set rbd min_size 2
> > 
> > osd tree look like:
> > ID  CLASS WEIGHTTYPE NAME   STATUS REWEIGHT PRI-AFF
> >  -1   117.36346 root default
> >  -2 9.78029 rack R-26-3-1
> > -27 9.78029 host S-26-3-1-1
> >   0   hdd   9.32390 osd.0   up  1.0 1.0
> >   1   ssd   0.22820 osd.1   up  1.0 1.0
> >   2   ssd   0.22820 osd.2   up  1.0 1.0
> >  -3 9.78029 rack R-26-3-2
> > -43 9.78029 host S-26-3-2-1
> >   3   hdd   9.32390 osd.3   up  1.0 1.0
> >   4   ssd   0.22820 osd.4   up  1.0 1.0
> >   5   ssd   0.22820 osd.5   up  1.0 1.0
> > [...]
> > 
> > 
> > Now write some data to rbd pool and shutdown one node.
> >   cluster:
> > id: 9000d700-8529-4d38-b9f5-24d6079429a2
> > health: HEALTH_WARN
> > 3 osds down
> > 1 host (3 osds) down
> > 1 rack (3 osds) down
> > Degraded data redundancy: 1223/12300 objects degraded (9.943%), 
> > 74 pgs degraded, 74 pgs undersized
> > 
> > And ceph does not try to repair pool. Why?  
> 
> How long did you wait? The default timeout is 600 seconds before
> recovery starts.
> 
> These OSDs are not marked as out yet.
> 
> Wido
> 
> > 
> > WBR,
> > Fyodor.
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >  
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph auto repair. What is wrong?

2018-08-24 Thread Fyodor Ustinov
Hi!

I waited about an hour.

- Original Message -
From: "Wido den Hollander" 
To: "Fyodor Ustinov" , ceph-users@lists.ceph.com
Sent: Friday, 24 August, 2018 09:52:23
Subject: Re: [ceph-users] ceph auto repair. What is wrong?

On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:
> Hi!
> 
> I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and two 
> - ssd). Each host located in own rack.
> 
> I make such crush configuration on fresh ceph installation:
> 
>sudo ceph osd crush add-bucket R-26-3-1 rack
>sudo ceph osd crush add-bucket R-26-3-2 rack
>sudo ceph osd crush add-bucket R-26-4-1 rack
>sudo ceph osd crush add-bucket R-26-4-2 rack
> [...]
>sudo ceph osd crush add-bucket R-26-8-1 rack
>sudo ceph osd crush add-bucket R-26-8-2 rack
> 
>sudo ceph osd crush move R-26-3-1 root=default
> [...]
>sudo ceph osd crush move R-26-8-2 root=default
> 
> sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1
> [...]
> sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2
> 
> sudo ceph osd crush rule create-replicated hddreplrule default rack hdd
> sudo ceph osd pool create rbd 256 256 replicated hddreplrule
> sudo ceph osd pool set rbd size 3
> sudo ceph osd pool set rbd min_size 2
> 
> osd tree look like:
> ID  CLASS WEIGHTTYPE NAME   STATUS REWEIGHT PRI-AFF
>  -1   117.36346 root default
>  -2 9.78029 rack R-26-3-1
> -27 9.78029 host S-26-3-1-1
>   0   hdd   9.32390 osd.0   up  1.0 1.0
>   1   ssd   0.22820 osd.1   up  1.0 1.0
>   2   ssd   0.22820 osd.2   up  1.0 1.0
>  -3 9.78029 rack R-26-3-2
> -43 9.78029 host S-26-3-2-1
>   3   hdd   9.32390 osd.3   up  1.0 1.0
>   4   ssd   0.22820 osd.4   up  1.0 1.0
>   5   ssd   0.22820 osd.5   up  1.0 1.0
> [...]
> 
> 
> Now write some data to rbd pool and shutdown one node.
>   cluster:
> id: 9000d700-8529-4d38-b9f5-24d6079429a2
> health: HEALTH_WARN
> 3 osds down
> 1 host (3 osds) down
> 1 rack (3 osds) down
> Degraded data redundancy: 1223/12300 objects degraded (9.943%), 
> 74 pgs degraded, 74 pgs undersized
> 
> And ceph does not try to repair pool. Why?

How long did you wait? The default timeout is 600 seconds before
recovery starts.

These OSDs are not marked as out yet.

Wido

> 
> WBR,
> Fyodor.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse slow cache?

2018-08-24 Thread Stefan Kooman
Hi Gregory,

Quoting Gregory Farnum (gfar...@redhat.com):
> This is quite strange. Given that you have a log, I think what you want to
> do is find one request in the log, trace it through its lifetime, and see
> where the time is elapsed. You may find a bifurcation, where some
> categories of requests happen instantly but other categories take a second
> or more; focus on the second, obviously.

So that is what I did. It turns out it's not the (slow) cache at all, probably
not to your surprise. The reads are actually quite fast: compared to the
kernel client it's ~8 ms slower, or ~40%. It looks like a couple of
writes / updates, to at least a session file, are slow:

2018-08-23 16:40:25.631 7f79156a8700 10 client.15158830 put_inode on
0x1693859.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={} open={3=1}
mode=100600 size=0/4194304 nlink=1 btime=2018-08-23 16:40:25.632601
mtime=2018-08-23 16:40:25.632601 ctime=2018-08-23 16:40:25.632601
caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x1693859 ts 0/0
objects 0 dirty_or_tx 0]
parents=0x168547c.head["sess_ia0agoj01ul4rob7ji55ouca41"]
0x5646ff0e8000)

2018-08-23 16:40:28.547 7f79156a8700 10 client.15158830
update_inode_file_time 0x1693859.head(faked_ino=0 ref=4 ll_ref=1
cap_refs={} open={3=1} mode=100600 size=0/4194304 nlink=1
btime=2018-08-23 16:40:25.632601 mtime=2018-08-23 16:40:25.632601
ctime=2018-08-23 16:40:25.632601
caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x1693859 ts 0/0
objects 0 dirty_or_tx 0]
parents=0x168547c.head["sess_ia0agoj01ul4rob7ji55ouca41"]
0x5646ff0e8000) pAsxLsXsxFsxcrwb ctime 2018-08-23 16:40:25.632601 mtime
2018-08-23 16:40:25.632601

So, almost 3 seconds. The page is only served after this, and possibly after
some cache files have been written. Note though that ceph-fuse is in
debug=20 mode. Apparently the kernel client is _much_ faster at writing
than ceph-fuse: if I write a file with "dd" (from /dev/urandom) it's in
the tens of milliseconds range, not seconds, and atime / ctime changes are
handled in < 5 ms.

I wonder if tuning file-striping [1] with stripe units of 4KB would be
beneficial in this case.
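If I do try that, my understanding (an untested sketch, and the paths below
are just examples) is that the layout is set per directory and inherited by
new files:

   # inspect the layout of an existing file
   getfattr -n ceph.file.layout /mnt/cephfs/sessions/sess_example
   # set a smaller stripe unit on the directory for newly created files
   setfattr -n ceph.dir.layout.stripe_unit -v 65536 /mnt/cephfs/sessions

though IIRC the minimum stripe unit is 64 KB, so 4 KB may well be rejected.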

Gr. Stefan

[1]: http://docs.ceph.com/docs/master/dev/file-striping/

-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-08-24 Thread Eugen Block

Hi,

I don't know why, but I noticed in the ceph-volume-systemd.log  
(above in bold) that there are 2 different lines corresponding to  
lvm-1 (normally associated with osd.1)?


One seems to have the correct id, while the other has a bad  
one... and it looks like it's trying to start the one with the  
wrong id!?


Those can be remnants of previous attempts to create OSDs. There are  
probably still enabled systemd units referring to old LVs; just  
disable them to rule that out as the root cause.
I've seen these messages, too, but eventually ceph-volume was able to  
find the right LVs. In your case it seems like it doesn't, though.
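Something along these lines (the fsid below is the stale one from your log):

   # list the enabled ceph-volume units
   ls /etc/systemd/system/multi-user.target.wants/ | grep ceph-volume
   # disable the stale one so it no longer fires on boot
   systemctl disable ceph-volume@lvm-1-4a9954ce-0a0f-432b-a91d-eaacb45287d4.service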


Regards,
Eugen


Zitat von Hervé Ballans :


Le 23/08/2018 à 18:44, Alfredo Deza a écrit :

ceph-volume-systemd.log (extract)
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-6-ba351d69-5c48-418e-a377-4034f503af93
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-3-9380cd27-c0fe-4ede-9ed3-d09eff545037
*[2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input  
received: lvm-1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd*

[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-4-02540fff-5478-4a67-bf5c-679c72150e8d
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-0-98bfb597-009b-4e88-bc5e-dd22587d21fe
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-8-913e65e3-62d9-48f8-a0ef-45315cf64593
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-5-b7100200-9eef-4c85-b855-b5a0a435354c
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8
[2018-08-20 11:26:26,386][systemd][INFO  ] parsed sub-command: lvm, extra
data: 6-ba351d69-5c48-418e-a377-4034f503af93
[2018-08-20 11:26:26,386][systemd][INFO  ] parsed sub-command: lvm, extra
data: 3-9380cd27-c0fe-4ede-9ed3-d09eff545037
[2018-08-20 11:26:26,386][systemd][INFO  ] parsed sub-command: lvm, extra
data: 1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 4-02540fff-5478-4a67-bf5c-679c72150e8d
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 0-98bfb597-009b-4e88-bc5e-dd22587d21fe
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 8-913e65e3-62d9-48f8-a0ef-45315cf64593
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 5-b7100200-9eef-4c85-b855-b5a0a435354c
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-2-b8e82f22-e993-4458-984b-90232b8b3d55
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8
*[2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input  
received: lvm-1-4a9954ce-0a0f-432b-a91d-eaacb45287d4*

[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 1-4a9954ce-0a0f-432b-a91d-eaacb45287d4
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 2-b8e82f22-e993-4458-984b-90232b8b3d55
[2018-08-20 11:26:26,458][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 3-9380cd27-c0fe-4ede-9ed3-d09eff545037
[2018-08-20 11:26:26,458][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 2-b8e82f22-e993-4458-984b-90232b8b3d55
[2018-08-20 11:26:26,458][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 5-b7100200-9eef-4c85-b855-b5a0a435354c
[2018-08-20 11:26:26,458][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 6-ba351d69-5c48-418e-a377-4034f503af93
[2018-08-20 11:26:26,458][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 4-02540fff-5478-4a67-bf5c-679c72150e8d
[2018-08-20 11:26:26,459][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 8-913e65e3-62d9-48f8-a0ef-45315cf64593
[2018-08-20 11:26:26,459][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 0-98bfb597-009b-4e88-bc5e-dd22587d21fe
[2018-08-20 11:26:26,459][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8
*[2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running  
command: ceph-volume lvm trigger  
1-4a9954ce-0a0f-432b-a91d-eaacb45287d4*

[2018-08-20 11:26:26,459][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8
*[2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running  
command: ceph-volume lvm trigger  
1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd*

[2018-08-20 11:26:27,068][ceph_volume.process][INFO  ] stderr -->
RuntimeError: could not find osd.1 with fsid
4a9954ce-0a0f-432b-a91d-eaacb45287d4

Re: [ceph-users] RGW pools don't show up in luminous

2018-08-24 Thread Konstantin Shalygin

  I installed a new Ceph cluster with Luminous, after a long time working
with Jewel.  I created my RGW pools the same as always (pool create
default.rgw.buckets.data etc.), but they don't show up in ceph df with
Luminous.  Has the command changed?


Since Luminous you don't need to create the pools; rgw will create them 
automatically.


And no, the command hasn't changed; the rgw pools will show up in 'ceph df' or 'rados df'.
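Once rgw has created them you can verify with, e.g.:

   ceph osd pool ls | grep rgw
   ceph df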



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-08-24 Thread Hervé Ballans

Le 23/08/2018 à 18:44, Alfredo Deza a écrit :

ceph-volume-systemd.log (extract)
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-6-ba351d69-5c48-418e-a377-4034f503af93
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-3-9380cd27-c0fe-4ede-9ed3-d09eff545037
*[2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: 
lvm-1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd*

[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-4-02540fff-5478-4a67-bf5c-679c72150e8d
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-0-98bfb597-009b-4e88-bc5e-dd22587d21fe
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-8-913e65e3-62d9-48f8-a0ef-45315cf64593
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-5-b7100200-9eef-4c85-b855-b5a0a435354c
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8
[2018-08-20 11:26:26,386][systemd][INFO  ] parsed sub-command: lvm, extra
data: 6-ba351d69-5c48-418e-a377-4034f503af93
[2018-08-20 11:26:26,386][systemd][INFO  ] parsed sub-command: lvm, extra
data: 3-9380cd27-c0fe-4ede-9ed3-d09eff545037
[2018-08-20 11:26:26,386][systemd][INFO  ] parsed sub-command: lvm, extra
data: 1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 4-02540fff-5478-4a67-bf5c-679c72150e8d
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 0-98bfb597-009b-4e88-bc5e-dd22587d21fe
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 8-913e65e3-62d9-48f8-a0ef-45315cf64593
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 5-b7100200-9eef-4c85-b855-b5a0a435354c
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8
[2018-08-20 11:26:26,386][systemd][INFO  ] raw systemd input received:
lvm-2-b8e82f22-e993-4458-984b-90232b8b3d55
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8
*[2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: 
lvm-1-4a9954ce-0a0f-432b-a91d-eaacb45287d4*

[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 1-4a9954ce-0a0f-432b-a91d-eaacb45287d4
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8
[2018-08-20 11:26:26,387][systemd][INFO  ] parsed sub-command: lvm, extra
data: 2-b8e82f22-e993-4458-984b-90232b8b3d55
[2018-08-20 11:26:26,458][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 3-9380cd27-c0fe-4ede-9ed3-d09eff545037
[2018-08-20 11:26:26,458][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 2-b8e82f22-e993-4458-984b-90232b8b3d55
[2018-08-20 11:26:26,458][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 5-b7100200-9eef-4c85-b855-b5a0a435354c
[2018-08-20 11:26:26,458][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 6-ba351d69-5c48-418e-a377-4034f503af93
[2018-08-20 11:26:26,458][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 4-02540fff-5478-4a67-bf5c-679c72150e8d
[2018-08-20 11:26:26,459][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 8-913e65e3-62d9-48f8-a0ef-45315cf64593
[2018-08-20 11:26:26,459][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 0-98bfb597-009b-4e88-bc5e-dd22587d21fe
[2018-08-20 11:26:26,459][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8
*[2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running 
command: ceph-volume lvm trigger 1-4a9954ce-0a0f-432b-a91d-eaacb45287d4*

[2018-08-20 11:26:26,459][ceph_volume.process][INFO  ] Running command:
ceph-volume lvm trigger 9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8
*[2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running 
command: ceph-volume lvm trigger 1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd*

[2018-08-20 11:26:27,068][ceph_volume.process][INFO  ] stderr -->
RuntimeError: could not find osd.1 with fsid
4a9954ce-0a0f-432b-a91d-eaacb45287d4

This is odd: why is osd.1 not found? Do you have an OSD with that ID and FSID?

This line means that we have queried all the LVs in the system and we
haven't found anything that responds to that ID and FSID


Hi Alfredo,

I don't know why, but I noticed in the ceph-volume-systemd.log (above in 
bold) that there are 2 different lines corresponding to lvm-1 
(normally associated with osd.1)?


One seems to have the correct id, while the other has a bad one... and 
it looks like it's trying to start the one with the wrong id!?


Just a stupid assumption, but would it be possible, following the NVMe 
device path reversal, that a second lvm path was then created for the 
same osd?



Re: [ceph-users] Migrating from pre-luminous multi-root crush hierachy

2018-08-24 Thread Konstantin Shalygin

On 08/24/2018 01:57 PM, Buchberger, Carsten wrote:

Hi Konstantin,

sounds easy;-)  If i apply the new rule to the existing pools there won't be 
any osds to satisfy the requirements of the rule - because the osds are not in 
the new root yet.
Isn't that a problem ?

Thank you


Your I/O will stall.
You need to move the osds to the new root quickly. Make a list of
`ceph osd crush move <name> host=<host>` commands and paste it right after
applying the crush rule.
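For example (bucket names and the target root below are placeholders only):

   # move a whole host bucket under the new root
   ceph osd crush move node01 root=default
   # or, if single osds have to leave a fake host, move them one by one
   ceph osd crush move osd.12 host=node01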





k
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG auto repair with BlueStore

2018-08-24 Thread Wido den Hollander
Hi,

osd_scrub_auto_repair still defaults to false and I was wondering how we
think about enabling this feature by default.

Would we say it's safe to enable this with BlueStore?
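For anyone testing it, the relevant settings would be something like this
(a sketch; the threshold shown is the default):

   [osd]
   osd scrub auto repair = true
   # repair is skipped if scrub finds more than this many errors
   osd scrub auto repair num errors = 5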

Wido
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph auto repair. What is wrong?

2018-08-24 Thread Wido den Hollander



On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:
> Hi!
> 
> I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and two 
> - ssd). Each host located in own rack.
> 
> I make such crush configuration on fresh ceph installation:
> 
>sudo ceph osd crush add-bucket R-26-3-1 rack
>sudo ceph osd crush add-bucket R-26-3-2 rack
>sudo ceph osd crush add-bucket R-26-4-1 rack
>sudo ceph osd crush add-bucket R-26-4-2 rack
> [...]
>sudo ceph osd crush add-bucket R-26-8-1 rack
>sudo ceph osd crush add-bucket R-26-8-2 rack
> 
>sudo ceph osd crush move R-26-3-1 root=default
> [...]
>sudo ceph osd crush move R-26-8-2 root=default
> 
> sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1
> [...]
> sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2
> 
> sudo ceph osd crush rule create-replicated hddreplrule default rack hdd
> sudo ceph osd pool create rbd 256 256 replicated hddreplrule
> sudo ceph osd pool set rbd size 3
> sudo ceph osd pool set rbd min_size 2
> 
> osd tree look like:
> ID  CLASS WEIGHTTYPE NAME   STATUS REWEIGHT PRI-AFF
>  -1   117.36346 root default
>  -2 9.78029 rack R-26-3-1
> -27 9.78029 host S-26-3-1-1
>   0   hdd   9.32390 osd.0   up  1.0 1.0
>   1   ssd   0.22820 osd.1   up  1.0 1.0
>   2   ssd   0.22820 osd.2   up  1.0 1.0
>  -3 9.78029 rack R-26-3-2
> -43 9.78029 host S-26-3-2-1
>   3   hdd   9.32390 osd.3   up  1.0 1.0
>   4   ssd   0.22820 osd.4   up  1.0 1.0
>   5   ssd   0.22820 osd.5   up  1.0 1.0
> [...]
> 
> 
> Now write some data to rbd pool and shutdown one node.
>   cluster:
> id: 9000d700-8529-4d38-b9f5-24d6079429a2
> health: HEALTH_WARN
> 3 osds down
> 1 host (3 osds) down
> 1 rack (3 osds) down
> Degraded data redundancy: 1223/12300 objects degraded (9.943%), 
> 74 pgs degraded, 74 pgs undersized
> 
> And ceph does not try to repair pool. Why?

How long did you wait? The default timeout is 600 seconds before
recovery starts.

These OSDs are not marked as out yet.
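You can check the timer on one of the mons, or mark the osds out by hand to
start recovery right away (placeholders below):

   ceph daemon mon.<id> config get mon_osd_down_out_interval   # 600 by default
   ceph osd out <osd-id>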

Wido

> 
> WBR,
> Fyodor.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating from pre-luminous multi-root crush hierachy

2018-08-24 Thread Konstantin Shalygin

We recently upgrade to luminous (you can see the device-classes in the output). 
So it should be possible to have one single root, no fake hosts and just use 
the device-class.
We added some hosts/osds recently which back a new pools, so we also created a 
new hierarchy and crush rules for those. That worked perfect, and of course we 
want to have that for the old parts of the cluster, too

Is it possible to move the existing osd's to a new root/bucket without having 
to move all the data around (which might be difficult cause we don't have 
enough capacity to move 50 % of the osd's ) ?

I imagine something like:


1. Magic maintenance command

2. Move osds to new bucket in hierarchy

3. Update either existing crush-rule or create new rule an update pool

4. Magic maintenance-done command

We also plan to migrate the ods to bluestore. Should we do this
a) before moving
b) after moving

I hope our issue is clear.

Best regards
Carsten



You don't need a "magic maintenance command". When you apply your new 
crush rule online, you need to move your osds to the root defined in the new 
crush rule.


Data movement is not huge in this case.
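I.e. roughly this (rule and pool names are just examples):

   # one rule per device class instead of the old per-root rules
   ceph osd crush rule create-replicated replicated_hdd default host hdd
   ceph osd pool set <your-pool> crush_rule replicated_hdd
   # then move the host buckets: ceph osd crush move <host> root=default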



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com