Re: [ceph-users] CephFS: client hangs

2019-02-21 Thread Hennen, Christian
Of course, you’re right. After using the right name, the connection worked :) I 
tried to connect via a newer kernel client (under Ubuntu 16.04) and it worked 
as well. So the issue clearly seems to be related to our client kernel version.
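
For reference, with a recent enough kernel a plain kernel-client mount along 
these lines should work as well (assuming the client.cephfs key is used; the 
secret file path is only an example):

mount -t ceph 192.168.1.17:6789:/ /mnt/cephfs -o name=cephfs,secretfile=/etc/ceph/cephfs.secret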

 

Thank you all very much for your time and help! 

 

 

From: David Turner 
Sent: Tuesday, 19 February 2019 19:32
To: Hennen, Christian 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS: client hangs

 

You're attempting to use mismatching client name and keyring.  You want to use 
matching name and keyring.  For your example, you would want to either use 
`--keyring /etc/ceph/ceph.client.admin.keyring --name client.admin` or 
`--keyring /etc/ceph/ceph.client.cephfs.keyring --name client.cephfs`.  Mixing 
and matching does not work.  Treat them like username and password.  You 
wouldn't try to log into your computer under your account with the admin 
password.
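
For example, either of these keeps the name and keyring paired (assuming both 
keyring files exist under /etc/ceph and the cephfs key has the usual MON/MDS/OSD 
caps for the filesystem):

ceph-fuse --name client.admin --keyring /etc/ceph/ceph.client.admin.keyring -m 192.168.1.17:6789 /mnt/cephfs
ceph-fuse --name client.cephfs --keyring /etc/ceph/ceph.client.cephfs.keyring -m 192.168.1.17:6789 /mnt/cephfs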

 

On Tue, Feb 19, 2019 at 12:58 PM Hennen, Christian 
<christian.hen...@uni-trier.de> wrote:

> sounds like network issue. are there firewall/NAT between nodes?
No, there is currently no firewall in place. Nodes and clients are on the same 
network. MTUs match, ports are opened according to nmap.

> try running ceph-fuse on the node that run mds, check if it works properly.
When I try to run ceph-fuse on either a client or cephfiler1 (MON,MGR,MDS,OSDs) 
I get
- "operation not permitted" when using the client keyring
- "invalid argument" when using the admin keyring
- "ms_handle_refused" when using the admin keyring and connecting to 
127.0.0.1:6789 <http://127.0.0.1:6789> 

ceph-fuse --keyring /etc/ceph/ceph.client.admin.keyring --name client.cephfs -m 
192.168.1.17:6789 /mnt/cephfs

-Original Message-
From: Yan, Zheng <uker...@gmail.com>
Sent: Tuesday, 19 February 2019 11:31
To: Hennen, Christian <christian.hen...@uni-trier.de>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS: client hangs

On Tue, Feb 19, 2019 at 5:10 PM Hennen, Christian 
<christian.hen...@uni-trier.de> wrote:
>
> Hi!
>
> >mon_max_pg_per_osd = 400
> >
> >In the ceph.conf and then restart all the services / or inject the 
> >config into the running admin
>
> I restarted each server (MONs and OSDs weren’t enough) and now the health 
> warning is gone. Still no luck accessing CephFS though.
>
>
> > MDS show a client got evicted. Nothing else looks abnormal.  Do new 
> > cephfs clients also get evicted quickly?
>
> Aside from the fact that evicted clients don’t show up in ceph –s, we observe 
> other strange things:
>
> ·   Setting max_mds has no effect
>
> ·   Ceph osd blacklist ls sometimes lists cluster nodes
>

sounds like network issue. are there firewall/NAT between nodes?

> The only client that is currently running is ‚master1‘. It also hosts a MON 
> and a MGR. Its syslog (https://gitlab.uni-trier.de/snippets/78) shows 
> messages like:
>
> Feb 13 06:40:33 master1 kernel: [56165.943008] libceph: wrong peer, 
> want 192.168.1.17:6800/-2045158358, got 192.168.1.17:6800/1699349984
>
> Feb 13 06:40:33 master1 kernel: [56165.943014] libceph: mds1 
> 192.168.1.17:6800 wrong peer at address
>
> The other day I did the update from 12.2.8 to 12.2.11, which can also be seen 
> in the logs. Again, there appeared these messages. I assume that’s normal 
> operations since ports can change and daemons have to find each other again? 
> But what about Feb 13 in the morning? I didn’t do any restarts then.
>
> Also, clients are printing messages like the following on the console:
>
> [1026589.751040] ceph: handle_cap_import: mismatched seq/mseq: ino 
> (1994988.fffe) mds0 seq1 mseq 15 importer mds1 has 
> peer seq 2 mseq 15
>
> [1352658.876507] ceph: build_path did not end path lookup where 
> expected, namelen is 23, pos is 0
>
> Oh, and btw, the ceph nodes are running on Ubuntu 16.04, clients are on 14.04 
> with kernel 4.4.0-133.
>

try running ceph-fuse on the node that run mds, check if it works properly.


> For reference:
>
> > Cluster details: https://gitlab.uni-trier.de/snippets/77
>
> > MDS log: 
> > https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple
>
>
> Kind regards
> Christian Hennen
>
> Project Manager Infrastructural Services ZIMK University of Trier 
> Germany
>
> From: Ashley Merrick

Re: [ceph-users] CephFS: client hangs

2019-02-19 Thread David Turner
You're attempting to use mismatching client name and keyring.  You want to
use matching name and keyring.  For your example, you would want to either
use `--keyring /etc/ceph/ceph.client.admin.keyring --name client.admin` or
`--keyring /etc/ceph/ceph.client.cephfs.keyring --name client.cephfs`.
Mixing and matching does not work.  Treat them like username and password.
You wouldn't try to log into your computer under your account with the
admin password.

On Tue, Feb 19, 2019 at 12:58 PM Hennen, Christian <
christian.hen...@uni-trier.de> wrote:

> > sounds like network issue. are there firewall/NAT between nodes?
> No, there is currently no firewall in place. Nodes and clients are on the
> same network. MTUs match, ports are opened according to nmap.
>
> > try running ceph-fuse on the node that run mds, check if it works
> properly.
> When I try to run ceph-fuse on either a client or cephfiler1
> (MON,MGR,MDS,OSDs) I get
> - "operation not permitted" when using the client keyring
> - "invalid argument" when using the admin keyring
> - "ms_handle_refused" when using the admin keyring and connecting to
> 127.0.0.1:6789
>
> ceph-fuse --keyring /etc/ceph/ceph.client.admin.keyring --name
> client.cephfs -m 192.168.1.17:6789 /mnt/cephfs
>
> -Original Message-
> From: Yan, Zheng 
> Sent: Tuesday, 19 February 2019 11:31
> To: Hennen, Christian 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] CephFS: client hangs
>
> On Tue, Feb 19, 2019 at 5:10 PM Hennen, Christian <
> christian.hen...@uni-trier.de> wrote:
> >
> > Hi!
> >
> > >mon_max_pg_per_osd = 400
> > >
> > >In the ceph.conf and then restart all the services / or inject the
> > >config into the running admin
> >
> > I restarted each server (MONs and OSDs weren’t enough) and now the
> health warning is gone. Still no luck accessing CephFS though.
> >
> >
> > > MDS show a client got evicted. Nothing else looks abnormal.  Do new
> > > cephfs clients also get evicted quickly?
> >
> > Aside from the fact that evicted clients don’t show up in ceph –s, we
> observe other strange things:
> >
> > ·   Setting max_mds has no effect
> >
> > ·   Ceph osd blacklist ls sometimes lists cluster nodes
> >
>
> sounds like network issue. are there firewall/NAT between nodes?
>
> > The only client that is currently running is ‚master1‘. It also hosts a
> MON and a MGR. Its syslog (https://gitlab.uni-trier.de/snippets/78) shows
> messages like:
> >
> > Feb 13 06:40:33 master1 kernel: [56165.943008] libceph: wrong peer,
> > want 192.168.1.17:6800/-2045158358, got 192.168.1.17:6800/1699349984
> >
> > Feb 13 06:40:33 master1 kernel: [56165.943014] libceph: mds1
> > 192.168.1.17:6800 wrong peer at address
> >
> > The other day I did the update from 12.2.8 to 12.2.11, which can also be
> seen in the logs. Again, there appeared these messages. I assume that’s
> normal operations since ports can change and daemons have to find each
> other again? But what about Feb 13 in the morning? I didn’t do any restarts
> then.
> >
> > Also, clients are printing messages like the following on the console:
> >
> > [1026589.751040] ceph: handle_cap_import: mismatched seq/mseq: ino
> > (1994988.fffe) mds0 seq1 mseq 15 importer mds1 has
> > peer seq 2 mseq 15
> >
> > [1352658.876507] ceph: build_path did not end path lookup where
> > expected, namelen is 23, pos is 0
> >
> > Oh, and btw, the ceph nodes are running on Ubuntu 16.04, clients are on
> 14.04 with kernel 4.4.0-133.
> >
>
> try running ceph-fuse on the node that run mds, check if it works properly.
>
>
> > For reference:
> >
> > > Cluster details: https://gitlab.uni-trier.de/snippets/77
> >
> > > MDS log:
> > > https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple)
> >
> >
> > Kind regards
> > Christian Hennen
> >
> > Project Manager Infrastructural Services ZIMK University of Trier
> > Germany
> >
> > From: Ashley Merrick 
> > Sent: Monday, 18 February 2019 16:53
> > To: Hennen, Christian 
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] CephFS: client hangs
> >
> > Correct, yes - from my experience the OSDs as well.
> >
> > On Mon, 18 Feb 2019 at 11:51 PM, Hennen, Christian <
> christian.hen...@uni-trier.de> wrote:
> >
> > Hi!
> >
> > >mon_max_pg_per_osd = 400
> > >
> > >In the ceph.conf and then restart all the services / or inject the config
> > >into the running admin

Re: [ceph-users] CephFS: client hangs

2019-02-19 Thread Hennen, Christian
> sounds like network issue. are there firewall/NAT between nodes?
No, there is currently no firewall in place. Nodes and clients are on the same 
network. MTUs match, ports are opened according to nmap.
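
For example, a scan covering the default Ceph port ranges (6789 for the MON, 
6800-7300 for OSDs and MDS) would be something like:

nmap -p 6789,6800-7300 192.168.1.17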

> try running ceph-fuse on the node that run mds, check if it works properly.
When I try to run ceph-fuse on either a client or cephfiler1 (MON,MGR,MDS,OSDs) 
I get
- "operation not permitted" when using the client keyring
- "invalid argument" when using the admin keyring
- "ms_handle_refused" when using the admin keyring and connecting to 
127.0.0.1:6789

ceph-fuse --keyring /etc/ceph/ceph.client.admin.keyring --name client.cephfs -m 
192.168.1.17:6789 /mnt/cephfs

-Original Message-
From: Yan, Zheng 
Sent: Tuesday, 19 February 2019 11:31
To: Hennen, Christian 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS: client hangs

On Tue, Feb 19, 2019 at 5:10 PM Hennen, Christian 
 wrote:
>
> Hi!
>
> >mon_max_pg_per_osd = 400
> >
> >In the ceph.conf and then restart all the services / or inject the 
> >config into the running admin
>
> I restarted each server (MONs and OSDs weren’t enough) and now the health 
> warning is gone. Still no luck accessing CephFS though.
>
>
> > MDS show a client got evicted. Nothing else looks abnormal.  Do new 
> > cephfs clients also get evicted quickly?
>
> Aside from the fact that evicted clients don’t show up in ceph –s, we observe 
> other strange things:
>
> ·   Setting max_mds has no effect
>
> ·   Ceph osd blacklist ls sometimes lists cluster nodes
>

sounds like network issue. are there firewall/NAT between nodes?

> The only client that is currently running is ‚master1‘. It also hosts a MON 
> and a MGR. Its syslog (https://gitlab.uni-trier.de/snippets/78) shows 
> messages like:
>
> Feb 13 06:40:33 master1 kernel: [56165.943008] libceph: wrong peer, 
> want 192.168.1.17:6800/-2045158358, got 192.168.1.17:6800/1699349984
>
> Feb 13 06:40:33 master1 kernel: [56165.943014] libceph: mds1 
> 192.168.1.17:6800 wrong peer at address
>
> The other day I did the update from 12.2.8 to 12.2.11, which can also be seen 
> in the logs. Again, there appeared these messages. I assume that’s normal 
> operations since ports can change and daemons have to find each other again? 
> But what about Feb 13 in the morning? I didn’t do any restarts then.
>
> Also, clients are printing messages like the following on the console:
>
> [1026589.751040] ceph: handle_cap_import: mismatched seq/mseq: ino 
> (1994988.fffe) mds0 seq1 mseq 15 importer mds1 has 
> peer seq 2 mseq 15
>
> [1352658.876507] ceph: build_path did not end path lookup where 
> expected, namelen is 23, pos is 0
>
> Oh, and btw, the ceph nodes are running on Ubuntu 16.04, clients are on 14.04 
> with kernel 4.4.0-133.
>

try running ceph-fuse on the node that run mds, check if it works properly.


> For reference:
>
> > Cluster details: https://gitlab.uni-trier.de/snippets/77
>
> > MDS log: 
> > https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple)
>
>
> Kind regards
> Christian Hennen
>
> Project Manager Infrastructural Services ZIMK University of Trier 
> Germany
>
> From: Ashley Merrick 
> Sent: Monday, 18 February 2019 16:53
> To: Hennen, Christian 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] CephFS: client hangs
>
> Correct, yes - from my experience the OSDs as well.
>
> On Mon, 18 Feb 2019 at 11:51 PM, Hennen, Christian 
>  wrote:
>
> Hi!
>
> >mon_max_pg_per_osd = 400
> >
> >In the ceph.conf and then restart all the services / or inject the 
> >config into the running admin
>
> I restarted all MONs, but I assume the OSDs need to be restarted as well?
>
> > MDS show a client got evicted. Nothing else looks abnormal.  Do new 
> > cephfs clients also get evicted quickly?
>
> Yeah, it seems so. But strangely there is no indication of it in 'ceph 
> -s' or 'ceph health detail'. And they don't seem to be evicted 
> permanently? Right now, only 1 client is connected. The others are shut down 
> since last week.
> 'ceph osd blacklist ls' shows 0 entries.
>
>
> Kind regards
> Christian Hennen
>
> Project Manager Infrastructural Services ZIMK University of Trier 
> Germany
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] CephFS: client hangs

2019-02-19 Thread Yan, Zheng
On Tue, Feb 19, 2019 at 5:10 PM Hennen, Christian
 wrote:
>
> Hi!
>
> >mon_max_pg_per_osd = 400
> >
> >In the ceph.conf and then restart all the services / or inject the config
> >into the running admin
>
> I restarted each server (MONs and OSDs weren’t enough) and now the health 
> warning is gone. Still no luck accessing CephFS though.
>
>
> > MDS show a client got evicted. Nothing else looks abnormal.  Do new cephfs
> > clients also get evicted quickly?
>
> Aside from the fact that evicted clients don’t show up in ceph –s, we observe 
> other strange things:
>
> ·   Setting max_mds has no effect
>
> ·   Ceph osd blacklist ls sometimes lists cluster nodes
>

Sounds like a network issue. Is there a firewall/NAT between the nodes?

> The only client that is currently running is ‚master1‘. It also hosts a MON 
> and a MGR. Its syslog (https://gitlab.uni-trier.de/snippets/78) shows 
> messages like:
>
> Feb 13 06:40:33 master1 kernel: [56165.943008] libceph: wrong peer, want 
> 192.168.1.17:6800/-2045158358, got 192.168.1.17:6800/1699349984
>
> Feb 13 06:40:33 master1 kernel: [56165.943014] libceph: mds1 
> 192.168.1.17:6800 wrong peer at address
>
> The other day I did the update from 12.2.8 to 12.2.11, which can also be seen 
> in the logs. Again, there appeared these messages. I assume that’s normal 
> operations since ports can change and daemons have to find each other again? 
> But what about Feb 13 in the morning? I didn’t do any restarts then.
>
> Also, clients are printing messages like the following on the console:
>
> [1026589.751040] ceph: handle_cap_import: mismatched seq/mseq: ino 
> (1994988.fffe) mds0 seq1 mseq 15 importer mds1 has peer seq 2 
> mseq 15
>
> [1352658.876507] ceph: build_path did not end path lookup where expected, 
> namelen is 23, pos is 0
>
> Oh, and btw, the ceph nodes are running on Ubuntu 16.04, clients are on 14.04 
> with kernel 4.4.0-133.
>

Try running ceph-fuse on the node that runs the MDS and check if it works properly.


> For reference:
>
> > Cluster details: https://gitlab.uni-trier.de/snippets/77
>
> > MDS log: 
> > https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple)
>
>
> Kind regards
> Christian Hennen
>
> Project Manager Infrastructural Services ZIMK University of Trier
> Germany
>
> From: Ashley Merrick 
> Sent: Monday, 18 February 2019 16:53
> To: Hennen, Christian 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] CephFS: client hangs
>
> Correct, yes - from my experience the OSDs as well.
>
> On Mon, 18 Feb 2019 at 11:51 PM, Hennen, Christian 
>  wrote:
>
> Hi!
>
> >mon_max_pg_per_osd = 400
> >
> >In the ceph.conf and then restart all the services / or inject the config
> >into the running admin
>
> I restarted all MONs, but I assume the OSDs need to be restarted as well?
>
> > MDS show a client got evicted. Nothing else looks abnormal.  Do new cephfs
> > clients also get evicted quickly?
>
> Yeah, it seems so. But strangely there is no indication of it in 'ceph -s' or
> 'ceph health detail'. And they don't seem to be evicted permanently? Right
> now, only 1 client is connected. The others are shut down since last week.
> 'ceph osd blacklist ls' shows 0 entries.
>
>
> Kind regards
> Christian Hennen
>
> Project Manager Infrastructural Services ZIMK University of Trier
> Germany
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: client hangs

2019-02-19 Thread Hennen, Christian
Hi!

>mon_max_pg_per_osd = 400
>
>In the ceph.conf and then restart all the services / or inject the config 
>into the running admin

I restarted each server (restarting only the MONs and OSDs wasn’t enough) and now 
the health warning is gone. Still no luck accessing CephFS, though.

> MDS show a client got evicted. Nothing else looks abnormal.  Do new cephfs 
> clients also get evicted quickly?

Aside from the fact that evicted clients don’t show up in 'ceph -s', we observe 
other strange things:
*   Setting max_mds has no effect
*   'ceph osd blacklist ls' sometimes lists cluster nodes (see the example below)
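
For reference, the blacklist can be inspected and individual entries removed 
like this (the address below is only a placeholder):

ceph osd blacklist ls
ceph osd blacklist rm 192.168.1.21:0/3710147553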

The only client that is currently running is ‚master1‘. It also hosts a MON and 
a MGR. Its syslog (https://gitlab.uni-trier.de/snippets/78) shows messages like:
Feb 13 06:40:33 master1 kernel: [56165.943008] libceph: wrong peer, want 
192.168.1.17:6800/-2045158358, got 192.168.1.17:6800/1699349984
Feb 13 06:40:33 master1 kernel: [56165.943014] libceph: mds1 192.168.1.17:6800 
wrong peer at address
The other day I did the update from 12.2.8 to 12.2.11, which can also be seen 
in the logs. These messages appeared then as well. I assume that’s normal 
operation, since ports can change and daemons have to find each other again? 
But what about Feb 13 in the morning? I didn’t do any restarts then.

Also, clients are printing messages like the following on the console:
[1026589.751040] ceph: handle_cap_import: mismatched seq/mseq: ino 
(1994988.fffe) mds0 seq1 mseq 15 importer mds1 has peer seq 2 
mseq 15
[1352658.876507] ceph: build_path did not end path lookup where expected, 
namelen is 23, pos is 0

Oh, and btw, the ceph nodes are running on Ubuntu 16.04, clients are on 14.04 
with kernel 4.4.0-133.

For reference:
> Cluster details: https://gitlab.uni-trier.de/snippets/77 
> MDS log: https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple)

Kind regards
Christian Hennen

Project Manager Infrastructural Services ZIMK University of Trier
Germany

From: Ashley Merrick 
Sent: Monday, 18 February 2019 16:53
To: Hennen, Christian 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS: client hangs

Correct, yes - from my experience the OSDs as well.

On Mon, 18 Feb 2019 at 11:51 PM, Hennen, Christian 
<christian.hen...@uni-trier.de> wrote:
Hi!

>mon_max_pg_per_osd = 400
>
>In the ceph.conf and then restart all the services / or inject the config 
>into the running admin

I restarted all MONs, but I assume the OSDs need to be restarted as well?

> MDS show a client got evicted. Nothing else looks abnormal.  Do new cephfs 
> clients also get evicted quickly?

Yeah, it seems so. But strangely there is no indication of it in 'ceph -s' or 
'ceph health detail'. And they don't seem to be evicted permanently? Right 
now, only 1 client is connected. The others are shut down since last week. 
'ceph osd blacklist ls' shows 0 entries.


Kind regards
Christian Hennen

Project Manager Infrastructural Services ZIMK University of Trier
Germany

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] CephFS: client hangs

2019-02-18 Thread Ashley Merrick
Correct, yes - from my experience the OSDs as well.

On Mon, 18 Feb 2019 at 11:51 PM, Hennen, Christian <
christian.hen...@uni-trier.de> wrote:

> Hi!
>
> >mon_max_pg_per_osd = 400
> >
> >In the ceph.conf and then restart all the services / or inject the config
> >into the running admin
>
> I restarted all MONs, but I assume the OSDs need to be restarted as well?
>
> > MDS show a client got evicted. Nothing else looks abnormal.  Do new
> cephfs
> > clients also get evicted quickly?
>
> Yeah, it seems so. But strangely there is no indication of it in 'ceph -s'
> or
> 'ceph health detail'. And they don't seem to be evicted permanently? Right
> now, only 1 client is connected. The others are shut down since last week.
> 'ceph osd blacklist ls' shows 0 entries.
>
>
> Kind regards
> Christian Hennen
>
> Project Manager Infrastructural Services ZIMK University of Trier
> Germany
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: client hangs

2019-02-18 Thread Hennen, Christian
Hi!

>mon_max_pg_per_osd = 400
>
>In the ceph.conf and then restart all the services / or inject the config 
>into the running admin

I restarted all MONs, but I assume the OSDs need to be restarted as well?

> MDS show a client got evicted. Nothing else looks abnormal.  Do new cephfs 
> clients also get evicted quickly?

Yeah, it seems so. But strangely there is no indication of it in 'ceph -s' or 
'ceph health detail'. And they don't seem to be evicted permanently? Right 
now, only 1 client is connected; the others have been shut down since last week. 
'ceph osd blacklist ls' shows 0 entries.
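
For what it's worth, the MDS's own view of its client sessions can be dumped via 
the admin socket on the MDS node (the daemon name below is just an example):

ceph daemon mds.cephfiler1 session ls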


Kind regards
Christian Hennen

Project Manager Infrastructural Services ZIMK University of Trier
Germany



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: client hangs

2019-02-18 Thread Yan, Zheng
On Mon, Feb 18, 2019 at 10:55 PM Hennen, Christian
 wrote:
>
> Dear Community,
>
>
>
> we are running a Ceph Luminous Cluster with CephFS (Bluestore OSDs). During 
> setup, we made the mistake of configuring the OSDs on RAID Volumes. Initially 
> our cluster consisted of 3 nodes, each housing 1 OSD. Currently, we are in 
> the process of remediating this. After a loss of metadata 
> (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025612.html) 
> due to resetting the journal (journal entries were not being flushed fast 
> enough), we managed to bring the cluster back up and started adding 2 
> additional nodes 
> (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027563.html) .
>
>
>
> After adding the two additional nodes, we increased the number of placement 
> groups to not only accommodate the new nodes, but also to prepare for 
> reinstallation of the misconfigured nodes. Since then, the number of 
> placement groups per OSD is too high of course. Despite this fact, cluster 
> health remained fine over the last few months.
>
>
>
> However, we are currently observing massive problems: Whenever we try to 
> access any folder via CephFS, e.g. by listing its contents, there is no 
> response. Clients are getting blacklisted, but there is no warning. ceph -s 
> shows everything is ok, except for the number of PGs being too high. If I 
> grep for „assert“ or „error“ in any of the logs, nothing comes up. Also, it 
> is not possible to reduce the number of active MDS to 1. After issuing ‚ceph 
> fs set fs_data max_mds 1‘ nothing happens.
>
>
>
> Cluster details are available here: https://gitlab.uni-trier.de/snippets/77
>
>
>
> The MDS log  
> (https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple) 
> contains no „nicely exporting to“ messages as usual, but instead these:
>
> 2019-02-15 08:44:52.464926 7fdb13474700  7 mds.0.server 
> try_open_auth_dirfrag: not auth for [dir 0x100011ce7c6 /home/r-admin/ 
> [2,head] rep@1.1 dir_auth=1 state=0 f(v4 m2019-02-14 13:19:41.300993 
> 80=48+32) n(v11339 rc2019-02-14 13:19:41.300993 b10116465260 10869=10202+667) 
> hs=7+0,ss=0+0 | dnwaiter=0 child=1 frozen=0 subtree=1 replicated=0 dirty=0 
> waiter=0 authpin=0 tempexporting=0 0x564343eed100], fw to mds.1
>
>

The MDS log shows a client got evicted. Nothing else looks abnormal.  Do new
cephfs clients also get evicted quickly?

>
> Updates from 12.2.8 to 12.2.11 I ran last week didn’t help.
>
>
>
> Anybody got an idea or a hint where I could look into next? Any help would be 
> greatly appreciated!
>
>
>
> Kind regards
>
> Christian Hennen
>
>
>
> Project Manager Infrastructural Services
> ZIMK University of Trier
>
> Germany
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: client hangs

2019-02-18 Thread Ashley Merrick
I know this may sound simple, but have you tried raising the PG-per-OSD limit?
I'm sure I have seen people in the past with the same kind of issue as you, and
it was just I/O being blocked due to the limit without anything being actively
logged.

mon_max_pg_per_osd = 400

In the ceph.conf and then restart all the services / or inject the config
into the running admin
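
For example, a minimal sketch (the option is read by the monitors; adjust the 
value to your situation):

# /etc/ceph/ceph.conf
[global]
mon_max_pg_per_osd = 400

# or inject it into the running mons without a restart:
ceph tell mon.* injectargs '--mon_max_pg_per_osd=400'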

On Mon, Feb 18, 2019 at 10:55 PM Hennen, Christian <
christian.hen...@uni-trier.de> wrote:

> Dear Community,
>
>
>
> we are running a Ceph Luminous Cluster with CephFS (Bluestore OSDs).
> During setup, we made the mistake of configuring the OSDs on RAID Volumes.
> Initially our cluster consisted of 3 nodes, each housing 1 OSD. Currently,
> we are in the process of remediating this. After a loss of metadata (
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025612.html)
> due to resetting the journal (journal entries were not being flushed fast
> enough), we managed to bring the cluster back up and started adding 2
> additional nodes (
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027563.html)
> .
>
>
>
> After adding the two additional nodes, we increased the number of
> placement groups to not only accommodate the new nodes, but also to prepare
> for reinstallation of the misconfigured nodes. Since then, the number of
> placement groups per OSD is too high of course. Despite this fact, cluster
> health remained fine over the last few months.
>
>
>
> However, we are currently observing massive problems: Whenever we try to
> access any folder via CephFS, e.g. by listing its contents, there is no
> response. Clients are getting blacklisted, but there is no warning. ceph -s
> shows everything is ok, except for the number of PGs being too high. If I
> grep for „assert“ or „error“ in any of the logs, nothing comes up. Also, it
> is not possible to reduce the number of active MDS to 1. After issuing
> ‚ceph fs set fs_data max_mds 1‘ nothing happens.
>
>
>
> Cluster details are available here:
> https://gitlab.uni-trier.de/snippets/77
>
>
>
> The MDS log  (
> https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple)
> contains no „nicely exporting to“ messages as usual, but instead these:
>
> 2019-02-15 08:44:52.464926 7fdb13474700  7 mds.0.server
> try_open_auth_dirfrag: not auth for [dir 0x100011ce7c6 /home/r-admin/
> [2,head] rep@1.1 dir_auth=1 state=0 f(v4 m2019-02-14 13:19:41.300993
> 80=48+32) n(v11339 rc2019-02-14 13:19:41.300993 b10116465260
> 10869=10202+667) hs=7+0,ss=0+0 | dnwaiter=0 child=1 frozen=0 subtree=1
> replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x564343eed100], fw
> to mds.1
>
>
>
> Updates from 12.2.8 to 12.2.11 I ran last week didn’t help.
>
>
>
> Anybody got an idea or a hint where I could look into next? Any help would
> be greatly appreciated!
>
>
>
> Kind regards
>
> Christian Hennen
>
>
>
> Project Manager Infrastructural Services
> ZIMK University of Trier
>
> Germany
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS: client hangs

2019-02-18 Thread Hennen, Christian
Dear Community,

 

we are running a Ceph Luminous Cluster with CephFS (Bluestore OSDs). During
setup, we made the mistake of configuring the OSDs on RAID Volumes.
Initially our cluster consisted of 3 nodes, each housing 1 OSD. Currently,
we are in the process of remediating this. After a loss of metadata
(http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025612.html)
due to resetting the journal (journal entries were not being flushed fast
enough), we managed to bring the cluster back up and started adding 2
additional nodes
(http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027563.html)
.

 

After adding the two additional nodes, we increased the number of placement
groups to not only accommodate the new nodes, but also to prepare for
reinstallation of the misconfigured nodes. Since then, the number of
placement groups per OSD is too high of course. Despite this fact, cluster
health remained fine over the last few months.

 

However, we are currently observing massive problems: Whenever we try to
access any folder via CephFS, e.g. by listing its contents, there is no
response. Clients are getting blacklisted, but there is no warning. ceph -s
shows everything is ok, except for the number of PGs being too high. If I
grep for "assert" or "error" in any of the logs, nothing comes up. Also, it
is not possible to reduce the number of active MDS to 1. After issuing 'ceph
fs set fs_data max_mds 1' nothing happens.
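
(As far as I know, on Luminous lowering max_mds does not stop the extra rank by 
itself - the surplus rank also has to be deactivated explicitly, roughly:

ceph fs set fs_data max_mds 1
ceph mds deactivate fs_data:1

That is only a sketch of the intended procedure, not an explanation for the hang.)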

 

Cluster details are available here: https://gitlab.uni-trier.de/snippets/77 

 

The MDS log (https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple)
contains no "nicely exporting to" messages as usual, but instead these:

2019-02-15 08:44:52.464926 7fdb13474700  7 mds.0.server
try_open_auth_dirfrag: not auth for [dir 0x100011ce7c6 /home/r-admin/
[2,head] rep@1.1 dir_auth=1 state=0 f(v4 m2019-02-14 13:19:41.300993
80=48+32) n(v11339 rc2019-02-14 13:19:41.300993 b10116465260
10869=10202+667) hs=7+0,ss=0+0 | dnwaiter=0 child=1 frozen=0 subtree=1
replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x564343eed100], fw
to mds.1

 

The updates from 12.2.8 to 12.2.11 that I ran last week didn't help.

 

Anybody got an idea or a hint where I could look into next? Any help would
be greatly appreciated!

 

Kind regards

Christian Hennen

 

Project Manager Infrastructural Services
ZIMK University of Trier

Germany



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com