Re: [ceph-users] Ceph Upgrades - sanity check - MDS steps

2019-06-19 Thread Stefan Kooman
Quoting James Wilkins (james.wilk...@fasthosts.com):
> Hi all,
> 
> Just want to (double) check something – we’re in the process of
> luminous -> mimic upgrades for all of our clusters – particularly this
> section regarding MDS steps
> 
>   •  Confirm that only one MDS is online and is rank 0 for your FS:
>      # ceph status
>   •  Upgrade the last remaining MDS daemon by installing the new
>      packages and restarting the daemon:
>
> Namely – is it required to upgrade the live single MDS in place (and
> thus have downtime whilst the MDS restarts – on our first cluster was
> typically 10 minutes of downtime ) – or can we upgrade the
> standby-replays/standbys first and flip once they are back?

You should upgrade in place (the last remaining MDS) and yes, that causes
a bit of downtime. In our case it takes ~5 s. Make sure to _only_ upgrade
the Ceph packages (no `apt upgrade` of the whole system), as apt will
happily disable services, start updating the initramfs for all installed
kernels, etc. The full system upgrade and reboot can be done later. This
is how we do it:

On (Active) Standby:

mds2: systemctl stop ceph-mds.target

On Active:

apt update
apt policy ceph-base <- check that the version that is available is
indeed the version you want to upgrade to!
apt install ceph-base ceph-common ceph-fuse ceph-mds ceph-mds-dbg
libcephfs2 python-cephfs

If the MDS doesn't get restarted by the upgrade, do it manually:

systemctl restart ceph-mds.target

^^ a bit of downtime

ceph daemon mds.$id version <- to make sure you are running the upgraded
version

(or run ceph versions to check)

On Standby:

apt install ceph-base ceph-common ceph-fuse ceph-mds ceph-mds-dbg
libcephfs2 python-cephfs

systemctl restart ceph-mds.target

ceph daemon mds.$id version <- to make sure you are running the upgraded
version

On Active:

apt upgrade && reboot

(Standby becomes active)

wait for HEALTH_OK

On (now) Active (previously standby):
apt upgrade && reboot

If you follow this procedure you end up with the same active and standby
as before the upgrades, both up to date with as little downtime as
possible.
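
For reference, a minimal sketch of the sanity checks around these steps
(assuming a single active MDS as above; the daemon id is just an example):

```
# before touching anything: confirm which MDS is active and which is standby
ceph fs status
ceph mds stat

# after each restart: confirm the daemon came back with the expected version
ceph daemon mds.$id version
ceph versions

# and wait for the cluster to settle before moving on
ceph -s
```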

That said ... I've accidentally updated a standby MDS to a newer version
than the Active one ... and this didn't cause any issues (12.2.8 ->
12.2.11) ... but I would not recommend it.

Gr. Stefan

-- 
| BIT BV   http://www.bit.nl/   Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] Possible to move RBD volumes between pools?

2019-06-19 Thread Konstantin Shalygin

Both pools are in the same Ceph cluster. Do you have any documentation on
the live migration process? I'm running 14.2.1


Something like:

```
# create the target image in the new pool and link it to the source
# (clients using the image should be re-pointed at the target after this step)
rbd migration prepare test1 rbd2/test2

# copy the image data and snapshots from the source to the target
rbd migration execute test1

# finalize the migration and remove the source image
rbd migration commit test1 --force
```



k



Re: [ceph-users] out of date python-rtslib repo on https://shaman.ceph.com/

2019-06-19 Thread Michael Christie
On 06/17/2019 03:41 AM, Matthias Leopold wrote:
> thank you very much for updating python-rtslib!!
> could you maybe also do this for tcmu-runner (version 1.4.1)?

I am just about to make a new 1.5 release. Give me a week. I am working
on a last feature/bug for the gluster team, and then I am going to pass
the code to the gluster tcmu-runner devs for some review and testing.

> shaman repos are very convenient for installing and updating the ceph
> iscsi stack, I would be very happy if I could continue using it
> 
> matthias
> 
> Am 14.06.19 um 18:08 schrieb Matthias Leopold:
>> Hi,
>>
>> to the people running https://shaman.ceph.com/:
>> please update the repo for python-rtslib so recent ceph-iscsi packages
>> can be installed which need python-rtslib >= 2.1.fb68
>>
>> shaman python-rtslib version is 2.1.fb67
>> upstream python-rtslib version is 2.1.fb69
>>
>> thanks + thanks for running this service at all
>> matthias
>>
> 



Re: [ceph-users] ISCSI Setup

2019-06-19 Thread Brent Kennedy
Sounds like progress!  Thanks for the update!  I will see if I can get it
working in my test off of the GH site.

-Brent

-Original Message-
From: Michael Christie  
Sent: Wednesday, June 19, 2019 5:24 PM
To: Brent Kennedy ; 'Ceph Users'

Subject: Re: [ceph-users] ISCSI Setup

On 06/19/2019 12:34 AM, Brent Kennedy wrote:
> Recently upgraded a ceph cluster to nautilus 14.2.1 from Luminous, no 
> issues.  One of the reasons for doing so was to take advantage of some 
> of the new ISCSI updates that were added in Nautilus.  I installed 
> CentOS 7.6 and did all the basic stuff to get the server online.  I 
> then tried to use the 
> http://docs.ceph.com/docs/nautilus/rbd/iscsi-target-cli/ document and 
> hit a hard stop.  Apparently, neither the required package versions
> listed at the top nor ceph-iscsi itself exist yet in any repositories.

I am in the process of updating the upstream docs (Aaron wrote up the
changes to the RHCS docs, and I am converting them to the upstream docs and
turning them into patches for a PR) and ceph-ansible
(https://github.com/ceph/ceph-ansible/pull/3977) for the transition from
ceph-iscsi-cli/config to ceph-iscsi.

The upstream GH for ceph-iscsi is here
https://github.com/ceph/ceph-iscsi

and it is built here:
https://shaman.ceph.com/repos/ceph-iscsi/

I think we are just waiting on one last patch for fqdn support from SUSE so
we can make a new ceph-iscsi release.


> Reminds me of when I first tried to setup RGWs.  Is there a hidden 
> repository somewhere that hosts these required packages?  Also, I 
> found a thread talking about those packages and the instructions being 
> off, which concerns me.  Is there a good tutorial online somewhere?  I 
> saw the ceph-ansible bits, but wasn't sure if that would even work 
> because of the package issue.  I use ansible to deploy machines all 
> the time.  I also wonder if the ISCSI bits are considered production 
> or Test ( I see RedHat has a bunch of docs talking about using iscsi, 
> so I would think production ).
> 
>  
> 
> Thoughts anyone?
> 
>  
> 
> Regards,
> 
> -Brent
> 
>  
> 
> Existing Clusters:
> 
> Test: Nautilus 14.2.1 with 3 osd servers, 1 mon/man, 1 gateway ( all 
> virtual on SSD )
> 
> US Production(HDD): Nautilus 14.2.1 with 11 osd servers, 3 mons, 4 
> gateways behind haproxy LB
> 
> UK Production(HDD): Luminous 12.2.11 with 25 osd servers, 3 mons/man, 
> 3 gateways behind haproxy LB
> 
> US Production(SSD): Luminous 12.2.11 with 6 osd servers, 3 mons/man, 3 
> gateways behind haproxy LB
> 
>  
> 
>  
> 
> 
> 


Re: [ceph-users] Possible to move RBD volumes between pools?

2019-06-19 Thread Brett Chancellor
Both pools are in the same Ceph cluster. Do you have any documentation on
the live migration process? I'm running 14.2.1

On Wed, Jun 19, 2019, 8:35 PM Jason Dillaman  wrote:

> On Wed, Jun 19, 2019 at 6:25 PM Brett Chancellor
>  wrote:
> >
> > Background: We have a few Ceph clusters, each serves multiple OpenStack
> clusters. Each cluster has its own set of pools.
> >
> > I'd like to move ~50TB of volumes from an old cluster (we'll call the
> pool cluster01-volumes) to an existing pool (cluster02-volumes) to later be
> imported by a different Openstack cluster. I could run something like
> this...
> > rbd export cluster01-volumes/volume-12345 | rbd import
> cluster02-volumes/volume-12345 .
>
> I'm getting a little confused by the dual use of "cluster" for both
> Ceph and OpenStack. Are both pools in the same Ceph cluster? If so,
> could you just clone the image to the new pool? The Nautilus release
> also includes a simple image live migration tool where it creates a
> clone, copies the data and all snapshots to the clone, and then
> deletes the original image.
>
> > But that would be slow and duplicate the data which I'd rather not do.
> Are there any better ways to do it?
> >
> > Thanks,
> >
> > -Brett
>
>
>
> --
> Jason
>


Re: [ceph-users] BlueFS spillover detected - 14.2.1

2019-06-19 Thread Nigel Williams
On Thu, 20 Jun 2019 at 09:12, Vitaliy Filippov  wrote:

> All values except 4, 30 and 286 GB are currently useless in ceph with
> default rocksdb settings :)
>

However, several commenters have said that RocksDB needs extra space while it
runs compaction, and hence the DB partition needs to be roughly twice those
sizes, i.e. 8 GB, 60 GB and 600 GB.

Does rocksdb spill during compaction if it doesn't have enough space?


Re: [ceph-users] Possible to move RBD volumes between pools?

2019-06-19 Thread Jason Dillaman
On Wed, Jun 19, 2019 at 6:25 PM Brett Chancellor
 wrote:
>
> Background: We have a few Ceph clusters, each serves multiple OpenStack
> clusters. Each cluster has its own set of pools.
>
> I'd like to move ~50TB of volumes from an old cluster (we'll call the pool 
> cluster01-volumes) to an existing pool (cluster02-volumes) to later be 
> imported by a different Openstack cluster. I could run something like this...
> rbd export cluster01-volumes/volume-12345 | rbd import 
> cluster02-volumes/volume-12345 .

I'm getting a little confused by the dual use of "cluster" for both
Ceph and OpenStack. Are both pools in the same Ceph cluster? If so,
could you just clone the image to the new pool? The Nautilus release
also includes a simple image live migration tool where it creates a
clone, copies the data and all snapshots to the clone, and then
deletes the original image.
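
For the clone approach, a minimal sketch (the image and pool names below are
just the ones from this thread, and it assumes the client has caps on both
pools):

```
# snapshot and protect the source image so it can be cloned
rbd snap create cluster01-volumes/volume-12345@migrate
rbd snap protect cluster01-volumes/volume-12345@migrate

# clone it into the destination pool
rbd clone cluster01-volumes/volume-12345@migrate cluster02-volumes/volume-12345

# optionally flatten the clone so it no longer depends on the parent;
# note that flattening copies the data, while an unflattened clone avoids
# the copy but keeps a dependency on the parent image in the old pool
rbd flatten cluster02-volumes/volume-12345
rbd snap unprotect cluster01-volumes/volume-12345@migrate
rbd snap rm cluster01-volumes/volume-12345@migrate
```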

> But that would be slow and duplicate the data, which I'd rather not do. Are
> there any better ways to do it?
>
> Thanks,
>
> -Brett



-- 
Jason


Re: [ceph-users] BlueFS spillover detected - 14.2.1

2019-06-19 Thread Vitaliy Filippov
All values except 4, 30 and 286 GB are currently useless in ceph with  
default rocksdb settings :)


That's what you are seeing: all DB devices use only ~28 GB and everything
else spills over to the HDDs.
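
A quick way to see how much has actually spilled over is to look at the
BlueFS counters on an OSD (a sketch; osd.0 is just an example id):

```
# compare how much of the RocksDB data sits on the DB device vs. the slow (HDD) device
ceph daemon osd.0 perf dump | grep -E '"db_used_bytes"|"slow_used_bytes"'
```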


--
With best regards,
  Vitaliy Filippov


[ceph-users] Possible to move RBD volumes between pools?

2019-06-19 Thread Brett Chancellor
Background: We have a few Ceph clusters, each serves multiple OpenStack
clusters. Each cluster has its own set of pools.

I'd like to move ~50TB of volumes from an old cluster (we'll call the pool
cluster01-volumes) to an existing pool (cluster02-volumes) to later be
imported by a different Openstack cluster. I could run something like
this...
rbd export cluster01-volumes/volume-12345 | rbd import
cluster02-volumes/volume-12345 .

But that would be slow and duplicate the data, which I'd rather not do. Are
there any better ways to do it?

Thanks,

-Brett


Re: [ceph-users] ISCSI Setup

2019-06-19 Thread Michael Christie
On 06/19/2019 12:34 AM, Brent Kennedy wrote:
> Recently upgraded a ceph cluster to nautilus 14.2.1 from Luminous, no
> issues.  One of the reasons for doing so was to take advantage of some
> of the new ISCSI updates that were added in Nautilus.  I installed
> CentOS 7.6 and did all the basic stuff to get the server online.  I then
> tried to use the
> http://docs.ceph.com/docs/nautilus/rbd/iscsi-target-cli/ document and
> hit a hard stop.  Apparently, neither the required package versions
> listed at the top nor ceph-iscsi itself exist yet in any repositories.

I am in the process of updating the upstream docs (Aaron wrote up the
changes to the RHCS docs, and I am converting them to the upstream docs and
turning them into patches for a PR) and ceph-ansible
(https://github.com/ceph/ceph-ansible/pull/3977) for the transition from
ceph-iscsi-cli/config to ceph-iscsi.

The upstream GH for ceph-iscsi is here
https://github.com/ceph/ceph-iscsi

and it is built here:
https://shaman.ceph.com/repos/ceph-iscsi/

I think we are just waiting on one last patch for fqdn support from SUSE
so we can make a new ceph-iscsi release.


> Reminds me of when I first tried to setup RGWs.  Is there a hidden
> repository somewhere that hosts these required packages?  Also, I found
> a thread talking about those packages and the instructions being off,
> which concerns me.  Is there a good tutorial online somewhere?  I saw
> the ceph-ansible bits, but wasn’t sure if that would even work because
> of the package issue.  I use ansible to deploy machines all the time.  I
> also wonder if the ISCSI bits are considered production or Test ( I see
> RedHat has a bunch of docs talking about using iscsi, so I would think
> production ).
> 
>  
> 
> Thoughts anyone?
> 
>  
> 
> Regards,
> 
> -Brent
> 
>  
> 
> Existing Clusters:
> 
> Test: Nautilus 14.2.1 with 3 osd servers, 1 mon/man, 1 gateway ( all
> virtual on SSD )
> 
> US Production(HDD): Nautilus 14.2.1 with 11 osd servers, 3 mons, 4
> gateways behind haproxy LB
> 
> UK Production(HDD): Luminous 12.2.11 with 25 osd servers, 3 mons/man, 3
> gateways behind haproxy LB
> 
> US Production(SSD): Luminous 12.2.11 with 6 osd servers, 3 mons/man, 3
> gateways behind haproxy LB
> 
>  
> 
>  
> 
> 
> 


Re: [ceph-users] MDS getattr op stuck in snapshot

2019-06-19 Thread Hector Martin
On 13/06/2019 14.31, Hector Martin wrote:
> On 12/06/2019 22.33, Yan, Zheng wrote:
>> I have tracked down the bug. Thank you for reporting this.  'echo 2 >
>> /proc/sys/vm/drop_caches' should fix the hang.  If you can compile ceph
>> from source, please try following patch.
>>
>> diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc
>> index ecd06294fa..94b947975a 100644
>> --- a/src/mds/Locker.cc
>> +++ b/src/mds/Locker.cc
>> @@ -2956,7 +2956,8 @@ void Locker::handle_client_caps(MClientCaps *m)
>>
>>// client flushes and releases caps at the same time. make sure
>> MDCache::cow_inode()
>>// properly setup CInode::client_need_snapflush
>> -  if ((m->get_dirty() & ~cap->issued()) && !need_snapflush)
>> +  if (!need_snapflush && (m->get_dirty() & ~cap->issued()) &&
>> + (m->flags & MClientCaps::FLAG_PENDING_CAPSNAP))
>> cap->mark_needsnapflush();
>>  }
>>
>>
>>
> 
> That was quick, thanks! I can build from source but I won't have time to
> do so and test it until next week, if that's okay.

Okay, I tried building packages for Xenial following this doc, but that
didn't go so well:

http://docs.ceph.com/docs/mimic/install/build-ceph/

It seems install-deps pulls in a PPA with a newer GCC and libstdc++ (!),
and that produces a build that is incompatible with a plain Xenial
machine with no PPAs. The version tag is different too (the -1xenial
suffix isn't present).

Is there documentation for how to build Ubuntu packages the exact same
way as they are built for download.ceph.com? i.e.
ceph-mds-dbg_13.2.6-1xenial_amd64.deb. If I can figure that out I can
build a patched mds and test it.


-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub


Re: [ceph-users] CephFS damaged and cannot recover

2019-06-19 Thread Patrick Donnelly
On Wed, Jun 19, 2019 at 9:19 AM Wei Jin  wrote:
>
> There is plenty of data in this cluster (2 PB), please help us, thx.
> Before attempting these dangerous operations
> (http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#disaster-recovery-experts),
> do you have any suggestions?
>
> Ceph version: 12.2.12
>
> ceph fs status:
>
> cephfs - 1057 clients
> ==
> +--+-+-+--+---+---+
> | Rank |  State  | MDS | Activity |  dns  |  inos |
> +--+-+-+--+---+---+
> |  0   |  failed | |  |   |   |
> |  1   | resolve | n31-023-214 |  |0  |0  |
> |  2   | resolve | n31-023-215 |  |0  |0  |
> |  3   | resolve | n31-023-218 |  |0  |0  |
> |  4   | resolve | n31-023-220 |  |0  |0  |
> |  5   | resolve | n31-023-217 |  |0  |0  |
> |  6   | resolve | n31-023-222 |  |0  |0  |
> |  7   | resolve | n31-023-216 |  |0  |0  |
> |  8   | resolve | n31-023-221 |  |0  |0  |
> |  9   | resolve | n31-023-223 |  |0  |0  |
> |  10  | resolve | n31-023-225 |  |0  |0  |
> |  11  | resolve | n31-023-224 |  |0  |0  |
> |  12  | resolve | n31-023-219 |  |0  |0  |
> |  13  | resolve | n31-023-229 |  |0  |0  |
> +--+-+-+--+---+---+
> +-+--+---+---+
> |   Pool  |   type   |  used | avail |
> +-+--+---+---+
> | cephfs_metadata | metadata | 2843M | 34.9T |
> |   cephfs_data   |   data   | 2580T |  731T |
> +-+--+---+---+
>
> +-+
> | Standby MDS |
> +-+
> | n31-023-227 |
> | n31-023-226 |
> | n31-023-228 |
> +-+

Are there failovers occurring while all the ranks are in up:resolve?
MDS logs at high debug level would be helpful.
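
Something along these lines would do (a sketch; the daemon name is taken from
the listing above, and remember to lower the levels again afterwards, since
debug 20 logs grow quickly):

```
# raise MDS and messenger debug levels on one of the resolving MDS daemons
ceph daemon mds.n31-023-214 config set debug_mds 20
ceph daemon mds.n31-023-214 config set debug_ms 1

# reproduce / wait for a failover, collect /var/log/ceph/ceph-mds.*.log,
# then restore the defaults
ceph daemon mds.n31-023-214 config set debug_mds 1/5
ceph daemon mds.n31-023-214 config set debug_ms 0/5
```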

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D


[ceph-users] CephFS damaged and cannot recover

2019-06-19 Thread Wei Jin
There is plenty of data in this cluster (2 PB), please help us, thx.
Before attempting these dangerous operations
(http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#disaster-recovery-experts),
do you have any suggestions?

Ceph version: 12.2.12

ceph fs status:

cephfs - 1057 clients
==
+--+-+-+--+---+---+
| Rank |  State  | MDS | Activity |  dns  |  inos |
+--+-+-+--+---+---+
|  0   |  failed | |  |   |   |
|  1   | resolve | n31-023-214 |  |0  |0  |
|  2   | resolve | n31-023-215 |  |0  |0  |
|  3   | resolve | n31-023-218 |  |0  |0  |
|  4   | resolve | n31-023-220 |  |0  |0  |
|  5   | resolve | n31-023-217 |  |0  |0  |
|  6   | resolve | n31-023-222 |  |0  |0  |
|  7   | resolve | n31-023-216 |  |0  |0  |
|  8   | resolve | n31-023-221 |  |0  |0  |
|  9   | resolve | n31-023-223 |  |0  |0  |
|  10  | resolve | n31-023-225 |  |0  |0  |
|  11  | resolve | n31-023-224 |  |0  |0  |
|  12  | resolve | n31-023-219 |  |0  |0  |
|  13  | resolve | n31-023-229 |  |0  |0  |
+--+-+-+--+---+---+
+-+--+---+---+
|   Pool  |   type   |  used | avail |
+-+--+---+---+
| cephfs_metadata | metadata | 2843M | 34.9T |
|   cephfs_data   |   data   | 2580T |  731T |
+-+--+---+---+

+-+
| Standby MDS |
+-+
| n31-023-227 |
| n31-023-226 |
| n31-023-228 |
+-+



ceph fs dump:

dumped fsmap epoch 22712
e22712
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in
separate object,5=mds uses versioned encoding,6=dirfrag is stored in
omap,8=no anchor table,9=file layout v2}
legacy client fscid: 1

Filesystem 'cephfs' (1)
fs_name cephfs
epoch 22711
flags 4
created 2018-11-30 10:05:06.015325
modified 2019-06-19 23:37:41.400961
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 22246
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
anchor table,9=file layout v2}
max_mds 14
in 0,1,2,3,4,5,6,7,8,9,10,11,12,13
up 
{1=31684663,2=31684674,3=31684576,4=31684673,5=31684678,6=31684612,7=31684688,8=31684683,9=31684698,10=31684695,11=31684693,12=31684586,13=31684617}
failed
damaged 0
stopped
data_pools [2]
metadata_pool 1
inline_data disabled
balancer
standby_count_wanted 1
31684663: 10.31.23.214:6800/829459839 'n31-023-214' mds.1.22682 up:resolve seq 6
31684674: 10.31.23.215:6800/2483123757 'n31-023-215' mds.2.22683
up:resolve seq 3
31684576: 10.31.23.218:6800/3381299029 'n31-023-218' mds.3.22683
up:resolve seq 3
31684673: 10.31.23.220:6800/3540255817 'n31-023-220' mds.4.22685
up:resolve seq 3
31684678: 10.31.23.217:6800/4004537495 'n31-023-217' mds.5.22689
up:resolve seq 3
31684612: 10.31.23.222:6800/1482899141 'n31-023-222' mds.6.22691
up:resolve seq 3
31684688: 10.31.23.216:6800/820115186 'n31-023-216' mds.7.22693 up:resolve seq 3
31684683: 10.31.23.221:6800/1996416037 'n31-023-221' mds.8.22693
up:resolve seq 3
31684698: 10.31.23.223:6800/2807778042 'n31-023-223' mds.9.22695
up:resolve seq 3
31684695: 10.31.23.225:6800/101451176 'n31-023-225' mds.10.22702
up:resolve seq 3
31684693: 10.31.23.224:6800/1597373084 'n31-023-224' mds.11.22695
up:resolve seq 3
31684586: 10.31.23.219:6800/3640206080 'n31-023-219' mds.12.22695
up:resolve seq 3
31684617: 10.31.23.229:6800/3511814011 'n31-023-229' mds.13.22697
up:resolve seq 3


Standby daemons:

31684637: 10.31.23.227:6800/1987867930 'n31-023-227' mds.-1.0 up:standby seq 2
31684690: 10.31.23.226:6800/3695913629 'n31-023-226' mds.-1.0 up:standby seq 2
31689991: 10.31.23.228:6800/2624666750 'n31-023-228' mds.-1.0 up:standby seq 2


Re: [ceph-users] Ceph crush map randomly changes for one host

2019-06-19 Thread Feng Zhang
Could it be because all the OSDs in it have reweight set to 0?

 -7   5.45695 host ceph-osd3

 26   hdd  1.81898 osd.26   down        0 1.0

 27   hdd  1.81898 osd.27   down        0 1.0

 30   hdd  1.81898 osd.30   down        0 1.0




Best,

Feng


On Wed, Jun 19, 2019 at 11:36 AM Pelletier, Robert 
wrote:

> Here is a fuller picture. I inherited this ceph cluster from a previous
> admin whom has left the company. Although I am a linux administrator, I
> have very little experience with ceph and have had to learn; definitely
> still a lot to learn. I do know this crush map was made manually. To me it
> does not look right and would like to reorganize it, but I am concerned
> about what effects that would have with a cluster that has data on it.
>
>
>
> I would like to remove both of the osd3-shelf1 and osd3-shelf2 chassis
> buckets and move them to host ceph-osd3 (I don’t see a need from separate
> buckets here). The “chassis” are actually two SAS disk shelves connected to
> ceph-osd3 host.
>
>
>
> However, just moving one osd causes ceph to go unhealthy with
> OBJECT_MISPLACED messages and takes a while to go back into a healthy
> state. I am not too sure this is a big concern, but I am wondering if there
> is a recommended procedure for doing this. As I am just learning, I don’t
> want to do anything that will cause data loss.
>
>
>
>
>
>
>
>
>
> My Tree is supposed to look like below but it keeps changing to the map 
> further
> below. Notice the drives moving from chassis osd3-shelf1 chassis to host
> ceph-osd3. Does anyone know why this may happen?
>
>
>
> I wrote a script to monitor for this and to place the osds back where they
> belong if they notice the change, but this should obviously not be
> necessary. I would appreciate any help with this.
>
>
>
>
>
>
>
> ID  CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
>
> -62   14.55199 root osd3-shelf2
>
> -60   14.55199 chassis ceph-osd3-shelf2
>
>   3   hdd  1.81898 osd.3down  1.0 1.0
>
>  40   hdd  1.81898 osd.40   down  1.0 1.0
>
>  41   hdd  1.81898 osd.41   down  1.0 1.0
>
>  42   hdd  1.81898 osd.42   down  1.0 1.0
>
>  43   hdd  1.81898 osd.43   down  1.0 1.0
>
>  44   hdd  1.81898 osd.44   down  1.0 1.0
>
>  45   hdd  1.81898 osd.45   down  1.0 1.0
>
>  46   hdd  1.81898 osd.46   down  1.0 1.0
>
> -58   1.71599 root osd3-internal
>
> -54   1.71599 chassis ceph-osd3-internal
>
>  34   hdd  0.42899 osd.34   down  1.0 1.0
>
>  35   hdd  0.42899 osd.35   down  1.0 1.0
>
>  36   hdd  0.42899 osd.36   down  1.0 1.0
>
>  37   hdd  0.42899 osd.37   down  1.0 1.0
>
> -50   14.55199 root osd3-shelf1
>
> -56   14.55199 chassis ceph-osd3-shelf1
>
>  21   hdd  1.81898 osd.21   down  1.0 1.0
>
>  22   hdd  1.81898 osd.22   down  1.0 1.0
>
>  23   hdd  1.81898 osd.23   down  1.0 1.0
>
>  24   hdd  1.81898 osd.24   down  1.0 1.0
>
>  25   hdd  1.81898 osd.25   down  1.0 1.0
>
>  28   hdd  1.81898 osd.28   down  1.0 1.0
>
>  29   hdd  1.81898 osd.29   down  1.0 1.0
>
>  31   hdd  1.81898 osd.31   down  1.0 1.0
>
>  -7   5.45695 host ceph-osd3
>
>  26   hdd  1.81898 osd.26   down        0 1.0
>
>  27   hdd  1.81898 osd.27   down        0 1.0
>
>  30   hdd  1.81898 osd.30   down        0 1.0
>
>  -1   47.21199 root default
>
> -40   23.59000 rack mainehall
>
>  -3   23.59000 host ceph-osd1
>
>   0   hdd  1.81898 osd.0  up  1.0 1.0
>
>   1   hdd  1.81898 osd.1  up  1.0 1.0
>
>   2   hdd  1.81898 osd.2  up  1.0 1.0
>
>   4   hdd  1.81898 osd.4  up  1.0 1.0
>
>   5   hdd  1.81898 osd.5  up  0.90002 1.0
>
>   6   hdd  1.81898 osd.6  up  1.0 1.0
>
>   7   hdd  1.81898 osd.7  up  1.0 1.0
>
>   8   hdd  1.81898 osd.8  up  1.0 1.0
>
>   9   hdd  1.81898 osd.9  up  1.0 1.0
>
>  10   hdd  1.81898 osd.10 up  0.95001 1.0
>
>  33   hdd  1.76099  

Re: [ceph-users] Ceph crush map randomly changes for one host

2019-06-19 Thread Pelletier, Robert
Here is a fuller picture. I inherited this Ceph cluster from a previous admin
who has left the company. Although I am a Linux administrator, I have very
little experience with Ceph and have had to learn; definitely still a lot to
learn. I do know this crush map was made manually. To me it does not look right
and I would like to reorganize it, but I am concerned about what effects that
would have on a cluster that has data on it.

I would like to remove both of the osd3-shelf1 and osd3-shelf2 chassis buckets
and move them to host ceph-osd3 (I don’t see a need for separate buckets
here). The “chassis” are actually two SAS disk shelves connected to the
ceph-osd3 host.

However, just moving one osd causes ceph to go unhealthy with OBJECT_MISPLACED 
messages and takes a while to go back into a healthy state. I am not too sure 
this is a big concern, but I am wondering if there is a recommended procedure 
for doing this. As I am just learning, I don’t want to do anything that will 
cause data loss.






My tree is supposed to look like the one below, but it keeps changing to the map
further below. Notice the drives moving from the osd3-shelf1 chassis to host
ceph-osd3. Does anyone know why this may happen?

I wrote a script to monitor for this and to place the OSDs back where they
belong if it notices the change, but this should obviously not be necessary. I
would appreciate any help with this.



ID  CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
-62   14.55199 root osd3-shelf2
-60   14.55199 chassis ceph-osd3-shelf2
  3   hdd  1.81898 osd.3down  1.0 1.0
 40   hdd  1.81898 osd.40   down  1.0 1.0
 41   hdd  1.81898 osd.41   down  1.0 1.0
 42   hdd  1.81898 osd.42   down  1.0 1.0
 43   hdd  1.81898 osd.43   down  1.0 1.0
 44   hdd  1.81898 osd.44   down  1.0 1.0
 45   hdd  1.81898 osd.45   down  1.0 1.0
 46   hdd  1.81898 osd.46   down  1.0 1.0
-58   1.71599 root osd3-internal
-54   1.71599     chassis ceph-osd3-internal
 34   hdd  0.42899 osd.34   down  1.0 1.0
 35   hdd  0.42899 osd.35   down  1.0 1.0
 36   hdd  0.42899 osd.36   down  1.0 1.0
 37   hdd  0.42899 osd.37   down  1.0 1.0
-50   14.55199 root osd3-shelf1
-56   14.55199 chassis ceph-osd3-shelf1
 21   hdd  1.81898 osd.21   down  1.0 1.0
 22   hdd  1.81898 osd.22   down  1.0 1.0
 23   hdd  1.81898 osd.23   down  1.0 1.0
 24   hdd  1.81898 osd.24   down  1.0 1.0
 25   hdd  1.81898 osd.25   down  1.0 1.0
 28   hdd  1.81898 osd.28   down  1.0 1.0
 29   hdd  1.81898 osd.29   down  1.0 1.0
 31   hdd  1.81898 osd.31   down  1.0 1.0
 -7   5.45695 host ceph-osd3
 26   hdd  1.81898 osd.26   down        0 1.0
 27   hdd  1.81898 osd.27   down        0 1.0
 30   hdd  1.81898 osd.30   down        0 1.0
 -1   47.21199 root default
-40   23.59000 rack mainehall
 -3   23.59000 host ceph-osd1
  0   hdd  1.81898 osd.0  up  1.0 1.0
  1   hdd  1.81898 osd.1  up  1.0 1.0
  2   hdd  1.81898 osd.2  up  1.0 1.0
  4   hdd  1.81898 osd.4  up  1.0 1.0
  5   hdd  1.81898 osd.5  up  0.90002 1.0
  6   hdd  1.81898 osd.6  up  1.0 1.0
  7   hdd  1.81898 osd.7  up  1.0 1.0
  8   hdd  1.81898 osd.8  up  1.0 1.0
  9   hdd  1.81898 osd.9  up  1.0 1.0
 10   hdd  1.81898 osd.10 up  0.95001 1.0
 33   hdd  1.76099 osd.33 up  1.0 1.0
 38   hdd  3.63899 osd.38 up  1.0 1.0
-42   23.62199 rack rangleyhall
 -5   23.62199 host ceph-osd2
 11   hdd  1.81898 osd.11 up  1.0 1.0
 12   hdd  1.81898 osd.12 up  0.90002 1.0
 13   hdd  1.81898 osd.13 up  1.0 1.0
 14   hdd  1.81898 osd.14 up  1.0 1.0
 15   hdd  1.81898 osd.15 up  1.0 1.0
 16   hdd  1.81898 osd.16 up  1

Re: [ceph-users] Stop metadata sync in multi-site RGW

2019-06-19 Thread Marcelo Mariano Miziara
Hi Casey, thanks for the quick reply.

The goal is to pause replication for a while. Thanks a lot, I'll try the
rgw_run_sync_thread setting.

Marcelo M. Miziara
Serviço Federal de Processamento de Dados - SERPRO
marcelo.mizi...@serpro.gov.br

- Mensagem original -
De: "Casey Bodley" 
Para: "ceph-users" 
Enviadas: Quarta-feira, 19 de junho de 2019 11:54:18
Assunto: Re: [ceph-users] Stop metadata sync in multi-site RGW

Right, the sync_from fields in the zone configuration only relate to 
data sync within the zonegroup. Can you clarify what your goal is? Are 
you just trying to pause the replication for a while, or disable it 
permanently?

To pause replication, you can configure rgw_run_sync_thread=0 on all 
gateways in that zone. Just note that replication logs will continue to 
grow, and because this 'paused' zone isn't consuming them, it will 
prevent the logs from being trimmed on all zones until sync is reenabled 
and replication catches up.

To disable replication entirely, you'd want to move that zone out of the 
multisite configuration. This would involve removing the zone from its 
current zonegroup, creating a new realm and zonegroup, moving the zone 
into that, and setting its log_data/log_meta fields to false. I can 
follow up with radosgw-admin commands if that's what you're trying to do.

On 6/19/19 10:14 AM, Marcelo Mariano Miziara wrote:
> Hello all!
>
> I'm trying to stop the sync from two zones, but using the parameter 
> "--sync_from_all=false" seems to stop only the data sync, but not the 
> metadata (i.e. users and buckets are synced).
>
>
> # radosgw-admin sync status
>   realm  (xx)
>       zonegroup  (xx)
>   zone  (xx)
>   metadata sync syncing
>     full sync: 0/64 shards
>     incremental sync: 64/64 shards
>     metadata is caught up with master
>   data sync source: (xx)
>     not syncing from zone
>
> Thanks,
> Marcelo M.
> Serviço Federal de Processamento de Dados - SERPRO
> marcelo.mizi...@serpro.gov.br
>



Re: [ceph-users] Stop metadata sync in multi-site RGW

2019-06-19 Thread Casey Bodley
Right, the sync_from fields in the zone configuration only relate to 
data sync within the zonegroup. Can you clarify what your goal is? Are 
you just trying to pause the replication for a while, or disable it 
permanently?


To pause replication, you can configure rgw_run_sync_thread=0 on all 
gateways in that zone. Just note that replication logs will continue to 
grow, and because this 'paused' zone isn't consuming them, it will 
prevent the logs from being trimmed on all zones until sync is reenabled 
and replication catches up.
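
For example, a minimal sketch (the instance name client.rgw.gw1 is just a
placeholder, and this assumes the gateways read their options from ceph.conf):

```
# in ceph.conf on each gateway host of the paused zone:
#   [client.rgw.gw1]
#       rgw_run_sync_thread = false
#
# then restart the gateway so it picks up the change
systemctl restart ceph-radosgw@rgw.gw1
```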


To disable replication entirely, you'd want to move that zone out of the 
multisite configuration. This would involve removing the zone from its 
current zonegroup, creating a new realm and zonegroup, moving the zone 
into that, and setting its log_data/log_meta fields to false. I can 
follow up with radosgw-admin commands if that's what you're trying to do.


On 6/19/19 10:14 AM, Marcelo Mariano Miziara wrote:

Hello all!

I'm trying to stop the sync from two zones, but using the parameter 
"--sync_from_all=false" seems to stop only the data sync, but not the 
metadata (i.e. users and buckets are synced).



# radosgw-admin sync status
  realm  (xx)
      zonegroup  (xx)
  zone  (xx)
  metadata sync syncing
    full sync: 0/64 shards
    incremental sync: 64/64 shards
    metadata is caught up with master
  data sync source: (xx)
    not syncing from zone

Thanks,
Marcelo M.
Serviço Federal de Processamento de Dados - SERPRO
marcelo.mizi...@serpro.gov.br



[ceph-users] Stop metadata sync in multi-site RGW

2019-06-19 Thread Marcelo Mariano Miziara
Hello all! 

I'm trying to stop the sync from two zones, but using the parameter 
"--sync_from_all=false" seems to stop only the data sync, but not the metadata 
(i.e. users and buckets are synced). 


# radosgw-admin sync status 
realm  (xx) 
zonegroup  (xx) 
zone  (xx) 
metadata sync syncing 
full sync: 0/64 shards 
incremental sync: 64/64 shards 
metadata is caught up with master 
data sync source: (xx) 
not syncing from zone 

Thanks, 
Marcelo M. 
Serviço Federal de Processamento de Dados - SERPRO 
marcelo.mizi...@serpro.gov.br 



Re: [ceph-users] Reduced data availability: 2 pgs inactive

2019-06-19 Thread Lars Täuber
Hi Paul,

thanks for the hint.
Restarting the primary osds of the inactive pgs resolved the problem:

Before restarting them they said:
2019-06-19 15:55:36.190 7fcd55c4e700 -1 osd.5 33858 get_health_metrics 
reporting 15 slow ops, oldest is osd_op(client.220116.0:967410 21.2e4s0 
21.d4e19ae4 (undecoded) ondisk+write+known_if_redirected e31569)
and
2019-06-19 15:53:31.214 7f9b946d1700 -1 osd.13 33849 get_health_metrics 
reporting 14560 slow ops, oldest is osd_op(mds.0.44294:99584053 23.5 
23.cad28605 (undecoded) ondisk+write+known_if_redirected+full_force e31562)

Is this something to worry about?
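
(For reference, the ops behind such messages can be inspected on the affected
OSD via its admin socket before restarting it; a sketch, using osd.13 as an
example:)

```
# currently blocked / in-flight ops on that OSD
ceph daemon osd.13 dump_ops_in_flight

# recently completed (historic) ops, including slow ones
ceph daemon osd.13 dump_historic_ops
```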

Regards,
Lars

Wed, 19 Jun 2019 15:04:06 +0200
Paul Emmerich  ==> Lars Täuber  :
> That shouldn't trigger the PG limit (yet), but increasing "mon max pg per
> osd" from the default of 200 is a good idea anyways since you are running
> with more than 200 PGs per OSD.
> 
> I'd try to restart all OSDs that are in the UP set for that PG:
> 
> 13,
> 21,
> 23
> 7,
> 29,
> 9,
> 28,
> 11,
> 8
> 
> 
> Maybe that solves it (technically it shouldn't), if that doesn't work
> you'll have to dig in deeper into the log files to see where exactly and
> why it is stuck activating.
> 
> Paul
> 


-- 
Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23  10117 Berlin
Tel.: +49 30 20370-352   http://www.bbaw.de


Re: [ceph-users] Reduced data availability: 2 pgs inactive

2019-06-19 Thread Paul Emmerich
That shouldn't trigger the PG limit (yet), but increasing "mon max pg per
osd" from the default of 200 is a good idea anyways since you are running
with more than 200 PGs per OSD.

I'd try to restart all OSDs that are in the UP set for that PG:

13, 21, 23, 7, 29, 9, 28, 11, 8


Maybe that solves it (technically it shouldn't); if that doesn't work,
you'll have to dig deeper into the log files to see where exactly and
why it is stuck activating.
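
A minimal sketch of what that could look like (the OSD-to-host mapping is
yours to fill in, and the config bump is only needed if the hard limit really
turns out to be the problem):

```
# on each host, restart the OSDs from the UP set that live there, one at a time
for id in 13 21 23; do
    systemctl restart ceph-osd@${id}
    sleep 30
done

# raise the hard ratio temporarily if PGs refuse to activate because of it
ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 5'
```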

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 19, 2019 at 2:30 PM Lars Täuber  wrote:

> Hi Paul,
>
> thanks for your reply.
>
> Wed, 19 Jun 2019 13:19:55 +0200
> Paul Emmerich  ==> Lars Täuber  :
> > Wild guess: you hit the PG hard limit, how many PGs per OSD do you have?
> > If this is the case: increase "osd max pg per osd hard ratio"
> >
> > Check "ceph pg  query" to see why it isn't activating.
> >
> > Can you share the output of "ceph osd df tree" and "ceph pg  query"
> > of the affected PGs?
>
> The pg queries are attached. I can't read them - to much information.
>
>
> Here is the osd df tree:
> # osd df tree
> ID  CLASS WEIGHTREWEIGHT SIZERAW USE DATAOMAPMETA
>  AVAIL   %USE VAR  PGS STATUS TYPE NAME
>  -1   167.15057- 167 TiB 4.7 TiB 1.2 TiB 952 MiB   57 GiB 162
> TiB 2.79 1.00   -root PRZ
> -1772.43192-  72 TiB 2.0 TiB 535 GiB 393 MiB   25 GiB  70
> TiB 2.78 1.00   -rack 1-eins
>  -922.28674-  22 TiB 640 GiB 170 GiB  82 MiB  9.0 GiB  22
> TiB 2.80 1.01   -host onode1
>   2   hdd   5.57169  1.0 5.6 TiB 162 GiB  45 GiB  11 MiB  2.3 GiB 5.4
> TiB 2.84 1.02 224 up osd.2
>   9   hdd   5.57169  1.0 5.6 TiB 156 GiB  39 GiB  19 MiB  2.1 GiB 5.4
> TiB 2.74 0.98 201 up osd.9
>  14   hdd   5.57169  1.0 5.6 TiB 162 GiB  44 GiB  24 MiB  2.1 GiB 5.4
> TiB 2.84 1.02 230 up osd.14
>  21   hdd   5.57169  1.0 5.6 TiB 160 GiB  42 GiB  27 MiB  2.5 GiB 5.4
> TiB 2.80 1.00 219 up osd.21
> -1322.28674-  22 TiB 640 GiB 170 GiB 123 MiB  8.9 GiB  22
> TiB 2.80 1.00   -host onode4
>   4   hdd   5.57169  1.0 5.6 TiB 156 GiB  39 GiB  38 MiB  2.2 GiB 5.4
> TiB 2.73 0.98 205 up osd.4
>  11   hdd   5.57169  1.0 5.6 TiB 164 GiB  47 GiB  24 MiB  2.0 GiB 5.4
> TiB 2.87 1.03 241 up osd.11
>  18   hdd   5.57169  1.0 5.6 TiB 159 GiB  42 GiB  31 MiB  2.5 GiB 5.4
> TiB 2.79 1.00 221 up osd.18
>  22   hdd   5.57169  1.0 5.6 TiB 160 GiB  43 GiB  29 MiB  2.1 GiB 5.4
> TiB 2.81 1.01 225 up osd.22
>  -527.85843-  28 TiB 782 GiB 195 GiB 188 MiB  6.9 GiB  27
> TiB 2.74 0.98   -host onode7
>   5   hdd   5.57169  1.0 5.6 TiB 158 GiB  41 GiB  26 MiB  1.2 GiB 5.4
> TiB 2.77 0.99 213 up osd.5
>  12   hdd   5.57169  1.0 5.6 TiB 159 GiB  42 GiB  31 MiB  993 MiB 5.4
> TiB 2.79 1.00 222 up osd.12
>  20   hdd   5.57169  1.0 5.6 TiB 157 GiB  40 GiB  47 MiB  1.2 GiB 5.4
> TiB 2.76 0.99 212 up osd.20
>  27   hdd   5.57169  1.0 5.6 TiB 151 GiB  33 GiB  28 MiB  1.9 GiB 5.4
> TiB 2.64 0.95 179 up osd.27
>  29   hdd   5.57169  1.0 5.6 TiB 156 GiB  39 GiB  56 MiB  1.7 GiB 5.4
> TiB 2.74 0.98 203 up osd.29
> -1844.57349-  45 TiB 1.3 TiB 341 GiB 248 MiB   14 GiB  43
> TiB 2.81 1.01   -rack 2-zwei
>  -722.28674-  22 TiB 641 GiB 171 GiB 132 MiB  6.7 GiB  22
> TiB 2.81 1.01   -host onode2
>   1   hdd   5.57169  1.0 5.6 TiB 155 GiB  38 GiB  35 MiB  1.2 GiB 5.4
> TiB 2.72 0.97 203 up osd.1
>   8   hdd   5.57169  1.0 5.6 TiB 163 GiB  46 GiB  36 MiB  2.4 GiB 5.4
> TiB 2.86 1.02 243 up osd.8
>  16   hdd   5.57169  1.0 5.6 TiB 161 GiB  43 GiB  24 MiB 1000 MiB 5.4
> TiB 2.82 1.01 221 up osd.16
>  23   hdd   5.57169  1.0 5.6 TiB 162 GiB  45 GiB  37 MiB  2.1 GiB 5.4
> TiB 2.84 1.02 228 up osd.23
>  -322.28674-  22 TiB 640 GiB 170 GiB 116 MiB  7.6 GiB  22
> TiB 2.80 1.00   -host onode5
>   3   hdd   5.57169  1.0 5.6 TiB 154 GiB  36 GiB  14 MiB 1010 MiB 5.4
> TiB 2.70 0.97 186 up osd.3
>   7   hdd   5.57169  1.0 5.6 TiB 161 GiB  44 GiB  22 MiB  2.2 GiB 5.4
> TiB 2.82 1.01 221 up osd.7
>  15   hdd   5.57169  1.0 5.6 TiB 165 GiB  48 GiB  26 MiB  2.3 GiB 5.4
> TiB 2.89 1.04 249 up osd.15
>  24   hdd   5.57169  1.0 5.6 TiB 160 GiB  42 GiB  54 MiB  2.1 GiB 5.4
> TiB 2.80 1.00 223 up osd.24
> -1950.14517-  50 TiB 1.4 TiB 376 GiB 311 MiB   18 GiB  49
>

Re: [ceph-users] Reduced data availability: 2 pgs inactive

2019-06-19 Thread Lars Täuber
Hi Paul,

thanks for your reply.

Wed, 19 Jun 2019 13:19:55 +0200
Paul Emmerich  ==> Lars Täuber  :
> Wild guess: you hit the PG hard limit, how many PGs per OSD do you have?
> If this is the case: increase "osd max pg per osd hard ratio"
> 
> Check "ceph pg  query" to see why it isn't activating.
> 
> Can you share the output of "ceph osd df tree" and "ceph pg  query"
> of the affected PGs?

The pg queries are attached. I can't read them - too much information.


Here is the osd df tree:
# osd df tree
ID  CLASS WEIGHTREWEIGHT SIZERAW USE DATAOMAPMETA AVAIL   
%USE VAR  PGS STATUS TYPE NAME   
 -1   167.15057- 167 TiB 4.7 TiB 1.2 TiB 952 MiB   57 GiB 162 TiB 
2.79 1.00   -root PRZ
-1772.43192-  72 TiB 2.0 TiB 535 GiB 393 MiB   25 GiB  70 TiB 
2.78 1.00   -rack 1-eins 
 -922.28674-  22 TiB 640 GiB 170 GiB  82 MiB  9.0 GiB  22 TiB 
2.80 1.01   -host onode1 
  2   hdd   5.57169  1.0 5.6 TiB 162 GiB  45 GiB  11 MiB  2.3 GiB 5.4 TiB 
2.84 1.02 224 up osd.2   
  9   hdd   5.57169  1.0 5.6 TiB 156 GiB  39 GiB  19 MiB  2.1 GiB 5.4 TiB 
2.74 0.98 201 up osd.9   
 14   hdd   5.57169  1.0 5.6 TiB 162 GiB  44 GiB  24 MiB  2.1 GiB 5.4 TiB 
2.84 1.02 230 up osd.14  
 21   hdd   5.57169  1.0 5.6 TiB 160 GiB  42 GiB  27 MiB  2.5 GiB 5.4 TiB 
2.80 1.00 219 up osd.21  
-1322.28674-  22 TiB 640 GiB 170 GiB 123 MiB  8.9 GiB  22 TiB 
2.80 1.00   -host onode4 
  4   hdd   5.57169  1.0 5.6 TiB 156 GiB  39 GiB  38 MiB  2.2 GiB 5.4 TiB 
2.73 0.98 205 up osd.4   
 11   hdd   5.57169  1.0 5.6 TiB 164 GiB  47 GiB  24 MiB  2.0 GiB 5.4 TiB 
2.87 1.03 241 up osd.11  
 18   hdd   5.57169  1.0 5.6 TiB 159 GiB  42 GiB  31 MiB  2.5 GiB 5.4 TiB 
2.79 1.00 221 up osd.18  
 22   hdd   5.57169  1.0 5.6 TiB 160 GiB  43 GiB  29 MiB  2.1 GiB 5.4 TiB 
2.81 1.01 225 up osd.22  
 -527.85843-  28 TiB 782 GiB 195 GiB 188 MiB  6.9 GiB  27 TiB 
2.74 0.98   -host onode7 
  5   hdd   5.57169  1.0 5.6 TiB 158 GiB  41 GiB  26 MiB  1.2 GiB 5.4 TiB 
2.77 0.99 213 up osd.5   
 12   hdd   5.57169  1.0 5.6 TiB 159 GiB  42 GiB  31 MiB  993 MiB 5.4 TiB 
2.79 1.00 222 up osd.12  
 20   hdd   5.57169  1.0 5.6 TiB 157 GiB  40 GiB  47 MiB  1.2 GiB 5.4 TiB 
2.76 0.99 212 up osd.20  
 27   hdd   5.57169  1.0 5.6 TiB 151 GiB  33 GiB  28 MiB  1.9 GiB 5.4 TiB 
2.64 0.95 179 up osd.27  
 29   hdd   5.57169  1.0 5.6 TiB 156 GiB  39 GiB  56 MiB  1.7 GiB 5.4 TiB 
2.74 0.98 203 up osd.29  
-1844.57349-  45 TiB 1.3 TiB 341 GiB 248 MiB   14 GiB  43 TiB 
2.81 1.01   -rack 2-zwei 
 -722.28674-  22 TiB 641 GiB 171 GiB 132 MiB  6.7 GiB  22 TiB 
2.81 1.01   -host onode2 
  1   hdd   5.57169  1.0 5.6 TiB 155 GiB  38 GiB  35 MiB  1.2 GiB 5.4 TiB 
2.72 0.97 203 up osd.1   
  8   hdd   5.57169  1.0 5.6 TiB 163 GiB  46 GiB  36 MiB  2.4 GiB 5.4 TiB 
2.86 1.02 243 up osd.8   
 16   hdd   5.57169  1.0 5.6 TiB 161 GiB  43 GiB  24 MiB 1000 MiB 5.4 TiB 
2.82 1.01 221 up osd.16  
 23   hdd   5.57169  1.0 5.6 TiB 162 GiB  45 GiB  37 MiB  2.1 GiB 5.4 TiB 
2.84 1.02 228 up osd.23  
 -322.28674-  22 TiB 640 GiB 170 GiB 116 MiB  7.6 GiB  22 TiB 
2.80 1.00   -host onode5 
  3   hdd   5.57169  1.0 5.6 TiB 154 GiB  36 GiB  14 MiB 1010 MiB 5.4 TiB 
2.70 0.97 186 up osd.3   
  7   hdd   5.57169  1.0 5.6 TiB 161 GiB  44 GiB  22 MiB  2.2 GiB 5.4 TiB 
2.82 1.01 221 up osd.7   
 15   hdd   5.57169  1.0 5.6 TiB 165 GiB  48 GiB  26 MiB  2.3 GiB 5.4 TiB 
2.89 1.04 249 up osd.15  
 24   hdd   5.57169  1.0 5.6 TiB 160 GiB  42 GiB  54 MiB  2.1 GiB 5.4 TiB 
2.80 1.00 223 up osd.24  
-1950.14517-  50 TiB 1.4 TiB 376 GiB 311 MiB   18 GiB  49 TiB 
2.79 1.00   -rack 3-drei 
-1522.28674-  22 TiB 649 GiB 179 GiB 112 MiB  8.2 GiB  22 TiB 
2.84 1.02   -host onode3 
  0   hdd   5.57169  1.0 5.6 TiB 162 GiB  45 GiB  28 MiB  996 MiB 5.4 TiB 
2.84 1.02 229 up osd.0   
 10   hdd   5.57169  1.0 5.6 TiB 159 GiB  42 GiB  21 MiB  2.2 GiB 5.4 TiB 
2.79 1.00 213 up osd.10  
 17   hdd   5.57169  1.0 5.6 TiB 165 GiB  47 GiB  19 MiB  2.5 GiB 5.4 TiB 
2.88 1.03 238 up osd.17  
 25   hdd   5.57169  1.0 5.6 TiB 163 GiB  46 GiB  44 MiB  2.5 GiB 5.4 TiB 
2.86 1.03 242 up osd.25  
-1127.85843-  28 TiB 784 GiB 197 GiB 199 MiB  9.4 GiB  27 TiB 
2.75 0.99   -host onode6 
  6   hdd   5.5

Re: [ceph-users] Reduced data availability: 2 pgs inactive

2019-06-19 Thread Paul Emmerich
Wild guess: you hit the PG hard limit, how many PGs per OSD do you have?
If this is the case: increase "osd max pg per osd hard ratio"

Check "ceph pg  query" to see why it isn't activating.

Can you share the output of "ceph osd df tree" and "ceph pg  query"
of the affected PGs?


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 19, 2019 at 8:52 AM Lars Täuber  wrote:

> Hi there!
>
> Recently I made our cluster rack aware
> by adding racks to the crush map.
> The failure domain was and still is "host".
>
> rule cephfs2_data {
> id 7
> type erasure
> min_size 3
> max_size 6
> step set_chooseleaf_tries 5
> step set_choose_tries 100
> step take PRZ
> step chooseleaf indep 0 type host
> step emit
> }
>
>
> Then I sorted the hosts into the new
> rack buckets of the crush map as they
> are in reality, by:
>   # osd crush move onodeX rack=XYZ
> for all hosts.
>
> The cluster started to reorder the data.
>
> In the end the cluster has now:
> HEALTH_WARN 1 filesystem is degraded; Reduced data availability: 2 pgs
> inactive; Degraded data redundancy: 678/2371785 objects degraded (0.029%),
> 2 pgs degraded, 2 pgs undersized
> FS_DEGRADED 1 filesystem is degraded
> fs cephfs_1 is degraded
> PG_AVAILABILITY Reduced data availability: 2 pgs inactive
> pg 21.2e4 is stuck inactive for 142792.952697, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting
> [5,2147483647,25,28,11,2]
> pg 23.5 is stuck inactive for 142791.437243, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
> PG_DEGRADED Degraded data redundancy: 678/2371785 objects degraded
> (0.029%), 2 pgs degraded, 2 pgs undersized
> pg 21.2e4 is stuck undersized for 142779.321192, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting
> [5,2147483647,25,28,11,2]
> pg 23.5 is stuck undersized for 142789.747915, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
>
> The cluster hosts a cephfs which is
> not mountable anymore.
>
> I tried a few things (as you can see:
> forced_backfill), but failed.
>
> The cephfs_data pool is EC 4+2.
> Both inactive pgs seem to have enough
> copies to recalculate the contents for
> all osds.
>
> Is there a chance to get both pgs
> clean again?
>
> How can I force the pgs to recalculate
> all necessary copies?
>
>
> Thanks
> Lars


Re: [ceph-users] Debian Buster builds

2019-06-19 Thread Paul Emmerich
On Tue, Jun 18, 2019 at 6:29 PM Daniel Baumann 
wrote:

> On 6/18/19 3:39 PM, Paul Emmerich wrote:
> > we maintain (unofficial) Nautilus builds for Buster here:
> > https://mirror.croit.io/debian-nautilus/
>
> the repository doesn't contain the source packages. Just out of
> curiosity, to see what you might have changed apart from
> (re)building the packages: are they available somewhere?
>

we (currently) don't apply any patches on Nautilus; some of the older Mimic
packages have a few bug fixes applied.

We build the packages from tags here: https://github.com/croit/ceph, i.e.,
the 14.2.1 packages are https://github.com/croit/ceph/tree/v14.2.1


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90



>
> Regards,
> Daniel


Re: [ceph-users] Nautilus HEALTH_WARN for msgr2 protocol

2019-06-19 Thread Bob Farrell
Aha, yes, that does help! I tried a lot of variations but couldn't quite
get it to work, so I used the simpler alternative instead.

Thanks !

On Wed, 19 Jun 2019 at 09:21, Dominik Csapak  wrote:

> On 6/14/19 6:10 PM, Bob Farrell wrote:
> > Hi. Firstly thanks to all involved in this great mailing list, I learn
> > lots from it every day.
> >
>
> Hi,
>
> >
> > I never figured out the correct syntax to set up the first monitor to
> > use both 6789 and 3300. The other monitors that join the cluster set
> > this config automatically but I couldn't work out how to apply it to the
> > first monitor node.
> >
>
> I struggled with this myself yesterday and found that the relevant
> argument is not really documented:
>
> monmaptool --create --addv ID [v1:ip:6789,v2:ip:3300] /path/to/monmap
>
>
> hope this helps :)
>


Re: [ceph-users] Nautilus HEALTH_WARN for msgr2 protocol

2019-06-19 Thread Dominik Csapak

On 6/14/19 6:10 PM, Bob Farrell wrote:
Hi. Firstly thanks to all involved in this great mailing list, I learn 
lots from it every day.




Hi,



I never figured out the correct syntax to set up the first monitor to 
use both 6789 and 3300. The other monitors that join the cluster set 
this config automatically but I couldn't work out how to apply it to the 
first monitor node.




I struggled with this myself yesterday and found that the relevant
argument is not really documented:

monmaptool --create --addv ID [v1:ip:6789,v2:ip:3300] /path/to/monmap
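
For a concrete example, assuming the first monitor is named "a" and listens
on 192.168.1.10 (name, IP and path are placeholders):

```
# build an initial monmap that advertises both the msgr v1 and v2 addresses
monmaptool --create --addv a [v1:192.168.1.10:6789,v2:192.168.1.10:3300] /tmp/monmap

# inspect the result
monmaptool --print /tmp/monmap
```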


hope this helps :)
