Re: [ceph-users] MDS reports metadata damage

2018-06-24 Thread Yan, Zheng
On Thu, Jun 21, 2018 at 5:43 PM Hennen, Christian
 wrote:
>
> Dear Community,
>
>
>
> here at ZIMK at the University of Trier we operate a Ceph Luminous Cluster as 
> filer for a HPC environment via CephFS (Bluestore backend). During setup last 
> year we made the mistake of not configuring the RAID as JBOD, so initially 
> the 3 nodes only housed 1 OSD each. Currently, we are in the process of 
> remediating this. After a loss of metadata due to resetting the journal 
> (journal entries were not being flushed fast enough), we managed to bring the 
> cluster back up and started adding 2 additional nodes. The hardware is a 
> little bit older than the first 3 nodes. We configured the drives on these 
> individually (RAID-0 on each disk since there is no pass-through mode on the 
> controller) and after some rebalancing and re-weight, the first of the 
> original nodes is now empty and ready to be re-installed.
>
>
>
> However, due to the aforementioned metadata loss, we are currently getting 
> warnings about metadata damage.
>
> damage ls shows, that only one folder is affected. As we don’t need this 
> folder, we’d like to delete it and the associated metadata and other 
> informations if possible. Taking the cluster offline for a data-scan right 
> now would be a little bit difficult, so any other suggestions would be 
> appreciated.
>
>
>
> Cluster health details are available here: 
> https://gitlab.uni-trier.de/snippets/65
>
>

Try the following commands:

mkdir /damaged_dirs
setfattr -n ceph.dir.pin -v 0 /damaged_dirs
mv /clients/installserver/repos/ubuntu/mirror/mirror.uni-trier.de/ubuntu/pool/universe/g/gerritlib /damaged_dirs
mv /clients/installserver/repos/ubuntu/mirror/mirror.uni-trier.de/ubuntu/pool/universe/g/gst-fluendo-mp3 /damaged_dirs
ceph daemon mds. flush journal
ceph daemon mds. flush journal
ceph daemon mds. scrub_path /damaged_dirs force recursive repair

Now you should be able to rmdir the directories in /damaged_dirs/.
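
If it helps, here is a hedged sketch for checking that the damage entries are gone afterwards (the MDS name and damage ID are placeholders for your deployment):

# list the remaining entries in the MDS damage table once the scrub has finished
ceph daemon mds.<name> damage ls
# a stale entry for the deleted directory can be cleared by its ID
ceph daemon mds.<name> damage rm <damage_id>
# the cluster-wide damage warning should then clear in "ceph status"
ceph status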

Regards
Yan, Zheng



>
> Regards
>
> Christian Hennen
>
>
>
> Project Manager Infrastructural Services
>
> Zentrum für Informations-, Medien-
>
> und Kommunikationstechnologie (ZIMK)
>
> Universität Trier
>
> 54286 Trier
>
>
>
> Tel.: +49 651 201 3488
>
> Fax: +49 651 201 3921
>
> E-Mail: christian.hen...@uni-trier.de
>
> Web: http://zimk.uni-trier.de
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Uneven data distribution with even pg distribution after rebalancing

2018-06-24 Thread shadow_lin
Hi List,
   The environment is:
   Ceph 12.2.4
   Balancer module on and in upmap mode
   Failure domain is per host, 2 OSDs per host
   EC k=4 m=2
   PG distribution is almost even before and after the rebalancing.


   After marking out one of the OSDs, I noticed a lot of the data moving onto the other OSD on the same host.

   The ceph osd df result is (osd.20 and osd.21 are on the same host; osd.20 was marked out):

ID CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR  PGS
19   hdd 9.09560  1.0 9313G 7079G 2233G 76.01 1.00 135
21   hdd 9.09560  1.0 9313G 8123G 1190G 87.21 1.15 135
22   hdd 9.09560  1.0 9313G 7026G 2287G 75.44 1.00 133
23   hdd 9.09560  1.0 9313G 7026G 2286G 75.45 1.00 134
   
   I am using RBD only, so the objects should all be 4 MB. I don't understand why osd.21 got significantly more data with the same PG count as the other OSDs.
   Is this behavior expected, did I misconfigure something, or is this some kind of bug?
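
For anyone looking at similar numbers, a hedged way to check whether osd.21 simply inherited the PGs that used to live on osd.20, and whether any single PG is unusually heavy (the BYTES column index below is an assumption about this Luminous build's output; adjust it if your columns differ):

# list the PGs currently mapped to osd.21, with per-PG object and byte counts
ceph pg ls-by-osd 21
# rough ranking of the heaviest PGs on osd.21 (field 7 assumed to be BYTES)
ceph pg ls-by-osd 21 | awk '$1 ~ /^[0-9]+\./ {print $7, $1}' | sort -n | tail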
   
   Thanks


2018-06-25
shadow_lin 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fixing unrepairable inconsistent PG

2018-06-24 Thread Brad Hubbard
Can you try the following?

$ ceph --debug_ms 5 --debug_auth 20 pg 18.2 query
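
If the query produces a lot of output, a hedged variant that captures the debug stream separately from the query result (assuming the client debug lines go to stderr, which is the usual default):

$ ceph --debug_ms 5 --debug_auth 20 pg 18.2 query > pg-18.2-query.json 2> pg-18.2-query.debug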

On Fri, Jun 22, 2018 at 7:54 PM, Andrei Mikhailovsky  wrote:
> Hi Brad,
>
> here is the output of the command (replaced the real auth key with [KEY]):
>
>
> 
>
> 2018-06-22 10:47:27.659895 7f70ef9e6700 10 monclient: build_initial_monmap
> 2018-06-22 10:47:27.661995 7f70ef9e6700 10 monclient: init
> 2018-06-22 10:47:27.662002 7f70ef9e6700  5 adding auth protocol: cephx
> 2018-06-22 10:47:27.662004 7f70ef9e6700 10 monclient: auth_supported 2 method 
> cephx
> 2018-06-22 10:47:27.662221 7f70ef9e6700  2 auth: KeyRing::load: loaded key 
> file /etc/ceph/ceph.client.admin.keyring
> 2018-06-22 10:47:27.662338 7f70ef9e6700 10 monclient: _reopen_session rank -1
> 2018-06-22 10:47:27.662425 7f70ef9e6700 10 monclient(hunting): picked 
> mon.noname-b con 0x7f70e8176c80 addr 192.168.168.202:6789/0
> 2018-06-22 10:47:27.662484 7f70ef9e6700 10 monclient(hunting): picked 
> mon.noname-a con 0x7f70e817a2e0 addr 192.168.168.201:6789/0
> 2018-06-22 10:47:27.662534 7f70ef9e6700 10 monclient(hunting): _renew_subs
> 2018-06-22 10:47:27.662544 7f70ef9e6700 10 monclient(hunting): authenticate 
> will time out at 2018-06-22 10:52:27.662543
> 2018-06-22 10:47:27.663831 7f70d77fe700 10 monclient(hunting): handle_monmap 
> mon_map magic: 0 v1
> 2018-06-22 10:47:27.663885 7f70d77fe700 10 monclient(hunting):  got monmap 
> 20, mon.noname-b is now rank -1
> 2018-06-22 10:47:27.663889 7f70d77fe700 10 monclient(hunting): dump:
> epoch 20
> fsid 51e9f641-372e-44ec-92a4-b9fe55cbf9fe
> last_changed 2018-06-16 23:14:48.936175
> created 0.00
> 0: 192.168.168.201:6789/0 mon.arh-ibstorage1-ib
> 1: 192.168.168.202:6789/0 mon.arh-ibstorage2-ib
> 2: 192.168.168.203:6789/0 mon.arh-ibstorage3-ib
>
> 2018-06-22 10:47:27.664005 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service mon
> 2018-06-22 10:47:27.664020 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service osd
> 2018-06-22 10:47:27.664021 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service mgr
> 2018-06-22 10:47:27.664025 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service auth
> 2018-06-22 10:47:27.664026 7f70d77fe700 10 cephx: validate_tickets want 53 
> have 0 need 53
> 2018-06-22 10:47:27.664032 7f70d77fe700 10 monclient(hunting): my global_id 
> is 411322261
> 2018-06-22 10:47:27.664035 7f70d77fe700 10 cephx client: handle_response ret 
> = 0
> 2018-06-22 10:47:27.664046 7f70d77fe700 10 cephx client:  got initial server 
> challenge d66f2dffc2113d43
> 2018-06-22 10:47:27.664049 7f70d77fe700 10 cephx client: validate_tickets: 
> want=53 need=53 have=0
>
> 2018-06-22 10:47:27.664052 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service mon
> 2018-06-22 10:47:27.664053 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service osd
> 2018-06-22 10:47:27.664054 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service mgr
> 2018-06-22 10:47:27.664055 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service auth
> 2018-06-22 10:47:27.664056 7f70d77fe700 10 cephx: validate_tickets want 53 
> have 0 need 53
> 2018-06-22 10:47:27.664057 7f70d77fe700 10 cephx client: want=53 need=53 
> have=0
> 2018-06-22 10:47:27.664061 7f70d77fe700 10 cephx client: build_request
> 2018-06-22 10:47:27.664145 7f70d77fe700 10 cephx client: get auth session 
> key: client_challenge d4c95f637e641b55
> 2018-06-22 10:47:27.664175 7f70d77fe700 10 monclient(hunting): handle_monmap 
> mon_map magic: 0 v1
> 2018-06-22 10:47:27.664208 7f70d77fe700 10 monclient(hunting):  got monmap 
> 20, mon.arh-ibstorage1-ib is now rank 0
> 2018-06-22 10:47:27.664211 7f70d77fe700 10 monclient(hunting): dump:
> epoch 20
> fsid 51e9f641-372e-44ec-92a4-b9fe55cbf9fe
> last_changed 2018-06-16 23:14:48.936175
> created 0.00
> 0: 192.168.168.201:6789/0 mon.arh-ibstorage1-ib
> 1: 192.168.168.202:6789/0 mon.arh-ibstorage2-ib
> 2: 192.168.168.203:6789/0 mon.arh-ibstorage3-ib
>
> 2018-06-22 10:47:27.664241 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service mon
> 2018-06-22 10:47:27.664244 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service osd
> 2018-06-22 10:47:27.664245 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service mgr
> 2018-06-22 10:47:27.664246 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service auth
> 2018-06-22 10:47:27.664247 7f70d77fe700 10 cephx: validate_tickets want 53 
> have 0 need 53
> 2018-06-22 10:47:27.664251 7f70d77fe700 10 monclient(hunting): my global_id 
> is 411323061
> 2018-06-22 10:47:27.664253 7f70d77fe700 10 cephx client: handle_response ret 
> = 0
> 2018-06-22 10:47:27.664256 7f70d77fe700 10 cephx client:  got initial server 
> challenge d5d3c1e5bcf3c0b8
> 2018-06-22 10:47:27.664258 7f70d77fe700 10 cephx client: validate_tickets: 
> want=53 need=53 have=0
> 2018-06-22 10:47:27.664260 7f70d77fe700 10 cephx: set_have_need_key no 
> handler for service mon
> 

Re: [ceph-users] pulled a disk out, ceph still thinks its in

2018-06-24 Thread pixelfairy
15, 5 in each node. 14 currently in.

Is there another way to know if there's a problem with one? Or to make the
threshold higher?
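
For reference, a hedged sketch for inspecting the settings involved (the mon ID and the 15-minute value are placeholders, not recommendations):

# an OSD is normally marked "down" within a few heartbeat intervals, but only
# marked "out" after mon_osd_down_out_interval (default 600 s), and only if
# that would not drop the fraction of "in" OSDs below mon_osd_min_in_ratio
# (default 0.75)
ceph daemon mon.<id> config show | grep -E 'mon_osd_down_out_interval|mon_osd_min_in_ratio'
# check whether the OSD is at least reported down
ceph osd tree | grep down
# raise the auto-out delay to 15 minutes at runtime
ceph tell 'mon.*' injectargs '--mon_osd_down_out_interval=900'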

On Sun, Jun 24, 2018 at 2:14 PM Paul Emmerich 
wrote:

> How many OSDs do you have? How many of them are currently in?
>
> By default, OSDs are only marked out automatically if more than 75% of the OSDs are in.
>
> Paul
>
> > Am 24.06.2018 um 23:04 schrieb pixelfairy :
> >
> > installed mimic on an empty cluster. yanked out an osd about 1/2hr ago
> and its still showing as in with ceph -s, ceph osd stat, and ceph osd tree.
> >
> > is the timeout long?
> >
> > hosts run ubuntu 16.04. ceph installed using ceph-ansible branch
> stable-3.1 the playbook didnt make the default rbd pool.
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pulled a disk out, ceph still thinks its in

2018-06-24 Thread Paul Emmerich
How many OSDs do you have? How many of them are currently in?

By default, OSDs are only marked out automatically if more than 75% of the OSDs are in.

Paul

> Am 24.06.2018 um 23:04 schrieb pixelfairy :
> 
> installed mimic on an empty cluster. yanked out an osd about 1/2hr ago and 
> its still showing as in with ceph -s, ceph osd stat, and ceph osd tree.
> 
> is the timeout long? 
> 
> hosts run ubuntu 16.04. ceph installed using ceph-ansible branch stable-3.1 
> the playbook didnt make the default rbd pool.
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pulled a disk out, ceph still thinks its in

2018-06-24 Thread pixelfairy
Installed Mimic on an empty cluster. I yanked out an OSD about half an hour ago
and it's still showing as in with ceph -s, ceph osd stat, and ceph osd tree.

Is the timeout that long?

Hosts run Ubuntu 16.04. Ceph was installed using the ceph-ansible stable-3.1
branch; the playbook didn't create the default rbd pool.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] crush map has straw_calc_version=0

2018-06-24 Thread David
Hi!

So I've got an old dumpling production cluster which has slowly been upgraded 
to Jewel.
Now I'm facing the Ceph Health warning that straw_calc_version = 0

According to an old thread from 2016 and the docs, fixing it could trigger a small to 
moderate amount of migration.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-May/009702.html
http://docs.ceph.com/docs/master/rados/operations/crush-map/#straw-calc-version-tunable-introduced-with-firefly-too
Since we're heading to Luminous and later on to Mimic, I'm not sure it's wise 
to leave it as it is. Since this is a filestore HDD + SSD journal cluster, a 
moderate migration might cause issues for our production servers.
Is there any way to "test" how much migration it will cause? The servers/disks are 
homogeneous.
Also, would ignoring it cause any issues with Luminous/Mimic? The plan is to 
set up another pool and replicate all data to the new pool on the same OSDs 
(not sure that's in Mimic yet, though?)
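
One hedged way to estimate the movement offline, without touching the live cluster (the rule id, replica count, and sample range below are assumptions to adjust for your pools, and the result is only a rough estimate of the post-adjustment state):

# grab the current CRUSH map and prepare an edited copy
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
cp crush.txt crush-new.txt
# edit crush-new.txt: set (or add) "tunable straw_calc_version 1" in the
# tunables block at the top, then recompile
crushtool -c crush-new.txt -o crush-new.bin
# map a sample of inputs through both maps and count how many placements change
crushtool -i crush.bin     --test --show-mappings --rule 0 --num-rep 3 --min-x 1 --max-x 10000 > before.txt
crushtool -i crush-new.bin --test --show-mappings --rule 0 --num-rep 3 --min-x 1 --max-x 10000 > after.txt
diff before.txt after.txt | grep -c '^>'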
Kind Regards,
David Majchrzak
> Moving to straw_calc_version 1 and then adjusting a straw bucket (by adding, 
> removing, or reweighting an item, or by using the reweight-all command) can 
> trigger a small to moderate amount of data movement if the cluster has hit 
> one of the problematic conditions.
>
>
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Mimic on CentOS 7.5 dependency issue (liboath)

2018-06-24 Thread Steffen Winther Sørensen
On 24 Jun 2018, at 06.57, Brad Hubbard  wrote:
> 
> As Brian pointed out
> 
> # yum -y install
> https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm 
> 
Yep, that has worked fine for me on CentOS 7.5 with EPEL.
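
For anyone else landing on this thread, a minimal sketch of the full sequence on a fresh CentOS 7.5 host (assuming the rpm-mimic repo from Stefan's original message is already configured):

# EPEL provides the liboath dependency that ceph-common 13.2.0 pulls in
yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum provides liboath          # should now resolve to liboath-2.4.1-9.el7 from epel
yum -y install ceph-common    # liboath.so.0 is installed automatically as a dependency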


> 
> On Sun, Jun 24, 2018 at 2:46 PM, Michael Kuriger  wrote:
>> CentOS 7.5 is pretty new.  Have you tried CentOS 7.4?
>> 
>> Mike Kuriger
>> Sr. Unix Systems Engineer
>> 
>> 
>> 
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
>> Brian :
>> Sent: Saturday, June 23, 2018 1:41 AM
>> To: Stefan Kooman
>> Cc: ceph-users
>> Subject: Re: [ceph-users] Ceph Mimic on CentOS 7.5 dependency issue (liboath)
>> 
>> Hi Stefan
>> 
>> $ sudo yum provides liboath
>> Loaded plugins: fastestmirror
>> Loading mirror speeds from cached hostfile
>> * base: mirror.strencom.net
>> * epel: mirror.sax.uk.as61049.net
>> * extras: mirror.strencom.net
>> * updates: mirror.strencom.net
>> liboath-2.4.1-9.el7.x86_64 : Library for OATH handling
>> Repo: epel
>> 
>> 
>> 
>> On Sat, Jun 23, 2018 at 9:02 AM, Stefan Kooman  wrote:
>>> Hi list,
>>> 
>>> I'm trying to install "Ceph mimic" on a CentOS 7.5 client (base
>>> install). I Added the "rpm-mimic" repo from our mirror and tried to
>>> install ceph-common, but I run into a dependency problem:
>>> 
>>> --> Finished Dependency Resolution
>>> Error: Package: 2:ceph-common-13.2.0-0.el7.x86_64 
>>> (ceph.download.bit.nl_rpm-mimic_el7_x86_64)
>>>   Requires: liboath.so.0()(64bit)
>>> Error: Package: 2:ceph-common-13.2.0-0.el7.x86_64 
>>> (ceph.download.bit.nl_rpm-mimic_el7_x86_64)
>>>   Requires: liboath.so.0(LIBOATH_1.10.0)(64bit)
>>> Error: Package: 2:ceph-common-13.2.0-0.el7.x86_64 
>>> (ceph.download.bit.nl_rpm-mimic_el7_x86_64)
>>>   Requires: liboath.so.0(LIBOATH_1.2.0)(64bit)
>>> Error: Package: 2:librgw2-13.2.0-0.el7.x86_64 
>>> (ceph.download.bit.nl_rpm-mimic_el7_x86_64)
>>> 
>>> Is this "oath" package something I need to install from a 3rd party repo?
>>> 
>>> Gr. Stefan
>>> 
>>> 
>>> --
>>> | BIT BV  
>>> http://www.bit.nl/
>>> Kamer van Koophandel 09090351
>>> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> Cheers,
> Brad
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw multizone not syncing large bucket completly to other zone

2018-06-24 Thread Enrico Kern
Hello,

We have two ceph luminous clusters (12.2.5).

Recently one of our big buckets stopped syncing properly. We have one specific
bucket which is around 30 TB in size, consisting of a lot of directories, each
containing files of 10-20 MB.

The secondary zone is often completely missing multiple days of data in this
bucket, while all other, smaller buckets sync just fine.

Even with that data missing, radosgw-admin sync status always says everything
is fine.

The sync error log doesn't show anything for those days.

Running radosgw-admin metadata sync and data sync also doesn't solve the issue.
The only way of making it sync again is to disable and re-enable the sync. That
needs to be done as often as ten times an hour to make it sync properly.

radosgw-admin bucket sync disable
radosgw-admin bucket sync enable
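
For reference, a hedged sketch of that disable/re-enable cycle as a single step (the bucket name and the pause are placeholders, not recommendations):

BUCKET=<bucket-name>
radosgw-admin bucket sync disable --bucket="$BUCKET"
sleep 30   # give the gateways a moment to pick up the change
radosgw-admin bucket sync enable --bucket="$BUCKET"
# watch the per-shard markers catch up afterwards
radosgw-admin bucket sync status --bucket="$BUCKET"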

When I run data sync init I sometimes get this:

 radosgw-admin data sync init --source-zone berlin
2018-06-24 07:55:46.337858 7fe7557fa700  0 ERROR: failed to distribute cache for amsterdam.rgw.log:datalog.sync-status.6a9448d2-bdba-4bec-aad6-aba72cd8eac6

Sometimes, when a really large amount of data is missing (yesterday it was more
than one month), running this on the secondary zone helps get things back in
sync:

radosgw-admin bucket check --fix --check-objects

How can I debug this problem further? We have so many requests on the cluster
that it is hard to dig anything out of the log files.
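
A few hedged starting points for narrowing it down without raising the log level on the busy gateways (the bucket name is a placeholder, and the exact flags are assumptions for 12.2.5):

# per-bucket view of the sync markers, run from the secondary zone
radosgw-admin bucket sync status --bucket=<bucket> --source-zone=berlin
# anything recorded in the replication error log
radosgw-admin sync error list
# a one-off status check with verbose logging confined to this command
radosgw-admin data sync status --source-zone=berlin --debug-rgw=20 --debug-ms=1 2> data-sync-debug.log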

Given that all the smaller buckets are perfectly in sync, I suspect the problem
is related to the size of the bucket.

Any pointers in the right direction are greatly appreciated.

Regards,

Enrico

-- 

*Enrico Kern*
VP IT Operations

enrico.k...@glispa.com
+49 (0) 30 555713017 / +49 (0) 152 26814501
skype: flyersa
LinkedIn Profile

*Glispa GmbH* | Berlin Office
Sonnenburger Straße 73
10437 Berlin | Germany

Managing Director Din Karol-Gavish
Registered in Berlin
AG Charlottenburg | HRB 114678B
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com