[ceph-users] Introducing DeepSea: A tool for deploying Ceph using Salt

2016-11-02 Thread Tim Serong
Hi All,

I thought I should make a little noise about a project some of us at
SUSE have been working on, called DeepSea.  It's a collection of Salt
states, runners and modules for orchestrating deployment of Ceph
clusters.  To help everyone get a feel for it, I've written a blog post
which walks through using DeepSea to set up a small test cluster:

  http://ourobengr.com/2016/11/hello-salty-goodness/

If you'd like to try it out yourself, the code is on GitHub:

  https://github.com/SUSE/DeepSea

More detailed documentation can be found at:

  https://github.com/SUSE/DeepSea/wiki/intro
  https://github.com/SUSE/DeepSea/wiki/management
  https://github.com/SUSE/DeepSea/wiki/policy
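
In a nutshell, the deployment flow looks roughly like this (simplified sketch;
the stage names and the policy.cfg path are described properly in the wiki
pages above, so treat those as the authority):

  # On the Salt master, once the minions have their keys accepted:
  salt-run state.orch ceph.stage.0   # prep: prepare/update the nodes
  salt-run state.orch ceph.stage.1   # discovery: collect hardware profiles

  # Describe which roles go on which minions in
  # /srv/pillar/ceph/proposals/policy.cfg, then:
  salt-run state.orch ceph.stage.2   # configure: generate the cluster config
  salt-run state.orch ceph.stage.3   # deploy: bring up mons and OSDs
  salt-run state.orch ceph.stage.4   # services: MDS, RGW, etc. (if selected)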

Usual story: feedback, issues, pull requests are all welcome ;)

Enjoy,

Tim
-- 
Tim Serong
Senior Clustering Engineer
SUSE
tser...@suse.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] backup of radosgw config

2016-11-02 Thread Thomas

Hi guys,

I'm not sure whether this was asked before, as I wasn't able to find anything 
by googling (and the search function of the list is broken at 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/) - anyway:


- How would you back up all user and bucket configurations for the radosgw 
so that we could import them again in a disaster recovery?
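
The closest I've come up with so far is dumping the RGW metadata with 
radosgw-admin, along the lines of the untested sketch below - but I'm not 
sure whether that covers everything, hence the question:

#!/bin/bash
# Dump user and bucket metadata to JSON files for safe keeping.
BACKUP_DIR=/backup/rgw-$(date +%Y%m%d)

for section in user bucket bucket.instance; do
    mkdir -p "$BACKUP_DIR/$section"
    # "metadata list" prints a JSON array of keys in that section
    radosgw-admin metadata list "$section" | tr -d '[],"' | while read key; do
        [ -z "$key" ] && continue
        radosgw-admin metadata get "$section:$key" \
            > "$BACKUP_DIR/$section/${key//\//_}.json"
    done
done

# Restore would then roughly be:
#   radosgw-admin metadata put user:<uid> < user/<uid>.json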


Cheers,
Thomas

--

Thomas Gross
TGMEDIA Ltd.
p. +64 211 569080 | i...@tgmedia.co.nz

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore + erasure coding memory usage

2016-11-02 Thread bobobo1...@gmail.com
I'm running Kraken built from Git right now and I've found that my OSDs eat
as much memory as they can before they're killed by OOM. I understand that
Bluestore is experimental but thought the fact that it does this should be
known.

My setup:
- Xeon D-1540, 32GB DDR4 ECC RAM
- Arch Linux
- Single node, 4 8TB OSDs, each prepared with "ceph-disk prepare
--bluestore /dev/sdX"
- Built from Git fac6335a1eea12270f76cf2c7814648669e6515a

Steps to reproduce:
- Start mon
- Start OSDs
- ceph osd pool create pool 256 256 erasure myprofile storage
- rados bench -p pool  write -t 32
- ceph osd pool delete pool
- ceph osd pool create pool 256 256 replicated
- rados bench -p pool  write -t 32
- ceph osd pool delete pool

The OSDs start at ~500M used each (according to "ceph tell osd.0 heap
stats"), before they're allocated PGs. After creating and peering PGs,
they're at ~514M each.

After running rados bench for 10s, memory is at ~727M each. Running pprof
on a dump shows the top entry as:

218.9  96.1%  96.1%218.9  96.1% ceph::buffer::create_aligned

Running rados bench another 10s pushes memory to 836M each. pprof again
shows similar results:

305.2  96.8%  96.8%305.2  96.8% ceph::buffer::create_aligned

I can continue this process until the OSDs are killed by OOM.

This only happens with Bluestore; other backends (like filestore) work fine.

When I delete the pool, the OSDs release the memory and return to their
~500M resting point.

Repeating the test with a replicated pool results in the OSDs consuming
elevated memory (~610M peak) while writing but returning to resting levels
when writing ends.

It'd be great if I could do something about this myself but I don't
understand the code very well and I can't figure out if there's a way to
trace the path taken for the memory to be allocated like there is for CPU
usage.
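
For reference, this is roughly how I've been collecting the heap dumps
(file names/paths are from memory, so they may differ on other setups):

# requires the OSDs to be using tcmalloc
ceph tell osd.0 heap start_profiler
# ... run the rados bench workload ...
ceph tell osd.0 heap dump
ceph tell osd.0 heap stop_profiler

# The dumps end up in the OSD log directory, e.g.
# /var/log/ceph/osd.0.profile.0001.heap, and can be inspected with
# pprof (sometimes packaged as google-pprof):
pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.0.profile.*.heap | head -40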

Any advice or solution would be much appreciated.


Thanks!

Lucas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CDM

2016-11-02 Thread Patrick McGarry
In case anyone is disappointed and not on the call: there were technical
difficulties that split the call. We are on now.

https://bluejeans.com/707503600


On Wed, Nov 2, 2016 at 9:02 PM, Patrick McGarry  wrote:
> Due to low attendance we have had to cancel CDM tonight. Sorry for the
> confusion.
>
>
>
> --
>
> Best Regards,
>
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph



-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CDM

2016-11-02 Thread Patrick McGarry
Due to low attendance we have had to cancel CDM tonight. Sorry for the
confusion.



-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS Problems - Solved but reporting for benefit of others

2016-11-02 Thread Nick Fisk
After a bit more digging, the original crash appears to be similar to (but not 
exactly the same as) this tracker report:

http://tracker.ceph.com/issues/16983

I can see that this was fixed in 10.2.3, so I will probably look to upgrade.

If the logs make sense to anybody with a bit more knowledge, I would be 
interested to know whether that bug is related or whether I have stumbled on
something new.

Nick

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nick 
> Fisk
> Sent: 02 November 2016 17:58
> To: 'Ceph Users' 
> Subject: [ceph-users] MDS Problems - Solved but reporting for benefit of 
> others
> 
> Hi all,
> 
> We just had a bit of an outage with CephFS around the MDSs. I managed to get 
> everything up and running again after a bit of head scratching, and thought 
> I would share here what happened.
> 
> Cause
> I believe the MDSs, which were running as VMs, suffered when the hypervisor 
> ran out of RAM and started swapping due to hypervisor maintenance. I know 
> this is less than ideal and have put steps in place to prevent this 
> happening again.
> 
> Symptoms
> 1. Noticed that both MDSs were down; log files on both showed that they had 
> crashed
> 2. After restarting the MDSs, their status kept flipping between replay and 
> reconnect
> 3. Now and again both MDSs would crash again
> 4. Log files showed they seemed to keep restarting after trying to reconnect 
> clients
> 5. Clients were all kernel clients; one was 3.19 and the rest 4.8. I believe 
> the problematic client was one of the ones running kernel 4.8
> 6. Ceph is 10.2.2
> 
> Resolution
> After some serious head scratching and a little bit of panicking, the fact 
> that the log files showed the restart always happened after trying to 
> reconnect the clients gave me the idea to try and kill the sessions on the 
> MDS.  I first reset all the clients and waited, but this didn't seem to have 
> any effect and I could still see the MDS trying to reconnect to the clients. 
> I then decided to try and kill the sessions from the MDS end, so I shut down 
> the standby MDS (as they kept flipping active roles) and ran
> 
> ceph daemon mds.gp-ceph-mds1 session ls
> 
> I then tried to kill the last session in the list
> 
> ceph daemon mds.gp-ceph-mds1 session evict 
> 
> I had to keep hammering this command to get it at the right point, as the MDS 
> was only responding for a fraction of a second.
> 
> Suddenly in my other window, where I had a tail of the MDS log, I saw a 
> whizz of new information, which then stopped with the MDS success message. 
> So it seems something the MDS was trying to do whilst reconnecting was 
> upsetting it. Ceph -s updated to show the MDS was now active. Rebooting the 
> other MDS then made it the standby as well. Problem solved.
> 
> I have uploaded the 2 MDS logs here if any CephFS dev's are interested in 
> taking a closer look.
> 
> http://app.sys-pro.co.uk/mds_logs.zip
> 
> Nick
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Multi-tenancy and sharing CephFS data pools with other RADOS users

2016-11-02 Thread Dan Jakubiec
We currently have one master RADOS pool in our cluster that is shared among 
many applications.  All objects stored in the pool are currently stored using 
specific namespaces -- nothing is stored in the default namespace.

We would like to add a CephFS filesystem to our cluster, and would like to use 
the same master RADOS pool as the data pool for the filesystem.

Since there are no other tenants using the default namespace, would it be safe 
to share our RADOS pool in this way?  Any reason NOT to do this?
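
For context, this is roughly how I've been checking which namespaces are in
use (assumes a rados CLI new enough to support --all; pool name anonymised):

# objects in the default (empty) namespace only - this is where CephFS
# data objects (named after the inode, e.g. 10000000abc.00000000) would land:
rados -p masterpool ls

# objects across all namespaces, printed as "<namespace> <object>":
rados -p masterpool ls --all

# objects in one specific application namespace:
rados -p masterpool --namespace myapp ls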

Thanks,

-- Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] MDS Problems - Solved but reporting for benefit of others

2016-11-02 Thread Nick Fisk
Hi all,

We just had a bit of an outage with CephFS around the MDSs. I managed to get 
everything up and running again after a bit of head scratching, and thought I 
would share here what happened.

Cause
I believe the MDSs, which were running as VMs, suffered when the hypervisor ran 
out of RAM and started swapping due to hypervisor maintenance. I know this is 
less than ideal and have put steps in place to prevent this happening again.

Symptoms
1. Noticed that both MDSs were down; log files on both showed that they had 
crashed
2. After restarting the MDSs, their status kept flipping between replay and 
reconnect
3. Now and again both MDSs would crash again
4. Log files showed they seemed to keep restarting after trying to reconnect 
clients
5. Clients were all kernel clients; one was 3.19 and the rest 4.8. I believe 
the problematic client was one of the ones running kernel 4.8
6. Ceph is 10.2.2

Resolution
After some serious head scratching and a little bit of panicking, the fact that 
the log files showed the restart always happened after trying to reconnect the 
clients gave me the idea to try and kill the sessions on the MDS.  I first 
reset all the clients and waited, but this didn't seem to have any effect and I 
could still see the MDS trying to reconnect to the clients. I then decided to 
try and kill the sessions from the MDS end, so I shut down the standby MDS (as 
they kept flipping active roles) and ran

ceph daemon mds.gp-ceph-mds1 session ls 

I then tried to kill the last session in the list

ceph daemon mds.gp-ceph-mds1 session evict 

I had to keep hammering this command to get it at the right point, as the MDS 
was only responding for a fraction of a second.
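
In case it helps anyone else, a tiny retry loop would have saved the manual
hammering - something like this (sketch only; substitute the real session id
from "session ls"):

SESSION_ID=12345    # hypothetical id taken from "session ls"
until ceph daemon mds.gp-ceph-mds1 session evict "$SESSION_ID"; do
    sleep 0.2
done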

Suddenly in my other window, where I had a tail of the MDS log, I saw a whizz 
of new information, which then stopped with the MDS success message. So it 
seems something the MDS was trying to do whilst reconnecting was upsetting it. 
Ceph -s updated to show the MDS was now active. Rebooting the other MDS then 
made it the standby as well. Problem solved.

I have uploaded the 2 MDS logs here if any CephFS dev's are interested in 
taking a closer look.

http://app.sys-pro.co.uk/mds_logs.zip

Nick

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: pg stuck with unfound objects on non exsisting osd's

2016-11-02 Thread Mehmet
Yes a rolling restart should work. That was enough in my case.
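
If it helps, something like this untested sketch would do it one host at a
time (hostnames are placeholders, and the restart command depends on your
init system - on Hammer/Debian 8 it is probably "service ceph restart osd"):

#!/bin/bash
# Restart the OSDs host by host and wait for PGs to settle in between.
for host in osd-host1 osd-host2 osd-host3; do
    echo "Restarting OSDs on $host"
    ssh "$host" service ceph restart osd

    sleep 30
    # wait until no PGs are peering/activating before the next host
    while ceph pg stat | grep -Eq 'peering|activating|down'; do
        echo "waiting for PGs to settle after $host ..."
        sleep 10
    done
done
ceph health detail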

On 2 November 2016 01:48:20 MEZ is wrong; keeping the original time: On 2 November 2016 01:20:48 MEZ, "Will.Boege"  wrote:
>Start with a rolling restart of just the OSDs one system at a time,
>checking the status after each restart.
>
>On Nov 1, 2016, at 6:20 PM, Ronny Aasen  wrote:
>
>thanks for the suggestion.
>
>is a rolling reboot sufficient? or must all osd's be down at the same
>time ?
>one is no problem.  the other takes some scheduling..
>
>Ronny Aasen
>
>
>On 01.11.2016 21:52, c...@elchaka.de wrote:
>Hello Ronny,
>
>if it is possible for you, try to Reboot all OSD Nodes.
>
>I had this issue on my test cluster and it became healthy after
>rebooting.
>
>Hth
>- Mehmet
>
>On 1 November 2016 19:55:07 MEZ, Ronny Aasen  wrote:
>
>Hello.
>
>I have a cluster stuck with 2 pg's stuck undersized degraded, with 25
>unfound objects.
>
># ceph health detail
>HEALTH_WARN 2 pgs degraded; 2 pgs recovering; 2 pgs stuck degraded; 2
>pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; recovery
>294599/149522370 objects degraded (0.197%); recovery 640073/149522370
>objects misplaced (0.428%); recovery 25/46579241 unfound (0.000%);
>noout flag(s) set
>pg 6.d4 is stuck unclean for 8893374.380079, current state
>active+recovering+undersized+degraded+remapped, last acting [62]
>pg 6.ab is stuck unclean for 8896787.249470, current state
>active+recovering+undersized+degraded+remapped, last acting [18,12]
>pg 6.d4 is stuck undersized for 438122.427341, current state
>active+recovering+undersized+degraded+remapped, last acting [62]
>pg 6.ab is stuck undersized for 416947.461950, current state
>active+recovering+undersized+degraded+remapped, last acting [18,12]
>pg 6.d4 is stuck degraded for 438122.427402, current state
>active+recovering+undersized+degraded+remapped, last acting [62]
>pg 6.ab is stuck degraded for 416947.462010, current state
>active+recovering+undersized+degraded+remapped, last acting [18,12]
>pg 6.d4 is active+recovering+undersized+degraded+remapped, acting [62],
>25 unfound
>pg 6.ab is active+recovering+undersized+degraded+remapped, acting
>[18,12]
>recovery 294599/149522370 objects degraded (0.197%)
>recovery 640073/149522370 objects misplaced (0.428%)
>recovery 25/46579241 unfound (0.000%)
>noout flag(s) set
>
>
>have been following the troubleshooting guide at
>http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
>but gets stuck without a resolution.
>
>luckily it is not critical data. so i wanted to mark the pg lost so it
>could become health-ok.
>
># ceph pg 6.d4 mark_unfound_lost delete
>Error EINVAL: pg has 25 unfound objects but we haven't probed all
>sources, not marking lost
>
>querying the pg i see that it would want osd.80 and osd 36
>
>  {
> "osd": "80",
> "status": "osd is down"
> },
>
>trying to mark the osd's lost does not work either, since the osd's were
>removed from the cluster a long time ago.
>
># ceph osd lost 80 --yes-i-really-mean-it
>osd.80 is not down or doesn't exist
>
># ceph osd lost 36 --yes-i-really-mean-it
>osd.36 is not down or doesn't exist
>
>
>and this is where i am stuck.
>
>have tried stopping and starting the 3 osd's but that did not have any
>effect.
>
>Anyone have any advice how to proceed ?
>
>full output at:  http://paste.debian.net/hidden/be03a185/
>
>this is hammer 0.94.9  on debian 8.
>
>
>kind regards
>
>Ronny Aasen
>
>
>
>
>
>
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CDM Tonight @ 9p EDT

2016-11-02 Thread John Spray
On this particular occasion most of the cephfs developers are in Europe, so
we are unlikely to make it.

John

On 2 Nov 2016 5:27 p.m., "Patrick McGarry"  wrote:

> Hey cephers,
>
> I wanted to both post a reminder that our Ceph Developer Monthly
> meeting was tonight at 9p EDT, and pose a question:
>
> Are periodic Ceph Developer Meetings helpful and desired? Lately the
> participation has been sadly lacking, and I want to make sure we are
> providing a worthwhile platform for the community to discuss and
> collaborate. If people still desire the CDM platform, we will need to
> see much better participation in the pre-event planning. All that we
> ask there is a simple one-line description and maybe a link to a
> planning doc (a much lighter requirement than the blueprint process).
>
> http://wiki.ceph.com/CDM_02-NOV-2016
>
> If the CDM format is not helpful, I would like to know so that we can
> either alter or discontinue. I welcome any thoughts. Thanks.
>
> --
>
> Best Regards,
>
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CDM Tonight @ 9p EDT

2016-11-02 Thread Patrick McGarry
Hey cephers,

I wanted to both post a reminder that our Ceph Developer Monthly
meeting was tonight at 9p EDT, and pose a question:

Are periodic Ceph Developer Meetings helpful and desired? Lately the
participation has been sadly lacking, and I want to make sure we are
providing a worthwhile platform for the community to discuss and
collaborate. If people still desire the CDM platform, we will need to
see much better participation in the pre-event planning. All that we
ask there is a simple one-line description and maybe a link to a
planning doc (a much lighter requirement than the blueprint process).

http://wiki.ceph.com/CDM_02-NOV-2016

If the CDM format is not helpful, I would like to know so that we can
either alter or discontinue. I welcome any thoughts. Thanks.

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Wido den Hollander

> Op 2 november 2016 om 16:21 schreef Sage Weil :
> 
> 
> On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > > > > I'm pretty sure this is a race condition that got cleaned up as part 
> > > > > of 
> > > > > https://github.com/ceph/ceph/pull/9078/commits.  The mon only checks 
> > > > > the 
> > > > > pg_temp entries that are getting set/changed, and since those are 
> > > > > already 
> > > > > in place it doesn't recheck them.  Any poke to the cluster that 
> > > > > triggers 
> > > > > peering ought to be enough to clear it up.  So, no need for logs, 
> > > > > thanks!
> > > > > 
> > > > 
> > > > Ok, just checking.
> > > > 
> > > > > We could add a special check during, say, upgrade, but generally the 
> > > > > PGs 
> > > > > will re-peer as the OSDs restart anyway and that will clear it up.
> > > > > 
> > > > > Maybe you can just confirm that marking an osd down (say, ceph osd 
> > > > > down 
> > > > > 31) is also enough to remove the stray entry?
> > > > > 
> > > > 
> > > > I already tried a restart of the OSDs, but that didn't work. I marked 
> > > > osd 31, 160 and 138 as down for PG 4.862 but that didn't work:
> > > > 
> > > > pg_temp 4.862 [31,160,138,2]
> > > > 
> > > > But this works:
> > > > 
> > > > root@mon1:~# ceph osd dump|grep pg_temp
> > > > pg_temp 4.862 [31,160,138,2]
> > > > pg_temp 4.a83 [156,83,10,7]
> > > > pg_temp 4.e8e [164,78,10,8]
> > > > root@mon1:~# ceph osd pg-temp 4.862 31
> > > > set 4.862 pg_temp mapping to [31]
> > > > root@mon1:~# ceph osd dump|grep pg_temp
> > > > pg_temp 4.a83 [156,83,10,7]
> > > > pg_temp 4.e8e [164,78,10,8]
> > > > root@mon1:~#
> > > > 
> > > > So neither the restarts nor the marking down fixed the issue. Only the pg-temp 
> > > > trick.
> > > > 
> > > > Still have two PGs left which I can test with.
> > > 
> > > Hmm.  Did you leave the OSD down long enough for the PG to peer without 
> > > it?  Can you confirm that doesn't work?
> > > 
> > 
> > I stopped osd.31, waited for all PGs to re-peer, waited another minute or 
> > so and started it again, but that didn't work. The pg_temp wasn't resolved.
> > 
> > The whole cluster runs 0.94.9
> 
> Hrmpf.  Well, I guess that means a special case on upgrade would be 
> helpful.  Not convinced it's the most important thing though, given this 
> is probably a pretty rare case and can be fixed manually.  (OTOH, most 
> operators won't know that...)
> 

Yes, I think so. It's on the ML now so search engines can find it if needed!

Fixing the PGs now manually so that the MON stores can start to trim.
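
For anyone else hitting this: the remaining entries can be cleaned up in a
loop, something like the sketch below (it just re-applies the pg-temp trick
above, using the first OSD of each stale "pg_temp <pgid> [a,b,c]" line from
"ceph osd dump"):

ceph osd dump | awk '/^pg_temp/ {print $2, $3}' | tr -d '[]' | \
while read pgid osds; do
    primary=${osds%%,*}     # first OSD in the stale temp mapping
    echo "re-setting pg_temp for $pgid to $primary"
    ceph osd pg-temp "$pgid" "$primary"
done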

Wido

> sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Sage Weil
On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > > > I'm pretty sure this is a race condition that got cleaned up as part of 
> > > > https://github.com/ceph/ceph/pull/9078/commits.  The mon only checks 
> > > > the 
> > > > pg_temp entries that are getting set/changed, and since those are 
> > > > already 
> > > > in place it doesn't recheck them.  Any poke to the cluster that 
> > > > triggers 
> > > > peering ought to be enough to clear it up.  So, no need for logs, 
> > > > thanks!
> > > > 
> > > 
> > > Ok, just checking.
> > > 
> > > > We could add a special check during, say, upgrade, but generally the 
> > > > PGs 
> > > > will re-peer as the OSDs restart anyway and that will clear it up.
> > > > 
> > > > Maybe you can just confirm that marking an osd down (say, ceph osd down 
> > > > 31) is also enough to remove the stray entry?
> > > > 
> > > 
> > > I already tried a restart of the OSDs, but that didn't work. I marked osd 
> > > 31, 160 and 138 as down for PG 4.862 but that didn't work:
> > > 
> > > pg_temp 4.862 [31,160,138,2]
> > > 
> > > But this works:
> > > 
> > > root@mon1:~# ceph osd dump|grep pg_temp
> > > pg_temp 4.862 [31,160,138,2]
> > > pg_temp 4.a83 [156,83,10,7]
> > > pg_temp 4.e8e [164,78,10,8]
> > > root@mon1:~# ceph osd pg-temp 4.862 31
> > > set 4.862 pg_temp mapping to [31]
> > > root@mon1:~# ceph osd dump|grep pg_temp
> > > pg_temp 4.a83 [156,83,10,7]
> > > pg_temp 4.e8e [164,78,10,8]
> > > root@mon1:~#
> > > 
> > > So neither the restarts nor the marking down fixed the issue. Only the pg-temp 
> > > trick.
> > > 
> > > Still have two PGs left which I can test with.
> > 
> > Hmm.  Did you leave the OSD down long enough for the PG to peer without 
> > it?  Can you confirm that doesn't work?
> > 
> 
> I stopped osd.31, waited for all PGs to re-peer, waited another minute or so 
> and started it again, but that didn't work. The pg_temp wasn't resolved.
> 
> The whole cluster runs 0.94.9

Hrmpf.  Well, I guess that means a special case on upgrade would be 
helpful.  Not convinced it's the most important thing though, given this 
is probably a pretty rare case and can be fixed manually.  (OTOH, most 
operators won't know that...)

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Wido den Hollander

> Op 2 november 2016 om 16:00 schreef Sage Weil :
> 
> 
> On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > > Op 2 november 2016 om 15:06 schreef Sage Weil :
> > > 
> > > 
> > > On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > > > 
> > > > > Op 2 november 2016 om 14:30 schreef Sage Weil :
> > > > > 
> > > > > 
> > > > > On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > > > > > 
> > > > > > > Op 26 oktober 2016 om 11:18 schreef Wido den Hollander 
> > > > > > > :
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > Op 26 oktober 2016 om 10:44 schreef Sage Weil 
> > > > > > > > :
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Wed, 26 Oct 2016, Dan van der Ster wrote:
> > > > > > > > > On Tue, Oct 25, 2016 at 7:06 AM, Wido den Hollander 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > >> Op 24 oktober 2016 om 22:29 schreef Dan van der Ster 
> > > > > > > > > >> :
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> Hi Wido,
> > > > > > > > > >>
> > > > > > > > > >> This seems similar to what our dumpling tunables cluster 
> > > > > > > > > >> does when a few
> > > > > > > > > >> particular osds go down... Though in our case the remapped 
> > > > > > > > > >> pgs are
> > > > > > > > > >> correctly shown as remapped, not clean.
> > > > > > > > > >>
> > > > > > > > > >> The fix in our case will be to enable the vary_r tunable 
> > > > > > > > > >> (which will move
> > > > > > > > > >> some data).
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > > Ah, as I figured. I will probably apply the Firefly 
> > > > > > > > > > tunables here. This cluster was upgraded from Dumpling to 
> > > > > > > > > > Firefly and to Hammer recently and we didn't change the 
> > > > > > > > > > tunables yet.
> > > > > > > > > >
> > > > > > > > > > The MON stores are 35GB each right now and I think they are 
> > > > > > > > > > not trimming due to the pg_temp which still exists.
> > > > > > > > > >
> > > > > > > > > > I'll report back later, but this rebalance will take a lot 
> > > > > > > > > > of time.
> > > > > > > > > 
> > > > > > > > > I forgot to mention, a workaround for the vary_r issue is to 
> > > > > > > > > simply
> > > > > > > > > remove the down/out osd from the crush map. We just hit this 
> > > > > > > > > issue
> > > > > > > > > again last night on a failed osd and after removing it from 
> > > > > > > > > the crush
> > > > > > > > > map the last degraded PG started backfilling.
> > > > > > > > 
> > > > > > > > Also note that if you do enable vary_r, you can set it to a 
> > > > > > > > higher value 
> > > > > > > > (like 5) to get the benefit without moving as much existing 
> > > > > > > > data.  See the 
> > > > > > > > CRUSH tunable docs for more details!
> > > > > > > > 
> > > > > > > 
> > > > > > > Yes, thanks. So with the input here we have a few options and are 
> > > > > > > deciding which routes to take.
> > > > > > > 
> > > > > > > The cluster is rather old (hw as well), so we have to be careful 
> > > > > > > at this time. For the record, our options are:
> > > > > > > 
> > > > > > > - vary_r to 1: 73% misplaced
> > > > > > > - vary_r to 2 ~ 4: Looking into it
> > > > > > > - Removing dead OSDs from CRUSH
> > > > > > > 
> > > > > > > As the cluster is under some stress we have to do this in the 
> > > > > > > weekends, that makes it a bit difficult, but nothing we can't 
> > > > > > > overcome.
> > > > > > > 
> > > > > > > Thanks again for the input and I'll report on what we did later 
> > > > > > > on.
> > > > > > > 
> > > > > > 
> > > > > > So, what I did:
> > > > > > - Remove all dead OSDs from the CRUSHMap and OSDMap
> > > > > > - Set vary_r to 2
> > > > > > 
> > > > > > This resulted in:
> > > > > > 
> > > > > > osdmap e119647: 169 osds: 166 up, 166 in; 6 remapped pgs
> > > > > > 
> > > > > > pg_temp 4.39 [160,17,10,8]
> > > > > > pg_temp 4.2c9 [164,95,10,7]
> > > > > > pg_temp 4.816 [167,147,57,2]
> > > > > > pg_temp 4.862 [31,160,138,2]
> > > > > > pg_temp 4.a83 [156,83,10,7]
> > > > > > pg_temp 4.e8e [164,78,10,8]
> > > > > > 
> > > > > > In this case, osd 2 and 10 no longer exist, not in the OSDMap nor 
> > > > > > in the CRUSHMap.
> > > > > > 
> > > > > > root@mon1:~# ceph osd metadata 2
> > > > > > Error ENOENT: osd.2 does not exist
> > > > > > root@mon1:~# ceph osd metadata 10
> > > > > > Error ENOENT: osd.10 does not exist
> > > > > > root@mon1:~# ceph osd find 2
> > > > > > Error ENOENT: osd.2 does not exist
> > > > > > root@mon1:~# ceph osd find 10
> > > > > > Error ENOENT: osd.10 does not exist
> > > > > > root@mon1:~#
> > > > > > 
> > > > > > Looking at PG '4.39' for example, a query tells me:
> > > > > > 
> > > > > > "up": [
> > > > > > 160,
> > > > > > 17,
> > > > > > 8
> > > > > > ],
> > > > > > "acting": [
> > > > > > 160,
> > > > > > 17,
> > > > > > 

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Sage Weil
On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > Op 2 november 2016 om 15:06 schreef Sage Weil :
> > 
> > 
> > On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > > 
> > > > Op 2 november 2016 om 14:30 schreef Sage Weil :
> > > > 
> > > > 
> > > > On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > > > > 
> > > > > > Op 26 oktober 2016 om 11:18 schreef Wido den Hollander 
> > > > > > :
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > Op 26 oktober 2016 om 10:44 schreef Sage Weil :
> > > > > > > 
> > > > > > > 
> > > > > > > On Wed, 26 Oct 2016, Dan van der Ster wrote:
> > > > > > > > On Tue, Oct 25, 2016 at 7:06 AM, Wido den Hollander 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > >> Op 24 oktober 2016 om 22:29 schreef Dan van der Ster 
> > > > > > > > >> :
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Hi Wido,
> > > > > > > > >>
> > > > > > > > >> This seems similar to what our dumpling tunables cluster 
> > > > > > > > >> does when a few
> > > > > > > > >> particular osds go down... Though in our case the remapped 
> > > > > > > > >> pgs are
> > > > > > > > >> correctly shown as remapped, not clean.
> > > > > > > > >>
> > > > > > > > >> The fix in our case will be to enable the vary_r tunable 
> > > > > > > > >> (which will move
> > > > > > > > >> some data).
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > > Ah, as I figured. I will probably apply the Firefly tunables 
> > > > > > > > > here. This cluster was upgraded from Dumpling to Firefly and 
> > > > > > > > > to Hammer recently and we didn't change the tunables yet.
> > > > > > > > >
> > > > > > > > > The MON stores are 35GB each right now and I think they are 
> > > > > > > > > not trimming due to the pg_temp which still exists.
> > > > > > > > >
> > > > > > > > > I'll report back later, but this rebalance will take a lot of 
> > > > > > > > > time.
> > > > > > > > 
> > > > > > > > I forgot to mention, a workaround for the vary_r issue is to 
> > > > > > > > simply
> > > > > > > > remove the down/out osd from the crush map. We just hit this 
> > > > > > > > issue
> > > > > > > > again last night on a failed osd and after removing it from the 
> > > > > > > > crush
> > > > > > > > map the last degraded PG started backfilling.
> > > > > > > 
> > > > > > > Also note that if you do enable vary_r, you can set it to a 
> > > > > > > higher value 
> > > > > > > (like 5) to get the benefit without moving as much existing data. 
> > > > > > >  See the 
> > > > > > > CRUSH tunable docs for more details!
> > > > > > > 
> > > > > > 
> > > > > > Yes, thanks. So with the input here we have a few options and are 
> > > > > > deciding which routes to take.
> > > > > > 
> > > > > > The cluster is rather old (hw as well), so we have to be careful at 
> > > > > > this time. For the record, our options are:
> > > > > > 
> > > > > > - vary_r to 1: 73% misplaced
> > > > > > - vary_r to 2 ~ 4: Looking into it
> > > > > > - Removing dead OSDs from CRUSH
> > > > > > 
> > > > > > As the cluster is under some stress we have to do this in the 
> > > > > > weekends, that makes it a bit difficult, but nothing we can't 
> > > > > > overcome.
> > > > > > 
> > > > > > Thanks again for the input and I'll report on what we did later on.
> > > > > > 
> > > > > 
> > > > > So, what I did:
> > > > > - Remove all dead OSDs from the CRUSHMap and OSDMap
> > > > > - Set vary_r to 2
> > > > > 
> > > > > This resulted in:
> > > > > 
> > > > > osdmap e119647: 169 osds: 166 up, 166 in; 6 remapped pgs
> > > > > 
> > > > > pg_temp 4.39 [160,17,10,8]
> > > > > pg_temp 4.2c9 [164,95,10,7]
> > > > > pg_temp 4.816 [167,147,57,2]
> > > > > pg_temp 4.862 [31,160,138,2]
> > > > > pg_temp 4.a83 [156,83,10,7]
> > > > > pg_temp 4.e8e [164,78,10,8]
> > > > > 
> > > > > In this case, osd 2 and 10 no longer exist, not in the OSDMap nor in 
> > > > > the CRUSHMap.
> > > > > 
> > > > > root@mon1:~# ceph osd metadata 2
> > > > > Error ENOENT: osd.2 does not exist
> > > > > root@mon1:~# ceph osd metadata 10
> > > > > Error ENOENT: osd.10 does not exist
> > > > > root@mon1:~# ceph osd find 2
> > > > > Error ENOENT: osd.2 does not exist
> > > > > root@mon1:~# ceph osd find 10
> > > > > Error ENOENT: osd.10 does not exist
> > > > > root@mon1:~#
> > > > > 
> > > > > Looking at PG '4.39' for example, a query tells me:
> > > > > 
> > > > > "up": [
> > > > > 160,
> > > > > 17,
> > > > > 8
> > > > > ],
> > > > > "acting": [
> > > > > 160,
> > > > > 17,
> > > > > 8
> > > > > ],
> > > > > 
> > > > > So I really wonder where the pg_temp with osd.10 comes from.
> > > > 
> > > > Hmm.. are the others also the same like that?  You can manually poke 
> > > > it into adjusting pg-temp with
> > > > 
> > > >  ceph osd pg-temp  
> > > > 
> > > > That'll make peering reevaluate what pg_temp it wants (if any).  It 
> > > > might 
> > > 

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Wido den Hollander

> Op 2 november 2016 om 15:06 schreef Sage Weil :
> 
> 
> On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > 
> > > Op 2 november 2016 om 14:30 schreef Sage Weil :
> > > 
> > > 
> > > On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > > > 
> > > > > Op 26 oktober 2016 om 11:18 schreef Wido den Hollander 
> > > > > :
> > > > > 
> > > > > 
> > > > > 
> > > > > > Op 26 oktober 2016 om 10:44 schreef Sage Weil :
> > > > > > 
> > > > > > 
> > > > > > On Wed, 26 Oct 2016, Dan van der Ster wrote:
> > > > > > > On Tue, Oct 25, 2016 at 7:06 AM, Wido den Hollander 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > >> Op 24 oktober 2016 om 22:29 schreef Dan van der Ster 
> > > > > > > >> :
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Hi Wido,
> > > > > > > >>
> > > > > > > >> This seems similar to what our dumpling tunables cluster does 
> > > > > > > >> when a few
> > > > > > > >> particular osds go down... Though in our case the remapped pgs 
> > > > > > > >> are
> > > > > > > >> correctly shown as remapped, not clean.
> > > > > > > >>
> > > > > > > >> The fix in our case will be to enable the vary_r tunable 
> > > > > > > >> (which will move
> > > > > > > >> some data).
> > > > > > > >>
> > > > > > > >
> > > > > > > > Ah, as I figured. I will probably apply the Firefly tunables 
> > > > > > > > here. This cluster was upgraded from Dumpling to Firefly and to 
> > > > > > > > Hammer recently and we didn't change the tunables yet.
> > > > > > > >
> > > > > > > > The MON stores are 35GB each right now and I think they are not 
> > > > > > > > trimming due to the pg_temp which still exists.
> > > > > > > >
> > > > > > > > I'll report back later, but this rebalance will take a lot of 
> > > > > > > > time.
> > > > > > > 
> > > > > > > I forgot to mention, a workaround for the vary_r issue is to 
> > > > > > > simply
> > > > > > > remove the down/out osd from the crush map. We just hit this issue
> > > > > > > again last night on a failed osd and after removing it from the 
> > > > > > > crush
> > > > > > > map the last degraded PG started backfilling.
> > > > > > 
> > > > > > Also note that if you do enable vary_r, you can set it to a higher 
> > > > > > value 
> > > > > > (like 5) to get the benefit without moving as much existing data.  
> > > > > > See the 
> > > > > > CRUSH tunable docs for more details!
> > > > > > 
> > > > > 
> > > > > Yes, thanks. So with the input here we have a few options and are 
> > > > > deciding which routes to take.
> > > > > 
> > > > > The cluster is rather old (hw as well), so we have to be careful at 
> > > > > this time. For the record, our options are:
> > > > > 
> > > > > - vary_r to 1: 73% misplaced
> > > > > - vary_r to 2 ~ 4: Looking into it
> > > > > - Removing dead OSDs from CRUSH
> > > > > 
> > > > > As the cluster is under some stress we have to do this in the 
> > > > > weekends, that makes it a bit difficult, but nothing we can't 
> > > > > overcome.
> > > > > 
> > > > > Thanks again for the input and I'll report on what we did later on.
> > > > > 
> > > > 
> > > > So, what I did:
> > > > - Remove all dead OSDs from the CRUSHMap and OSDMap
> > > > - Set vary_r to 2
> > > > 
> > > > This resulted in:
> > > > 
> > > > osdmap e119647: 169 osds: 166 up, 166 in; 6 remapped pgs
> > > > 
> > > > pg_temp 4.39 [160,17,10,8]
> > > > pg_temp 4.2c9 [164,95,10,7]
> > > > pg_temp 4.816 [167,147,57,2]
> > > > pg_temp 4.862 [31,160,138,2]
> > > > pg_temp 4.a83 [156,83,10,7]
> > > > pg_temp 4.e8e [164,78,10,8]
> > > > 
> > > > In this case, osd 2 and 10 no longer exist, not in the OSDMap nor in 
> > > > the CRUSHMap.
> > > > 
> > > > root@mon1:~# ceph osd metadata 2
> > > > Error ENOENT: osd.2 does not exist
> > > > root@mon1:~# ceph osd metadata 10
> > > > Error ENOENT: osd.10 does not exist
> > > > root@mon1:~# ceph osd find 2
> > > > Error ENOENT: osd.2 does not exist
> > > > root@mon1:~# ceph osd find 10
> > > > Error ENOENT: osd.10 does not exist
> > > > root@mon1:~#
> > > > 
> > > > Looking at PG '4.39' for example, a query tells me:
> > > > 
> > > > "up": [
> > > > 160,
> > > > 17,
> > > > 8
> > > > ],
> > > > "acting": [
> > > > 160,
> > > > 17,
> > > > 8
> > > > ],
> > > > 
> > > > So I really wonder where the pg_temp with osd.10 comes from.
> > > 
> > > Hmm.. are the others also the same like that?  You can manually poke 
> > > it into adjusting pg-temp with
> > > 
> > >  ceph osd pg-temp  
> > > 
> > > That'll make peering reevaluate what pg_temp it wants (if any).  It might 
> > > be that it isn't noticing that pg_temp matches acting.. but the mon has 
> > > special code to remove those entries, so hrm.  Is this hammer?
> > > 
> > 
> > So yes, that worked. I did it for 3 PGs:
> > 
> > # ceph osd pg-temp 4.39 160
> > # ceph osd pg-temp 4.2c9 164
> > # ceph osd pg-temp 4.816 167
> > 
> > Now my pg_temp 

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Sage Weil
On Wed, 2 Nov 2016, Wido den Hollander wrote:
> 
> > Op 2 november 2016 om 14:30 schreef Sage Weil :
> > 
> > 
> > On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > > 
> > > > Op 26 oktober 2016 om 11:18 schreef Wido den Hollander :
> > > > 
> > > > 
> > > > 
> > > > > Op 26 oktober 2016 om 10:44 schreef Sage Weil :
> > > > > 
> > > > > 
> > > > > On Wed, 26 Oct 2016, Dan van der Ster wrote:
> > > > > > On Tue, Oct 25, 2016 at 7:06 AM, Wido den Hollander  
> > > > > > wrote:
> > > > > > >
> > > > > > >> Op 24 oktober 2016 om 22:29 schreef Dan van der Ster 
> > > > > > >> :
> > > > > > >>
> > > > > > >>
> > > > > > >> Hi Wido,
> > > > > > >>
> > > > > > >> This seems similar to what our dumpling tunables cluster does 
> > > > > > >> when a few
> > > > > > >> particular osds go down... Though in our case the remapped pgs 
> > > > > > >> are
> > > > > > >> correctly shown as remapped, not clean.
> > > > > > >>
> > > > > > >> The fix in our case will be to enable the vary_r tunable (which 
> > > > > > >> will move
> > > > > > >> some data).
> > > > > > >>
> > > > > > >
> > > > > > > Ah, as I figured. I will probably apply the Firefly tunables 
> > > > > > > here. This cluster was upgraded from Dumpling to Firefly and to 
> > > > > > > Hammer recently and we didn't change the tunables yet.
> > > > > > >
> > > > > > > The MON stores are 35GB each right now and I think they are not 
> > > > > > > trimming due to the pg_temp which still exists.
> > > > > > >
> > > > > > > I'll report back later, but this rebalance will take a lot of 
> > > > > > > time.
> > > > > > 
> > > > > > I forgot to mention, a workaround for the vary_r issue is to simply
> > > > > > remove the down/out osd from the crush map. We just hit this issue
> > > > > > again last night on a failed osd and after removing it from the 
> > > > > > crush
> > > > > > map the last degraded PG started backfilling.
> > > > > 
> > > > > Also note that if you do enable vary_r, you can set it to a higher 
> > > > > value 
> > > > > (like 5) to get the benefit without moving as much existing data.  
> > > > > See the 
> > > > > CRUSH tunable docs for more details!
> > > > > 
> > > > 
> > > > Yes, thanks. So with the input here we have a few options and are 
> > > > deciding which routes to take.
> > > > 
> > > > The cluster is rather old (hw as well), so we have to be careful at 
> > > > this time. For the record, our options are:
> > > > 
> > > > - vary_r to 1: 73% misplaced
> > > > - vary_r to 2 ~ 4: Looking into it
> > > > - Removing dead OSDs from CRUSH
> > > > 
> > > > As the cluster is under some stress we have to do this in the weekends, 
> > > > that makes it a bit difficult, but nothing we can't overcome.
> > > > 
> > > > Thanks again for the input and I'll report on what we did later on.
> > > > 
> > > 
> > > So, what I did:
> > > - Remove all dead OSDs from the CRUSHMap and OSDMap
> > > - Set vary_r to 2
> > > 
> > > This resulted in:
> > > 
> > > osdmap e119647: 169 osds: 166 up, 166 in; 6 remapped pgs
> > > 
> > > pg_temp 4.39 [160,17,10,8]
> > > pg_temp 4.2c9 [164,95,10,7]
> > > pg_temp 4.816 [167,147,57,2]
> > > pg_temp 4.862 [31,160,138,2]
> > > pg_temp 4.a83 [156,83,10,7]
> > > pg_temp 4.e8e [164,78,10,8]
> > > 
> > > In this case, osd 2 and 10 no longer exist, not in the OSDMap nor in the 
> > > CRUSHMap.
> > > 
> > > root@mon1:~# ceph osd metadata 2
> > > Error ENOENT: osd.2 does not exist
> > > root@mon1:~# ceph osd metadata 10
> > > Error ENOENT: osd.10 does not exist
> > > root@mon1:~# ceph osd find 2
> > > Error ENOENT: osd.2 does not exist
> > > root@mon1:~# ceph osd find 10
> > > Error ENOENT: osd.10 does not exist
> > > root@mon1:~#
> > > 
> > > Looking at PG '4.39' for example, a query tells me:
> > > 
> > > "up": [
> > > 160,
> > > 17,
> > > 8
> > > ],
> > > "acting": [
> > > 160,
> > > 17,
> > > 8
> > > ],
> > > 
> > > So I really wonder where the pg_temp with osd.10 comes from.
> > 
> > Hmm.. are the others also the same like that?  You can manually poke 
> > it into adjusting pg-temp with
> > 
> >  ceph osd pg-temp  
> > 
> > That'll make peering reevaluate what pg_temp it wants (if any).  It might 
> > be that it isn't noticing that pg_temp matches acting.. but the mon has 
> > special code to remove those entries, so hrm.  Is this hammer?
> > 
> 
> So yes, that worked. I did it for 3 PGs:
> 
> # ceph osd pg-temp 4.39 160
> # ceph osd pg-temp 4.2c9 164
> # ceph osd pg-temp 4.816 167
> 
> Now my pg_temp looks like:
> 
> pg_temp 4.862 [31,160,138,2]
> pg_temp 4.a83 [156,83,10,7]
> pg_temp 4.e8e [164,78,10,8]
> 
> There we see the osd.2 and osd.10 again. I'm not setting these yet since you 
> might want logs from the MONs or OSDs?
> 
> This is Hammer 0.94.9

I'm pretty sure this is a race condition that got cleaned up as part of 
https://github.com/ceph/ceph/pull/9078/commits.  The 

Re: [ceph-users] XFS no space left on device

2016-11-02 Thread igor.podo...@ts.fujitsu.com
Hey,

http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure/tmp/en-US/html/Allocation_Groups.html
 
"Each AG can be up to one terabyte in size (512 bytes * 2^31), regardless of 
the underlying device's sector size."
"The only global information maintained by the first AG (primary) is free space 
across the filesystem and total inode counts"

Check out this link: 
http://osvault.blogspot.de/2011/03/fixing-1tbyte-inode-problem-in-xfs-file.html 
- maybe your first AG (0) is full.
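
You can check how full each AG actually is with xfs_db in read-only mode,
for example (from memory - double-check the syntax against the xfs_db man
page, and preferably run it while the OSD is stopped):

# summary of free space extents, per allocation group:
xfs_db -r -c "freesp -s" /dev/mapper/disk23p1

# free space in AG 0 only (the one holding the global metadata):
xfs_db -r -c "freesp -a 0 -s" /dev/mapper/disk23p1

# or look at AG 0's header fields directly:
xfs_db -r -c "agf 0" -c "print freeblks longest" /dev/mapper/disk23p1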

Regards,
Igor.

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Василий Ангапов
> Sent: Tuesday, October 25, 2016 2:52 PM
> To: ceph-users
> Subject: Re: [ceph-users] XFS no space left on device
> 
> This is a bit more information about that XFS:
> 
> root@ed-ds-c178:[~]:$ xfs_info /dev/mapper/disk23p1
> meta-data=/dev/mapper/disk23p1   isize=2048   agcount=6,
> agsize=268435455 blks
>  =   sectsz=4096  attr=2, projid32bit=1
>  =   crc=0finobt=0
> data =   bsize=4096   blocks=1465130385, imaxpct=5
>  =   sunit=0  swidth=0 blks
> naming   =version 2  bsize=4096   ascii-ci=0 ftype=0
> log  =internal   bsize=4096   blocks=521728, version=2
>  =   sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none   extsz=4096   blocks=0, rtextents=0
> 
> root@ed-ds-c178:[~]:$ xfs_db /dev/mapper/disk23p1
> xfs_db> frag
> actual 25205642, ideal 22794438, fragmentation factor 9.57%
> 
> 2016-10-25 14:59 GMT+03:00 Василий Ангапов :
> > Actually all OSDs are already mounted with inode64 option. Otherwise I
> > could not write beyond 1TB.
> >
> > 2016-10-25 14:53 GMT+03:00 Ashley Merrick :
> >> Sounds like the 32-bit inode limit; if you mount with -o inode64 (not 100% sure how
> >> you would do that in ceph), it would allow data to continue to be written.
> >>
> >> ,Ashley
> >>
> >> -Original Message-
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Василий Ангапов
> >> Sent: 25 October 2016 12:38
> >> To: ceph-users 
> >> Subject: [ceph-users] XFS no space left on device
> >>
> >> Hello,
> >>
> >> I got Ceph 10.2.1 cluster with 10 nodes, each having 29 * 6TB OSDs.
> >> Yesterday I found that 3 OSDs were down and out with 89% space
> utilization.
> >> In logs there is:
> >> 2016-10-24 22:36:37.599253 7f8309c5e800  0 ceph version 10.2.1
> >> (3a66dd4f30852819c1bdaa8ec23c795d4ad77269), process ceph-osd, pid
> >> 2602081
> >> 2016-10-24 22:36:37.600129 7f8309c5e800  0 pidfile_write: ignore
> >> empty --pid-file
> >> 2016-10-24 22:36:37.635769 7f8309c5e800  0
> >> filestore(/var/lib/ceph/osd/ceph-123) backend xfs (magic 0x58465342)
> >> 2016-10-24 22:36:37.635805 7f8309c5e800 -1
> >> genericfilestorebackend(/var/lib/ceph/osd/ceph-123) detect_features:
> >> unable to create /var/lib/ceph/osd/ceph-123/fiemap_test: (28) No
> >> space left on device
> >> 2016-10-24 22:36:37.635814 7f8309c5e800 -1
> >> filestore(/var/lib/ceph/osd/ceph-123) _detect_fs: detect_features
> >> error: (28) No space left on device
> >> 2016-10-24 22:36:37.635818 7f8309c5e800 -1
> >> filestore(/var/lib/ceph/osd/ceph-123) FileStore::mount: error in
> >> _detect_fs: (28) No space left on device
> >> 2016-10-24 22:36:37.635824 7f8309c5e800 -1 osd.123 0 OSD:init: unable
> >> to mount object store
> >> 2016-10-24 22:36:37.635827 7f8309c5e800 -1 ESC[0;31m ** ERROR: osd
> >> init failed: (28) No space left on deviceESC[0m
> >>
> >> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ df -h
> /var/lib/ceph/osd/ceph-123
> >> FilesystemSize  Used Avail Use% Mounted on
> >> /dev/mapper/disk23p1  5.5T  4.9T  651G  89%
> >> /var/lib/ceph/osd/ceph-123
> >>
> >> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ df -i
> /var/lib/ceph/osd/ceph-123
> >> Filesystem  InodesIUsed IFree IUse% Mounted on
> >> /dev/mapper/disk23p1 146513024 22074752 124438272   16%
> >> /var/lib/ceph/osd/ceph-123
> >>
> >> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ touch 123
> >> touch: cannot touch ‘123’: No space left on device
> >>
> >> root@ed-ds-c178:[/var/lib/ceph/osd/ceph-123]:$ grep ceph-123
> >> /proc/mounts
> >> /dev/mapper/disk23p1 /var/lib/ceph/osd/ceph-123 xfs
> >> rw,noatime,attr2,inode64,noquota 0 0
> >>
> >> The same situation is for all three down OSDs. OSD can be unmounted
> and mounted without problem:
> >> root@ed-ds-c178:[~]:$ umount /var/lib/ceph/osd/ceph-123
> >> root@ed-ds-c178:[~]:$ root@ed-ds-c178:[~]:$ mount
> >> /var/lib/ceph/osd/ceph-123 root@ed-ds-c178:[~]:$ touch
> >> /var/lib/ceph/osd/ceph-123/123
> >> touch: cannot touch ‘/var/lib/ceph/osd/ceph-123/123’: No space left
> >> on device
> >>
> >> xfs_repair gives no error for FS.
> >>
> >> Kernel is
> >> root@ed-ds-c178:[~]:$ uname -r
> >> 4.7.0-1.el7.wg.x86_64
> >>
> >> What else can 

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Wido den Hollander

> Op 2 november 2016 om 14:30 schreef Sage Weil :
> 
> 
> On Wed, 2 Nov 2016, Wido den Hollander wrote:
> > 
> > > Op 26 oktober 2016 om 11:18 schreef Wido den Hollander :
> > > 
> > > 
> > > 
> > > > Op 26 oktober 2016 om 10:44 schreef Sage Weil :
> > > > 
> > > > 
> > > > On Wed, 26 Oct 2016, Dan van der Ster wrote:
> > > > > On Tue, Oct 25, 2016 at 7:06 AM, Wido den Hollander  
> > > > > wrote:
> > > > > >
> > > > > >> Op 24 oktober 2016 om 22:29 schreef Dan van der Ster 
> > > > > >> :
> > > > > >>
> > > > > >>
> > > > > >> Hi Wido,
> > > > > >>
> > > > > >> This seems similar to what our dumpling tunables cluster does when 
> > > > > >> a few
> > > > > >> particular osds go down... Though in our case the remapped pgs are
> > > > > >> correctly shown as remapped, not clean.
> > > > > >>
> > > > > >> The fix in our case will be to enable the vary_r tunable (which 
> > > > > >> will move
> > > > > >> some data).
> > > > > >>
> > > > > >
> > > > > > Ah, as I figured. I will probably apply the Firefly tunables here. 
> > > > > > This cluster was upgraded from Dumpling to Firefly and to Hammer 
> > > > > > recently and we didn't change the tunables yet.
> > > > > >
> > > > > > The MON stores are 35GB each right now and I think they are not 
> > > > > > trimming due to the pg_temp which still exists.
> > > > > >
> > > > > > I'll report back later, but this rebalance will take a lot of time.
> > > > > 
> > > > > I forgot to mention, a workaround for the vary_r issue is to simply
> > > > > remove the down/out osd from the crush map. We just hit this issue
> > > > > again last night on a failed osd and after removing it from the crush
> > > > > map the last degraded PG started backfilling.
> > > > 
> > > > Also note that if you do enable vary_r, you can set it to a higher 
> > > > value 
> > > > (like 5) to get the benefit without moving as much existing data.  See 
> > > > the 
> > > > CRUSH tunable docs for more details!
> > > > 
> > > 
> > > Yes, thanks. So with the input here we have a few options and are 
> > > deciding which routes to take.
> > > 
> > > The cluster is rather old (hw as well), so we have to be careful at this 
> > > time. For the record, our options are:
> > > 
> > > - vary_r to 1: 73% misplaced
> > > - vary_r to 2 ~ 4: Looking into it
> > > - Removing dead OSDs from CRUSH
> > > 
> > > As the cluster is under some stress we have to do this in the weekends, 
> > > that makes it a bit difficult, but nothing we can't overcome.
> > > 
> > > Thanks again for the input and I'll report on what we did later on.
> > > 
> > 
> > So, what I did:
> > - Remove all dead OSDs from the CRUSHMap and OSDMap
> > - Set vary_r to 2
> > 
> > This resulted in:
> > 
> > osdmap e119647: 169 osds: 166 up, 166 in; 6 remapped pgs
> > 
> > pg_temp 4.39 [160,17,10,8]
> > pg_temp 4.2c9 [164,95,10,7]
> > pg_temp 4.816 [167,147,57,2]
> > pg_temp 4.862 [31,160,138,2]
> > pg_temp 4.a83 [156,83,10,7]
> > pg_temp 4.e8e [164,78,10,8]
> > 
> > In this case, osd 2 and 10 no longer exist, not in the OSDMap nor in the 
> > CRUSHMap.
> > 
> > root@mon1:~# ceph osd metadata 2
> > Error ENOENT: osd.2 does not exist
> > root@mon1:~# ceph osd metadata 10
> > Error ENOENT: osd.10 does not exist
> > root@mon1:~# ceph osd find 2
> > Error ENOENT: osd.2 does not exist
> > root@mon1:~# ceph osd find 10
> > Error ENOENT: osd.10 does not exist
> > root@mon1:~#
> > 
> > Looking at PG '4.39' for example, a query tells me:
> > 
> > "up": [
> > 160,
> > 17,
> > 8
> > ],
> > "acting": [
> > 160,
> > 17,
> > 8
> > ],
> > 
> > So I really wonder where the pg_temp with osd.10 comes from.
> 
> Hmm.. are the others also the same like that?  You can manually poke 
> it into adjusting pg-temp with
> 
>  ceph osd pg-temp  
> 
> That'll make peering reevaluate what pg_temp it wants (if any).  It might 
> be that it isn't noticing that pg_temp matches acting.. but the mon has 
> special code to remove those entries, so hrm.  Is this hammer?
> 

So yes, that worked. I did it for 3 PGs:

# ceph osd pg-temp 4.39 160
# ceph osd pg-temp 4.2c9 164
# ceph osd pg-temp 4.816 167

Now my pg_temp looks like:

pg_temp 4.862 [31,160,138,2]
pg_temp 4.a83 [156,83,10,7]
pg_temp 4.e8e [164,78,10,8]

There we see the osd.2 and osd.10 again. I'm not setting these yet since you 
might want logs from the MONs or OSDs?

This is Hammer 0.94.9

> > Setting vary_r to 1 will result in a 76% degraded state for the cluster 
> > and I'm trying to avoid that (for now).
> > 
> > I restarted the Primary OSDs for all the affected PGs, but that didn't 
> > help either.
> > 
> > Any bright ideas on how to fix this?
> 
> This part seems unrelated to vary_r... you shouldn't have to 
> reduce it further!
> 

Indeed, like you said, the pg_temp fixed it for 3 PGs already. Holding off with 
the rest in case you want logs or debug it further.


Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Sage Weil
On Wed, 2 Nov 2016, Wido den Hollander wrote:
> 
> > Op 26 oktober 2016 om 11:18 schreef Wido den Hollander :
> > 
> > 
> > 
> > > Op 26 oktober 2016 om 10:44 schreef Sage Weil :
> > > 
> > > 
> > > On Wed, 26 Oct 2016, Dan van der Ster wrote:
> > > > On Tue, Oct 25, 2016 at 7:06 AM, Wido den Hollander  
> > > > wrote:
> > > > >
> > > > >> Op 24 oktober 2016 om 22:29 schreef Dan van der Ster 
> > > > >> :
> > > > >>
> > > > >>
> > > > >> Hi Wido,
> > > > >>
> > > > >> This seems similar to what our dumpling tunables cluster does when a 
> > > > >> few
> > > > >> particular osds go down... Though in our case the remapped pgs are
> > > > >> correctly shown as remapped, not clean.
> > > > >>
> > > > >> The fix in our case will be to enable the vary_r tunable (which will 
> > > > >> move
> > > > >> some data).
> > > > >>
> > > > >
> > > > > Ah, as I figured. I will probably apply the Firefly tunables here. 
> > > > > This cluster was upgraded from Dumpling to Firefly and to Hammer 
> > > > > recently and we didn't change the tunables yet.
> > > > >
> > > > > The MON stores are 35GB each right now and I think they are not 
> > > > > trimming due to the pg_temp which still exists.
> > > > >
> > > > > I'll report back later, but this rebalance will take a lot of time.
> > > > 
> > > > I forgot to mention, a workaround for the vary_r issue is to simply
> > > > remove the down/out osd from the crush map. We just hit this issue
> > > > again last night on a failed osd and after removing it from the crush
> > > > map the last degraded PG started backfilling.
> > > 
> > > Also note that if you do enable vary_r, you can set it to a higher value 
> > > (like 5) to get the benefit without moving as much existing data.  See 
> > > the 
> > > CRUSH tunable docs for more details!
> > > 
> > 
> > Yes, thanks. So with the input here we have a few options and are deciding 
> > which routes to take.
> > 
> > The cluster is rather old (hw as well), so we have to be careful at this 
> > time. For the record, our options are:
> > 
> > - vary_r to 1: 73% misplaced
> > - vary_r to 2 ~ 4: Looking into it
> > - Removing dead OSDs from CRUSH
> > 
> > As the cluster is under some stress we have to do this in the weekends, 
> > that makes it a bit difficult, but nothing we can't overcome.
> > 
> > Thanks again for the input and I'll report on what we did later on.
> > 
> 
> So, what I did:
> - Remove all dead OSDs from the CRUSHMap and OSDMap
> - Set vary_r to 2
> 
> This resulted in:
> 
> osdmap e119647: 169 osds: 166 up, 166 in; 6 remapped pgs
> 
> pg_temp 4.39 [160,17,10,8]
> pg_temp 4.2c9 [164,95,10,7]
> pg_temp 4.816 [167,147,57,2]
> pg_temp 4.862 [31,160,138,2]
> pg_temp 4.a83 [156,83,10,7]
> pg_temp 4.e8e [164,78,10,8]
> 
> In this case, osd 2 and 10 no longer exist, not in the OSDMap nor in the 
> CRUSHMap.
> 
> root@mon1:~# ceph osd metadata 2
> Error ENOENT: osd.2 does not exist
> root@mon1:~# ceph osd metadata 10
> Error ENOENT: osd.10 does not exist
> root@mon1:~# ceph osd find 2
> Error ENOENT: osd.2 does not exist
> root@mon1:~# ceph osd find 10
> Error ENOENT: osd.10 does not exist
> root@mon1:~#
> 
> Looking at PG '4.39' for example, a query tells me:
> 
> "up": [
> 160,
> 17,
> 8
> ],
> "acting": [
> 160,
> 17,
> 8
> ],
> 
> So I really wonder where the pg_temp with osd.10 comes from.

Hmm.. are the others also the same like that?  You can manually poke 
it into adjusting pg-temp with

 ceph osd pg-temp  

That'll make peering reevaluate what pg_temp it wants (if any).  It might 
be that it isn't noticing that pg_temp matches acting.. but the mon has 
special code to remove those entries, so hrm.  Is this hammer?

> Setting vary_r to 1 will result in a 76% degraded state for the cluster 
> and I'm trying to avoid that (for now).
> 
> I restarted the Primary OSDs for all the affected PGs, but that didn't 
> help either.
> 
> Any bright ideas on how to fix this?

This part seems unrelated to vary_r... you shouldn't have to 
reduce it further!

sage


> 
> Wido
> 
> > Wido 
> > 
> > > sage
> > > 
> > > 
> > > > 
> > > > Cheers, Dan
> > > > 
> > > > 
> > > > >
> > > > > Wido
> > > > >
> > > > >> Cheers, Dan
> > > > >>
> > > > >> On 24 Oct 2016 22:19, "Wido den Hollander"  wrote:
> > > > >> >
> > > > >> > Hi,
> > > > >> >
> > > > >> > On a cluster running Hammer 0.94.9 (upgraded from Firefly) I have 
> > > > >> > 29
> > > > >> remapped PGs according to the OSDMap, but all PGs are active+clean.
> > > > >> >
> > > > >> > osdmap e111208: 171 osds: 166 up, 166 in; 29 remapped pgs
> > > > >> >
> > > > >> > pgmap v101069070: 6144 pgs, 2 pools, 90122 GB data, 22787 kobjects
> > > > >> > 264 TB used, 184 TB / 448 TB avail
> > > > >> > 6144 active+clean
> > > > >> >
> > > > >> > The OSDMap shows:
> > > > >> >
> > > > >> > root@mon1:~# ceph osd dump|grep pg_temp
> > > > >> > 

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Wido den Hollander

> On 26 October 2016 at 11:18, Wido den Hollander wrote:
> 
> 
> 
> > On 26 October 2016 at 10:44, Sage Weil wrote:
> > 
> > 
> > On Wed, 26 Oct 2016, Dan van der Ster wrote:
> > > On Tue, Oct 25, 2016 at 7:06 AM, Wido den Hollander  wrote:
> > > >
> > > >> On 24 October 2016 at 22:29, Dan van der Ster wrote:
> > > >>
> > > >>
> > > >> Hi Wido,
> > > >>
> > > >> This seems similar to what our dumpling tunables cluster does when a 
> > > >> few
> > > >> particular osds go down... Though in our case the remapped pgs are
> > > >> correctly shown as remapped, not clean.
> > > >>
> > > >> The fix in our case will be to enable the vary_r tunable (which will 
> > > >> move
> > > >> some data).
> > > >>
> > > >
> > > > Ah, as I figured. I will probably apply the Firefly tunables here. This 
> > > > cluster was upgraded from Dumpling to Firefly and to Hammer recently and 
> > > > we didn't change the tunables yet.
> > > >
> > > > The MON stores are 35GB each right now and I think they are not 
> > > > trimming due to the pg_temp which still exists.
> > > >
> > > > I'll report back later, but this rebalance will take a lot of time.
> > > 
> > > I forgot to mention, a workaround for the vary_r issue is to simply
> > > remove the down/out osd from the crush map. We just hit this issue
> > > again last night on a failed osd and after removing it from the crush
> > > map the last degraded PG started backfilling.
> > 
> > Also note that if you do enable vary_r, you can set it to a higher value 
> > (like 5) to get the benefit without moving as much existing data.  See the 
> > CRUSH tunable docs for more details!
> > 
> 
> Yes, thanks. So with the input here we have a few options and are deciding 
> which routes to take.
> 
> The cluster is rather old (hw as well), so we have to be careful at this 
> time. For the record, our options are:
> 
> - vary_r to 1: 73% misplaced
> - vary_r to 2 ~ 4: Looking into it
> - Removing dead OSDs from CRUSH
> 
> As the cluster is under some stress we have to do this on weekends, which 
> makes it a bit difficult, but nothing we can't overcome.
> 
> Thanks again for the input and I'll report on what we did later on.
> 

So, what I did:
- Remove all dead OSDs from the CRUSHMap and OSDMap
- Set vary_r to 2
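
A rough sketch of the commands those two steps involve (assuming osd.2 as the
example dead OSD; vary_r goes in via a decompile/edit/recompile of the CRUSH map):

 ceph osd crush remove osd.2      # drop the dead OSD from the CRUSH map
 ceph auth del osd.2              # remove its cephx key
 ceph osd rm 2                    # remove it from the OSDMap
 ceph osd getcrushmap -o crush.bin
 crushtool -d crush.bin -o crush.txt
 # edit crush.txt and set: tunable chooseleaf_vary_r 2
 crushtool -c crush.txt -o crush.new
 ceph osd setcrushmap -i crush.new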

This resulted in:

osdmap e119647: 169 osds: 166 up, 166 in; 6 remapped pgs

pg_temp 4.39 [160,17,10,8]
pg_temp 4.2c9 [164,95,10,7]
pg_temp 4.816 [167,147,57,2]
pg_temp 4.862 [31,160,138,2]
pg_temp 4.a83 [156,83,10,7]
pg_temp 4.e8e [164,78,10,8]

In this case, osd 2 and 10 no longer exist, not in the OSDMap nor in the 
CRUSHMap.

root@mon1:~# ceph osd metadata 2
Error ENOENT: osd.2 does not exist
root@mon1:~# ceph osd metadata 10
Error ENOENT: osd.10 does not exist
root@mon1:~# ceph osd find 2
Error ENOENT: osd.2 does not exist
root@mon1:~# ceph osd find 10
Error ENOENT: osd.10 does not exist
root@mon1:~#

Looking at PG '4.39' for example, a query tells me:

"up": [
160,
17,
8
],
"acting": [
160,
17,
8
],

So I really wonder where the pg_temp with osd.10 comes from.
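
A quick way to double-check is to compare each lingering pg_temp entry against
what the PG itself reports; a sketch, assuming jq is available:

 for pg in 4.39 4.2c9 4.816 4.862 4.a83 4.e8e; do
   echo "== $pg =="
   ceph pg $pg query | jq -c '{up: .up, acting: .acting}'   # current up/acting sets
 done
 ceph osd dump | grep pg_temp                               # the temp mappings the mon still holds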

Setting vary_r to 1 will result in a 76% degraded state for the cluster and I'm 
trying to avoid that (for now).

I restarted the Primary OSDs for all the affected PGs, but that didn't help 
either.

Any bright ideas on how to fix this?

Wido

> Wido 
> 
> > sage
> > 
> > 
> > > 
> > > Cheers, Dan
> > > 
> > > 
> > > >
> > > > Wido
> > > >
> > > >> Cheers, Dan
> > > >>
> > > >> On 24 Oct 2016 22:19, "Wido den Hollander"  wrote:
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> > On a cluster running Hammer 0.94.9 (upgraded from Firefly) I have 29
> > > >> remapped PGs according to the OSDMap, but all PGs are active+clean.
> > > >> >
> > > >> > osdmap e111208: 171 osds: 166 up, 166 in; 29 remapped pgs
> > > >> >
> > > >> > pgmap v101069070: 6144 pgs, 2 pools, 90122 GB data, 22787 kobjects
> > > >> > 264 TB used, 184 TB / 448 TB avail
> > > >> > 6144 active+clean
> > > >> >
> > > >> > The OSDMap shows:
> > > >> >
> > > >> > root@mon1:~# ceph osd dump|grep pg_temp
> > > >> > pg_temp 4.39 [160,17,10,8]
> > > >> > pg_temp 4.52 [161,16,10,11]
> > > >> > pg_temp 4.8b [166,29,10,7]
> > > >> > pg_temp 4.b1 [5,162,148,2]
> > > >> > pg_temp 4.168 [95,59,6,2]
> > > >> > pg_temp 4.1ef [22,162,10,5]
> > > >> > pg_temp 4.2c9 [164,95,10,7]
> > > >> > pg_temp 4.330 [165,154,10,8]
> > > >> > pg_temp 4.353 [2,33,18,54]
> > > >> > pg_temp 4.3f8 [88,67,10,7]
> > > >> > pg_temp 4.41a [30,59,10,5]
> > > >> > pg_temp 4.45f [47,156,21,2]
> > > >> > pg_temp 4.486 [138,43,10,7]
> > > >> > pg_temp 4.674 [59,18,7,2]
> > > >> > pg_temp 4.7b8 [164,68,10,11]
> > > >> > pg_temp 4.816 [167,147,57,2]
> > > >> > pg_temp 4.829 [82,45,10,11]
> > > >> > pg_temp 4.843 [141,34,10,6]
> > > >> > pg_temp 4.862 [31,160,138,2]
> > > >> > pg_temp 4.868 [78,67,10,5]
> > > >> > pg_temp 4.9ca 

[ceph-users] PGs stuck at creating forever

2016-11-02 Thread Vlad Blando
I have a 3-node Giant cluster with 8 OSDs each. During the installation I
had to redo the cluster, but it looks like the old info is still in the crush map
(based on my readings). How do I fix this?

[root@avatar0-ceph1 ~]# ceph -s
cluster 2f0d1928-2ee5-4731-a259-64c0dc16110a
 health HEALTH_WARN 139 pgs stuck inactive; 139 pgs stuck unclean; 2
requests are blocked > 32 sec; pool rbd pg_num 300 > pgp_num 64
 monmap e1: 3 mons at {avatar0-ceph0=
172.40.40.100:6789/0,avatar0-ceph1=172.40.40.101:6789/0,avatar0-ceph2=172.40.40.102:6789/0},
election epoch 56, quorum 0,1,2 avatar0-ceph0,avatar0-ceph1,avatar0-ceph2
 osdmap e557: 24 osds: 24 up, 24 in
  pgmap v1807359: 1500 pgs, 5 pools, 358 GB data, 48728 objects
737 GB used, 88630 GB / 89368 GB avail
 139 creating
1361 active+clean
  client io 44391 B/s wr, 19 op/s
[root@avatar0-ceph1 ~]#
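
For reference, a sketch of the usual first steps for this kind of state (the
pool name comes from the status above, 0.f4 is one of the stuck PGs from the
health detail below; only force-create once you are sure the crush map no
longer references the removed cluster's OSDs):

 ceph osd pool set rbd pgp_num 300    # bring pgp_num in line with pg_num to clear that warning
 ceph osd getcrushmap -o crush.bin && crushtool -d crush.bin -o crush.txt   # inspect for leftover hosts/devices
 ceph pg force_create_pg 0.f4         # re-issue creation for a stuck pg (repeat per stuck pg)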


[root@avatar0-ceph0 current]# ceph health detail
HEALTH_WARN 139 pgs stuck inactive; 139 pgs stuck unclean; 2 requests are
blocked > 32 sec; 2 osds have slow requests; pool rbd pg_num 300 > pgp_num
64
pg 0.f4 is stuck inactive since forever, current state creating, last
acting [7,9]
pg 0.f2 is stuck inactive since forever, current state creating, last
acting [16,14]
pg 0.ef is stuck inactive since forever, current state creating, last
acting [13,0]
pg 0.ee is stuck inactive since forever, current state creating, last
acting [19,0]
pg 0.ec is stuck inactive since forever, current state creating, last
acting [12,20]
pg 0.ea is stuck inactive since forever, current state creating, last
acting [11,3]
pg 0.e9 is stuck inactive since forever, current state creating, last
acting [9,21]
pg 0.e3 is stuck inactive since forever, current state creating, last
acting [3,8]
pg 0.e2 is stuck inactive since forever, current state creating, last
acting [8,5]
pg 0.e0 is stuck inactive since forever, current state creating, last
acting [7,18]
pg 0.dd is stuck inactive since forever, current state creating, last
acting [5,17]
pg 0.dc is stuck inactive since forever, current state creating, last
acting [3,13]
pg 0.db is stuck inactive since forever, current state creating, last
acting [4,10]
pg 0.da is stuck inactive since forever, current state creating, last
acting [6,18]
pg 0.d9 is stuck inactive since forever, current state creating, last
acting [23,12]
pg 0.d5 is stuck inactive since forever, current state creating, last
acting [22,11]
pg 0.d3 is stuck inactive since forever, current state creating, last
acting [3,15]
pg 0.d2 is stuck inactive since forever, current state creating, last
acting [1,9]
pg 0.d1 is stuck inactive since forever, current state creating, last
acting [0,13]
pg 0.ce is stuck inactive since forever, current state creating, last
acting [19,7]
pg 0.cd is stuck inactive since forever, current state creating, last
acting [13,3]
pg 0.cc is stuck inactive since forever, current state creating, last
acting [0,15]
pg 0.cb is stuck inactive since forever, current state creating, last
acting [17,3]
pg 0.ca is stuck inactive since forever, current state creating, last
acting [15,7]
pg 0.c9 is stuck inactive since forever, current state creating, last
acting [18,6]
pg 0.c8 is stuck inactive since forever, current state creating, last
acting [3,10]
pg 0.c5 is stuck inactive since forever, current state creating, last
acting [10,22]
pg 0.c4 is stuck inactive since forever, current state creating, last
acting [0,12]
pg 0.c1 is stuck inactive since forever, current state creating, last
acting [2,13]
pg 0.c0 is stuck inactive since forever, current state creating, last
acting [17,4]
pg 0.bf is stuck inactive since forever, current state creating, last
acting [18,12]
pg 0.be is stuck inactive since forever, current state creating, last
acting [13,21]
pg 0.bd is stuck inactive since forever, current state creating, last
acting [23,14]
pg 0.bb is stuck inactive since forever, current state creating, last
acting [23,8]
pg 0.ba is stuck inactive since forever, current state creating, last
acting [17,9]
pg 0.b9 is stuck inactive since forever, current state creating, last
acting [0,16]
pg 0.b7 is stuck inactive since forever, current state creating, last
acting [1,21]
pg 0.b6 is stuck inactive since forever, current state creating, last
acting [0,8]
pg 0.b4 is stuck inactive since forever, current state creating, last
acting [7,9]
pg 0.b2 is stuck inactive since forever, current state creating, last
acting [16,14]
pg 0.af is stuck inactive since forever, current state creating, last
acting [13,0]
pg 0.ae is stuck inactive since forever, current state creating, last
acting [19,0]
pg 0.ac is stuck inactive since forever, current state creating, last
acting [12,20]
pg 0.aa is stuck inactive since forever, current state creating, last
acting [11,3]
pg 0.a9 is stuck inactive since forever, current state creating, last
acting [9,21]
pg 0.a3 is stuck inactive since forever, current state creating, last
acting [3,8]
pg 0.a2 is stuck inactive 

[ceph-users] Question about PG class

2016-11-02 Thread xxhdx1985126
Hi, everyone.


What are the meanings of the fields actingbackfill, want_acting and
backfill_targets of the PG class?
Thank you:-)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com