Re: [ceph-users] How dead is my ec pool?

2017-10-13 Thread Brady Deetz
At this point, before I go any further, I'm copying my pools to new pools
so that I can attempt manual rados operations.

My current thinking is that I could compare all objects in the cache tier
against the ec pool. If an object doesn't exist in the ec pool, copy it over.
If an object exists in both pools and the copies differ, replace the ec pool's
object with the cache tier's object.
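
A rough sketch of that comparison loop (pool names as above; this assumes plain
rados get/put is sufficient, i.e. it ignores snapshots, omap and xattrs, which
do matter for RBD objects, so treat it as an outline rather than a finished
repair tool):

CACHE=data_cache
EC=data_ec

rados -p "$CACHE" ls | while read -r obj; do
    rados -p "$CACHE" get "$obj" /tmp/cache_obj
    if ! rados -p "$EC" stat "$obj" >/dev/null 2>&1; then
        # object missing from the ec pool: copy it over
        rados -p "$EC" put "$obj" /tmp/cache_obj
    else
        rados -p "$EC" get "$obj" /tmp/ec_obj
        if ! cmp -s /tmp/cache_obj /tmp/ec_obj; then
            # objects differ: prefer the cache tier's copy
            rados -p "$EC" put "$obj" /tmp/cache_obj
        fi
    fi
done
rm -f /tmp/cache_obj /tmp/ec_obj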

thoughts?

On Fri, Oct 13, 2017 at 10:13 PM, Brady Deetz  wrote:

> TLDR; In Jewel, I briefly had 2 cache tiers assigned to an ec pool and I
> think that broke my ec pool. I then made a series of decisions attempting
> to repair that mistake. I now think I've caused further issues.
>
> Background:
>
> Following having some serious I/O issues with my ec pool's cache tier, I
> decided I wanted to use a cache tier hosted on a different set of disks
> than my current tier.
>
> My first potentially poor decision was not removing the original cache
> tier before adding the new one.
>
> Basically, the workflow was as follows:
>
> pools:
> data_ec
> data_cache
> data_new_cache
>
> ceph osd tier add data_ec data_new_cache
> ceph osd tier cache-mode data_new_cache writeback
>
> ceph osd tier set-overlay data_ec data_new_cache
> ceph osd pool set data_new_cache hit_set_type bloom
> ceph osd pool set data_new_cache hit_set_count 1
> ceph osd pool set data_new_cache hit_set_period 3600
> ceph osd pool set data_new_cache target_max_bytes 1
> ceph osd pool set data_new_cache min_read_recency_for_promote 1
> ceph osd pool set data_new_cache min_write_recency_for_promote 1
>
> #so now I decided to attempt to remove the old cache
> ceph osd tier cache-mode data_cache forward
>
> #here is where things got bad
> rados -p data_cache cache-flush-evict-all
>
> #every object rados attempted to flush from the cache left errors of the
> following varieties
> #
> rados -p data_cache cache-flush-evict-all
> rbd_data.af81e6238e1f29.0001732e
> error listing snap shots /rbd_data.af81e6238e1f29.0001732e: (2)
> No such file or directory
> rbd_data.af81e6238e1f29.000143bb
> error listing snap shots /rbd_data.af81e6238e1f29.000143bb: (2)
> No such file or directory
> rbd_data.af81e6238e1f29.000cf89d
> failed to flush /rbd_data.af81e6238e1f29.000cf89d: (2) No such
> file or directory
> rbd_data.af81e6238e1f29.000cf82c
>
>
>
> #Following these errors, I thought maybe the world would become happy
> again if I just removed the newly added cache tier.
>
> ceph osd tier cache-mode data_new_cache forward
> rados -p data_new_cache cache-flush-evict-all
>
> #when running the evict against the new tier, I received no errors
> #and so begins potential mistake number 3
>
> ceph osd tier remove-overlay ec_data
> ceph osd tier remove data_ec data_new_cache
>
> #I received the same errors while trying to evict
>
> #knowing my data had been untouched for over an hour, I made a terrible
> decision
> ceph osd tier remove data_ec data_cache
>
> #I then discovered that I couldn't add the new or the old cache back to
> the ec pool, even with --force-nonempty
>
> ceph osd tier add data_ec data_cache --force-nonempty
> Error ENOTEMPTY: tier pool 'data_cache' has snapshot state; it cannot be
> added as a tier without breaking the pool
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How dead is my ec pool?

2017-10-13 Thread Brady Deetz
TLDR; In Jewel, I briefly had 2 cache tiers assigned to an ec pool and I
think that broke my ec pool. I then made a series of decisions attempting
to repair that mistake. I now think I've caused further issues.

Background:

Following having some serious I/O issues with my ec pool's cache tier, I
decided I wanted to use a cache tier hosted on a different set of disks
than my current tier.

My first potentially poor decision was not removing the original cache tier
before adding the new one.

Basically, the workflow was as follows:

pools:
data_ec
data_cache
data_new_cache

ceph osd tier add data_ec data_new_cache
ceph osd tier cache-mode data_new_cache writeback

ceph osd tier set-overlay data_ec data_new_cache
ceph osd pool set data_new_cache hit_set_type bloom
ceph osd pool set data_new_cache hit_set_count 1
ceph osd pool set data_new_cache hit_set_period 3600
ceph osd pool set data_new_cache target_max_bytes 1
ceph osd pool set data_new_cache min_read_recency_for_promote 1
ceph osd pool set data_new_cache min_write_recency_for_promote 1

#so now I decided to attempt to remove the old cache
ceph osd tier cache-mode data_cache forward

#here is where things got bad
rados -p data_cache cache-flush-evict-all

#every object rados attempted to flush from the cache left errors of the
following varieties
#
rados -p data_cache cache-flush-evict-all
rbd_data.af81e6238e1f29.0001732e
error listing snap shots /rbd_data.af81e6238e1f29.0001732e: (2) No
such file or directory
rbd_data.af81e6238e1f29.000143bb
error listing snap shots /rbd_data.af81e6238e1f29.000143bb: (2) No
such file or directory
rbd_data.af81e6238e1f29.000cf89d
failed to flush /rbd_data.af81e6238e1f29.000cf89d: (2) No such file
or directory
rbd_data.af81e6238e1f29.000cf82c



#Following these errors, I thought maybe the world would become happy again
if I just removed the newly added cache tier.

ceph osd tier cache-mode data_new_cache forward
rados -p data_new_cache cache-flush-evict-all

#when running the evict against the new tier, I received no errors
#and so begins potential mistake number 3

ceph osd tier remove-overlay ec_data
ceph osd tier remove data_ec data_new_cache

#I received the same errors while trying to evict

#knowing my data had been untouched for over an hour, I made a terrible
decision
ceph osd tier remove data_ec data_cache

#I then discovered that I couldn't add the new or the old cache back to the
ec pool, even with --force-nonempty

ceph osd tier add data_ec data_cache --force-nonempty
Error ENOTEMPTY: tier pool 'data_cache' has snapshot state; it cannot be
added as a tier without breaking the pool
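
For reference, the usual single-tier removal sequence for a writeback cache (as
documented; hedged, since the exact flags differ a bit between releases) would
have looked something like this, using the pool names above:

ceph osd tier cache-mode data_cache forward   # stop caching new writes
rados -p data_cache cache-flush-evict-all     # flush/evict everything to the backing pool
rados -p data_cache ls                        # should now be (nearly) empty
ceph osd tier remove-overlay data_ec          # note: data_ec, not ec_data
ceph osd tier remove data_ec data_cache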
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd max scrubs not honored?

2017-10-13 Thread J David
Thanks, all, for the input on this.

It’s taken a couple of weeks, but based on the feedback from the list,
we’ve got our version of a scrub-one-at-a-time cron script running and
confirmed that it’s working properly.
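
For reference, a minimal sketch of such a one-at-a-time job (the JSON field
names pg_stats, pgid, state and last_deep_scrub_stamp are assumptions based on
Luminous's pg dump output, and jq is assumed to be installed; verify both on
your own version before relying on it):

#!/bin/bash
DUMP=$(ceph pg dump --format=json 2>/dev/null)

# bail out if any PG is already scrubbing
scrubbing=$(echo "$DUMP" | jq '[.pg_stats[] | select(.state | contains("scrubbing"))] | length')
[ "$scrubbing" -gt 0 ] && exit 0

# otherwise deep-scrub the PG with the oldest deep-scrub timestamp
pgid=$(echo "$DUMP" | jq -r '.pg_stats | sort_by(.last_deep_scrub_stamp) | .[0].pgid')
[ -n "$pgid" ] && ceph pg deep-scrub "$pgid"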

Unfortunately, this hasn’t really solved the real problem.  Even with
just one scrub and one client running, client I/O requests routinely
take 30-60 seconds to complete (read or write), which is so poor that
the cluster is unusable for any sort of interactive activity.  Nobody
is going to sit around and wait 30-60 seconds for a file to save or
load, or for a web server to respond, or a SQL query to finish.

Running “ceph -w” blames this on slow requests blocked for > 32 seconds:

2017-10-13 21:21:34.445798 mon.ceph1 [INF] overall HEALTH_OK
2017-10-13 21:21:51.305661 mon.ceph1 [WRN] Health check failed: 42
slow requests are blocked > 32 sec (REQUEST_SLOW)
2017-10-13 21:21:57.311892 mon.ceph1 [WRN] Health check update: 140
slow requests are blocked > 32 sec (REQUEST_SLOW)
2017-10-13 21:22:03.343443 mon.ceph1 [WRN] Health check update: 111
slow requests are blocked > 32 sec (REQUEST_SLOW)
2017-10-13 21:22:01.833605 osd.5 [WRN] 1 slow requests, 1 included
below; oldest blocked for > 30.526819 secs
2017-10-13 21:22:01.833614 osd.5 [WRN] slow request 30.526819 seconds
old, received at 2017-10-13 21:21:31.306718:
osd_op(client.6104975.0:7330926 0.a2
0:456218c9:::rbd_data.1a24832ae8944a.0009d21d:head
[set-alloc-hint object_size 4194304 write_size 4194304,write
2364416~88064] snapc 0=[] ondisk+write+known_if_redirected e18866)
currently sub_op_commit_rec from 9
2017-10-13 21:22:11.238561 mon.ceph1 [WRN] Health check update: 24
slow requests are blocked > 32 sec (REQUEST_SLOW)
2017-10-13 21:22:04.834075 osd.5 [WRN] 1 slow requests, 1 included
below; oldest blocked for > 30.291869 secs
2017-10-13 21:22:04.834082 osd.5 [WRN] slow request 30.291869 seconds
old, received at 2017-10-13 21:21:34.542137:
osd_op(client.6104975.0:7331703 0.a2
0:4571f0f6:::rbd_data.1a24832ae8944a.0009c8ef:head
[set-alloc-hint object_size 4194304 write_size 4194304,write
2934272~46592] snapc 0=[] ondisk+write+known_if_redirected e18866)
currently op_applied
2017-10-13 21:22:07.834445 osd.5 [WRN] 1 slow requests, 1 included
below; oldest blocked for > 30.421122 secs
2017-10-13 21:22:07.834452 osd.5 [WRN] slow request 30.421122 seconds
old, received at 2017-10-13 21:21:37.413260:
osd_op(client.6104975.0:7332411 0.a2
0:456218c9:::rbd_data.1a24832ae8944a.0009d21d:head
[set-alloc-hint object_size 4194304 write_size 4194304,write
4068352~16384] snapc 0=[] ondisk+write+known_if_redirected e18866)
currently op_applied
2017-10-13 21:22:16.238929 mon.ceph1 [WRN] Health check update: 8 slow
requests are blocked > 32 sec (REQUEST_SLOW)
2017-10-13 21:22:21.239234 mon.ceph1 [WRN] Health check update: 4 slow
requests are blocked > 32 sec (REQUEST_SLOW)
2017-10-13 21:22:21.329402 mon.ceph1 [INF] Health check cleared:
REQUEST_SLOW (was: 4 slow requests are blocked > 32 sec)
2017-10-13 21:22:21.329490 mon.ceph1 [INF] Cluster is now healthy
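
For anyone wanting to see where those blocked requests spend their time, a few
hedged diagnostics (run the daemon commands on the node hosting osd.5):

ceph daemon osd.5 dump_ops_in_flight   # ops currently in progress/blocked, with their state
ceph daemon osd.5 dump_historic_ops    # recently completed slow ops, with per-step timings
ceph osd perf                          # per-OSD commit/apply latency, to spot outlier disks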

So far, the following steps have been taken to attempt to resolve this:

1) Updated to Ubuntu 16.04.3 LTS and Ceph 12.2.1.

2) Changes to ceph.conf (see the runtime-injection sketch after this list):
osd max scrubs = 1
osd scrub during recovery = false
osd deep scrub interval = 2592000
osd scrub max interval = 2592000
osd deep scrub randomize ratio = 0.0
osd disk thread ioprio priority = 7
osd disk thread ioprio class = idle
osd scrub sleep = 0.1

3) Kernel I/O Scheduler set to cfq.

4) Deep-scrub moved to cron, with a limit of one running at a time.
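
As an aside, a hedged sketch of pushing the scrub settings from item 2 into the
running OSDs without a restart (some of these may still report that a restart
is required, depending on the release):

ceph tell osd.* injectargs \
    '--osd_max_scrubs 1 --osd_scrub_during_recovery false --osd_scrub_sleep 0.1'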

With these changes, scrubs now take 40-45 minutes to complete, up from
20-25, so the window during which client I/O suffers has actually
gotten substantially worse.

To summarize the ceph cluster, it has five nodes.  Each node has
- Intel Xeon E5-1620 v3 3.5Ghz quad core CPU
- 64GiB DDR4 1866
- Intel SSD DC S3700 1GB divided into three partitions, used for the
Bluestore block.db of each OSD
- Separate 64GB SSD for ceph monitor data & system image.
- 3 x 7200rpm drives (Seagate Constellation ES.3 4TB or Seagate
Enterprise Capacity 8TB)
- Dual Intel 10Gigabit NIC w/LACP

The SATA drives all check out healthy via smartctl and several are
either new and were tested right before insertion into this cluster,
or have been pulled for testing.  When tested on random operations,
they are by and large capable of 120-150 IOPS and about 30MB/sec
throughput at 100% utilization with response times of 5-7ms.
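
For reference, a hedged sketch of that kind of standalone random-I/O test with
fio (destructive if pointed at a raw device; /dev/sdX is a placeholder):

fio --name=randtest --filename=/dev/sdX --direct=1 --ioengine=libaio \
    --rw=randrw --rwmixread=50 --bs=4k --iodepth=16 \
    --runtime=60 --time_based --group_reporting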

The CPUs are 75-90% idle.  The RAM is largely unused (~55GiB free).
The network is nearly idle (<50Mbps TX & RX, often <10Mbps).  The
block.db SSDs report 0% to 0.2% utilization.  The system/monitor SSD
reports 0-0.5% utilization.  The SATA drives report between 0 and 100%
utilization.

If I turn off the client and just let one deep scrub run, then I’ll see

With one scrub only (client turned off), most of the drives at
negligible utilization, but three at 60-100% utilization (all reads,
about 

Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-13 Thread dE

On 10/14/2017 12:53 AM, David Turner wrote:
What does your environment look like?  Someone recently on the mailing 
list had PGs stuck creating because of a networking issue.


On Fri, Oct 13, 2017 at 2:03 PM Ronny Aasen wrote:


Strange that no OSD is acting for your PGs.
Can you show the output from
ceph osd tree


mvh
Ronny Aasen



On 13.10.2017 18:53, dE wrote:
> Hi,
>
>     I'm running ceph 10.2.5 on Debian (official package).
>
> It cant seem to create any functional pools --
>
> ceph health detail
> HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds;
64 pgs
> stuck inactive; too few PGs per OSD (21 < min 30)
> pg 0.39 is stuck inactive for 652.741684, current state
creating, last
> acting []
> pg 0.38 is stuck inactive for 652.741688, current state
creating, last
> acting []
> pg 0.37 is stuck inactive for 652.741690, current state
creating, last
> acting []
> pg 0.36 is stuck inactive for 652.741692, current state
creating, last
> acting []
> pg 0.35 is stuck inactive for 652.741694, current state
creating, last
> acting []
> pg 0.34 is stuck inactive for 652.741696, current state
creating, last
> acting []
> pg 0.33 is stuck inactive for 652.741698, current state
creating, last
> acting []
> pg 0.32 is stuck inactive for 652.741701, current state
creating, last
> acting []
> pg 0.3 is stuck inactive for 652.741762, current state creating,
last
> acting []
> pg 0.2e is stuck inactive for 652.741715, current state
creating, last
> acting []
> pg 0.2d is stuck inactive for 652.741719, current state
creating, last
> acting []
> pg 0.2c is stuck inactive for 652.741721, current state
creating, last
> acting []
> pg 0.2b is stuck inactive for 652.741723, current state
creating, last
> acting []
> pg 0.2a is stuck inactive for 652.741725, current state
creating, last
> acting []
> pg 0.29 is stuck inactive for 652.741727, current state
creating, last
> acting []
> pg 0.28 is stuck inactive for 652.741730, current state
creating, last
> acting []
> pg 0.27 is stuck inactive for 652.741732, current state
creating, last
> acting []
> pg 0.26 is stuck inactive for 652.741734, current state
creating, last
> acting []
> pg 0.3e is stuck inactive for 652.741707, current state
creating, last
> acting []
> pg 0.f is stuck inactive for 652.741761, current state creating,
last
> acting []
> pg 0.3f is stuck inactive for 652.741708, current state
creating, last
> acting []
> pg 0.10 is stuck inactive for 652.741763, current state
creating, last
> acting []
> pg 0.4 is stuck inactive for 652.741773, current state creating,
last
> acting []
> pg 0.5 is stuck inactive for 652.741774, current state creating,
last
> acting []
> pg 0.3a is stuck inactive for 652.741717, current state
creating, last
> acting []
> pg 0.b is stuck inactive for 652.741771, current state creating,
last
> acting []
> pg 0.c is stuck inactive for 652.741772, current state creating,
last
> acting []
> pg 0.3b is stuck inactive for 652.741721, current state
creating, last
> acting []
> pg 0.d is stuck inactive for 652.741774, current state creating,
last
> acting []
> pg 0.3c is stuck inactive for 652.741722, current state
creating, last
> acting []
> pg 0.e is stuck inactive for 652.741776, current state creating,
last
> acting []
> pg 0.3d is stuck inactive for 652.741724, current state
creating, last
> acting []
> pg 0.22 is stuck inactive for 652.741756, current state
creating, last
> acting []
> pg 0.21 is stuck inactive for 652.741758, current state
creating, last
> acting []
> pg 0.a is stuck inactive for 652.741783, current state creating,
last
> acting []
> pg 0.20 is stuck inactive for 652.741761, current state
creating, last
> acting []
> pg 0.9 is stuck inactive for 652.741787, current state creating,
last
> acting []
> pg 0.1f is stuck inactive for 652.741764, current state
creating, last
> acting []
> pg 0.8 is stuck inactive for 652.741790, current state creating,
last
> acting []
> pg 0.7 is stuck inactive for 652.741792, current state creating,
last
> acting []
> pg 0.6 is stuck inactive for 652.741794, current state creating,
last
> acting []
> pg 0.1e is stuck inactive for 652.741770, current state
creating, last
> acting []
> pg 0.1d is stuck inactive for 652.741772, current state
creating, last
> acting []
> pg 0.1c is stuck inactive for 652.741774, current state

Re: [ceph-users] using Bcache on blueStore

2017-10-13 Thread Kjetil Joergensen
This applies to bcache generally, and for that matter to lvmcache and dm-writeboost.

We did extensive "power off" testing with all of them and reliably
managed to break it on our hardware setup.

while true; boot box; start writing & stress metadata updates (i.e.
make piles of files and unlink them, or you could find something else
that's picky about write ordering); let it run for a bit; yank power;
power on;
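
A minimal sketch of that kind of metadata-churn workload, assuming a scratch
filesystem mounted at /mnt/bcache-test (the path is a placeholder):

#!/bin/bash
TARGET=/mnt/bcache-test
mkdir -p "$TARGET"
while true; do
    d="$TARGET/dir.$RANDOM.$$"
    mkdir -p "$d"
    for i in $(seq 1 1000); do
        echo "payload $i" > "$d/file.$i"   # create lots of small files
    done
    sync                                   # push the metadata out
    rm -rf "$d"                            # then unlink them all again
done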

This never survived for more than a night without badly corrupting
some xfs filesystem. We did the same testing without caching and could
not reproduce the corruption.

This may have been a quirk resulting from our particular setup, I get
the impression that others use it and sleep well at night, but I'd
recommend testing it under the most unforgiving circumstances you can
think of before proceeding.

-KJ

On Thu, Oct 12, 2017 at 4:54 PM, Jorge Pinilla López  wrote:
> Well, I wouldn't use bcache on filestore at all.
> First, there are the problems with all that you have said, and second but more
> importantly, you got double writes (in filestore, data was written to the journal
> and to the storage disk at the same time); if the journal and data disk were the
> same, speed was halved, giving really bad throughput.
>
> In BlueStore things change quite a lot. First, there are no double writes:
> there is no "journal" (well, there is something called the WAL, but it's not
> used in the same way); data goes directly onto the data disk and you only
> write a little metadata and make a commit into the DB. Rebalancing and scrub
> go through RocksDB rather than a file system, making them much simpler and
> more effective, so you aren't supposed to have the problems you had with filestore.
>
> In addition, cache tiering has been deprecated in Red Hat Ceph Storage, so I
> personally wouldn't use something deprecated by the developers and support.
>
>
>  Original message 
> From: Marek Grzybowski 
> Date: 13/10/17 12:22 AM (GMT+01:00)
> To: Jorge Pinilla López , ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] using Bcache on blueStore
>
> On 12.10.2017 20:28, Jorge Pinilla López wrote:
>> Hey all!
>> I have a ceph cluster with multiple HDDs and 1 really fast SSD (30GB per OSD)
>> per host.
>>
>> I have been thinking, and all the docs say that I should give all the SSD space
>> to RocksDB, so I would have the HDD for data and a 30GB partition for RocksDB.
>>
>> But it came to my mind that if the OSD isn't full, maybe I am not using all
>> the space on the SSD, or maybe I would prefer having a really small amount of
>> hot k/v and metadata plus the data itself on a really fast device rather than
>> just storing all the cold metadata.
>>
>> So I thought about using bcache to make the SSD a cache; as metadata
>> and k/v are usually hot, they should be placed in the cache. But this doesn't
>> guarantee me that k/v and metadata are actually always on the SSD, because
>> under heavy cache load they can be pushed out (by really big data files, say).
>>
>> So I came up with the idea of setting aside a small 5-10GB partition for the
>> hot RocksDB and using the rest as a cache, so that really hot metadata is
>> always on the SSD, and the colder metadata should also be on the SSD (via
>> bcache) unless it is really cold, in which case it would be pushed to the
>> HDD. It also doesn't make any sense to have metadata that you never use
>> taking up space on the SSD; I would rather use that space to store hotter
>> data.
>>
>> This would also make writes faster, and in BlueStore we don't have the double
>> write problem, so it should work fine.
>>
>> What do you think about this? Does it have any downside? Is there any
>> other way?
>
> Hi Jorge
>   I was inexperienced and tried bcache on an old filestore OSD once. It was bad.
> Mostly because bcache does not have any typical disk scheduling algorithm.
> So when a scrub or rebalance was running, latency on that storage was very high
> and unpredictable.
> The OSD daemon could not set any ioprio for disk reads or writes, and
> additionally the bcache cache was poisoned by scrub/rebalance.
>
> Fortunately for me, it is very easy to do a rolling replacement of OSDs.
> I use some SSD partitions for journals now and whatever is left for pure SSD
> storage.
> This works really great.
>
> If I ever need a cache, I will use cache tiering instead.
>
>
> --
>   Kind Regards
> Marek Grzybowski
>
>
>
>
>
>
>



-- 
Kjetil Joergensen 
SRE, Medallia Inc
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [MONITOR SEGFAULT] Luminous cluster stuck when adding monitor

2017-10-13 Thread Mehmet
Hey guys,

Does this mean we have to do anything additional when upgrading from Jewel 10.2.10 to
Luminous 12.2.1?

- Mehmet

On 9 October 2017 at 04:02:14 CEST, kefu chai wrote:
>On Mon, Oct 9, 2017 at 8:07 AM, Joao Eduardo Luis  wrote:
>> This looks a lot like a bug I fixed a week or so ago, but for which I
>> currently don't recall the ticket off the top of my head. It was
>basically a
>
>http://tracker.ceph.com/issues/21300
>
>> crash each time a "ceph osd df" was called, if a mgr was not
>available after
>> having set the luminous osd require flag. I will check the log in the
>> morning to figure out whether you need to upgrade to a newer version
>or if
>> this is a corner case the fix missed. In the mean time, check if you
>have
>> ceph-mgr running, because that's the easy work around (assuming it's 
>the
>> same bug).
>>
>>   -Joao
>>
>>
>
>
>
>-- 
>Regards
>Kefu Chai
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-13 Thread David Turner
What does your environment look like?  Someone recently on the mailing list
had PGs stuck creating because of a networking issue.

On Fri, Oct 13, 2017 at 2:03 PM Ronny Aasen 
wrote:

> Strange that no OSD is acting for your PGs.
> Can you show the output from
> ceph osd tree
>
>
> mvh
> Ronny Aasen
>
>
>
> On 13.10.2017 18:53, dE wrote:
> > Hi,
> >
> > I'm running ceph 10.2.5 on Debian (official package).
> >
> > It cant seem to create any functional pools --
> >
> > ceph health detail
> > HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs
> > stuck inactive; too few PGs per OSD (21 < min 30)
> > pg 0.39 is stuck inactive for 652.741684, current state creating, last
> > acting []
> > pg 0.38 is stuck inactive for 652.741688, current state creating, last
> > acting []
> > pg 0.37 is stuck inactive for 652.741690, current state creating, last
> > acting []
> > pg 0.36 is stuck inactive for 652.741692, current state creating, last
> > acting []
> > pg 0.35 is stuck inactive for 652.741694, current state creating, last
> > acting []
> > pg 0.34 is stuck inactive for 652.741696, current state creating, last
> > acting []
> > pg 0.33 is stuck inactive for 652.741698, current state creating, last
> > acting []
> > pg 0.32 is stuck inactive for 652.741701, current state creating, last
> > acting []
> > pg 0.3 is stuck inactive for 652.741762, current state creating, last
> > acting []
> > pg 0.2e is stuck inactive for 652.741715, current state creating, last
> > acting []
> > pg 0.2d is stuck inactive for 652.741719, current state creating, last
> > acting []
> > pg 0.2c is stuck inactive for 652.741721, current state creating, last
> > acting []
> > pg 0.2b is stuck inactive for 652.741723, current state creating, last
> > acting []
> > pg 0.2a is stuck inactive for 652.741725, current state creating, last
> > acting []
> > pg 0.29 is stuck inactive for 652.741727, current state creating, last
> > acting []
> > pg 0.28 is stuck inactive for 652.741730, current state creating, last
> > acting []
> > pg 0.27 is stuck inactive for 652.741732, current state creating, last
> > acting []
> > pg 0.26 is stuck inactive for 652.741734, current state creating, last
> > acting []
> > pg 0.3e is stuck inactive for 652.741707, current state creating, last
> > acting []
> > pg 0.f is stuck inactive for 652.741761, current state creating, last
> > acting []
> > pg 0.3f is stuck inactive for 652.741708, current state creating, last
> > acting []
> > pg 0.10 is stuck inactive for 652.741763, current state creating, last
> > acting []
> > pg 0.4 is stuck inactive for 652.741773, current state creating, last
> > acting []
> > pg 0.5 is stuck inactive for 652.741774, current state creating, last
> > acting []
> > pg 0.3a is stuck inactive for 652.741717, current state creating, last
> > acting []
> > pg 0.b is stuck inactive for 652.741771, current state creating, last
> > acting []
> > pg 0.c is stuck inactive for 652.741772, current state creating, last
> > acting []
> > pg 0.3b is stuck inactive for 652.741721, current state creating, last
> > acting []
> > pg 0.d is stuck inactive for 652.741774, current state creating, last
> > acting []
> > pg 0.3c is stuck inactive for 652.741722, current state creating, last
> > acting []
> > pg 0.e is stuck inactive for 652.741776, current state creating, last
> > acting []
> > pg 0.3d is stuck inactive for 652.741724, current state creating, last
> > acting []
> > pg 0.22 is stuck inactive for 652.741756, current state creating, last
> > acting []
> > pg 0.21 is stuck inactive for 652.741758, current state creating, last
> > acting []
> > pg 0.a is stuck inactive for 652.741783, current state creating, last
> > acting []
> > pg 0.20 is stuck inactive for 652.741761, current state creating, last
> > acting []
> > pg 0.9 is stuck inactive for 652.741787, current state creating, last
> > acting []
> > pg 0.1f is stuck inactive for 652.741764, current state creating, last
> > acting []
> > pg 0.8 is stuck inactive for 652.741790, current state creating, last
> > acting []
> > pg 0.7 is stuck inactive for 652.741792, current state creating, last
> > acting []
> > pg 0.6 is stuck inactive for 652.741794, current state creating, last
> > acting []
> > pg 0.1e is stuck inactive for 652.741770, current state creating, last
> > acting []
> > pg 0.1d is stuck inactive for 652.741772, current state creating, last
> > acting []
> > pg 0.1c is stuck inactive for 652.741774, current state creating, last
> > acting []
> > pg 0.1b is stuck inactive for 652.741777, current state creating, last
> > acting []
> > pg 0.1a is stuck inactive for 652.741784, current state creating, last
> > acting []
> > pg 0.2 is stuck inactive for 652.741812, current state creating, last
> > acting []
> > pg 0.31 is stuck inactive for 652.741762, current state creating, last
> > acting []
> > pg 0.19 is stuck inactive for 652.741789, current state creating, last
> > acting []

Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-13 Thread Gerhard W. Recher
You specify a mon on 0.0.0.0.

my ceph.conf

[mon.2]
 host = pve03
 mon addr = 192.168.100.143:6789

[mon.3]
 host = pve04
 mon addr = 192.168.100.144:6789

[mon.0]
 host = pve01
 mon addr = 192.168.100.141:6789

[mon.1]
 host = pve02
 mon addr = 192.168.100.142:6789
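
A quick, hedged way to check what addresses the monitors actually registered
and are listening on:

ceph mon dump                # mon addresses as recorded in the monmap
ss -tlnp | grep ceph-mon     # on each mon host: sockets the daemon is really bound to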



Gerhard W. Recher

net4sec UG (haftungsbeschränkt)
Leitenweg 6
86929 Penzing

+49 171 4802507
Am 13.10.2017 um 20:01 schrieb Ronny Aasen:
> Strange that no OSD is acting for your PGs.
> Can you show the output from
> ceph osd tree
>
>
> mvh
> Ronny Aasen
>
>
>
> On 13.10.2017 18:53, dE wrote:
>> Hi,
>>
>>     I'm running ceph 10.2.5 on Debian (official package).
>>
>> It cant seem to create any functional pools --
>>
>> ceph health detail
>> HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64
>> pgs stuck inactive; too few PGs per OSD (21 < min 30)
>> pg 0.39 is stuck inactive for 652.741684, current state creating,
>> last acting []
>> pg 0.38 is stuck inactive for 652.741688, current state creating,
>> last acting []
>> pg 0.37 is stuck inactive for 652.741690, current state creating,
>> last acting []
>> pg 0.36 is stuck inactive for 652.741692, current state creating,
>> last acting []
>> pg 0.35 is stuck inactive for 652.741694, current state creating,
>> last acting []
>> pg 0.34 is stuck inactive for 652.741696, current state creating,
>> last acting []
>> pg 0.33 is stuck inactive for 652.741698, current state creating,
>> last acting []
>> pg 0.32 is stuck inactive for 652.741701, current state creating,
>> last acting []
>> pg 0.3 is stuck inactive for 652.741762, current state creating, last
>> acting []
>> pg 0.2e is stuck inactive for 652.741715, current state creating,
>> last acting []
>> pg 0.2d is stuck inactive for 652.741719, current state creating,
>> last acting []
>> pg 0.2c is stuck inactive for 652.741721, current state creating,
>> last acting []
>> pg 0.2b is stuck inactive for 652.741723, current state creating,
>> last acting []
>> pg 0.2a is stuck inactive for 652.741725, current state creating,
>> last acting []
>> pg 0.29 is stuck inactive for 652.741727, current state creating,
>> last acting []
>> pg 0.28 is stuck inactive for 652.741730, current state creating,
>> last acting []
>> pg 0.27 is stuck inactive for 652.741732, current state creating,
>> last acting []
>> pg 0.26 is stuck inactive for 652.741734, current state creating,
>> last acting []
>> pg 0.3e is stuck inactive for 652.741707, current state creating,
>> last acting []
>> pg 0.f is stuck inactive for 652.741761, current state creating, last
>> acting []
>> pg 0.3f is stuck inactive for 652.741708, current state creating,
>> last acting []
>> pg 0.10 is stuck inactive for 652.741763, current state creating,
>> last acting []
>> pg 0.4 is stuck inactive for 652.741773, current state creating, last
>> acting []
>> pg 0.5 is stuck inactive for 652.741774, current state creating, last
>> acting []
>> pg 0.3a is stuck inactive for 652.741717, current state creating,
>> last acting []
>> pg 0.b is stuck inactive for 652.741771, current state creating, last
>> acting []
>> pg 0.c is stuck inactive for 652.741772, current state creating, last
>> acting []
>> pg 0.3b is stuck inactive for 652.741721, current state creating,
>> last acting []
>> pg 0.d is stuck inactive for 652.741774, current state creating, last
>> acting []
>> pg 0.3c is stuck inactive for 652.741722, current state creating,
>> last acting []
>> pg 0.e is stuck inactive for 652.741776, current state creating, last
>> acting []
>> pg 0.3d is stuck inactive for 652.741724, current state creating,
>> last acting []
>> pg 0.22 is stuck inactive for 652.741756, current state creating,
>> last acting []
>> pg 0.21 is stuck inactive for 652.741758, current state creating,
>> last acting []
>> pg 0.a is stuck inactive for 652.741783, current state creating, last
>> acting []
>> pg 0.20 is stuck inactive for 652.741761, current state creating,
>> last acting []
>> pg 0.9 is stuck inactive for 652.741787, current state creating, last
>> acting []
>> pg 0.1f is stuck inactive for 652.741764, current state creating,
>> last acting []
>> pg 0.8 is stuck inactive for 652.741790, current state creating, last
>> acting []
>> pg 0.7 is stuck inactive for 652.741792, current state creating, last
>> acting []
>> pg 0.6 is stuck inactive for 652.741794, current state creating, last
>> acting []
>> pg 0.1e is stuck inactive for 652.741770, current state creating,
>> last acting []
>> pg 0.1d is stuck inactive for 652.741772, current state creating,
>> last acting []
>> pg 0.1c is stuck inactive for 652.741774, current state creating,
>> last acting []
>> pg 0.1b is stuck inactive for 652.741777, current state creating,
>> last acting []
>> pg 0.1a is stuck inactive for 652.741784, current state creating,
>> last acting []
>> pg 0.2 is stuck inactive for 652.741812, current state creating, last
>> acting []
>> pg 0.31 is 

Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-13 Thread dE

On 10/13/2017 10:23 PM, dE wrote:

Hi,

    I'm running ceph 10.2.5 on Debian (official package).

It cant seem to create any functional pools --

ceph health detail
HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs 
stuck inactive; too few PGs per OSD (21 < min 30)
pg 0.39 is stuck inactive for 652.741684, current state creating, last 
acting []
pg 0.38 is stuck inactive for 652.741688, current state creating, last 
acting []
pg 0.37 is stuck inactive for 652.741690, current state creating, last 
acting []
pg 0.36 is stuck inactive for 652.741692, current state creating, last 
acting []
pg 0.35 is stuck inactive for 652.741694, current state creating, last 
acting []
pg 0.34 is stuck inactive for 652.741696, current state creating, last 
acting []
pg 0.33 is stuck inactive for 652.741698, current state creating, last 
acting []
pg 0.32 is stuck inactive for 652.741701, current state creating, last 
acting []
pg 0.3 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.2e is stuck inactive for 652.741715, current state creating, last 
acting []
pg 0.2d is stuck inactive for 652.741719, current state creating, last 
acting []
pg 0.2c is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.2b is stuck inactive for 652.741723, current state creating, last 
acting []
pg 0.2a is stuck inactive for 652.741725, current state creating, last 
acting []
pg 0.29 is stuck inactive for 652.741727, current state creating, last 
acting []
pg 0.28 is stuck inactive for 652.741730, current state creating, last 
acting []
pg 0.27 is stuck inactive for 652.741732, current state creating, last 
acting []
pg 0.26 is stuck inactive for 652.741734, current state creating, last 
acting []
pg 0.3e is stuck inactive for 652.741707, current state creating, last 
acting []
pg 0.f is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.3f is stuck inactive for 652.741708, current state creating, last 
acting []
pg 0.10 is stuck inactive for 652.741763, current state creating, last 
acting []
pg 0.4 is stuck inactive for 652.741773, current state creating, last 
acting []
pg 0.5 is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3a is stuck inactive for 652.741717, current state creating, last 
acting []
pg 0.b is stuck inactive for 652.741771, current state creating, last 
acting []
pg 0.c is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.3b is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.d is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3c is stuck inactive for 652.741722, current state creating, last 
acting []
pg 0.e is stuck inactive for 652.741776, current state creating, last 
acting []
pg 0.3d is stuck inactive for 652.741724, current state creating, last 
acting []
pg 0.22 is stuck inactive for 652.741756, current state creating, last 
acting []
pg 0.21 is stuck inactive for 652.741758, current state creating, last 
acting []
pg 0.a is stuck inactive for 652.741783, current state creating, last 
acting []
pg 0.20 is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.9 is stuck inactive for 652.741787, current state creating, last 
acting []
pg 0.1f is stuck inactive for 652.741764, current state creating, last 
acting []
pg 0.8 is stuck inactive for 652.741790, current state creating, last 
acting []
pg 0.7 is stuck inactive for 652.741792, current state creating, last 
acting []
pg 0.6 is stuck inactive for 652.741794, current state creating, last 
acting []
pg 0.1e is stuck inactive for 652.741770, current state creating, last 
acting []
pg 0.1d is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.1c is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.1b is stuck inactive for 652.741777, current state creating, last 
acting []
pg 0.1a is stuck inactive for 652.741784, current state creating, last 
acting []
pg 0.2 is stuck inactive for 652.741812, current state creating, last 
acting []
pg 0.31 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.19 is stuck inactive for 652.741789, current state creating, last 
acting []
pg 0.11 is stuck inactive for 652.741797, current state creating, last 
acting []
pg 0.18 is stuck inactive for 652.741793, current state creating, last 
acting []
pg 0.1 is stuck inactive for 652.741820, current state creating, last 
acting []
pg 0.30 is stuck inactive for 652.741769, current state creating, last 
acting []
pg 0.17 is stuck inactive for 652.741797, current state creating, last 
acting []
pg 0.0 is stuck inactive for 652.741829, current state creating, last 
acting []
pg 0.2f is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.16 is stuck inactive for 652.741802, current state creating, last 
acting []
pg 0.12 is stuck inactive for 652.741807, current 

Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-13 Thread dE

On 10/13/2017 10:23 PM, dE wrote:

Hi,

    I'm running ceph 10.2.5 on Debian (official package).

It cant seem to create any functional pools --

ceph health detail
HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs 
stuck inactive; too few PGs per OSD (21 < min 30)
pg 0.39 is stuck inactive for 652.741684, current state creating, last 
acting []
pg 0.38 is stuck inactive for 652.741688, current state creating, last 
acting []
pg 0.37 is stuck inactive for 652.741690, current state creating, last 
acting []
pg 0.36 is stuck inactive for 652.741692, current state creating, last 
acting []
pg 0.35 is stuck inactive for 652.741694, current state creating, last 
acting []
pg 0.34 is stuck inactive for 652.741696, current state creating, last 
acting []
pg 0.33 is stuck inactive for 652.741698, current state creating, last 
acting []
pg 0.32 is stuck inactive for 652.741701, current state creating, last 
acting []
pg 0.3 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.2e is stuck inactive for 652.741715, current state creating, last 
acting []
pg 0.2d is stuck inactive for 652.741719, current state creating, last 
acting []
pg 0.2c is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.2b is stuck inactive for 652.741723, current state creating, last 
acting []
pg 0.2a is stuck inactive for 652.741725, current state creating, last 
acting []
pg 0.29 is stuck inactive for 652.741727, current state creating, last 
acting []
pg 0.28 is stuck inactive for 652.741730, current state creating, last 
acting []
pg 0.27 is stuck inactive for 652.741732, current state creating, last 
acting []
pg 0.26 is stuck inactive for 652.741734, current state creating, last 
acting []
pg 0.3e is stuck inactive for 652.741707, current state creating, last 
acting []
pg 0.f is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.3f is stuck inactive for 652.741708, current state creating, last 
acting []
pg 0.10 is stuck inactive for 652.741763, current state creating, last 
acting []
pg 0.4 is stuck inactive for 652.741773, current state creating, last 
acting []
pg 0.5 is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3a is stuck inactive for 652.741717, current state creating, last 
acting []
pg 0.b is stuck inactive for 652.741771, current state creating, last 
acting []
pg 0.c is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.3b is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.d is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3c is stuck inactive for 652.741722, current state creating, last 
acting []
pg 0.e is stuck inactive for 652.741776, current state creating, last 
acting []
pg 0.3d is stuck inactive for 652.741724, current state creating, last 
acting []
pg 0.22 is stuck inactive for 652.741756, current state creating, last 
acting []
pg 0.21 is stuck inactive for 652.741758, current state creating, last 
acting []
pg 0.a is stuck inactive for 652.741783, current state creating, last 
acting []
pg 0.20 is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.9 is stuck inactive for 652.741787, current state creating, last 
acting []
pg 0.1f is stuck inactive for 652.741764, current state creating, last 
acting []
pg 0.8 is stuck inactive for 652.741790, current state creating, last 
acting []
pg 0.7 is stuck inactive for 652.741792, current state creating, last 
acting []
pg 0.6 is stuck inactive for 652.741794, current state creating, last 
acting []
pg 0.1e is stuck inactive for 652.741770, current state creating, last 
acting []
pg 0.1d is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.1c is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.1b is stuck inactive for 652.741777, current state creating, last 
acting []
pg 0.1a is stuck inactive for 652.741784, current state creating, last 
acting []
pg 0.2 is stuck inactive for 652.741812, current state creating, last 
acting []
pg 0.31 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.19 is stuck inactive for 652.741789, current state creating, last 
acting []
pg 0.11 is stuck inactive for 652.741797, current state creating, last 
acting []
pg 0.18 is stuck inactive for 652.741793, current state creating, last 
acting []
pg 0.1 is stuck inactive for 652.741820, current state creating, last 
acting []
pg 0.30 is stuck inactive for 652.741769, current state creating, last 
acting []
pg 0.17 is stuck inactive for 652.741797, current state creating, last 
acting []
pg 0.0 is stuck inactive for 652.741829, current state creating, last 
acting []
pg 0.2f is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.16 is stuck inactive for 652.741802, current state creating, last 
acting []
pg 0.12 is stuck inactive for 652.741807, current 

Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-13 Thread Ronny Aasen

Strange that no OSD is acting for your PGs.
Can you show the output from
ceph osd tree


mvh
Ronny Aasen



On 13.10.2017 18:53, dE wrote:

Hi,

    I'm running ceph 10.2.5 on Debian (official package).

It cant seem to create any functional pools --

ceph health detail
HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs 
stuck inactive; too few PGs per OSD (21 < min 30)
pg 0.39 is stuck inactive for 652.741684, current state creating, last 
acting []
pg 0.38 is stuck inactive for 652.741688, current state creating, last 
acting []
pg 0.37 is stuck inactive for 652.741690, current state creating, last 
acting []
pg 0.36 is stuck inactive for 652.741692, current state creating, last 
acting []
pg 0.35 is stuck inactive for 652.741694, current state creating, last 
acting []
pg 0.34 is stuck inactive for 652.741696, current state creating, last 
acting []
pg 0.33 is stuck inactive for 652.741698, current state creating, last 
acting []
pg 0.32 is stuck inactive for 652.741701, current state creating, last 
acting []
pg 0.3 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.2e is stuck inactive for 652.741715, current state creating, last 
acting []
pg 0.2d is stuck inactive for 652.741719, current state creating, last 
acting []
pg 0.2c is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.2b is stuck inactive for 652.741723, current state creating, last 
acting []
pg 0.2a is stuck inactive for 652.741725, current state creating, last 
acting []
pg 0.29 is stuck inactive for 652.741727, current state creating, last 
acting []
pg 0.28 is stuck inactive for 652.741730, current state creating, last 
acting []
pg 0.27 is stuck inactive for 652.741732, current state creating, last 
acting []
pg 0.26 is stuck inactive for 652.741734, current state creating, last 
acting []
pg 0.3e is stuck inactive for 652.741707, current state creating, last 
acting []
pg 0.f is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.3f is stuck inactive for 652.741708, current state creating, last 
acting []
pg 0.10 is stuck inactive for 652.741763, current state creating, last 
acting []
pg 0.4 is stuck inactive for 652.741773, current state creating, last 
acting []
pg 0.5 is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3a is stuck inactive for 652.741717, current state creating, last 
acting []
pg 0.b is stuck inactive for 652.741771, current state creating, last 
acting []
pg 0.c is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.3b is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.d is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3c is stuck inactive for 652.741722, current state creating, last 
acting []
pg 0.e is stuck inactive for 652.741776, current state creating, last 
acting []
pg 0.3d is stuck inactive for 652.741724, current state creating, last 
acting []
pg 0.22 is stuck inactive for 652.741756, current state creating, last 
acting []
pg 0.21 is stuck inactive for 652.741758, current state creating, last 
acting []
pg 0.a is stuck inactive for 652.741783, current state creating, last 
acting []
pg 0.20 is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.9 is stuck inactive for 652.741787, current state creating, last 
acting []
pg 0.1f is stuck inactive for 652.741764, current state creating, last 
acting []
pg 0.8 is stuck inactive for 652.741790, current state creating, last 
acting []
pg 0.7 is stuck inactive for 652.741792, current state creating, last 
acting []
pg 0.6 is stuck inactive for 652.741794, current state creating, last 
acting []
pg 0.1e is stuck inactive for 652.741770, current state creating, last 
acting []
pg 0.1d is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.1c is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.1b is stuck inactive for 652.741777, current state creating, last 
acting []
pg 0.1a is stuck inactive for 652.741784, current state creating, last 
acting []
pg 0.2 is stuck inactive for 652.741812, current state creating, last 
acting []
pg 0.31 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.19 is stuck inactive for 652.741789, current state creating, last 
acting []
pg 0.11 is stuck inactive for 652.741797, current state creating, last 
acting []
pg 0.18 is stuck inactive for 652.741793, current state creating, last 
acting []
pg 0.1 is stuck inactive for 652.741820, current state creating, last 
acting []
pg 0.30 is stuck inactive for 652.741769, current state creating, last 
acting []
pg 0.17 is stuck inactive for 652.741797, current state creating, last 
acting []
pg 0.0 is stuck inactive for 652.741829, current state creating, last 
acting []
pg 0.2f is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.16 is stuck inactive 

Re: [ceph-users] objects degraded higher than 100%

2017-10-13 Thread David Zafman


I improved the code that computes degraded objects during
backfill/recovery.  During my testing it wouldn't result in a percentage
above 100%.  I'll have to look at the code and verify that some
subsequent changes didn't break things.


David


On 10/13/17 9:55 AM, Florian Haas wrote:

Okay, in that case I've no idea. What was the timeline for the recovery
versus the rados bench and cleanup versus the degraded object counts,
then?

1. Jewel deployment with filestore.
2. Upgrade to Luminous (including mgr deployment and "ceph osd
require-osd-release luminous"), still on filestore.
3. rados bench with subsequent cleanup.
4. All OSDs up, all  PGs active+clean.
5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
6. Reinitialize OSD with bluestore.
7. Start OSD, commencing backfill.
8. Degraded objects above 100%.

Please let me know if that information is useful. Thank you!
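
For reference, step 5 above typically corresponds to something like the
following hedged sketch; $ID and /dev/sdX are placeholders, and the bluestore
re-creation command varies by tooling and release:

systemctl stop ceph-osd@$ID
ceph osd crush remove osd.$ID
ceph auth del osd.$ID
ceph osd rm $ID
ceph-disk prepare --bluestore /dev/sdX   # e.g. with ceph-disk on Luminous (assumption)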


Hmm, that does leave me a little perplexed.

Yeah exactly, me too. :)


David, do we maybe do something with degraded counts based on the number of
objects identified in pg logs? Or some other heuristic for number of objects
that might be stale? That's the only way I can think of to get these weird
returning sets.

One thing that just crossed my mind: would it make a difference
whether or not the OSD is marked out in the time window between it
going down and being deleted from the crushmap/osdmap? I think it
shouldn't (whether marked out or just non-existent, it's not
eligible for holding any data either way), but I'm not really sure
about the mechanics of the internals here.

Cheers,
Florian


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-13 Thread Michael Kuriger
You may not have enough OSDs to satisfy the crush ruleset.  
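
A few hedged checks for that (the pool name "rbd" is an assumption based on the
pool-0 PG ids in the report):

ceph osd tree                  # how many OSDs are up/in and where they sit
ceph osd crush rule dump       # which failure domain the rule chooses across
ceph osd pool get rbd size     # replicas the pool asks for
ceph osd pool get rbd min_size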

 
Mike Kuriger 
Sr. Unix Systems Engineer
818-434-6195 
 

On 10/13/17, 9:53 AM, "ceph-users on behalf of dE" 
 wrote:

Hi,

 I'm running ceph 10.2.5 on Debian (official package).

It cant seem to create any functional pools --

ceph health detail
HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs 
stuck inactive; too few PGs per OSD (21 < min 30)
pg 0.39 is stuck inactive for 652.741684, current state creating, last 
acting []
pg 0.38 is stuck inactive for 652.741688, current state creating, last 
acting []
pg 0.37 is stuck inactive for 652.741690, current state creating, last 
acting []
pg 0.36 is stuck inactive for 652.741692, current state creating, last 
acting []
pg 0.35 is stuck inactive for 652.741694, current state creating, last 
acting []
pg 0.34 is stuck inactive for 652.741696, current state creating, last 
acting []
pg 0.33 is stuck inactive for 652.741698, current state creating, last 
acting []
pg 0.32 is stuck inactive for 652.741701, current state creating, last 
acting []
pg 0.3 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.2e is stuck inactive for 652.741715, current state creating, last 
acting []
pg 0.2d is stuck inactive for 652.741719, current state creating, last 
acting []
pg 0.2c is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.2b is stuck inactive for 652.741723, current state creating, last 
acting []
pg 0.2a is stuck inactive for 652.741725, current state creating, last 
acting []
pg 0.29 is stuck inactive for 652.741727, current state creating, last 
acting []
pg 0.28 is stuck inactive for 652.741730, current state creating, last 
acting []
pg 0.27 is stuck inactive for 652.741732, current state creating, last 
acting []
pg 0.26 is stuck inactive for 652.741734, current state creating, last 
acting []
pg 0.3e is stuck inactive for 652.741707, current state creating, last 
acting []
pg 0.f is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.3f is stuck inactive for 652.741708, current state creating, last 
acting []
pg 0.10 is stuck inactive for 652.741763, current state creating, last 
acting []
pg 0.4 is stuck inactive for 652.741773, current state creating, last 
acting []
pg 0.5 is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3a is stuck inactive for 652.741717, current state creating, last 
acting []
pg 0.b is stuck inactive for 652.741771, current state creating, last 
acting []
pg 0.c is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.3b is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.d is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3c is stuck inactive for 652.741722, current state creating, last 
acting []
pg 0.e is stuck inactive for 652.741776, current state creating, last 
acting []
pg 0.3d is stuck inactive for 652.741724, current state creating, last 
acting []
pg 0.22 is stuck inactive for 652.741756, current state creating, last 
acting []
pg 0.21 is stuck inactive for 652.741758, current state creating, last 
acting []
pg 0.a is stuck inactive for 652.741783, current state creating, last 
acting []
pg 0.20 is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.9 is stuck inactive for 652.741787, current state creating, last 
acting []
pg 0.1f is stuck inactive for 652.741764, current state creating, last 
acting []
pg 0.8 is stuck inactive for 652.741790, current state creating, last 
acting []
pg 0.7 is stuck inactive for 652.741792, current state creating, last 
acting []
pg 0.6 is stuck inactive for 652.741794, current state creating, last 
acting []
pg 0.1e is stuck inactive for 652.741770, current state creating, last 
acting []
pg 0.1d is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.1c is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.1b is stuck inactive for 652.741777, current state creating, last 
acting []
pg 0.1a is stuck inactive for 652.741784, current state creating, last 
acting []
pg 0.2 is stuck inactive for 652.741812, current state creating, last 
acting []
pg 0.31 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.19 is stuck inactive for 652.741789, current state creating, last 
acting []
pg 0.11 is stuck inactive for 

Re: [ceph-users] objects degraded higher than 100%

2017-10-13 Thread Florian Haas
>> > Okay, in that case I've no idea. What was the timeline for the recovery
>> > versus the rados bench and cleanup versus the degraded object counts,
>> > then?
>>
>> 1. Jewel deployment with filestore.
>> 2. Upgrade to Luminous (including mgr deployment and "ceph osd
>> require-osd-release luminous"), still on filestore.
>> 3. rados bench with subsequent cleanup.
>> 4. All OSDs up, all  PGs active+clean.
>> 5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
>> 6. Reinitialize OSD with bluestore.
>> 7. Start OSD, commencing backfill.
>> 8. Degraded objects above 100%.
>>
>> Please let me know if that information is useful. Thank you!
>
>
> Hmm, that does leave me a little perplexed.

Yeah exactly, me too. :)

> David, do we maybe do something with degraded counts based on the number of
> objects identified in pg logs? Or some other heuristic for number of objects
> that might be stale? That's the only way I can think of to get these weird
> returning sets.

One thing that just crossed my mind: would it make a difference
whether or not the OSD is marked out in the time window between it
going down and being deleted from the crushmap/osdmap? I think it
shouldn't (whether marked out or just non-existent, it's not
eligible for holding any data either way), but I'm not really sure
about the mechanics of the internals here.

Cheers,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-13 Thread dE

Hi,

    I'm running ceph 10.2.5 on Debian (official package).

It cant seem to create any functional pools --

ceph health detail
HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs 
stuck inactive; too few PGs per OSD (21 < min 30)
pg 0.39 is stuck inactive for 652.741684, current state creating, last 
acting []
pg 0.38 is stuck inactive for 652.741688, current state creating, last 
acting []
pg 0.37 is stuck inactive for 652.741690, current state creating, last 
acting []
pg 0.36 is stuck inactive for 652.741692, current state creating, last 
acting []
pg 0.35 is stuck inactive for 652.741694, current state creating, last 
acting []
pg 0.34 is stuck inactive for 652.741696, current state creating, last 
acting []
pg 0.33 is stuck inactive for 652.741698, current state creating, last 
acting []
pg 0.32 is stuck inactive for 652.741701, current state creating, last 
acting []
pg 0.3 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.2e is stuck inactive for 652.741715, current state creating, last 
acting []
pg 0.2d is stuck inactive for 652.741719, current state creating, last 
acting []
pg 0.2c is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.2b is stuck inactive for 652.741723, current state creating, last 
acting []
pg 0.2a is stuck inactive for 652.741725, current state creating, last 
acting []
pg 0.29 is stuck inactive for 652.741727, current state creating, last 
acting []
pg 0.28 is stuck inactive for 652.741730, current state creating, last 
acting []
pg 0.27 is stuck inactive for 652.741732, current state creating, last 
acting []
pg 0.26 is stuck inactive for 652.741734, current state creating, last 
acting []
pg 0.3e is stuck inactive for 652.741707, current state creating, last 
acting []
pg 0.f is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.3f is stuck inactive for 652.741708, current state creating, last 
acting []
pg 0.10 is stuck inactive for 652.741763, current state creating, last 
acting []
pg 0.4 is stuck inactive for 652.741773, current state creating, last 
acting []
pg 0.5 is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3a is stuck inactive for 652.741717, current state creating, last 
acting []
pg 0.b is stuck inactive for 652.741771, current state creating, last 
acting []
pg 0.c is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.3b is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.d is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3c is stuck inactive for 652.741722, current state creating, last 
acting []
pg 0.e is stuck inactive for 652.741776, current state creating, last 
acting []
pg 0.3d is stuck inactive for 652.741724, current state creating, last 
acting []
pg 0.22 is stuck inactive for 652.741756, current state creating, last 
acting []
pg 0.21 is stuck inactive for 652.741758, current state creating, last 
acting []
pg 0.a is stuck inactive for 652.741783, current state creating, last 
acting []
pg 0.20 is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.9 is stuck inactive for 652.741787, current state creating, last 
acting []
pg 0.1f is stuck inactive for 652.741764, current state creating, last 
acting []
pg 0.8 is stuck inactive for 652.741790, current state creating, last 
acting []
pg 0.7 is stuck inactive for 652.741792, current state creating, last 
acting []
pg 0.6 is stuck inactive for 652.741794, current state creating, last 
acting []
pg 0.1e is stuck inactive for 652.741770, current state creating, last 
acting []
pg 0.1d is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.1c is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.1b is stuck inactive for 652.741777, current state creating, last 
acting []
pg 0.1a is stuck inactive for 652.741784, current state creating, last 
acting []
pg 0.2 is stuck inactive for 652.741812, current state creating, last 
acting []
pg 0.31 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.19 is stuck inactive for 652.741789, current state creating, last 
acting []
pg 0.11 is stuck inactive for 652.741797, current state creating, last 
acting []
pg 0.18 is stuck inactive for 652.741793, current state creating, last 
acting []
pg 0.1 is stuck inactive for 652.741820, current state creating, last 
acting []
pg 0.30 is stuck inactive for 652.741769, current state creating, last 
acting []
pg 0.17 is stuck inactive for 652.741797, current state creating, last 
acting []
pg 0.0 is stuck inactive for 652.741829, current state creating, last 
acting []
pg 0.2f is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.16 is stuck inactive for 652.741802, current state creating, last 
acting []
pg 0.12 is stuck inactive for 652.741807, current state creating, last 
acting []
pg 
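
The usual first checks for empty acting sets on a brand new cluster seem
to be whether any OSDs are actually up and in, and whether the CRUSH rule
can place the PGs at all, so this is what I'm gathering next (just
guessing at the usual suspects):

ceph osd stat            # how many OSDs exist, and how many are up/in
ceph osd tree            # are the OSDs under the root the default rule uses?
ceph osd crush rule dump # what the rule actually selects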

Re: [ceph-users] assert(objiter->second->version > last_divergent_update) when testing pull out disk and insert

2017-10-13 Thread Gregory Farnum
On Fri, Oct 13, 2017 at 12:48 AM, zhaomingyue  wrote:
> Hi:
> I hit an assert problem like
> bug 16279 (http://tracker.ceph.com/issues/16279) while testing pulling a disk
> out and reinserting it, on ceph version 10.2.5: assert(objiter->second->version >
> last_divergent_update)
>
> According to the OSD log, I think this may be due to (log.head !=
> *log.log.rbegin.version.version) when some abnormal condition happens, such
> as power off or pulling a disk out and reinserting it.

I don't think this is supposed to be possible. We apply all changes like
this atomically; FileStore does all of its journaling precisely to
prevent partial updates like this.

A few other people have reported the same issue on disk pull, so maybe
there's some *other* issue going on, but the correct fix is to prevent
those two from differing (unless I misunderstand the context).

Given one of the reporters on that ticket confirms they also had xfs
issues, I find it vastly more likely that something in your kernel
configuration and hardware stack is not writing out data the way it
claims to. Be very, very sure all that is working correctly!
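
As a quick first pass on each OSD host, something along these lines can
be telling (the device name is a placeholder, and whether a volatile
write cache is acceptable depends on your hardware):

hdparm -W /dev/sdX                       # is the drive's volatile write cache on?
dmesg | grep -i -e xfs -e 'i/o error'    # any filesystem/block-layer complaints?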


> In the situation below, merge_log would push 234'1034 onto the divergent list,
> the divergent list would have only one entry, and that leads to
> assert(objiter->second->version > last_divergent_update).
>
> olog     (0'0, 234'1034)  olog.head = 234'1034
>
> log      (0'0, 234'1034)  log.head = 234'1033
>
>
>
> I see osd load_pgs code,in function PGLog::read_log() , code like this:
>  .
>  for (p->seek_to_first(); p->valid() ; p->next()) {
>
> .
>
> log.log.push_back(e);
>
> log.head = e.version;  // every pg log node
>
>   }
>
> .
>
>  log.head = info.last_update;
>
>
>
> Two questions:
>
> First: why set (log.head = info.last_update) after all the pg log entries
> have been processed (every entry has already updated log.head = e.version)?
>
> Second: can it happen that info.last_update is less than
> *log.log.rbegin.version, and in what scenario would that occur?

I'm looking at the luminous code base right now and things have
changed a bit so I don't have the specifics of your question on hand.

But the general reason we change these versions around is because we
need to reconcile the logs across all OSDs. If one OSD has an entry
for an operation that was never returned to the client, we may need to
declare it divergent and undo it. (In replicated pools, entries are
only divergent if the OSD hosting it was either netsplit from the
primary, or else managed to commit something during a failure event
that its peers didn't and then was resubmitted under a different ID by
the client on recovery. In erasure-coded pools things are more
complicated because we can only roll operations forward if a quorum of
the shards are present.)
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] objects degraded higher than 100%

2017-10-13 Thread Gregory Farnum
On Fri, Oct 13, 2017 at 7:57 AM Florian Haas  wrote:

> On Thu, Oct 12, 2017 at 7:56 PM, Gregory Farnum 
> wrote:
> >
> >
> > On Thu, Oct 12, 2017 at 10:52 AM Florian Haas 
> wrote:
> >>
> >> On Thu, Oct 12, 2017 at 7:22 PM, Gregory Farnum 
> >> wrote:
> >> >
> >> >
> >> > On Thu, Oct 12, 2017 at 3:50 AM Florian Haas 
> >> > wrote:
> >> >>
> >> >> On Mon, Sep 11, 2017 at 8:13 PM, Andreas Herrmann 
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > how could this happen:
> >> >> >
> >> >> > pgs: 197528/1524 objects degraded (12961.155%)
> >> >> >
> >> >> > I did some heavy failover tests, but a value higher than 100% looks
> >> >> > strange
> >> >> > (ceph version 12.2.0). Recovery is quite slow.
> >> >> >
> >> >> >   cluster:
> >> >> > health: HEALTH_WARN
> >> >> > 3/1524 objects misplaced (0.197%)
> >> >> > Degraded data redundancy: 197528/1524 objects degraded
> >> >> > (12961.155%), 1057 pgs unclean, 1055 pgs degraded, 3 pgs undersized
> >> >> >
> >> >> >   data:
> >> >> > pools:   1 pools, 2048 pgs
> >> >> > objects: 508 objects, 1467 MB
> >> >> > usage:   127 GB used, 35639 GB / 35766 GB avail
> >> >> > pgs: 197528/1524 objects degraded (12961.155%)
> >> >> >  3/1524 objects misplaced (0.197%)
> >> >> >  1042 active+recovery_wait+degraded
> >> >> >  991  active+clean
> >> >> >  8active+recovering+degraded
> >> >> >  3active+undersized+degraded+remapped+backfill_wait
> >> >> >  2active+recovery_wait+degraded+remapped
> >> >> >  2active+remapped+backfill_wait
> >> >> >
> >> >> >   io:
> >> >> > recovery: 340 kB/s, 80 objects/s
> >> >>
> >> >> Did you ever get to the bottom of this? I'm seeing something very
> >> >> similar on a 12.2.1 reference system:
> >> >>
> >> >> https://gist.github.com/fghaas/f547243b0f7ebb78ce2b8e80b936e42c
> >> >>
> >> >> I'm also seeing an unusual MISSING_ON_PRIMARY count in "rados df":
> >> >> https://gist.github.com/fghaas/59cd2c234d529db236c14fb7d46dfc85
> >> >>
> >> >> The odd thing in there is that the "bench" pool was empty when the
> >> >> recovery started (that pool had been wiped with "rados cleanup"), so
> >> >> the number of objects deemed to be missing from the primary really
> >> >> ought to be zero.
> >> >>
> >> >> It seems like it's considering these deleted objects to still require
> >> >> replication, but that sounds rather far fetched to be honest.
> >> >
> >> >
> >> > Actually, that makes some sense. This cluster had an OSD down while
> >> > (some
> >> > of) the deletes were happening?
> >>
> >> I thought of exactly that too, but no it didn't. That's the problem.
> >
> >
> > Okay, in that case I've no idea. What was the timeline for the recovery
> > versus the rados bench and cleanup versus the degraded object counts,
> then?
>
> 1. Jewel deployment with filestore.
> 2. Upgrade to Luminous (including mgr deployment and "ceph osd
> require-osd-release luminous"), still on filestore.
> 3. rados bench with subsequent cleanup.
> 4. All OSDs up, all  PGs active+clean.
> 5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
> 6. Reinitialize OSD with bluestore.
> 7. Start OSD, commencing backfill.
> 8. Degraded objects above 100%.
>
> Please let me know if that information is useful. Thank you!
>

Hmm, that does leave me a little perplexed.

David, do we maybe do something with degraded counts based on the number of
objects identified in pg logs? Or some other heuristic for number of
objects that might be stale? That's the only way I can think of to get
these weird returning sets.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to get current min-compat-client setting

2017-10-13 Thread David Turner
I don't have access to a luminous cluster at the moment, but I would try
looking in the pg dump first. You could also try the crush map.

Worst case scenario you could set up a bunch of test clients and attempt to
connect them to your cluster.  You should be able to find which is the
oldest version it allows.  radosgw, ceph-fuse, or rbd-fuse from a given
ceph version should tell you whether or not that version will work
(choosing the package depending on what you use your cluster for).

You can also check `ceph features` to see which features your currently
connected clients support.
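
If memory serves, the setting also ends up in the osdmap, so something
like the following might show it directly (I can't verify on a live
luminous cluster right now, so treat this as a sketch):

ceph osd dump | grep min_compat_client   # require_min_compat_client / min_compat_client
ceph features                            # releases and features of connected clients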

On Fri, Oct 13, 2017 at 4:23 AM Hans van den Bogert 
wrote:

> Hi,
>
> I’m in the middle of debugging some incompatibilities with an upgrade of
> Proxmox which uses Ceph. At this point I’d like to know what my current
> value is for the min-compat-client setting, which would’ve been set by:
>
> ceph osd set-require-min-compat-client …
>
> AFAIK, there is no direct get-* variant of the above command. Does anybody
> know how I can retrieve the current setting with perhaps lower level
> commands/tools ?
>
> Thanks,
>
> Hans
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] windows server 2016 refs3.1 veeam syntetic backup with fast block clone

2017-10-13 Thread Jason Dillaman
I am assuming this is just essentially block reference counting in
ReFS 3.x. Since that would just be a metadata update within the
filesystem itself, Ceph should not need to be in-the-know for this to
just work as expected.

On Fri, Oct 13, 2017 at 3:08 AM, Ronny Aasen  wrote:
> greetings
>
> When using Windows Storage Spaces and ReFS 3.1, Veeam backups can use
> something called block clone to build synthetic backups and to reduce the
> time taken to back up VMs.
>
> I have used Windows Server 2016 with ReFS 3.1 on Ceph. My question is whether
> it is possible to get fast block clone and fast synthetic full backups when
> using ReFS on RBD on Ceph.
>
> I of course have other backup solutions, but this is specific to VMware
> backups.
>
>
> possible?
>
> kind regards
> Ronny Aasen
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS metadata pool to SSDs

2017-10-13 Thread Reed Dier
As always, appreciate the help and knowledge of the collective ML mind.

> If you aren't using DC SSDs and this is prod, then I wouldn't recommend 
> moving towards this model. 

These are Samsung SM863a’s and Micron 5100 MAXs, all roughly 6-12 months old, 
with the most worn drive showing 23 P/E cycles so far.

Thanks again,

Reed
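
For anyone finding this thread later, the change I'm testing looks roughly 
like the following, assuming Luminous device classes (the rule name is just 
a placeholder; older releases use crush_ruleset with a numeric id instead):

# replicated rule that only selects OSDs whose device class is "ssd"
ceph osd crush rule create-replicated fs-metadata-ssd default host ssd

# point the metadata pool at the new rule
ceph osd pool set fs-metadata crush_rule fs-metadata-ssd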

> On Oct 12, 2017, at 4:18 PM, John Spray  wrote:
> 
> On Thu, Oct 12, 2017 at 9:34 PM, Reed Dier  wrote:
>> I found an older ML entry from 2015 and not much else, mostly detailing the
>> doing performance testing to dispel poor performance numbers presented by
>> OP.
>> 
>> Currently have the metadata pool on my slow 24 HDDs, and am curious if I
>> should see any increased performance with CephFS by moving the metadata pool
>> onto SSD medium.
> 
> It depends a lot on the workload.
> 
> The primary advantage of moving metadata to dedicated drives
> (especially SSDs) is that it makes the system more deterministic under
> load.  The most benefit will be seen on systems which had previously
> had shared HDD OSDs that were fully saturated with data IO, and were
> consequently suffering from very slow metadata writes.
> 
> The impact will also depend on whether the metadata workload fit in
> the mds_cache_size or not: if the MDS is frequently missing its cache
> then the metadata pool latency will be more important.
> 
> On systems with plenty of spare IOPs, with non-latency-sensitive
> workloads, one might see little or no difference in performance when
> using SSDs, as those systems would typically bottleneck on the number
> of operations per second the MDS daemon can handle (CPU bound).  Systems like that
> would benefit more from multiple MDS daemons.
> 
> Then again, systems with plenty of spare IOPs can quickly become
> congested during recovery/backfill scenarios, so having SSDs for
> metadata is a nice risk mitigation to keep the system more responsive
> during bad times.
> 
>> My thought is that the SSDs are lower latency, and it removes those iops
>> from the slower spinning disks.
>> 
>> My next concern would be write amplification on the SSDs. Would this thrash
>> the SSD lifespan with tons of little writes or should it not be too heavy of
>> a workload to matter too much?
> 
> The MDS is comparatively efficient in how it writes out metadata:
> journal writes get batched up into larger IOs, and if something is
> frequently modified then it doesn't get written back every time (just
> when it falls off the end of the journal, or periodically).
> 
> If you've got SSDs that you're confident enough to use for data or
> general workloads, I wouldn't be too worried about using them for
> CephFS metadata.
> 
>> My last question from the operations standpoint, if I use:
>> # ceph osd pool set fs-metadata crush_ruleset 
>> Will this just start to backfill the metadata pool over to the SSDs until it
>> satisfies the crush requirements for size and failure domains and not skip a
>> beat?
> 
> On a healthy cluster, yes, this should just work.  The level of impact
> you see will depend on how much else you're trying to do with the
> system.  The prioritization of client IO vs. backfill IO has been
> improved in luminous, so you should use luminous if you can.
> 
> Because the overall size of the metadata pool is small, the smart
> thing is probably to find a time that is quiet for your system, and do
> the crush rule change at that time to get it over with quickly, rather
> than trying to do it during normal operations.
> 
> Cheers,
> John
> 
>> 
>> Obviously things like enabling dirfrags, and multiple MDS ranks will be more
>> likely to improve performance with CephFS, but the metadata pool uses very
>> little space, and I have the SSDs already, so I figured I would explore it
>> as an option.
>> 
>> Thanks,
>> 
>> Reed
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] objects degraded higher than 100%

2017-10-13 Thread Florian Haas
On Thu, Oct 12, 2017 at 7:56 PM, Gregory Farnum  wrote:
>
>
> On Thu, Oct 12, 2017 at 10:52 AM Florian Haas  wrote:
>>
>> On Thu, Oct 12, 2017 at 7:22 PM, Gregory Farnum 
>> wrote:
>> >
>> >
>> > On Thu, Oct 12, 2017 at 3:50 AM Florian Haas 
>> > wrote:
>> >>
>> >> On Mon, Sep 11, 2017 at 8:13 PM, Andreas Herrmann 
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > how could this happen:
>> >> >
>> >> > pgs: 197528/1524 objects degraded (12961.155%)
>> >> >
>> >> > I did some heavy failover tests, but a value higher than 100% looks
>> >> > strange
>> >> > (ceph version 12.2.0). Recovery is quite slow.
>> >> >
>> >> >   cluster:
>> >> > health: HEALTH_WARN
>> >> > 3/1524 objects misplaced (0.197%)
>> >> > Degraded data redundancy: 197528/1524 objects degraded
>> >> > (12961.155%), 1057 pgs unclean, 1055 pgs degraded, 3 pgs undersized
>> >> >
>> >> >   data:
>> >> > pools:   1 pools, 2048 pgs
>> >> > objects: 508 objects, 1467 MB
>> >> > usage:   127 GB used, 35639 GB / 35766 GB avail
>> >> > pgs: 197528/1524 objects degraded (12961.155%)
>> >> >  3/1524 objects misplaced (0.197%)
>> >> >  1042 active+recovery_wait+degraded
>> >> >  991  active+clean
>> >> >  8active+recovering+degraded
>> >> >  3active+undersized+degraded+remapped+backfill_wait
>> >> >  2active+recovery_wait+degraded+remapped
>> >> >  2active+remapped+backfill_wait
>> >> >
>> >> >   io:
>> >> > recovery: 340 kB/s, 80 objects/s
>> >>
>> >> Did you ever get to the bottom of this? I'm seeing something very
>> >> similar on a 12.2.1 reference system:
>> >>
>> >> https://gist.github.com/fghaas/f547243b0f7ebb78ce2b8e80b936e42c
>> >>
>> >> I'm also seeing an unusual MISSING_ON_PRIMARY count in "rados df":
>> >> https://gist.github.com/fghaas/59cd2c234d529db236c14fb7d46dfc85
>> >>
>> >> The odd thing in there is that the "bench" pool was empty when the
>> >> recovery started (that pool had been wiped with "rados cleanup"), so
>> >> the number of objects deemed to be missing from the primary really
>> >> ought to be zero.
>> >>
>> >> It seems like it's considering these deleted objects to still require
>> >> replication, but that sounds rather far fetched to be honest.
>> >
>> >
>> > Actually, that makes some sense. This cluster had an OSD down while
>> > (some
>> > of) the deletes were happening?
>>
>> I thought of exactly that too, but no it didn't. That's the problem.
>
>
> Okay, in that case I've no idea. What was the timeline for the recovery
> versus the rados bench and cleanup versus the degraded object counts, then?

1. Jewel deployment with filestore.
2. Upgrade to Luminous (including mgr deployment and "ceph osd
require-osd-release luminous"), still on filestore.
3. rados bench with subsequent cleanup.
4. All OSDs up, all  PGs active+clean.
5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
6. Reinitialize OSD with bluestore.
7. Start OSD, commencing backfill.
8. Degraded objects above 100%.

Please let me know if that information is useful. Thank you!
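
In case the exact commands matter: step 5 was essentially the usual removal
sequence (from memory rather than shell history), with N standing in for the
OSD id:

systemctl stop ceph-osd@N
ceph osd crush remove osd.N
ceph auth del osd.N
ceph osd rm N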

Cheers,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFs kernel client metadata caching

2017-10-13 Thread Burkhard Linke

Hi,


On 10/13/2017 02:26 PM, Denes Dolhay wrote:

Hi,


Thank you for your fast response!


Is there a way -that You know of- to list these locks?
The only way I know of is dumping the MDS cache content. But I don't 
know exactly how to do it or how to analyse the content.


I write to the file with echo "foo" >> /mnt/ceph/...something... so if 
there is any locking, should not it be released after the append is done?


That's the capability for the file, but there are also capabilities 
for the directory itself. And capabilities are more complex than 
read/write locks.



The strange thing is, that this -increased traffic- stage went on for 
hours, tried many times, and after I stop the watch for ~5s (not tried 
different intervals) and restart it, the traffic is gone, and there is 
normal -I think some keepalive- comm between mds and client, two 
packets in ~5s (request, response)
I'm just guessing, but at that time both clients should have 
capabilities for the same directory. Maybe the client was checking 
whether it needs to change its capability?



As if the metadata cache would only be populated in a timer, (between 
1s and 5s) which is never reached because of the repeated watch ls 
query  just a blind shot in the dark...


You can increase the debugging level of the MDS. This should give you 
much more information about what's going on, what kind of requests are 
passed between the MDS and the clients etc.
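
If you want to poke at it, something along these lines should dump the cache 
and raise the logging on the active MDS (the daemon name is a placeholder; I 
haven't analysed such a dump myself):

ceph daemon mds.mds1 dump cache /tmp/mds_cache.txt   # dump cache content to a file
ceph daemon mds.mds1 config set debug_mds 10         # temporarily more verbose MDS log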


Regards,
burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFs kernel client metadata caching

2017-10-13 Thread Denes Dolhay

Hi,


Thank you for your fast response!


Is there a way -that You know of- to list these locks?

I write to the file with echo "foo" >> /mnt/ceph/...something... so if 
there is any locking, should not it be released after the append is done?



The strange thing is, that this -increased traffic- stage went on for 
hours, tried many times, and after I stop the watch for ~5s (not tried 
different intervals) and restart it, the traffic is gone, and there is 
normal -I think some keepalive- comm between mds and client, two packets 
in ~5s (request, response)



As if the metadata cache would only be populated in a timer, (between 1s 
and 5s) which is never reached because of the repeated watch ls query 
 just a blind shot in the dark...



Thanks:

Denes.


On 10/13/2017 01:32 PM, Burkhard Linke wrote:

Hi,


On 10/13/2017 12:36 PM, Denes Dolhay wrote:

Dear All,


First of all, this is my first post, so please be lenient :)


For the last few days I have been testing ceph, and cephfs, deploying 
a PoC cluster.


I have been testing the cephfs kernel client caching, when I came 
across something strange, and I cannot decide if it is a bug or I 
just messed up something.



Steps, given client1 and client2 both mounted the same cephfs, extra 
mount option, noatime:



Client 1: watch -n 1 ls -lah /mnt/cephfs

-in tcpdump I can see that the directory is being listed once and 
only once, all the following ls requests are served from the client 
cache



Client 2: make any modification for example append to a file, or 
delete a file directly under /mnt/cephfs


-The operation is done, and client1 is informed about the change OK.

-Client1 does not seem to cache the new metadata information received 
from the metadata server, now it communicates every second with the mds.



Client 1: stop watch ls... command, wait a few sec and restart it

-The communication stops, client1 serves ls data from cache


Please help, if it is intentional then why, if not, how can I debug it?


This is probably the intended behaviour. CephFS is a posix compliant 
filesystem, and uses capabilities (similar to locks) to control 
concurrent access to directories and files.


In your first step, a capability for directory access is granted to 
client1. As soon as client2 wants to access the directory (probably 
read-only first for listing, write access later), the MDS has to check 
the capability requests with client1. I'm not sure about the details, 
but something similar to "write lock" should be granted to client2, 
and client1 is granted a read lock or a "I have this entry in cache 
and need the MDS to know it" lock. That's also the reason why client1 
has to ask the MDS every second whether its cache content is still 
valid. client2 probably still holds the necessary capabilities, so you 
might also see some traffic between MDS and client2.


I'm not sure why client1 does not continue to ask the MDS in the last 
step. Maybe the capability in client2 has expired and it was granted 
to client1. Others with more insight into the details of capabilities 
might be able to give you more details.


Short version: CephFS has a strict posix locking semantic implemented 
by capabilities, and you need to be aware of this fact (especially if 
you are used to NFS...)


Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFs kernel client metadata caching

2017-10-13 Thread Burkhard Linke

Hi,


On 10/13/2017 12:36 PM, Denes Dolhay wrote:

Dear All,


First of all, this is my first post, so please be lenient :)


For the last few days I have been testing ceph, and cephfs, deploying 
a PoC cluster.


I have been testing the cephfs kernel client caching, when I came 
across something strange, and I cannot decide if it is a bug or I just 
messed up something.



Steps, given client1 and client2 both mounted the same cephfs, extra 
mount option, noatime:



Client 1: watch -n 1 ls -lah /mnt/cephfs

-in tcpdump I can see that the directory is being listed once and only 
once, all the following ls requests are served from the client cache



Client 2: make any modification for example append to a file, or 
delete a file directly under /mnt/cephfs


-The operation is done, and client1 is informed about the change OK.

-Client1 does not seem to cache the new metadata information received 
from the metadata server, now it communicates every second with the mds.



Client 1: stop watch ls... command, wait a few sec and restart it

-The communication stops, client1 serves ls data from cache


Please help, if it is intentional then why, if not, how can I debug it?


This is probably the intended behaviour. CephFS is a posix compliant 
filesystem, and uses capabilities (similar to locks) to control 
concurrent access to directories and files.


In your first step, a capability for directory access is granted to 
client1. As soon as client2 wants to access the directory (probably 
read-only first for listing, write access later), the MDS has to check 
the capability requests with client1. I'm not sure about the details, 
but something similar to "write lock" should be granted to client2, and 
client1 is granted a read lock or a "I have this entry in cache and need 
the MDS to know it" lock. That's also the reason why client1 has to ask 
the MDS every second whether its cache content is still valid. client2 
probably still holds the necessary capabilities, so you might also see 
some traffic between MDS and client2.


I'm not sure why client1 does not continue to ask the MDS in the last 
step. Maybe the capability in client2 has expired and it was granted to 
client1. Others with more insight into the details of capabilities might 
be able to give you more details.


Short version: CephFS has a strict posix locking semantic implemented by 
capabilities, and you need to be aware of this fact (especially if you 
are used to NFS...)


Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFs kernel client metadata caching

2017-10-13 Thread Denes Dolhay

Dear All,


First of all, this is my first post, so please be lenient :)


For the last few days I have been testing ceph, and cephfs, deploying a 
PoC cluster.


I have been testing the cephfs kernel client caching, when I came across 
something strange, and I cannot decide if it is a bug or I just messed 
up something.



Steps, given client1 and client2 both mounted the same cephfs, extra 
mount option, noatime:



Client 1: watch -n 1 ls -lah /mnt/cephfs

-in tcpdump I can see that the directory is being listed once and only 
once, all the following ls requests are served from the client cache



Client 2: make any modification for example append to a file, or delete 
a file directly under /mnt/cephfs


-The operation is done, and client1 is informed about the change OK.

-Client1 does not seem to cache the new metadata information received 
from the metadata server, now it communicates every second with the mds.



Client 1: stop watch ls... command, wait a few sec and restart it

-The communication stops, client1 serves ls data from cache


Please help, if it is intentional then why, if not, how can I debug it?

Where can I find documentation about the kernel client and its caching 
strategy?



Thank You!



Ceph cluster version: Luminous on all nodes

Client: tested on ubuntu xenial - 4.4.0-96-generic #119-Ubuntu - all 
settings default, except noatime


Client: tested on Fedora 26 - 4.13.5-200.fc26.x86_64+debug - all 
settings default, except noatime





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] assertion error trying to start mds server

2017-10-13 Thread John Spray
On Fri, Oct 13, 2017 at 4:13 AM, Bill Sharer  wrote:
> After your comment about the dual mds servers I decided to just give up
> trying to get the second restarted.  After eyeballing what I had on one
> of the new Ryzen boxes for drive space, I decided to just dump the
> filesystem.  That will also make things go faster if and when I flip
> everything over to bluestore.  So far so good...  I just took a peek and
> saw the files being owned by Mr root though.  Is there going to be an
> ownership reset at some point or will I have to resolve that by hand?

You'll have to do that by hand I'm afraid -- the tool only works on
what it gets in the data pool, which is just the path and the layout.
The rest of the metadata is all lost.  This also includes any xattrs,
acls, etc.

John
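
If it helps, a blunt way to do the ownership part on the recovered tree is
something like this (owner, group and path are placeholders):

chown -R someuser:somegroup /mnt/recovered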

>
> On 10/12/2017 06:09 AM, John Spray wrote:
>> On Thu, Oct 12, 2017 at 12:23 AM, Bill Sharer  wrote:
>>> I was wondering if I can't get the second mds back up That offline
>>> backward scrub check sounds like it should be able to also salvage what
>>> it can of the two pools to a normal filesystem.  Is there an option for
>>> that or has someone written some form of salvage tool?
>> Yep, cephfs-data-scan can do that.
>>
>> To scrape the files out of a CephFS data pool to a local filesystem, do this:
>> cephfs-data-scan scan_extents   # this is discovering
>> all the file sizes
>> cephfs-data-scan scan_inodes --output-dir /tmp/my_output 
>>
>> The time taken by both these commands scales linearly with the number
>> of objects in your data pool.
>>
>> This tool may not see the correct filename for recently created files
>> (any file whose metadata is in the journal but not flushed), these
>> files will go into a lost+found directory, named after their inode
>> number.
>>
>> John
>>
>>> On 10/11/2017 07:07 AM, John Spray wrote:
 On Wed, Oct 11, 2017 at 1:42 AM, Bill Sharer  
 wrote:
> I've been in the process of updating my gentoo based cluster both with
> new hardware and a somewhat postponed update.  This includes some major
> stuff including the switch from gcc 4.x to 5.4.0 on existing hardware
> and using gcc 6.4.0 to make better use of AMD Ryzen on the new
> hardware.  The existing cluster was on 10.2.2, but I was going to
> 10.2.7-r1 as an interim step before moving on to 12.2.0 to begin
> transitioning to bluestore on the osd's.
>
> The Ryzen units are slated to be bluestore based OSD servers if and when
> I get to that point.  Up until the mds failure, they were simply cephfs
> clients.  I had three OSD servers updated to 10.2.7-r1 (one is also a
> MON) and had two servers left to update.  Both of these are also MONs
> and were acting as a pair of dual active MDS servers running 10.2.2.
> Monday morning I found out the hard way that an UPS one of them was on
> has a dead battery.  After I fsck'd and came back up, I saw the
> following assertion error when it was trying to start it's mds.B server:
>
>
>  mdsbeacon(64162/B up:replay seq 3 v4699) v7  126+0+0 (709014160
> 0 0) 0x7f6fb4001bc0 con 0x55f94779d
> 8d0
>  0> 2017-10-09 11:43:06.935662 7f6fa9ffb700 -1 mds/journal.cc: In
> function 'virtual void EImportStart::r
> eplay(MDSRank*)' thread 7f6fa9ffb700 time 2017-10-09 11:43:06.934972
> mds/journal.cc: 2929: FAILED assert(mds->sessionmap.get_version() == 
> cmapv)
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x82) [0x55f93d64a122]
>  2: (EImportStart::replay(MDSRank*)+0x9ce) [0x55f93d52a5ce]
>  3: (MDLog::_replay_thread()+0x4f4) [0x55f93d4a8e34]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x55f93d25bd4d]
>  5: (()+0x74a4) [0x7f6fd009b4a4]
>  6: (clone()+0x6d) [0x7f6fce5a598d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
>
> --- logging levels ---
>0/ 5 none
>0/ 1 lockdep
>0/ 1 context
>1/ 1 crush
>1/ 5 mds
>1/ 5 mds_balancer
>1/ 5 mds_locker
>1/ 5 mds_log
>1/ 5 mds_log_expire
>1/ 5 mds_migrator
>0/ 1 buffer
>0/ 1 timer
>0/ 1 filer
>0/ 1 striper
>0/ 1 objecter
>0/ 5 rados
>0/ 5 rbd
>0/ 5 rbd_mirror
>0/ 5 rbd_replay
>0/ 5 journaler
>0/ 5 objectcacher
>0/ 5 client
>0/ 5 osd
>0/ 5 optracker
>0/ 5 objclass
>1/ 3 filestore
>1/ 3 journal
>0/ 5 ms
>1/ 5 mon
>0/10 monc
>1/ 5 paxos
>0/ 5 tp
>1/ 5 auth
>1/ 5 crypto
>1/ 1 finisher
>1/ 5 heartbeatmap
>1/ 5 perfcounter
>1/ 5 rgw
>1/10 civetweb
>1/ 5 javaclient
>1/ 5 asok
>1/ 1 

[ceph-users] How to get current min-compat-client setting

2017-10-13 Thread Hans van den Bogert
Hi, 

I’m in the middle of debugging some incompatibilities with an upgrade of 
Proxmox which uses Ceph. At this point I’d like to know what my current value 
is for the min-compat-client setting, which would’ve been set by:

ceph osd set-require-min-compat-client …

AFAIK, there is no direct get-* variant of the above command. Does anybody know 
how I can retrieve the current setting with perhaps lower level commands/tools ?

Thanks, 

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] assert(objiter->second->version > last_divergent_update) when testing pull out disk and insert

2017-10-13 Thread zhaomingyue
Hi:
I hit an assert problem like 
bug 16279 (http://tracker.ceph.com/issues/16279) while testing pulling a disk 
out and reinserting it, on ceph version 10.2.5: assert(objiter->second->version > 
last_divergent_update)
According to the OSD log, I think this may be due to (log.head != 
*log.log.rbegin.version.version) when some abnormal condition happens, such as 
power off or pulling a disk out and reinserting it.
In the situation below, merge_log would push 234'1034 onto the divergent list, 
the divergent list would have only one entry, and that leads to 
assert(objiter->second->version > last_divergent_update).
olog     (0'0, 234'1034)  olog.head = 234'1034
log      (0'0, 234'1034)  log.head = 234'1033

I see osd load_pgs code,in function PGLog::read_log() , code like this:
 .
 for (p->seek_to_first(); p->valid() ; p->next()) {
.
log.log.push_back(e);
log.head = e.version;  // every pg log node
  }
.
 log.head = info.last_update;

Two questions:
First: why set (log.head = info.last_update) after all the pg log entries have 
been processed (every entry has already updated log.head = e.version)?
Second: can it happen that info.last_update is less than 
*log.log.rbegin.version, and in what scenario would that occur?

Looking forward to your reply! Thanks.

-
本邮件及其附件含有新华三技术有限公司的保密信息,仅限于发送给上面地址中列出
的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
邮件!
This e-mail and its attachments contain confidential information from New H3C, 
which is
intended only for the person or entity whose address is listed above. Any use 
of the
information contained herein in any way (including, but not limited to, total 
or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify 
the sender
by phone or email immediately and delete it!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephx

2017-10-13 Thread Ashley Merrick
Seems like it is possible with some "workaround" : 
https://blog-fromsomedude.rhcloud.com/2016/04/26/Allowing-a-RBD-client-to-map-only-one-RBD/





Is this still the only / best way, given that the blog post is over a year old?


,Ashley
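
For the archives, my reading of that post boils down to caps along these
lines (the image name and prefix below are made-up placeholders; the real
prefix comes from rbd info, so double-check against your own image):

# find the image's internal block name prefix
rbd info rbd/vm1 | grep block_name_prefix

# assuming it came back as rbd_data.1234567890ab:
ceph auth get-or-create client.vm1 mon 'allow r' osd 'allow rwx object_prefix rbd_data.1234567890ab; allow rwx object_prefix rbd_header.1234567890ab; allow rx object_prefix rbd_id.vm1'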


From: Ashley Merrick
Sent: 13 October 2017 07:54:27
To: Shinobu Kinjo
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] cephx


Hello,


http://docs.ceph.com/docs/master/rados/operations/user-management/


>From this page for example the following line : ceph auth add client.john mon 
>'allow r' osd 'allow rw pool=liverpool'


Mentions about limiting pool access but not RBD/Image level access (Unless I am 
missing it)


,Ashley


From: Shinobu Kinjo 
Sent: 13 October 2017 07:41
To: Ashley Merrick
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] cephx

On Fri, Oct 13, 2017 at 3:29 PM, Ashley Merrick  wrote:
> Hello,
>
>
> Is it possible to limit a cephx user to one image?
>
>
> I have looked and seems it's possible per a pool, but can't find a per image
> option.

What did you look at?

Best regards,
Shinobu Kinjo

>
>
> ,Ashley
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] windows server 2016 refs3.1 veeam syntetic backup with fast block clone

2017-10-13 Thread Ronny Aasen

greetings

When using Windows Storage Spaces and ReFS 3.1, Veeam backups can use 
something called block clone to build synthetic backups and to reduce 
the time taken to back up VMs.


I have used Windows Server 2016 with ReFS 3.1 on Ceph. My question is whether 
it is possible to get fast block clone and fast synthetic full backups 
when using ReFS on RBD on Ceph.


I of course have other backup solutions, but this is specific to VMware 
backups.



possible?

kind regards
Ronny Aasen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephx

2017-10-13 Thread Ashley Merrick
Hello,


http://docs.ceph.com/docs/master/rados/operations/user-management/


>From this page for example the following line : ceph auth add client.john mon 
>'allow r' osd 'allow rw pool=liverpool'


Mentions about limiting pool access but not RBD/Image level access (Unless I am 
missing it)


,Ashley


From: Shinobu Kinjo 
Sent: 13 October 2017 07:41
To: Ashley Merrick
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] cephx

On Fri, Oct 13, 2017 at 3:29 PM, Ashley Merrick  wrote:
> Hello,
>
>
> Is it possible to limit a cephx user to one image?
>
>
> I have looked and seems it's possible per a pool, but can't find a per image
> option.

What did you look at?

Best regards,
Shinobu Kinjo

>
>
> ,Ashley
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephx

2017-10-13 Thread Shinobu Kinjo
On Fri, Oct 13, 2017 at 3:29 PM, Ashley Merrick  wrote:
> Hello,
>
>
> Is it possible to limit a cephx user to one image?
>
>
> I have looked and seems it's possible per a pool, but can't find a per image
> option.

What did you look at?

Best regards,
Shinobu Kinjo

>
>
> ,Ashley
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephx

2017-10-13 Thread Ashley Merrick
Hello,


Is it possible to limit a cephx user to one image?


I have looked and it seems it's possible per pool, but I can't find a per-image 
option.


,Ashley
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com