Re: [ceph-users] ceph-volume lvm create leaves half-built OSDs lying around [EXT]

2019-09-11 Thread Matthew Vernon
On 11/09/2019 12:18, Alfredo Deza wrote:
> On Wed, Sep 11, 2019 at 6:18 AM Matthew Vernon  wrote:

>> or
>> ii) allow the bootstrap-osd credential to purge OSDs
> 
> I wasn't aware that the bootstrap-osd credentials allowed to
> purge/destroy OSDs, are you sure this is possible? If it is I think
> that would be reasonable to try.

Sorry, that was my point - currently, the bootstrap-osd credential
isn't allowed to purge/destroy OSDs, but we could decide that the
correct fix is to change that so it can. I'm not convinced that's a good
idea, though!
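
(For the record, the change would presumably be something along these
lines - an untested sketch only, using the mon "allow command" cap syntax:

ceph auth caps client.bootstrap-osd \
    mon 'allow profile bootstrap-osd, allow command "osd purge"'

...which is precisely the sort of broadening of that key I'd rather avoid.)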

Regards,

Matthew





[ceph-users] ceph-volume lvm create leaves half-built OSDs lying around

2019-09-11 Thread Matthew Vernon
Hi,

We keep finding part-made OSDs (they appear not attached to any host,
and down and out; but still counting towards the number of OSDs); we
never saw this with ceph-disk. On investigation, this is because
ceph-volume lvm create makes the OSD (ID and auth at least) too early in
the process and is then unable to roll back cleanly (because the
bootstrap-osd credential isn't allowed to remove OSDs).

As an example (very truncated):

Running command: /usr/bin/ceph --cluster ceph --name
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
-i - osd new 20cea174-4c1b-4330-ad33-505a03156c33
Running command: vgcreate --force --yes
ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
 stderr: Device /dev/sdbh not found (or ignored by filtering).
  Unable to add physical volume '/dev/sdbh' to volume group
'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.828 --yes-i-really-mean-it
 stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find
a keyring on
/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
(2) No such file or directory
 stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient:
authenticate NOTE: no keyring found; disabled cephx authentication
2019-09-10 15:07:53.397334 7fbca2caf700  0 librados: client.admin
authentication error (95) Operation not supported

This is annoying to have to clear up, and it seems to me it could be
avoided by either:

i) ceph-volume should (attempt to) set up the LVM volumes before
making the new OSD id
or
ii) allow the bootstrap-osd credential to purge OSDs

i) seems like clearly the better answer...?
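
(For reference, clearing up one of these part-made OSDs means running
something like the following from a node with the admin keyring, with the
OSD id taken from the failed run:

ceph osd purge 828 --yes-i-really-mean-it

which removes the OSD id, its auth entity, and its CRUSH entry in one go -
exactly the step the rollback can't do as client.bootstrap-osd.)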

Regards,

Matthew





Re: [ceph-users] Adventures with large RGW buckets [EXT]

2019-08-02 Thread Matthew Vernon
Hi,

On 02/08/2019 13:23, Lars Marowsky-Bree wrote:
> On 2019-08-01T15:20:19, Matthew Vernon  wrote:
> 
>> One you don't mention is that multipart uploads break during resharding - so
>> if our users are filling up a bucket with many writers uploading multipart
>> objects, some of these will fail (rather than blocking) when the bucket is
>> resharded.
> 
> Is that on the tracker? I couldn't find it. If you can reproduce, would
> you add that please?

Not as yet - our support vendor has reproduced the issue, so I'll ask
them to open a ticket on the tracker (I mean, I could, but their
reproducer is probably neater than mine :) ).

Regards,

Matthew




Re: [ceph-users] Adventures with large RGW buckets [EXT]

2019-08-01 Thread Matthew Vernon

Hi,

On 31/07/2019 19:02, Paul Emmerich wrote:

Some interesting points here, thanks for raising them :)


> From our experience: buckets with tens of million objects work just fine with
> no big problems usually. Buckets with hundreds of million objects require some
> attention. Buckets with billions of objects? "How about indexless buckets?" -
> "No, we need to list them".


We've had some problems with large buckets (from around the 70M-object
mark).


One you don't mention is that multipart uploads break during resharding 
- so if our users are filling up a bucket with many writers uploading 
multipart objects, some of these will fail (rather than blocking) when 
the bucket is resharded.



> 1. The recommended number of objects per shard is 100k. Why? How was this
> default configuration derived?


I don't know what a good number is, but by the time you get into O(10M) 
objects, some sharding does seem to help - we've found a particular OSD 
getting really hammered by heavy updates on large buckets (in Jewel, 
before we had online resharding).
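
For anyone wanting to shard an existing bucket by hand, the offline
command is along these lines (bucket name and shard count hypothetical;
needs a sufficiently recent Jewel, if memory serves):

radosgw-admin bucket reshard --bucket=bigbucket --num-shards=128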



> 3. Deleting large buckets
>
> Someone accidentally put 450 million small objects into a bucket and only
> noticed when the cluster ran full. The bucket isn't needed, so just delete
> it and case closed?
>
> Deleting is unfortunately far slower than adding objects; also
> radosgw-admin leaks memory during deletion:


We've also seen bucket deletion via radosgw-admin failing because of 
oddities in the bucket itself (e.g. missing shadow objects, omap objects 
that still exist when the related object is gone); sorting that was a 
bit fiddly (with some help from Canonical, who I think are working on 
patches).



> Increasing --max-concurrent-ios helps with deletion speed (the option does
> affect deletion concurrency; the documentation says it's only for other
> specific commands).


Yes, we found increasing max-concurrent-ios helped.
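
i.e. an invocation along these lines (bucket name and ios value
hypothetical):

radosgw-admin bucket rm --bucket=bigbucket --purge-objects \
    --max-concurrent-ios=128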

Regards,

Matthew




Re: [ceph-users] Ceph Health Check error ( havent seen before ) [EXT]

2019-07-30 Thread Matthew Vernon

On 29/07/2019 23:24, Brent Kennedy wrote:
> Apparently sent my email too quickly. I had to install python-pip on
> the mgr nodes and run "pip install requests==2.6.0" to fix the missing
> module and then reboot all three monitors. Now the dashboard enables no
> issue.


I'm a bit confused as to why installing the python-requests package
wasn't the correct answer? (16.04 has 2.9.1.)


Regards,

Matthew





Re: [ceph-users] How to add 100 new OSDs...

2019-07-25 Thread Matthew Vernon

On 24/07/2019 20:06, Paul Emmerich wrote:

> +1 on adding them all at the same time.
>
> All these methods that gradually increase the weight aren't really
> necessary in newer releases of Ceph.


FWIW, we added a rack-full (9x60 = 540 OSDs) in one go to our production 
cluster (then running Jewel) taking it from 2520 to 3060 OSDs and it 
wasn't a big issue.


Regards,

Matthew





Re: [ceph-users] memory usage of: radosgw-admin bucket rm [EXT]

2019-07-11 Thread Matthew Vernon

On 11/07/2019 15:40, Paul Emmerich wrote:

> Is there already a tracker issue?
>
> I'm seeing the same problem here. Started deletion of a bucket with a
> few hundred million objects a week ago or so and I've now noticed that
> it's also leaking memory and probably going to crash.
>
> Going to investigate this further...


We had a bucket rm on a machine that OOM'd (and killed the relevant 
process), but I wasn't watching at the time to see if it was the thing 
eating all the RAM.


If someone's giving the bucket rm code some love, it'd be nice if
https://tracker.ceph.com/issues/40587 (and associated PR) got looked at 
- missing shadow objects shouldn't really cause a bucket rm to give up...


Regards,

Matthew




Re: [ceph-users] Even more objects in a single bucket?

2019-06-17 Thread Matthew Vernon
Hi,

On 17/06/2019 16:00, Harald Staub wrote:
> There are customers asking for 500 million objects in a single object
> storage bucket (i.e. 5000 shards), but also more. But we found some
> places that say that there is a limit in the number of shards per
> bucket, e.g.

Our largest bucket was about 70 million objects (about 1.3PB of data),
and we're currently deleting it (via radosgw-admin) since the relevant
users don't want it any more. It's going to take a few weeks...

I'd expect any operation that needed to list a bucket with so many
objects to be very slow...

Regards,

Matthew





Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO [EXT]

2019-05-14 Thread Matthew Vernon
On 14/05/2019 00:36, Tarek Zegar wrote:
> It's not just mimic to nautilus
> I confirmed with luminous to mimic
>  
> They are checking for clean pgs with flags set, they should unset flags,
> then check. Set flags again, move on to next osd

I think I'm inclined to agree that "norebalance" is likely to get in the
way when upgrading a cluster - our rolling upgrade playbook omits it.

OTOH, you might want to raise this on the ceph-ansible list (
ceph-ansi...@lists.ceph.com ) and/or as a github issue - I don't think
the ceph-ansible maintainers routinely watch this list.

HTH,

Matthew




Re: [ceph-users] radosgw index all keys in all buckets [EXT]

2019-05-13 Thread Matthew Vernon
Hi,

On 02/05/2019 22:00, Aaron Bassett wrote:

> With these caps I'm able to use a python radosgw-admin lib to list
> buckets and acls and users, but not keys. This user is also unable to
> read buckets and/or keys through the normal s3 api. Is there a way to
> create an s3 user that has read access to all buckets and keys
> without explicitly being granted acls?
I think you might want the --system argument to radosgw-admin user modify?
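
i.e. something like (uid hypothetical):

radosgw-admin user modify --uid=someuser --system

Do note that a system user can (I believe) operate on all buckets and keys
via the S3 API, so guard the resulting credentials carefully.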

Regards,

Matthew




Re: [ceph-users] collectd problems with pools

2019-02-28 Thread Matthew Vernon

Hi,

On 28/02/2019 17:00, Marc Roos wrote:


> Should you not be pasting that as an issue on github collectd-ceph? I
> hope you don't mind me asking, I am also using collectd and dumping the
> data to influx. Are you downsampling with influx? ( I am not :/ [0])


It might be that "ask the collectd-ceph authors nicely" is the answer, but
I figured I'd ask here first, since there might be a solution available
already.


Also, given collectd-ceph works currently by asking the various daemons 
about their perf data, there's not an obvious analogue for pool-related 
metrics, since there isn't a daemon socket to poke in the same manner.


We use graphite/carbon as our data store, so no, nothing influx-related 
(we're trying to get rid of our last few uses of influxdb here).


Regards,

Matthew





[ceph-users] collectd problems with pools

2019-02-28 Thread Matthew Vernon

Hi,

We monitor our Ceph clusters (production is Jewel, test clusters are on 
Luminous) with collectd and its official ceph plugin.


The one thing that's missing is per-pool outputs - the collectd plugin 
just talks to the individual daemons, none of which have pool details in 
- those are available via


ceph osd pool stats -f json

...which I could wrap to emit collectd metrics, but surely this is an 
already-invented wheel?
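
To illustrate what I mean, something like this (an untested sketch for
collectd's exec plugin, using jq; the metric naming is invented) would do
for one metric:

#!/bin/sh
# Emit one collectd PUTVAL line per pool with the client read rate,
# as reported by "ceph osd pool stats".
HOST=$(hostname -f)
INTERVAL=60
while sleep "$INTERVAL"; do
  ceph osd pool stats -f json | jq -r --arg host "$HOST" --arg iv "$INTERVAL" \
    '.[] | "PUTVAL \"\($host)/ceph_pool-\(.pool_name)/gauge-read_bytes_sec\" interval=\($iv) N:\(.client_io_rate.read_bytes_sec // 0)"'
done

...but doing that properly for all the pool metrics is the wheel I'd
rather not reinvent.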


Regards,

Matthew




Re: [ceph-users] Self serve / automated S3 key creation?

2019-02-01 Thread Matthew Vernon

Hi,

On 31/01/2019 17:11, shubjero wrote:
> Has anyone automated the ability to generate S3 keys for OpenStack users
> in Ceph? Right now we take in a user's request manually (Hey we need an
> S3 API key for our OpenStack project 'X', can you help?). We as
> cloud/ceph admins just use radosgw-admin to create them an access/secret
> key pair for their specific OpenStack project and provide it to them
> manually. Was just wondering if there was a self-serve way to do that.
> Curious to hear what others have done in regards to this.


We've set something up so our Service Desk folks can do this; they use
"rundeck", so we made a script that rundeck runs that works, in very
brief outline, thus:

- ssh to one of our RGW machines, as a restricted user with a forced command
- that user calls a userv service
- the userv service does some sanity-checking, then calls a script that
  executes the radosgw-admin command(s) and returns the new keys
- the rundeck user has access to user home directories, so makes a .s3cfg
  file with the keys returned, places it in the user's home directory[0],
  and emails the user (including our "getting started with S3" docs).


...with similar setup for quota adjustments, and similar.

We quota S3 space separately from Openstack volumes and suchlike.
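
The key-minting step at the bottom of all that is just radosgw-admin; a
simplified sketch, with a hypothetical uid:

radosgw-admin key create --uid="$PROJECT_UID" --key-type=s3 \
    --gen-access-key --gen-secret

(the real script does rather more checking around it).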

Regards,

Matthew

[0] strictly, the users can override this behaviour with a userv service 
of their own





Re: [ceph-users] ceph-ansible - where to ask questions? [EXT]

2019-01-31 Thread Matthew Vernon

Hi,

On 31/01/2019 16:06, Will Dennis wrote:


> Trying to utilize the 'ceph-ansible' project
> (https://github.com/ceph/ceph-ansible) to deploy some Ceph servers in a
> Vagrant testbed; hitting some issues with some of the plays - where is
> the right (best) venue to ask questions about this?


There's a list for ceph-ansible: ceph-ansi...@lists.ceph.com /
http://lists.ceph.com/listinfo.cgi/ceph-ansible-ceph.com

HTH,

Matthew




Re: [ceph-users] Best practice for increasing number of pg and pgp

2019-01-30 Thread Matthew Vernon
Hi,

On 30/01/2019 02:39, Albert Yue wrote:

> As the number of OSDs increase in our cluster, we reach a point where
> pg/osd is lower than recommend value and we want to increase it from
> 4096 to 8192. 

For an increase that small, I'd just do it in one go (and have done so
on our production clusters without issue); I'd only think about doing it
in stages for a larger increase.
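
i.e. something like the following (pool name hypothetical; pg_num first,
then pgp_num once the new PGs have been created):

ceph osd pool set mypool pg_num 8192
ceph osd pool set mypool pgp_num 8192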

Regards,

Matthew





Re: [ceph-users] Commercial support

2019-01-24 Thread Matthew Vernon

Hi,

On 23/01/2019 22:28, Ketil Froyn wrote:

> How is the commercial support for Ceph? More specifically, I was
> recently pointed in the direction of the very interesting combination of
> CephFS, Samba and ctdb. Is anyone familiar with companies that provide
> commercial support for in-house solutions like this?


To add to the answers you've already had:

Ubuntu also offer Ceph & Swift support:
https://www.ubuntu.com/support/plans-and-pricing#storage

Croit offer their own managed Ceph product, but do also offer 
support/consulting for Ceph installs, I think:

https://croit.io/

There are some smaller consultancies, too, including 42on which is run 
by Wido den Hollander who you will have seen posting here:

https://www.42on.com/

Regards,

Matthew
disclaimer: I have no commercial relationship to any of the above




[ceph-users] logging of cluster status (Jewel vs Luminous and later)

2019-01-24 Thread Matthew Vernon

Hi,

On our Jewel clusters, the mons keep a log of the cluster status e.g.

2019-01-24 14:00:00.028457 7f7a17bef700  0 log_channel(cluster) log 
[INF] : HEALTH_OK
2019-01-24 14:00:00.646719 7f7a46423700  0 log_channel(cluster) log 
[INF] : pgmap v66631404: 173696 pgs: 10 active+clean+scrubbing+deep, 
173686 active+clean; 2271 TB data, 6819 TB used, 9875 TB / 16695 TB 
avail; 1313 MB/s rd, 236 MB/s wr, 12921 op/s


This is sometimes useful after a problem, to see when things started
going wrong (which can be helpful for incident response and analysis)
and so on. There doesn't appear to be any such logging in Luminous,
either by mons or mgrs. What am I missing?


Thanks,

Matthew




Re: [ceph-users] The OSD can be “down” but still “in”.

2019-01-22 Thread Matthew Vernon
Hi,

On 22/01/2019 10:02, M Ranga Swami Reddy wrote:
> Hello - If an OSD is shown as down but still "in", what will happen
> with write/read operations on this down OSD?

It depends ;-)

In a typical 3-way replicated setup with min_size 2, writes to placement
groups on that OSD will still go ahead - when 2 replicas are written OK,
then the write will complete. Once the OSD comes back up, these writes
will then be replicated to that OSD. If it stays down for long enough to
be marked out, then pgs on that OSD will be replicated elsewhere.

If you had min_size 3 as well, then writes would block until the OSD was
back up (or marked out and the pgs replicated to another OSD).
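
You can check what a given pool has set, e.g. (with "rbd" as an example
pool name):

ceph osd pool get rbd size
ceph osd pool get rbd min_size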

Regards,

Matthew




[ceph-users] How many rgw buckets is too many?

2019-01-17 Thread Matthew Vernon

Hi,

The default limit for buckets per user in ceph is 1000, but it is 
adjustable via radosgw-admin user modify --max-buckets
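
e.g., with a hypothetical uid:

radosgw-admin user modify --uid=someuser --max-buckets=100000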


One of our users is asking for a significant increase (they're mooting 
100,000), and I worry about the impact on RGW performance since, I 
think, there's only one object that stores the bucket identifiers.


Has anyone here got experience of rgw with very large numbers of 
buckets? FWIW we're running Jewel with a Luminous upgrade planned for 
Quite Soon...


Regards,

Matthew




Re: [ceph-users] /var/lib/ceph/mon/ceph-{node}/store.db on mon nodes

2019-01-16 Thread Matthew Vernon
Hi,

On 16/01/2019 09:02, Brian Topping wrote:

> I’m looking at writes to a fragile SSD on a mon node,
> /var/lib/ceph/mon/ceph-{node}/store.db is the big offender at the
> moment.
> Is it required to be on a physical disk or can it be in tempfs? One
> of the log files has paxos strings, so I’m guessing it has to be on
> disk for a panic recovery? Are there other options?
Yeah, the mon store is worth keeping ;-) It can get quite large with a
large cluster and/or big rebalances. We bought some extra storage for
our mons and put the mon store onto dedicated storage.

Regards,

Matthew




Re: [ceph-users] Mimic 13.2.3?

2019-01-09 Thread Matthew Vernon
Hi,

On 08/01/2019 18:58, David Galloway wrote:

> The current distro matrix is:
> 
> Luminous: xenial centos7 trusty jessie stretch
> Mimic: bionic xenial centos7

Thanks for clarifying :)

> This may have been different in previous point releases because, as Greg
> mentioned in an earlier post in this thread, the release process has
> changed hands and I'm still working on getting a solid/bulletproof
> process documented, in place, and (more) automated.
> 
> I wouldn't be the final decision maker but if you think we should be
> building Mimic packages for Debian (for example), we could consider it.
>  The build process should support it I believe.

Could I suggest building Luminous for Bionic, and Mimic for Buster, please?

Regards,

Matthew





Re: [ceph-users] Mimic 13.2.3?

2019-01-08 Thread Matthew Vernon
Dear Greg,

On 04/01/2019 19:22, Gregory Farnum wrote:

> Regarding Ceph releases more generally:

[snip]

> I imagine we will discuss all this in more detail after the release,
> but everybody's patience is appreciated as we work through these
> challenges.

Thanks for this. Could you confirm that sorting out which distros (of
Debian/Ubuntu) get binary packages for the various Ceph releases is
something you're going to address, please?

[e.g. my earlier post
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-December/031966.html
]

...if not, should I open a tracker issue? I could build binaries myself,
obviously, but this seems a bit wasteful...

Regards,

Matthew





Re: [ceph-users] Mimic 13.2.3?

2019-01-04 Thread Matthew Vernon
Hi,

On 04/01/2019 15:34, Abhishek Lekshmanan wrote:
> Ashley Merrick  writes:
> 
>> If this is another nasty bug like .2? Can’t you remove .3 from being
>> available till .4 comes around?
> 
> This time there isn't a nasty bug, just a couple of more fixes in .4
> which would be better to have. We're building 12.2.4 as we speak
>> Myself will wait for proper confirmation always but others may run an apt
>> upgrade for any other reason and end up with .3 packages.

Without wishing to bang on about this, how is it still the case that
packages are being pushed onto the official ceph.com repos that people
shouldn't install? This has caused plenty of people problems on several
occasions now, and a number of people have offered help to fix it...

Regards,

Matthew





[ceph-users] Package availability for Debian / Ubuntu

2018-12-20 Thread Matthew Vernon

Hi,

Since the "where are the bionic packages for Luminous?" question remains 
outstanding, I thought I'd look at the question a little further.


The TL;DR is:

Jewel: built for Ubuntu trusty & xenial ; Debian jessie & stretch

Luminous: built for Ubuntu trusty & xenial ; Debian jessie & stretch

Mimic: built for Ubuntu xenial & bionic ; no Debian releases

(in the other cases, a single ceph-deploy package is shipped).

I don't _think_ this is what you're trying to achieve? In particular, do 
you really only want to provide bionic package for Mimic? It feels like 
your build machinery isn't quite doing what you want here, given you've 
previously spoken about building bionic packages for Luminous...


In more detail:

Packages for Ceph jewel:
precise has 1 Packages. No ceph package found
trusty has 47 Packages. Ceph version 10.2.11-1trusty
xenial has 47 Packages. Ceph version 10.2.11-1xenial
bionic has 1 Packages. No ceph package found
wheezy has 1 Packages. No ceph package found
jessie has 47 Packages. Ceph version 10.2.11-1~bpo80+1
stretch has 47 Packages. Ceph version 10.2.11-1~bpo90+1

Packages for Ceph luminous:
precise has 1 Packages. No ceph package found
trusty has 63 Packages. Ceph version 12.2.10-1trusty
xenial has 63 Packages. Ceph version 12.2.10-1xenial
bionic has 1 Packages. No ceph package found
wheezy has 1 Packages. No ceph package found
jessie has 63 Packages. Ceph version 12.2.10-1~bpo80+1
stretch has 63 Packages. Ceph version 12.2.10-1~bpo90+1

Packages for Ceph mimic:
precise has 1 Packages. No ceph package found
trusty has 1 Packages. No ceph package found
xenial has 63 Packages. Ceph version 13.2.2-1xenial
bionic has 63 Packages. Ceph version 13.2.2-1bionic
wheezy has 1 Packages. No ceph package found
jessie has 1 Packages. No ceph package found
stretch has 1 Packages. No ceph package found

If you want to re-run these tests, the attached hacky shell script does it.

Regards,

Matthew





Re: [ceph-users] disk controller failure

2018-12-13 Thread Matthew Vernon

Hi,

On 13/12/2018 16:44, Dietmar Rieder wrote:


> So you say that there will be no problem when, after the rebalancing, I
> restart the stopped OSDs? I mean, they still have the data on them.
> (Sorry, I just don't like to mess something up)


It should be fine[0]; when the OSDs come back in ceph will know what to 
do with them.


Regards,

Matthew

[0] this consultancy worth what you paid for it ;-)






Re: [ceph-users] disk controller failure

2018-12-13 Thread Matthew Vernon

Hi,

On 13/12/2018 15:48, Dietmar Rieder wrote:


> one of our OSD nodes is experiencing a Disk controller problem/failure
> (frequent resetting), so the OSDs on this controller are flapping
> (up/down in/out).


Ah, hardware...


> I have some simple questions: what are the best steps to take now, before
> and after replacement of the controller?


I would stop all the OSDs on the affected node and let the cluster 
rebalance. Once you've replaced the disk controller, start them up again 
and Ceph will rebalance back again.


Regards,

Matthew




Re: [ceph-users] Should ceph build against libcurl4 for Ubuntu 18.04 and later?

2018-12-13 Thread Matthew Vernon

Hi,

Sorry for the slow reply.

On 26/11/2018 17:11, Ken Dreyer wrote:
> On Thu, Nov 22, 2018 at 11:47 AM Matthew Vernon  wrote:
>> On 22/11/2018 13:40, Paul Emmerich wrote:
>>> We've encountered the same problem on Debian Buster
>>
>> It looks to me like this could be fixed simply by building the Bionic
>> packages in a Bionic chroot (ditto Buster); maybe that could be done in
>> future? Given I think the packaging process is being reviewed anyway at
>> the moment (hopefully 12.2.10 will be along at some point...)
>
> That's how we're building it currently. We build ceph in pbuilder
> chroots that correspond to each distro.
>
> On master, debian/control has Build-Depends: libcurl4-openssl-dev so
> I'm not sure why we'd end up with a dependency on libcurl3.
>
> Would you please give me a minimal set of `apt-get` reproduction steps
> on Bionic for this issue? Then we can get it into tracker.ceph.com.


The problem is a bit different to what I thought: there is only 1
package in the bionic Release file on ceph.com, and that's ceph-deploy:

matthew@aragorn:/tmp$ curl -s 'http://download.ceph.com/debian-luminous/dists/bionic/main/binary-amd64/Packages' -o bionicPackages
matthew@aragorn:/tmp$ grep Package bionicPackages
Package: ceph-deploy
matthew@aragorn:/tmp$ curl -s 'http://download.ceph.com/debian-luminous/dists/xenial/main/binary-amd64/Packages' -o xenialPackages
matthew@aragorn:/tmp$ grep -c Package xenialPackages
63

(i.e. there are 63 packages in the Xenial distribution of luminous, and
only 1 in the Bionic one).


So if you have a Xenial system running luminous and upgrade it to 
Bionic, then you'll keep the Xenial/luminous packages (since they're 
more recent than the Ubuntu-supplied Bionic packages), and encounter the 
problem that they depend on libcurl3, which is incompatible with 
Bionic's curl:


root@sto-j1-2:~# dpkg -s radosgw | sed -ne '/Depends/s/.*, \(libcurl[^,]*\),.*/\1/p'
libcurl3 (>= 7.28.0)

So the problem is really that the Bionic-built packages are not making it
into the release (presumably because they don't make it to the mirror?)


Regards,

Matthew




[ceph-users] Empty Luminous RGW pool using 7TiB of data

2018-12-06 Thread Matthew Vernon
Hi,

I've been benchmarking my Luminous test cluster, the s3 user has deleted
all objects and buckets, and yet the RGW data pool is using 7TiB of data:
default.rgw.buckets.data  11  7.16TiB  3.27  212TiB  1975644

There are no buckets left (radosgw-admin bucket list returns []), and
the only user is using no quota.

radosgw-admin gc list doesn't show anything pending gc; if I do
--include-all there are some:

[
{
"tag": "01a3b9f4-d6e8-4ac6-a44f-3ebb53dcee1b.3099907.1022170\u",
"time": "2018-12-06 13:36:59.0.88218s",
"objs": [
{
"pool": "default.rgw.buckets.data",
"oid":
"01a3b9f4-d6e8-4ac6-a44f-3ebb53dcee1b.3665142.15__multipart_b713be7d5b86b2fa51830f7c13092223.2~7_mvHIZc-L8mOFy51hkGnZbn4ihgOXR.1",
"key": "",
"instance": ""
},
[continues for 16k lines]

What am I meant to do about this? OK, it's a test system so I could blow
the pool away and start again, but I'd like to know what the underlying
issue is and how I'd manage this on a production cluster.

We've previously had data-loss issues with using orphans find (i.e. it
found things that were not orphans)... :(

Regards,

Matthew




[ceph-users] Cephalocon (was Re: CentOS Dojo at Oak Ridge, Tennessee CFP is now open!)

2018-12-04 Thread Matthew Vernon

On 03/12/2018 22:46, Mike Perez wrote:


Also as a reminder, lets try to coordinate our submissions on the CFP
coordination pad:

https://pad.ceph.com/p/cfp-coordination


I see that mentions a Cephalocon in Barcelona in May. Did I miss an 
announcement about that?


Regards,

Matthew




Re: [ceph-users] ceph 12.2.9 release

2018-11-23 Thread Matthew Vernon
On 07/11/2018 23:28, Neha Ojha wrote:

> For those who haven't upgraded to 12.2.9 -
> 
> Please avoid this release and wait for 12.2.10.

Any idea when 12.2.10 is going to be here, please?

Regards,

Matthew




Re: [ceph-users] Should ceph build against libcurl4 for Ubuntu 18.04 and later?

2018-11-22 Thread Matthew Vernon

On 22/11/2018 13:40, Paul Emmerich wrote:

We've encountered the same problem on Debian Buster


It looks to me like this could be fixed simply by building the Bionic 
packages in a Bionic chroot (ditto Buster); maybe that could be done in 
future? Given I think the packaging process is being reviewed anyway at 
the moment (hopefully 12.2.10 will be along at some point...)


Regards,

Matthew




[ceph-users] Should ceph build against libcurl4 for Ubuntu 18.04 and later?

2018-11-22 Thread Matthew Vernon

Hi,

The ceph.com ceph luminous packages for Ubuntu Bionic still depend on
libcurl3 (specifically ceph-common, radosgw, and librgw2 all depend on
libcurl3 (>= 7.28.0)).


This means that anything that depends on libcurl4 (which is the default 
libcurl in bionic) isn't co-installable with ceph. That includes the 
"curl" binary itself, which we've been using in a number of our scripts 
/ tests / etc. I would expect this to make ceph-test uninstallable on 
Bionic also...


...so shouldn't ceph packages for Bionic and later releases be compiled 
against libcurl4 (and thus Depend upon it)? The same will apply to the 
next Debian release, I expect.


The curl authors claim the API doesn't have any incompatible changes.

Regards,

Matthew
[the two packages libcurl3 and libcurl4 are not co-installable because 
libcurl3 includes a libcurl.so.4 for historical reasons :-( ]





Re: [ceph-users] PG auto repair with BlueStore

2018-11-15 Thread Matthew Vernon

Hi,

[apropos auto-repair for scrub settings]

On 15/11/2018 18:45, Mark Schouten wrote:

As a user, I’m very surprised that this isn’t a default setting.


We've been too cowardly to do it so far; even on a large cluster the
occasional ceph pg repair hasn't taken up too much admin time, and the
fact it isn't enabled by default has put us off. This sometimes helps us
spot OSD drives "on the way out" that haven't actually failed yet, but
I'd be in favour of auto-repair iff we're confident it's safe (to be
fair, ceph pg repair is the first port of call anyway, so it's not clear
what we gain by having a human type it).


Regards,

Matthew




[ceph-users] Unhelpful behaviour of ceph-volume lvm batch with >1 NVME card for block.db

2018-11-14 Thread Matthew Vernon
Hi,

We currently deploy our filestore OSDs with ceph-disk (via
ceph-ansible), and I was looking at using ceph-volume as we migrate to
bluestore.

Our servers have 60 OSDs and 2 NVME cards; each OSD is made up of a
single hdd, and an NVME partition for journal.

If, however, I do:
ceph-volume lvm batch /dev/sda /dev/sdb [...] /dev/nvme0n1 /dev/nvme1n1
then I get (inter alia):

Solid State VG:
  Targets:   block.db  Total size: 1.82 TB
  Total LVs: 2 Size per LV: 931.51 GB

  Devices:   /dev/nvme0n1, /dev/nvme1n1

i.e. ceph-volume is going to make a single VG containing both NVME
devices, and split that up into LVs to use for block.db

It seems to me that this is straightforwardly the wrong answer - either
NVME failing will now take out *every* OSD on the host, whereas the
obvious alternative (one VG per NVME, divide those into LVs) would give
you just as good performance, but you'd only lose 1/2 the OSDs if an
NVME card failed.

Am I missing something obvious here?

I appreciate I /could/ do it all myself, but even using ceph-ansible
that's going to be very tiresome...
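
For illustration, the hand-rolled version I have in mind is roughly this
(untested sketch; device names and LV sizes invented to match 60 OSDs and
2x ~932GB of NVME):

vgcreate ceph-db-0 /dev/nvme0n1
vgcreate ceph-db-1 /dev/nvme1n1
for i in $(seq 0 29); do lvcreate -L 31G -n db-$i ceph-db-0; done
for i in $(seq 30 59); do lvcreate -L 31G -n db-$i ceph-db-1; done
# then, per OSD, something like:
ceph-volume lvm create --bluestore --data /dev/sda --block.db ceph-db-0/db-0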

Regards,

Matthew




Re: [ceph-users] Packaging bug breaks Jewel -> Luminous upgrade

2018-11-08 Thread Matthew Vernon

Hi,

On 08/11/2018 22:38, Ken Dreyer wrote:


> What's the full apt-get command you're running?


I wasn't using apt-get, because the ceph repository has the broken 
12.2.9 packages in it (and I didn't want to install them, obviously); so 
I downloaded all the .debs I needed, installed the dependencies, then 
did dpkg -i [list of .debs]


Regards,

Matthew




Re: [ceph-users] Packaging bug breaks Jewel -> Luminous upgrade

2018-11-08 Thread Matthew Vernon

On 08/11/2018 16:31, Matthew Vernon wrote:

Hi,

> in Jewel, /etc/bash_completion.d/radosgw-admin is in the radosgw package
> In Luminous, /etc/bash_completion.d/radosgw-admin is in the ceph-common
> package
>
> ...so if you try and upgrade, you get:
>
> Unpacking ceph-common (12.2.8-1xenial) over (10.2.9-0ubuntu0.16.04.1) ...
> dpkg: error processing archive ceph-common_12.2.8-1xenial_amd64.deb
> (--install):
>   trying to overwrite '/etc/bash_completion.d/radosgw-admin', which is
> also in package radosgw 10.2.9-0ubuntu0.16.04.1


I submitted 2 PRs to fix this - 24996 for master, and 24997 for 
luminous; it'd be nice if the latter could make it into 12.2.10? :-)


Regards,

Matthew





Re: [ceph-users] Packaging bug breaks Jewel -> Luminous upgrade

2018-11-08 Thread Matthew Vernon

On 08/11/2018 16:31, Matthew Vernon wrote:

> The exact versioning would depend on when the move was made (I presume
> either Jewel -> Kraken or Kraken -> Luminous). Does anyone know?


To answer my own question, this went into 12.0.3 via
https://github.com/ceph/ceph/commit/9fd30b93f7281fad70b93512f0a25e3465f5b225

Regards,

Matthew




[ceph-users] Packaging bug breaks Jewel -> Luminous upgrade

2018-11-08 Thread Matthew Vernon

Hi,

in Jewel, /etc/bash_completion.d/radosgw-admin is in the radosgw package
In Luminous, /etc/bash_completion.d/radosgw-admin is in the ceph-common 
package


...so if you try and upgrade, you get:

Unpacking ceph-common (12.2.8-1xenial) over (10.2.9-0ubuntu0.16.04.1) ...
dpkg: error processing archive ceph-common_12.2.8-1xenial_amd64.deb 
(--install):
 trying to overwrite '/etc/bash_completion.d/radosgw-admin', which is 
also in package radosgw 10.2.9-0ubuntu0.16.04.1


This is a packaging bug - ceph-common needs to declare (via Replaces and 
Breaks) that it's taking over some of the radosgw package -

https://www.debian.org/doc/debian-policy/ch-relationships.html#overwriting-files-in-other-packages

The exact versioning would depend on when the move was made (I presume 
either Jewel -> Kraken or Kraken -> Luminous). Does anyone know?
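
The fix itself should just be a couple of lines in ceph-common's stanza in
debian/control, something like the following (using 12.0.3, the release the
move turns out to have landed in - see the follow-up elsewhere in this
thread):

Replaces: radosgw (<< 12.0.3)
Breaks: radosgw (<< 12.0.3)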


[would you like this reported formally, or is the fix trivial enough to 
just be done? :-) ]


Regards,

Matthew




Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Matthew Vernon

On 08/11/2018 09:17, Marc Roos wrote:
  
> And that is why I don't like ceph-deploy. Unless you have maybe hundreds
> of disks, I don't see why you cannot install it "manually".


...as the recent ceph survey showed, plenty of people have hundreds of 
disks! Ceph is meant to be operated at scale, which is why many admins 
will have automation (ceph-ansible, etc.) in place.


[our test clusters are 180 OSDs...]

Regards,

Matthew




Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread Matthew Vernon
On 07/11/2018 14:16, Marc Roos wrote:
>  
> 
> I don't see the problem. I am installing only the ceph updates when 
> others have done this and are running several weeks without problems. I 
> have noticed this 12.2.9 availability also, did not see any release 
> notes, so why install it? Especially with recent issues of other 
> releases.

Relevantly, if you want to upgrade to Luminous in many of the obvious
ways, you'll end up with 12.2.9.

Regards,

Matthew




Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread Matthew Vernon
On 07/11/2018 10:59, Konstantin Shalygin wrote:
>> I wonder if there is any release announcement for ceph 12.2.9 that I missed.
>> I just found the new packages on download.ceph.com, is this an official
>> release?
> 
> This is because 12.2.9 have a several bugs. You should avoid to use this
> release and wait for 12.2.10

It seems that maybe something isn't quite right in the release
infrastructure, then? The 12.2.8 packages are still available, but e.g.
debian-luminous's Packages file is pointing to the 12.2.9 (broken) packages.

Could the Debian/Ubuntu repos only have their releases updated (as
opposed to what's in the pool) for safe/official releases? It's one
thing letting people find pre-release things if they go looking, but
ISTM that arranging that a mis-timed apt-get update/upgrade might
install known-broken packages is ... unfortunate.
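
In the meantime, one defensive measure is apt pinning, something like this
(a sketch; patterns and priority to taste) in /etc/apt/preferences.d/ceph:

Package: ceph* librados* librbd* librgw* radosgw
Pin: version 12.2.8*
Pin-Priority: 1001

so a mis-timed apt-get upgrade stays on the known-good point release.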

Regards,

Matthew




Re: [ceph-users] Migrate/convert replicated pool to EC?

2018-10-31 Thread Matthew Vernon

Hi,
On 10/26/18 2:55 PM, David Turner wrote:

> It is indeed adding a placement target and not removing/replacing the
> pool. The get/put wouldn't be a rados or even ceph command, you would do
> it through an s3 client.


Which is an interesting idea, but presumably there's no way of knowing 
which S3 objects are in which underlying pool, beyond knowing the date 
at which a particular S3 user account was pointed at a new pool?


Regards,

Matthew





Re: [ceph-users] Migrate/convert replicated pool to EC?

2018-10-26 Thread Matthew Vernon
Hi,

On 26/10/2018 12:38, Alexandru Cucu wrote:

> Have a look at this article:
> https://ceph.com/geen-categorie/ceph-pool-migration/

Thanks; that all looks pretty hairy especially for a large pool (ceph df
says 1353T / 428,547,935 objects)...

...so something a bit more controlled/gradual and less
manual-error-prone would make me happier!

Regards,

Matthew





Re: [ceph-users] Migrate/convert replicated pool to EC?

2018-10-26 Thread Matthew Vernon
Hi,

On 25/10/2018 17:57, David Turner wrote:
> There are no tools to migrate in either direction between EC and
> Replica. You can't even migrate an EC pool to a new EC profile.

Oh well :-/

> With RGW you can create a new data pool and new objects will be written
> to the new pool. If your objects have a lifecycle, then eventually
> you'll be to the new pool over time. Otherwise you can get there by
> rewriting all of the objects manually.

How does this work? I presume if I just change data_pool then everyone
will lose things currently in S3? So I guess this would be adding
another placement_target (can it share an index pool, or do I need a new
one of those too?) with the new pool, and making it the default_placement...

If I do that, is there a way to do manual migration of objects in
parallel? I imagine a dumb rados get/put or similar won't do the correct
plumbing...
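
For concreteness, what I imagine this looks like is roughly (untested
sketch; placement id and pool names invented):

radosgw-admin zonegroup placement add --rgw-zonegroup=default \
    --placement-id=ec-placement
radosgw-admin zone placement add --rgw-zone=default \
    --placement-id=ec-placement \
    --data-pool=default.rgw.buckets.data-ec \
    --index-pool=default.rgw.buckets.index \
    --data-extra-pool=default.rgw.buckets.non-ec
radosgw-admin zonegroup placement default --placement-id=ec-placement

...but whether the index pool can be shared like that is exactly my
question.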

Thanks,

Matthew





[ceph-users] Migrate/convert replicated pool to EC?

2018-10-25 Thread Matthew Vernon

Hi,

I thought I'd seen that it was possible to migrate a replicated pool to 
being erasure-coded (but not the converse); but I'm failing to find 
anything that says _how_.


Have I misremembered? Can you migrate a replicated pool to EC? (if so, how?)

...our use case is moving our S3 pool which is quite large, so if we can 
convert in-place that would be ideal...


Thanks,

Matthew




Re: [ceph-users] Mimic and Debian 9

2018-10-18 Thread Matthew Vernon

On 17/10/18 15:23, Paul Emmerich wrote:

[apropos building Mimic on Debian 9]


> apt-get install -y g++ libc6-dbg libc6 -t testing
> apt-get install -y git build-essential cmake


I wonder if you could avoid the "need a newer libc" issue by using 
backported versions of cmake/g++ ?


Regards,

Matthew





Re: [ceph-users] ceph-objectstore-tool manual

2018-10-15 Thread Matthew Vernon
Hi,

On 15/10/18 11:44, Vincent Godin wrote:
> Does a man exist on ceph-objectstore-tool ? if yes, where can i find it ?

No, but there is some --help output:

root@sto-1-1:~# ceph-objectstore-tool --help

Allowed options:
  --help                      produce help message
  --type arg                  Arg is one of [filestore (default), memstore]
  --data-path arg             path to object store, mandatory
  --journal-path arg          path to journal, mandatory for filestore type
  --pgid arg                  PG id, mandatory for info, log, remove, export,
                              rm-past-intervals, mark-complete, and mandatory
                              for apply-layout-settings if --pool is not
                              specified
  --pool arg                  Pool name, mandatory for apply-layout-settings
                              if --pgid is not specified
  --op arg                    Arg is one of [info, log, remove, mkfs, fsck,
                              fuse, export, import, list, fix-lost, list-pgs,
                              rm-past-intervals, dump-journal, dump-super,
                              meta-list, get-osdmap, set-osdmap,
                              get-inc-osdmap, set-inc-osdmap, mark-complete,
                              apply-layout-settings, update-mon-db]
  --epoch arg                 epoch# for get-osdmap and get-inc-osdmap, the
                              current epoch in use if not specified
  --file arg                  path of file to export, import, get-osdmap,
                              set-osdmap, get-inc-osdmap or set-inc-osdmap
  --mon-store-path arg        path of monstore to update-mon-db
  --mountpoint arg            fuse mountpoint
  --format arg (=json-pretty) Output format which may be json, json-pretty,
                              xml, xml-pretty
  --debug                     Enable diagnostic output to stderr
  --force                     Ignore some types of errors and proceed with
                              operation - USE WITH CAUTION: CORRUPTION
                              POSSIBLE NOW OR IN THE FUTURE
  --skip-journal-replay       Disable journal replay
  --skip-mount-omap           Disable mounting of omap
  --head                      Find head/snapdir when searching for objects
                              by name
  --dry-run                   Don't modify the objectstore


Positional syntax:

ceph-objectstore-tool ... <object> (get|set)-bytes [file]
ceph-objectstore-tool ... <object> set-(attr|omap) <key> [file]
ceph-objectstore-tool ... <object> (get|rm)-(attr|omap) <key>
ceph-objectstore-tool ... <object> get-omaphdr
ceph-objectstore-tool ... <object> set-omaphdr [file]
ceph-objectstore-tool ... <object> list-attrs
ceph-objectstore-tool ... <object> list-omap
ceph-objectstore-tool ... <object> remove
ceph-objectstore-tool ... <object> dump
ceph-objectstore-tool ... <object> set-size
ceph-objectstore-tool ... <object> remove-clone-metadata <cloneid>

<object> can be a JSON object description as displayed
by --op list.
<object> can be an object name which will be looked up in all
the OSD's PGs.
<object> can be the empty string ('') which with a provided pgid
specifies the pgmeta object

The optional [file] argument will read stdin or write stdout
if not specified or if '-' specified.

[that's for the Jewel version]

HTH,

Matthew




Re: [ceph-users] ceph cluster monitoring tool

2018-07-24 Thread Matthew Vernon
Hi,

On 24/07/18 06:02, Satish Patel wrote:
> My 5 node ceph cluster is ready for production, now i am looking for
> good monitoring tool (Open source), what majority of folks using in
> their production?

This does come up from time to time, so it's worth checking the list
archives.

We use collectd to collect metrics, graphite to store them (we've found
it much easier to look after than influxdb), and grafana to plot them, e.g.

https://cog.sanger.ac.uk/ceph_dashboard/ceph-dashboard-may2018.png

Regards,

Matthew




Re: [ceph-users] Self shutdown of 1 whole system (Derbian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Matthew Vernon
Hi,

> One of my servers silently shut down last night, with no explanation
> whatsoever in any logs. According to the existing logs, the shutdown

We have seen similar things with our SuperMicro servers; our current
best theory is that it's related to CPU power management. Disabling it
in BIOS seems to have helped.

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error bluestore doesn't support lvm

2018-07-23 Thread Matthew Vernon
Hi,

On 21/07/18 04:24, Satish Patel wrote:
> I am using openstack-ansible with ceph-ansible to deploy my Ceph
> custer and here is my config in yml file

You might like to know that there's a dedicated (if quiet!) list for
ceph-ansible - ceph-ansi...@lists.ceph.com

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Be careful with orphans find (was Re: Lost TB for Object storage)

2018-07-20 Thread Matthew Vernon
Hi,

On 19/07/18 17:19, CUZA Frédéric wrote:

> After that we tried to remove the orphans :
> 
> radosgw-admin orphans find --pool=default.rgw.buckets.data
> --job-id=ophans_clean
> 
> radosgw-admin orphans finish --job-id=ophans_clean
> 
> It finds some orphans : 85, but the command finish seems not to work, so
> we decided to manually delete those ophans by piping the output of find
> in a log file.

I would advise caution with using the "orphans find" code in
radosgw-admin. On the advice of our vendor, we ran this and
automatically removed the resulting objects. Unfortunately, a small
proportion of the objects found and removed thus were not in fact
orphans - meaning we ended up with some damaged S3 objects; they
appeared in bucket listings, but you'd get 404 if you tried to download
them.

We have asked our vendor to make the wider community aware of the issue,
but they have not (yet) done so.

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel PG stuck inconsistent with 3 0-size objects

2018-07-18 Thread Matthew Vernon
Hi,

On 17/07/18 01:29, Brad Hubbard wrote:
> Your issue is different since not only do the omap digests of all
> replicas not match the omap digest from the auth object info but they
> are all different to each other.
> 
> What is min_size of pool 67 and what can you tell us about the events
> leading up to this?

min_size is 2 ; pool 67 is default.rgw.buckets.index.
This is a moderately-large (3060 OSD) cluster that's been running for a
while; we upgraded to 10.2.9 (from 10.2.6, also from Ubuntu) about a
week ago.

>> rados -p default.rgw.buckets.index setomapval
>> .dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6 temporary-key anything
>> [deep-scrub]
>> rados -p default.rgw.buckets.index rmomapkey
>> .dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6 temporary-key

We did this, and it does appear to have resolved the issue (the pg is
now happy).

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Jewel PG stuck inconsistent with 3 0-size objects

2018-07-16 Thread Matthew Vernon
Hi,

Our cluster is running 10.2.9 (from Ubuntu; on 16.04 LTS), and we have a
pg that's stuck inconsistent; if I repair it, it logs "failed to pick
suitable auth object" (repair log attached, to try and stop my MUA
mangling it).

We then deep-scrubbed that pg, at which point
rados list-inconsistent-obj 67.2e --format=json-pretty produces a bit of
output (also attached), which includes that all 3 osds have a zero-sized
object e.g.

"osd": 1937,
"errors": [
"omap_digest_mismatch_oi"
],
"size": 0,
"omap_digest": "0x45773901",
"data_digest": "0x"

All 3 osds have different omap_digest, but all have 0 size. Indeed,
looking on the OSD disks directly, each object is 0 size (i.e. they are
identical).

This looks similar to one of the failure modes in
http://tracker.ceph.com/issues/21388 where there is a suggestion (comment
19 from David Zafman) to do:

rados -p default.rgw.buckets.index setomapval
.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6 temporary-key anything
[deep-scrub]
rados -p default.rgw.buckets.index rmomapkey
.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6 temporary-key

Is this likely to be the correct approach here, too? And is there an
underlying bug in ceph that still needs fixing? :)

Thanks,

Matthew



-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

2018-07-16 09:17:33.351755 7f058a047700  0 log_channel(cluster) log [INF] : 
67.2e repair starts
2018-07-16 09:17:51.521378 7f0587842700 -1 log_channel(cluster) log [ERR] : 
67.2e shard 1937: soid 
67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head 
omap_digest 0x45773901 != omap_digest 0x952ce474 from auth oi 
67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head(444843'17812260
 osd.1987.0:16910852 dirty|omap|data_digest|omap_digest s 0 uv 17812259 dd 
 od 952ce474 alloc_hint [0 0])
2018-07-16 09:17:51.521463 7f0587842700 -1 log_channel(cluster) log [ERR] : 
67.2e shard 1987: soid 
67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head 
omap_digest 0xec3afbe != omap_digest 0x45773901 from shard 1937, omap_digest 
0xec3afbe != omap_digest 0x952ce474 from auth oi 
67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head(444843'17812260
 osd.1987.0:16910852 dirty|omap|data_digest|omap_digest s 0 uv 17812259 dd 
 od 952ce474 alloc_hint [0 0])
2018-07-16 09:17:51.521653 7f0587842700 -1 log_channel(cluster) log [ERR] : 
67.2e shard 2796: soid 
67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head 
omap_digest 0x5eec6452 != omap_digest 0x45773901 from shard 1937, omap_digest 
0x5eec6452 != omap_digest 0x952ce474 from auth oi 
67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head(444843'17812260
 osd.1987.0:16910852 dirty|omap|data_digest|omap_digest s 0 uv 17812259 dd 
 od 952ce474 alloc_hint [0 0])
2018-07-16 09:17:51.521702 7f0587842700 -1 log_channel(cluster) log [ERR] : 
67.2e soid 
67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head: failed 
to pick suitable auth object
2018-07-16 09:17:51.521988 7f0587842700 -1 log_channel(cluster) log [ERR] : 
67.2e repair 4 errors, 0 fixed
{
"epoch": 514919,
"inconsistents": [
{
"object": {
"name": ".dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6",
"nspace": "",
"locator": "",
"snap": "head",
"version": 17812259
},
"errors": [
"omap_digest_mismatch"
],
"union_shard_errors": [
"omap_digest_mismatch_oi"
],
"selected_object_info": 
"67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head(444843'17812260
 osd.1987.0:16910852 dirty|omap|data_digest|omap_digest s 0 uv 17812259 dd 
 od 952ce474 alloc_hint [0 0])",
"shards": [
{
"osd": 1937,
"errors": [
"omap_digest_mismatch_oi"
],
"size": 0,
"omap_digest": "0x45773901",
"data_digest": "0x"
},
{
"osd": 1987,
"errors": [
"omap_digest_mismatch_oi"
],
"size": 0,
"omap_digest": "0x0ec3afbe",
"data_digest": "0x"
},
{
"osd": 2796,
"errors": [
"omap_digest_mismatch_oi"
],

[ceph-users] RGW bucket sharding in Jewel

2018-06-19 Thread Matthew Vernon
Hi,

Some of our users have Quite Large buckets (up to 20M objects in a
bucket), and AIUI best practice would be to have sharded indexes for
those buckets (of the order of 1 shard per 100k objects).

On a trivial test case (make a 1M-object bucket, shard index to 10
shards, s3cmd ls s3://bucket >/dev/null), sharding makes the bucket
listing slower (not a lot, but a bit).
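
For reference, the resharding step itself can be done offline with
something like (a sketch - the bucket name and shard count are
illustrative, and note the reshard subcommand only appears in later
Jewel point releases):

radosgw-admin bucket reshard --bucket=testbucket --num-shards=10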

Are there simple(ish) workflows we could use to demonstrate an
improvement from index sharding?

Thanks,

Matthew

[I understand that Luminous has dynamic resharding, but it seems a bit
unstable for production use; is that still the case?]


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] a big cluster or several small

2018-05-16 Thread Matthew Vernon
Hi,

On 14/05/18 17:49, Marc Boisis wrote:

> Currently we have a 294 OSD (21 hosts/3 racks) cluster with RBD clients
> only, 1 single pool (size=3).

That's not a large cluster.

> We want to divide this cluster into several to minimize the risk in case
> of failure/crash.
> For example, a cluster for the mail, another for the file servers, a
> test cluster ...
> Do you think it's a good idea ?

I'd venture the opinion that your cluster isn't yet big enough to be
thinking about that; you get increased reliability with a larger cluster
(each disk failure is a smaller % of the whole, for example); our
largest cluster here is 3060 OSDs...

We've grown this from a start of 540 OSDs.

> Do you have experience feedback on multiple clusters in production on
> the same hardware:

I think if you did want to have multiple clusters, you'd want to have
each cluster on different hardware.

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Place on separate hosts?

2018-05-04 Thread Matthew Vernon
Hi,

On 04/05/18 08:25, Tracy Reed wrote:
> On Fri, May 04, 2018 at 12:18:15AM PDT, Tracy Reed spake thusly:
>> https://jcftang.github.io/2012/09/06/going-from-replicating-across-osds-to-replicating-across-hosts-in-a-ceph-cluster/
> 
>> How can I tell which way mine is configured? I could post the whole
>> crushmap if necessary but it's a bit large to copy and paste.
> 
> To further answer my own question (sorry for the spam) the above linked
> doc says this should do what I want:
> 
> step chooseleaf firstn 0 type host
> 
> which is what I already have in my crush map. So it looks like the
> default is as I want it. In which case I wonder why I had the problem
> previously... I guess the only way to know for sure is to stop one osd
> node and see what happens.

You can ask ceph which OSDs a particular pg is on:

root@sto-1-1:~# ceph pg map 71.983
osdmap e435728 pg 71.983 (71.983) -> up [1948,2984,511] acting
[1948,2984,511]

...then you can check these are on different hosts.
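
For example (a sketch - assumes jq is installed, and uses the osd ids
from the pg map output above):

for osd in 1948 2984 511; do
  ceph osd find $osd | jq -r '.crush_location.host'
done

If that prints three different hostnames, the pg is spread across hosts
as intended.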

HTH,

Matthew



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] bug in rgw quota calculation?

2018-04-19 Thread Matthew Vernon

Hi,

TL;DR there seems to be a problem with quota calculation for rgw in our 
Jewel / Ubuntu 16.04 cluster. Our support people suggested we raise it 
with upstream directly; before I open a tracker item I'd like to check 
I've not missed something obvious :)


Our cluster is running Jewel on Ubuntu 16.04 (rgw version 
10.2.7-0ubuntu0.16.04.2~sanger1 [0]). A user complained that they'd 
deleted a bucket with lots of part-uploaded bits in but their quota was 
still being treated as if the contents were still there.


radosgw-admin user stats --sync-stats reports:
"total_entries": 6590,
"total_bytes": 1767693041863,
"total_bytes_rounded": 1767700045824

if I do bucket stats --uid=as45 (or search the output of bucket stats by 
hand), I find 4 buckets who sum to: (details in footnote 1)

num_objects: 3370
size_kb: 774880722
size_kb_actual: 774887560

taking the larger of these x1024 is 793,484,861,440, considerably 
smaller than the quota number above.


We have done "bucket check" on all the users' buckets, all return 0. We 
have done "orphan find" and removed all the leaked objects returned.


I attach the output of
radosgw-admin -n client.rgw.sto-1-2 user stats --sync-stats --uid=as45 
--debug_rgw=20 >/tmp/rgwoutput2 2>&1


(compressed).

This looks like a bug to me; should I open a tracker item?

Thanks,

Matthew

[0] The Sanger1 suffix is a RH-provided patch to fix a MIME issue with 
uploads

[1] 4 buckets:
"size_kb": 33599604,
"size_kb_actual": 33602384,
"num_objects": 1390
"size_kb": 0,
"size_kb_actual": 0,
"num_objects": 0
"size_kb": 707556170,
"size_kb_actual": 707556172,
"num_objects": 2
"size_kb": 33724948,
"size_kb_actual": 33729004,
"num_objects": 1978



--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 

rgwoutput2.gz
Description: application/gzip
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] User deletes bucket with partial multipart uploads in, objects still in quota

2018-04-04 Thread Matthew Vernon
On 04/04/18 10:30, Matthew Vernon wrote:
> Hi,
> 
> We have an rgw user who had a bunch of partial multipart uploads in a
> bucket, which they then deleted. radosgw-admin bucket list doesn't show
> the bucket any more, but  user stats --sync-stats still has (I think)
> the contents of that bucket counted against the users' quota.
> 
> So, err, how do I cause a) the users' quota usage to not include this
> deleted bucket b) the associated storage to actually be cleared (since I
> infer the failure to do so is causing the quota issue)?

Sorry, should have said: this is running jewel.

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] User deletes bucket with partial multipart uploads in, objects still in quota

2018-04-04 Thread Matthew Vernon
Hi,

We have an rgw user who had a bunch of partial multipart uploads in a
bucket, which they then deleted. radosgw-admin bucket list doesn't show
the bucket any more, but  user stats --sync-stats still has (I think)
the contents of that bucket counted against the users' quota.

So, err, how do I cause a) the users' quota usage to not include this
deleted bucket b) the associated storage to actually be cleared (since I
infer the failure to do so is causing the quota issue)?

Thanks,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] What do you use to benchmark your rgw?

2018-03-28 Thread Matthew Vernon
Hi,

What are people here using to benchmark their S3 service (i.e. the rgw)?
rados bench is great for some things, but doesn't tell me about what
performance I can get from my rgws.

It seems that there used to be rest-bench, but that isn't in Jewel
AFAICT; I had a bit of a look at cosbench but it looks fiddly to set up
and a bit under-maintained (the most recent version doesn't work out of
the box, and the PR to fix that has been languishing for a while).

This doesn't seem like an unusual thing to want to do, so I'd like to
know what other ceph folk are using (and, if you like, the numbers you
get from the benchmarkers)...?

Thanks,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Sizing your MON storage with a large cluster

2018-02-09 Thread Matthew Vernon
On 05/02/18 15:54, Wes Dillingham wrote:
> Good data point on not trimming when non active+clean PGs are present.
> So am I reading this correct? It grew to 32GB? Did it end up growing
> beyond that, what was the max?

The largest Mon store size I've seen (in a 3000-OSD cluster) was about 66GB.

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One OSD misbehaving (spinning 100% CPU, delayed ops)

2017-12-14 Thread Matthew Vernon
On 29/11/17 17:24, Matthew Vernon wrote:

> We have a 3,060 OSD ceph cluster (running Jewel
> 10.2.7-0ubuntu0.16.04.1), and one OSD on one host keeps misbehaving - by
> which I mean it keeps spinning ~100% CPU (cf ~5% for other OSDs on that
> host), and having ops blocking on it for some time. It will then behave
> for a bit, and then go back to doing this.
> 
> It's always the same OSD, and we've tried replacing the underlying disk.
> 
> The logs have lots of entries of the form
> 
> 2017-11-29 17:18:51.097230 7fcc06919700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fcc29fec700' had timed out after 15

Thanks for the various helpful suggestions in response to this. In case
you're interested (and for the archives), the answer was Gnocchi - all
the slow requests were for a particular pool, which is where we were
sending metrics from an OpenStack instance. Gnocchi less than version
4.0 is, I learn, known to kill ceph because its use of librados is
rather badly behaved. Newer OpenStacks (from Pike, I think) use a newer
Gnocchi. We stopped ceilometer and gnocchi, and the problem went away.
Thanks are due to RedHat support for finding this for us :)

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] One OSD misbehaving (spinning 100% CPU, delayed ops)

2017-11-29 Thread Matthew Vernon
Hi,

We have a 3,060 OSD ceph cluster (running Jewel
10.2.7-0ubuntu0.16.04.1), and one OSD on one host keeps misbehaving - by
which I mean it keeps spinning ~100% CPU (cf ~5% for other OSDs on that
host), and having ops blocking on it for some time. It will then behave
for a bit, and then go back to doing this.

It's always the same OSD, and we've tried replacing the underlying disk.

The logs have lots of entries of the form

2017-11-29 17:18:51.097230 7fcc06919700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fcc29fec700' had timed out after 15

I've had a brief poke through the collectd metrics for this osd (and
comparing them with other OSDs on the same host) but other than showing
spikes in latency for that OSD (iostat et al show no issues with the
underlying disk) there's nothing obviously explanatory.

I tried ceph tell osd.2054 injectargs --osd-op-thread-timeout 90 (which
is what googling for the above message suggests), but that just said
"unchangeable", and didn't seem to make any difference.

Any ideas? Other metrics to consider? ...

Thanks,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] findmnt (was Re: Migration from filestore to blustore)

2017-11-20 Thread Matthew Vernon
Hi,

On 20/11/17 15:00, Gerhard W. Recher wrote:

Just interjecting here because I keep seeing things like this, and
they're often buggy, and there's an easy answer:

> DEVICE=`mount | grep /var/lib/ceph/osd/ceph-$ID| cut -f1 -d"p"`

findmnt(8) is your friend, any time you want to find out about mounted
filesystems, and much more reliable than grepping the output of mount or
/proc/mtab/ or whatever (consider if ID is 1 and you have ceph-1 and
ceph-10 mounted on the host, for example).

findmnt -T "/var/lib/ceph/osd/ceph-$id" -n -o SOURCE

is probably what you wanted here. Findmnt is in util-linux, and should
be in all non-ancient distributions.
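
In the quoted script above, that would mean replacing the grep pipeline
with something like:

DEVICE=$(findmnt -T "/var/lib/ceph/osd/ceph-$ID" -n -o SOURCE)

which can't mis-match ceph-10 when you asked about ceph-1.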

Here ends the message from the findmnt(8) appreciation society :)

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] killing ceph-disk [was Re: ceph-volume: migration and disk partition support]

2017-10-12 Thread Matthew Vernon

Hi,

On 09/10/17 16:09, Sage Weil wrote:

> To put this in context, the goal here is to kill ceph-disk in mimic.
>
> One proposal is to make it so new OSDs can *only* be deployed with LVM,
> and old OSDs with the ceph-disk GPT partitions would be started via
> ceph-volume support that can only start (but not deploy new) OSDs in that
> style.
>
> Is the LVM-only-ness concerning to anyone?
>
> Looking further forward, NVMe OSDs will probably be handled a bit
> differently, as they'll eventually be using SPDK and kernel-bypass (hence,
> no LVM).  For the time being, though, they would use LVM.


This seems the best point to jump in on this thread. We have a ceph 
(Jewel / Ubuntu 16.04) cluster with around 3k OSDs, deployed with 
ceph-ansible. They are plain-disk OSDs with journal on NVME partitions. 
I don't think this is an unusual configuration :)


I think to get rid of ceph-disk, we would want at least some of the 
following:


* solid scripting for "move slowly through cluster migrating OSDs from 
disk to lvm" - 1 OSD at a time isn't going to produce unacceptable 
rebalance load, but it is going to take a long time, so such scripting 
would have to cope with being stopped and restarted and suchlike (and be 
able to use the correct journal partitions)


* ceph-ansible support for "some lvm, some plain disk" arrangements - 
presuming a "create new OSDs as lvm" approach when adding new OSDs or 
replacing failed disks


* support for plain disk (regardless of what provides it) that remains 
solid for some time yet



> On Fri, 6 Oct 2017, Alfredo Deza wrote:
>
>> Bluestore support should be the next step for `ceph-volume lvm`, and
>> while that is planned we are thinking of ways to improve the current
>> caveats (like OSDs not coming up) for clusters that have deployed OSDs
>> with ceph-disk.


These issues seem mostly to be down to timeouts being too short and the 
single global lock for activating OSDs.



> IMO we can't require any kind of data migration in order to upgrade, which
> means we either have to (1) keep ceph-disk around indefinitely, or (2)
> teach ceph-volume to start existing GPT-style OSDs.  Given all of the
> flakiness around udev, I'm partial to #2.  The big question for me is
> whether #2 alone is sufficient, or whether ceph-volume should also know
> how to provision new OSDs using partitions and no LVM.  Hopefully not?


I think this depends on how well tools such as ceph-ansible can cope 
with mixed OSD types (my feeling at the moment is "not terribly well", 
but I may be being unfair).


Regards,

Matthew



--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cephalocon 2018?

2017-10-12 Thread Matthew Vernon

Hi,

The recent FOSDEM CFP reminded me to wonder if there's likely to be a 
Cephalocon in 2018? It was mentioned as a possibility when the 2017 one 
was cancelled...


Regards,

Matthew


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitoring

2017-10-03 Thread Matthew Vernon
On 02/10/17 20:26, Erik McCormick wrote:
> On Mon, Oct 2, 2017 at 11:55 AM, Matthew Vernon <m...@sanger.ac.uk> wrote:
>> Making a dashboard is rather a matter of personal preference - we plot
>> client and s3 i/o, network, server load & CPU use, and have indicator
>> plots for numbers of osds up, and monitor quorum.
>>
>> [I could share our dashboard JSON, but it's obviously specific to our
>> data sources]
> 
> I for one would love to see your dashboard. host and data source names
> can be easily replaced :)

OK. A screenshot is:
https://cog.sanger.ac.uk/ceph_dashboard/screenshot.png

(which should be self-explanatory - that's rather the point :) )

The json that builds it is:
https://cog.sanger.ac.uk/ceph_dashboard/dashboard.json

(you'd want to change the data source and hostnames to suit your own
install; sto-1-1 is one of our mon nodes).

HTH,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitoring

2017-10-02 Thread Matthew Vernon
On 02/10/17 12:34, Osama Hasebou wrote:
> Hi Everyone,
> 
> Is there a guide/tutorial about how to setup Ceph monitoring system
> using collectd / grafana / graphite ? Other suggestions are welcome as
> well !

We just installed the collectd plugin for ceph, and pointed it at our
graphite server; that did most of what we wanted (we also needed a
script to monitor wear on our SSD devices).
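
(The wear-monitoring script is essentially a thin wrapper around
smartctl - a sketch; the attribute name varies by SSD vendor, so
"Wear_Leveling_Count" here is illustrative:

smartctl -A /dev/sda | awk '/Wear_Leveling_Count/ {print $4}'

which prints the normalised remaining-life value for graphing.)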

Making a dashboard is rather a matter of personal preference - we plot
client and s3 i/o, network, server load & CPU use, and have indicator
plots for numbers of osds up, and monitor quorum.

[I could share our dashboard JSON, but it's obviously specific to our
data sources]

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph/systemd startup bug (was Re: Some OSDs are down after Server reboot)

2017-09-29 Thread Matthew Vernon
Hi,

On 29/09/17 01:00, Brad Hubbard wrote:
> This looks similar to
> https://bugzilla.redhat.com/show_bug.cgi?id=1458007 or one of the
> bugs/trackers attached to that.

Yes, although increasing the timeout still leaves the issue that if the
timeout fires you don't get anything resembling a useful error message
(because the logs from ceph-disk activate get binned when ceph-disk
trigger gets killed).

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph/systemd startup bug (was Re: Some OSDs are down after Server reboot)

2017-09-28 Thread Matthew Vernon

Hi,

TL;DR - the timeout setting in ceph-disk@.service is (far) too small - 
it needs increasing and/or removing entirely. Should I copy this to 
ceph-devel?


On 15/09/17 16:48, Matthew Vernon wrote:

> On 14/09/17 16:26, Götz Reinicke wrote:
>
>> After that, 10 OSDs did not came up as the others. The disk did not get
>> mounted and the OSD processes did nothing … even after a couple of
>> minutes no more disks/OSDs showed up.
>
> I'm still digging, but AFAICT it's a race condition in startup - in our
> case, we're only seeing it if some of the filesystems aren't clean. This
> may be related to the thread "Very slow start of osds after reboot" from
> August, but I don't think any conclusion was reached there.


This annoyed me enough that I went off to find the problem :-)

On systemd-enabled machines[0] ceph disks are activated by systemd's 
ceph-disk@.service, which calls:


/bin/sh -c 'timeout 120 flock /var/lock/ceph-disk-$(basename %f) 
/usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'


ceph-disk trigger --sync calls ceph-disk activate which (among other 
things) mounts the osd fs (first in a temporary location, then in 
/var/lib/ceph/osd/ once it's extracted the osd number from the fs). If 
the fs is unclean, XFS auto-recovers before mounting (which takes time - 
range 2-25s for our 6TB disks). Importantly, there is a single global 
lock file[1] so only one ceph-disk activate can be doing this at once.


So, each fs is auto-recovering one at at time (rather than in parallel), 
and once the elapsed time gets past 120s, timeout kills the flock, 
systemd kills the cgroup, and no more OSDs start up - we typically find 
a few fs mounted in /var/lib/ceph/tmp/mnt.. systemd keeps trying to 
start the remaining osds (via ceph-osd@.service), but their fs isn't in 
the correct place, so this never works.


The fix/workaround is to adjust the timeout value (edit the service file 
directly, or for style points write an override in /etc/systemd/system 
remembering you need a blank ExecStart line before your revised one).
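
As a sketch (the 1200s figure is illustrative - pick something that
comfortably covers your worst-case recovery time), an override in
/etc/systemd/system/ceph-disk@.service.d/override.conf would be:

[Service]
ExecStart=
ExecStart=/bin/sh -c 'timeout 1200 flock /var/lock/ceph-disk-$(basename %f) \
  /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'

followed by a systemctl daemon-reload.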


Experimenting, one of our storage nodes with 60 6TB disks took 17m35s to 
start all its osds when started up with all fss dirty. So the current 
120s is far too small (it's just about OK when all the osd fss are clean).


I think, though, that having the timeout at all is a bug - if something 
needs to time out under some circumstances, should it be at a lower 
layer, perhaps?


A couple of final points/asides, if I may:

ceph-disk trigger uses subprocess.communicate (via the command() 
function), which means it swallows the log output from ceph-disk 
activate (and only outputs it after that process finishes) - as well as 
producing confusing timestamps, this means that when systemd kills the 
cgroup, all the output from the ceph-disk activate command vanishes into 
the void. That made debugging needlessly hard. Better to let called 
processes like that output immediately?


Does each fs need mounting twice? could the osd be encoded in the 
partition label or similar instead?


Is a single global activation lock necessary? It slows startup down 
quite a bit; I see no reason why (at least in the one-osd-per-disk case) 
you couldn't be activating all the osds at once...


Regards,

Matthew

[0] I note, for instance, that /etc/init/ceph-disk.conf doesn't have the 
timeout, so presumably upstart systems aren't affected

[1] /var/lib/ceph/tmp/ceph-disk.activate.lock at least on Ubuntu


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-osd restartd via systemd in case of disk error

2017-09-20 Thread Matthew Vernon
On 19/09/17 10:40, Wido den Hollander wrote:
> 
>> Op 19 september 2017 om 10:24 schreef Adrian Saul 
>> :
>>
>>
>>> I understand what you mean and it's indeed dangerous, but see:
>>> https://github.com/ceph/ceph/blob/master/systemd/ceph-osd%40.service
>>>
>>> Looking at the systemd docs it's difficult though:
>>> https://www.freedesktop.org/software/systemd/man/systemd.service.ht
>>> ml
>>>
>>> If the OSD crashes due to another bug you do want it to restart.
>>>
>>> But for systemd it's not possible to see if the crash was due to a disk I/O-
>>> error or a bug in the OSD itself or maybe the OOM-killer or something.
>>
>> Perhaps using something like RestartPreventExitStatus and defining a 
>> specific exit code for the OSD to exit on when it is exiting due to an IO 
>> error.
>>
> 
> That's a very, very good idea! I didn't know that one existed.
> 
> That would prevent restarts in case of I/O error indeed.

That would depend on the OSD gracefully handling the I/O failure - IME
they quite often seem to end up abort()ing...
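
For reference, the suggestion above would amount to a drop-in something
like the following - a sketch; exit code 250 is hypothetical, and the
OSD would have to be taught to exit with it on I/O errors rather than
aborting:

[Service]
RestartPreventExitStatus=250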

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Collectd issues

2017-09-18 Thread Matthew Vernon
On 18/09/17 16:37, Matthew Vernon wrote:
> On 13/09/17 15:06, Marc Roos wrote:
>>
>>
>> Am I the only one having these JSON issues with collectd, did I do 
>> something wrong in configuration/upgrade?
> 
> I also see these, although my dashboard seems to mostly be working. I'd
> be interested in knowing what the problem is!
> 
>> Sep 13 15:44:15 c01 collectd: ceph plugin: ds 
>> Bluestore.kvFlushLat.avgtime was not properly initialized.
>> Sep 13 15:44:15 c01 collectd: ceph plugin: JSON handler failed with 
>> status -1.
> 
> [ours is slightly different, as we're not running Bluestore]

To add what might be helpful in tracking this down - we're only seeing
this on our nodes which are running the radosgw...

Sep 18 06:26:27 sto-1-2 collectd[423236]: ceph plugin: ds
ThrottleMsgrDispatchThrottlerRadosclient0x75f799e740.getOrFailF was not
properly initialized.
Sep 18 06:26:27 sto-1-2 collectd[423236]: ceph plugin: JSON handler
failed with status -1.
Sep 18 06:26:27 sto-1-2 collectd[423236]: ceph plugin:
cconn_handle_event(name=client.rgw.sto-1-2,i=60,st=4): error 1

Regards,

Matthew



-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Collectd issues

2017-09-18 Thread Matthew Vernon
On 13/09/17 15:06, Marc Roos wrote:
> 
> 
> Am I the only one having these JSON issues with collectd, did I do 
> something wrong in configuration/upgrade?

I also see these, although my dashboard seems to mostly be working. I'd
be interested in knowing what the problem is!

> Sep 13 15:44:15 c01 collectd: ceph plugin: ds 
> Bluestore.kvFlushLat.avgtime was not properly initialized.
> Sep 13 15:44:15 c01 collectd: ceph plugin: JSON handler failed with 
> status -1.

[ours is slightly different, as we're not running Bluestore]

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some OSDs are down after Server reboot

2017-09-15 Thread Matthew Vernon
Hi,

On 14/09/17 16:26, Götz Reinicke wrote:

> maybe someone has a hint: I do have a cephalopod cluster (6 nodes, 144
> OSDs), Cents 7.3 ceph 10.2.7.
> 
> I did a kernel update to the recent centos 7.3 one on a node and did a
> reboot.
> 
> After that, 10 OSDs did not came up as the others. The disk did not get
> mounted and the OSD processes did nothing … even after a couple of
> minutes no more disks/OSDs showed up.
> 
> So I did a ceph-disk activate-all.
> 
> And all missing OSDs got back online.
> 
> Questions: Any hints on debugging why the disk did not get online after
> the reboot?

We've been seeing this on our Ubuntu / Jewel cluster, after we upgraded
from ceph 10.2.3 / kernel 4.4.0-62 to ceph 10.2.7 / kernel 4.4.0-93.

I'm still digging, but AFAICT it's a race condition in startup - in our
case, we're only seeing it if some of the filesystems aren't clean. This
may be related to the thread "Very slow start of osds after reboot" from
August, but I don't think any conclusion was reached there.

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph release cadence

2017-09-08 Thread Matthew Vernon
Hi,

On 06/09/17 16:23, Sage Weil wrote:

> Traditionally, we have done a major named "stable" release twice a year, 
> and every other such release has been an "LTS" release, with fixes 
> backported for 1-2 years.

We use the ceph version that comes with our distribution (Ubuntu LTS);
those come out every 2 years (though we won't move to a brand-new
distribution until we've done some testing!). So from my POV, the ideal
outcome is LTS ceph releases timed so that adjacent ceph LTSs fit neatly
into adjacent Ubuntu LTSs. We're unlikely to ever try
putting a non-LTS ceph version into production.

I hope this isn't an unusual requirement :)

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How big can a mon store get?

2017-08-25 Thread Matthew Vernon

Hi,

We have a medium-sized (2520 osds, 42 hosts, 88832 pgs, 15PB raw 
capacity) Jewel cluster (on Ubuntu), and in normal operation, our mon 
store size is around the 1.2G mark. I've noticed, though, that when 
doing larger rebalances, they can grow really very large (up to nearly 
70G, which is nearly all the rootfs on our mons); when recently adding 
more osds to the cluster (and consequently increasing pg number by 
increasing the size of an existing pool) we nearly ran out of disk space 
on the mons due to their mon stores getting very big.
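
(For anyone wanting to keep an eye on this, the store is just on disk on
each mon, so something like - a sketch; adjust for your cluster name and
mon id:

du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db

gives the current size.)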


Since I imagine having your mons run out of disk mid-rebalance is A Bad 
Idea, is there a way to estimate how big a mon store might get?


Thanks,

Matthew


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] XFS attempt to access beyond end of device

2017-07-20 Thread Matthew Vernon

Hi,

On 18/07/17 05:08, Marcus Furlong wrote:

> On 22 March 2017 at 05:51, Dan van der Ster wrote:
>
> Apologies for reviving an old thread, but I figured out what happened
> and never documented it, so I thought an update might be useful.


[snip detailed debugging]

Thanks for getting to the bottom of this. Have you reported it to the 
sgdisk authors, please? It'd be good to get it fixed :)


[we look to have similarly-affected disks; I'm not yet sure if we're 
brave enough to try adjusting the fs size :-/ ]


Regards,

Matthew


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Specifying a cache tier for erasure-coding?

2017-07-11 Thread Matthew Vernon
Hi,

On 07/07/17 13:03, David Turner wrote:
> So many of your questions depends on what your cluster is used for. We
> don't even know rbd or cephfs from what you said and that still isn't
> enough to fully answer your questions. I have a much smaller 3 node
> cluster using Erasure coding for rbds as well as cephfs and it is fine
> speed-wise for my needs with the cache tier on the hdds. Luminous will
> remove the need for a cache tier to use Erasure coding if you can wait.

Sorry; our cluster is used partly to provide volumes for OpenStack, and
party for S3 (via rgw).

> Is your current cluster fast enough for your needs? Is Erasure coding
> just for additional space? If so, moving to Erasure coding requires you
> to copy your data from the replicated pool to the EC pool land you will
> have 2 copies of your data until you feel confident enough to delete the
> replicated copy.  Elaborate on what you mean when you ask how robust EC
> is, you then referred to replicated as simple.  Are you concerned it
> will add complexity or that it will be lacking features of a replicated
> pool?

I think our cluster is currently fast enough (I'm sure our users would
always want more speed :) ); we were thinking that erasure coding would
save us some disk space, yes.

I'm concerned that erasure coded pools (and a cache tier in front of
them) will be a more complex setup to manage (we use ceph-ansible) than
our current setup (replicated pools).
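
To illustrate the extra moving parts, a minimal EC-plus-cache-tier setup
is roughly the following - a sketch; pool names, pg counts, the k/m
profile and the 1TB cache bound are all illustrative:

ceph osd erasure-code-profile set ec42 k=4 m=2
ceph osd pool create ecpool 2048 2048 erasure ec42
ceph osd pool create cachepool 2048 2048 replicated
ceph osd tier add ecpool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay ecpool cachepool
ceph osd pool set cachepool hit_set_type bloom
ceph osd pool set cachepool target_max_bytes 1099511627776

- rather more knobs (and failure modes) than a plain replicated pool.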

Thanks,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Specifying a cache tier for erasure-coding?

2017-07-07 Thread Matthew Vernon
Hi,

Currently, our ceph cluster is all 3-way replicated, and works very
nicely. We're consider the possibility of adding an erasure-coding pool;
which I understand would require a cache tier in front of it to ensure
decent performance.

I am wondering what sort of spec should we be thinking about for the
cache tier, and how robust erasure-coding pools are compared with the
simpler replicated pools?

Our test cluster has 3 storage nodes, each with 60 x 6TB HDD, with NVME
for journals. So for cache tier, we should presumably be thinking SSD?
Roughly how much, and should we consider NVME, or is it not worth the cost?

Thanks :)

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read errors on OSD

2017-06-01 Thread Matthew Vernon

Hi,

On 01/06/17 10:38, Oliver Humpage wrote:


> These read errors are all on Samsung 850 Pro 2TB disks (journals are
> on separate enterprise SSDs). The SMART status on all of them are
> similar and show nothing out of the ordinary.
>
> Has anyone else experienced anything similar? Is this just a curse of
> non-enterprise SSDs, or do you think there might be something else
> going on, e.g. could it be an XFS issue? Any suggestions as to what
> to look at would be welcome.


You don't say what's in kern.log - we've had (rotating) disks that were 
throwing read errors but still saying they were OK on SMART.


Regards,

Matthew


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Spurious 'incorrect nilfs2 checksum' breaking ceph OSD

2017-05-03 Thread Matthew Vernon
Hi,

This has bitten us a couple of times now (such that we're considering
re-building util-linux with the nilfs2 code commented out), so I'm
wondering if anyone else has seen it [and noting the failure mode in
case anyone else is confused in future]

We see this with our setup of rotating media for the osd, NVMe partition
for journal.

What happens is that sometimes an osd refuses to start up, complaining
that /var/lib/ceph/osd/ceph-XXX/journal is missing.

inspecting that file will show it's a broken symlink to an entry in
/dev/disk/by-partuuid:

/var/lib/ceph/osd/ceph-388/journal: broken symbolic link to
/dev/disk/by-partuuid/d2ace848-7e2d-4395-a195-a4428631b333

If you inspect the relevant partition, you see that it has the matching
block id:

blkid /dev/nvme0n1p11
/dev/nvme0n1p11: PARTLABEL="ceph journal"
PARTUUID="d2ace848-7e2d-4395-a195-a4428631b333

And, if you look in syslog, you'll see this:

Jan  4 09:25:29 sto-3-3 systemd-udevd[107317]: incorrect nilfs2 checksum
on /dev/nvme0n1p11

The problem is that the nilfs2 checker is too promiscuous, looking for a
relatively short magic number (0x3434) in 2 different places (location
0x400, and (((part_size-512)/8)*512)). So sometimes you'll be unlucky
and have a ceph journal that matches, at which point the nilfs2 prober
find an invalid checksum, and so systemd/udevd doesn't create the
/dev/disk/by-partuuid link.

You can work around this by making the symlink by hand when the failure
occurs; I also understand that the nilfs2 prober in util_linux 2.29 is
more robust (but that's not in any LTS distributions yet, so I've not
tested it).
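
(i.e., using the example devices above, something like:

ln -s /dev/nvme0n1p11 /dev/disk/by-partuuid/d2ace848-7e2d-4395-a195-a4428631b333

and then starting the osd as usual.)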

Regards,

Matthew

util-linux issue: https://github.com/karelzak/util-linux/issues/361
Ubuntu bug:
https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/1653936


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding a new rack to crush map without pain?

2017-04-19 Thread Matthew Vernon
Hi,

> How many OSD's are we talking about? We're about 500 now, and even
> adding another 2000-3000 is a 5 minute cut/paste job of editing the
> CRUSH map. If you really are adding racks and racks of OSD's every week,
> you should have found the crush location hook a long time ago. 

We have 540 at the moment, and have another 540 due in May, and then
about 1500 due at some point in the summer.

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding a new rack to crush map without pain?

2017-04-18 Thread Matthew Vernon
On 17/04/17 21:16, Richard Hesse wrote:
> I'm just spitballing here, but what if you set osd crush update on start
> = false ? Ansible would activate the OSD's but not place them in any
> particular rack, working around the ceph.conf problem you mentioned.
> Then you could place them in your CRUSH map by hand. I know you wanted
> to avoid editing the CRUSH map by hand, but it's usually the safest route.

It scales really badly - "edit CRUSH map by hand" isn't really something
that I can automate; presumably something could be lashed up with ceph
osd crush add-bucket and ceph osd set ... but that feels more like a
lash-up and less like a properly-engineered solution to what must be a
fairly common problem?
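
(Something could indeed be lashed up along these lines - a sketch, with
illustrative bucket and host names:

ceph osd crush add-bucket rack4 rack
ceph osd crush move rack4 root=default
ceph osd crush move sto-4-1 rack=rack4

- workable, but it has to be kept in step with the hardware by hand.)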

Regards,

Matthew

> On Wed, Apr 12, 2017 at 4:46 PM, Matthew Vernon <m...@sanger.ac.uk> wrote:
> 
> Hi,
> 
> Our current (jewel) CRUSH map has rack / host / osd (and the default
> replication rule does step chooseleaf firstn 0 type rack). We're shortly
> going to be adding some new hosts in new racks, and I'm wondering what
> the least-painful way of getting the new osds associated with the
> correct (new) rack will be.
> 
> We deploy with ceph-ansible, which can add bits of the form
> [osd.104]
> osd crush location = root=default rack=1 host=sto-1-1
> 
> to ceph.conf, but I think this doesn't help for new osds, since
> ceph-disk will activate them before ceph.conf is fully assembled (and
> trying to arrange it otherwise would be serious hassle).
> 
> Would making a custom crush location hook be the way to go? then it'd
> say rack=4 host=sto-4-x and new osds would end up allocated to rack 4?
> And would I need to have done ceph osd crush add-bucket rack4 rack
> first, presumably?
> 
> I am planning on adding osds to the cluster one box at a time, rather
> than going with the add-everything-at-crush-weight-0 route; if nothing
> else it seems easier to automate. And I'd rather avoid having to edit
> the crush map directly...
> 
> Any pointers welcomed :)
> 
> Regards,
> 
> Matthew
> 
> 
> --
>  The Wellcome Trust Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 



-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Adding a new rack to crush map without pain?

2017-04-12 Thread Matthew Vernon
Hi,

Our current (jewel) CRUSH map has rack / host / osd (and the default
replication rule does step chooseleaf firstn 0 type rack). We're shortly
going to be adding some new hosts in new racks, and I'm wondering what
the least-painful way of getting the new osds associated with the
correct (new) rack will be.

We deploy with ceph-ansible, which can add bits of the form
[osd.104]
osd crush location = root=default rack=1 host=sto-1-1

to ceph.conf, but I think this doesn't help for new osds, since
ceph-disk will activate them before ceph.conf is fully assembled (and
trying to arrange it otherwise would be serious hassle).

Would making a custom crush location hook be the way to go? then it'd
say rack=4 host=sto-4-x and new osds would end up allocated to rack 4?
And would I need to have done ceph osd crush add-bucket rack4 rack
first, presumably?
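
Something like this is what I have in mind - a sketch of such a hook;
the rack-from-hostname parsing is illustrative and assumes our sto-R-N
naming:

#!/bin/sh
# emit this osd's crush location based on the hostname, e.g. sto-4-2 -> rack4
host=$(hostname -s)
rack=$(echo "$host" | cut -d- -f2)
echo "root=default rack=rack${rack} host=${host}"

wired up in ceph.conf with something like:

[osd]
osd crush location hook = /usr/local/bin/ceph-crush-location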

I am planning on adding osds to the cluster one box at a time, rather
than going with the add-everything-at-crush-weight-0 route; if nothing
else it seems easier to automate. And I'd rather avoid having to edit
the crush map directly...

Any pointers welcomed :)

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] strange radosgw-admin create user behaviour (bug?)

2017-03-22 Thread Matthew Vernon
Hi,

radosgw-admin create user sometimes seems to misbehave when trying to
create similarly-named accounts with the same email address:

radosgw-admin -n client.rgw.sto-1-2 user create --uid=XXXDELETEME
--display-name=carthago --email=h...@sanger.ac.uk
{
"user_id": "XXXDELETEME",
[...]

radosgw-admin -n client.rgw.sto-1-2 user create --uid=DELETEME
--display-name=carthago --email=h...@sanger.ac.uk
{
"user_id": "XXXDELETEME",
[...]

This second command should surely say
"could not create user: unable to create user, email: h...@sanger.ac.uk
is the email address an existing user"

Rather than returning the credential of a different uid?

I tripped over this when doing some testing, and it confused my script
because it was expecting the 'create' to fail...

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW listing users' quota and usage painfully slow

2017-03-09 Thread Matthew Vernon

On 09/03/17 11:28, Matthew Vernon wrote:


> https://drive.google.com/drive/folders/0B4TV1iNptBAdMEdUaGJIa3U1QVE?usp=sharing


[For the avoidance of doubt, I've changed the key associated with that 
S3 account :-) ]


Regards,

Matthew


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW listing users' quota and usage painfully slow

2017-03-09 Thread Matthew Vernon

On 09/03/17 10:45, Abhishek Lekshmanan wrote:


> On 03/09/2017 11:26 AM, Matthew Vernon wrote:
>
>> I'm using Jewel / 10.2.3-0ubuntu0.16.04.2 . We want to keep track of our
>> S3 users' quota and usage. Even with a relatively small number of users
>> (23) it's taking ~23 seconds.
>>
>> What we do is (in outline):
>> radosgw-admin metadata list user
>> for each user X:
>>   radosgw-admin user info --uid=X  #has quota details
>>   radosgw-admin user stats --uid=X #has usage details
>>
>> None of these calls is particularly slow (~0.5s), but the net result is
>> not very satisfactory.
>>
>> What am I doing wrong? :)
>
> Is this a single site or a multisite cluster? If you're only trying to
> read info you could try disabling the cache (it is not recommended to
> use this if you're trying to write/modify info) for eg:


It's a single site.


$ radosgw-admin user info --uid=x --rgw-cache-enabled=false


That doesn't noticeably change the execution time (perhaps it improves
it a little).



also you could run the info command with higher debug (--debug-rgw=20
--debug-ms=1) and paste that somewhere (its very verbose) to help
identify where we're slowing down


https://drive.google.com/drive/folders/0B4TV1iNptBAdMEdUaGJIa3U1QVE?usp=sharing

Should let you see the output from running this with that 
cache-disabling option (and without).


Naively, I find myself wondering if some sort of all-users flag to the
info and stats commands, or a "tell me usage and quota in one call"
command, would be quicker.


Thanks,

Matthew


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW listing users' quota and usage painfully slow

2017-03-09 Thread Matthew Vernon

Hi,

I'm using Jewel / 10.2.3-0ubuntu0.16.04.2. We want to keep track of our 
S3 users' quota and usage. Even with a relatively small number of users 
(23) it's taking ~23 seconds.


What we do is (in outline):
radosgw-admin metadata list user
for each user X:
  radosgw-admin user info --uid=X  #has quota details
  radosgw-admin user stats --uid=X #has usage details

None of these calls is particularly slow (~0.5s), but the net result is 
not very satisfactory.
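
For anyone wanting to reproduce, the outline above in runnable form is
roughly (sketch; assumes jq, and that "metadata list user" emits a
plain JSON array of uids):

  radosgw-admin metadata list user | jq -r '.[]' | while read -r u; do
      radosgw-admin user info  --uid="$u"   # quota details
      radosgw-admin user stats --uid="$u"   # usage details
  done

(Running the per-user calls in parallel with xargs -P would hide some
of the latency, but that feels like papering over the underlying cost.)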


What am I doing wrong? :)

Regards,

Matthew


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitoring

2017-01-30 Thread Matthew Vernon

Dear Marc,

On 28/01/17 23:43, Marc Roos wrote:


Is there a doc that describes all the parameters that are published by
collectd-ceph?


The best I've found is the Redhat documentation of the performance 
counters (which are what collectd-ceph is querying):


https://access.redhat.com/documentation/en/red-hat-ceph-storage/1.3/paged/administration-guide/chapter-9-performance-counters
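
The counters themselves can also be pulled straight from a daemon's
admin socket, which is what collectd-ceph polls under the hood, e.g.
(assuming the default socket naming):

  ceph daemon osd.0 perf schema   # counter names and types
  ceph daemon osd.0 perf dump     # current values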

HTH,

Matthew


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] machine hangs & soft lockups with 10.2.2 / kernel 4.4.0

2017-01-23 Thread Matthew Vernon
On 23/01/17 16:40, Tu Holmes wrote:
> While I know this seems a silly question, are your monitoring nodes
> spec'd the same?

Oh, sorry, I should have said that. All 9 machines have osds on (1 per
disk); additionally 3 of the nodes are also mons and 3 (a different 3)
are rgws.

One of the freezing nodes is osds-only, another is osds-and-mons. The
soft-lockup node is osds-and-rgw

Regards,

Matthew

> //Tu
> On Mon, Jan 23, 2017 at 8:38 AM Matthew Vernon <m...@sanger.ac.uk
> <mailto:m...@sanger.ac.uk>> wrote:
> 
> Hi,
> 
> We have a 9-node ceph cluster, running 10.2.2 and kernel 4.4.0 (Ubuntu
> Xenial). We're seeing both machines freezing (nothing in logs on the
> machine, which is entirely unresponsive to anything except the power
> button) and suffering soft lockups.
> 
> Has anyone seen similar? Googling hasn't found anything obvious, and
> while ceph repairs itself when a machine is lost, this is obviously
> quite concerning.
> 
> I don't have any useful logs from the machines that freeze, but I do
> have logs from the machine that suffered soft lockups - you can see the
> relevant bits of kern.log here:
> 
> 
> https://drive.google.com/drive/folders/0B4TV1iNptBAdblJMX1R4ZWI5eGc?usp=sharing
> 
> [available compressed and uncompressed]
> 
> The cluster was installed with ceph-ansible, and the specs of each node
> are roughly:
> 
> Cores: 16 (2 x 8-core Intel E5-2690)
> Memory: 512 GB (16 x32 GB)
> Storage: 2x 120GB SAMSUNG SSD (system disk)
>  2x 2TB NVME cards (ceph journal)
>  60x 6TB Toshiba 7200 rpm disks (ceph storage)
> Network: 1 Gbit/s Intel I350 (Control interface)
>  2x 100Gbit/s Mellanox cards (bonded together)
> 
> We're in pre-production testing, but any suggestions on how we might get
> to the bottom of this would be appreciated!
> 
> There's no obvious pattern to these problems, and we've had 2 freezes
> and 1 soft lockup in the last ~1.5 weeks.
> 
> Thanks,
> 
> Matthew
> 
> 
> --
>  The Wellcome Trust Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 



-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] machine hangs & soft lockups with 10.2.2 / kernel 4.4.0

2017-01-23 Thread Matthew Vernon
Hi,

We have a 9-node ceph cluster, running 10.2.2 and kernel 4.4.0 (Ubuntu
Xenial). We're seeing both machines freezing (nothing in logs on the
machine, which is entirely unresponsive to anything except the power
button) and suffering soft lockups.

Has anyone seen similar? Googling hasn't found anything obvious, and
while ceph repairs itself when a machine is lost, this is obviously
quite concerning.

I don't have any useful logs from the machines that freeze, but I do
have logs from the machine that suffered soft lockups - you can see the
relevant bits of kern.log here:

https://drive.google.com/drive/folders/0B4TV1iNptBAdblJMX1R4ZWI5eGc?usp=sharing

[available compressed and uncompressed]

The cluster was installed with ceph-ansible, and the specs of each node
are roughly:

Cores: 16 (2 x 8-core Intel E5-2690)
Memory: 512 GB (16 x32 GB)
Storage: 2x 120GB SAMSUNG SSD (system disk)
 2x 2TB NVME cards (ceph journal)
 60x 6TB Toshiba 7200 rpm disks (ceph storage)
Network: 1 Gbit/s Intel I350 (Control interface)
 2x 100Gbit/s Mellanox cards (bonded together)

We're in pre-production testing, but any suggestions on how we might get
to the bottom of this would be appreciated!

There's no obvious pattern to these problems, and we've had 2 freezes
and 1 soft lockup in the last ~1.5 weeks.

Thanks,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] civetweb deamon dies on https port

2017-01-19 Thread Matthew Vernon
Hi,

On 19/01/17 13:58, Chris Sarginson wrote:
> You look to have a typo in this line:
> 
> rgw_frontends = "civetweb port=8080s
> ssl_certificate=/etc/pki/tls/cephrgw01.crt" 
> 
> It would seem from the error it should be port=8080, not port=8080s. 

I think you are incorrect; port=8080s is what you want if you want https
on that port.

The error message is a false positive - we also see similar:

Jan 19 14:30:38 sto-3-2 radosgw[94484]: error parsing int: 443s: The
option value '443s' seems to be invalid

...but our rgw runs fine.

I went looking for this a while back, and found:

The error message comes from src/rgw/rgw_main.cc line 392 in 16.04
(l456 in ceph git), where ceph calls get_val(string, int, int), which
produces the error message observed (and fails).

Subsequently, rgw_civetweb_frontend.cc line 39 (l53 in ceph git) uses
the string version of get_val, which succeeds and stores the string
value verbatim in what will become the listening_ports argument to
civetweb, which is happy with a string of the form you use.
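
For reference, a frontend line of the shape under discussion (the cert
path here is just an example) looks like:

  rgw_frontends = "civetweb port=443s ssl_certificate=/etc/pki/tls/rgw.crt"

civetweb itself parses the trailing 's' as "serve TLS on this port",
so the startup warning from get_val can safely be ignored.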

One could argue that this is a bug...

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs quota

2016-12-16 Thread Matthew Vernon
Hello,
On 15/12/16 10:25, David Disseldorp wrote:

> Are you using the Linux kernel CephFS client (mount.ceph), or the
> userspace ceph-fuse back end? Quota enforcement is performed by the
> client, and is currently only supported by ceph-fuse.
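
(For context: with ceph-fuse the quota itself is just an xattr on a
directory, e.g.

  setfattr -n ceph.quota.max_bytes -v 107374182400 /some/cephfs/dir

so enforcement is entirely in the client's hands.)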

Is server enforcement of quotas planned?

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon not starting on system startup (Ubuntu 16.04 / systemd)

2016-11-18 Thread Matthew Vernon
Hi,

On 15/11/16 11:55, Craig Chi wrote:

> You can try to manually fix this by adding the
> /lib/systemd/system/ceph-mon.target file, which contains:



> and then execute the following command to tell systemd to start this
> target on bootup
> systemctl enable ceph-mon.target

This worked a treat, thank you!
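
For the archives, the ceph-mon.target I ended up with contains just
this (modelled on what I believe newer packages ship):

  [Unit]
  Description=ceph target allowing to start/stop all ceph-mon@.service instances at once
  PartOf=ceph.target

  [Install]
  WantedBy=multi-user.target ceph.target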

> so as ceph-osd can be fixed by the same trick.

I've not had problems with ceph-osd failing to start.

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon not starting on system startup (Ubuntu 16.04 / systemd)

2016-11-15 Thread Matthew Vernon
Hi,

On 15/11/16 01:27, Craig Chi wrote:

> What's your Ceph version?
> I am using Jewel 10.2.3 and systemd seems to work normally. I deployed
> Ceph by ansible, too.

The version in Ubuntu 16.04, which is 10.2.2-0ubuntu0.16.04.2

> You can check whether you have /lib/systemd/system/ceph-mon.target file.
> I believe it was a bug existing in 10.2.1 before
> cfa2d0a08a0bcd0fac153041b9eff17cb6f7c9af has been merged.

No, I have the following:
/lib/systemd/system/ceph-create-keys.service
/lib/systemd/system/ceph-create-keys@.service
/lib/systemd/system/ceph-disk@.service
/lib/systemd/system/ceph-mon.service
/lib/systemd/system/ceph-mon@.service
/lib/systemd/system/ceph-osd@.service
/lib/systemd/system/ceph.target

[so no ceph-osd.service; ceph-osd@.service says it's part of
ceph-osd.target, which I can't see defined anywhere explicitly]

Also /etc/systemd/system/ceph-mon.target.wants (contains a link to
ceph-mon@hostname.service) and ...ceph-osd.target.wants (which contains
links to the ceph-osd services)

ceph-mon.service says PartOf ceph.target.

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-mon not starting on system startup (Ubuntu 16.04 / systemd)

2016-11-14 Thread Matthew Vernon
Hi,

I have a problem that my ceph-mon isn't getting started when my machine
boots; the OSDs start up just fine. Checking logs, there's no sign of
systemd making any attempt to start it, although it is seemingly enabled:

root@sto-1-1:~# systemctl status ceph-mon@sto-1-1
● ceph-mon@sto-1-1.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled;
vendor preset
   Active: inactive (dead)

I see a thread on this issue in the list archives from May, but no sign
of what the eventual solution was...

If it matters, I'm deploying Jewel using ceph-ansible (
https://github.com/ceph/ceph-ansible ); that does (amongst other things)
systemctl enable ceph-mon@sto-1-1

Thanks,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] multiple openstacks on one ceph / namespaces

2016-11-09 Thread Matthew Vernon
Hi,

I'm configuring ceph as the storage for our openstack install. One thing
we might want to do in the future is have a second openstack instance
(e.g. to test the next release of openstack); we might well want to have
this talk to our existing ceph cluster.

I could do this by giving each stack a different ceph username, but I
think there's no way to keep them apart, since namespaces don't
currently work at the rbd level? [the docs say currently (i.e. firefly)
not supported]

Should I just be creating pools for the different stacks, and hoping
this doesn't result in too many pgs? I'm not clear on what's best
practice, but the more pools I need to create up-front, the more
guesswork is involved in deciding how big to make them...
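
(The sort of thing I have in mind, if pools are the answer - pool and
client names hypothetical:

  ceph osd pool create stack2-volumes 512
  ceph osd pool create stack2-images 128
  ceph auth get-or-create client.stack2 mon 'allow r' \
      osd 'allow rwx pool=stack2-volumes, allow rwx pool=stack2-images'

which at least keeps each stack's credentials scoped to its own pools.)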

Regards,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Radosgw pool creation (jewel / Ubuntu16.04)

2016-11-09 Thread Matthew Vernon
Hi,

I have a jewel/Ubuntu16.04 ceph cluster. I attempted to add some
radosgws, having already made the pools I thought they would need per
http://docs.ceph.com/docs/jewel/radosgw/config-ref/#pools

i.e. .rgw and so on:
.rgw
.rgw.control
.rgw.gc
.log
.intent-log
.usage
.users
.users.email
.users.swift
.users.uid


But in fact, it's created a bunch of pools under default:
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
default.rgw.users.uid

So, should I have created these pools instead, or is there some way to
make radosgw do what I intended? Relatedly, is it going to create e.g.
default.rgw.users.swift as and when I enable the swift gateway? [rather
than .users.swift as the docs suggest]
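
In case it helps others: the pool names appear to come from the zone
definition, so something like this (sketch - I'd welcome confirmation
it's the supported route) ought to let one pick one's own names:

  radosgw-admin zone get --rgw-zone=default > zone.json
  # edit the various *_pool entries in zone.json, then:
  radosgw-admin zone set --rgw-zone=default --infile zone.json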

Thanks,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com