Re: [ceph-users] Missing Ceph perf-counters in Ceph-Dashboard or Prometheus/InfluxDB...?

2019-12-03 Thread Benjeman Meekhof
I'd like to see a few of the cache tier counters exposed. You get some info on cache activity in 'ceph -s' so it makes sense from my perspective to have similar availability in exposed counters. There's a tracker for this request (opened by me a while ago): https://tracker.ceph.com/issues/37156

Re: [ceph-users] RGW Admin REST metadata caps

2019-07-23 Thread Benjeman Meekhof
at 12:53 PM Benjeman Meekhof wrote: > > Ceph Nautilus, 14.2.2, RGW civetweb. > Trying to read from the RGW admin api /metadata/user with request URL like: > GET /admin/metadata/user?key=someuser&format=json > > But am getting a 403 denied error from RGW. Shouldn't the caps below > b

[ceph-users] RGW Admin REST metadata caps

2019-07-23 Thread Benjeman Meekhof
Ceph Nautilus, 14.2.2, RGW civetweb. Trying to read from the RGW admin api /metadata/user with request URL like: GET /admin/metadata/user?key=someuser&format=json But am getting a 403 denied error from RGW. Shouldn't the caps below be sufficient, or am I missing something? "caps": [ {
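
For context, a minimal sketch (not from the thread) of issuing that signed admin-API request from Python, assuming the requests and requests-aws4auth packages and placeholder endpoint/credentials; the caller still needs the appropriate admin caps (e.g. metadata=read) for RGW to return anything other than the 403 described above.

import requests
from requests_aws4auth import AWS4Auth

RGW_ENDPOINT = "http://rgw.example.com:7480"  # hypothetical endpoint
ACCESS_KEY = "ADMIN_ACCESS_KEY"               # placeholder admin user creds
SECRET_KEY = "ADMIN_SECRET_KEY"

# RGW generally accepts "us-east-1" as the SigV4 region unless the deployment
# requires a zonegroup name here.
auth = AWS4Auth(ACCESS_KEY, SECRET_KEY, "us-east-1", "s3")

resp = requests.get(
    RGW_ENDPOINT + "/admin/metadata/user",
    params={"key": "someuser", "format": "json"},
    auth=auth,
)
print(resp.status_code)  # a 403 here is the symptom discussed in this thread
print(resp.json() if resp.ok else resp.text)
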

Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-28 Thread Benjeman Meekhof
I suggest having a look at this thread, which argues that sizes 'in between' the requirements of the different RocksDB levels give no net benefit, and sizing accordingly. http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030740.html My impression is that 28GB is good (L0+L1+L3), or 280
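
For reference, a rough sketch of the arithmetic behind those figures, assuming RocksDB defaults of max_bytes_for_level_base = 256 MB and a level multiplier of 10 (the actual values depend on the bluestore_rocksdb_options in effect):

base_mb = 256          # assumed max_bytes_for_level_base (L1 target size)
multiplier = 10        # assumed max_bytes_for_level_multiplier

cumulative_mb = 0
for level in range(1, 5):                       # L1..L4
    level_mb = base_mb * multiplier ** (level - 1)
    cumulative_mb += level_mb
    print("L%d: %8.1f GB   cumulative: %8.1f GB"
          % (level, level_mb / 1024, cumulative_mb / 1024))

# Approximate output:
# L1:      0.2 GB   cumulative:      0.2 GB
# L2:      2.5 GB   cumulative:      2.8 GB
# L3:     25.0 GB   cumulative:     27.8 GB   (hence the ~28GB figure)
# L4:    250.0 GB   cumulative:    277.8 GB   (hence the ~280GB figure)
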

Re: [ceph-users] Restricting access to RadosGW/S3 buckets

2019-05-02 Thread Benjeman Meekhof
Hi Vlad, If a user creates a bucket then only that user can see the bucket unless an S3 ACL is applied giving additional permissions, but I'd guess you are asking a more complex question than that. If you are looking to apply some kind of policy overriding whatever ACL a user might apply to a
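
As an illustration of the bucket-policy approach (not taken verbatim from the thread), a hedged boto3 sketch with hypothetical endpoint, bucket and user names; RGW implements a subset of the AWS policy language and uses RGW user IDs as principals.

import json
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:7480",   # hypothetical RGW endpoint
    aws_access_key_id="OWNER_ACCESS_KEY",
    aws_secret_access_key="OWNER_SECRET_KEY",
)

# Grant a second RGW user read-only access to the bucket owner's bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam:::user/readonlyuser"]},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"],
    }],
}

s3.put_bucket_policy(Bucket="mybucket", Policy=json.dumps(policy))
print(s3.get_bucket_policy(Bucket="mybucket")["Policy"])
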

[ceph-users] Limits of mds bal fragment size max

2019-04-12 Thread Benjeman Meekhof
We have a user syncing data with some kind of rsync + hardlink based system creating/removing large numbers of hard links. We've encountered many of the issues with stray inode re-integration as described in the thread and tracker below. As noted one fix is to increase mds_bal_fragment_size_max

[ceph-users] Additional meta data attributes for rgw user?

2019-01-21 Thread Benjeman Meekhof
Hi all, I'm looking to keep some extra metadata associated with radosgw users created by radosgw-admin. I saw in the output of 'radosgw-admin metadata get user:someuser' there is an 'attrs' structure that looked promising. However, it seems to be strict about what it accepts, so I wonder if
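
For context, a sketch of what the post is attempting, using radosgw-admin's metadata get/put round trip; as noted above, RGW is strict about what it accepts back in 'attrs', so the write step is shown only as an experiment, not a supported mechanism.

import json
import subprocess

def metadata_get(user):
    out = subprocess.run(
        ["radosgw-admin", "metadata", "get", "user:%s" % user],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)

md = metadata_get("someuser")
# Inspect the 'attrs' structure mentioned above (its exact location within
# 'data' may vary by release).
print(json.dumps(md.get("data", {}).get("attrs", []), indent=2))

# Writing modified metadata back is the experimental part; 'metadata put'
# reads JSON on stdin, but RGW may reject unexpected attrs:
# subprocess.run(["radosgw-admin", "metadata", "put", "user:someuser"],
#                input=json.dumps(md), text=True, check=True)
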

[ceph-users] ceph-mon high single-core usage, reencode_incremental_map

2018-12-19 Thread Benjeman Meekhof
Version: Mimic 13.2.2. Lately, during any kind of cluster change, particularly adding OSDs in this most recent instance, I'm seeing all of our mons pegging a single core at 100% while leaving the other available cores idle. Cluster commands are slow to respond

Re: [ceph-users] Need help related to authentication

2018-12-05 Thread Benjeman Meekhof
Hi Rishabh, You might want to check out these examples for python boto3 which include SSE-C: https://github.com/boto/boto3/blob/develop/boto3/examples/s3.rst As already noted use 'radosgw-admin' to retrieve access key and secret key to plug into your client. If you are not an administrator on
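
A condensed, hedged version of the SSE-C pattern from the linked boto3 examples, with placeholder endpoint and credentials; RGW must be configured to permit SSE-C and the connection should use TLS.

import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com",     # placeholder; SSE-C needs TLS
    aws_access_key_id="ACCESS_KEY",             # from radosgw-admin user info
    aws_secret_access_key="SECRET_KEY",
)

key = os.urandom(32)   # client-managed 256-bit key; RGW never stores it

s3.put_object(
    Bucket="mybucket", Key="secret.txt", Body=b"hello",
    SSECustomerAlgorithm="AES256", SSECustomerKey=key,
)

obj = s3.get_object(
    Bucket="mybucket", Key="secret.txt",
    SSECustomerAlgorithm="AES256", SSECustomerKey=key,
)
print(obj["Body"].read())
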

Re: [ceph-users] Ceph MDS and hard links

2018-08-07 Thread Benjeman Meekhof
MiB) Bytes in malloc metadata MALLOC: MALLOC: = 12869599232 (12273.4 MiB) Actual memory used (physical + swap) MALLOC: +436740096 ( 416.5 MiB) Bytes released to OS (aka unmapped) MALLOC: MALLOC: = 13306339328 (12689.9 MiB) Virtual addre

Re: [ceph-users] Ceph MDS and hard links

2018-08-03 Thread Benjeman Meekhof
378986760 ( 361.4 MiB) Bytes in central cache freelist MALLOC: + 4713472 (4.5 MiB) Bytes in transfer cache freelist MALLOC: + 20722016 ( 19.8 MiB) Bytes in thread cache freelists MALLOC: + 62652416 ( 59.8 MiB) Bytes in malloc metadata MALLOC: MALLOC: = 128

[ceph-users] Ceph MDS and hard links

2018-08-01 Thread Benjeman Meekhof
Lately I've been encountering much higher than expected memory usage on our MDS, which doesn't align with the cache memory limit even accounting for potential overruns. Our memory limit is 4GB but the MDS process is steadily at around 11GB used. Coincidentally we also have a new user heavily

Re: [ceph-users] active directory integration with cephfs

2018-07-26 Thread Benjeman Meekhof
I can comment on that docker image: We built that to bake in a certain amount of config regarding nfs-ganesha serving CephFS and using LDAP to do idmap lookups (example ldap entries are in readme). At least as we use it the server-side uid/gid information is pulled from sssd using a config file

Re: [ceph-users] MDS: journaler.pq decode error

2018-06-21 Thread Benjeman Meekhof
to the question might be interesting for future reference. thanks, Ben On Thu, Jun 21, 2018 at 11:32 AM, Benjeman Meekhof wrote: > Thanks very much John! Skipping over the corrupt entry by setting a > new expire_pos seems to have worked. The journal expire_pos is now > advancing

Re: [ceph-users] MDS: journaler.pq decode error

2018-06-21 Thread Benjeman Meekhof
integrity. As recommended I did take an export of the journal first and I'll take a stab at using a hex editor on it near future. Worst case we go through the tag/scan if necessary. thanks, Ben On Thu, Jun 21, 2018 at 9:04 AM, John Spray wrote: > On Wed, Jun 20, 2018 at 2:17 PM Benjeman Meek

Re: [ceph-users] MDS: journaler.pq decode error

2018-06-20 Thread Benjeman Meekhof
out": { "stripe_unit": 4194304, "stripe_count": 1, "object_size": 4194304, "pool_id": 64, "pool_ns": "" } } thanks, Ben On Fri, Jun 15, 2018 at 11:54 AM, John Spray wrote: > On Fri, Jun 15, 2018

[ceph-users] MDS: journaler.pq decode error

2018-06-15 Thread Benjeman Meekhof
Have seen some posts and issue trackers related to this topic in the past but haven't been able to put it together to resolve the issue I'm having. All on Luminous 12.2.5 (upgraded over time from past releases). We are going to upgrade to Mimic near future if that would somehow resolve the

[ceph-users] nfs-ganesha 2.6 deb packages

2018-05-14 Thread Benjeman Meekhof
I see that luminous RPM packages are up at download.ceph.com for ganesha-ceph 2.6 but there is nothing in the Deb area. Any estimates on when we might see those packages? http://download.ceph.com/nfs-ganesha/deb-V2.6-stable/luminous/ thanks, Ben

Re: [ceph-users] Radosgw ldap info

2018-03-26 Thread Benjeman Meekhof
Hi Marc, I can't speak to your other questions, but as far as user auth caps go, those are still kept in the radosgw metadata outside of LDAP. As far as I know, all that LDAP gives you is a way to authenticate users with a user/password combination. So, for example, if you create a user

Re: [ceph-users] Radosgw ldap user authentication issues

2018-03-19 Thread Benjeman Meekhof
Hi Marc, You mentioned following the instructions 'except for doing this ldap token'. Do I read that correctly that you did not generate / use an LDAP token with your client? I think that is a necessary part of triggering the LDAP authentication (Section 3.2 and 3.3 of the doc you linked). I
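
For reference, a sketch of what that LDAP token is: a base64-encoded JSON blob normally produced by the radosgw-token utility (radosgw-token --encode --ttype=ldap). The field names below follow the upstream docs as I understand them; verify against your radosgw-token version.

import base64
import json

ldap_user = "someuser"        # the uid RGW will search for in LDAP
ldap_password = "secret"      # that user's LDAP password

token = base64.b64encode(json.dumps({
    "RGW_TOKEN": {
        "version": 1,
        "type": "ldap",
        "id": ldap_user,
        "key": ldap_password,
    }
}).encode()).decode()

# The client presents this token as its S3 access key (the secret key is
# effectively unused), which is what triggers the LDAP auth path in RGW.
print(token)
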

Re: [ceph-users] Ganesha-rgw export with LDAP auth

2018-03-09 Thread Benjeman Meekhof
e: > Hi Benjeman, > > It is -intended- to work, identically to the standalone radosgw > server. I can try to verify whether there could be a bug affecting > this path. > > Matt > > On Fri, Mar 9, 2018 at 12:01 PM, Benjeman Meekhof <bmeek...@umich.edu> wrote

[ceph-users] Ganesha-rgw export with LDAP auth

2018-03-09 Thread Benjeman Meekhof
I'm having issues exporting a radosgw bucket if the configured user is authenticated using the rgw ldap connectors. I've verified that this same ldap token works ok for other clients, and as I'll note below it seems like the rgw instance is contacting the LDAP server and successfully

Re: [ceph-users] puppet for the deployment of ceph

2018-02-19 Thread Benjeman Meekhof
We use this one, now heavily modified in our own fork. I'd sooner point you at the original unless it is missing something you need. Ours has diverged a bit and makes no attempt to support anything outside our specific environment (RHEL7). https://github.com/openstack/puppet-ceph

Re: [ceph-users] "Cannot get stat of OSD" in ceph.mgr.log upon enabling influx plugin

2018-02-19 Thread Benjeman Meekhof
The 'cannot stat' messages are normal at startup, we see them also in our working setup with mgr influx module. Maybe they could be fixed by delaying the module startup, or having it check for some other 'all good' status but I haven't looked into it. You should only be seeing them when the mgr

Re: [ceph-users] mgr[influx] Cannot transmit statistics: influxdb python module not found.

2018-02-12 Thread Benjeman Meekhof
In our case I think we grabbed the SRPM from Fedora and rebuilt it on Scientific Linux (another RHEL derivative). Presumably the binary didn't work or I would have installed it directly. I'm not quite sure why it hasn't migrated to EPEL yet. I haven't tried the SRPM for latest releases, we're

Re: [ceph-users] Ceph MGR Influx plugin 12.2.2

2018-01-11 Thread Benjeman Meekhof
Hi Reed, Someone in our group originally wrote the plugin and put in a PR. Since our commit the plugin was 'forward-ported' to master and made incompatible with Luminous, so we've been using our own version of the plugin while waiting for the necessary pieces to be back-ported to Luminous to use

Re: [ceph-users] Ceph-mgr summarize recovery counters

2017-10-10 Thread Benjeman Meekhof
module log: mgr get_python Python module requested unknown data 'pg_status' thanks, Ben On Thu, Oct 5, 2017 at 8:42 AM, John Spray <jsp...@redhat.com> wrote: > On Wed, Oct 4, 2017 at 7:14 PM, Gregory Farnum <gfar...@redhat.com> wrote: >> On Wed, Oct 4, 2017 at 9:14 AM, Be

[ceph-users] Ceph-mgr summarize recovery counters

2017-10-04 Thread Benjeman Meekhof
Wondering if anyone can tell me how to summarize recovery bytes/ops/objects from counters available in the ceph-mgr python interface? To put it another way, how does the ceph -s command put together that information, and can I access that information from a counter queryable by the ceph-mgr python
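
A hedged sketch of what such a mgr module might look like; as the reply above shows, 'pg_status' was not exposed to python modules in every release, so both the availability of that data and the exact field names (e.g. recovering_bytes_per_sec) depend on the mgr version.

from mgr_module import MgrModule


class RecoveryStats(MgrModule):
    """Sketch: log cluster-wide recovery rates (field names/availability vary by release)."""

    def serve(self):
        pg_status = self.get("pg_status") or {}
        rates = {
            name: pg_status.get(name, 0)
            for name in (
                "recovering_bytes_per_sec",
                "recovering_objects_per_sec",
                "recovering_keys_per_sec",
            )
        }
        self.log.info("recovery rates: %s", rates)
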

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-09-21 Thread Benjeman Meekhof
Some of this thread seems to contradict the documentation and confuses me. Is the statement below correct? "The BlueStore journal will always be placed on the fastest device available, so using a DB device will provide the same benefit that the WAL device would while also allowing additional

Re: [ceph-users] Multipath configuration for Ceph storage nodes

2017-07-12 Thread Benjeman Meekhof
p ceph-sn1.example.com:/dev/mapper/disk1 > ceph-deploy osd prepare ceph-sn1.example.com:/dev/mapper/disk1 > > Best wishes, > Bruno > > > -----Original Message- > From: Benjeman Meekhof [mailto:bmeek...@umich.edu] > Sent: 11 July 2017 18:46 > To: Canning, Bruno

Re: [ceph-users] Multipath configuration for Ceph storage nodes

2017-07-11 Thread Benjeman Meekhof
Hi Bruno, We have similar types of nodes and minimal configuration is required (RHEL7-derived OS). Install device-mapper-multipath or equivalent package, configure /etc/multipath.conf and enable 'multipathd'. If working correctly the command 'multipath -ll' should output multipath devices and

Re: [ceph-users] removing cluster name support

2017-06-08 Thread Benjeman Meekhof
Hi Sage, We did at one time run multiple clusters on our OSD nodes and RGW nodes (with Jewel). We accomplished this by putting code in our puppet-ceph module that would create additional systemd units with appropriate CLUSTER=name environment settings for clusters not named ceph. IE, if the

[ceph-users] Disable osd hearbeat logs

2017-03-14 Thread Benjeman Meekhof
Hi all, Even with debug_osd 0/0 as well as every other debug_ setting at 0/0 I still get logs like those pasted below in /var/log/ceph/ceph-osd.<id>.log when the relevant situation arises (release 11.2.0). Any idea what toggle switches these off? I went through and set every single debug_ setting

[ceph-users] Ceph SElinux denials on OSD startup

2017-02-27 Thread Benjeman Meekhof
Hi, I'm seeing some SELinux denials for ops to nvme devices. They only occur at OSD start; they are not ongoing. I'm not sure it's causing an issue, though I did try a few tests with SELinux in permissive mode to see if it made any difference with the startup/recovery CPU loading we have seen since

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-23 Thread Benjeman Meekhof
far...@redhat.com> wrote: > On Thu, Feb 16, 2017 at 9:19 AM, Benjeman Meekhof <bmeek...@umich.edu> wrote: >> I tried starting up just a couple OSD with debug_osd = 20 and >> debug_filestore = 20. >> >> I pasted a sample of the ongoing log here. To my eyes it do

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-16 Thread Benjeman Meekhof
ded+inconsistent 1 active+degraded+inconsistent On Thu, Feb 16, 2017 at 5:08 PM, Shinobu Kinjo <ski...@redhat.com> wrote: > Would you simply do? > > * ceph -s > > On Fri, Feb 17, 2017 at 6:26 AM, Benjeman Meekhof <bmeek...@umich.edu> wrote: >> As I'm

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-16 Thread Benjeman Meekhof
share_map_peer 0x7fc68f4c1000 already has epoch 152609 2017-02-16 16:23:35.577356 7fc6704e4700 20 osd.564 152609 share_map_peer 0x7fc68f4c1000 already has epoch 152609 thanks, Ben On Thu, Feb 16, 2017 at 12:19 PM, Benjeman Meekhof <bmeek...@umich.edu> wrote: > I tried starting up just a c

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-16 Thread Benjeman Meekhof
to revert to Jewel except perhaps one host to continue testing. thanks, Ben On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum <gfar...@redhat.com> wrote: > On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof <bmeek...@umich.edu> wrote: >> Hi all, >> >> We encountered an

Re: [ceph-users] bcache vs flashcache vs cache tiering

2017-02-16 Thread Benjeman Meekhof
Hi all, I'd also not like to see cache tiering in the current form go away. We've explored using it in situations where we have a data pool with replicas spread across WAN sites which we then overlay with a fast cache tier local to the site where most clients will be using the pool. This

[ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-14 Thread Benjeman Meekhof
Hi all, We encountered an issue updating our OSDs from Jewel (10.2.5) to Kraken (11.2.0). The OS was a RHEL derivative. Prior to this we updated all the mons to Kraken. After updating ceph packages I restarted the 60 OSDs on the box with 'systemctl restart ceph-osd.target'. Very soon after the system

Re: [ceph-users] Radosgw scaling recommendation?

2017-02-14 Thread Benjeman Meekhof
more RADOS handles? >> >> rgw_num_rados_handles = 8 >> >> That with more RGW threads as Mark mentioned. >> >> Wido >> >> > I believe some folks are considering trying to migrate rgw to a >> > threadpool/event processing model but it sounds lik

[ceph-users] Radosgw scaling recommendation?

2017-02-09 Thread Benjeman Meekhof
Hi all, We're doing some stress testing with clients hitting our rados gw nodes with simultaneous connections. When the number of client connections exceeds about 5400 we start seeing 403 forbidden errors and log messages like the following: 2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE:

Re: [ceph-users] Latency between datacenters

2017-02-08 Thread Benjeman Meekhof
Hi Daniel, 50 ms of latency is going to introduce a big performance hit though things will still function. We did a few tests which are documented at http://www.osris.org/performance/latency thanks, Ben On Tue, Feb 7, 2017 at 12:17 PM, Daniel Picolli Biazus wrote: > Hi

Re: [ceph-users] general ceph cluster design

2016-11-28 Thread Benjeman Meekhof
Hi Nick, We have a Ceph cluster spread across 3 datacenters at 3 institutions in Michigan (UM, MSU, WSU). It certainly is possible. As noted you will have increased latency for write operations and overall reduced throughput as latency increases. Latency between our sites is 3-5ms. We did

Re: [ceph-users] Feedback wanted: health warning when standby MDS dies?

2016-10-18 Thread Benjeman Meekhof
+1 to this, it would be useful On Tue, Oct 18, 2016 at 8:31 AM, Wido den Hollander wrote: > >> Op 18 oktober 2016 om 14:06 schreef Dan van der Ster : >> >> >> +1 I would find this warning useful. >> > > +1 Probably make it configurable, say, you want at least

Re: [ceph-users] Ceph OSD journal utilization

2016-06-20 Thread Benjeman Meekhof
For automatically collecting stats like this you might also look into collectd. It has many plugins for different system statistics including one for collecting stats from Ceph daemon admin sockets. There are several ways to collect and view the data from collectd. We are pointing clients at
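
For reference, a minimal sketch of pulling the same counters the collectd ceph plugin reads, by asking a daemon's admin socket for a perf dump. It assumes the ceph CLI is installed on the node and a local daemon socket (e.g. osd.0) exists.

import json
import subprocess

def perf_dump(daemon="osd.0"):
    """Return the perf counter dump from a local daemon's admin socket."""
    out = subprocess.run(
        ["ceph", "daemon", daemon, "perf", "dump"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)

counters = perf_dump("osd.0")
for section in sorted(counters):
    print(section, "-", len(counters[section]), "counters")
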

Re: [ceph-users] dense storage nodes

2016-05-19 Thread Benjeman Meekhof
; > Hello, > > On Wed, 18 May 2016 12:32:25 -0400 Benjeman Meekhof wrote: > >> Hi Lionel, >> >> These are all very good points we should consider, thanks for the >> analysis. Just a couple clarifications: >> >> - NVMe in this system are actually slot

Re: [ceph-users] dense storage nodes

2016-05-18 Thread Benjeman Meekhof
cases). regards, Ben On Wed, May 18, 2016 at 12:02 PM, Lionel Bouton <lionel+c...@bouton.name> wrote: > Hi, > > I'm not yet familiar with Jewel, so take this with a grain of salt. > > Le 18/05/2016 16:36, Benjeman Meekhof a écrit : >> We're in process of tuning a clus

Re: [ceph-users] How do I start ceph jewel in CentOS?

2016-05-04 Thread Benjeman Meekhof
Hi Michael, The systemctl pattern for OSDs with Infernalis or higher is 'systemctl start ceph-osd@<id>' (or status, restart). It will start the OSD in the default cluster 'ceph', or in another cluster if you have set 'CLUSTER=<name>' in /etc/sysconfig/ceph. If by chance you have 2 clusters on the same hardware you'll have

Re: [ceph-users] osd prepare 10.1.2

2016-04-14 Thread Benjeman Meekhof
Hi Michael, The partprobe issue was resolved for me by updating parted to the package from Fedora 22: parted-3.2-16.fc22.x86_64. It shouldn't require any other dependencies updated to install on EL7 varieties. http://tracker.ceph.com/issues/15176 regards, Ben On Thu, Apr 14, 2016 at 12:35