Re: [ceph-users] Ceph community - how to make it even stronger
Hi. What makes us struggle / wonder again and again is the absence of CEPH __man pages__. On *NIX systems, man pages are always the first place to go for help, right? Or is this considered "old school" by the CEPH makers / community? :O And as many people complain again and again, the same here as well: the CEPH documentation on docs.ceph.com lacks a lot of useful / needed things. If you really want to work with CEPH, you need to read and track many different sources all the time, like the community news posts, docs.ceph.com, the Red Hat Storage material and sometimes even the GitHub source code... all together very time consuming and error prone. From my point of view this is the biggest drawback of the whole (and overall GREAT!) "storage solution"! While we are on that topic... THANKS for all the great help and posts to YOU / the CEPH community! You guys are great and really "make the difference"!

-

Hi All. Reading up, and especially the thread on upgrading to Mimic and stable releases, caused me to reflect a bit on our Ceph journey so far. We started approximately 6 months ago, with CephFS as the dominant use case in our HPC setup, starting at 400 TB usable capacity and, as it matures, going towards 1 PB of mixed slow and SSD storage. Some of the first confusions were:
- bluestore vs. filestore - what was the recommendation, actually?
- Figuring out which kernel clients are usable with CephFS - and which kernels to use on the other end?
- Tuning of the MDS?
- Imbalance of OSD nodes rendering the cluster down - how to balance?
- Triggering kernel bugs in the kernel client during OSD_FULL?
This mailing list has been very responsive to the questions, thanks for that. But compared to other open source projects we're lacking a bit of infrastructure and guidance here. I did check:
- http://tracker.ceph.com/projects/ceph/wiki/Wiki => which does not seem to be operational.
- http://docs.ceph.com/docs/mimic/start/get-involved/
Gmane is probably not coming back - we have been waiting 2 years now; can we easily get the mailing list archives indexed otherwise? I feel that the wealth of knowledge being built up around operating Ceph is not really captured to make the next user's journey better and easier. I would love to help out - hey, I end up spending the time anyway - but some guidance on how to do it may help. I would suggest:
1) Send a 1-3 monthly status email on the project to the respective mailing lists => major releases, conferences, etc.
2) Get the wiki active - one of the main things I want to know about when messing with the storage is what is working for other people - just a page where people can dump an aggregated output of their Ceph cluster and write 2-5 lines about its use-case.
3) Either get the community more active on the documentation - advocate for it - or start up more documentation on the wiki => a FAQ would be a nice first place to start.
There may be an awful lot of things I've missed in this write-up - but please follow up. If some of the core Ceph people already have thoughts / ideas / guidance, please share so we can collaboratively make it better. Lastly - thanks for the great support on the mailing list so far - the intent is only to try to make Ceph even better.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
Hey Nathan. No blaming here. I'm very thankful for this great piece (ok, sometimes more of a beast ;) ) of open-source SDS and all the great work around it, incl. community and users... and happy the problem is identified and can be fixed for others / the future as well :) Well, yes, I can confirm the "error" you found here as well:

[root@sds20 ~]# ceph-detect-init
Traceback (most recent call last):
  File "/usr/bin/ceph-detect-init", line 9, in <module>
    load_entry_point('ceph-detect-init==1.0.1', 'console_scripts', 'ceph-detect-init')()
  File "/usr/lib/python2.7/site-packages/ceph_detect_init/main.py", line 56, in run
    print(ceph_detect_init.get(args.use_rhceph).init)
  File "/usr/lib/python2.7/site-packages/ceph_detect_init/__init__.py", line 42, in get
    release=release)
ceph_detect_init.exc.UnsupportedPlatform: Platform is not supported.: rhel 7.5

Sent: Sunday, 29 July 2018 at 20:33
From: "Nathan Cutler"
To: ceph.nov...@habmalnefrage.de, "Vasu Kulkarni"
Cc: ceph-users, "Ceph Development"
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

> Strange...
> - wouldn't swear, but pretty sure v13.2.0 was working ok before
> - so what do others say/see?
> - no one on v13.2.1 so far (hard to believe) OR
> - they just don't have this "systemctl ceph-osd.target" problem and all just works?
>
> If you also __MIGRATED__ from Luminous (say ~ v12.2.5 or older) to Mimic (say
> v13.2.0 -> v13.2.1) and __DO NOT__ see the same systemctl problems, what's
> your Linux OS and version (I'm on RHEL 7.5 here)? :O

Best regards
Anton

Hi ceph.novice: I'm the one to blame for this regretful incident.
Today I have reproduced the issue in teuthology:

2018-07-29T18:20:07.288 INFO:teuthology.orchestra.run.ovh093:Running: 'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph-detect-init'
2018-07-29T18:20:07.796 INFO:teuthology.orchestra.run.ovh093.stderr:Traceback (most recent call last):
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:  File "/bin/ceph-detect-init", line 9, in <module>
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:    load_entry_point('ceph-detect-init==1.0.1', 'console_scripts', 'ceph-detect-init')()
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:  File "/usr/lib/python2.7/site-packages/ceph_detect_init/main.py", line 56, in run
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:    print(ceph_detect_init.get(args.use_rhceph).init)
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:  File "/usr/lib/python2.7/site-packages/ceph_detect_init/__init__.py", line 42, in get
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:    release=release)
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:ceph_detect_init.exc.UnsupportedPlatform: Platform is not supported.: rhel 7.5

Just to be sure, can you confirm? (I.e. issue the command "ceph-detect-init" on your RHEL 7.5 system. Instead of saying "systemd", does it give an error like the above?) I'm working on a fix now at https://github.com/ceph/ceph/pull/23303

Nathan
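The traceback above shows ceph-detect-init bailing out on the "rhel 7.5" platform string. A minimal sketch of the general direction such a fix can take - illustrative only, NOT the real ceph_detect_init code or API: match on the distro's major version and fall back to systemd for unrecognized modern platforms instead of raising.

```python
# Illustrative sketch only -- not the actual ceph_detect_init code.
# It reproduces the failure mode quoted above and one fallback idea:
# match on the major release, and assume systemd (correct for
# RHEL/CentOS 7.x) instead of raising for unrecognized platforms.
class UnsupportedPlatform(Exception):
    pass

# Hypothetical lookup table keyed by (distro, release-or-major).
KNOWN_INIT = {
    ('rhel', '7'): 'systemd',
    ('centos', '7'): 'systemd',
    ('ubuntu', '16.04'): 'systemd',
}

def detect_init(distro, release, strict=False):
    major = release.split('.')[0]
    init = KNOWN_INIT.get((distro, release)) or KNOWN_INIT.get((distro, major))
    if init is None:
        if strict:
            # the v13.2.1 behaviour reported in the thread
            raise UnsupportedPlatform('Platform is not supported.: %s %s'
                                      % (distro, release))
        init = 'systemd'  # lenient fallback for unknown modern platforms
    return init

print(detect_init('rhel', '7.5'))  # systemd (matched via major version '7')
```

With a table like this, "rhel 7.5" resolves via its major version instead of needing every point release enumerated.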
Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
Strange...
- wouldn't swear, but pretty sure v13.2.0 was working ok before
- so what do others say/see?
- no one on v13.2.1 so far (hard to believe) OR
- they just don't have this "systemctl ceph-osd.target" problem and all just works?

If you also __MIGRATED__ from Luminous (say ~ v12.2.5 or older) to Mimic (say v13.2.0 -> v13.2.1) and __DO NOT__ see the same systemctl problems, what's your Linux OS and version (I'm on RHEL 7.5 here)? :O

Sent: Sunday, 29 July 2018 at 03:15
From: "Vasu Kulkarni"
To: ceph.nov...@habmalnefrage.de
Cc: "Sage Weil", ceph-users, "Ceph Development"
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

On Sat, Jul 28, 2018 at 6:02 PM, wrote:
> Have you guys changed something with the systemctl startup of the OSDs?

I think there is some kind of systemd issue hidden in mimic, https://tracker.ceph.com/issues/25004
Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
Have you guys changed something with the systemctl startup of the OSDs? I've stopped and disabled all the OSDs on all my hosts via "systemctl stop|disable ceph-osd.target" and rebooted all the nodes. Everything looked just the same. Then I started all the OSD daemons one after the other via the CLI with "/usr/bin/ceph-osd -f --cluster ceph --id $NR --setuser ceph --setgroup ceph > /tmp/osd.${NR}.log 2>&1 &" and now everything (ok, besides the ZABBIX mgr module?!?) seems to work :|

  cluster:
    id: 2a919338-4e44-454f-bf45-e94a01c2a5e6
    health: HEALTH_WARN
            Failed to send data to Zabbix
  services:
    mon: 3 daemons, quorum sds20,sds21,sds22
    mgr: sds22(active), standbys: sds20, sds21
    osd: 18 osds: 18 up, 18 in
    rgw: 4 daemons active
  data:
    pools: 25 pools, 1390 pgs
    objects: 2.55 k objects, 3.4 GiB
    usage: 26 GiB used, 8.8 TiB / 8.8 TiB avail
    pgs: 1390 active+clean
  io:
    client: 11 KiB/s rd, 10 op/s rd, 0 op/s wr

Any hints?

--

Sent: Saturday, 28 July 2018 at 23:35
From: ceph.nov...@habmalnefrage.de
To: "Sage Weil"
Cc: ceph-users@lists.ceph.com, ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

Hi Sage. Sure. Any specific OSD(s) log(s)? Or just any?

Sent: Saturday, 28 July 2018 at 16:49
From: "Sage Weil"
To: ceph.nov...@habmalnefrage.de, ceph-users@lists.ceph.com, ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

Can you include more of your osd log file?
Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
Hi Sage. Sure. Any specific OSD(s) log(s)? Or just any?

Sent: Saturday, 28 July 2018 at 16:49
From: "Sage Weil"
To: ceph.nov...@habmalnefrage.de, ceph-users@lists.ceph.com, ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

Can you include more of your osd log file?
[ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
Dear users and developers. I've updated our dev cluster from v13.2.0 to v13.2.1 yesterday and since then everything is badly broken. I've restarted all Ceph components via "systemctl" and also rebooted the servers SDS21 and SDS24; nothing changes. This cluster started as Kraken, was updated to Luminous (up to v12.2.5) and then to Mimic. Here are some system related infos, see https://semestriel.framapad.org/p/DTkBspmnfU Somehow I guess this may have to do with the various "ceph-disk", "ceph-volume", "ceph-lvm" changes in the last months?!?

Thanks & regards
Anton

--

Sent: Saturday, 28 July 2018 at 00:22
From: "Bryan Stillwell"
To: "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] v13.2.1 Mimic released

I decided to upgrade my home cluster from Luminous (v12.2.7) to Mimic (v13.2.1) today and ran into a couple issues:

1. When restarting the OSDs during the upgrade it seems to forget my upmap settings. I had to manually return them to the way they were with commands like:
ceph osd pg-upmap-items 5.1 11 18 8 6 9 0
ceph osd pg-upmap-items 5.1f 11 17
I also saw this when upgrading from v12.2.5 to v12.2.7.

2. Also, after restarting the first OSD during the upgrade I saw 21 messages like these in ceph.log:
2018-07-27 15:53:49.868552 osd.1 osd.1 10.0.0.207:6806/4029643 97 : cluster [WRN] failed to encode map e100467 with expected crc
2018-07-27 15:53:49.922365 osd.6 osd.6 10.0.0.16:6804/90400 25 : cluster [WRN] failed to encode map e100467 with expected crc
2018-07-27 15:53:49.925585 osd.6 osd.6 10.0.0.16:6804/90400 26 : cluster [WRN] failed to encode map e100467 with expected crc
2018-07-27 15:53:49.944414 osd.18 osd.18 10.0.0.15:6808/120845 8 : cluster [WRN] failed to encode map e100467 with expected crc
2018-07-27 15:53:49.944756 osd.17 osd.17 10.0.0.15:6800/120749 13 : cluster [WRN] failed to encode map e100467 with expected crc

Is this a sign that full OSD maps were sent out by the mons to every OSD like back in the hammer days?
I seem to remember that OSD maps should be a lot smaller now, so maybe this isn't as big of a problem as it was back then?

Thanks,
Bryan

From: ceph-users on behalf of Sage Weil
Date: Friday, July 27, 2018 at 1:25 PM
To: "ceph-annou...@lists.ceph.com", "ceph-users@lists.ceph.com", "ceph-maintain...@lists.ceph.com", "ceph-de...@vger.kernel.org"
Subject: [ceph-users] v13.2.1 Mimic released

This is the first bugfix release of the Mimic v13.2.x long term stable release series. This release contains many fixes across all components of Ceph, including a few security fixes. We recommend that all users upgrade.

Notable Changes
--
* CVE 2018-1128: auth: cephx authorizer subject to replay attack (issue#24836 http://tracker.ceph.com/issues/24836, Sage Weil)
* CVE 2018-1129: auth: cephx signature check is weak (issue#24837 http://tracker.ceph.com/issues/24837, Sage Weil)
* CVE 2018-10861: mon: auth checks not correct for pool ops (issue#24838
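Since the upmap entries above had to be retyped by hand, one defensive move is to capture `ceph osd dump -f json` before an upgrade and regenerate the `ceph osd pg-upmap-items` commands from it afterwards. A hedged sketch of that idea - the JSON field names `pg_upmap_items`, `pgid`, `mappings`, `from` and `to` are assumptions here; verify them against a dump from your own cluster before relying on this:

```python
import json

# Rebuild "ceph osd pg-upmap-items" commands from a pre-upgrade
# `ceph osd dump -f json` capture, so upmap entries lost on OSD
# restart can be re-applied.  The field names used below are
# assumptions; check them against your own dump output.
def upmap_restore_commands(osd_dump_json):
    dump = json.loads(osd_dump_json)
    cmds = []
    for item in dump.get("pg_upmap_items", []):
        # each mapping is a from->to OSD pair, emitted as "from to"
        pairs = " ".join("%d %d" % (m["from"], m["to"])
                         for m in item["mappings"])
        cmds.append("ceph osd pg-upmap-items %s %s" % (item["pgid"], pairs))
    return cmds

# sample shaped like the commands quoted in the message above
sample = json.dumps({"pg_upmap_items": [
    {"pgid": "5.1", "mappings": [{"from": 11, "to": 18}, {"from": 8, "to": 6}]},
    {"pgid": "5.1f", "mappings": [{"from": 11, "to": 17}]},
]})
for cmd in upmap_restore_commands(sample):
    print(cmd)
```

The generated lines can then be replayed with a shell loop after the restart.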
Re: [ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"
There was no change in the ZABBIX environment... I got this warning some minutes after the Linux and Luminous->Mimic update via YUM and a reboot of all the Ceph servers... Is there anyone who also had the ZABBIX module enabled under Luminous AND then migrated to Mimic? If yes, does it work "ok" in your place? If yes, which Linux OS/version are you running?

-

Ok, but the reason the module is issuing the warning is that zabbix_sender does not exit with status 0. You might want to check why this is. Was there a version change of Zabbix? If so, try to trace what might have changed that causes zabbix_sender to exit non-zero.

Wido
Re: [ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"
At about the same time we also updated the Linux OS via "YUM" to:

# more /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

From the given error message, it seems like there are 32 "measure points" which are to be sent, but 3 of them are somehow failing:
>>> "response":"success","info":"processed: 29; failed: 3; total: 32; seconds spent: 0.000605" <<<
and the funny thing is, our monitoring team, who runs the ZABBIX service/infra here, still receives "all stuff".

This is the problem, the zabbix_sender process is exiting with a non-zero status. You didn't change anything? You just upgraded from Luminous to Mimic and this came along?

Wido
[ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"
Anyone with "mgr Zabbix enabled" who migrated from Luminous (12.2.5) and has the same problem in Mimic now? If I disable and re-enable the "zabbix" module, the status is "HEALTH_OK" for some seconds and then changes to "HEALTH_WARN" again...

---
# ceph -s
  cluster:
    id:
    health: HEALTH_WARN
            Failed to send data to Zabbix
  services:
    mon: 3 daemons, quorum ceph20,ceph21,ceph22
    mgr: ceph21(active), standbys: ceph20, ceph22
    osd: 18 osds: 18 up, 18 in
    rgw: 4 daemons active
  data:
    pools: 25 pools, 1390 pgs
    objects: 2.55 k objects, 3.4 GiB
    usage: 26 GiB used, 8.8 TiB / 8.8 TiB avail
    pgs: 1390 active+clean
  io:
    client: 8.6 KiB/s rd, 9 op/s rd, 0 op/s wr

# ceph version
ceph version 13.2.0 () mimic (stable)

# grep -i zabbix /var/log/ceph/ceph-mgr.ceph21.log | tail -2
2018-07-11 09:50:10.191 7f2223582700 0 mgr[zabbix] Exception when sending: /usr/bin/zabbix_sender exited non-zero: zabbix_sender [18450]: DEBUG: answer [{"response":"success","info":"processed: 29; failed: 3; total: 32; seconds spent: 0.000605"}]
2018-07-11 09:51:10.222 7f2223582700 0 mgr[zabbix] Exception when sending: /usr/bin/zabbix_sender exited non-zero: zabbix_sender [18459]: DEBUG: answer [{"response":"success","info":"processed: 29; failed: 3; total: 32; seconds spent: 0.000692"}]
---
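The mgr module raises HEALTH_WARN whenever zabbix_sender exits non-zero, and zabbix_sender exits non-zero as soon as any single item fails, even with 29 of 32 processed as in the log above. A small helper sketch for pulling the counts out of the quoted DEBUG answer, to see how many items actually failed (the regex only assumes the "processed: N; failed: N; total: N" wording shown in the log):

```python
import re

# Parse the "processed: N; failed: N; total: N" summary from a
# zabbix_sender DEBUG answer line, as quoted in the mgr log above.
def sender_stats(answer):
    m = re.search(r'processed: (\d+); failed: (\d+); total: (\d+)', answer)
    if m is None:
        raise ValueError('unrecognized zabbix_sender answer: %r' % answer)
    processed, failed, total = (int(g) for g in m.groups())
    return {'processed': processed, 'failed': failed, 'total': total}

answer = ('[{"response":"success","info":"processed: 29; failed: 3; '
          'total: 32; seconds spent: 0.000605"}]')
stats = sender_stats(answer)
print(stats['failed'])  # 3 failed items -> zabbix_sender exits non-zero
```

A next debugging step would be finding which 3 item keys the Zabbix server rejects, e.g. because the template on the server side is older than the keys the Mimic module sends.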
Re: [ceph-users] Mimic 13.2.1 release date
- adding ceph-devel -

Same here. An estimated date would already help for internal planning :|

Sent: Tuesday, 10 July 2018 at 11:59
From: "Martin Overgaard Hansen"
To: ceph-users
Subject: Re: [ceph-users] Mimic 13.2.1 release date

> On 9 Jul 2018, at 17:12, Wido den Hollander wrote:
>
> Hi,
>
> Is there a release date for Mimic 13.2.1 yet?
>
> There are a few issues which currently make deploying with Mimic 13.2.0 a bit difficult, for example:
>
> - https://tracker.ceph.com/issues/24423
> - https://github.com/ceph/ceph/pull/22393
>
> Especially the first one makes it difficult.
>
> 13.2.1 would be very welcome with these fixes in there.
>
> Is there an ETA for this version yet?
>
> Wido

Also looking forward to this release; we had to revert to Luminous to continue expanding our cluster. An ETA would be great, thanks.

Best regards,
Martin Overgaard Hansen
MultiHouse IT Partner A/S
Re: [ceph-users] CentOS release 7.4.1708 and selinux-policy-base >= 3.13.1-166.el7_4.9
Hi Ruben and community. Thanks a lot for all the help and hints. Finally I figured out that "base" is also part of e.g. "selinux-policy-minimum". After installing this pkg via "yum install", the usual "ceph installation" continues... Seems like the "ceph packaging" is too much RHEL oriented ;) Anyhow, I'll continue now and, after reading many complaints about "ceph-deploy" and also "ceph-volume" recently, hope our standard tool, "ceph-deploy", will work as it used to.

Best regards
Anton

Sent: Thursday, 3 May 2018 at 10:57
From: "Ruben Kerkhof"
To: ceph.nov...@habmalnefrage.de
Cc: ceph-users
Subject: Re: [ceph-users] CentOS release 7.4.1708 and selinux-policy-base >= 3.13.1-166.el7_4.9

On Thu, May 3, 2018 at 1:33 AM, wrote:
> Hi all.

Hi Anton,

> We try to set up our first CentOS 7.4.1708 CEPH cluster, based on Luminous 12.2.5. What we get is:
>
> Error: Package: 2:ceph-selinux-12.2.5-0.el7.x86_64 (Ceph-Luminous)
>        Requires: selinux-policy-base >= 3.13.1-166.el7_4.9
>
> __Host infos__:
> root> lsb_release -d
> Description: CentOS Linux release 7.4.1708 (Core)
> root@> uname -a
> Linux 3.10.0-693.11.1.el7.x86_64 #1 SMP Mon Dec 4 23:52:40 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
> __Question__:
> Where can I find the selinux-policy-base-3.13.1-166.el7_4.9 package?

It is provided by selinux-policy-targeted:

ruben@localhost: ~$ rpm -q --provides selinux-policy-targeted
config(selinux-policy-targeted) = 3.13.1-166.el7_4.9
selinux-policy-base = 3.13.1-166.el7_4.9
selinux-policy-targeted = 3.13.1-166.el7_4.9

> Regards
> Anton

Kind regards,
Ruben Kerkhof
[ceph-users] CentOS release 7.4.1708 and selinux-policy-base >= 3.13.1-166.el7_4.9
Hi all. We try to set up our first CentOS 7.4.1708 CEPH cluster, based on Luminous 12.2.5. What we get is:

Error: Package: 2:ceph-selinux-12.2.5-0.el7.x86_64 (Ceph-Luminous)
       Requires: selinux-policy-base >= 3.13.1-166.el7_4.9

__Host infos__:
root> lsb_release -d
Description: CentOS Linux release 7.4.1708 (Core)
root@> uname -a
Linux 3.10.0-693.11.1.el7.x86_64 #1 SMP Mon Dec 4 23:52:40 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

__Question__:
Where can I find the selinux-policy-base-3.13.1-166.el7_4.9 package?

Regards
Anton
Re: [ceph-users] ceph-deploy: recommended?
... we use (only!) ceph-deploy in all our environments, tools and scripts. If I look at the effort that went into ceph-volume and all the related issues, the "manual LVM" overhead and/or still missing features, PLUS the recommendations mentioned in the same discussions to use something like ceph-ansible in parallel for the missing stuff, I can only hope we will find a (full time?!) maintainer for ceph-deploy and keep it alive. PLEASE ;)

Sent: Thursday, 5 April 2018 at 08:53
From: "Wido den Hollander"
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-deploy: recommended?

On 04/04/2018 08:58 PM, Robert Stanford wrote:
>
> I read a couple of versions ago that ceph-deploy was not recommended for production clusters. Why was that? Is this still the case? We have a lot of problems automating deployment without ceph-deploy.
>

In the end it is just a Python tool which deploys the daemons. It is not active in any way. Stability of the cluster is not determined by the use of ceph-deploy, but by the running daemons. I use ceph-deploy sometimes in very large deployments to make my life a bit easier.

Wido
Re: [ceph-users] Help! how to recover from total monitor failure in luminous
There, pick your "DISTRO", click on the "ID", click "Repo URL"...

Sent: Friday, 2 February 2018 at 21:34
From: ceph.nov...@habmalnefrage.de
To: "Frank Li"
Cc: "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] Help! how to recover from total monitor failure in luminous

https://shaman.ceph.com/repos/ceph/wip-22847-luminous/f04a4a36f01fdd5d9276fa5cfa1940f5cc11fb81/

Sent: Friday, 2 February 2018 at 21:27
From: "Frank Li"
To: "Sage Weil"
Cc: "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] Help! how to recover from total monitor failure in luminous

Sure, please let me know where to get and run the binaries. Thanks for the fast response!

--
Efficiency is Intelligent Laziness

On 2/2/18, 10:31 AM, "Sage Weil" wrote:

On Fri, 2 Feb 2018, Frank Li wrote:
> Yes, I was dealing with an issue where OSDs are not peering, and I was trying to see if force-create-pg can help recover the peering.
> Data loss is an accepted possibility.
>
> I hope this is what you are looking for?
>
> -3> 2018-01-31 22:47:22.942394 7fc641d0b700 5 mon.dl1-kaf101@0(electing) e6 _ms_dispatch setting monitor caps on this connection
> -2> 2018-01-31 22:47:22.942405 7fc641d0b700 5 mon.dl1-kaf101@0(electing).paxos(paxos recovering c 28110997..28111530) is_readable = 0 - now=2018-01-31 22:47:22.942405 lease_expire=0.00 has v0 lc 28111530
> -1> 2018-01-31 22:47:22.942422 7fc641d0b700 5 mon.dl1-kaf101@0(electing).paxos(paxos recovering c 28110997..28111530) is_readable = 0 - now=2018-01-31 22:47:22.942422 lease_expire=0.00 has v0 lc 28111530
> 0> 2018-01-31 22:47:22.955415 7fc64350e700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/OSDMapMapping.h: In function 'void OSDMapMapping::get(pg_t, std::vector*, int*, std::vector*, int*) const' thread 7fc64350e700 time 2018-01-31 22:47:22.952877
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/OSDMapMapping.h: 288: FAILED assert(pgid.ps() < p->second.pg_num)

Perfect, thanks! I have a test fix for this pushed to wip-22847-luminous which should appear on shaman.ceph.com in an hour or so; can you give that a try? (Only need to install the updated package on the mons.)

Thanks!
sage
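The quoted assert, `pgid.ps() < p->second.pg_num`, is the invariant that crashed the mons: every PG id carries a placement seed (ps) that must lie below its pool's pg_num, and a force-create-pg against a stale or inconsistent map can feed the mapping code a pgid that violates that. A toy restatement of the check, purely for illustration (this is not Ceph code, which is C++):

```python
# Toy restatement of the OSDMapMapping invariant from the quoted
# crash: a PG's placement seed must be below the pool's pg_num,
# otherwise the mapping lookup would index out of range.
def check_pg_mapping(ps, pg_num):
    if not ps < pg_num:
        raise AssertionError(
            'FAILED assert(pgid.ps() < p->second.pg_num): %d >= %d'
            % (ps, pg_num))
    return True

print(check_pg_mapping(287, 288))  # seed 287 fits in a 288-PG pool: True
```

In the crash above the mon hit exactly the failing branch, which is why Sage's fix targets the map-handling path rather than the command itself.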
Re: [ceph-users] Help! how to recover from total monitor failure in luminous
https://shaman.ceph.com/repos/ceph/wip-22847-luminous/f04a4a36f01fdd5d9276fa5cfa1940f5cc11fb81/

Sent: Friday, 2 February 2018 at 21:27
From: "Frank Li"
To: "Sage Weil"
Cc: "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] Help! how to recover from total monitor failure in luminous

Sure, please let me know where to get and run the binaries. Thanks for the fast response!

--
Efficiency is Intelligent Laziness

On 2/2/18, 10:31 AM, "Sage Weil" wrote:

On Fri, 2 Feb 2018, Frank Li wrote:
> Yes, I was dealing with an issue where OSDs are not peering, and I was trying to see if force-create-pg can help recover the peering.
> Data loss is an accepted possibility.
>
> I hope this is what you are looking for?
>
> -3> 2018-01-31 22:47:22.942394 7fc641d0b700 5 mon.dl1-kaf101@0(electing) e6 _ms_dispatch setting monitor caps on this connection
> -2> 2018-01-31 22:47:22.942405 7fc641d0b700 5 mon.dl1-kaf101@0(electing).paxos(paxos recovering c 28110997..28111530) is_readable = 0 - now=2018-01-31 22:47:22.942405 lease_expire=0.00 has v0 lc 28111530
> -1> 2018-01-31 22:47:22.942422 7fc641d0b700 5 mon.dl1-kaf101@0(electing).paxos(paxos recovering c 28110997..28111530) is_readable = 0 - now=2018-01-31 22:47:22.942422 lease_expire=0.00 has v0 lc 28111530
> 0> 2018-01-31 22:47:22.955415 7fc64350e700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/OSDMapMapping.h: In function 'void OSDMapMapping::get(pg_t, std::vector*, int*, std::vector*, int*) const' thread 7fc64350e700 time 2018-01-31 22:47:22.952877
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/OSDMapMapping.h: 288: FAILED assert(pgid.ps() < p->second.pg_num)

Perfect, thanks!
I have a test fix for this pushed to wip-22847-luminous which should appear on shaman.ceph.com in an hour or so; can you give that a try? (Only need to install the updated package on the mons.)

Thanks!
sage
Re: [ceph-users] ceph luminous - performance issue
Hi Steven. Interesting... I'm quite curious after your post now. I've migrated our prod. CEPH cluster to 12.2.2 and Bluestore just today and haven't heard back anything "bad" from the applications/users so far. Performance tests on our test cluster were good before, but we use S3/RGW only anyhow ;) There are two things I would like to know/learn... could you try/test and report back?!
- change all your tests to use >=16k block size, see also the BStore comments here (https://www.mail-archive.com/ceph-users@lists.ceph.com/msg43023.html)
- change your "write.fio" file profile from "rw=randwrite" to "rw=write" (or something similar :O ) to compare apples with apples ;)
Thanks for your efforts and looking forward to those results ;)

Best regards
Notna

--

Sent: Wednesday, 3 January 2018 at 16:20
From: "Steven Vacaroaia"
To: "Brady Deetz"
Cc: ceph-users
Subject: Re: [ceph-users] ceph luminous - performance issue

Thanks for your willingness to help.
DELL R620, 1 CPU, 8 cores, 64 GB RAM
Cluster network is using 2 bonded 10 GB NICs (mode=4), MTU=9000
SSD drives are enterprise grade - 400 GB SSD Toshiba PX04SHB040
HDD drives are 10k RPM, 600 GB Toshiba AL13SEB600

Steven

On 3 January 2018 at 09:41, Brady Deetz wrote:

Can you provide more detail regarding the infrastructure backing this environment? What hard drive, ssd, and processor are you using? Also, what is providing networking? I'm seeing 4k blocksize tests here. Latency is going to destroy you.
On Jan 3, 2018 8:11 AM, "Steven Vacaroaia" wrote:

Hi,
I am doing a PoC with 3 DELL R620 and 12 OSDs, 3 SSD drives (one on each server), bluestore. I configured the OSDs using the following (/dev/sda is my SSD drive):

ceph-disk prepare --zap-disk --cluster ceph --bluestore /dev/sde --block.wal /dev/sda --block.db /dev/sda

Unfortunately both fio and bench tests show much worse performance for the pools than for the individual disks. Example:

DISKS
fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --numjobs=14 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test

SSD drive:
Jobs: 14 (f=14): [W(14)] [100.0% done] [0KB/465.2MB/0KB /s] [0/119K/0 iops] [eta 00m:00s]
HD drive:
Jobs: 14 (f=14): [W(14)] [100.0% done] [0KB/179.2MB/0KB /s] [0/45.9K/0 iops] [eta 00m:00s]

POOL
fio write.fio
Jobs: 1 (f=0): [w(1)] [100.0% done] [0KB/51428KB/0KB /s] [0/12.9K/0 iops]

cat write.fio
[write-4M]
description="write test with 4k block"
ioengine=rbd
clientname=admin
pool=scbench
rbdname=image01
iodepth=32
runtime=120
rw=randwrite
bs=4k

rados bench -p scbench 12 write
Max bandwidth (MB/sec): 224
Min bandwidth (MB/sec): 0
Average IOPS: 26
Stddev IOPS: 24
Max IOPS: 56
Min IOPS: 0
Average Latency(s): 0.59819
Stddev Latency(s): 1.64017
Max latency(s): 10.8335
Min latency(s): 0.00475139

I must be missing something - any help/suggestions will be greatly appreciated. Here is some specific info:

ceph -s
  cluster:
    id: 91118dde-f231-4e54-a5f0-a1037f3d5142
    health: HEALTH_OK
  services:
    mon: 1 daemons, quorum mon01
    mgr: mon01(active)
    osd: 12 osds: 12 up, 12 in
  data:
    pools: 4 pools, 484 pgs
    objects: 70082 objects, 273 GB
    usage: 570 GB used, 6138 GB / 6708 GB avail
    pgs: 484 active+clean
  io:
    client: 2558 B/s rd, 2 op/s rd, 0 op/s wr

ceph osd pool ls detail
pool 1 'test-replicated' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 157 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~3]
pool 2 'test-erasure' erasure size 3 min_size 3 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 334 flags hashpspool stripe_width 8192 application rbd
        removed_snaps [1~5]
pool 3 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 200 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~3]
pool 4 'scbench' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 330 flags hashpspool stripe_width 0
        removed_snaps [1~3]

[cephuser@ceph ceph-config]$ ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS TYPE NAME
-1       6.55128        - 2237G   198G 2038G    0    0   - root default
-7             0        -     0      0     0    0    0   - host ods03
-3       2.18475        - 2237G   181G 2055G 8.12 0.96   - host osd01
 3   hdd 0.54619      1.0  559G 53890M  506G 9.41 1.11  90 osd.3
 4   hdd 0.54619      1.0  559G 30567M  529G 5.34 0.63  89 osd.4
 5   hdd 0.54619
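One sanity check for the raw-device vs. pool gap discussed above is a queue-depth model: with synchronous I/O, sustainable IOPS is bounded by roughly iodepth divided by per-op latency, and on the rados side each 4k write pays network round trips plus size=2 replication before it completes. A back-of-envelope helper, plugging in the average latency from the rados bench output quoted above (rados bench defaults to 16 concurrent ops, an assumption worth checking for this run):

```python
# Rough queue-depth model: sustainable IOPS ~= iodepth / avg per-op
# latency.  This ignores burstiness, but explains why millisecond-scale
# replicated-write latency caps a pool far below raw SSD IOPS.
def approx_iops(avg_latency_s, iodepth=32):
    return iodepth / avg_latency_s

# Average latency above was ~0.598 s; at 16 concurrent ops that
# predicts ~27 IOPS, close to the ~26 average IOPS reported.
print(round(approx_iops(0.598, iodepth=16)))
```

The same arithmetic shows why the thread's advice (larger block sizes, sequential writes) helps: fewer, larger ops amortize the fixed per-op latency.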
Re: [ceph-users] RGW Logging pool
We never managed to make it work, but I guess the "RGW metadata search" [c|sh]ould have been "the official solution"...
- http://ceph.com/geen-categorie/rgw-metadata-search/
- https://marc.info/?l=ceph-devel&m=149152531005431&w=2
- http://ceph.com/rgw/new-luminous-rgw-metadata-search/
There was also a solution based on HAproxy being the "middleware" between the S3 clients and the RGW service, which I cannot find now... Should you solve your problem, PLEASE post how you did it (with real examples/commands)... because exactly this was one of the core requirements (besides life cycle, which didn't work as well :| ) in a PoC here and CEPH/RGW failed. I would still like to push CEPH for coming projects... but all of them have the "metasearch" requirement.

Thanks and regards

Sent: Friday, 15 December 2017 at 18:21
From: "David Turner"
To: ceph-users, "Yehuda Sadeh-Weinraub"
Subject: [ceph-users] RGW Logging pool

We're trying to build an auditing system for when a user key pair performs an operation on a bucket (put, delete, creating a bucket, etc) and so far were only able to find this information in the level 10 debug logging in the rgw system logs. We noticed that our rgw log pool has been growing somewhat indefinitely and we had to move it off of the NVMe's and put it on HDDs due to its growing size. What is in that pool and how can it be accessed? I haven't found the right terms to search for to find anything about what's in this pool on the ML or on Google. What I would like to do is export the log to ElasticSearch, clean up the log on occasion, and hopefully find the information we're looking for to fulfill our user auditing without having our RGW daemons running on debug level 10 (which is a lot of logging!).
Re: [ceph-users] S3 object notifications
Hi Yehuda. Are there any examples (docs, blog posts, ...):
- how to use that "framework", and especially the "callbacks"?
- for the latest "metasearch" feature / usage with an S3 client/tool like CyberDuck, s3cmd, AWS CLI or at least boto3?
- i.e., is an external ELK still needed, or is this somehow included in RGW now?

Thanks & regards

Sent: Tuesday, 28 November 2017, 13:52
From: "Yehuda Sadeh-Weinraub"
To: "Sean Purdy"
Cc: "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] S3 object notifications

RGW has a sync modules framework that allows you to write your own sync plugins. The system identifies object changes and triggers callbacks that can then act on those changes. For example, the metadata search feature that was added recently uses this to send object metadata into ElasticSearch for indexing. Yehuda

On Tue, Nov 28, 2017 at 2:22 PM, Sean Purdy wrote:
> Hi,
>
> http://docs.ceph.com/docs/master/radosgw/s3/ says that S3 object notifications are not supported. I'd like something like object notifications so that we can back up new objects in realtime, instead of trawling the whole object list for what's changed.
>
> Is there anything similar I can use? I've found Spreadshirt's haproxy fork, which traps requests and updates redis - https://github.com/spreadshirt/s3gw-haproxy
> Anybody used that?
>
> Thanks,
> Sean Purdy
[ceph-users] docs.ceph.com broken since... days?!?
... or at least since yesterday!
Re: [ceph-users] RGW lifecycle not expiring objects
grrr... sorry, && here it is again as text :|

Sent: Monday, 05 June 2017, 01:12
From: ceph.nov...@habmalnefrage.de
To: "Yehuda Sadeh-Weinraub"
Cc: "ceph-users@lists.ceph.com", ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] RGW lifecycle not expiring objects

Hi (again) Yehuda. Looping in ceph-devel... Could it be that lifecycle is still not implemented in either Jewel or Kraken, even if the release notes and other places say so?
https://www.spinics.net/lists/ceph-devel/msg34492.html
https://github.com/ceph/ceph-ci/commit/7d48f62f5c86913d8f00b44d46a04a52d338907c
https://github.com/ceph/ceph-ci/commit/9162bd29594d34429a09562ed60a32a0703940ea
Thanks & regards
Anton

Sent: Sunday, 04 June 2017, 21:34
From: ceph.nov...@habmalnefrage.de
To: "Yehuda Sadeh-Weinraub"
Cc: "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] RGW lifecycle not expiring objects

Hi Yehuda. Well, here we go: http://tracker.ceph.com/issues/20177
As it's my first one, hope it's OK as it is... Thanks & regards Anton

Sent: Saturday, 03 June 2017, 00:14
From: "Yehuda Sadeh-Weinraub"
To: ceph.nov...@habmalnefrage.de
Cc: "Graham Allan", "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] RGW lifecycle not expiring objects

Have you opened a ceph tracker issue, so that we don't lose track of the problem? Thanks, Yehuda

On Fri, Jun 2, 2017 at 3:05 PM, wrote:
> Hi Graham.
>
> We are on Kraken and have the same problem with "lifecycle". Various (other) tools like s3cmd or CyberDuck do show the applied "expiration" settings, but objects seem never to be purged.
>
> If you should have new findings, hints, ... PLEASE share / let me know.
>
> Thanks a lot!
> Anton
>
> Sent: Friday, 19 May 2017, 22:44
> From: "Graham Allan"
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] RGW lifecycle not expiring objects
>
> I've been having a hard time getting the S3 object lifecycle to do anything here. I was able to set a lifecycle on a test bucket. As others also seem to have found, I do get an EACCES error on setting the lifecycle, but it does however get stored:
>
>> % aws --endpoint-url https://xxx.xxx.xxx.xxx s3api get-bucket-lifecycle-configuration --bucket=testgta
>> {
>>     "Rules": [
>>         {
>>             "Status": "Enabled",
>>             "Prefix": "",
>>             "Expiration": {
>>                 "Days": 3
>>             },
>>             "ID": "test"
>>         }
>>     ]
>> }
>
> but many days later I have yet to see any object actually get expired. There are some hints in the rgw log that the expiry thread does run periodically:
>
>> 2017-05-19 03:49:03.281347 7f74f1134700 2 RGWDataChangesLog::ChangesRenewThread: start
>> 2017-05-19 03:49:16.356022 7f74ef931700 2 object expiration: start
>> 2017-05-19 03:49:16.356036 7f74ef931700 20 proceeding shard = obj_delete_at_hint.00
>> 2017-05-19 03:49:16.359785 7f74ef931700 20 proceeding shard = obj_delete_at_hint.01
>> 2017-05-19 03:49:16.364667 7f74ef931700 20 proceeding shard = obj_delete_at_hint.02
>> 2017-05-19 03:49:16.369636 7f74ef931700 20 proceeding shard = obj_delete_at_hint.03
> ...
>> 2017-05-19 03:49:16.803270 7f74ef931700 20 proceeding shard = obj_delete_at_hint.000126
>> 2017-05-19 03:49:16.806423 7f74ef931700 2 object expiration: stop
>
> "radosgw-admin lc process" gives me no output unless I enable debug, then:
>
>> # radosgw-admin lc process
>> 2017-05-19 15:28:46.383049 7fedb9ffb700 2 RGWDataChangesLog::ChangesRenewThread: start
>> 2017-05-19 15:28:46.421806 7feddc240c80 10 Cannot find current period zone using local zone
>> 2017-05-19 15:28:46.453431 7feddc240c80 2 all 8 watchers are set, enabling cache
>> 2017-05-19 15:28:46.614991 7feddc240c80 2 removed watcher, disabling cache
>
> "radosgw-admin lc list" seems to return "empty" output:
>
>> # radosgw-admin lc list
>> []
>
> Is there anything obvious that I might be missing?
>
> Graham
> --
> Graham Allan
> Minnesota Supercomputing Institute - g...@umn.edu
Re: [ceph-users] RGW multisite sync data sync shard stuck
Hi Andreas. Well, we do _NOT_ need multisite in our environment, but unfortunately it is the basis for the announced "metasearch", based on ElasticSearch... so we have been trying to implement a "multisite" config on Kraken (v11.2.0) for weeks, but have never succeeded so far. We have purged and started all over with the multisite config about five times by now. We have one CEPH cluster with two RadosGW's on top (so NOT two CEPH clusters!); not sure if this makes a difference!?

Can you please share some infos about your (formerly working?!?) setup? Like:
- which CEPH version are you on?
- old deprecated "federated" or "new from Jewel" multisite setup?
- one or multiple CEPH clusters?

Great to see that multisite seems to work somehow, somewhere. We were really in doubt :O Thanks & regards Anton

P.S.: If someone reads this who has a working "one Kraken CEPH cluster" based multisite setup (or, let me dream, even a working ElasticSearch setup :| ), please step out of the dark and enlighten us :O

Sent: Tuesday, 30 May 2017, 11:02
From: "Andreas Calminder"
To: ceph-users@lists.ceph.com
Subject: [ceph-users] RGW multisite sync data sync shard stuck

Hello, I've got a sync issue with my multisite setup. There are 2 zones in 1 zone group in 1 realm. The data sync in the non-master zone has been stuck on "Incremental sync is behind by 1 shard"; this wasn't noticed until the radosgw instances in the master zone started dying from out-of-memory issues. All radosgw instances in the non-master zone were then shut down to ensure services in the master zone while trying to troubleshoot the issue.

From the rgw logs in the master zone I see entries like:

2017-05-29 16:10:34.717988 7fbbc1ffb700 0 ERROR: failed to sync object: 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_1.ext
2017-05-29 16:10:34.718016 7fbbc1ffb700 0 ERROR: failed to sync object: 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_2.ext
2017-05-29 16:10:34.718504 7fbbc1ffb700 0 ERROR: failed to fetch remote data log info: ret=-5
2017-05-29 16:10:34.719443 7fbbc1ffb700 0 ERROR: a sync operation returned error
2017-05-29 16:10:34.720291 7fbc167f4700 0 store->fetch_remote_obj() returned r=-5

Sync status in the non-master zone reports that the metadata is in sync, that the data sync is behind on 1 shard, and that the oldest incremental change not applied is about 2 weeks back. I'm not quite sure how to proceed; is there a way to find out the id of the shard and force some kind of re-sync of its data from the master zone? I'm unable to keep the non-master zone rgw's running, because it leaves the master zone in a bad state, with rgw dying every now and then. Regards, Andreas
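While waiting for an answer, the failed objects can at least be inventoried from those master-zone log entries. A small sketch (stdlib only; the `shard/BUCKET:instance/key` layout of the object reference is inferred from the lines above and may not hold for every release):

```python
import re
from collections import defaultdict

# Sample of the sync-error lines quoted above (layout assumed from them).
SAMPLE = """\
2017-05-29 16:10:34.717988 7fbbc1ffb700 0 ERROR: failed to sync object: 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_1.ext
2017-05-29 16:10:34.718016 7fbbc1ffb700 0 ERROR: failed to sync object: 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_2.ext
2017-05-29 16:10:34.718504 7fbbc1ffb700 0 ERROR: failed to fetch remote data log info: ret=-5
"""

# shard-id / bucket-name : bucket-instance / object-key
SYNC_ERR = re.compile(r"ERROR: failed to sync object: (\d+)/([^:]+):\S*?/(\S+)")

def failed_objects_by_bucket(log_text):
    """Group the object keys that failed to sync by bucket name."""
    failed = defaultdict(list)
    for line in log_text.splitlines():
        m = SYNC_ERR.search(line)
        if m:
            shard, bucket, key = m.groups()
            failed[bucket].append(key)
    return dict(failed)
```

The resulting per-bucket key lists could then be re-copied manually, or at least compared between zones to gauge how far behind the stuck shard really is.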
Re: [ceph-users] RGW lifecycle not expiring objects
Hi Yehuda. Well, here we go: http://tracker.ceph.com/issues/20177
As it's my first one, hope it's OK as it is... Thanks & regards Anton

Sent: Saturday, 03 June 2017, 00:14
From: "Yehuda Sadeh-Weinraub"
To: ceph.nov...@habmalnefrage.de
Cc: "Graham Allan", "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] RGW lifecycle not expiring objects

Have you opened a ceph tracker issue, so that we don't lose track of the problem? Thanks, Yehuda

On Fri, Jun 2, 2017 at 3:05 PM, wrote:
> Hi Graham.
>
> We are on Kraken and have the same problem with "lifecycle". Various (other) tools like s3cmd or CyberDuck do show the applied "expiration" settings, but objects seem never to be purged.
>
> If you should have new findings, hints, ... PLEASE share / let me know.
>
> Thanks a lot!
> Anton
>
> Sent: Friday, 19 May 2017, 22:44
> From: "Graham Allan"
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] RGW lifecycle not expiring objects
>
> I've been having a hard time getting the S3 object lifecycle to do anything here. I was able to set a lifecycle on a test bucket. As others also seem to have found, I do get an EACCES error on setting the lifecycle, but it does however get stored:
>
>> % aws --endpoint-url https://xxx.xxx.xxx.xxx s3api get-bucket-lifecycle-configuration --bucket=testgta
>> {
>>     "Rules": [
>>         {
>>             "Status": "Enabled",
>>             "Prefix": "",
>>             "Expiration": {
>>                 "Days": 3
>>             },
>>             "ID": "test"
>>         }
>>     ]
>> }
>
> but many days later I have yet to see any object actually get expired. There are some hints in the rgw log that the expiry thread does run periodically:
>
>> 2017-05-19 03:49:03.281347 7f74f1134700 2 RGWDataChangesLog::ChangesRenewThread: start
>> 2017-05-19 03:49:16.356022 7f74ef931700 2 object expiration: start
>> 2017-05-19 03:49:16.356036 7f74ef931700 20 proceeding shard = obj_delete_at_hint.00
>> 2017-05-19 03:49:16.359785 7f74ef931700 20 proceeding shard = obj_delete_at_hint.01
>> 2017-05-19 03:49:16.364667 7f74ef931700 20 proceeding shard = obj_delete_at_hint.02
>> 2017-05-19 03:49:16.369636 7f74ef931700 20 proceeding shard = obj_delete_at_hint.03
> ...
>> 2017-05-19 03:49:16.803270 7f74ef931700 20 proceeding shard = obj_delete_at_hint.000126
>> 2017-05-19 03:49:16.806423 7f74ef931700 2 object expiration: stop
>
> "radosgw-admin lc process" gives me no output unless I enable debug, then:
>
>> # radosgw-admin lc process
>> 2017-05-19 15:28:46.383049 7fedb9ffb700 2 RGWDataChangesLog::ChangesRenewThread: start
>> 2017-05-19 15:28:46.421806 7feddc240c80 10 Cannot find current period zone using local zone
>> 2017-05-19 15:28:46.453431 7feddc240c80 2 all 8 watchers are set, enabling cache
>> 2017-05-19 15:28:46.614991 7feddc240c80 2 removed watcher, disabling cache
>
> "radosgw-admin lc list" seems to return "empty" output:
>
>> # radosgw-admin lc list
>> []
>
> Is there anything obvious that I might be missing?
>
> Graham
> --
> Graham Allan
> Minnesota Supercomputing Institute - g...@umn.edu
Re: [ceph-users] RGW lifecycle not expiring objects
Hi Graham. We are on Kraken and have the same problem with "lifecycle". Various (other) tools like s3cmd or CyberDuck do show the applied "expiration" settings, but objects seem never to be purged. If you should have new findings, hints, ... PLEASE share / let me know. Thanks a lot! Anton

Sent: Friday, 19 May 2017, 22:44
From: "Graham Allan"
To: ceph-users@lists.ceph.com
Subject: [ceph-users] RGW lifecycle not expiring objects

I've been having a hard time getting the S3 object lifecycle to do anything here. I was able to set a lifecycle on a test bucket. As others also seem to have found, I do get an EACCES error on setting the lifecycle, but it does however get stored:

> % aws --endpoint-url https://xxx.xxx.xxx.xxx s3api get-bucket-lifecycle-configuration --bucket=testgta
> {
>     "Rules": [
>         {
>             "Status": "Enabled",
>             "Prefix": "",
>             "Expiration": {
>                 "Days": 3
>             },
>             "ID": "test"
>         }
>     ]
> }

but many days later I have yet to see any object actually get expired. There are some hints in the rgw log that the expiry thread does run periodically:

> 2017-05-19 03:49:03.281347 7f74f1134700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2017-05-19 03:49:16.356022 7f74ef931700 2 object expiration: start
> 2017-05-19 03:49:16.356036 7f74ef931700 20 proceeding shard = obj_delete_at_hint.00
> 2017-05-19 03:49:16.359785 7f74ef931700 20 proceeding shard = obj_delete_at_hint.01
> 2017-05-19 03:49:16.364667 7f74ef931700 20 proceeding shard = obj_delete_at_hint.02
> 2017-05-19 03:49:16.369636 7f74ef931700 20 proceeding shard = obj_delete_at_hint.03
...
> 2017-05-19 03:49:16.803270 7f74ef931700 20 proceeding shard = obj_delete_at_hint.000126
> 2017-05-19 03:49:16.806423 7f74ef931700 2 object expiration: stop

"radosgw-admin lc process" gives me no output unless I enable debug, then:

> # radosgw-admin lc process
> 2017-05-19 15:28:46.383049 7fedb9ffb700 2 RGWDataChangesLog::ChangesRenewThread: start
> 2017-05-19 15:28:46.421806 7feddc240c80 10 Cannot find current period zone using local zone
> 2017-05-19 15:28:46.453431 7feddc240c80 2 all 8 watchers are set, enabling cache
> 2017-05-19 15:28:46.614991 7feddc240c80 2 removed watcher, disabling cache

"radosgw-admin lc list" seems to return "empty" output:

> # radosgw-admin lc list
> []

Is there anything obvious that I might be missing?

Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
Re: [ceph-users] Seems like majordomo doesn't send mails since some weeks?!
Thanks for answering, David. No idea who changed what and where, but I've been flooded with mails since yesterday ;) --> THANKS

Sent: Saturday, 20 May 2017, 16:42
From: "David Turner"
To: ceph.nov...@habmalnefrage.de, ceph-users
Subject: Re: [ceph-users] Seems like majordomo doesn't send mails since some weeks?!

I was unsubscribed from the list a while ago because my company was filing mail as spam and replying to the list about it. Check for an email like that.

On Sat, May 20, 2017, 7:17 AM wrote:

I've not received the list mails for weeks. I thought it was my mail provider filtering them out, and checked with them for days; according to them, all is OK. I've since subscribed with my business mail account and do not receive any posts/mails from the list either. Any ideas, anyone?
[ceph-users] Seems like majordomo doesn't send mails since some weeks?!
I've not received the list mails for weeks. I thought it was my mail provider filtering them out, and checked with them for days; according to them, all is OK. I've since subscribed with my business mail account and do not receive any posts/mails from the list either. Any ideas, anyone?
Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."
Oops... thanks for your efforts, Ben! This could explain some bits. Still, I have lots of questions, as different S3 tools/clients seem to behave differently. We need to stick with CyberDuck on Windows and s3cmd and boto on Linux, and many things are not the same with RadosGW :| And more on my to-test list ;) Regards Anton

Sent: Wednesday, 12 April 2017, 06:49
From: "Ben Hines"
To: ceph.nov...@habmalnefrage.de
Cc: ceph-users, "Yehuda Sadeh-Weinraub"
Subject: Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

After much banging on this and reading through the Ceph RGW source, I figured out that Ceph RadosGW returns -13 (EACCES - AccessDenied) if you don't pass in a 'Prefix' in your S3 lifecycle configuration setting. It also returns EACCES if the XML is invalid in any way, which is probably not the most correct / user-friendly result. http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTlifecycle.html specifies 'Prefix' as optional, so I'll put in a bug for this. -Ben

On Mon, Apr 3, 2017 at 12:14 PM, Ben Hines wrote:

Interesting. I'm wondering what the -13 return code for the op execution in my debug output is (can't find it in the source..). I just tried setting the lifecycle with CyberDuck and got this error, which is probably the other bug with AWSv4 auth, http://tracker.ceph.com/issues/17076 - not sure if CyberDuck can be forced to use V2.

2017-04-03 12:07:15.093235 7f5617024700 10 op=20RGWPutLC_ObjStore_S3
2017-04-03 12:07:15.093248 7f5617024700 2 req 14:0.000438:s3:PUT /bentest/:put_lifecycle:authorizing
2017-04-03 12:07:15.093637 7f5617024700 10 delaying v4 auth
2017-04-03 12:07:15.093643 7f5617024700 10 ERROR: AWS4 completion for this operation NOT IMPLEMENTED
2017-04-03 12:07:15.093652 7f5617024700 10 failed to authorize request
2017-04-03 12:07:15.093658 7f5617024700 20 handler->ERRORHANDLER: err_no=-2201 new_err_no=-2201
2017-04-03 12:07:15.093844 7f5617024700 2 req 14:0.001034:s3:PUT /bentest/:put_lifecycle:op status=0
2017-04-03 12:07:15.093859 7f5617024700 2 req 14:0.001050:s3:PUT /bentest/:put_lifecycle:http status=501
2017-04-03 12:07:15.093884 7f5617024700 1 == req done req=0x7f561701e340 op status=0 http_status=501 ==

-Ben

On Mon, Apr 3, 2017 at 7:16 AM, wrote:

... hmm, "modify" gives no error and may be the option to use, but I don't see anything related to an "expires" meta field:

[root s3cmd-master]# ./s3cmd --no-ssl --verbose modify s3://Test/INSTALL --expiry-days=365
INFO: Summary: 1 remote files to modify
modify: 's3://Test/INSTALL'
[root s3cmd-master]# ./s3cmd --no-ssl --verbose info s3://Test/INSTALL
s3://Test/INSTALL (object):
   File size: 3123
   Last mod:  Mon, 03 Apr 2017 12:35:28 GMT
   MIME type: text/plain
   Storage:   STANDARD
   MD5 sum:   63834dbb20b32968505c4ebe768fc8c4
   SSE:       none
   policy: http://s3.amazonaws.com/doc/2006-03-01/">Test1000falseINSTALL2017-04-03T12:35:28.533Z63834dbb20b32968505c4ebe768fc8c43123STANDARD666First UserREADME.TXT2017-03-31T22:36:38.380Z708efc3b9184c8b112e36062804aca1e88STANDARD666First User
   cors: none
   ACL: First User: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1491218263/ctime:1490998096/gid:0/gname:root/md5:63834dbb20b32968505c4ebe768fc8c4/mode:33188/mtime:1488021707/uid:0/uname:root

Sent: Monday, 03 April 2017, 14:13
From: ceph.nov...@habmalnefrage.de
To: ceph-users
Subject: Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

... additional strange, but a bit different, info related to the "permission denied":

[root s3cmd-master]# ./s3cmd --no-ssl put INSTALL s3://Test/ --expiry-days=5
upload: 'INSTALL' -> 's3://Test/INSTALL' [1 of 1]
3123 of 3123 100% in 0s 225.09 kB/s done
[root s3cmd-master]# ./s3cmd info s3://Test/INSTALL
s3://Test/INSTALL (object):
   File size: 3123
   Last mod:  Mon, 03 Apr 2017 12:01:47 GMT
   MIME type: text/plain
   Storage:   STANDARD
   MD5 sum:   63834dbb20b32968505c4ebe768fc8c4
   SSE:       none
   policy: http://s3.amazonaws.com/doc/2006-03-01/">Test1000falseINSTALL2017-04-03T12:01:47.745Z63834dbb20b32968505c4ebe768fc8c43123STANDARD666First UserREADME.TXT2017-03-31T22:36:38.380Z708efc3b9184c8b112e36062804aca1e88STANDARD666First User
   cors: none
   ACL: First User: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1491218263/ctime:1490998096/gid:0/gname:root/md5:63834dbb20b32968505c4ebe768fc8c4/mode:33188/mtime:1488021707/uid:0/uname:root
[root s3cmd-master]# ./s3cmd --no-ssl expire s3://Test/ --expiry-days=365
ERROR: Access to bucket 'Test' was denied
ERROR: S3 error: 403 (AccessDenied)
[root s3cmd-master]# ./s3cmd --no-ssl expire s3://Test/INSTALL
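Given Ben's finding that a missing 'Prefix' yields EACCES, it seems safest to always construct the lifecycle document explicitly, with the Prefix present even when empty. A small illustrative helper (the function names are mine; the rule layout mirrors the get-bucket-lifecycle-configuration output quoted earlier in this thread):

```python
import json

def make_expiration_rule(rule_id, days, prefix=""):
    """Build one lifecycle rule, always carrying an explicit Prefix,
    since (per this thread) RGW returned EACCES when it was absent."""
    if days < 1:
        raise ValueError("expiration days must be >= 1")
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Prefix": prefix,          # kept even when empty, on purpose
        "Expiration": {"Days": days},
    }

def make_lifecycle_config(rules):
    for rule in rules:
        assert "Prefix" in rule, "RGW may reject rules without a Prefix"
    return {"Rules": list(rules)}

# Same shape as the rule shown in the aws-cli output above.
config = make_lifecycle_config([make_expiration_rule("test", 3)])
print(json.dumps(config))
```

The resulting dict can then be handed to whichever S3 client is in use (e.g. as the lifecycle configuration argument of a put-bucket-lifecycle call), keeping the Prefix workaround in one place.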
Re: [ceph-users] Question about RadosGW subusers
Thanks a lot, Trey. I'll try that stuff next week, once back from the Easter holidays. Some "multisite" and "metasearch" is also still on my to-be-tested list; I badly need to free up some time for all the interesting "future of storage" things. BTW, we are on Kraken, and I'd hope to see more of the new and shiny stuff here soon (something like 11.2.X) instead of waiting for Luminous in late 2017. Not sure what the CEPH release policy usually is?! Anyhow, thanks and happy Easter everyone! Anton

Sent: Thursday, 13 April 2017, 20:15
From: "Trey Palmer"
To: ceph.nov...@habmalnefrage.de
Cc: "Trey Palmer", ceph-us...@ceph.com
Subject: Re: [ceph-users] Question about RadosGW subusers

Anton,

It turns out that Adam Emerson is trying to get bucket policies and roles merged in time for Luminous: https://github.com/ceph/ceph/pull/14307

Given this, I think we will only be using subusers temporarily, as a method to track which human or service did what in which bucket. This seems to us much easier than trying to deal with ACLs, without any concept of groups, roles, or policies, in buckets that can often have millions of objects. Here is the general idea:

1. Each bucket has a user ("master user"), but we don't use or issue that set of keys at all.

radosgw-admin user create --uid=mybucket --display-name="My Bucket"

You can of course have multiple buckets per user, but so far for us it has been simple to have one user per bucket, with the username the same as the bucket name. If a human needs access to more than one bucket, we will create multiple subusers for them. That's not convenient, but it's temporary. So what we're doing is effectively making the user into the group, with the subusers being the users, and each user only capable of being in one group. Very suboptimal, but better than the total chaos that would result from giving everyone the same set of keys for a given bucket.

2. For each human user or service/machine user of that bucket, we create subusers. You can do this via:

## full-control ops user
radosgw-admin subuser create --uid=mybucket --subuser=mybucket:alice --access=full --gen-access-key --gen-secret --key-type=s3

## write-only server user
radosgw-admin subuser create --uid=mybucket --subuser=mybucket:daemon --access=write --gen-access-key --gen-secret-key --key-type=s3

If you then do a "radosgw-admin metadata get user:mybucket", the JSON output contains the subusers and their keys.

3. Raise the RGW log level in ceph.conf to make an "access key id" line available for each request, which you can then map to a subuser if/when you need to track who did what after the fact. In ceph.conf:

debug_rgw = 10/10

This will cause the logs to be VERY verbose, an order of magnitude and some change more verbose than default. We plan to discard most of the logs while feeding them into ElasticSearch. We might not need this much log verbosity once we have policies and are using unique users rather than subusers. Nevertheless, I hope we can eventually reduce the log level of the "access key id" line, as we have a pretty mainstream use case, and I'm certain that tracking S3 request users will be required by many organizations for accounting and forensic purposes, just as it is for us.

-- Trey

On Thu, Apr 13, 2017 at 1:29 PM, wrote:

Hey Trey. Sounds great; we were discussing the same kind of requirements and couldn't agree on / find something "useful"... so THANK YOU for sharing!!! It would be great if you could provide some more details or an example of how you configure the "bucket user" and subusers and all that stuff. Even more interesting for me: how do the "different people or services" access those buckets/objects afterwards? I mean, via which tools (s3cmd, boto, CyberDuck, a mix of some, ...), and are there any ACLs set / in use as well? (Sorry if this all sounds somehow dumb, but I'm just a novice ;) )

best Anton

Sent: Tuesday, 11 April 2017, 00:17
From: "Trey Palmer"
To: ceph-us...@ceph.com
Subject: [ceph-users] Question about RadosGW subusers

Probably a question for @yehuda:

We have fairly strict user accountability requirements. The best way we have found to meet them with S3 object storage on Ceph is by using RadosGW subusers. If we set up one user per bucket, then set up subusers to provide separate individual S3 keys and access rights for different people or services using that bucket, then we can track who did what via access key in the RadosGW logs (at debug_rgw = 10/10). Of course, this is not a documented use case for subusers. I'm wondering if Yehuda or anyone else could estimate our risk of future incompatibility if we implement user/key management around subusers in this
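To close the loop in step 3, the access key ids pulled from the logs need a reverse map back to subusers. A sketch under the assumption that the `radosgw-admin metadata get user:...` JSON exposes a `keys` array with `user` and `access_key` fields; the sample document below is invented and heavily trimmed, so verify the field names against your release:

```python
import json

# Invented, trimmed sample of `radosgw-admin metadata get user:mybucket`
# output; real output has many more fields, and names may vary by release.
SAMPLE = json.loads("""
{
  "data": {
    "user_id": "mybucket",
    "subusers": [{"id": "mybucket:alice",  "permissions": "full-control"},
                 {"id": "mybucket:daemon", "permissions": "write"}],
    "keys": [{"user": "mybucket:alice",  "access_key": "AKALICE",  "secret_key": "..."},
             {"user": "mybucket:daemon", "access_key": "AKDAEMON", "secret_key": "..."}]
  }
}
""")

def key_to_subuser(metadata):
    """Map each S3 access key id to the subuser that owns it."""
    return {k["access_key"]: k["user"] for k in metadata["data"]["keys"]}

lookup = key_to_subuser(SAMPLE)
```

Run periodically over all bucket users, this table lets the ElasticSearch ingest step tag every audited request with a human-readable subuser instead of a bare key id.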
Re: [ceph-users] Question about RadosGW subusers
Hey Trey. Sounds great; we were discussing the same kind of requirements and couldn't agree on / find something "useful"... so THANK YOU for sharing!!! It would be great if you could provide some more details or an example of how you configure the "bucket user" and subusers and all that stuff. Even more interesting for me: how do the "different people or services" access those buckets/objects afterwards? I mean, via which tools (s3cmd, boto, CyberDuck, a mix of some, ...), and are there any ACLs set / in use as well? (Sorry if this all sounds somehow dumb, but I'm just a novice ;) )

best Anton

Sent: Tuesday, 11 April 2017, 00:17
From: "Trey Palmer"
To: ceph-us...@ceph.com
Subject: [ceph-users] Question about RadosGW subusers

Probably a question for @yehuda:

We have fairly strict user accountability requirements. The best way we have found to meet them with S3 object storage on Ceph is by using RadosGW subusers. If we set up one user per bucket, then set up subusers to provide separate individual S3 keys and access rights for different people or services using that bucket, then we can track who did what via access key in the RadosGW logs (at debug_rgw = 10/10). Of course, this is not a documented use case for subusers. I'm wondering if Yehuda or anyone else could estimate our risk of future incompatibility if we implement user/key management around subusers in this manner?

Thanks, Trey
[ceph-users] "RGW Metadata Search" and related
Hi Cephers. We are trying to get "metadata search" working on our test cluster. This is one of two things we promised an internal customer for a PoC starting very soon... the second feature is, as I wrote in another post, "object expiration" (lifecycle?!) [objects should be auto-removed after XY months]. Feedback on the metadata search: beside the fact that it seems to be based on "multisite", which btw we would not really need here, another difficulty for other Cephers could be the ElasticSearch dependency. In our company we are lucky to already have an official "ELK" service team, which we can approach "as internal customers". BUT... there are requirements we need to fulfill before they will serve us ;) Here is the (very minimalistic) list of requirements they have defined and which we need to fulfill to get an ELK index here:
- basic HTTP authentication (discussed in another thread with @yehuda; this should already be the case, but I couldn't verify it because I couldn't set up the needed multisite env.)
- a freely configurable ELK index name/prefix
- a template mapping which reflects the later/possible RGW search fields (right now a "default ELK index" would be created, which may be sub-optimal for the later searches); in the best case we could create the template mapping in ELK ourselves, so we could optimize it to our needs
Any feedback/thoughts from the other list members? @yehuda: THANKS for working on that feature!!! Thanks & regards Anton
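On the setup side, the ElasticSearch integration is implemented as a sync module configured on a dedicated metadata-search zone. A minimal sketch, under the assumption of a working multisite configuration; the zone name and endpoint below are made up, and which tier-config keys exist (including whether the index name/prefix can be changed at all, which relates to the requirements above) depends on the Ceph release:

```shell
# On the zone that should feed ElasticSearch (zone name and endpoint are illustrative):
radosgw-admin zone modify --rgw-zone=us-east-es \
    --tier-type=elasticsearch \
    --tier-config=endpoint=http://elk.example.com:9200,num_shards=10,num_replicas=1

# Commit the period so the change takes effect across the zonegroup,
# then restart the radosgw instance serving this zone.
radosgw-admin period update --commit
```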
[ceph-users] how-to undo a "multisite" config
Hi Cephers. Quick question; I couldn't find a "how-to" or docu... not even sure if anyone else has ever had to do it... What would be the steps to undo a (failed) multisite config change that exactly followed http://docs.ceph.com/docs/master/radosgw/multisite/ ? And as I'm on the topic now: is there any other WORKING documentation on how to set up "multisite"? Thanks & regards Anton
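I don't know of an official undo procedure either; what has worked in practice is deleting the multisite entities in reverse order of creation and recommitting the period. Treat this as a hedged sketch, not an authoritative recipe: the realm/zonegroup/zone names are placeholders taken from the multisite guide, and deleting zones can destroy data, so double-check each step against `radosgw-admin help` first.

```shell
# Detach the secondary zone from its zonegroup, then delete the multisite
# entities that were created while following the multisite guide:
radosgw-admin zonegroup remove --rgw-zonegroup=us --rgw-zone=us-west
radosgw-admin zone delete --rgw-zone=us-west
radosgw-admin zonegroup delete --rgw-zonegroup=us
radosgw-admin realm delete --rgw-realm=gold

# Commit the resulting period so the remaining gateways agree on it:
radosgw-admin period update --commit

# Finally, remove any rgw_realm / rgw_zonegroup / rgw_zone entries that were
# added to ceph.conf for the experiment, and restart the radosgw daemons.
```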
Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."
... hmm, "modify" gives no error and may be the option to use, but I don't see anything related to an "expires" meta field:
[root s3cmd-master]# ./s3cmd --no-ssl --verbose modify s3://Test/INSTALL --expiry-days=365 INFO: Summary: 1 remote files to modify modify: 's3://Test/INSTALL'
[root s3cmd-master]# ./s3cmd --no-ssl --verbose info s3://Test/INSTALL s3://Test/INSTALL (object): File size: 3123 Last mod: Mon, 03 Apr 2017 12:35:28 GMT MIME type: text/plain Storage: STANDARD MD5 sum: 63834dbb20b32968505c4ebe768fc8c4 SSE: none policy: Test1000falseINSTALL2017-04-03T12:35:28.533Z63834dbb20b32968505c4ebe768fc8c43123STANDARD666First UserREADME.TXT2017-03-31T22:36:38.380Z708efc3b9184c8b112e36062804aca1e88STANDARD666First User cors: none ACL: First User: FULL_CONTROL x-amz-meta-s3cmd-attrs: atime:1491218263/ctime:1490998096/gid:0/gname:root/md5:63834dbb20b32968505c4ebe768fc8c4/mode:33188/mtime:1488021707/uid:0/uname:root
Sent: Monday, 3 April 2017 at 14:13. From: ceph.nov...@habmalnefrage.de To: ceph-users. Subject: Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration." ...
... additional strange but slightly different info related to the "permission denied":
[root s3cmd-master]# ./s3cmd --no-ssl put INSTALL s3://Test/ --expiry-days=5 upload: 'INSTALL' -> 's3://Test/INSTALL' [1 of 1] 3123 of 3123 100% in 0s 225.09 kB/s done
[root s3cmd-master]# ./s3cmd info s3://Test/INSTALL s3://Test/INSTALL (object): File size: 3123 Last mod: Mon, 03 Apr 2017 12:01:47 GMT MIME type: text/plain Storage: STANDARD MD5 sum: 63834dbb20b32968505c4ebe768fc8c4 SSE: none policy: Test1000falseINSTALL2017-04-03T12:01:47.745Z63834dbb20b32968505c4ebe768fc8c43123STANDARD666First UserREADME.TXT2017-03-31T22:36:38.380Z708efc3b9184c8b112e36062804aca1e88STANDARD666First User cors: none ACL: First User: FULL_CONTROL x-amz-meta-s3cmd-attrs: atime:1491218263/ctime:1490998096/gid:0/gname:root/md5:63834dbb20b32968505c4ebe768fc8c4/mode:33188/mtime:1488021707/uid:0/uname:root
[root s3cmd-master]# ./s3cmd --no-ssl expire s3://Test/ --expiry-days=365 ERROR: Access to bucket 'Test' was denied ERROR: S3 error: 403 (AccessDenied)
[root s3cmd-master]# ./s3cmd --no-ssl expire s3://Test/INSTALL --expiry-days=365 ERROR: Parameter problem: Expecting S3 URI with just the bucket name set instead of 's3://Test/INSTALL'
[root s3cmd-master]# ./s3cmd --no-ssl la expire s3://Test 2017-04-03 12:01 3123 s3://Test/INSTALL 2017-03-31 22:36 88 s3://Test/README.TXT
Sent: Monday, 3 April 2017 at 12:31. From: ceph.nov...@habmalnefrage.de To: "Ben Hines", ceph-users. Subject: Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."
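Since a working s3cmd example was asked for: the s3cmd GitHub master mentioned in the thread has a `setlifecycle` command that takes a lifecycle XML file and a bucket URI (lifecycle is a bucket-level setting, not per object, which is why `expire s3://Test/INSTALL` was rejected). A minimal sketch, assuming the bucket name `Test` from the thread; the rule ID is made up:

```shell
# Write a minimal lifecycle configuration (expire every object after 365 days):
cat > lifecycle.xml <<'EOF'
<LifecycleConfiguration>
  <Rule>
    <ID>expire-after-365d</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>365</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
EOF

# Apply it to the bucket and verify:
./s3cmd --no-ssl setlifecycle lifecycle.xml s3://Test
./s3cmd --no-ssl getlifecycle s3://Test
```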
Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."
Hi Cephers... I did set the "lifecycle" via Cyberduck. I do also get an error first, then suddenly Cyberduck refreshes the window and the lifecycle is there. I see the following when I check it via s3cmd (GitHub master version, because the regularly installed version doesn't offer the "getlifecycle" option): [root s3cmd-master]# ./s3cmd getlifecycle s3://Test/README.txt http://s3.amazonaws.com/doc/2006-03-01/;> Cyberduck-nVWEhQwE Enabled 1 Here is my S3 "user info": [root ~]# radosgw-admin user info --uid=666 { "user_id": "666", "display_name": "First User", "email": "a...@c.de", "suspended": 0, "max_buckets": 1000, "auid": 0, "subusers": [], "keys": [ { "user": "666", "access_key": "abc ;)", "secret_key": "abc def ;)" } ], "swift_keys": [], "caps": [], "op_mask": "read, write, delete", "default_placement": "", "placement_tags": [], "bucket_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 }, "user_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 }, "temp_url_keys": [], "type": "rgw" } If someone has a working example of how to set the lifecycle via s3cmd, I can try it and send the outcome... Sent: Monday, 3 April 2017 at 01:43. From: "Ben Hines" To: "Orit Wasserman" Cc: ceph-users. Subject: Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration." Hmm, nope, not using the tenants feature. The users/buckets were created on prior Ceph versions; perhaps I'll try with a newly created user + bucket. 
radosgw-admin user info --uid=foo { "user_id": "foo", "display_name": "foo", "email": "snip", "suspended": 0, "max_buckets": 1000, "auid": 0, "subusers": [ { "id": "foo:swift", "permissions": "full-control" } ], "keys": [ { "user": "foo:swift", "access_key": "xxx", "secret_key": "" }, { "user": "foo", "access_key": "xxx", "secret_key": "" } ], "swift_keys": [], "caps": [ { "type": "buckets", "perm": "*" }, { "type": "metadata", "perm": "*" }, { "type": "usage", "perm": "*" }, { "type": "users", "perm": "*" }, { "type": "zone", "perm": "*" } ], "op_mask": "read, write, delete", "default_placement": "", "placement_tags": [], "bucket_quota": { "enabled": false, "check_on_raw": false, "max_size": -1024, "max_size_kb": 0, "max_objects": -1 }, "user_quota": { "enabled": false, "check_on_raw": false, "max_size": -1024, "max_size_kb": 0, "max_objects": -1 }, "temp_url_keys": [], "type": "none" } On Sun, Apr 2, 2017 at 5:54 AM, Orit Wasserman wrote: I see : acct_user=foo, acct_name=foo, Are you using radosgw with tenants? If not it could be the problem Orit On Sat, Apr 1, 2017 at 7:43 AM, Ben Hines wrote: I'm also trying to use lifecycles (via boto3) but i'm getting permission denied trying to create the lifecycle. I'm bucket owner with full_control and WRITE_ACP for good measure. Any ideas? 
This is debug ms=20 debug radosgw=20 2017-03-31 21:28:18.382217 7f50d0010700 2 req 8:0.000693:s3:PUT /bentest:put_lifecycle:verifying op permissions 2017-03-31 21:28:18.38 7f50d0010700 5 Searching permissions for identity=RGWThirdPartyAccountAuthApplier() -> RGWLocalAuthApplier(acct_user=foo, acct_name=foo, subuser=, perm_mask=15, is_admin=) mask=56 2017-03-31 21:28:18.382232 7f50d0010700 5 Searching permissions for uid=foo 2017-03-31 21:28:18.382235 7f50d0010700 5 Found permission: 15 2017-03-31 21:28:18.382237 7f50d0010700 5 Searching permissions for group=1 mask=56 2017-03-31 21:28:18.382297 7f50d0010700 5 Found permission: 3 2017-03-31 21:28:18.382307 7f50d0010700 5 Searching permissions for group=2 mask=56 2017-03-31 21:28:18.382313 7f50d0010700 5 Permissions for group not found 2017-03-31 21:28:18.382318 7f50d0010700 5 Getting permissions identity=RGWThirdPartyAccountAuthApplier() -> RGWLocalAuthApplier(acct_user=foo, acct_name=foo, subuser=, perm_mask=15, is_admin=) owner=foo
[ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."
Hi Cephers. I couldn't find any special documentation about "S3 object expiration", so I assume it should work "AWS S3 like" (?!?)... BUT... we have a test cluster based on 11.2.0 (Kraken), and I set some object expiration dates via CyberDuck and DragonDisk, but the objects are still there, days after the applied date/time. Am I missing something? Thanks & regards ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
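One possible explanation (an assumption, not a confirmed diagnosis of the case above): RGW does not delete expired objects at the exact expiry time. A background lifecycle thread processes the rules during a configurable maintenance window (`rgw lifecycle work time`, which defaults to a nightly slot such as 00:00-06:00), so objects can linger past their expiry date. On builds that ship the `lc` subcommands, the state can be inspected and a run forced by hand:

```shell
# Show the lifecycle processing state of the buckets (if your radosgw-admin has it):
radosgw-admin lc list

# Force a lifecycle processing run instead of waiting for the work-time window:
radosgw-admin lc process

# The maintenance window itself is a ceph.conf option on the rgw daemons, e.g.:
#   rgw lifecycle work time = "00:00-24:00"
```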