Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?
On Sun, Apr 21, 2019 at 03:11:44PM +0200, Marc Roos wrote:
> Double thanks for the on-topic reply. The other two responses were
> making me doubt if my Chinese (which I didn't study) is better than my
> English. They were almost on topic, but not that useful.
Please don't imply language failings on this list. English may be the
lingua franca, but it is by far not the first language for most list
members. Not being useful to you didn't mean they weren't useful overall.

> >> I am a bit curious on how production ceph clusters are being used. I am
> >> reading here that the block storage is used a lot with openstack and
> >> proxmox, and via iscsi with vmware.
> >Have you looked at the Ceph User Surveys/Census?
> >https://ceph.com/ceph-blog/ceph-user-survey-2018-results/
> >https://ceph.com/geen-categorie/results-from-the-ceph-census/
>
> Sort of what I was looking for, so 42% use rgw, of which 74% s3.
> I guess this main archive usage is mostly done by providers.
Not just archive, but also API-driven for web services, usually hidden
behind hostnames/CDNs. Image/video upload sites are a big part of this,
esp. things like Instagram clones in emerging markets.

> >As the quantity of data by a single user increases, the odds that GUI
> >tools are used for it decrease, as it's MUCH more likely to be driven
> >by automation & tooling around the API.
> Hmm, interesting. I have more soho clients, and was thinking of
> getting them such a gui client.
That's great, but orthogonal to the overall issue. Some of the cloud
providers DO offer setup docs for GUI clients as well; off the top of my
head I know Dreamhost's and DigitalOcean's, because I contributed to
their docs:
https://help.dreamhost.com/hc/en-us/sections/11559232-DreamObjects-clients
https://www.digitalocean.com/docs/spaces/resources/

> I think if you take the perspective of some end user that associates s3,
> with something like an audi and nothing else.
> It is quite necessary to have a client that is easy and secure to use,
> where you just enter preferably only two things, your access key and
> your secret.
There's a bare minimum of three things you'd need in a generic client:
- endpoint(s)
- access key
- secret

The endpoint could be partially pre-provisioned (think: you'd give your
clients an INI file that pointed them to your private Ceph RGW
deployment). If it's a deployment with multiple regions, endpoints &
region-specifics become more important (e.g. AWS S3 has differing
signature requirements in different regions).

> The advantage of having a more rgw specific gui client, is that you
> - do not have the default amazon 'advertisements' (think of storage
> classes etc.)
> - less configuration options, everything ceph does not support we do not
> need to configure.
> - no ftp, no whatever else, just this s3
> - you do not have configuration options that ceph doesn't offer
> (eg. this life cycle, bucket access logging?)
- Storage Classes: supported
- Bucket Lifecycle: supported
- Bucket Access Logging: not quite supported; a PR exists, and there is
  some debate about better designs.
  https://github.com/ceph/ceph/pull/14841

> I can imagine if you have quite a few clients, you could get quite
> some questions to answer, about things not working.
> - you have better support for specific things like multi tenant account,
> etc.
Tenancy in RGW is effectively parallel S3 scopes, with different endpoints.

> - for once the https urls are correctly advertised
What issue do you have with HTTPS URLs? The main gotcha that most people
hit is that S3's SSL hostname validation rule is NOT the same as the
general SSL hostname validation rule, and this trips up browser access.
Specifically, for a wildcard SSL cert such as '*.myrgwendpoint.com', the
general rule is that '*' should only match one DNS fragment [i.e. no
'.'], while S3's validation says it can match one or more DNS fragments.
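To make the wildcard difference concrete, here is a small sketch of the two matching rules exactly as described above: the general rule, where a leading '*' covers exactly one DNS label, and the looser S3-style rule, where it covers one or more. It is a simplified model (case-insensitive comparison, no IDNA or partial-label wildcards), not the certificate-verification code of any TLS library or S3 SDK, and the hostnames are made up.

```python
def match_single_label(pattern: str, host: str) -> bool:
    """General rule: a leading '*' matches exactly one DNS label (no '.')."""
    p, h = pattern.lower().split("."), host.lower().split(".")
    if p[0] != "*":
        return p == h
    # Same number of labels, non-empty first label, identical suffix.
    return len(h) == len(p) and h[0] != "" and h[1:] == p[1:]


def match_multi_label(pattern: str, host: str) -> bool:
    """S3-style rule as described in the thread: '*' may match one or more labels."""
    p, h = pattern.lower().split("."), host.lower().split(".")
    if p[0] != "*":
        return p == h
    suffix = p[1:]
    if not suffix:
        return False
    # The wildcard must cover at least one label in front of the suffix.
    return len(h) > len(suffix) and h[-len(suffix):] == suffix


if __name__ == "__main__":
    cert = "*.myrgwendpoint.com"
    for host in ("mybucket.myrgwendpoint.com", "my.bucket.myrgwendpoint.com"):
        print(host, match_single_label(cert, host), match_multi_label(cert, host))
```

A bucket named 'my.bucket' addressed virtual-hosted style becomes 'my.bucket.myrgwendpoint.com': the looser rule accepts it against the wildcard cert, but the general rule used by browsers does not, which is the breakage described above.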
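The "pre-provisioned endpoint" idea above could look roughly like this: the provider ships an INI file carrying the endpoint (and, for multi-region setups, a region), and the end user only types the access key and secret. This is a minimal sketch using only the Python standard library; the '[rgw]' section, its keys, and the function and class names are hypothetical, not a format defined by Ceph or any existing client.

```python
import configparser
from dataclasses import dataclass

# Hypothetical provider-shipped INI: endpoint (and optional region) are
# pre-provisioned, so the end user only supplies two things.
PROVISIONED_INI = """
[rgw]
endpoint = https://objects.example.com
region = default
"""


@dataclass
class S3ClientSettings:
    endpoint: str
    region: str
    access_key: str
    secret_key: str


def load_settings(ini_text: str, access_key: str, secret_key: str) -> S3ClientSettings:
    """Merge the provider's pre-provisioned endpoint with the user's credentials."""
    if not access_key or not secret_key:
        raise ValueError("access key and secret are both required")
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["rgw"]
    return S3ClientSettings(
        endpoint=section["endpoint"],
        # Region mostly matters for multi-region deployments; fall back to a
        # placeholder name when the provider did not set one.
        region=section.get("region", "default"),
        access_key=access_key,
        secret_key=secret_key,
    )


if __name__ == "__main__":
    settings = load_settings(PROVISIONED_INI, "MYACCESSKEY", "s3cr3t")
    print(settings.endpoint, settings.region)
```

Keeping the endpoint out of the user's hands is what gets a generic client down to the "two things" experience asked for, without making the client itself RGW-specific.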
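One concrete way the region matters for signatures: some AWS regions accept only Signature Version 4, and in SigV4 the region name is part of the derived signing key, so a client configured with the wrong region produces signatures the server rejects. A minimal sketch of the SigV4 key-derivation chain using only 'hmac' and 'hashlib'; the secret, date and region values are placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str, region: str,
                      service: str = "s3") -> bytes:
    """Derive the SigV4 signing key: secret -> date -> region -> service -> 'aws4_request'."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)  # YYYYMMDD
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


if __name__ == "__main__":
    a = sigv4_signing_key("s3cr3t", "20190421", "us-east-1")
    b = sigv4_signing_key("s3cr3t", "20190421", "eu-central-1")
    # Same credentials and date, different region: different signing keys.
    print(a.hex())
    print(b.hex())
```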
The AWS S3 docs are even horrible about this, with the text:
"To work around this, use HTTP or write your own certificate verification logic."
https://github.com/awsdocs/amazon-s3-developer-guide/blame/f498926b68f4f1b11c7f708ac0fbd52ee2a0aa19/doc_source/BucketRestrictions.md#L35

> Whether one likes it or not ceph is afaik not fully s3 compatible
No, Ceph isn't fully AWS-S3 compatible, and in my talk at Cephalocon last
year I specifically included the point that we should explicitly be
returning 501 NotImplemented in more cases.

AWS-S3 itself is a moving target, and some of the operations ARE best
offloaded to something other than Ceph. Even if Ceph/RGW does support a
given set of operations, does the deployment want to consider those
operations supported? This thinking led to the torrent ops being behind a
configuration option in Ceph, and other ops can be & are blocked by
providers in the reverse proxy.

There ARE RGW-specific features that would be valuable to have in more
clients:
- RGW Admin operations [the list of them is much longer than the docs suggest]
-
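Blocking operations in the reverse proxy, as described above, can be sketched as a tiny request filter. S3 selects many bucket/object operations through query-string subresources such as '?torrent', '?logging' or '?lifecycle', so the filter keys on those. Everything here is hypothetical: the blocklist contents, the function name and the S3-style XML error body are illustrative assumptions, not RGW configuration or any particular proxy's API.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical per-deployment policy: subresources this provider has decided
# not to offer, whether or not RGW itself implements them.
BLOCKED_SUBRESOURCES = frozenset({"torrent", "logging"})

# S3-style error document; the message text is illustrative.
NOT_IMPLEMENTED_BODY = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<Error><Code>NotImplemented</Code>"
    "<Message>This operation is not supported by this deployment.</Message>"
    "</Error>"
)


def filter_request(raw_url: str) -> Optional[Tuple[int, str]]:
    """Return (status, body) to answer directly, or None to pass the request to RGW."""
    query = urlsplit(raw_url).query
    # keep_blank_values=True keeps bare flags such as '?torrent' (no '=value').
    params = parse_qs(query, keep_blank_values=True)
    if BLOCKED_SUBRESOURCES.intersection(params):
        return 501, NOT_IMPLEMENTED_BODY
    return None


if __name__ == "__main__":
    for url in ("/photos/cat.jpg?torrent", "/photos?logging",
                "/photos?lifecycle", "/photos/cat.jpg"):
        print(url, filter_request(url))
```

Answering with an explicit 501 NotImplemented, rather than letting the request fail in some less obvious way, gives clients a clear signal that the deployment does not consider the operation supported.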
Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?
Double thanks for the on-topic reply. The other two responses were making
me doubt if my Chinese (which I didn't study) is better than my English.

>> I am a bit curious on how production ceph clusters are being used. I am
>> reading here that the block storage is used a lot with openstack and
>> proxmox, and via iscsi with vmware.
>Have you looked at the Ceph User Surveys/Census?
>https://ceph.com/ceph-blog/ceph-user-survey-2018-results/
>https://ceph.com/geen-categorie/results-from-the-ceph-census/

Sort of what I was looking for, so 42% use rgw, of which 74% s3.
I guess this main archive usage is mostly done by providers.

>> But since nobody here is interested in a better rgw client for end
>> users, I am wondering if the rgw is even being used like this, and what
>> most production environments look like.
>Your end-user client thread was specifically targeting GUI
>clients on OSX & Windows. I feel that the GUI client usage of the S3
>protocol has a much higher visibility-to-data-size ratio than
>automation/tooling usage.
>
>As the quantity of data by a single user increases, the odds that GUI
>tools are used for it decrease, as it's MUCH more likely to be driven
>by automation & tooling around the API.

Hmm, interesting. I have more soho clients, and was thinking of
getting them such a gui client.

>My earliest Ceph production deployment was mostly RGW (~16TB raw), with
>a little bit of RBD/iSCSI usage (~1TB of floating disk between VMs).
>Very little of the RGW usage was GUI driven (there certainly was some,
>because it made business sense to offer it rather than FTP sites; but it
>was tiny compared to the automation flows).
>
>The second production deployment I worked on was Dreamhost's
>DreamObjects, which was over 3PB then, and MOST of the usage was still
>not GUI-driven.
>
>I'm working at DigitalOcean's Spaces offering now; again, mostly non-GUI
>access.
>
>For the second part of your original query, I feel that any new clients
>SHOULD not be RGW-specific; they should be able to work on a wide range
>of services that expose the S3 API, and have a good test-suite around
>that (s3-tests, but for testing the client implementation; even Boto is
>not bug-free).
>

I think if you take the perspective of some end user that associates s3,
with something like an audi and nothing else. It is quite necessary to
have a client that is easy and secure to use, where you just enter
preferably only two things, your access key and your secret.

The advantage of having a more rgw specific gui client, is that you
- do not have the default amazon 'advertisements' (think of storage
  classes etc.)
- less configuration options, everything ceph does not support we do not
  need to configure.
- no ftp, no whatever else, just this s3
- you do not have configuration options that ceph doesn't offer
  (eg. this life cycle, bucket access logging?)
I can imagine if you have quite a few clients, you could get quite
some questions to answer, about things not working.
- you have better support for specific things like multi tenant account,
  etc.
- for once the https urls are correctly advertised

Whether one likes it or not, ceph is afaik not fully s3 compatible.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?
On Fri, Apr 19, 2019 at 12:10:02PM +0200, Marc Roos wrote:
> I am a bit curious on how production ceph clusters are being used. I am
> reading here that the block storage is used a lot with openstack and
> proxmox, and via iscsi with vmware.
Have you looked at the Ceph User Surveys/Census?
https://ceph.com/ceph-blog/ceph-user-survey-2018-results/
https://ceph.com/geen-categorie/results-from-the-ceph-census/

> But since nobody here is interested in a better rgw client for end
> users, I am wondering if the rgw is even being used like this, and what
> most production environments look like.
Your end-user client thread was specifically targeting GUI clients on
OSX & Windows. I feel that the GUI client usage of the S3 protocol has a
much higher visibility-to-data-size ratio than automation/tooling usage.

As the quantity of data by a single user increases, the odds that GUI
tools are used for it decrease, as it's MUCH more likely to be driven by
automation & tooling around the API.

My earliest Ceph production deployment was mostly RGW (~16TB raw), with
a little bit of RBD/iSCSI usage (~1TB of floating disk between VMs).
Very little of the RGW usage was GUI driven (there certainly was some,
because it made business sense to offer it rather than FTP sites; but it
was tiny compared to the automation flows).

The second production deployment I worked on was Dreamhost's
DreamObjects, which was over 3PB then, and MOST of the usage was still
not GUI-driven.

I'm working at DigitalOcean's Spaces offering now; again, mostly non-GUI
access.

For the second part of your original query, I feel that any new clients
SHOULD not be RGW-specific; they should be able to work on a wide range
of services that expose the S3 API, and have a good test-suite around
that (s3-tests, but for testing the client implementation; even Boto is
not bug-free).
--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?
> On Apr 19, 2019, at 10:59 AM, Janne Johansson wrote:
>
> May the most significant bit of your life be positive.

Marc, my favorite thing about open source software is it has a 100%
money-back satisfaction guarantee: If you are not completely satisfied,
you can have an instant refund, just for waving your arm! :D

Seriously though, Janne is right, for any OSS project. Think of it like a
party where some people go home “when it’s over” and some people stick
around and help clean up.

Using myself as an example, I’ve been asking questions about RGW
multi-site, and now that I have a little more experience with it (not
much more: it’s not working yet, just where I can see gaps in the
documentation), I owe it to those that have helped me get here by filling
those gaps in the docs. That’s where I can start, and when I understand
what’s going on with more authority, I can go into the source and create
changes that alter how it works for others to review.

Note in both cases I am proposing concrete changes, which is far more
effective than trying to describe situations that others may have never
been in. Many can try to help, but if it is frustrating for them, they
will lose interest. Good pull requests are never frustrating to
understand, even if they need more work to handle cases others know
about. It’s a more quantitative means of expression.

If that kind of commitment doesn’t sound appealing, buy support
contracts. Pay back into the community so that those with passion for the
product can do exactly what I’ve described here. There’s no shame in
that, but users like you and me need to be careful with the time of those
who have put their lives into this, at least until we can put more into
the party than we have taken out.

Hope that helps! :B
Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?
On Fri, 19 Apr 2019 at 12:10, Marc Roos wrote:
>
> [...]since nobody here is interested in a better rgw client for end
> users. I am wondering if the rgw is even being used like this, and what
> most production environments look like.
>
"Like this"? People use tons of scriptable and built-in clients, from
s3cmd, to "My backup software can use S3 as a remote backend".
You mentioned looking at two and now conclude no one wants s3...

> This could also be interesting information to decide in what direction
> ceph should develop in the future not?
>
Find an area which bugs you and fix that, present your results, and don't
go ape over a failed "survey" during Easter vacations.

--
May the most significant bit of your life be positive.
[ceph-users] Are there any statistics available on how most production ceph clusters are being used?
I am a bit curious on how production ceph clusters are being used. I am
reading here that the block storage is used a lot with openstack and
proxmox, and via iscsi with vmware.

But since nobody here is interested in a better rgw client for end
users, I am wondering if the rgw is even being used like this, and what
most production environments look like.

This could also be interesting information to decide in what direction
ceph should develop in the future, no?