Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?

2019-04-21 Thread Robin H. Johnson
On Sun, Apr 21, 2019 at 03:11:44PM +0200, Marc Roos wrote:
> Double thanks for the on-topic reply. The other two responses were
> making me doubt if my Chinese (which I didn't study) is better than my
> English.
They were almost on topic, but not that useful. Please don't imply
language failings on this list. English may be the lingua franca, but it
is by far not the first language for most list members. Not being useful
to you didn't mean they weren't useful overall.

>  >> I am a bit curious on how production ceph clusters are being used. I am
>  >> reading here that the block storage is used a lot with openstack and
>  >> proxmox, and via iscsi with VMware.
>  >Have you looked at the Ceph User Surveys/Census?
>  >https://ceph.com/ceph-blog/ceph-user-survey-2018-results/
>  >https://ceph.com/geen-categorie/results-from-the-ceph-census/
> 
> Sort of what I was looking for, so 42% use rgw, of which 74% s3.
> I guess this main archive usage, is mostly done by providers
Not just archive, but also API-driven for web services, usually hidden
behind hostnames/CDNs. Image/video upload sites are a big part of this,
esp. things like Instagram clones in emerging markets.

>  >As the quantity of data by a single user increases, the odds that GUI
>  >tools are used for it decreases, as it's MUCH more likely to be driven
>  >by automation & tooling around the API.
> Hmm, interesting. I am having more soho clients. And was thinking of
> getting them such gui client.
That's great, but orthogonal to the overall issue. Some of the cloud
providers DO offer setup docs for GUI clients as well; off the top of my
head I know of Dreamhost's & DigitalOcean's, because I contributed to
their docs:
https://help.dreamhost.com/hc/en-us/sections/11559232-DreamObjects-clients
https://www.digitalocean.com/docs/spaces/resources/

> I think if you take the perspective of some end user that associates s3,
> with something like an audi and nothing else. It is quite necessary 
> to have a client that is easy and secure to use, where you just enter
>  preferably only two things, your access key and your secret.
There's a bare minimum of three things you'd need in a generic client:
- endpoint(s)
- access key
- secret

The endpoint could be partially pre-provisioned (e.g. you'd give your
clients an INI file that points them to your private Ceph RGW
deployment). If it's a deployment with multiple regions, endpoints &
region-specifics become more important (e.g. AWS S3 has differing
signature requirements in different regions).
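
As a rough illustration of the pre-provisioned INI idea, here is a minimal
Python sketch using only the standard library. The section name `[rgw]`, the
key names, and the example hostname are assumptions made up for this sketch;
they are not a format that any existing client is known to read.

```python
import configparser

# Hypothetical INI layout: section and key names are illustrative only.
SAMPLE_INI = """
[rgw]
endpoint = https://rgw.example.com
access_key = EXAMPLEACCESSKEY
secret_key = examplesecret
"""

# The three things a generic client needs, per the list above.
REQUIRED_KEYS = ("endpoint", "access_key", "secret_key")


def load_client_settings(ini_text, section="rgw"):
    """Return the three required settings as a dict, or raise ValueError."""
    # Interpolation is disabled so a '%' inside a secret is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        raise ValueError(f"missing section [{section}]")
    missing = [
        key for key in REQUIRED_KEYS
        if not parser.get(section, key, fallback="").strip()
    ]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return {key: parser.get(section, key).strip() for key in REQUIRED_KEYS}


settings = load_client_settings(SAMPLE_INI)
```

A provider could ship such a file with the endpoint already filled in, so the
end user only has to add the access key and secret.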

> The advantage of having a more rgw specific gui client, is that you
> - do not have the default amazon 'advertisements' (think of storage 
> classes etc.)
> - less configuration options, everything ceph does not support we do not
>   need to configure. 
> - no ftp, no what ever else, just this s3
> - you do not have configuration options that ceph doesn't offer 
>   (eg. this life cycle, bucket access logging?)
- Storage Classes: supported
- Bucket Lifecycle: supported
- Bucket Access Logging: not quite supported, PR exists, some debate
  about better designs. https://github.com/ceph/ceph/pull/14841

>   I can imagine if you have quite a few clients, you could get quite 
> some questions to answer, about things not working.
> - you have better support for specific things like multi tenant account, 
> etc.
Tenancy in RGW is effectively parallel S3 scopes, with different
endpoints.

> - for once the https urls are correctly advertised
What issue do you have with HTTPS URLs? The main gotcha that most people
hit is that S3's SSL hostname validation rule is NOT the same as the
general SSL hostname validation rule, and trips up browser access.
Specifically in a wildcard SSL cert, '*.myrgwendpoint.com', the general
rule is that '*' should only match one DNS fragment [e.g. no '.'], while
S3's validation says it can match one or more DNS fragments.
The AWS S3 docs are even horrible about this, with the text:
"To work around this, use HTTP or write your own certificate
verification logic."
https://github.com/awsdocs/amazon-s3-developer-guide/blame/f498926b68f4f1b11c7f708ac0fbd52ee2a0aa19/doc_source/BucketRestrictions.md#L35
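
The difference between the two wildcard rules described above can be sketched
in a few lines of Python. This is only an illustration of the two matching
behaviours as stated in this message, not the validation code of any real TLS
library or S3 client; both function names are made up for the sketch, and only
a leading `*.` wildcard is handled.

```python
def matches_single_label(pattern, hostname):
    """General rule: a leading '*' stands in for exactly one DNS label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if not pattern.startswith("*."):
        return pattern == hostname
    suffix = pattern[1:]  # e.g. ".myrgwendpoint.com"
    if not hostname.endswith(suffix):
        return False
    prefix = hostname[: -len(suffix)]
    # Exactly one label: non-empty and without a '.' inside it.
    return prefix != "" and "." not in prefix


def matches_one_or_more_labels(pattern, hostname):
    """Looser rule from the message: '*' may cover one or more labels."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if not pattern.startswith("*."):
        return pattern == hostname
    suffix = pattern[1:]
    # Something must precede the suffix, but it may itself contain dots.
    return hostname.endswith(suffix) and len(hostname) > len(suffix)


CERT_NAME = "*.myrgwendpoint.com"
dotted_bucket_host = "my.bucket.myrgwendpoint.com"
strict_ok = matches_single_label(CERT_NAME, dotted_bucket_host)
loose_ok = matches_one_or_more_labels(CERT_NAME, dotted_bucket_host)
```

A bucket name containing a dot is the case where the two rules disagree: the
single-label rule rejects the hostname while the looser rule accepts it.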

> Whether one likes it or not ceph is afaik not fully s3 compatible
No, Ceph isn't fully AWS-S3 compatible, and I did specifically include in my
talk at Cephalocon last year that we should explicitly be returning 501
NotImplemented in more cases. AWS-S3 in itself is a moving target, and
some of the operations ARE best offloaded to something other than Ceph.

Even if Ceph/RGW does support a given set of operations, does the
deployment want to consider those operations supported? This thinking
led to the torrent ops being behind a configuration option in Ceph, and
other ops can be & are blocked by providers in the reverse proxy.
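
To make the "implemented vs. considered supported" distinction concrete, here
is a small Python sketch of a per-deployment policy check. The operation names
and the two sets are hypothetical examples chosen for illustration; they are
not RGW's actual operation list or configuration, and a real reverse proxy
might answer a blocked operation with a different status code.

```python
from http import HTTPStatus

# Hypothetical: operations the gateway software implements at all.
IMPLEMENTED_OPS = {"GetObject", "PutObject", "ListBucket", "GetObjectTorrent"}

# Hypothetical: operations this deployment chooses not to offer,
# even though the software implements them.
DEPLOYMENT_DISABLED_OPS = {"GetObjectTorrent"}


def status_for(operation):
    """Return 501 for anything unimplemented or disabled here, else 200."""
    if operation not in IMPLEMENTED_OPS:
        return HTTPStatus.NOT_IMPLEMENTED
    if operation in DEPLOYMENT_DISABLED_OPS:
        return HTTPStatus.NOT_IMPLEMENTED
    return HTTPStatus.OK


torrent_status = status_for("GetObjectTorrent")
```

An explicit 501 tells a client that the operation is not offered, which is
easier to handle than a request that fails in some less specific way.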

There ARE RGW-specific features that would be valuable to have in more
clients:
- RGW Admin operations [the list of them is much longer than the docs
  suggest]
- 

Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?

2019-04-21 Thread Marc Roos


Double thanks for the on-topic reply. The other two responses were making
me doubt if my Chinese (which I didn't study) is better than my English.


 >> I am a bit curious on how production ceph clusters are being used. I am
 >> reading here that the block storage is used a lot with openstack and
 >> proxmox, and via iscsi with VMware.
 >Have you looked at the Ceph User Surveys/Census?
 >https://ceph.com/ceph-blog/ceph-user-survey-2018-results/
 >https://ceph.com/geen-categorie/results-from-the-ceph-census/

Sort of what I was looking for, so 42% use rgw, of which 74% s3.
I guess this main archive usage, is mostly done by providers

 >> But since nobody here is interested in a better rgw client for end
 >> users, I am wondering if the rgw is even being used like this, and what
 >> most production environments look like.
 >Your end-user client thread was specifically targeting GUI
 >clients on OSX & Windows. I feel that the GUI client usage of the S3
 >protocol has a much higher visibility-to-data-size ratio than
 >automation/tooling usage.
 >
 >As the quantity of data by a single user increases, the odds that GUI
 >tools are used for it decreases, as it's MUCH more likely to be driven
 >by automation & tooling around the API.

Hmm, interesting. I am having more soho clients. And was thinking of
getting them such gui client.

 >My earliest Ceph production deployment was mostly RGW (~16TB raw), with
 >a little bit of RBD/iSCSI usage (~1TB of floating disk between VMs).
 >Very little of the RGW usage was GUI driven (there certainly was some,
 >because it made business sense to offer it rather than FTP sites; but it
 >was tiny compared to the automation flows).
 >
 >The second production deployment I worked on was Dreamhost's DreamObjects,
 >which was over 3PB then: and MOST of the usage was still not GUI-driven.
 >
 >I'm working at DigitalOcean's Spaces offering now; again, mostly non-GUI
 >access.
 >
 >For the second part of your original query, I feel that any new clients
 >SHOULD NOT be RGW-specific; they should be able to work on a wide range
 >of services that expose the S3 API, and have a good test-suite around
 >that (s3-tests, but for testing the client implementation; even Boto is
 >not bug-free).
 >

I think if you take the perspective of some end user that associates s3,
with something like an audi and nothing else. It is quite necessary 
to have a client that is easy and secure to use, where you just enter
 preferably only two things, your access key and your secret.

The advantage of having a more rgw specific gui client, is that you
- do not have the default amazon 'advertisements' (think of storage 
classes etc.)
- less configuration options, everything ceph does not support we do not
  need to configure. 
- no ftp, no what ever else, just this s3
- you do not have configuration options that ceph doesn't offer 
  (eg. this life cycle, bucket access logging?)
  I can imagine if you have quite a few clients, you could get quite some
  questions to answer, about things not working.
- you have better support for specific things like multi tenant account, etc.
- for once the https urls are correctly advertised

Whether one likes it or not ceph is afaik not fully s3 compatible




 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?

2019-04-19 Thread Robin H. Johnson
On Fri, Apr 19, 2019 at 12:10:02PM +0200, Marc Roos wrote:
> I am a bit curious on how production ceph clusters are being used. I am 
> reading here that the block storage is used a lot with openstack and 
> proxmox, and via iscsi with VMware.
Have you looked at the Ceph User Surveys/Census?
https://ceph.com/ceph-blog/ceph-user-survey-2018-results/
https://ceph.com/geen-categorie/results-from-the-ceph-census/

> But since nobody here is interested in a better rgw client for end
> users, I am wondering if the rgw is even being used like this, and what
> most production environments look like.
Your end-user client thread was specifically targeting GUI
clients on OSX & Windows. I feel that the GUI client usage of the S3
protocol has a much higher visibility-to-data-size ratio than
automation/tooling usage.

As the quantity of data by a single user increases, the odds that GUI
tools are used for it decreases, as it's MUCH more likely to be driven
by automation & tooling around the API.

My earliest Ceph production deployment was mostly RGW (~16TB raw), with
a little bit of RBD/iSCSI usage (~1TB of floating disk between VMs).
Very little of the RGW usage was GUI driven (there certainly was some,
because it made business sense to offer it rather than FTP sites; but it
was tiny compared to the automation flows).

The second production deployment I worked on was Dreamhost's DreamObjects,
which was over 3PB then: and MOST of the usage was still not GUI-driven.

I'm working at DigitalOcean's Spaces offering now; again, mostly non-GUI
access.

For the second part of your original query, I feel that any new clients
SHOULD NOT be RGW-specific; they should be able to work on a wide range
of services that expose the S3 API, and have a good test-suite around
that (s3-tests, but for testing the client implementation; even Boto is
not bug-free).

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136




Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?

2019-04-19 Thread Brian Topping
> On Apr 19, 2019, at 10:59 AM, Janne Johansson  wrote:
> 
> May the most significant bit of your life be positive.

Marc, my favorite thing about open source software is it has a 100% money back 
satisfaction guarantee: If you are not completely satisfied, you can have an 
instant refund, just for waving your arm! :D

Seriously though, Janne is right, for any OSS project. Think of it like a party 
where some people go home “when it’s over” and some people stick around and 
help clean up. Using myself as an example, I’ve been asking questions about RGW 
multi-site, and now that I have a little more experience with it (not much more 
— it’s not working yet, just where I can see gaps in the documentation), I owe 
it to those that have helped me get here by filling those gaps in the docs. 

That’s where I can start, and when I understand what’s going on with more 
authority, I can go into the source and create changes that alter how it works 
for others to review.

Note in both cases I am proposing concrete changes, which is far more effective 
than trying to describe situations that others may have never been in. Many can 
try to help, but if it is frustrating for them, they will lose interest. Good 
pull requests are never frustrating to understand, even if they need more work 
to handle cases others know about. It’s a more quantitative means of expression.

If that kind of commitment doesn’t sound appealing, buy support contracts. Pay 
back into the community so that those with passion for the product can do 
exactly what I’ve described here. There’s no shame in that, but users like you 
and me need to be careful with the time of those who have put their lives into 
this, at least until we can put more into the party than we have taken out.

Hope that helps!  :B


Re: [ceph-users] Are there any statistics available on how most production ceph clusters are being used?

2019-04-19 Thread Janne Johansson
On Fri, 19 Apr 2019 at 12:10, Marc Roos wrote:

>
> [...]since nobody here is interested in a better rgw client for end
> users. I am wondering if the rgw is even being used like this, and what
> most production environments look like.
>
>
"Like this" ?

People use tons of scriptable and built-in clients, from s3cmd to "My
backup software can use S3 as a remote backend".
You mentioned looking at two and now conclude no one wants s3...


> This could also be interesting information to decide in what direction
> ceph should develop in the future, no?
>
>
Find an area which bugs you and fix that, present your results, don't go
ape over a failed "survey" during Easter vacations.

-- 
May the most significant bit of your life be positive.


[ceph-users] Are there any statistics available on how most production ceph clusters are being used?

2019-04-19 Thread Marc Roos


I am a bit curious on how production ceph clusters are being used. I am
reading here that the block storage is used a lot with openstack and
proxmox, and via iscsi with VMware.
But since nobody here is interested in a better rgw client for end
users, I am wondering if the rgw is even being used like this, and what
most production environments look like.

This could also be interesting information to decide in what direction 
ceph should develop in the future, no?







