Re: RGW S3 Website hosting, non-clean code for early review

2015-12-04 Thread Javier Muñoz
On 12/03/2015 10:47 PM, Robin H. Johnson wrote:
> On Wed, Dec 02, 2015 at 03:02:12PM +0100, Javier Muñoz wrote:
>> I would appreciate knowing the current status of the implementation if
>> possible. Any progress? Any 'deadline' to go upstream? :)
> 
> When:
> As soon as it works 100% and passes my testsuite, which I hope is very
> soon. I would very much like to have this in Jewel.
> 
> My work was being done in this branch
> https://github.com/dreamhost/ceph/tree/wip-static-website-robbat2-master
> However, due to master moving forward, I can't use the latest parts of
> the gitbuilder and automatic testing successfully (they've moved on,
> while this has stayed behind).
> 
> Yehuda wanted me to try and NOT rebase it, for ease of his review, but
> that was no longer possible :-(.
> (tagged as wip-static-website-robbat2-master_yehuda-review-20151012 in
> the dreamhost fork).
> 
> The above, but squashed and updated to master as of 2015/12/02
> https://github.com/dreamhost/ceph/tree/wip-static-website-robbat2-master-20151202
> It's presently running against my testsuite, and if it passes the pieces
> that I know it should [1], I'll be splitting it up to submit.
> 
> [1] I'm seeing some failures in places where I didn't touch the code, so
> I'm having to separate those out.

Thanks for the update!

Javier

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RGW S3 Website hosting, non-clean code for early review

2015-12-03 Thread Robin H. Johnson
On Wed, Dec 02, 2015 at 03:02:12PM +0100, Javier Muñoz wrote:
> I would appreciate knowing the current status of the implementation if
> possible. Any progress? Any 'deadline' to go upstream? :)

When:
As soon as it works 100% and passes my testsuite, which I hope is very
soon. I would very much like to have this in Jewel.

My work was being done in this branch
https://github.com/dreamhost/ceph/tree/wip-static-website-robbat2-master
However, due to master moving forward, I can't use the latest parts of
the gitbuilder and automatic testing successfully (they've moved on,
while this has stayed behind).

Yehuda wanted me to try and NOT rebase it, for ease of his review, but
that was no longer possible :-(.
(tagged as wip-static-website-robbat2-master_yehuda-review-20151012 in
the dreamhost fork).

The above, but squashed and updated to master as of 2015/12/02
https://github.com/dreamhost/ceph/tree/wip-static-website-robbat2-master-20151202
It's presently running against my testsuite, and if it passes the pieces
that I know it should [1], I'll be splitting it up to submit.

[1] I'm seeing some failures in places where I didn't touch the code, so
I'm having to separate those out.

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee
E-Mail : robb...@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85


Re: RGW S3 Website hosting, non-clean code for early review

2015-12-02 Thread Javier Muñoz
Hi,

I would appreciate knowing the current status of the implementation if
possible. Any progress? Any 'deadline' to go upstream? :)

Thanks in advance!

Javier

On 07/31/2015 10:21 AM, Robin H. Johnson wrote:
> I'm busy travelling and wrapping other things up, but will resume major
> work on this in August, probably week 2/3, depending on other
> interrupts/priorities.
> 
> It's almost complete, the only missing features are:
> - error page fetcher gets confused on conditional requests (I think a
>   different approach is needed, esp re range & if-modified requests,
>   they still seem to leak into the custom req_info on the second request)
> - cascaded errors on conditional requests need more handling (eg
>   404 bounces to the error page that doesn't exist as well)
> Some further work that isn't blocking is also:
> - review handling of x-amz-website-redirect-location header when it
>   conflicts with RoutingRules
> - Check what input validations Amazon does, because already they don't
>   match the docs 
>   - eg docs say any http code for redirect, but it really only allows
>   301-305, 307, 308
>   - Exact validation on the URI/paths is also not immediately clear, I
>   had some unexpected results with URL-escaped inputs.
> - Review cases in which AMZ sets the errors in the header as well as the
>   page, didn't seem consistent, esp in the double-error cases.
> - more cleanup/refactor of test cases
> 



Re: RGW S3 Website hosting, non-clean code for early review

2015-07-31 Thread Robin H. Johnson
I'm busy travelling and wrapping other things up, but will resume major
work on this in August, probably week 2/3, depending on other
interrupts/priorities.

It's almost complete, the only missing features are:
- error page fetcher gets confused on conditional requests (I think a
  different approach is needed, esp re range & if-modified requests,
  they still seem to leak into the custom req_info on the second request)
- cascaded errors on conditional requests need more handling (eg
  404 bounces to the error page that doesn't exist as well)
Some further work that isn't blocking is also:
- review handling of x-amz-website-redirect-location header when it
  conflicts with RoutingRules
- Check what input validations Amazon does, because already they don't
  match the docs 
  - eg docs say any http code for redirect, but it really only allows
    301-305, 307, 308
  - Exact validation on the URI/paths is also not immediately clear, I
    had some unexpected results with URL-escaped inputs.
- Review cases in which AMZ sets the errors in the header as well as the
  page, didn't seem consistent, esp in the double-error cases.
- more cleanup/refactor of test cases

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail : robb...@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85




Re: RGW S3 Website hosting, non-clean code for early review

2015-07-30 Thread Yehuda Sadeh-Weinraub


----- Original Message -----
> From: Yehuda Sadeh-Weinraub <yeh...@redhat.com>
> To: Robin H. Johnson <robb...@gentoo.org>
> Cc: ceph-devel@vger.kernel.org, Jonathan LaCour
> <jonathan.lac...@dreamhost.com>
> Sent: Tuesday, June 23, 2015 4:07:44 PM
> Subject: Re: RGW S3 Website hosting, non-clean code for early review
>
>
>
> ----- Original Message -----
>> From: Robin H. Johnson <robb...@gentoo.org>
>> To: Yehuda Sadeh-Weinraub <yeh...@redhat.com>
>> Cc: ceph-devel@vger.kernel.org, Jonathan LaCour
>> <jonathan.lac...@dreamhost.com>
>> Sent: Tuesday, June 23, 2015 4:04:49 PM
>> Subject: Re: RGW S3 Website hosting, non-clean code for early review
>>
>>
>> On Tue, Jun 23, 2015 at 04:30:19PM -0400, Yehuda Sadeh-Weinraub wrote:
>>>> Either I have to repeat a lot of code for it, which I'm not happy about,
>>>> or I have to refactor RGWGetObj* to more safely make the second GET
>>>> request for the error object (and make sure range headers etc are NOT
>>>> used for the get of the error object). I'm leaning to the latter.
>>> Is generating a new req_state a possibility? E.g., you catch the error
>>> at the top level, and restart most of the request processing with a
>>> newly created req_state?
>> That was the path I was trying, but not completely succeeding.
>> I think I need to step it back further and have a partially customized
>> copy of the RGWEnv from client_io->get_env(), so that I can build the
>> modified req_info for req_state.
>>
>> It isn't a full new GET really, it's really just custom content for the
>> body as well as some headers (mostly Content-Length, Content-Type), but
>> ignore EPERM/EACCES on trying to fetch that custom content, and if they
>> are detected, consider that a success but with different HTML content.
>>
>>> Great! I'll wait for the cleaned up pull request.
>> Do you want pull requests per logical change of my proposed series
>> split, or rather just one pull request with the full series?
>>
>
> One pull request for the full series.
>
> Yehuda

Hi,

just following up on this one. I don't remember seeing a pull request. Has 
there been any progress?

Thanks,
Yehuda


Re: RGW S3 Website hosting, non-clean code for early review

2015-06-23 Thread Yehuda Sadeh-Weinraub


----- Original Message -----
> From: Robin H. Johnson <robb...@gentoo.org>
> To: ceph-devel@vger.kernel.org
> Cc: Jonathan LaCour <jonathan.lac...@dreamhost.com>
> Sent: Tuesday, June 23, 2015 2:33:25 AM
> Subject: RGW S3 Website hosting, non-clean code for early review
>
> Hi,
>
> As an extension of earlier work done by Yehuda [1], I've gotten the
> great majority of the work done to support static website hosting in
> RGW, just like AmazonS3 [2].
>
> I need to do some cleanups of the code prior to major review for
> submission, and solve one thorny problem first, have a few discussions
> about best courses of action, and then I'll be submitting this for more
> reviews before merging.
>
> ceph [3]
> s3-tests, unit tests [4]
> s3-tests, fuzzer tests [5]
>
> The thorny problem:
> -------------------
> One of the pieces of functionality in S3Website is the ability to serve
> any public object in the bucket as the content on a custom error page
> (think shiny 404 error). In some cases, like trivial 403/404 errors, we
> can determine this quite early, before we fetch the object, and redirect
> the request to the error object instead (provided that we also redo the
> ACL check on the error object).
>
> In more complex cases (eg 416 Range Not Satisfiable, 412 Precondition
> Failed), it happens very late in the RGW request processing, and the
> req_state struct seems to have been mangled/pre-filled with a lot of
> decisions that are hard to reverse.
>
> Either I have to repeat a lot of code for it, which I'm not happy about,
> or I have to refactor RGWGetObj* to more safely make the second GET
> request for the error object (and make sure range headers etc are NOT
> used for the get of the error object). I'm leaning to the latter.

Is generating a new req_state a possibility? E.g., you catch the error at the 
top level, and restart most of the request processing with a newly created 
req_state?

 
> Oh, and for added fun, if an error object is configured, but is missing
> or private, you get a response that is similar to, but different from,
> the one returned when no error object is configured, and sometimes the
> error codes are in the headers, but not always.
>
> Discussion pieces:
> ------------------
> RGWRegion
> - presently has both endpoints and hostnames, but doesn't make clear
>   which APIs (S3, Swift, S3Website) might be available at each; or allow
>   combinations to dedicate a specific FQDN to a given API.
>   I'd like to replace both structures with a map structure [6]

Makes sense.

> Bucket existence privacy:
> - In general I agree with the goal that we should be closely compatible
>   with AmazonS3, but with an eye to security, I'd like to consider a specific
>   deviation:
> - In AmazonS3, you can enumerate buckets for existence, simply looking
>   for 404 NoSuchBucket vs 403 AccessDenied. I'd like to offer a
>   configuration option that returns 403 Forbidden or 401 Unauthorized on
>   anonymous requests to non-existent buckets.

As long as it's configurable.

> - Testing some of the functionality against AmazonS3 has been somewhat
>   painful, as AmazonS3 only provides eventual consistency of the website
>   configuration (with the highest time I've seen so far being about 30
>   seconds).

Yup.

 
> New configuration options/changes:
> ----------------------------------
> rgw_enable_apis: gains 's3website' mode
> rgw_dns_s3website_name: similar to rgw_dns_name, but for s3website endpoint
> RGWRegion having per-rgw-api hostnames
>
> Patch series breakdown plans:
> -----------------------------
> Here's the breakdown of patch series I'm considering for the changes
> (net 2kLOC in ceph, 1kLOC in testcases).
> (TODO marks pieces not in these sets of commits yet, but they will be soon.)
>
> ceph
> - split Formatter.cc
>   - JSON/XML/Table formatters are separate now
>   - add header & footer support for formatters
>   - add knowledge of status
>   - add HTML formatter
> - Add optional error handler hooks to RGWOp and RGWHandler for abort_early
> - Add optional retarget handler hooks
> - Add more flexible redirect handling
> - S3website code
> - x-amz-website-redirect-location handling (TODO: needs a bit more polish and
>   testing)
> - TODO: Add more input validations to match S3, on stuff that's NOT
>   documented but was discovered when I applied weirder testcases to
>   AmazonS3:
>   - 'Hostname' field has non-trivial validation (maybe borrow the
>     outcome of wip-bucket_name_restrictions)
>   - The 'Protocol' field for a redirect must be http/https, cannot be
>     gopher or anything else.
>   - The HttpRedirectCode field must contain one of: 301-305, 307, 308.
>     The docs don't say this, and the error message says 'Any 3XX value
>     except 300'.
>   - First-match in RoutingRules wins; watch out with rules that match
>     4XX error codes.
> - Documentation
>   - TODO: esp the parts missing from the S3 docs above
>
> s3-tests, unit tests
> - refactor for more requests
> - add new utilities
> - add website tests
> s3-tests, fuzzer tests [5]
>
> Links for all the bits above
>
> [1] https://github.com/ceph/ceph

RGW S3 Website hosting, non-clean code for early review

2015-06-23 Thread Robin H. Johnson
Hi,

As an extension of earlier work done by Yehuda [1], I've gotten the
great majority of the work done to support static website hosting in
RGW, just like AmazonS3 [2].

I need to do some cleanups of the code prior to major review for
submission, and solve one thorny problem first, have a few discussions
about best courses of action, and then I'll be submitting this for more
reviews before merging.

ceph [3]
s3-tests, unit tests [4] 
s3-tests, fuzzer tests [5]

The thorny problem:
-------------------
One of the pieces of functionality in S3Website is the ability to serve
any public object in the bucket as the content on a custom error page
(think shiny 404 error). In some cases, like trivial 403/404 errors, we
can determine this quite early, before we fetch the object, and redirect
the request to the error object instead (provided that we also redo the
ACL check on the error object).

In more complex cases (eg 416 Range Not Satisfiable, 412 Precondition
Failed), it happens very late in the RGW request processing, and the
req_state struct seems to have been mangled/pre-filled with a lot of
decisions that are hard to reverse.

Either I have to repeat a lot of code for it, which I'm not happy about,
or I have to refactor RGWGetObj* to more safely make the second GET
request for the error object (and make sure range headers etc are NOT
used for the get of the error object). I'm leaning to the latter.
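The "make sure range headers etc are NOT used" requirement can be sketched as a header filter. This is Python rather than RGW's C++, and the helper name `error_doc_headers` and the exact header list are illustrative assumptions, not RGW code: it copies the client's headers for the error-object GET and drops the ones that would make that second request partial or conditional.

```python
# Hypothetical helper, not part of RGW. HTTP header names are
# case-insensitive, so they are compared in lower case.
PARTIAL_OR_CONDITIONAL = {
    "range",
    "if-range",
    "if-match",
    "if-none-match",
    "if-modified-since",
    "if-unmodified-since",
}


def error_doc_headers(request_headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the client's headers that is safe to reuse when
    fetching the configured error object.

    Range and conditional headers are dropped so that the second GET
    cannot itself end in 416 or 412 (or come back as a 206 or 304).
    """
    return {
        name: value
        for name, value in request_headers.items()
        if name.lower() not in PARTIAL_OR_CONDITIONAL
    }
```

Headers such as Host pass through unchanged, so only the partial/conditional behaviour of the original request is discarded.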

Oh, and for added fun, if an error object is configured, but is missing
or private, you get a response that is similar to, but different from,
the one returned when no error object is configured, and sometimes the
error codes are in the headers, but not always.

Discussion pieces:
------------------
RGWRegion
- presently has both endpoints and hostnames, but doesn't make clear
  which APIs (S3, Swift, S3Website) might be available at each; or allow
  combinations to dedicate a specific FQDN to a given API.
  I'd like to replace both structures with a map structure [6]
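One possible shape for such a map, purely illustrative (the actual proposal is the one linked as [6], and every name here is hypothetical): each API maps to the hostnames that serve it, so dedicating an FQDN to one API just means listing it under that API only.

```python
# Hypothetical per-API hostname map for a region; the structure actually
# proposed for RGWRegion may differ.
REGION_HOSTNAMES: dict[str, list[str]] = {
    "s3": ["s3.example.com"],
    "swift": ["swift.example.com"],
    "s3website": ["s3-website.example.com"],
}


def apis_for_host(host: str, hostnames: dict[str, list[str]]) -> set[str]:
    """Return every API that lists `host` among its hostnames.

    An empty set means the host is not one of the region's endpoints.
    A ":port" suffix is ignored and matching is case-insensitive
    (assumes DNS names, not bracketed IPv6 literals).
    """
    name = host.split(":", 1)[0].lower()
    return {
        api
        for api, hosts in hostnames.items()
        if name in (h.lower() for h in hosts)
    }
```

A hostname shared by several APIs simply appears in more than one list, which the current endpoints/hostnames pair cannot express per API.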
Bucket existence privacy:
- In general I agree with the goal that we should be closely compatible
  with AmazonS3, but with an eye to security, I'd like to consider a specific
  deviation:
- In AmazonS3, you can enumerate buckets for existence, simply looking
  for 404 NoSuchBucket vs 403 AccessDenied. I'd like to offer a
  configuration option that returns 403 Forbidden or 401 Unauthorized on
  anonymous requests to non-existent buckets.
- Testing some of the functionality against AmazonS3 has been somewhat
  painful, as AmazonS3 only provides eventual consistency of the website
  configuration (with the highest time I've seen so far being about 30
  seconds).
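A sketch of the proposed bucket-existence option. The setting name `hide_bucket_existence` is hypothetical (the thread does not name the option); it holds the status to use for anonymous requests to missing buckets, while authenticated requests keep the AmazonS3-compatible 404.

```python
from typing import Optional

# AmazonS3 behaviour: a missing bucket gives 404 NoSuchBucket, an existing
# but inaccessible one gives 403 AccessDenied, so the two can be told apart.
NO_SUCH_BUCKET = 404


def missing_bucket_status(anonymous: bool,
                          hide_bucket_existence: Optional[int]) -> int:
    """Status to return when the requested bucket does not exist.

    `hide_bucket_existence` is a hypothetical config value: None keeps
    the S3-compatible 404; 403 or 401 replaces it for anonymous requests.
    """
    if hide_bucket_existence not in (None, 401, 403):
        raise ValueError("hide_bucket_existence must be None, 401 or 403")
    if hide_bucket_existence is None or not anonymous:
        return NO_SUCH_BUCKET
    return hide_bucket_existence
```

With 403, an anonymous client gets the same status for a missing bucket as for an existing private one, so the status code alone no longer reveals existence; the error body would also need to match for full privacy.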

New configuration options/changes:
----------------------------------
rgw_enable_apis: gains 's3website' mode
rgw_dns_s3website_name: similar to rgw_dns_name, but for s3website endpoint
RGWRegion having per-rgw-api hostnames
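To show how the two DNS options could interact, here is a sketch that classifies a virtual-hosted-style Host header. The option names come from the list above; the function name, the check order, and parsing `rgw_enable_apis` as a comma-separated list are assumptions for illustration rather than RGW's actual dispatch.

```python
from typing import Optional, Tuple


def classify_host(host: str, rgw_dns_name: str, rgw_dns_s3website_name: str,
                  rgw_enable_apis: str) -> Optional[Tuple[str, str]]:
    """Return (api, bucket) for a virtual-hosted-style Host header, or None.

    Hypothetical dispatch: the website suffix is tried first, and an API
    is only considered if it appears in rgw_enable_apis.
    """
    enabled = {api.strip() for api in rgw_enable_apis.split(",")}
    # Drop any ":port" and a trailing root dot; DNS names are case-insensitive.
    name = host.split(":", 1)[0].lower().rstrip(".")
    candidates = [("s3website", rgw_dns_s3website_name), ("s3", rgw_dns_name)]
    for api, suffix in candidates:
        suffix = suffix.lower()
        if api in enabled and suffix and name.endswith("." + suffix):
            bucket = name[: -len(suffix) - 1]
            if bucket:
                return api, bucket
    return None
```

Checking the website suffix first matters when rgw_dns_s3website_name is itself a subdomain of rgw_dns_name (say s3-website.example.com under example.com); otherwise every website request would be parsed as an S3 request for a dotted bucket name.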

Patch series breakdown plans:
-----------------------------
Here's the breakdown of patch series I'm considering for the changes
(net 2kLOC in ceph, 1kLOC in testcases).
(TODO marks pieces not in these sets of commits yet, but they will be soon.)

ceph
- split Formatter.cc
  - JSON/XML/Table formatters are separate now
  - add header & footer support for formatters
  - add knowledge of status
  - add HTML formatter
- Add optional error handler hooks to RGWOp and RGWHandler for abort_early
- Add optional retarget handler hooks
- Add more flexible redirect handling
- S3website code
- x-amz-website-redirect-location handling (TODO: needs a bit more polish and
  testing)
- TODO: Add more input validations to match S3, on stuff that's NOT
  documented but was discovered when I applied weirder testcases to
  AmazonS3:
  - 'Hostname' field has non-trivial validation (maybe borrow the
    outcome of wip-bucket_name_restrictions)
  - The 'Protocol' field for a redirect must be http/https, cannot be
    gopher or anything else.
  - The HttpRedirectCode field must contain one of: 301-305, 307, 308.
    The docs don't say this, and the error message says 'Any 3XX value
    except 300'.
  - First-match in RoutingRules wins; watch out with rules that match
    4XX error codes.
- Documentation
  - TODO: esp the parts missing from the S3 docs above
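The undocumented constraints above, plus the first-match rule, can be sketched as a small validator. Field meanings follow the S3 website configuration (`Protocol`, `HttpRedirectCode`, and the `KeyPrefixEquals` / `HttpErrorCodeReturnedEquals` conditions), but the Python class and function names are hypothetical, and the accepted redirect codes are the set observed against AmazonS3 in this thread, not a documented guarantee.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Observed behaviour: only these codes are accepted, even though the
# error message says "any 3XX value except 300".
ALLOWED_REDIRECT_CODES = {301, 302, 303, 304, 305, 307, 308}
ALLOWED_PROTOCOLS = {"http", "https"}  # compared case-sensitively here


@dataclass
class RoutingRule:
    # Condition: each field is optional; every field that is set must match.
    key_prefix_equals: Optional[str] = None
    http_error_code_returned_equals: Optional[int] = None
    # Redirect: only the fields validated below.
    protocol: Optional[str] = None
    http_redirect_code: Optional[int] = None


def validate_rule(rule: RoutingRule) -> None:
    """Raise ValueError for inputs AmazonS3 was observed to reject."""
    if rule.protocol is not None and rule.protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"invalid Protocol: {rule.protocol!r}")
    if (rule.http_redirect_code is not None
            and rule.http_redirect_code not in ALLOWED_REDIRECT_CODES):
        raise ValueError(f"invalid HttpRedirectCode: {rule.http_redirect_code}")


def first_matching_rule(rules: Sequence[RoutingRule], key: str,
                        error_code: Optional[int]) -> Optional[RoutingRule]:
    """Return the first rule whose condition matches; later rules are ignored."""
    for rule in rules:
        if (rule.key_prefix_equals is not None
                and not key.startswith(rule.key_prefix_equals)):
            continue
        if (rule.http_error_code_returned_equals is not None
                and rule.http_error_code_returned_equals != error_code):
            continue
        return rule
    return None
```

A rule that only sets a key prefix matches whatever error code was returned, so listing it before a 404 rule shadows the 404 rule for keys under that prefix; that is the ordering trap the last TODO item warns about.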

s3-tests, unit tests
- refactor for more requests
- add new utiliities
- add website tests
s3-tests, fuzzer tests [5]

Links for all the bits above

[1] https://github.com/ceph/ceph/tree/wip-static-website
[2] http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html
[3] https://github.com/ceph/ceph/compare/master...robbat2:wip-static-website-robbat2-master
[4] https://github.com/ceph/s3-tests/compare/master...robbat2:wip-static-website
[5] https://github.com/ceph/s3-tests/compare/master...robbat2:wip-website-fuzzy
[6] https://github.com/ceph/ceph/compare/master...robbat2:wip-static-website-robbat2-master#diff-ee7891a35944697538486c9269e0d65bR909

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail : robb...@gentoo.org
GnuPG FP 

Re: RGW S3 Website hosting, non-clean code for early review

2015-06-23 Thread Robin H. Johnson

On Tue, Jun 23, 2015 at 04:30:19PM -0400, Yehuda Sadeh-Weinraub wrote:
>> Either I have to repeat a lot of code for it, which I'm not happy about,
>> or I have to refactor RGWGetObj* to more safely make the second GET
>> request for the error object (and make sure range headers etc are NOT
>> used for the get of the error object). I'm leaning to the latter.
> Is generating a new req_state a possibility? E.g., you catch the error
> at the top level, and restart most of the request processing with a
> newly created req_state?
That was the path I was trying, but not completely succeeding.
I think I need to step it back further and have a partially customized
copy of the RGWEnv from client_io->get_env(), so that I can build the
modified req_info for req_state.

It isn't a full new GET really, it's really just custom content for the
body as well as some headers (mostly Content-Length, Content-Type), but
ignore EPERM/EACCES on trying to fetch that custom content, and if they
are detected, consider that a success but with different HTML content.
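A rough sketch of that fallback, in Python and with a hypothetical `fetch` callable standing in for the error-object GET (this is not RGW's call chain). The original error status is always returned; the error object only supplies the body, and if it is missing or private the gateway's own HTML is used instead of cascading into a second error. AmazonS3's exact behaviour in these double-error cases still needs checking, so this does not claim to match it.

```python
from typing import Callable, Optional, Tuple

FetchResult = Tuple[int, bytes]  # (status of the error-object GET, body)


def builtin_error_page(status: int) -> bytes:
    """Stand-in for the gateway's own HTML error page (hypothetical)."""
    return f"<html><body><h1>{status}</h1></body></html>".encode()


def error_response(status: int, error_doc_key: Optional[str],
                   fetch: Callable[[str], FetchResult]) -> Tuple[int, bytes]:
    """Build the response for an error `status`.

    The original status is always kept. If fetching the configured error
    object gives any non-200 result (e.g. 403 because it is private, 404
    because it is missing), fall back to the built-in page rather than
    reporting that second error to the client.
    """
    if error_doc_key is None:
        return status, builtin_error_page(status)
    doc_status, body = fetch(error_doc_key)
    if doc_status != 200:
        return status, builtin_error_page(status)
    return status, body
```

Keeping the original status separate from the body source is what lets a 412 or 416 still be reported as such even when the error page itself is served from the bucket.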

> Great! I'll wait for the cleaned up pull request.
Do you want pull requests per logical change of my proposed series
split, or rather just one pull request with the full series?

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail : robb...@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85


Re: RGW S3 Website hosting, non-clean code for early review

2015-06-23 Thread Yehuda Sadeh-Weinraub


----- Original Message -----
> From: Robin H. Johnson <robb...@gentoo.org>
> To: Yehuda Sadeh-Weinraub <yeh...@redhat.com>
> Cc: ceph-devel@vger.kernel.org, Jonathan LaCour
> <jonathan.lac...@dreamhost.com>
> Sent: Tuesday, June 23, 2015 4:04:49 PM
> Subject: Re: RGW S3 Website hosting, non-clean code for early review
>
>
> On Tue, Jun 23, 2015 at 04:30:19PM -0400, Yehuda Sadeh-Weinraub wrote:
>>> Either I have to repeat a lot of code for it, which I'm not happy about,
>>> or I have to refactor RGWGetObj* to more safely make the second GET
>>> request for the error object (and make sure range headers etc are NOT
>>> used for the get of the error object). I'm leaning to the latter.
>> Is generating a new req_state a possibility? E.g., you catch the error
>> at the top level, and restart most of the request processing with a
>> newly created req_state?
> That was the path I was trying, but not completely succeeding.
> I think I need to step it back further and have a partially customized
> copy of the RGWEnv from client_io->get_env(), so that I can build the
> modified req_info for req_state.
>
> It isn't a full new GET really, it's really just custom content for the
> body as well as some headers (mostly Content-Length, Content-Type), but
> ignore EPERM/EACCES on trying to fetch that custom content, and if they
> are detected, consider that a success but with different HTML content.
>
>> Great! I'll wait for the cleaned up pull request.
> Do you want pull requests per logical change of my proposed series
> split, or rather just one pull request with the full series?
>

One pull request for the full series.

Yehuda