Re: RGW S3 Website hosting, non-clean code for early review
On 12/03/2015 10:47 PM, Robin H. Johnson wrote:
> On Wed, Dec 02, 2015 at 03:02:12PM +0100, Javier Muñoz wrote:
>> I would appreciate to know the current status of the implementation if
>> possible. Any progress? Any 'deadline' to go upstream? :)
>
> When:
> As soon as it works 100% and passes my testsuite, which I hope is very
> soon. I would very much like to have this in Jewel.
>
> My work was being done in this branch
> https://github.com/dreamhost/ceph/tree/wip-static-website-robbat2-master
> However, due to master moving forward, I can't use the latest parts of
> the gitbuilder and automatic testing successfully (they've moved on,
> while this has stayed behind).
>
> Yehuda wanted me to try and NOT rebase it, for ease of his review, but
> that was no longer possible :-(.
> (tagged as wip-static-website-robbat2-master_yehuda-review-20151012 in
> the dreamhost fork).
>
> The above, but squashed and updated to master as of 2015/12/02
> https://github.com/dreamhost/ceph/tree/wip-static-website-robbat2-master-20151202
> It's presently running against my testsuite, and if it passes the pieces
> that I know it should [1], I'll be splitting it up to submit.
>
> [1] I'm seeing some failures of places where I didn't touch the code, so
> having to separate those out.

Thanks for the update!

Javier
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RGW S3 Website hosting, non-clean code for early review
On Wed, Dec 02, 2015 at 03:02:12PM +0100, Javier Muñoz wrote:
> I would appreciate to know the current status of the implementation if
> possible. Any progress? Any 'deadline' to go upstream? :)

When:
As soon as it works 100% and passes my testsuite, which I hope is very
soon. I would very much like to have this in Jewel.

My work was being done in this branch
https://github.com/dreamhost/ceph/tree/wip-static-website-robbat2-master
However, due to master moving forward, I can't use the latest parts of
the gitbuilder and automatic testing successfully (they've moved on,
while this has stayed behind).

Yehuda wanted me to try and NOT rebase it, for ease of his review, but
that was no longer possible :-(.
(tagged as wip-static-website-robbat2-master_yehuda-review-20151012 in
the dreamhost fork).

The above, but squashed and updated to master as of 2015/12/02
https://github.com/dreamhost/ceph/tree/wip-static-website-robbat2-master-20151202
It's presently running against my testsuite, and if it passes the pieces
that I know it should [1], I'll be splitting it up to submit.

[1] I'm seeing some failures of places where I didn't touch the code, so
having to separate those out.

--
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
Re: RGW S3 Website hosting, non-clean code for early review
Hi,

I would appreciate to know the current status of the implementation if
possible. Any progress? Any 'deadline' to go upstream? :)

Thanks in advance!
Javier

On 07/31/2015 10:21 AM, Robin H. Johnson wrote:
> I'm busy travelling and wrapping other things up, but will resume major
> work on this in August, probably week 2/3, depending on other
> interrupts/priorities.
>
> It's almost complete, the only missing features are:
> - error page fetcher gets confused on conditional requests (I think a
>   different approach is needed, esp re range & if-modified requests,
>   they still seem to leak into the custom req_info on the second request)
> - cascaded errors on conditional requests need more handling (eg
>   404 bounces to the error page that doesn't exist as well)
>
> Some further work that isn't blocking is also:
> - review handling of x-amz-website-redirect-location header when it
>   conflicts with RoutingRules
> - Check what input validations Amazon does, because already they don't
>   match the docs
>   - eg docs say any http code for redirect, but it really only allows
>     301-305, 307, 308
>   - Exact validation on the URI/paths is also not immediately clear, I
>     had some unexpected results with URL-escaped inputs.
> - Review cases in which AMZ sets the errors in the header as well as the
>   page, didn't seem consistent, esp in the double-error cases.
> - more cleanup/refactor of test cases
Re: RGW S3 Website hosting, non-clean code for early review
I'm busy travelling and wrapping other things up, but will resume major
work on this in August, probably week 2/3, depending on other
interrupts/priorities.

It's almost complete, the only missing features are:
- error page fetcher gets confused on conditional requests (I think a
  different approach is needed, esp re range & if-modified requests,
  they still seem to leak into the custom req_info on the second request)
- cascaded errors on conditional requests need more handling (eg
  404 bounces to the error page that doesn't exist as well)

Some further work that isn't blocking is also:
- review handling of x-amz-website-redirect-location header when it
  conflicts with RoutingRules
- Check what input validations Amazon does, because already they don't
  match the docs
  - eg docs say any http code for redirect, but it really only allows
    301-305, 307, 308
  - Exact validation on the URI/paths is also not immediately clear, I
    had some unexpected results with URL-escaped inputs.
- Review cases in which AMZ sets the errors in the header as well as the
  page, didn't seem consistent, esp in the double-error cases.
- more cleanup/refactor of test cases

--
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
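
The first missing feature above (range and if-modified headers leaking into the second request for the error page) can be illustrated outside RGW. What follows is only a sketch in Python, not the RGW code: `error_page_headers` and `CONDITIONAL_HEADERS` are hypothetical names chosen for illustration, and the real fix would operate on req_info in C++. The idea is simply that the fetch of the error object starts from a copy of the request headers with every conditional or range header removed.

```python
# Hypothetical illustration only; RGW itself is C++ and works on req_info.
# Headers that make a GET conditional or partial (RFC 7232 / RFC 7233).
CONDITIONAL_HEADERS = frozenset({
    "range",
    "if-range",
    "if-match",
    "if-none-match",
    "if-modified-since",
    "if-unmodified-since",
})


def error_page_headers(original_headers):
    """Return a copy of the request headers that is safe to reuse when
    fetching the error document: conditional and range headers are dropped
    so the second GET is always a plain, full-body fetch."""
    return {
        name: value
        for name, value in original_headers.items()
        if name.lower() not in CONDITIONAL_HEADERS
    }


if __name__ == "__main__":
    request = {
        "Host": "example-bucket.example.com",
        "Range": "bytes=9999-",
        "If-Modified-Since": "Wed, 02 Dec 2015 00:00:00 GMT",
        "Accept": "text/html",
    }
    print(error_page_headers(request))
```

Building a fresh dictionary instead of deleting keys in place keeps the original request untouched, which matters if the original headers are still needed for logging or for the status line of the response.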
Re: RGW S3 Website hosting, non-clean code for early review
----- Original Message -----
> From: Yehuda Sadeh-Weinraub yeh...@redhat.com
> To: Robin H. Johnson robb...@gentoo.org
> Cc: ceph-devel@vger.kernel.org, Jonathan LaCour jonathan.lac...@dreamhost.com
> Sent: Tuesday, June 23, 2015 4:07:44 PM
> Subject: Re: RGW S3 Website hosting, non-clean code for early review
>
> ----- Original Message -----
> > From: Robin H. Johnson robb...@gentoo.org
> > To: Yehuda Sadeh-Weinraub yeh...@redhat.com
> > Cc: ceph-devel@vger.kernel.org, Jonathan LaCour jonathan.lac...@dreamhost.com
> > Sent: Tuesday, June 23, 2015 4:04:49 PM
> > Subject: Re: RGW S3 Website hosting, non-clean code for early review
> >
> > On Tue, Jun 23, 2015 at 04:30:19PM -0400, Yehuda Sadeh-Weinraub wrote:
> > > > Either I have to repeat a lot of code for it, which I'm not happy
> > > > about, or I have to refactor RGWGetObj* to more safely make the
> > > > second GET request for the error object (and make sure range
> > > > headers etc are NOT used for the get of the error object). I'm
> > > > leaning to the latter.
> > > Is generating a new req_state a possibility? E.g., you catch the
> > > error at the top level, and restart most of the request processing
> > > with a newly created req_state?
> > That was the path I was trying, but not completely succeeding. I think
> > I need to step it back further and have a partially customized copy of
> > the RGWEnv from client_io->get_env(), so that I can build the modified
> > req_info for req_state.
> >
> > It isn't a full new GET really, it's really just custom content for the
> > body as well as some headers (mostly Content-Length, Content-Type), but
> > ignore EPERM/EACCES on trying to fetch that custom content, and if they
> > are detected, consider that a success but with different HTML content.
> >
> > > Great! I'll wait for the cleaned up pull request.
> > Do you want pull requests per logical change of my proposed series
> > split, or rather just one pull request with the full series?
>
> One pull request for the full series.
>
> Yehuda

Hi,

just following up on this one. I don't remember seeing a pull request. Has
there been any progress?

Thanks,
Yehuda
Re: RGW S3 Website hosting, non-clean code for early review
----- Original Message -----
> From: Robin H. Johnson robb...@gentoo.org
> To: ceph-devel@vger.kernel.org
> Cc: Jonathan LaCour jonathan.lac...@dreamhost.com
> Sent: Tuesday, June 23, 2015 2:33:25 AM
> Subject: RGW S3 Website hosting, non-clean code for early review
>
> Hi,
>
> As an extension of earlier work done by Yehuda [1], I've gotten the great
> majority of the work done to support static website hosting in RGW, just
> like AmazonS3 [2].
>
> I need to do some cleanups of the code prior to major review for
> submission, and solve one thorny problem first, have a few discussions
> about best courses of action, and then I'll be submitting this for more
> reviews before merging.
>
> ceph [3]
> s3-tests, unit tests [4]
> s3-tests, fuzzer tests [5]
>
> The thorny problem:
> -------------------
> One of the pieces of functionality in S3Website is the ability to serve
> any public object in the bucket as the content on a custom error page
> (think shiny 404 error).
>
> In some cases, like trivial 403/404 errors, we can determine this quite
> early, before we fetch the object, and redirect the request to the error
> object instead (provided that we also redo the ACL check on the error
> object).
>
> In more complex cases (eg 416 Range Not Satisfiable, 412 Precondition
> Failed), it happens very late in the RGW request processing, and the
> req_state struct seems to have been mangled/pre-filled with a lot of
> decisions that aren't solvable.
>
> Either I have to repeat a lot of code for it, which I'm not happy about,
> or I have to refactor RGWGetObj* to more safely make the second GET
> request for the error object (and make sure range headers etc are NOT
> used for the get of the error object). I'm leaning to the latter.

Is generating a new req_state a possibility? E.g., you catch the error at
the top level, and restart most of the request processing with a newly
created req_state?

> Oh, and for added fun, if an error object is configured, but is missing
> or private, you get a similar but different response than without any
> error object configured, and sometimes the error codes are in the
> headers, but not always.
>
> Discussion pieces:
> ------------------
> RGWRegion
> - presently has both endpoints and hostnames, but doesn't make clear
>   which APIs (S3, Swift, S3Website) might be available at each; or allow
>   combinations to dedicate a specific FQDN to a given API.
>   I'd like to replace both structures with a map structure [6]

Makes sense.

> Bucket existence privacy:
> - In general I agree with the goal that we should be closely compatible
>   with AmazonS3, but with an eye to security, I'd like to consider a
>   specific deviation:
> - In AmazonS3, you can enumerate buckets for existence, simply looking
>   for 404 NoSuchBucket vs 403 AccessDenied. I'd like to offer a
>   configuration option that returns 403 Forbidden or 401 Unauthorized on
>   anonymous requests to non-existent buckets.

As long as it's configurable.

> - Testing some of the functionality against AmazonS3 has been somewhat
>   painful, as AmazonS3 only provides eventual consistency of the website
>   configuration (with the highest time I've seen so far being about 30
>   seconds).

Yup.

> New configuration options/changes:
> ----------------------------------
> rgw_enable_apis: gains 's3website' mode
> rgw_dns_s3website_name: similar to rgw_dns_name, but for s3website endpoint
> RGWRegion having per-rgw-api hostnames
>
> Patch series breakdown plans:
> -----------------------------
> Here's the breakdown of patch series I'm considering for the changes
> (net 2kLOC in ceph, 1kLOC in testcases).
> (TODO marks pieces not in these sets of commits yet, but will be soon).
>
> ceph
> - split Formatter.cc
>   - JSON/XML/Table formatters are separate now
>   - add header & footer support for formatters
>   - add knowledge of status
>   - add HTML formatter
> - Add optional error handler hooks to RGWOp and RGWHandler for abort_early
> - Add optional retarget handler hooks
> - Add more flexible redirect handling
> - S3website code
> - x-amz-website-redirect-location handling (TODO: needs a bit more polish
>   and testing)
> - TODO: Add more input validations to match S3, on stuff that's NOT
>   documented but was discovered when I applied weirder testcases to
>   AmazonS3:
>   - 'Hostname' field has non-trivial validation (maybe borrow the outcome
>     of wip-bucket_name_restrictions)
>   - The 'Protocol' field for a redirect must be http/https, cannot be
>     gopher or anything else.
>   - The HttpRedirectCode field must contain one of: 301-305, 307, 308
>     The docs don't say this, and the error message says 'Any 3XX value
>     except 300'.
>   - First-match in RoutingRules wins; watch out with rules that match 4XX
>     error codes.
> - Documentation
>   - TODO: esp the parts missing from the S3 docs above
>
> s3-tests, unit tests
> - refactor for more requests
> - add new utilities
> - add website tests
>
> s3-tests, fuzzer tests [5]
>
> Links for all the bits above
> [1] https://github.com/ceph/ceph
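
The bucket existence privacy deviation discussed above amounts to a small, configurable change in which status an anonymous client sees. The Python sketch below is only meant to make that decision table concrete. The function name and the `hide_missing_buckets` flag are invented for illustration and are not existing Ceph configuration options, and it picks 403 where the proposal leaves the choice between 403 and 401 open.

```python
# Hypothetical sketch; 'hide_missing_buckets' is an invented flag name,
# not an existing Ceph configuration option.
def anonymous_bucket_status(bucket_exists, is_public, hide_missing_buckets):
    """Return the HTTP status for an anonymous request to a bucket.

    With hide_missing_buckets=False this follows the AmazonS3 behaviour
    described in the thread (404 NoSuchBucket vs 403 AccessDenied).
    With it set to True, a missing bucket answers 403 as well, so it
    cannot be told apart from a private one by status code alone.
    """
    if bucket_exists:
        return 200 if is_public else 403
    return 403 if hide_missing_buckets else 404


if __name__ == "__main__":
    for hide in (False, True):
        print(hide, anonymous_bucket_status(False, False, hide))
```

Keeping the default at the AmazonS3-compatible behaviour and making the stricter answer opt-in matches the "as long as it's configurable" condition given in the reply.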
RGW S3 Website hosting, non-clean code for early review
Hi,

As an extension of earlier work done by Yehuda [1], I've gotten the great
majority of the work done to support static website hosting in RGW, just
like AmazonS3 [2].

I need to do some cleanups of the code prior to major review for
submission, and solve one thorny problem first, have a few discussions
about best courses of action, and then I'll be submitting this for more
reviews before merging.

ceph [3]
s3-tests, unit tests [4]
s3-tests, fuzzer tests [5]

The thorny problem:
-------------------
One of the pieces of functionality in S3Website is the ability to serve any
public object in the bucket as the content on a custom error page (think
shiny 404 error).

In some cases, like trivial 403/404 errors, we can determine this quite
early, before we fetch the object, and redirect the request to the error
object instead (provided that we also redo the ACL check on the error
object).

In more complex cases (eg 416 Range Not Satisfiable, 412 Precondition
Failed), it happens very late in the RGW request processing, and the
req_state struct seems to have been mangled/pre-filled with a lot of
decisions that aren't solvable.

Either I have to repeat a lot of code for it, which I'm not happy about, or
I have to refactor RGWGetObj* to more safely make the second GET request
for the error object (and make sure range headers etc are NOT used for the
get of the error object). I'm leaning to the latter.

Oh, and for added fun, if an error object is configured, but is missing or
private, you get a similar but different response than without any error
object configured, and sometimes the error codes are in the headers, but
not always.

Discussion pieces:
------------------
RGWRegion
- presently has both endpoints and hostnames, but doesn't make clear which
  APIs (S3, Swift, S3Website) might be available at each; or allow
  combinations to dedicate a specific FQDN to a given API.
  I'd like to replace both structures with a map structure [6]

Bucket existence privacy:
- In general I agree with the goal that we should be closely compatible
  with AmazonS3, but with an eye to security, I'd like to consider a
  specific deviation:
- In AmazonS3, you can enumerate buckets for existence, simply looking for
  404 NoSuchBucket vs 403 AccessDenied. I'd like to offer a configuration
  option that returns 403 Forbidden or 401 Unauthorized on anonymous
  requests to non-existent buckets.
- Testing some of the functionality against AmazonS3 has been somewhat
  painful, as AmazonS3 only provides eventual consistency of the website
  configuration (with the highest time I've seen so far being about 30
  seconds).

New configuration options/changes:
----------------------------------
rgw_enable_apis: gains 's3website' mode
rgw_dns_s3website_name: similar to rgw_dns_name, but for s3website endpoint
RGWRegion having per-rgw-api hostnames

Patch series breakdown plans:
-----------------------------
Here's the breakdown of patch series I'm considering for the changes (net
2kLOC in ceph, 1kLOC in testcases).
(TODO marks pieces not in these sets of commits yet, but will be soon).

ceph
- split Formatter.cc
  - JSON/XML/Table formatters are separate now
  - add header & footer support for formatters
  - add knowledge of status
  - add HTML formatter
- Add optional error handler hooks to RGWOp and RGWHandler for abort_early
- Add optional retarget handler hooks
- Add more flexible redirect handling
- S3website code
- x-amz-website-redirect-location handling (TODO: needs a bit more polish
  and testing)
- TODO: Add more input validations to match S3, on stuff that's NOT
  documented but was discovered when I applied weirder testcases to
  AmazonS3:
  - 'Hostname' field has non-trivial validation (maybe borrow the outcome
    of wip-bucket_name_restrictions)
  - The 'Protocol' field for a redirect must be http/https, cannot be
    gopher or anything else.
  - The HttpRedirectCode field must contain one of: 301-305, 307, 308
    The docs don't say this, and the error message says 'Any 3XX value
    except 300'.
  - First-match in RoutingRules wins; watch out with rules that match 4XX
    error codes.
- Documentation
  - TODO: esp the parts missing from the S3 docs above

s3-tests, unit tests
- refactor for more requests
- add new utilities
- add website tests

s3-tests, fuzzer tests [5]

Links for all the bits above
[1] https://github.com/ceph/ceph/tree/wip-static-website
[2] http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html
[3] https://github.com/ceph/ceph/compare/master...robbat2:wip-static-website-robbat2-master
[4] https://github.com/ceph/s3-tests/compare/master...robbat2:wip-static-website
[5] https://github.com/ceph/s3-tests/compare/master...robbat2:wip-website-fuzzy
[6] https://github.com/ceph/ceph/compare/master...robbat2:wip-static-website-robbat2-master#diff-ee7891a35944697538486c9269e0d65bR909

--
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail   : robb...@gentoo.org
GnuPG FP
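
The undocumented validations listed in the patch breakdown above (Protocol limited to http/https, HttpRedirectCode limited to 301-305, 307, 308, and first-match-wins for RoutingRules) are easy to state as code. This is a rough Python sketch under assumptions, not the RGW implementation: the function names, the dict-based rule layout with `key_prefix` and `error_code` keys, and the case-sensitive protocol comparison are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of the checks described in the thread; names and the
# rule layout are illustrative and not taken from the RGW source.
ALLOWED_REDIRECT_CODES = frozenset({301, 302, 303, 304, 305, 307, 308})
ALLOWED_PROTOCOLS = frozenset({"http", "https"})


def validate_redirect(protocol=None, http_redirect_code=None):
    """Return a list of problems found in a redirect block (empty if OK).

    Both fields are optional; only values that are present are checked.
    """
    problems = []
    if protocol is not None and protocol not in ALLOWED_PROTOCOLS:
        problems.append("Protocol must be http or https")
    if (http_redirect_code is not None
            and http_redirect_code not in ALLOWED_REDIRECT_CODES):
        problems.append("HttpRedirectCode must be one of 301-305, 307, 308")
    return problems


def first_matching_rule(rules, key, error_code=None):
    """Return the first rule whose conditions all match, or None.

    Each rule is a dict with optional 'key_prefix' and 'error_code'
    conditions; a condition that is absent matches anything.
    """
    for rule in rules:
        prefix = rule.get("key_prefix")
        code = rule.get("error_code")
        if prefix is not None and not key.startswith(prefix):
            continue
        if code is not None and code != error_code:
            continue
        return rule
    return None


if __name__ == "__main__":
    print(validate_redirect(protocol="gopher", http_redirect_code=300))
    rules = [
        {"error_code": 404, "target": "catch-all-404"},
        {"key_prefix": "docs/", "error_code": 404, "target": "docs-404"},
    ]
    print(first_matching_rule(rules, "docs/missing.html", error_code=404))
```

The example rule list shows the pitfall mentioned in the thread: because the broad 404 rule comes first, the more specific `docs/` rule after it is never reached.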
Re: RGW S3 Website hosting, non-clean code for early review
On Tue, Jun 23, 2015 at 04:30:19PM -0400, Yehuda Sadeh-Weinraub wrote:
> > Either I have to repeat a lot of code for it, which I'm not happy about,
> > or I have to refactor RGWGetObj* to more safely make the second GET
> > request for the error object (and make sure range headers etc are NOT
> > used for the get of the error object). I'm leaning to the latter.
> Is generating a new req_state a possibility? E.g., you catch the error at
> the top level, and restart most of the request processing with a newly
> created req_state?
That was the path I was trying, but not completely succeeding. I think I
need to step it back further and have a partially customized copy of the
RGWEnv from client_io->get_env(), so that I can build the modified req_info
for req_state.

It isn't a full new GET really, it's really just custom content for the
body as well as some headers (mostly Content-Length, Content-Type), but
ignore EPERM/EACCES on trying to fetch that custom content, and if they are
detected, consider that a success but with different HTML content.

> Great! I'll wait for the cleaned up pull request.
Do you want pull requests per logical change of my proposed series split,
or rather just one pull request with the full series?

--
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
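
The behaviour described in this message (the error document only supplies the body plus Content-Length and Content-Type, and EPERM/EACCES while fetching it count as success with different HTML) can be sketched as follows. This is Python for illustration, not the RGW code: `error_response_body`, `fetch_error_document` and `DEFAULT_ERROR_HTML` are hypothetical names, and the fetch callable stands in for whatever the modified req_info/req_state path would do.

```python
import errno

# Hypothetical generic page served when the custom error object is unusable.
DEFAULT_ERROR_HTML = "<html><body><h1>{status}</h1></body></html>"


def error_response_body(status, fetch_error_document):
    """Build (status, headers, body) for an error response.

    fetch_error_document is a hypothetical callable returning
    (content_type, body_bytes) for the configured error object, or raising
    OSError. EPERM/EACCES are not treated as a new failure: the original
    status is kept and generic HTML is served instead. Any other error
    propagates to the caller.
    """
    try:
        content_type, body = fetch_error_document()
    except OSError as exc:
        if exc.errno not in (errno.EPERM, errno.EACCES):
            raise
        content_type = "text/html"
        body = DEFAULT_ERROR_HTML.format(status=status).encode("utf-8")
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(body)),
    }
    return status, headers, body


if __name__ == "__main__":
    def private_error_object():
        raise OSError(errno.EACCES, "error object is private")

    print(error_response_body(404, private_error_object))
```

Note that the status passed in is always the status returned: the error document changes only what the client sees in the body, never the outcome of the original request.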
Re: RGW S3 Website hosting, non-clean code for early review
----- Original Message -----
> From: Robin H. Johnson robb...@gentoo.org
> To: Yehuda Sadeh-Weinraub yeh...@redhat.com
> Cc: ceph-devel@vger.kernel.org, Jonathan LaCour jonathan.lac...@dreamhost.com
> Sent: Tuesday, June 23, 2015 4:04:49 PM
> Subject: Re: RGW S3 Website hosting, non-clean code for early review
>
> On Tue, Jun 23, 2015 at 04:30:19PM -0400, Yehuda Sadeh-Weinraub wrote:
> > > Either I have to repeat a lot of code for it, which I'm not happy
> > > about, or I have to refactor RGWGetObj* to more safely make the
> > > second GET request for the error object (and make sure range headers
> > > etc are NOT used for the get of the error object). I'm leaning to the
> > > latter.
> > Is generating a new req_state a possibility? E.g., you catch the error
> > at the top level, and restart most of the request processing with a
> > newly created req_state?
> That was the path I was trying, but not completely succeeding. I think I
> need to step it back further and have a partially customized copy of the
> RGWEnv from client_io->get_env(), so that I can build the modified
> req_info for req_state.
>
> It isn't a full new GET really, it's really just custom content for the
> body as well as some headers (mostly Content-Length, Content-Type), but
> ignore EPERM/EACCES on trying to fetch that custom content, and if they
> are detected, consider that a success but with different HTML content.
>
> > Great! I'll wait for the cleaned up pull request.
> Do you want pull requests per logical change of my proposed series split,
> or rather just one pull request with the full series?

One pull request for the full series.

Yehuda