[ 
https://issues.apache.org/jira/browse/OAK-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258433#comment-17258433
 ] 

Matt Ryan edited comment on OAK-9304 at 1/6/21, 12:43 AM:
----------------------------------------------------------

Sure thing [~reschke].  Sorry, I've been on holidays :)

Previously, in regard to the example in the description above, you said:  "The 
first of the two entries looks perfectly ok to me."  The issue here is that the 
first one does not work with the Azure blob storage service: it rejects the 
request as having an invalid character in the URI.  So this is less a question 
of whether the URI is correct per the RFCs, and more that the URI does not 
work with Azure.

More details follow.

PRIOR TO THIS FIX:  When Oak generated a direct binary access URI for a 
filename with characters outside the ISO-8859-1 character set, Azure rejected 
the resulting URI with a 400-level error.  This was because Oak did not 
properly encode the filename in the "filename" parameter of the 
Content-Disposition header specification.

(As background, remember that Oak tells the cloud storage service which value 
to use for the Content-Disposition header in responses to requests for the 
generated direct binary access URI.  Oak specifies both the disposition type 
and the filename(s) for this.  See [0] and [1] for more info.)

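Example:  Suppose the filename is "umläut.jpg", with the umlaut stored in 
decomposed form ("a" followed by U+0308 COMBINING DIAERESIS, as the %CC%88 in 
"filename*" shows).  Oak would specify a Content-Disposition header value of: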
{noformat}
inline; filename="umläut.jpg"; filename*=UTF-8''umla%CC%88ut.jpg{noformat}
This value is then passed in a query parameter of the direct access URI, so it 
gets URL-encoded.  It is probably this encoding change that Azure does not 
expect: since this portion of the URI is signed, the signature doesn't match 
and the request fails.
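To make the query-parameter step concrete, here is a minimal Java sketch of what happens when the pre-fix header value is percent-encoded for a query string. It assumes `java.net.URLEncoder` (the Java 10+ overload) as the encoder; the real URI is built by Oak and the Azure SDK, whose encoding may differ, and the SAS signing itself is not modeled. The class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryParamEncodingDemo {

    // Percent-encodes a value for a URI query string using
    // application/x-www-form-urlencoded rules (a space becomes '+').
    // Assumption: URLEncoder stands in for whatever encoder the real code path uses.
    static String encodeQueryValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8); // Java 10+ overload
    }

    public static void main(String[] args) {
        // Pre-fix header value; "a" + combining diaeresis is written as a Java escape.
        String oldHeader = "inline; filename=\"umla\u0308ut.jpg\"; "
                + "filename*=UTF-8''umla%CC%88ut.jpg";
        System.out.println(encodeQueryValue(oldHeader));
    }
}
```

The output shows two effects: the raw combining diaeresis inside `filename="..."` becomes `%CC%88`, and the already percent-encoded `filename*` value is encoded a second time (`%25CC%2588`). This only illustrates the transformation; that it is what breaks Azure's signature check remains a hypothesis.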

WITH THIS FIX:  A basic ISO-8859-1 encoding is applied to the "filename" 
value of the header.  This is based on RFC 6266 Section 4.3, which seems to 
suggest that only ISO-8859-1 characters are allowed in that value.

Thus the header now looks like this:
{noformat}
inline; filename="umla?ut.jpg"; filename*=UTF-8''umla%CC%88ut.jpg{noformat}
This header encodes and validates properly with Azure.  In testing, modern 
clients prefer the "filename*" portion, which results in the proper filename 
being used.
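A minimal Java sketch of building the post-fix header value: an ISO-8859-1 fallback for `filename` and RFC 5987 percent-encoding of the UTF-8 bytes for `filename*`. The class and method names are hypothetical, this is not Oak's actual implementation, and the escaping of `\` and `"` inside the quoted `filename` value is an addition of this sketch.

```java
import java.nio.charset.StandardCharsets;

public class ContentDispositionSketch {

    // RFC 5987 attr-char: ALPHA / DIGIT / "!" / "#" / "$" / "&" / "+" / "-" / "."
    //                     / "^" / "_" / "`" / "|" / "~"
    private static boolean isAttrChar(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
                || "!#$&+-.^_`|~".indexOf(b) >= 0;
    }

    // Percent-encodes every UTF-8 byte of the name that is not an attr-char.
    static String rfc5987Encode(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : name.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            if (isAttrChar(b)) {
                sb.append((char) b);
            } else {
                sb.append('%').append(String.format("%02X", b));
            }
        }
        return sb.toString();
    }

    // ISO-8859-1 fallback for the quoted "filename" value: characters outside
    // ISO-8859-1 become '?'. Backslash and double quote are escaped so the
    // result stays a valid quoted-string (an assumption of this sketch).
    static String isoFallback(String name) {
        String s = StandardCharsets.ISO_8859_1
                .decode(StandardCharsets.ISO_8859_1.encode(name)).toString();
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Hypothetical helper combining both parameters into one header value.
    static String contentDisposition(String type, String fileName) {
        return type + "; filename=\"" + isoFallback(fileName) + "\"; filename*=UTF-8''"
                + rfc5987Encode(fileName);
    }

    public static void main(String[] args) {
        System.out.println(contentDisposition("inline", "umla\u0308ut.jpg"));
        System.out.println(contentDisposition("inline", "Ausla\u0308ndische.jpg"));
    }
}
```

For the decomposed "umläut.jpg" this produces the header shown above, and for the decomposed "Ausländische.jpg" it produces the value proposed in the issue description.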

Please let me know if this is still unclear.  If it's clear now, let me know 
whether you'd like me to update the bug description accordingly or just let it 
go :).

 

[0] - [https://jackrabbit.apache.org/oak/docs/features/direct-binary-access.html]

[1] - [https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/api/binary/BinaryDownloadOptions.html]


> Filename with special characters in direct download URI Content-Disposition 
> are causing HTTP 400 errors from Azure
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: OAK-9304
>                 URL: https://issues.apache.org/jira/browse/OAK-9304
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob-cloud, blob-cloud-azure, blob-plugins
>    Affects Versions: 1.36.0
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>
> Generating a direct download URI for a filename with certain non-standard 
> characters in the name can cause the resulting signed URI to be considered 
> invalid by some blob storage services (Azure in particular).  These services 
> are then unable to service the URI request.
> For example, a filename of "Ausländische.jpg" currently requests a 
> Content-Disposition header that looks like:
> {noformat}
> inline; filename="Ausländische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg 
> {noformat}
> Azure blob storage service fails trying to parse a URI with that 
> Content-Disposition header specification in the query string.  It instead 
> should look like:
> {noformat}
> inline; filename="Ausla?ndische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg 
> {noformat}
>  
> The "filename" portion of the Content-Disposition header needs to consist of 
> ISO-8859-1 characters, per [https://tools.ietf.org/html/rfc6266#section-4.3] 
> in this paragraph:
> {quote}The parameters "filename" and "filename*" differ only in that 
> "filename*" uses the encoding defined in RFC5987, allowing the use of 
> characters not present in the ISO-8859-1 character set.
> {quote}
> Note that the purpose of this ticket is to address compatibility issues with 
> blob storage services, not to ensure ISO-8859-1 compatibility.  However, by 
> encoding the "filename" portion using standard Java character set encoding 
> conversion (e.g. {{Charsets.ISO_8859_1.encode(fileName)}}), we can generate a 
> URI that works with Azure, delivers the proper Content-Disposition header in 
> responses, and generates the proper client result (meaning, the correct name 
> for the downloaded file).
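The `Charsets.ISO_8859_1.encode(fileName)` call mentioned in the description returns a `ByteBuffer`, so getting a string back for the header takes a decode step. The Java sketch below uses the JDK's `StandardCharsets.ISO_8859_1` (the quoted call presumably uses Guava's `Charsets`, which provides an equivalent `Charset`); the class name is hypothetical. It also shows why the examples produce `?`: the umlaut there is stored in decomposed form ("a" + U+0308, hence `%CC%88`), and U+0308 lies outside ISO-8859-1, whereas a precomposed "ä" (U+00E4) survives the round trip.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class Iso88591RoundTrip {

    // Encodes the name as ISO-8859-1 and decodes it back to a String.
    // Charset.encode(String) replaces unmappable characters with the
    // encoder's default replacement byte, which is '?'.
    static String toIso88591(String fileName) {
        ByteBuffer bytes = StandardCharsets.ISO_8859_1.encode(fileName);
        return StandardCharsets.ISO_8859_1.decode(bytes).toString();
    }

    public static void main(String[] args) {
        // Decomposed: 'a' followed by U+0308, which ISO-8859-1 cannot represent.
        String decomposed = "Ausla\u0308ndische.jpg";
        // Precomposed: U+00E4 is part of ISO-8859-1, so it survives.
        String precomposed = "Ausl\u00e4ndische.jpg";
        System.out.println(toIso88591(decomposed));                      // Ausla?ndische.jpg
        System.out.println(toIso88591(precomposed).equals(precomposed)); // true
    }
}
```

Because the result depends on the Unicode normalization form of the stored name, two filenames that look identical can yield different "filename" fallbacks; the "filename*" value carries the exact bytes either way.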



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
