[ 
https://issues.apache.org/jira/browse/OAK-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258433#comment-17258433
 ] 

Matt Ryan edited comment on OAK-9304 at 1/4/21, 7:14 PM:
---------------------------------------------------------

Sure thing [~reschke].  Sorry, I've been on holidays :)

Previously, in regard to the example in the description above, you said:  "The 
first of the two entries looks perfectly ok to me."  The issue here is that the 
first one does not work with Azure blob storage service - it rejects the 
request as having an invalid character in the URI.  So this is less an issue of 
whether the URI is correct per RFCs, and more an issue that the URI does not 
properly work with Azure.

More details follow.

PRIOR TO THIS FIX:  When Oak generated a direct binary access URI for a 
filename with characters outside the ISO-8859-1 character set, the result was 
a URI that Azure would reject with a 400-level error.  The cause was Oak 
failing to properly encode this filename in the "filename" portion of the 
Content-Disposition header specification.

(As background, remember that Oak declares to the cloud storage the value that 
should be used in the Content-Disposition header for requests to the generated 
direct binary access URI.  In Oak we specify both the content disposition type 
and filenames for this.  See [0] and [1] for more info.)

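Example:  Suppose the filename is "umläut.jpg", where the "ä" is stored as "a" 
followed by a combining diaeresis (U+0308, which is %CC%88 in UTF-8).  Oak 
would specify a Content-Disposition header value of:
{noformat}
inline; filename="umläut.jpg"; filename*=UTF-8''umla%CC%88ut.jpg{noformat}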
This is then specified in a query parameter in the direct access URI, so this 
information gets encoded.  It is probably this encoding change that Azure does 
not expect.  Since this portion of the URI is signed, the signature doesn't 
match and the request fails.
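
To illustrate how the header value changes once it is placed in a query 
parameter, here is a small Java sketch.  It is not Oak's code: it uses the 
JDK's URLEncoder (form-style encoding, so spaces become "+"), the base URI and 
the "rscd" parameter name are assumptions for illustration only, and no 
signature is computed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryParamSketch {

    // Percent-encode a header value for use as a query parameter value.
    static String encodeForQuery(String headerValue) {
        return URLEncoder.encode(headerValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The plain filename holds "a" followed by U+0308 (combining diaeresis).
        String header = "inline; filename=\"umla\u0308ut.jpg\"; "
                + "filename*=UTF-8''umla%CC%88ut.jpg";
        // Hypothetical base URI and parameter name, for illustration only.
        String uri = "https://example.invalid/container/blob?rscd="
                + encodeForQuery(header);
        System.out.println(uri);
    }
}
```

In the output, the raw combining character in "filename" becomes %CC%88, while 
the already percent-encoded "filename*" value becomes %25CC%2588, so the value 
in the URI no longer matches the header value byte for byte.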

WITH THIS FIX:  A basic ISO-8859-1 encoding is applied to the "filename" value 
of the header.  This change is based on RFC 6266 Section 4.3, which seems to 
suggest that only ISO-8859-1 characters are allowed for that value.

Thus the header now looks like this:
{noformat}
inline; filename="umla?ut.jpg"; filename*=UTF-8''umla%CC%88ut.jpg{noformat}
This header encodes and validates properly with Azure.  In testing, modern 
clients prefer the "filename*" portion, which results in the proper filename 
being used.
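
A minimal Java sketch of this approach follows.  It is an illustration under 
stated assumptions, not the actual Oak change: the class and method names are 
hypothetical, the ISO-8859-1 fallback relies on String.getBytes replacing 
unmappable characters with "?", and the "filename*" value is percent-encoded 
using the attr-char set from RFC 5987.

```java
import java.nio.charset.StandardCharsets;

final class ContentDispositionSketch {

    // Characters RFC 5987 allows unescaped in an extended parameter value.
    private static final String ATTR_CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "!#$&+-.^_`|~";

    // Round-trip through ISO-8859-1; unmappable characters become '?'.
    static String toIso88591(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Percent-encode the UTF-8 bytes of the value, keeping attr-chars as-is.
    static String rfc5987Encode(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (ATTR_CHARS.indexOf(c) >= 0) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    // Build the header value with a plain fallback and an extended filename.
    static String contentDisposition(String type, String fileName) {
        // Escape backslash and double quote for the quoted-string form.
        String fallback = toIso88591(fileName)
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
        return type + "; filename=\"" + fallback + "\"; filename*=UTF-8''"
                + rfc5987Encode(fileName);
    }

    public static void main(String[] args) {
        // "a" followed by U+0308, matching the example above.
        System.out.println(contentDisposition("inline", "umla\u0308ut.jpg"));
        // prints: inline; filename="umla?ut.jpg"; filename*=UTF-8''umla%CC%88ut.jpg
    }
}
```

A precomposed "ä" (U+00E4) is itself part of ISO-8859-1, so in this sketch it 
would pass through the "filename" value unchanged; only characters such as the 
combining diaeresis are replaced.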

Please let me know if this is still unclear.

 

[0] - 
[https://jackrabbit.apache.org/oak/docs/features/direct-binary-access.html]

[1] - 
[https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/api/binary/BinaryDownloadOptions.html]



> Filename portion of direct download URI Content-Disposition should be 
> ISO-8859-1 encoded
> ----------------------------------------------------------------------------------------
>
>                 Key: OAK-9304
>                 URL: https://issues.apache.org/jira/browse/OAK-9304
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob-cloud, blob-cloud-azure, blob-plugins
>    Affects Versions: 1.36.0
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>
> The "filename" portion of the Content-Disposition needs to be ISO-8859-1 
> encoded, per [https://tools.ietf.org/html/rfc6266#section-4.3] in this 
> paragraph:
> {quote}The parameters "filename" and "filename*" differ only in that 
> "filename*" uses the encoding defined in RFC5987, allowing the use of 
> characters not present in the ISO-8859-1 character set ([ISO-8859-1]).
> {quote}
> This is not usually a problem, but if the filename provided contains 
> characters outside ISO-8859-1, it can cause the resulting signed URI to be 
> invalid.  This can lead to blob storage services being unable to service the 
> URI request.
> For example, a filename of "Ausländische.jpg" currently requests a 
> Content-Disposition header that looks like:
> {noformat}
> inline; filename="Ausländische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg 
> {noformat}
> It instead should look like:
> {noformat}
> inline; filename="Ausla?ndische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg 
> {noformat}
>  
>  



