[
https://issues.apache.org/jira/browse/OAK-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259232#comment-17259232
]
Matt Ryan commented on OAK-9304:
--------------------------------
{quote}my suspicion is that the problem you want to solve is somewhere else:
where the desired field value of Content-Disposition is sent to Azure.
{quote}
That's exactly where the problem lies. We don't control the encoding; their
SDK does that for us.
The documentation for this SDK specifies that you are to provide the exact
string you want in the response's Content-Disposition header. But there are
edge cases where it doesn't behave the way it is documented. It's worth
pointing out that the AWS SDK for S3 doesn't have these problems; it works just
fine as-is. So this is definitely an issue of accommodating Azure's SDK.
Compounding the problem is that the SDK we currently use in Oak is too old.
It's considered maintenance only by Microsoft now, and needs to be upgraded to
the latest (see OAK-8105). We've run into edge case issues like this before
with this version of the SDK, and it is possible this is fixed in newer
versions.
> Filename with special characters in direct download URI Content-Disposition
> are causing HTTP 400 errors from Azure
> ------------------------------------------------------------------------------------------------------------------
>
> Key: OAK-9304
> URL: https://issues.apache.org/jira/browse/OAK-9304
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: blob-cloud, blob-cloud-azure, blob-plugins
> Affects Versions: 1.36.0
> Reporter: Matt Ryan
> Assignee: Matt Ryan
> Priority: Major
>
> When generating a direct download URI for a filename with certain
> non-standard characters in the name, it can cause the resulting signed URI to
> be considered invalid by some blob storage services (Azure in particular).
> This can lead to blob storage services being unable to service the URI
> request.
> For example, a filename of "Ausländische.jpg" currently requests a
> Content-Disposition header that looks like:
> {noformat}
> inline; filename="Ausländische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg
> {noformat}
> Azure blob storage service fails trying to parse a URI with that
> Content-Disposition header specification in the query string. It instead
> should look like:
> {noformat}
> inline; filename="Ausla?ndische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg
> {noformat}
>
> The "filename" portion of the Content-Disposition needs to consist of
> ISO-8859-1 characters, per [https://tools.ietf.org/html/rfc6266#section-4.3]
> in this paragraph:
> {quote}The parameters "filename" and "filename*" differ only in that
> "filename*" uses the encoding defined in [RFC5987], allowing the use of
> characters not present in the ISO-8859-1 character set ([ISO-8859-1]).
> {quote}
> Note that the purpose of this ticket is to address compatibility issues with
> blob storage services, not to ensure ISO-8859-1 compatibility. However, by
> encoding the "filename" portion using standard Java character set encoding
> conversion (e.g. {{Charsets.ISO_8859_1.encode(fileName)}}), we can generate a
> URI that works with Azure, delivers the proper Content-Disposition header in
> responses, and generates the proper client result (meaning, the correct name
> for the downloaded file).
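
A minimal sketch of the encoding described above, for illustration only. It is
not Oak's actual code: the class and method names (ContentDispositionSketch,
toIso88591, rfc5987Encode, buildHeader) are hypothetical, and it uses
java.nio's StandardCharsets rather than the Charsets helper named in the
description. It assumes the file name contains a decomposed "a" followed by
U+0308, as the percent-encoded form above implies, and it does not escape
quotes or backslashes in the name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ContentDispositionSketch {

    // Round-trip through ISO-8859-1. Charset.encode substitutes '?' for any
    // character that cannot be represented in the target character set.
    static String toIso88591(String fileName) {
        ByteBuffer bytes = StandardCharsets.ISO_8859_1.encode(fileName);
        return StandardCharsets.ISO_8859_1.decode(bytes).toString();
    }

    // Percent-encode the UTF-8 bytes, leaving only RFC 5987 attr-char
    // characters unescaped.
    static String rfc5987Encode(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean attrChar = c < 0x80
                    && (Character.isLetterOrDigit(c) || "!#$&+-.^_`|~".indexOf(c) >= 0);
            if (attrChar) {
                sb.append((char) c);
            } else {
                sb.append(String.format("%%%02X", c));
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: assemble the header value in the form shown above.
    static String buildHeader(String fileName) {
        return "inline; filename=\"" + toIso88591(fileName) + "\""
                + "; filename*=UTF-8''" + rfc5987Encode(fileName);
    }

    public static void main(String[] args) {
        // "a" followed by U+0308 (combining diaeresis), written as an escape
        // so the source file encoding does not matter.
        System.out.println(buildHeader("Ausla\u0308ndische.jpg"));
        // prints: inline; filename="Ausla?ndische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg
    }
}
```

With a precomposed "ä" (U+00E4) the "filename" part would keep the character,
since it exists in ISO-8859-1, and "filename*" would contain %C3%A4 instead.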