[ 
https://issues.apache.org/jira/browse/OAK-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-9304:
---------------------------
    Description: 
Generating a direct download URI for a filename that contains certain 
non-standard characters can cause the resulting signed URI to be considered 
invalid by some blob storage services (Azure in particular).  The blob storage 
service is then unable to service the URL request.

For example, a filename of "Ausländische.jpg" (with the "ä" written as "a" 
followed by the combining diaeresis U+0308) currently requests a 
Content-Disposition header that looks like:
{noformat}
inline; filename="Ausländische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg
{noformat}
The Azure blob storage service fails to parse a URI that carries this 
Content-Disposition header specification in the query string.  The header 
should instead look like:
{noformat}
inline; filename="Ausla?ndische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg 
{noformat}
 

The "filename" portion of the Content-Disposition needs to consist of 
ISO-8859-1 characters, per [https://tools.ietf.org/html/rfc6266#section-4.3] in 
this paragraph:
{quote}The parameters "filename" and "filename*" differ only in that 
"filename*" uses the encoding defined in RFC5987, allowing the use of 
characters not present in the ISO-8859-1 character set ([ISO-8859-1]).
{quote}
Note that the purpose of this ticket is to address compatibility issues with 
blob storage services, not to ensure ISO-8859-1 compatibility.  However, by 
encoding the "filename" portion using standard Java character set encoding 
conversion (e.g. {{Charsets.ISO_8859_1.encode(fileName)}}), we can generate a 
URI that works with Azure, delivers the proper Content-Disposition header in 
responses, and generates the proper client result (meaning, the correct name 
for the downloaded file).
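
The conversion described above can be sketched as follows.  This is a minimal 
illustration under stated assumptions, not the actual Oak change: the class 
and method names are hypothetical, {{StandardCharsets}} from the JDK stands in 
for the {{Charsets}} class mentioned above, and quotes or backslashes in the 
filename are not escaped.  The sketch round-trips the name through ISO-8859-1 
for the "filename" parameter and percent-encodes the UTF-8 bytes for 
"filename*" using the attr-char set from RFC 5987.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names here are hypothetical and not part of Oak's API.
final class ContentDispositionSketch {

    // RFC 5987 attr-char symbols that may appear unescaped in filename*.
    private static final String ATTR_CHAR_SYMBOLS = "!#$&+-.^_`|~";

    // Round-trip through ISO-8859-1. Charset.encode(String) replaces
    // unmappable characters with the charset's replacement byte ('?').
    static String toIso88591(String fileName) {
        ByteBuffer bytes = StandardCharsets.ISO_8859_1.encode(fileName);
        return StandardCharsets.ISO_8859_1.decode(bytes).toString();
    }

    // Percent-encode the UTF-8 bytes of the name for the filename* parameter.
    static String percentEncodeUtf8(String fileName) {
        StringBuilder out = new StringBuilder();
        for (byte b : fileName.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum || ATTR_CHAR_SYMBOLS.indexOf(c) >= 0) {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }

    // Assumption: the name contains no '"' or '\', which would need escaping.
    static String contentDisposition(String fileName) {
        return "inline; filename=\"" + toIso88591(fileName) + "\""
                + "; filename*=UTF-8''" + percentEncodeUtf8(fileName);
    }

    public static void main(String[] args) {
        // "a" followed by the combining diaeresis U+0308, as in the ticket's example.
        System.out.println(contentDisposition("Ausla\u0308ndische.jpg"));
    }
}
```

For the example filename, the combining diaeresis has no ISO-8859-1 mapping, 
so the "filename" parameter comes out as "Ausla?ndische.jpg" while "filename*" 
keeps the full name as Ausla%CC%88ndische.jpg.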

  was:
When generating a direct download URI for a filename with certain non-standard 
characters in the name, it can cause the resulting signed URI to be considered 
invalid by some blob storage services (Azure in particular).  This can lead to 
blob storage services being unable to service the URL request.

The "filename" portion of the Content-Disposition needs to be ISO-8859-1 
encoded, per [https://tools.ietf.org/html/rfc6266#section-4.3] in this 
paragraph:
{quote}The parameters "filename" and "filename*" differ only in that 
"filename*" uses the encoding defined in RFC5987, allowing the use of 
characters not present in the ISO-8859-1 character set ([ISO-8859-1]).
{quote}
This is not usually a problem, but if the filename provided contains 
non-standard characters, it can cause the resulting signed URI to be invalid.  
This can lead to blob storage services being unable to service the URL request.

For example, a filename of "Ausländische.jpg" currently requests a 
Content-Disposition header that looks like:
{noformat}
inline; filename="Ausländische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg
{noformat}
It instead should look like:
{noformat}
inline; filename="Ausla?ndische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg 
{noformat}
 

The "filename" portion of the Content-Disposition needs to consist of 
ISO-8859-1 characters, per [https://tools.ietf.org/html/rfc6266#section-4.3] in 
this paragraph:
{quote}The parameters "filename" and "filename*" differ only in that 
"filename*" uses the encoding defined in RFC5987, allowing the use of 
characters not present in the ISO-8859-1 character set ([ISO-8859-1]).
{quote}
By encoding the "filename" portion using standard Java character set encoding 
conversion (e.g. {{ 


> Filename with special characters in direct download URI Content-Disposition 
> are causing HTTP 400 errors from Azure
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: OAK-9304
>                 URL: https://issues.apache.org/jira/browse/OAK-9304
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob-cloud, blob-cloud-azure, blob-plugins
>    Affects Versions: 1.36.0
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>
> Generating a direct download URI for a filename that contains certain 
> non-standard characters can cause the resulting signed URI to be considered 
> invalid by some blob storage services (Azure in particular).  The blob 
> storage service is then unable to service the URL request.
> For example, a filename of "Ausländische.jpg" (with the "ä" written as "a" 
> followed by the combining diaeresis U+0308) currently requests a 
> Content-Disposition header that looks like:
> {noformat}
> inline; filename="Ausländische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg
> {noformat}
> The Azure blob storage service fails to parse a URI that carries this 
> Content-Disposition header specification in the query string.  The header 
> should instead look like:
> {noformat}
> inline; filename="Ausla?ndische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg 
> {noformat}
>  
> The "filename" portion of the Content-Disposition needs to consist of 
> ISO-8859-1 characters, per [https://tools.ietf.org/html/rfc6266#section-4.3] 
> in this paragraph:
> {quote}The parameters "filename" and "filename*" differ only in that 
> "filename*" uses the encoding defined in RFC5987, allowing the use of 
> characters not present in the ISO-8859-1 character set ([ISO-8859-1]).
> {quote}
> Note that the purpose of this ticket is to address compatibility issues with 
> blob storage services, not to ensure ISO-8859-1 compatibility.  However, by 
> encoding the "filename" portion using standard Java character set encoding 
> conversion (e.g. {{Charsets.ISO_8859_1.encode(fileName)}}), we can generate a 
> URI that works with Azure, delivers the proper Content-Disposition header in 
> responses, and generates the proper client result (meaning, the correct name 
> for the downloaded file).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
