Martin Becker created OFBIZ-10275:
-------------------------------------

             Summary: UtilCodec URL decoding breaks values with german umlauts
                 Key: OFBIZ-10275
                 URL: https://issues.apache.org/jira/browse/OFBIZ-10275
             Project: OFBiz
          Issue Type: Bug
          Components: framework
    Affects Versions: Trunk
            Reporter: Martin Becker


...and other characters whose UTF-8 encoding takes two bytes, and therefore two
percent-encoded hex values, as in this example:
{code:java}
String example = "/webcontent/example_öl.jpg";
String encoded = UtilCodec.getEncoder("url").encode(example);
System.out.println(encoded);
// => "%2Fwebcontent%2Fexample_%C3%B6l.jpg"

String decoded = UtilCodec.getDecoder("url").decode(encoded);
System.out.println(decoded);
// => "/webcontent/example_öl.jpg"
{code}
 

The cause is the OWASP ESAPI PercentCodec implementation used within the
method UtilCodec.canonicalize, which is called before the actual decoding via
java.net.URLDecoder here:
{code:java}
public String decode(String original) {
    try {
        String canonical = canonicalize(original);
        return URLDecoder.decode(canonical, "UTF-8");
    } catch (UnsupportedEncodingException ee) {
        Debug.logError(ee, module);
        return null;
    }
}{code}
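
To illustrate the suspected mechanism, here is a minimal, self-contained sketch. It does not use ESAPI; decodeBytewise is a hypothetical stand-in that turns every %XX escape into one char on its own, which is the behaviour assumed here for PercentCodec. Once that has happened, the later URLDecoder call has no escapes left to reassemble into a UTF-8 character.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PercentDecodeDemo {

    // Hypothetical stand-in (NOT the ESAPI code): maps each %XX escape to a
    // single char and never reassembles multi-byte UTF-8 sequences.
    static String decodeBytewise(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '%' && i + 2 < input.length()) {
                int hi = Character.digit(input.charAt(i + 1), 16);
                int lo = Character.digit(input.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.append((char) (hi * 16 + lo));
                    i += 3;
                    continue;
                }
            }
            // Not a valid escape: copy the char unchanged.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Thin wrapper around java.net.URLDecoder so callers need no checked exception.
    static String decodeUtf8(String input) {
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String encoded = "%2Fwebcontent%2Fexample_%C3%B6l.jpg";

        // %C3 and %B6 end up as two chars (U+00C3, U+00B6) instead of U+00F6.
        String bytewise = decodeBytewise(encoded);
        System.out.println(bytewise);

        // No escapes are left, so URLDecoder returns the damaged value as is.
        System.out.println(decodeUtf8(bytewise));

        // Decoding the original value directly yields the expected umlaut.
        System.out.println(decodeUtf8(encoded));
    }
}
```

Only the third call sees the two escapes together and can decode them as one UTF-8 sequence.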
 

The fix could be to use the canonicalize logic only to check the original value
for double/mixed encoding, and then to decode the original value via
URLDecoder instead of decoding the canonicalize output.
This way the UrlCodec decode method mirrors the encode method, with
URLDecoder / URLEncoder doing the main job.
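
A rough sketch of that idea, runnable without OFBiz or ESAPI on the classpath. The class name UrlDecodeSketch and the helper rejectDoubleEncoding are hypothetical; the helper is only a simplified placeholder for the canonicalize-based check (it flags a value that still contains a %XX escape after one decoding pass) and is not meant to reproduce what ESAPI does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

public class UrlDecodeSketch {

    private static final Pattern PERCENT_ESCAPE = Pattern.compile("%[0-9A-Fa-f]{2}");

    // Thin wrapper around java.net.URLDecoder so callers need no checked exception.
    private static String urlDecode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Hypothetical placeholder for the canonicalize check: the value is only
    // inspected here, and the result of this step is never returned.
    static void rejectDoubleEncoding(String original) {
        String once = urlDecode(original);
        if (PERCENT_ESCAPE.matcher(once).find()) {
            throw new IllegalArgumentException("Input looks double-encoded: " + original);
        }
    }

    // Proposed shape of decode: validate first, then decode the ORIGINAL value.
    static String decode(String original) {
        rejectDoubleEncoding(original);
        return urlDecode(original);
    }

    public static void main(String[] args) {
        // Multi-byte UTF-8 escapes survive because only URLDecoder decodes them.
        System.out.println(decode("%2Fwebcontent%2Fexample_%C3%B6l.jpg"));

        // A double-encoded slash (%252F) is rejected by the check.
        try {
            decode("%252F");
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

The point of the sketch is the ordering: the check only inspects the input, and the returned string always comes from URLDecoder applied to the untouched original.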



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
