Robert Joseph Evans created SPARK-44500:
-------------------------------------------

Summary: parse_url treats key as regular expression
Key: SPARK-44500
URL: https://issues.apache.org/jira/browse/SPARK-44500
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.1, 3.4.0, 3.3.0, 3.2.0
Reporter: Robert Joseph Evans

To be clear I am not 100% sure that this is a bug. It might be a feature, but I don't see anywhere that it is used as a feature. If it is a feature it really should be documented, because there are pitfalls. If it is a bug it should be fixed because it is really confusing and it is simple to shoot yourself in the foot.

```scala
> val urls = Seq("http://foo/bar?abc=BAD&a.c=GOOD",
>   "http://foo/bar?a.c=GOOD&abc=BAD").toDF
> urls.selectExpr("parse_url(value, 'QUERY', 'a.c')").show(false)
+----------------------------+
|parse_url(value, QUERY, a.c)|
+----------------------------+
|BAD                         |
|GOOD                        |
+----------------------------+

> urls.selectExpr("parse_url(value, 'QUERY', 'a[c')").show(false)
java.util.regex.PatternSyntaxException: Unclosed character class near index 15
(&|^)a[c=([^&]*)
               ^
  at java.util.regex.Pattern.error(Pattern.java:1969)
  at java.util.regex.Pattern.clazz(Pattern.java:2562)
  at java.util.regex.Pattern.sequence(Pattern.java:2077)
  at java.util.regex.Pattern.expr(Pattern.java:2010)
  at java.util.regex.Pattern.compile(Pattern.java:1702)
  at java.util.regex.Pattern.<init>(Pattern.java:1352)
  at java.util.regex.Pattern.compile(Pattern.java:1028)
```

The simple fix is to quote the key when making the pattern.

```scala
private def getPattern(key: UTF8String): Pattern = {
  Pattern.compile(REGEXPREFIX + Pattern.quote(key.toString) + REGEXSUBFIX)
}
```
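
The difference between a quoted and an unquoted key can be reproduced outside Spark with plain `java.util.regex`. The following is a minimal sketch, not the Spark implementation: the prefix `(&|^)` and suffix `=([^&]*)` are inferred from the pattern printed in the exception above, and the object `ParseUrlKeyDemo`, the helper `extract`, and its `quoteKey` flag are hypothetical names introduced only for illustration. It also assumes the value is read from the second capture group after a `find()`.

```scala
import java.util.regex.Pattern

object ParseUrlKeyDemo {
  // Assumed values, inferred from the pattern "(&|^)a[c=([^&]*)" in the stack trace.
  val RegexPrefix = "(&|^)"
  val RegexSuffix = "=([^&]*)"

  // Hypothetical helper: look up `key` in a raw query string.
  // With quoteKey = false the key is spliced into the pattern as-is (the reported behavior);
  // with quoteKey = true it is wrapped by Pattern.quote so it matches literally (the proposed fix).
  def extract(query: String, key: String, quoteKey: Boolean): Option[String] = {
    val keyPart = if (quoteKey) Pattern.quote(key) else key
    val matcher = Pattern.compile(RegexPrefix + keyPart + RegexSuffix).matcher(query)
    // Group 1 is the "&" or start-of-input alternative; group 2 is the value.
    if (matcher.find()) Some(matcher.group(2)) else None
  }

  def main(args: Array[String]): Unit = {
    val query = "abc=BAD&a.c=GOOD"
    println(extract(query, "a.c", quoteKey = false)) // Some(BAD): "." matches the "b" in "abc"
    println(extract(query, "a.c", quoteKey = true))  // Some(GOOD): "a.c" is matched literally
    println(extract(query, "a[c", quoteKey = true))  // None: no such key, and no exception
  }
}
```

With `quoteKey = false`, the key `a[c` fails at `Pattern.compile` with the same `PatternSyntaxException` shown in the report, because `[` opens a character class that is never closed.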