[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736 let's create a new PR --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 @hvanhovell Thanks for the comment. @cloud-fan I am going to add the config. Is a new PR better, or should I just update this one?

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-04 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17736 For reference: in 1.6 we used the Catalyst SqlParser to parse the expression in `DataFrame.filter()`, and we used the Hive (ANTLR-based) parser to parse SQL commands. In Spark 2.0 we

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 @hvanhovell What do you think about adding a config to fall back to string literal parsing that is consistent with the old SQL parser behavior (don't unescape the input string)?

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736 > So the user input query is always going with this kind of string literal in SQL shell? Yea, I think so. Before we create a new PR for the config, let's try to get some feedback

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 `"""string"""` can mitigate this issue. So the user input query is always going with this kind of string literal in SQL shell? A config seems good to me. We can solve migration issue but don

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-02 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736 > This also seems unreasonable to me because so many backslashes are confusing and it seems to me that no other systems have similar behavior Like I said before, it's because java string

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 For the regex, currently users need to write something like `df.filter("value rlike '^\\\\x20[\\\\x20-\\\\x23]+$'")`. This seems unreasonable to me, and it is what this patch tries to fix.
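The escaping layers discussed in this thread can be sketched without Spark. This is a minimal model, not Spark's code: `sqlUnescape` is a hypothetical stand-in for the parser's unescaping step and only drops the backslash in front of an escaped character, whereas the real `unescapeSQLString` handles more cases. `EscapeLayers`, `afterScala`, `afterSql`, and `matches` are illustrative names. The regex engine is plain `java.util.regex`.

```scala
import java.util.regex.Pattern

object EscapeLayers {
  // Hypothetical simplification of the SQL parser's unescape step:
  // a backslash followed by any character yields just that character.
  // (Assumption for illustration; Spark's real rules are richer.)
  def sqlUnescape(s: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < s.length) {
      if (s.charAt(i) == '\\' && i + 1 < s.length) {
        sb.append(s.charAt(i + 1))
        i += 2
      } else {
        sb.append(s.charAt(i))
        i += 1
      }
    }
    sb.toString
  }

  // Layer 1: the Scala compiler turns each "\\\\" into two backslash characters.
  val afterScala: String = "^\\\\x20[\\\\x20-\\\\x23]+$"
  // Layer 2: the modelled SQL unescape turns two backslashes into one.
  val afterSql: String = sqlUnescape(afterScala)

  // Layer 3: the regex engine interprets \x20 as the space character.
  def matches(pattern: String, input: String): Boolean =
    Pattern.compile(pattern).matcher(input).matches()

  def main(args: Array[String]): Unit = {
    println(afterScala)                 // ^\\x20[\\x20-\\x23]+$
    println(afterSql)                   // ^\x20[\x20-\x23]+$
    println(matches(afterSql, " !\"#")) // true
  }
}
```

Under this model, writing only two backslashes per escape in the Scala source leaves the regex engine with `^x20[x20-x23]+$`, which no longer matches a leading space.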

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-02 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736 yea, much clearer now, and the string literal in Spark 2.0 looks more reasonable. For the regex, I think it's unfair to compare `df.filter("value rlike '^\\x20[\\x20-\\x23]+$'")` with `d

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 @cloud-fan I've updated the example. Please check if it is better for you. Thanks.

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-02 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736 hi @viirya , thanks for your example! And I have one suggestion, can we use `"""string"""` instead of `"string"` in the example? Otherwise I have to manually parse the string according to java str
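To illustrate the suggestion, here is a small sketch of how the two Scala literal forms differ. It only shows Scala's own literal rules and involves no Spark call. The object and value names are made up for the example.

```scala
object LiteralForms {
  // Regular literal: the compiler turns each "\\" into a single backslash.
  val regular: String = "^\\x20[\\x20-\\x23]+$"

  // Triple-quoted literal: backslashes are kept as written,
  // so the text in the source is the text in the string.
  val raw: String = """^\x20[\x20-\x23]+$"""

  def main(args: Array[String]): Unit = {
    println(regular == raw) // true
    println(raw)            // ^\x20[\x20-\x23]+$
  }
}
```

With the triple-quoted form, a reader sees exactly the characters that are handed to the SQL parser, without applying the Java/Scala escape rules by hand.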

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-28 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 @cloud-fan Although @hvanhovell hasn't commented yet, I will fix the inconsistency first and see whether we have tests defined against it.

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-28 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 Btw, I think the rules in `unescapeSQLString` are also inconsistent. `\u` and `\000` don't follow the others. I think we should fix this inconsistency between 1.6 and 2.0 if there is no

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-27 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 I don't know why we have `unescapeSQLString` to unescape the string input which causes this inconsistency. Maybe @hvanhovell knows more.

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-27 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736 shall we fix this inconsistency too?

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-27 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 @cloud-fan Do you mean `SELECT \\abc`? Spark 2.x:
    sql("select '\\abc'").show()
    +---+
    |abc|
    +---+
    |abc|
    +---+
    sql("select

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-27 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736 What does `SELECT '//abc'` result in? A row with the string value `/abc` or `//abc`? Is it consistent between Spark 1.6 and 2.0?

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-26 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 ping @cloud-fan @hvanhovell

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 It is, but that causes no problem for a normal string literal. It causes a problem only if the string literal is used as a regex pattern string.

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736 Isn't the regex parsed as a string literal?

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 Is it? Is there any significant difference? I don't remember any migration being necessary from 1.6 to 2.0 for string literals.

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17736 It seems all string literals in the Spark 2.0 parser behave differently from Spark 1.6?

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17736 cc @hvanhovell for review ...