Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17736
let's create a new PR
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
@hvanhovell Thanks for the comment. @cloud-fan I am going to add the config; is
a new PR better, or should I just update this one?
---
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/17736
For reference: in 1.6 we used the Catalyst SqlParser to parse the
expression in `DataFrame.filter()`, and we used the Hive (ANTLR based) parser
for parsing SQL commands. In Spark 2.0 we
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
@hvanhovell What do you think about adding a config to fall back to string
literal parsing consistent with the old SQL parser behavior (i.e., don't
unescape the input string)?
---
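The fallback proposed here could look roughly like the following. This is a minimal, illustrative Java sketch: the class name, the flag, and the simplified `unescape` are hypothetical stand-ins for Spark's parser code, and the real `unescapeSQLString` handles many more escape forms.

```java
// Hypothetical sketch of a config-gated fallback for string-literal parsing.
// None of these names are Spark's real API; the boolean flag models the kind
// of config discussed in this thread.
public class LiteralParser {
    // When true, fall back to the pre-2.0 behavior: keep backslashes as-is.
    private final boolean escapedStringLiterals;

    public LiteralParser(boolean escapedStringLiterals) {
        this.escapedStringLiterals = escapedStringLiterals;
    }

    /** Strips the surrounding quotes, then unescapes only when the fallback
     *  flag is off (modeling the Spark 2.0 default behavior). */
    public String parseStringLiteral(String quoted) {
        String body = quoted.substring(1, quoted.length() - 1);
        return escapedStringLiterals ? body : unescape(body);
    }

    // Greatly simplified stand-in for unescapeSQLString: every backslash
    // simply escapes the next character.
    private static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                out.append(s.charAt(++i)); // drop the backslash, keep the char
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The SQL-level literal '\\abc' (two backslashes reaching the parser):
        System.out.println(new LiteralParser(false).parseStringLiteral("'\\\\abc'")); // \abc
        System.out.println(new LiteralParser(true).parseStringLiteral("'\\\\abc'"));  // \\abc
    }
}
```

With the flag off, the parser consumes one level of backslashes; with it on, the literal body passes through untouched, which is the 1.6-style behavior discussed above.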
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17736
> So user input queries always go through this kind of string literal
handling in the SQL shell?
Yea, I think so.
Before we create a new PR for the config, let's try to get some feedback
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
`"""string"""` can mitigate this issue. So user input queries always
go through this kind of string literal handling in the SQL shell?
A config seems good to me. We can solve the migration issue but don
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17736
> This also seems unreasonable to me because so many backslashes are
confusing and it seems to me that no other systems have similar behavior
Like I said before, it's because java string
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
For the regex, currently users need to write something like
`df.filter("value rlike '^\\x20[\\x20-\\x23]+$'")`. This seems
unreasonable to me, and it is what this patch tries to fix.
---
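To make the complaint about the `rlike` example concrete: a regex token like `\x20` has to survive two rounds of backslash processing before the regex engine sees it. The Java/Scala compiler consumes one level, and the 2.0 SQL parser's unescaping consumes another. A small illustrative Java sketch (the `sqlUnescape` helper is a hypothetical, simplified stand-in for `unescapeSQLString`):

```java
import java.util.regex.Pattern;

public class BackslashLevels {
    // Simplified stand-in for the parser's unescaping: each backslash
    // escapes the next character (illustrative only, not Spark's code).
    static String sqlUnescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) out.append(s.charAt(++i));
            else out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Level 1: four backslashes in Java/Scala source become two at runtime.
        String runtime = "\\\\x20";                // the 5-char string \\x20
        // Level 2: the SQL parser's unescaping halves them again.
        String seenByRegex = sqlUnescape(runtime); // the 4-char string \x20
        // Level 3: the regex engine reads \x20 as the space character.
        System.out.println(runtime);               // \\x20
        System.out.println(seenByRegex);           // \x20
        System.out.println(Pattern.matches("^" + seenByRegex + "+$", "   ")); // true
    }
}
```

So to get one backslash through to the regex engine, the user has to type four in source code, which is the double-escaping burden this patch targets.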
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17736
yea, much clearer now, and the string literal in Spark 2.0 looks more
reasonable.
For the regex, I think it's unfair to compare `df.filter("value rlike
'^\\x20[\\x20-\\x23]+$'")` with
`d
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
@cloud-fan I've updated the example. Please check if it is better for you.
Thanks.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17736
hi @viirya, thanks for your example! I have one suggestion: can we use
`"""string"""` instead of `"string"` in the example? Otherwise I have to
manually parse the string according to java str
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
@cloud-fan although @hvanhovell hasn't commented yet, I will go ahead and fix
the inconsistency first and see if we have tests defined against it.
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
Btw, I think the rules in `unescapeSQLString` are also inconsistent.
`\u` and `\000` don't follow the others.
I think we should fix this inconsistency between 1.6 and 2.0 if there is no
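The `\u` and `\000` rules differ from the single-character escapes in that they consume a fixed number of trailing digits rather than a single character. A simplified, illustrative sketch of the three escape families (a hypothetical model, not Spark's actual implementation; it assumes well-formed input):

```java
public class UnescapeSketch {
    // Illustrative unescape covering three escape families discussed above:
    // \uXXXX (four hex digits), \NNN (three octal digits), and single-char
    // escapes such as \\ and \'. Simplified model, not unescapeSQLString.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) { out.append(c); i++; continue; }
            char next = s.charAt(i + 1);
            if (next == 'u' && i + 5 < s.length()) {
                // \uXXXX: consumes exactly four hex digits.
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else if (next >= '0' && next <= '7' && i + 3 < s.length()) {
                // \NNN: consumes exactly three octal digits.
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 4), 8));
                i += 4;
            } else {
                // Single-char escape: drop the backslash, keep the character.
                out.append(next);
                i += 2;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0041")); // A
        System.out.println(unescape("\\101"));   // A (octal 101 = 65)
        System.out.println(unescape("\\\\abc")); // \abc
    }
}
```

The inconsistency noted above is visible in the structure: the first two branches need lookahead over several digits, while the last one treats the following character uniformly.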
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
I don't know why we have `unescapeSQLString` to unescape the string input
which causes this inconsistency. Maybe @hvanhovell knows more.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17736
shall we fix this inconsistency too?
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
@cloud-fan Do you mean `SELECT \\abc`?
Spark 2.x:

    sql("select '\\abc'").show()
    +---+
    |abc|
    +---+
    |abc|
    +---+

sql("select
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17736
what does `SELECT '//abc'` result in? A row with a string value `/abc` or
`//abc`? Is it consistent between Spark 1.6 and 2.0?
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
ping @cloud-fan @hvanhovell
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
It is. But it causes no problem for normal string literals. It is a problem
only when the string literal is used as a regex pattern string.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17736
isn't the regex parsed as string literal?
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17736
Is it? Is there any significant difference? I don't remember that any
migration was necessary from 1.6 to 2.0 for string literals.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17736
it seems all string literals in the Spark 2.0 parser behave differently from
Spark 1.6?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17736
cc @hvanhovell for review ...
---