Re: Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset
Hi,

What I am talking about is the `PlannerExpressionParserImpl`, which is written with the Scala parser-combinator library. Every time we call `StreamTableEnvironment#fromDataStream`, the field string (or `scala.Symbol`, via the Scala API) is parsed by `PlannerExpressionParserImpl` into an `Expression`. As the parser grammar in `PlannerExpressionParserImpl` shows, `fieldReference` is defined as `*` or `ident`, and `ident` is simply the one from [[scala.util.parsing.combinator.JavaTokenParsers]], i.e. a Java identifier. After discussing with Jark, I also discovered that `PlannerExpressionParserImpl` currently does not even support quoting with '`'. I didn't know what you told me about Calcite before, but it doesn't matter. Perhaps we can let `PlannerExpressionParserImpl#FieldRefrence` use Unicode as its default charset and support '`' as a first step, and then make the whole project support the Unicode charset once the Calcite-related part is available. By the way, I attended your lecture on Calcite at FFA Asia, which really inspired me a lot~

Best Regards,
Shoi Liu

-- From: "Danny Chan"
https://docs.google.com/document/d/1wo5byn_6K_YOKiPdXNav1zgzt9IBC3SbPvpPnIShtXk/edit#heading=h.g4bnumde4dl5

Best, Danny Chan

On 15 Jan 2020 at 11:08 PM +0800, Shoi Liu wrote:
> [...]
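Supporting the backtick quote as a first step could look like the following minimal sketch. This is not Flink's actual grammar; `FieldNameSketch`, `parse`, and the regexes are hypothetical names for illustration. It accepts either a plain Java-style identifier or any non-empty name wrapped in backticks, such as `@timestamp`:

```scala
// Sketch only (hypothetical names, not Flink's grammar): accept a plain
// Java-style identifier, or any non-empty name wrapped in backticks,
// e.g. `@timestamp`, which the plain identifier rule rejects.
object FieldNameSketch {
  private val plainIdent  = """[A-Za-z_$][A-Za-z0-9_$]*""".r
  private val quotedIdent = """`([^`]+)`""".r

  // Returns the referenced field name, stripped of backticks if quoted.
  def parse(s: String): Option[String] = s match {
    case plainIdent()       => Some(s)
    case quotedIdent(inner) => Some(inner)
    case _                  => None
  }
}
```

With this, a quoted "`@timestamp`" resolves to the field name "@timestamp", while the unquoted form is still rejected by the plain-identifier rule.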
Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset
Hi all,

The related issue: https://issues.apache.org/jira/browse/FLINK-15573

As the title says, what I want to do is let `FieldRefrence` use Unicode as its default charset (or perhaps as an optional charset that can be configured). According to `PlannerExpressionParserImpl`, Flink currently uses the Java identifier rules as `FieldRefrence`'s default charset. From my perspective, that is not enough. Consider a user who writes to Elasticsearch as a sink: we all know that Elasticsearch has a field called `@timestamp`, which is not a valid Java identifier. So in my team, we let `PlannerExpressionParserImpl#FieldRefrence` use Unicode as its default charset, which solves this kind of problem. (Please refer to the issue mentioned above.)

In my opinion, the change should be general purpose. Firstly, MySQL supports Unicode as the default field charset (consider a field named `@@`), so shall we support Unicode as well? What's more, my team gets a lot of benefit from this change, and I believe it can benefit other users without any harm. Fortunately, the change is fully backward compatible, because the Unicode identifier set is a superset of the Java identifier set. Only a small code change is needed to achieve this goal.

Looking forward to any opinions. By the way, thanks to tison~

Best Regards,
Shoi Liu
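The proposal essentially swaps the per-character predicates used to classify a field name. As a rough, self-contained sketch (the object and method names here are made up for illustration), a string is a Unicode identifier when its first character passes `Character.isUnicodeIdentifierStart` and every remaining character passes `Character.isUnicodeIdentifierPart`:

```scala
// Illustrative sketch (hypothetical names): classify a whole string
// using the JDK's built-in Unicode identifier predicates.
object UnicodeIdentSketch {
  def isUnicodeIdentifier(s: String): Boolean =
    s.nonEmpty &&
      Character.isUnicodeIdentifierStart(s.head) &&
      s.tail.forall(Character.isUnicodeIdentifierPart)
}
```

Under this check, non-Latin field names such as `订单时间` are accepted, as are ordinary names like `user_id`, while a name starting with a digit is rejected.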
[jira] [Created] (FLINK-15573) Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset
Lsw_aka_laplace created FLINK-15573:
---
Summary: Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset
Key: FLINK-15573
URL: https://issues.apache.org/jira/browse/FLINK-15573
Project: Flink
Issue Type: Improvement
Components: Table SQL / Planner
Reporter: Lsw_aka_laplace

Now I am talking about `PlannerExpressionParserImpl`. For now, `fieldReference`'s charset is the Java identifier set; why not change it to the Unicode identifier set? Currently, in my team, we actually have this problem. For instance, data from Elasticsearch always contains an `@timestamp` field, which is not a valid Java identifier. So what we did is let the `fieldReference` charset use Unicode:

{code:scala}
lazy val extensionIdent: Parser[String] = (
  "" ~> // handle whitespace
  rep1(
    acceptIf(Character.isUnicodeIdentifierStart)("identifier expected but '" + _ + "' found"),
    elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
  ) ^^ (_.mkString)
)

lazy val fieldReference: PackratParser[UnresolvedReferenceExpression] =
  (STAR | ident | extensionIdent) ^^ { sym => unresolvedRef(sym) }
{code}

It is simple but really makes sense~ Looking forward to any opinions.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
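The grammar above ends up accepting a field reference when it is `*`, a Java identifier, or a Unicode identifier. That acceptance logic can be mirrored outside the parser-combinator framework as a rough sketch (hypothetical names, using only `java.lang.Character`):

```scala
// Sketch (hypothetical names): mirror (STAR | ident | extensionIdent)
// as plain whole-string predicates.
object FieldReferenceSketch {
  private def matches(s: String,
                      start: Char => Boolean,
                      part: Char => Boolean): Boolean =
    s.nonEmpty && start(s.head) && s.tail.forall(part)

  // Corresponds to `ident` from JavaTokenParsers: Java identifier characters.
  def isJavaIdent(s: String): Boolean =
    matches(s, Character.isJavaIdentifierStart, Character.isJavaIdentifierPart)

  // Corresponds to `extensionIdent` above: Unicode identifier characters.
  def isUnicodeIdent(s: String): Boolean =
    matches(s, Character.isUnicodeIdentifierStart, Character.isUnicodeIdentifierPart)

  // Corresponds to the (STAR | ident | extensionIdent) alternative.
  def isFieldReference(s: String): Boolean =
    s == "*" || isJavaIdent(s) || isUnicodeIdent(s)
}
```

This keeps every currently valid field reference valid (the `ident` branch is unchanged) and only widens what the extension branch additionally accepts.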