Re: Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset

2020-01-16 Thread Shoi Liu
Hi,
  What I am talking about is `PlannerExpressionParserImpl`, which is
written with the Scala parser-combinator library. Every time we call
StreamTableEnvironment#fromDataStream, the field String (or scala.Symbol
in the Scala API) is parsed by `PlannerExpressionParserImpl` into an
`Expression`.
  As the parser grammar in `PlannerExpressionParserImpl` shows,
`fieldReference` is defined as `*` or `ident`, and `ident` in
`PlannerExpressionParserImpl` is simply the one inherited from
[[scala.util.parsing.combinator.JavaTokenParsers]], which matches a
JavaIdentifier.
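As a minimal, self-contained illustration (the object and method names here
are mine, not Flink's), the stock `ident` from JavaTokenParsers rejects a
field name such as `@timestamp`:

{code:scala}
import scala.util.parsing.combinator.JavaTokenParsers

// Demo: JavaTokenParsers#ident accepts only Java identifiers, so a field
// name like "@timestamp" cannot be parsed as a field reference.
object IdentDemo extends JavaTokenParsers {
  def check(field: String): Unit = parseAll(ident, field) match {
    case Success(name, _)   => println(s"parsed field reference: $name")
    case failure: NoSuccess => println(s"rejected '$field': ${failure.msg}")
  }

  def main(args: Array[String]): Unit = {
    check("timestamp")   // parsed: a valid Java identifier
    check("@timestamp")  // rejected: '@' may not start a Java identifier
  }
}
{code}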


 After discussing with Jark, I also discovered that
`PlannerExpressionParserImpl` currently does not even support quoting with
backticks ('`'). I didn't know about the Calcite work you mentioned before,
but it doesn't matter. Maybe we can let
PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset
and support '`' as a first step, and then make the whole project support the
Unicode charset once the Calcite-related part is available; a sketch of such
a quoting rule follows.
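What such a backtick-quoting rule could look like, as a hedged sketch (the
rule name and grammar are my assumptions, not Flink's actual implementation):

{code:scala}
import scala.util.parsing.combinator.JavaTokenParsers

// Sketch: a backtick-quoted field name, e.g. `@timestamp`.
// Anything except a backtick is allowed between the delimiters.
object QuotedIdentDemo extends JavaTokenParsers {
  lazy val quotedIdent: Parser[String] = "`" ~> """[^`]+""".r <~ "`"

  def main(args: Array[String]): Unit = {
    println(parseAll(quotedIdent, "`@timestamp`")) // [1.13] parsed: @timestamp
  }
}
{code}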




btw, I attended your lecture on Calcite at FFA Asia, which really inspired
me a lot~
Best Regards,
Shoi Liu

----
From: "Danny Chan", whose message referenced the design doc:
https://docs.google.com/document/d/1wo5byn_6K_YOKiPdXNav1zgzt9IBC3SbPvpPnIShtXk/edit#heading=h.g4bnumde4dl5

 Best, Danny Chan

 On 2020-01-15 at 11:08 PM +0800, the original message (archived in full
below) referenced the issue:
https://issues.apache.org/jira/browse/FLINK-15573
 

Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset

2020-01-15 Thread Shoi Liu
Hi all, 
the related issue: https://issues.apache.org/jira/browse/FLINK-15573


 As the title says, what I want to do is let `FieldRefrence` use Unicode as
its default charset (or perhaps as an optional charset that can be
configured).
According to `PlannerExpressionParserImpl`, Flink currently uses
JavaIdentifier as `FieldRefrence`'s default charset. But from my
perspective, that is not enough. Consider a user who writes to Elasticsearch
as a sink: we all know that ES documents carry a field called `@timestamp`,
which is not a valid JavaIdentifier.


 So in my team we simply let `PlannerExpressionParserImpl#FieldRefrence`
use Unicode as its default charset, which solves this kind of problem.
(Please refer to the issue mentioned above.)


In my opinion, the change should be general-purpose. Firstly, MySQL supports
Unicode in field names by default (see, for example, a field named `@@`), so
shall we support Unicode as well?



 What's more, my team has really benefited a lot from this change, and I
believe it can benefit other users as well, without any harm.
 Fortunately, the change is fully backward compatible, because the Unicode
identifier charset is a superset of the JavaIdentifier charset. Only a small
code change is needed to achieve this; a spot check of both charsets is
sketched below.
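Here is that spot check, a self-contained sketch that compares
java.lang.Character's Java-identifier and Unicode-identifier predicates on a
handful of sample characters (the samples are my own choice):

{code:scala}
// Spot check: both predicates agree on these common identifier characters.
object CompatCheck {
  def main(args: Array[String]): Unit = {
    for (c <- Seq('a', 'Z', '_', '0', '中')) {
      val javaPart = Character.isJavaIdentifierPart(c)
      val uniPart  = Character.isUnicodeIdentifierPart(c)
      println(s"'$c'  javaIdentifierPart=$javaPart  unicodeIdentifierPart=$uniPart")
    }
  }
}
{code}

For these samples both predicates return true; a real patch would also want
to look at edge cases such as '$', which Java identifiers allow.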
 Looking forward to any opinions.
 
btw, thanks to tison~





Best Regards,
Shoi Liu







[jira] [Created] (FLINK-15573) Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset

2020-01-13 Thread Lsw_aka_laplace (Jira)
Lsw_aka_laplace created FLINK-15573:
---

 Summary: Let Flink SQL PlannerExpressionParserImpl#FieldRefrence
use Unicode as its default charset
 Key: FLINK-15573
 URL: https://issues.apache.org/jira/browse/FLINK-15573
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: Lsw_aka_laplace


Now I am talking about the `PlannerExpressionParserImpl`.

    For now, the fieldReference charset is JavaIdentifier; why not change it
to UnicodeIdentifier?

    Currently, my team actually does have this problem. For instance, data
from ES always contains an `@timestamp` field, which does not satisfy
JavaIdentifier. So what we did is simply let the fieldReference charset use
Unicode:

 
{code:scala}
// extensionIdent: like JavaTokenParsers#ident, but built on the Unicode
// identifier predicates from java.lang.Character.
lazy val extensionIdent: Parser[String] =
  "" ~> // handle whitespace
    rep1(
      acceptIf(Character.isUnicodeIdentifierStart)("identifier expected but '" + _ + "' found"),
      elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
    ) ^^ (_.mkString)

// fieldReference now also accepts the extended identifiers.
lazy val fieldReference: PackratParser[UnresolvedReferenceExpression] =
  (STAR | ident | extensionIdent) ^^ { sym => unresolvedRef(sym) }
{code}
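To try the rule outside Flink, `extensionIdent` can be embedded in a small
standalone parser. This is only a sketch: the demo object and sample inputs
are mine, and Flink's `STAR`, `ident` and `unresolvedRef` are left out:

{code:scala}
import scala.util.parsing.combinator.JavaTokenParsers

// Standalone demo of the extensionIdent rule above.
object ExtensionIdentDemo extends JavaTokenParsers {
  lazy val extensionIdent: Parser[String] =
    "" ~>
      rep1(
        acceptIf(Character.isUnicodeIdentifierStart)("identifier expected but '" + _ + "' found"),
        elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
      ) ^^ (_.mkString)

  def main(args: Array[String]): Unit = {
    // A CJK field name parses: letters are Unicode identifier characters.
    println(parseAll(extensionIdent, "温度"))
    // Note: '@' is not a Unicode identifier character per java.lang.Character,
    // so a name like "@timestamp" would still rely on the backtick quoting
    // discussed earlier in the thread.
    println(parseAll(extensionIdent, "@timestamp"))
  }
}
{code}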
 

It is simple but it really makes sense~

Looking forward to any opinions.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)