Re: Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset

2020-01-16 Thread Shoi Liu
Hi, 
    What I am talking about is `PlannerExpressionParserImpl`, 
which is written with the Scala parser-combinator tool. Every time we call 
StreamTableEnvironment#fromDataStream, the field String (or a scala.Symbol 
via the Scala API) is parsed by `PlannerExpressionParserImpl` into an 
`Expression`.
    As we can see from the parser grammar in 
`PlannerExpressionParserImpl`, `fieldReference` is defined as `*` or 
`ident`, and `ident` in `PlannerExpressionParserImpl` is 
just the one from [[scala.util.parsing.combinator.JavaTokenParsers]], 
which accepts Java identifiers only.
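
For illustration, here is a minimal, self-contained check of that behavior (my own demo object, assuming only the plain scala-parser-combinators library, nothing Flink-specific):

{code:scala}
import scala.util.parsing.combinator.JavaTokenParsers

// JavaTokenParsers#ident accepts Java identifiers only, so a grammar built
// on it rejects field names such as @timestamp.
object IdentCheck extends JavaTokenParsers {
  def main(args: Array[String]): Unit = {
    println(parseAll(ident, "timestamp").successful)  // true
    println(parseAll(ident, "@timestamp").successful) // false: '@' cannot start a Java identifier
  }
}
{code}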


   After discussing with Jark, I also discovered that 
`PlannerExpressionParserImpl` currently does not even support quoting ('`'). I 
didn't know what you just told me about Calcite before, but it doesn't 
matter. Maybe we can just let 
PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset 
and support '`' as a first step, and then make the whole project 
support the Unicode charset when the Calcite-related part is available.
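
To sketch what the '`' support could look like in the same parser-combinator style (this is my own illustration under that assumption, not the actual Flink grammar):

{code:scala}
import scala.util.parsing.combinator.JavaTokenParsers

// A backtick-quoted identifier: any run of non-backtick characters between
// backticks, so a name like @timestamp can be written as `@timestamp`.
object QuotedIdentSketch extends JavaTokenParsers {
  lazy val quotedIdent: Parser[String] =
    "`" ~> rep1(elem("quoted identifier part", _ != '`')) <~ "`" ^^ (_.mkString)

  def main(args: Array[String]): Unit = {
    println(parseAll(quotedIdent, "`@timestamp`")) // parses to: @timestamp
  }
}
{code}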




btw I have been to your lecture on Calcite at FFA Asia, which really inspired me 
a lot~
 





Best Regards,
Shoi Liu 

------------------ Original ------------------
From: "Danny Chan"
https://docs.google.com/document/d/1wo5byn_6K_YOKiPdXNav1zgzt9IBC3SbPvpPnIShtXk/edit#heading=h.g4bnumde4dl5

Best,
Danny Chan

On 2020-01-15 at 11:08 PM +0800, Shoi Liu wrote:
https://issues.apache.org/jira/browse/FLINK-15573
 

 

Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset

2020-01-15 Thread Shoi Liu
Hi all, 
 The related issue: https://issues.apache.org/jira/browse/FLINK-15573


  As the title says, what I want to do is let `fieldReference` use 
Unicode as its default charset (or maybe as an optional charset which can 
be configured).
According to `PlannerExpressionParserImpl`, Flink currently uses 
JavaIdentifier as `fieldReference`'s default charset. But, from my 
perspective, that is not enough. Consider a user who uses Elasticsearch as a 
sink: we all know that ES has a field called `@timestamp`, which JavaIdentifier 
cannot accept.


  So in my team, we simply let `PlannerExpressionParserImpl#fieldReference` 
use Unicode as its default charset, and that solves this kind of problem. (Please 
refer to the issue mentioned above.)


In my opinion, the change should be general-purpose:
 Firstly, MySQL supports Unicode as the default field charset (consider a field 
named `@@`), so shall we support Unicode as well? 
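
To make the charset difference concrete, here is a small, self-contained check of the JDK predicates such a change would rely on (my own demo; plain JDK behavior, nothing Flink-specific):

{code:scala}
// Unicode letters such as CJK are valid Unicode identifier characters, so an
// isUnicodeIdentifierStart/Part-based parser accepts field names like 用户:
object CharsetCheck {
  def main(args: Array[String]): Unit = {
    println(Character.isUnicodeIdentifierStart('用')) // true
    println(Character.isUnicodeIdentifierPart('户'))  // true
    println(Character.isJavaIdentifierStart('@'))     // false: why @timestamp fails today
  }
}
{code}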



  What's more, my team really gets a lot of benefit from 
this change. I also believe that it can give other users more benefit without 
any harm!
  Fortunately, the change is fully compatible with existing programs: the 
proposed grammar keeps the existing `ident` rule as an alternative, so every 
field name that parses today still parses. Only a small code change is needed 
to achieve this goal.
  Looking forward to any opinions.
  
 btw, thanks to tison~





Best Regards,
Shoi Liu




 


[jira] [Created] (FLINK-15573) Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset

2020-01-13 Thread Lsw_aka_laplace (Jira)
Lsw_aka_laplace created FLINK-15573:
---

 Summary: Let Flink SQL PlannerExpressionParserImpl#FieldRefrence 
use Unicode  as its default charset  
 Key: FLINK-15573
 URL: https://issues.apache.org/jira/browse/FLINK-15573
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: Lsw_aka_laplace


Now I am talking about the `PlannerExpressionParserImpl`.

    For now, the fieldReference's charset is JavaIdentifier; why not change it 
to UnicodeIdentifier?

    Currently in my team, we do actually have this problem. For instance, data 
from ES always contains an `@timestamp` field, which does not satisfy 
JavaIdentifier. So what we did is just let the fieldReference charset use Unicode:

 
{code:scala}
lazy val extensionIdent: Parser[String] = (
  "" ~> // handle whitespace
    rep1(
      acceptIf(Character.isUnicodeIdentifierStart)("identifier expected but '" + _ + "' found"),
      elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
    ) ^^ (_.mkString)
)

lazy val fieldReference: PackratParser[UnresolvedReferenceExpression] =
  (STAR | ident | extensionIdent) ^^ { sym => unresolvedRef(sym) }
{code}
 
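For anyone who wants to try the new rule in isolation, here is a minimal, self-contained harness (my own scaffolding around the snippet above; the Flink-specific `fieldReference` rule and `unresolvedRef` are omitted):

{code:scala}
import scala.util.parsing.combinator.JavaTokenParsers

object ExtensionIdentDemo extends JavaTokenParsers {
  // Same rule as above, outside of Flink: identifier characters are checked
  // with the Unicode identifier predicates instead of the Java ones.
  lazy val extensionIdent: Parser[String] =
    "" ~> rep1(
      acceptIf(Character.isUnicodeIdentifierStart)("identifier expected but '" + _ + "' found"),
      elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
    ) ^^ (_.mkString)

  def main(args: Array[String]): Unit = {
    println(parseAll(extensionIdent, "时间戳").successful)  // true: CJK field name
    println(parseAll(extensionIdent, "userId").successful) // true: Java identifiers still parse
  }
}
{code}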

It is simple but really makes sense~

Looking forward to any opinions.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)