cloud-fan commented on issue #26626: [SPARK-29986][SQL] Introduce java like 
string trim to UTF8String
URL: https://github.com/apache/spark/pull/26626#issuecomment-557132578
 
 
   From the SQL standard:
   ```
   let SRC be <trim source>. TRIM ( SRC ) is equivalent to TRIM ( BOTH ' ' FROM 
SRC ).
   ```
   ```
   cast specification
   If SD is character string, then SV is replaced by SV with any leading or 
trailing <space>s removed.
   ```
   Some related information:
   ```
   L( <left bracket> <colon> SPACE <colon> <right bracket> )
   is the set of all character strings of length 1 (one) that are the <space> 
character.
   r) L( <left bracket> <colon> WHITESPACE <colon> <right bracket> )
   is the set of all character strings of length 1 (one) that are white space 
characters.
   ...
   white space
   consecutive sequences of one or more characters that have no glyphs
   ```
   
   So "space" means `' '` only, while "white space" means all characters whose
ASCII code is <= 32. Per the standard, `trim` and `cast` should remove only
spaces.
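
To make the distinction concrete, here is a minimal Java sketch. The class `TrimDemo` and the helper `trimSpaces` are hypothetical names for illustration, not code from this PR. It compares a space-only trim with `String.trim`, which strips every leading and trailing character whose code is <= 32.

```java
public class TrimDemo {
    // Hypothetical helper: removes only leading and trailing ' ' (U+0020),
    // which is how the standard's TRIM(BOTH ' ' FROM SRC) is read above.
    static String trimSpaces(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == ' ') {
            start++;
        }
        while (end > start && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String src = " \t abc \n ";
        // Space-only trim stops at the tab and the newline.
        System.out.println("[" + trimSpaces(src) + "]");
        // String.trim removes all leading/trailing chars with code <= 32.
        System.out.println("[" + src.trim() + "]");
    }
}
```

With this input, the space-only trim keeps the tab and newline, while `String.trim` returns just `abc`.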
   
   However, it seems most DBs don't follow the cast part, and we rely on
`Double.valueOf`, so this behavior is hard to change. I think it's OK to trim
white space in cast.
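
As a small illustration of why this is hard to change, the sketch below shows that `Double.valueOf` itself ignores leading and trailing white space, as if by `String.trim`. The class name `CastWhitespaceDemo` and the wrapper `parse` are hypothetical, and this is not a description of how the cast is actually wired up in Spark.

```java
public class CastWhitespaceDemo {
    // Hypothetical wrapper around the JDK call mentioned in the comment.
    static double parse(String s) {
        return Double.valueOf(s);
    }

    public static void main(String[] args) {
        // Tab and newline around the digits are accepted, not only ' '.
        System.out.println(parse("\t 1.5 \n"));
    }
}
```

Any string-to-double path built on this call accepts tabs and newlines around the number, not just spaces, unless the input is checked beforehand.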

