dtenedor opened a new pull request, #36066:
URL: https://github.com/apache/spark/pull/36066

   ### What changes were proposed in this pull request?
   
   This PR implements the `to_number` and `try_to_number` SQL function 
expressions according to new semantics described below. The former is 
equivalent to the latter except that it throws an exception instead of 
returning NULL for cases where the input string does not match the format 
string.
   
   -----------
   
   # `try_to_number` function
   
   Returns `expr` cast to DECIMAL using formatting `fmt`, or `NULL` if `expr` 
is not a valid match for the given format.
   
   ## Syntax
   
   ```
   try_to_number(expr, fmt)
   fmt
     { ' [ S ] [ L | $ ] 
         [ 0 | 9 | G | , ] [...] 
         [ . | D ] 
         [ 0 | 9 ] [...] 
         [ L | $ ] [ PR | MI | S ] ' }
   ```
   
   ## Arguments
   
   - `expr`: A STRING expression representing a number. `expr` may include 
leading or trailing spaces.
   - `fmt`: An STRING literal, specifying the expected format of `expr`.
   
   ## Returns
   
   A DECIMAL(p, s) where `p` is the total number of digits (`0` or `9`) and `s` 
is the number of digits after the decimal point, or 0 if there is none.
   
   `fmt` can contain the following elements (case insensitive):
   
   - **`0`** or **`9`**
   
     Specifies an expected digit between `0` and `9`. 
     A `0` to the left of the decimal points indicates that `expr` must have at 
least as many digits.
     A leading `9` indicates that `expr` may omit these digits.
   
     `expr` must not be larger than the number of digits to the left of the 
decimal point allowed by the format string.
   
     Digits to the right of the decimal point in the format string indicate the 
most digits that `expr` may have to the right of the decimal point.
   
   - **`.`** or **`D`**
   
     Specifies the position of the decimal point.
   
     `expr` does not need to include a decimal point.
   
   - **`,`** or **`G`**
   
     Specifies the position of the `,` grouping (thousands) separator.
     There must be a `0` or `9` to the left of the rightmost grouping 
separator. 
     `expr` must match the grouping separator relevant for the size of the 
number. 
   
   - **`L`** or **`$`**
   
     Specifies the location of the `$` currency sign. This character may only 
be specified once.
   
   - **`S`** 
   
     Specifies the position of an option '+' or '-' sign. This character may 
only be specified once.
   
   - **`MI`**
   
     Specifies that `expr` has an optional `-` sign at the end, but no `+`.
   
   - **`PR`**
   
     Specifies that `expr` indicates a negative number with wrapping angled 
brackets (`<1>`).
   
   If `expr` contains any characters other than `0` through `9` and those 
permitted in `fmt` a `NULL` is returned.
   
   ## Examples
   
   ```sql
   -- The format expects:
   --  * an optional sign at the beginning,
   --  * followed by a dollar sign,
   --  * followed by a number between 3 and 6 digits long,
   --  * thousands separators,
   --  * up to two dights beyond the decimal point. 
   > SELECT try_to_number('-$12,345.67', 'S$999,099.99');
    -12345.67
   -- The plus sign is optional, and so are fractional digits.
   > SELECT try_to_number('$345', 'S$999,099.99');
    345.00
   -- The format requires at least three digits.
   > SELECT try_to_number('$45', 'S$999,099.99');
    NULL
   -- The format requires at least three digits.
   > SELECT try_to_number('$045', 'S$999,099.99');
    45.00
   -- Using brackets to denote negative values
   > SELECT try_to_number('<1234>', '999999PR');
    -1234
   ```
   
   ### Why are the changes needed?
   
   The new semantics bring Spark into consistency with other engines and grant 
the user flexibility about how to handle cases where inputs do not match the 
format string.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes.
   
   * The minus sign `-` is no longer supported in the format string (`S` 
replaces it).
   * `MI` and `PR` are new options in the format string.
   * `to_number` and `try_to_number` are separate functions with different 
error behavior.
   
   ### How was this patch tested?
   
   * New positive and negative unit tests cover both `to_number` and 
`try_to_number` functions.
   * Query tests update as needed according to the behavior changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to