Github user kevinyu98 commented on the issue:
https://github.com/apache/spark/pull/20795
The reason I was thinking to split is the scenario below:
In order to avoid caching the external function name twice, as in the
scenario Dilip described, we decided to use `getCurrentDatabase`
during `normalizeFuncName`.
But this will fail for Spark's built-in functions, for example:
```
use currentdb;
select function1(), currentdb.function1() from ...
```
Suppose `function1` is a built-in function, for example `max`, and
`currentdb` does not have a function named `max`.

The first reference, `max`, is found by the built-in function check
(`functionRegistry.functionExists(name)`), which does not use the
database name when you don't specify one explicitly. But the cache will
store the built-in `max` under the normalized key `currentdb.max`.

The second reference, `currentdb.max`, is then found in the cache, even
though `currentdb` doesn't have a `max` function. But during
`ResolveFunctions` in the analyzer, `currentdb.max` can't be resolved,
and it will get a `NoSuchFunctionException` for `max`.
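To make the failure mode concrete, here is a minimal sketch (in Python, not Spark's actual Scala code) of the lookup flow described above. The registry, catalog, and function names are all hypothetical stand-ins; only the normalization behavior mirrors the bug:

```python
# Sketch of the lookup described above: normalizing every function name
# with the current database makes a built-in hit get cached under a
# qualified key, which then shadows a later qualified lookup that
# should have failed. All names here are illustrative, not Spark's API.

BUILTINS = {"max"}          # built-in registry; checked without a database
catalog = {}                # external catalog: "db.func" -> definition
cache = {}                  # normalized function name -> resolved function

current_db = "currentdb"

def normalize(name, db=None):
    # normalizeFuncName-style: unqualified names pick up the current database
    return f"{db or current_db}.{name}".lower()

def lookup(name, db=None):
    key = normalize(name, db)
    if key in cache:
        return cache[key]          # cache hit -- possibly a bogus one
    if db is None and name in BUILTINS:
        func = f"builtin:{name}"
        cache[key] = func          # BUG: built-in cached as "currentdb.max"
        return func
    if key in catalog:
        cache[key] = catalog[key]
        return catalog[key]
    raise LookupError(f"NoSuchFunction: {key}")

# First lookup: unqualified `max` resolves to the built-in...
print(lookup("max"))               # builtin:max, cached under "currentdb.max"
# ...so the qualified `currentdb.max` now wrongly hits the cache,
# even though currentdb has no `max` function and should raise:
print(lookup("max", "currentdb"))  # builtin:max
```

In real Spark the second reference survives this cache lookup only to fail later in `ResolveFunctions`, which is where the `NoSuchFunctionException` surfaces.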