[GitHub] spark pull request: [SPARK-13167][SQL] Include rows with null valu...

sureshthalamati Tue, 23 Feb 2016 16:31:40 -0800

Github user sureshthalamati commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11063#discussion_r53875892
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala ---
    @@ -75,12 +83,63 @@ private[sql] object JDBCRelation {
         }
         ans.toArray
       }
    +
    +  /**
    +   * Returns null value predicate if the column is nullable, otherwise empty string. Uses
    +   * nullable information available in the catalyst schema generated for the source table.
    +   * This method avoids querying the database metadata to find if the unquoted column names
    +   * are interpreted as uppercase or lower case. In rare cases when the schema has on column
    +   * with uppercase name, and another one with lowercase, it will return null value
    +   * partition predicate if either one of the column is nullable.
    +   *
    +   * @param column name of the column
    +   * @param url the jdbc url of the database
    +   * @param schema table's Catalyst schema
    +   * @return  null value predicate or empty string.
    +   */
    +  private def getParitionNullValuePredicate(column: String,
    +    url: String, schema: StructType): String = {
    --- End diff --
    
    Thanks for the feedback, Reynold. I was also debating whether we really need
    to check the nullability of the column. As you observed, the nullability
    check got trickier than I originally expected.
    
    Adding the null predicate to the first partition should work fine, except
    for perhaps a minor performance overhead when fetching that one partition's
    data if the partition column is not nullable. Most likely the source
    database is smart enough to optimize away the unnecessary predicate.
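
    As a rough illustration of the simplified approach, here is a standalone
    sketch that always appends the null predicate to the first partition's
    WHERE clause. It is not the actual JDBCRelation code: the object name,
    method name, and the stride calculation are assumptions made for the
    example.

```scala
// Standalone sketch; not the actual JDBCRelation code. All names are hypothetical.
object PartitionPredicates {

  /** Builds one WHERE clause per partition using stride-based ranges over [lower, upper). */
  def whereClauses(column: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
    require(numPartitions >= 2, "sketch assumes at least two partitions")
    require(lower < upper, "lower bound must be below upper bound")

    val stride = math.max(1L, (upper - lower) / numPartitions)
    // Interior boundaries between consecutive partitions.
    val bounds = (1 until numPartitions).map(i => lower + i * stride)

    (0 until numPartitions).map { i =>
      if (i == 0) {
        // First partition: unbounded below, and always picks up NULL rows,
        // regardless of whether the column is declared nullable.
        s"$column < ${bounds.head} or $column is null"
      } else if (i == numPartitions - 1) {
        // Last partition: unbounded above.
        s"$column >= ${bounds.last}"
      } else {
        s"$column >= ${bounds(i - 1)} AND $column < ${bounds(i)}"
      }
    }
  }
}
```

    For example, PartitionPredicates.whereClauses("id", 0L, 100L, 4) yields
    "id < 25 or id is null", "id >= 25 AND id < 50", "id >= 50 AND id < 75",
    and "id >= 75". No schema or database metadata lookup is needed.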
    
    One concern I had was whether sending a query with an IS NULL predicate on
    non-nullable columns would raise any red flags for database admins
    monitoring the queries.
    
    Please let me know if you prefer simplifying the fix by always adding the
    IS NULL predicate to the first partition, without checking column
    nullability. I will update the PR accordingly.



