GitHub user rajeshbalamohan opened a pull request:

    https://github.com/apache/spark/pull/13522

    [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in…

    ## What changes were proposed in this pull request?
    Here is a snippet of the code generated when executing date functions. It constructs a new SimpleDateFormat for every row it evaluates. SimpleDateFormat is fairly expensive to construct and can become a bottleneck when processing millions of records, so it would be better to instantiate it once.
    
    ```
    UTF8String primitive5 = null;
    if (!isNull4) {
      try {
        primitive5 = UTF8String.fromString(
            new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(
                new java.util.Date(primitive7 * 1000L)));
      } catch (java.lang.Throwable e) {
        isNull4 = true;
      }
    }
    ```
    
    With the modified code, the formatter is created on first use and stored in a field (sdf2). Here is the generated code:
    ```
    private java.text.SimpleDateFormat sdf2;
    private UnsafeRow result13;
    private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder bufferHolder14;
    private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter15;

    ...
    ...
      boolean isNull0 = isNull3;
      UTF8String primitive1 = null;
      if (!isNull0) {
        try {
          if (sdf2 == null) {
            sdf2 = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
          }
          primitive1 = UTF8String.fromString(sdf2.format(
              new java.util.Date(primitive4 * 1000L)));
        } catch (java.lang.Throwable e) {
          isNull0 = true;
        }
      }
    ```
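    
    The same lazy-initialization pattern can be tried outside Spark with the minimal sketch below. The class CachedFromUnixTime and its methods are hypothetical, not Spark's API. It assumes the format pattern is the same for every row (as with the literal in the generated code) and that each instance is only used by one thread at a time, because SimpleDateFormat is not thread-safe.
    
    ```java
    import java.text.SimpleDateFormat;
    import java.util.Date;

    // Hypothetical stand-in for a generated projection class (not Spark code).
    // Like sdf2 above, the formatter is a per-instance field created on first
    // use and reused for every row. Assumes single-threaded use per instance.
    public class CachedFromUnixTime {
        private final String pattern;      // assumed constant for all rows
        private SimpleDateFormat sdf;      // lazily initialized, then reused
        private int formattersCreated = 0; // counts constructions, to show it happens once

        public CachedFromUnixTime(String pattern) {
            this.pattern = pattern;
        }

        // Formats seconds since the epoch in the JVM default time zone.
        // Returns null where the generated code would set isNull = true;
        // catching Throwable mirrors the generated code.
        public String eval(long seconds) {
            try {
                if (sdf == null) {
                    sdf = new SimpleDateFormat(pattern);
                    formattersCreated++;
                }
                return sdf.format(new Date(seconds * 1000L));
            } catch (Throwable e) {
                return null;
            }
        }

        public int formattersCreated() {
            return formattersCreated;
        }

        public static void main(String[] args) {
            CachedFromUnixTime f = new CachedFromUnixTime("yyyy-MM-dd HH:mm:ss");
            for (long s = 0; s < 1_000_000; s += 1_000) {
                f.eval(s); // 1,000 rows
            }
            System.out.println(f.formattersCreated()); // only one formatter was built
        }
    }
    ```
    
    Running main formats 1,000 timestamps and prints 1. Keeping the formatter in an instance field rather than a static one means separate instances never share a formatter, which is what makes the single-thread assumption workable.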
    
    Similarly, Calendar.getInstance was called in DateTimeUtils; the Calendar instance can instead be lazily initialized.
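    
    Because DateTimeUtils is a shared singleton object that may be called from many task threads, a single shared Calendar would not be safe to reuse: Calendar is mutable and not thread-safe. One hedged way to create it lazily and reuse it is a ThreadLocal, sketched below. This is not necessarily what the patch does; the class CalendarCache, the method daysSinceEpoch, and the choice of UTC with Locale.ROOT are assumptions for illustration.
    
    ```java
    import java.util.Calendar;
    import java.util.Locale;
    import java.util.TimeZone;

    // Hypothetical helper, not Spark's DateTimeUtils. Keeps one lazily created
    // Calendar per thread instead of calling Calendar.getInstance on every call.
    public final class CalendarCache {
        private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

        // withInitial defers Calendar.getInstance until a thread first calls get().
        // UTC and Locale.ROOT (Gregorian calendar) are assumptions for this sketch.
        private static final ThreadLocal<Calendar> CAL = ThreadLocal.withInitial(
            () -> Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ROOT));

        private CalendarCache() {}

        public static Calendar calendar() {
            return CAL.get();
        }

        // Days since 1970-01-01 for a date given as year, month (1-12), day.
        public static int daysSinceEpoch(int year, int month, int day) {
            Calendar c = CAL.get();
            c.clear();                   // reset fields left over from the previous call
            c.set(year, month - 1, day); // Calendar months are 0-based
            return (int) Math.floorDiv(c.getTimeInMillis(), MILLIS_PER_DAY);
        }

        public static void main(String[] args) {
            System.out.println(daysSinceEpoch(1970, 1, 1));
            System.out.println(daysSinceEpoch(2016, 6, 6));
            System.out.println(calendar() == calendar()); // same thread, same instance
        }
    }
    ```
    
    Running main prints 0, 16958 and true. Calling clear() before each set() matters when reusing a Calendar, since otherwise fields such as the hour from an earlier call would leak into the next result.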
    
    
    ## How was this patch tested?
    
    - org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite
    - org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rajeshbalamohan/spark SPARK-14321-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13522.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13522
    
----
commit 602d4a70ba845df3160a07c2c9afe2d5c3c574c4
Author: Rajesh Balamohan <rbalamo...@apache.org>
Date:   2016-06-06T12:54:02Z

    [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in date functions

----