[jira] [Updated] (SPARK-25377) spark sql dataframe cache is invalid

Iverson Hu (JIRA) Fri, 07 Sep 2018 23:22:26 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-25377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Iverson Hu updated SPARK-25377:
-------------------------------
    Description: 
  When I use a Spark SQL DataFrame in my application, dataframe.cache appears 
to have no effect: the first action, such as count(), took 40 seconds, and the 
second execution of the same action took just as long. When I use 
dataframe.rdd.cache instead, the second execution is faster than the first. I 
think this is a bug in Spark SQL DataFrames.

   My code and console log are below. I had already cached the result 
DataFrame before running this code.

 Code:

logger.info("start to consuming result count")
logger.info(s"consuming ${result.count} output records")
//result.show(false)
logger.info("starting go to MysqlSink")
logger.info(s"consuming ${result.count} output records")
logger.info("starting go to MysqlSink")

 

Console log:

18/09/08 14:15:17 INFO MySQLRiskScenarioRunner: start to consuming result count
18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: consuming 5 output records
18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: starting go to MysqlSink
18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: consuming 5 output records
18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: starting go to MysqlSink
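
By the timestamps above, the first count took about 32 seconds and the second about 33 seconds. One way to check whether the cache is actually in effect is sketched below. This is a minimal sketch, not the reporter's code: the `CacheCheck` object, the `timed` helper, the local master, and the stand-in `result` DataFrame are all assumptions made for illustration. It relies only on `Dataset.cache()`, `Dataset.storageLevel`, and `Dataset.explain()`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheCheck {
  // Hypothetical helper: runs `body` and returns (result, elapsed milliseconds).
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val out = body
    (out, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-check")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the `result` DataFrame in the report.
    val result = spark.range(0L, 1000000L).toDF("id").filter($"id" % 2 === 0)

    // cache() is lazy: it only marks the plan for caching.
    // The first action afterwards computes and stores the data.
    val cached = result.cache()

    val (n1, t1) = timed(cached.count()) // computes the data and fills the cache
    val (n2, t2) = timed(cached.count()) // expected to read from the cache

    println(s"first count:  $n1 rows in $t1 ms")
    println(s"second count: $n2 rows in $t2 ms")

    // StorageLevel.NONE here would mean the DataFrame is not marked as cached.
    println(s"marked as cached: ${cached.storageLevel != StorageLevel.NONE}")

    // Prints the physical plan, which shows whether the query scans cached data.
    cached.explain()

    spark.stop()
  }
}
```

If the second count is not noticeably faster, the printed storage level and the physical plan help narrow down whether the DataFrame being counted is the one that was cached. On Spark 2.3 a plan that reads from the cache is expected to contain an InMemoryTableScan node.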

  was:
  When I use a Spark SQL DataFrame in my application, dataframe.cache appears 
to have no effect: the first action, such as count(), took 40 seconds, and the 
second execution of the same action took just as long. When I use 
dataframe.rdd.cache instead, the second execution is faster than the first. I 
think this is a bug in Spark SQL DataFrames.

   My code and console log are below. I had already cached the result 
DataFrame before running this code. !image-2018-09-08-14-18-36-780.png!

 

!image-2018-09-08-14-18-07-759.png!


> spark sql dataframe cache is invalid
> ------------------------------------
>
>                 Key: SPARK-25377
>                 URL: https://issues.apache.org/jira/browse/SPARK-25377
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>         Environment: spark version 2.3.0
> scala version 2.11.8
>            Reporter: Iverson Hu
>            Priority: Major
>
>   When I use a Spark SQL DataFrame in my application, dataframe.cache appears 
> to have no effect: the first action, such as count(), took 40 seconds, and 
> the second execution of the same action took just as long. When I use 
> dataframe.rdd.cache instead, the second execution is faster than the first. I 
> think this is a bug in Spark SQL DataFrames.
>    My code and console log are below. I had already cached the result 
> DataFrame before running this code.
>  Code:
> logger.info("start to consuming result count")
> logger.info(s"consuming ${result.count} output records")
> //result.show(false)
> logger.info("starting go to MysqlSink")
> logger.info(s"consuming ${result.count} output records")
> logger.info("starting go to MysqlSink")
>  
> Console log:
> 18/09/08 14:15:17 INFO MySQLRiskScenarioRunner: start to consuming result count
> 18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: consuming 5 output records
> 18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: starting go to MysqlSink
> 18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: consuming 5 output records
> 18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: starting go to MysqlSink



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
