[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

sujith71955 Wed, 07 Nov 2018 23:59:01 -0800

Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22721#discussion_r231790964
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
    @@ -183,13 +183,14 @@ case class InsertIntoHadoopFsRelationCommand(
             refreshUpdatedPartitions(updatedPartitionPaths)
           }
     
    -      // refresh cached files in FileIndex
    -      fileIndex.foreach(_.refresh())
    -      // refresh data cache if table is cached
    -      sparkSession.catalog.refreshByPath(outputPath.toString)
    -
           if (catalogTable.nonEmpty) {
    +        sparkSession.sessionState.catalog.refreshTable(catalogTable.get.identifier)
    --- End diff --
    
    This is why I asked why some flows initialize the stats and others do not: when the stats are not initialized they remain None, so refreshTable() is never called.
    In my PR I described the flow where I saw this: the insert flow does not initialize the stats, so the refreshTable() path is never executed.
    However, if you execute a SELECT statement before the insert command, the stats are initialized and the relation is cached. If you then execute the insert query, refreshTable() is called, because this time the stats are non-empty.
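The dependency described above (the refresh only happens when stats already exist) can be illustrated with a small, self-contained sketch. It assumes, as the comment implies, that the table refresh is reached only through a stats update guarded by a `stats.nonEmpty` check. `TableMeta`, `FakeCatalog`, `updateTableStats` and `selectInitializesStats` are hypothetical stand-ins written for this illustration; they are not Spark's real classes or signatures.

```scala
import scala.collection.mutable

// Hypothetical model of the flow discussed in the review comment.
// None of these types are Spark APIs; they only mimic the control flow.
object StatsGatedRefreshSketch {

  // Stand-in for a catalog table whose statistics may not be initialized yet.
  final case class TableMeta(name: String, stats: Option[Long])

  // Stand-in for the session catalog; it records every refreshTable call.
  final class FakeCatalog {
    val refreshed: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]

    def refreshTable(name: String): Unit = refreshed += name

    // Assumption: updating stats also invalidates (refreshes) the cached relation.
    def alterTableStats(name: String, newStats: Option[Long]): Unit = refreshTable(name)
  }

  // The refresh is reached only via the stats update, which is skipped
  // entirely when the table's stats are still None.
  def updateTableStats(catalog: FakeCatalog, table: TableMeta, newSize: Long): Unit =
    if (table.stats.nonEmpty) catalog.alterTableStats(table.name, Some(newSize))

  // Models a SELECT issued before the insert, which initializes the stats.
  def selectInitializesStats(table: TableMeta, size: Long): TableMeta =
    table.copy(stats = Some(size))

  def main(args: Array[String]): Unit = {
    val catalog = new FakeCatalog

    // Flow 1: plain insert, stats never initialized, so no refresh happens.
    val fresh = TableMeta("t", stats = None)
    updateTableStats(catalog, fresh, newSize = 100L)
    println(s"after plain insert: ${catalog.refreshed.toList}")

    // Flow 2: SELECT first (stats initialized), then insert, so refresh happens.
    val selected = selectInitializesStats(fresh, size = 50L)
    updateTableStats(catalog, selected, newSize = 150L)
    println(s"after select + insert: ${catalog.refreshed.toList}")
  }
}
```

Running `main` prints an empty list for the plain insert and `List(t)` for the select-then-insert case, which is the asymmetry the comment points out: whether the relation is refreshed depends on an unrelated earlier query having initialized the stats.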



