sujith71955 commented on a change in pull request #22758: [SPARK-25332][SQL]
select broadcast join instead of sortMergeJoin for the small size table even
query fired via new session/context
URL: https://github.com/apache/spark/pull/22758#discussion_r266304441
##########
File path:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
##########
@@ -193,6 +193,16 @@ private[hive] class HiveMetastoreCatalog(sparkSession:
SparkSession) extends Log
None)
val logicalRelation = cached.getOrElse {
val updatedTable = inferIfNeeded(relation, options, fileFormat)
+ // Initialize the catalogTable stats if they are not defined. An initial value has to be defined
+ // so that the Hive statistics will be updated after each insert command.
+ val withStats = {
+ if (updatedTable.stats == None) {
Review comment:
> @wangyum, so it is basically a subset of #22721? It's funny that Hive tables
> should set the initial stats alone here, which are supposed to be set somewhere
> else.
This is a bit of an old PR :), so I will go through the problem once again and
let you know more concise details. I remember I struggled a lot with handling
this issue :) Please let me know if you have any inputs, or if this way of
handling it is wrong.