Yes, I am accidentally joining on a Double.

keys:
  0 UDFToDouble(nav_tcdt) (type: double)
  1 UDFToDouble(site_categ_id) (type: double)
  2 UDFToDouble(site_categ_id) (type: double)
  3 UDFToDouble(mg_brand_id) (type: double)
  4 UDFToDouble(attr_detl_id) (type: double)
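
A minimal sketch of the workaround suggested in the quoted message (cast the smaller table's key to the bigger table's type). The table names big_fact and small_dim, the column categ_name, and the column types are hypothetical; the thread does not say which side of each join is a string and which is an int.

```sql
-- Hypothetical schema, for illustration only:
--   big_fact(site_categ_id STRING, ...)            -- the larger table
--   small_dim(site_categ_id INT, categ_name STRING) -- the smaller table
-- Joining STRING = INT directly makes Hive convert both sides to DOUBLE,
-- which is what shows up as UDFToDouble(...) in the join keys.

SELECT f.*, d.categ_name
FROM big_fact f
JOIN (
  -- Cast the smaller table's key to the larger table's type (STRING here),
  -- so both join keys already have the same type.
  SELECT CAST(site_categ_id AS STRING) AS site_categ_id,
         categ_name
  FROM small_dim
) d
ON f.site_categ_id = d.site_categ_id;
```

After a change like this, re-running EXPLAIN on the query should show the join keys without the UDFToDouble wrapper.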

r7raul1...@163.com

From: Gopal Vijayaraghavan
Date: 2015-08-29 01:45
To: user
Subject: Re: sql mapjoin very slow

> I have a question. I use hive 1.1.0, so hive.stats.dbclass default value
> is fs. Mean store statistics in local filesystem. Any one can tell what
> is the file path to store statistics?

The statistics aren't stored in the file system long term - the final destination for stats is the metastore.

The earlier default stats implementation used MR Counters. With stats.dbclass=fs, they're passed during ETL via the FileSystem, not the MR counters.

You'll see something like this in the ETL phase, which is just a way to write the target table + a new location where stats for the insert are staged.

2015-08-28T01:44:35,581 INFO [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:genFileSinkPlan(6629)) - Set stats collection dir : hdfs://

The StatsTask on the client side will read this file and update the metastore.

That aside, you might want to check if you're accidentally joining on a Double.

That has been recently reported as a HashMap regression & can be triggered by a join like string_col = int_col. There is an easy workaround: cast the smaller table's join key to the bigger table's type.

Cheers,
Gopal