When performing a join in Hive and then filtering the output with a where clause, the Hive compiler will try to filter data before the tables are joined. This is known as predicate pushdown ( http://allabouthadoop.net/what-is-predicate-pushdown-in-hive/)
For example: SELECT * FROM a JOIN b ON a.some_id=b.some_other_id WHERE a.some_name=6 Rows from table a which have some_name = 6 will be filtered before performing the join, if push down predicates are enabled(hive.optimize.ppd). However, I have also learned recently that there is another way of filtering data from a table before joining it with another table( https://vinaynotes.wordpress.com/2015/10/01/hive-tips-joins-occur-before-where-clause/ ). One can provide the condition in the ON clause, and table a will be filtered before the join is performed For example: SELECT * FROM a JOIN b ON a.some_id=b.some_other_id AND a.some_name=6 Are these both valid ways of filtering data before joins? Thank you Yours Truly, Varun Rao