Re: SMB join bug
Thanks. But HIVE-5973 seems to affect partitioned bucketed tables queried through subqueries, whereas my use case is a basic join of non-partitioned bucketed tables. I will try the patch and let you know.

-Sukhendu

On May 2, 2014 12:10 PM, "Thejas Nair" wrote:
> It is possible that you hit this issue: https://issues.apache.org/jira/browse/HIVE-5973
> It is fixed in the Apache Hive 0.13 release.
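One way to confirm that the tables really fall outside the case HIVE-5973 describes is to inspect the layout metadata of both sides of the join. A minimal sketch, using the two table names from the original query; it assumes the usual Hive requirement that an SMB join needs both tables bucketed and sorted on the join key (hh_key here) with the same number of buckets:

```sql
-- Inspect partitioning, bucketing and sort metadata for both join inputs.
-- In the "Storage Information" section, compare "Num Buckets",
-- "Bucket Columns" and "Sort Columns"; a "Partition Information"
-- section is printed only for partitioned tables.
DESCRIBE FORMATTED dss.hist_hshld_profl_mc;
DESCRIBE FORMATTED dss.hshld_summary_mc;
```

If the bucket counts or sort columns differ between the two tables, that is worth mentioning in a follow-up, since it changes which join path Hive can take.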
Re: SMB join bug
It is possible that you hit this issue: https://issues.apache.org/jira/browse/HIVE-5973
It is fixed in the Apache Hive 0.13 release.

On Thu, May 1, 2014 at 7:10 PM, Sukhendu Chakraborty wrote:
> Is there a known issue/jira for this? We are using CDH5.0/hive-0.12.

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
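Until an upgrade to Hive 0.13 is possible, a conservative workaround on hive-0.12 is to keep the SMB optimization off for the affected session, since the regular join produced the correct count. A sketch, assuming the standard Hive sort-merge bucket join settings are what enabled SMB here (the thread does not say which flags were toggled):

```sql
-- Workaround sketch for hive-0.12: disable sort-merge bucket (SMB) join
-- conversion for this session so the query runs as a regular join.
set hive.optimize.bucketmapjoin.sortedmerge=false;
set hive.auto.convert.sortmerge.join=false;
```

Running EXPLAIN on the query afterwards should confirm that the plan no longer uses a sorted merge bucket map join.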
SMB join bug
I am seeing very different row counts in this query's output depending on whether I enable SMB join:

    select count(*)
    from dss.hist_hshld_profl_mc a
    join dss.hshld_summary_mc b
      on a.hh_key = b.hh_key
    where ('2012-02-27' between a.hshld_profl_eff_dt and a.hshld_profl_exp_dt)
      and a.hshld_exp_dt = '-12-31'
      and trim(a.cntry_id) = 'USA'

The SMB join returns a count of 60 rows (wrong value), while the regular join returns 30 million plus rows (correct value).

Is there a known issue/JIRA for this? We are using CDH5.0/hive-0.12.

-Sukhendu
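To make the comparison easy to reproduce (and to attach to a JIRA), it helps to pin down which settings produce each result and to check from the plan that SMB join is actually used. A sketch, assuming the standard Hive session settings for automatic sort-merge bucket join conversion (the post does not state which flags were toggled); the query is the one above:

```sql
-- Run 1: SMB join enabled (settings assumed, per the Hive join optimization docs).
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;

-- Confirm the plan contains a sort-merge bucket join before trusting the count.
explain
select count(*)
from dss.hist_hshld_profl_mc a
join dss.hshld_summary_mc b
  on a.hh_key = b.hh_key
where ('2012-02-27' between a.hshld_profl_eff_dt and a.hshld_profl_exp_dt)
  and a.hshld_exp_dt = '-12-31'  -- literal copied as it appears in the post
  and trim(a.cntry_id) = 'USA';

select count(*)
from dss.hist_hshld_profl_mc a
join dss.hshld_summary_mc b
  on a.hh_key = b.hh_key
where ('2012-02-27' between a.hshld_profl_eff_dt and a.hshld_profl_exp_dt)
  and a.hshld_exp_dt = '-12-31'
  and trim(a.cntry_id) = 'USA';

-- Run 2: same query with SMB conversion disabled (regular join).
set hive.optimize.bucketmapjoin.sortedmerge=false;
set hive.auto.convert.sortmerge.join=false;

select count(*)
from dss.hist_hshld_profl_mc a
join dss.hshld_summary_mc b
  on a.hh_key = b.hh_key
where ('2012-02-27' between a.hshld_profl_eff_dt and a.hshld_profl_exp_dt)
  and a.hshld_exp_dt = '-12-31'
  and trim(a.cntry_id) = 'USA';
```

Keeping both runs in one script means the only variable between the two counts is the join strategy.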