Hi Bertrand,

Thanks for your quick reply, I understand it now.

Thanks.
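
For readers of the archive, here is a minimal Python sketch of the group-by join described in the quoted reply below, together with one possible way around the OR limit: each row of table a is emitted under both candidate keys (the full pid and its first 27 characters), so that every comparison inside a group is a plain equality. This is an in-memory simulation, not Hive or Hadoop code. The function names (map_a, map_b, reduce_join, or_join, left_or_join) are hypothetical, and the sketch assumes pid values are unique within each table.

```python
from collections import defaultdict

PREFIX_LEN = 27  # mirrors substr(a.pid, 1, 27) in the original query


def map_a(pid):
    """Emit a row of table a under every key it may match on (hypothetical helper)."""
    keys = {pid, pid[:PREFIX_LEN]}  # a set, so short pids are emitted only once
    return [(key, ("a", pid)) for key in keys]


def map_b(pid):
    """Emit a row of table b under its own pid (hypothetical helper)."""
    return [(pid, ("b", pid))]


def reduce_join(tagged_rows):
    """Within one key group, pair every a row with every b row."""
    a_rows = [row for tag, row in tagged_rows if tag == "a"]
    b_rows = [row for tag, row in tagged_rows if tag == "b"]
    return [(a, b) for a in a_rows for b in b_rows]


def or_join(table_a, table_b):
    """Simulate shuffle and reduce in memory; return matched (a.pid, b.pid) pairs."""
    groups = defaultdict(list)  # key -> tagged rows, as the shuffle would group them
    for pid in table_a:
        for key, value in map_a(pid):
            groups[key].append(value)
    for pid in table_b:
        for key, value in map_b(pid):
            groups[key].append(value)
    pairs = set()  # a set, so a pair reached through both keys is kept once
    for tagged_rows in groups.values():
        pairs.update(reduce_join(tagged_rows))
    return sorted(pairs)


def left_or_join(table_a, table_b):
    """Add (a.pid, None) for rows of a that matched nothing, as a left join would."""
    pairs = or_join(table_a, table_b)
    matched = {a for a, _ in pairs}
    return pairs + [(a, None) for a in table_a if a not in matched]
```

The key for each row is fixed before grouping, which is the constraint the reply points out. The cost is that table a is emitted up to twice, and the unmatched rows need a second pass over the join result.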

On Fri, Jul 27, 2012 at 12:15 AM, Bertrand Dechoux <decho...@gmail.com> wrote:

> A join is implemented for most cases with a group by.
>
> Rows in your table a and your table b will be grouped by something, let's
> say the value of your column id.
> So for each group, doing a join is a trivial operation. The simple way is
> to get all values, separate them somehow to know which are from the a table
> and which are from the b table, and then emit all couples (row_a, row_b) for
> your value of "id".
>
> But if you want to do an OR, there is no way to express it during the group
> by. You must be able to define, before the group by, what its key will be.
>
> I am not saying that you cannot solve your problem. Only that the OR
> constraint is due to the MapReduce paradigm.
>
> I hope it is clearer for you. Knowing what MapReduce is could really help
> you. It does not mean you need to know Java, but you should understand
> how the data is manipulated.
>
> Bertrand
>
> On Thu, Jul 26, 2012 at 5:34 PM, 周彩钦 <caiqinz...@gmail.com> wrote:
>
>> Thanks Bertrand,
>> You said it's a Hadoop problem. Does that mean that if I switch to
>> MapReduce (Java MR or streaming), I still can't achieve the purpose?
>> PS: I'm not very familiar with Java MR and streaming :) but I have to
>> find a way to implement it.
>>
>> On Thu, Jul 26, 2012 at 11:19 PM, Bertrand Dechoux <decho...@gmail.com> wrote:
>>
>>> That's a problem which is Hadoop related and not really Hive related.
>>> The solution is to use only equality (as you know). For that, you should
>>> first extract your real identifier for a, which can be a.pid or a part of
>>> it.
>>> I assume that you can know in advance which one will be used.
>>>
>>> Bertrand
>>>
>>> On Thu, Jul 26, 2012 at 5:11 PM, 周彩钦 <caiqinz...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a problem when using left join with Hive 0.7.1.
>>>> I have a query below:
>>>>
>>>> select
>>>>   a.pid,
>>>>   b.pid
>>>> from
>>>>   tab1 a
>>>> left join
>>>>   tab2 b
>>>> on (a.pid=b.pid or substr(a.pid,1,27)=b.pid);
>>>>
>>>> But Hive doesn't support "OR" in a left join.
>>>> Table a is huge, and table b has 40000 rows now (will increase).
>>>> Is there any other solution to achieve this?
>>>>
>>>> Thanks very much.

--
/**********************************************************/
// Name: 周彩钦
// Phone: 15210364513
// E-mail: caiqinz...@gmail.com
/**********************************************************/