In most cases a join is implemented with a group by. The rows of your table a and your table b are grouped by something, let's say the value of your column id. Within each group, doing the join is then a trivial operation. The simple way is to collect all the values, tag them somehow so you know which come from table a and which come from table b, and then emit every pair (row_a, row_b) for that value of "id".
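To make the idea concrete, here is a minimal in-memory sketch of such a group-by join in Python. It is not how Hive or Hadoop actually implements it; the function names (map_phase, shuffle, reduce_phase, equi_join) are made up for illustration, rows are assumed to be plain dicts, and only the inner-join case is shown.

```python
from collections import defaultdict
from itertools import product


def map_phase(rows_a, rows_b, key="id"):
    """Tag each row with its source table and emit (join_key, (tag, row))."""
    for row in rows_a:
        yield row[key], ("a", row)
    for row in rows_b:
        yield row[key], ("b", row)


def shuffle(pairs):
    """Group the tagged rows by join key, standing in for the framework's group by."""
    groups = defaultdict(list)
    for join_key, tagged_row in pairs:
        groups[join_key].append(tagged_row)
    return groups


def reduce_phase(groups):
    """For each key, split rows by source table and emit every (row_a, row_b) pair."""
    for tagged_rows in groups.values():
        from_a = [row for tag, row in tagged_rows if tag == "a"]
        from_b = [row for tag, row in tagged_rows if tag == "b"]
        # Keys present in only one table produce no pairs (inner join).
        yield from product(from_a, from_b)


def equi_join(rows_a, rows_b, key="id"):
    """Hypothetical helper chaining the three steps above."""
    return list(reduce_phase(shuffle(map_phase(rows_a, rows_b, key))))
```

Note that the key of each row is computed in map_phase, before any grouping takes place, so every row ends up in exactly one group.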
But if you want to do an OR, there is no way to express it during the group by. You must be able to define, before the group by, what its key will be. I am not saying that you cannot solve your problem, only that the OR constraint is due to the MapReduce paradigm.

I hope it is clearer for you. Knowing what MapReduce is could really help you. It does not mean you need to know Java, but you should understand how the data is manipulated.

Bertrand

On Thu, Jul 26, 2012 at 5:34 PM, 周彩钦 <caiqinz...@gmail.com> wrote:

> Thanks Bertrand,
> You said it's a Hadoop problem. Does that mean that if I switch to
> MapReduce (Java MR or streaming), I still can't achieve this?
> PS: I'm not very familiar with Java MR and streaming :) but I have to find
> a way to implement it.
>
> On Thu, Jul 26, 2012 at 11:19 PM, Bertrand Dechoux <decho...@gmail.com> wrote:
>
>> That's a problem which is Hadoop related and not really Hive related.
>> The solution is to use only equality (as you know). For that, you should
>> first extract your real identifier for a, which can be a.pid or a part of
>> it.
>> I assume that you can know in advance which one will be used.
>>
>> Bertrand
>>
>> On Thu, Jul 26, 2012 at 5:11 PM, 周彩钦 <caiqinz...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I have a problem when using left join with Hive 0.7.1.
>>> I have the query below:
>>>
>>> select
>>>   a.pid,
>>>   b.pid
>>> from
>>>   tab1 a
>>> left join
>>>   tab2 b
>>> on (a.pid=b.pid or substr(a.pid,1,27)=b.pid);
>>>
>>> But Hive doesn't support "OR" in a left join.
>>> Table a is huge, and table b has 40000 rows now (and will grow).
>>> Is there any other solution to achieve this?
>>>
>>> Thanks very much.

--
Bertrand Dechoux