Theta joins don't do well in any system. Are both tables large? If not, it's a map-side join, and the reducer(s) will just be ordinary reducer(s).
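Here is a minimal sketch of the map-side case. It assumes the smaller table fits in each mapper's memory; in a real job it would usually be shipped with the distributed cache and loaded once per task. This is plain Java that simulates one mapper's work, not Hadoop API code. `MapSideThetaJoin` and `mapSplit` are hypothetical names, and integer rows stand in for records. The predicate is the `table1.attr < table2.attr` condition discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Plain-Java simulation of a map-side (broadcast) theta join.
// Assumption: the small table fits in memory on every mapper, so no reducer is needed.
public class MapSideThetaJoin {

    // One "mapper": stream its split of the large table and compare each row
    // with every cached row of the small table using the theta predicate.
    public static List<int[]> mapSplit(List<Integer> largeSplit,
                                       List<Integer> smallTable,
                                       BiPredicate<Integer, Integer> theta) {
        List<int[]> out = new ArrayList<>();
        for (int big : largeSplit) {
            for (int small : smallTable) {
                if (theta.test(big, small)) {
                    out.add(new int[] {big, small});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> smallTable = List.of(2, 5);    // cached in memory
        List<Integer> largeSplit = List.of(1, 3, 6); // streamed, never cached
        // theta: table1.attr < table2.attr, with the large table playing table1
        for (int[] pair : mapSplit(largeSplit, smallTable, (a, b) -> a < b)) {
            System.out.println(pair[0] + " < " + pair[1]); // prints 1 < 2, 1 < 5, 3 < 5
        }
    }
}
```

Every mapper still compares each streamed row against the whole cached table, so the cost is |split| x |small table| per mapper. That is why it matters whether both tables are large.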
Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 11, 2013, at 12:18 AM, Vikas Jadhav wrote:

> I will express it in SQL form:
>
> select * from table1, table2 where table1.attr < table2.attr
>
> It is also called a theta join, where theta can be <, >, <=, >=, !=
>
> On Wed, Apr 10, 2013 at 9:35 PM, Michel Segel wrote:
>> Not sure what is meant by a non-equi join.
>>
>> Are you saying something like: for every row in X, join it to all of the rows
>> in Y where Y.a < something?
>>
>> Is that what you are suggesting?
>>
>> On Apr 10, 2013, at 9:11 AM, Vikas Jadhav wrote:
>>
>>> How are you going to support a NON EQUI join using MapReduce?
>>> As per my understanding, the only way to do this is
>>> to bring all data to one reducer, so that the reducer knows the lesser/greater
>>> values correctly.
>>> Correct me if I am wrong.
>>> Thank You.
>>>
>>> Regards,
>>> Vikas
>>>
>>> On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel wrote:
>>>> Can you show an example of your join?
>>>> All joins are an equality in that the key has to match.
>>>> Whether it's a one to one, one to many, or many to many remains to be seen.
>>>>
>>>> On Apr 9, 2013, at 10:35 AM, Effyroth Gu wrote:
>>>>
>>>>> Only equality joins, outer joins, and left semi joins are supported in
>>>>> Hive. Hive does not support join conditions that are not equality
>>>>> conditions as it is very difficult to express such conditions as a
>>>>> map/reduce job. Also, more than two tables can be joined in Hive.
>>>>>
>>>>> 2013/4/9 Michael Segel
>>>>>> Hi,
>>>>>>
>>>>>> Your cross join is supported in both Pig and Hive. (Cross, and Theta
>>>>>> joins)
>>>>>>
>>>>>> So there must be code to do this.
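The claim above that a non-equi join needs all the data in a single reducer can be relaxed with a replicate-and-partition grid. This is similar in spirit to the 1-Bucket-Theta scheme from the research literature; nobody in this thread proposed it. The plain-Java sketch below simulates the map and reduce phases. The grid size, the round-robin band assignment, and all names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Sketch of a grid-partitioned theta join: gridRows x gridCols reducers instead
// of one. Each left row goes to every reducer in one grid row, each right row
// to every reducer in one grid column, so each (left, right) pair meets in
// exactly one reducer.
public class GridThetaJoin {

    public static List<int[]> join(List<Integer> left, List<Integer> right,
                                   int gridRows, int gridCols,
                                   BiPredicate<Integer, Integer> theta) {
        int reducers = gridRows * gridCols;
        List<List<Integer>> leftAt = new ArrayList<>();
        List<List<Integer>> rightAt = new ArrayList<>();
        for (int id = 0; id < reducers; id++) {
            leftAt.add(new ArrayList<>());
            rightAt.add(new ArrayList<>());
        }
        // "Map" phase. Round-robin band choice keeps the sketch deterministic;
        // a real job would usually pick the band at random for load balance.
        for (int i = 0; i < left.size(); i++) {
            int row = i % gridRows;
            for (int col = 0; col < gridCols; col++) {
                leftAt.get(row * gridCols + col).add(left.get(i));
            }
        }
        for (int j = 0; j < right.size(); j++) {
            int col = j % gridCols;
            for (int row = 0; row < gridRows; row++) {
                rightAt.get(row * gridCols + col).add(right.get(j));
            }
        }
        // "Reduce" phase: each reducer runs a local nested loop with the predicate.
        List<int[]> out = new ArrayList<>();
        for (int id = 0; id < reducers; id++) {
            for (int a : leftAt.get(id)) {
                for (int b : rightAt.get(id)) {
                    if (theta.test(a, b)) {
                        out.add(new int[] {a, b});
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> table1 = List.of(1, 4, 7);
        List<Integer> table2 = List.of(3, 5, 8);
        // table1.attr < table2.attr on a 2 x 2 grid of reducers
        List<int[]> pairs = join(table1, table2, 2, 2, (a, b) -> a < b);
        System.out.println(pairs.size() + " matching pairs"); // 6 matching pairs
    }
}
```

The price is replication: each table1 row is copied `gridCols` times and each table2 row is copied `gridRows` times. You pay that instead of funnelling everything through one reducer.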
>>>>>>
>>>>>> Essentially in the reducer you would have your key and then the set of
>>>>>> rows that match the key. You would then perform the cross product on the
>>>>>> key's result set and output them to the collector as separate rows.
>>>>>>
>>>>>> I'm not sure why you would need the reduce context.
>>>>>>
>>>>>> But then again, I'm still on my first cup of coffee. ;-)
>>>>>>
>>>>>> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I am also working on joins using MapReduce.
>>>>>>> I think that instead of finding the position of the table in the RawKeyValueIterator,
>>>>>>> we can modify the context.write method to always write the key as the table
>>>>>>> name or id.
>>>>>>> Then we don't need to find the position; we can get the key and value from
>>>>>>> "reducerContext".
>>>>>>>
>>>>>>> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add a
>>>>>>> join method to the Reducer class in Reducer.java and call
>>>>>>> reducer.join(reduceContext).
>>>>>>>
>>>>>>> I just wonder how you are going to support a NON EQUI join.
>>>>>>>
>>>>>>> I am also having the same problem: how to do a join if the datasets can't fit into
>>>>>>> memory.
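Mike's description above can be sketched as one simulated reduce() call in plain Java: take the key and the set of rows that match it, then write out the cross product as separate rows. The `L|` / `R|` prefixes are a hypothetical tagging scheme. The mapper would have to apply them so the reducer knows which table each value came from.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of one reduce() call in a reduce-side join.
// Assumption (hypothetical encoding): each value is prefixed "L|" if it came
// from table1 or "R|" if it came from table2.
public class ReduceSideCrossProduct {

    public static List<String> reduce(String key, Iterable<String> taggedValues) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (String v : taggedValues) {
            String[] parts = v.split("\\|", 2);   // split on the first '|' only
            if (parts[0].equals("L")) {
                left.add(parts[1]);
            } else {
                right.add(parts[1]);
            }
        }
        // Cross product of the rows that share this key, one output row per pair.
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(key + "\t" + l + "\t" + r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = List.of("L|a1", "R|b1", "L|a2", "R|b2", "R|b3");
        reduce("k", values).forEach(System.out::println); // 2 x 3 = 6 rows
    }
}
```

This version buffers both sides for the key, which is exactly the memory limit raised in the message above.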
>>>>>>>
>>>>>>> For now I am cloning using the following code:
>>>>>>>
>>>>>>>   import java.util.ArrayList;
>>>>>>>   import org.apache.hadoop.util.ReflectionUtils;
>>>>>>>
>>>>>>>   // Inside the reducer, where context is the reduce Context.
>>>>>>>   // The framework reuses the key/value objects, so copy them before caching.
>>>>>>>   KEYIN key = context.getCurrentKey();
>>>>>>>   KEYIN outKey = (KEYIN) ReflectionUtils.newInstance(key.getClass(),
>>>>>>>       context.getConfiguration());
>>>>>>>   outKey = ReflectionUtils.copy(context.getConfiguration(), key, outKey);
>>>>>>>
>>>>>>>   Iterable<VALUEIN> values = context.getValues();
>>>>>>>   ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
>>>>>>>   for (VALUEIN value : values) {
>>>>>>>     VALUEIN outValue = (VALUEIN) ReflectionUtils.newInstance(value.getClass(),
>>>>>>>         context.getConfiguration());
>>>>>>>     outValue = ReflectionUtils.copy(context.getConfiguration(), value, outValue);
>>>>>>>     myValues.add(outValue);  // keep the copy, not the reused object
>>>>>>>   }
>>>>>>>
>>>>>>> If you have found any other solution, please feel free to share it.
>>>>>>>
>>>>>>> Thank You.
>>>>>>>
>>>>>>> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy wrote:
>>>>>>>> In reduce() we have:
>>>>>>>>
>>>>>>>> key1 values1
>>>>>>>> key2 values2
>>>>>>>> ...
>>>>>>>> keyn valuesn
>>>>>>>>
>>>>>>>> So what I want to do is join all the values, as in this SQL:
>>>>>>>>
>>>>>>>> select * from values1,values2...valuesn;
>>>>>>>>
>>>>>>>> If memory is not enough to cache the values, how can I complete the join
>>>>>>>> operation?
>>>>>>>> My idea is to clone the reducecontext, but that may not be easy.
>>>>>>>>
>>>>>>>> Any help will be appreciated.
>>>>>>>>
>>>>>>>> 2013/3/13 Roth Effy
>>>>>>>>> I want an n:n join as a Cartesian product, but the DataJoinReducerBase
>>>>>>>>> looks like it only supports equi-joins.
>>>>>>>>> I want a non-equal join, but I have no idea how yet.
>>>>>>>>>
>>>>>>>>> 2013/3/13 Azuryy Yu
>>>>>>>>>> Do you want an n:n join or a 1:n join?
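The cloning code quoted above is needed because Hadoop reuses the same key and value objects while a reducer iterates over its values. If you cache the references without copying, every cached entry ends up pointing at the last value read. This plain-Java sketch reproduces the effect. `MutableText` is a hypothetical stand-in for a reused Writable, not Hadoop's `Text`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Plain-Java simulation of object reuse in a reducer's value iterator.
// MutableText is a hypothetical stand-in for a reused Writable.
public class ValueReuseDemo {

    public static final class MutableText {
        private String value;
        public void set(String v) { value = v; }
        public String get() { return value; }
        public MutableText copy() {      // plays the role of ReflectionUtils.copy
            MutableText c = new MutableText();
            c.set(value);
            return c;
        }
    }

    // Hands out the SAME MutableText on every next(), overwritten in place.
    public static Iterable<MutableText> reusingValues(List<String> raw) {
        MutableText shared = new MutableText();
        return () -> new Iterator<MutableText>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < raw.size(); }
            @Override public MutableText next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw.get(i++));
                return shared;
            }
        };
    }

    // Caches the values, optionally copying each one first.
    public static List<String> cache(Iterable<MutableText> values, boolean copy) {
        List<MutableText> cached = new ArrayList<>();
        for (MutableText v : values) {
            cached.add(copy ? v.copy() : v);   // without copy: same object n times
        }
        List<String> out = new ArrayList<>();
        for (MutableText v : cached) {
            out.add(v.get());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("a", "b", "c");
        System.out.println(cache(reusingValues(raw), false)); // [c, c, c]
        System.out.println(cache(reusingValues(raw), true));  // [a, b, c]
    }
}
```

Copying fixes correctness, not memory: every cached copy still has to fit in the heap.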
>>>>>>>>>>
>>>>>>>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" wrote:
>>>>>>>>>>> I want to join two tables' data in the reducer, so I need to find the
>>>>>>>>>>> start of the table.
>>>>>>>>>>> Someone said the DataJoinReducerBase can help me; is that right?
>>>>>>>>>>>
>>>>>>>>>>> 2013/3/13 Azuryy Yu
>>>>>>>>>>>> You cannot use a RecordReader in a Reducer.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you mean by getting the record position? I cannot
>>>>>>>>>>>> understand; can you give a simple example?
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy wrote:
>>>>>>>>>>>>> Sorry, I still can't understand how to use a RecordReader in
>>>>>>>>>>>>> reduce(), because the input is a RawKeyValueIterator in the class
>>>>>>>>>>>>> ReduceContext, so I'm confused.
>>>>>>>>>>>>> Anyway, thank you.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013/3/12 samir das mohapatra
>>>>>>>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy wrote:
>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>> I want to join the k-v pairs in reduce(), but how do I get the
>>>>>>>>>>>>>>> record position?
>>>>>>>>>>>>>>> Right now, my idea is to save the context state, but the class
>>>>>>>>>>>>>>> Context doesn't implement a clone method or copy constructor.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any help will be appreciated.
>>>>>>>>>>>>>>> Thank you very much.
>>>>>>>
>>>>>>> --
>>>>>>> Thanx and Regards
>>>>>>> Vikas Jadhav
>
> --
> Regards,
> Vikas
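The question above is how to find where one table's rows start inside a reduce() call. One common alternative to searching the RawKeyValueIterator is a secondary sort. The mapper emits a composite key of (join key, table tag). Partitioning and grouping use only the join key, while sorting uses both, so each reduce() call sees all table1 rows before any table2 row. Only table1's rows for the current key then need buffering. The plain-Java sketch below simulates the sort and the grouped reduce calls. The `Tagged` class and the 0/1 tags are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of a reduce-side join with a secondary sort.
// Assumption: tag 0 = table1 (buffered), tag 1 = table2 (streamed).
public class SecondarySortJoin {

    public static final class Tagged {
        final String joinKey;
        final int tag;
        final String payload;

        public Tagged(String joinKey, int tag, String payload) {
            this.joinKey = joinKey;
            this.tag = tag;
            this.payload = payload;
        }
    }

    public static List<String> run(List<Tagged> mapOutput) {
        // Shuffle/sort: order by join key, then by tag, so table1 rows come first.
        List<Tagged> sorted = new ArrayList<>(mapOutput);
        sorted.sort(Comparator.comparing((Tagged t) -> t.joinKey)
                              .thenComparingInt(t -> t.tag));

        List<String> out = new ArrayList<>();
        List<String> buffered = new ArrayList<>();  // table1 rows for the current key only
        String currentKey = null;
        for (Tagged t : sorted) {
            if (!t.joinKey.equals(currentKey)) {    // grouping on join key: a new reduce() call
                currentKey = t.joinKey;
                buffered.clear();
            }
            if (t.tag == 0) {
                buffered.add(t.payload);
            } else {
                // table2 rows are joined as they stream past and are never cached
                for (String left : buffered) {
                    out.add(currentKey + "\t" + left + "\t" + t.payload);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tagged> mapOutput = List.of(
                new Tagged("k1", 1, "b1"), new Tagged("k1", 0, "a1"),
                new Tagged("k2", 0, "a2"), new Tagged("k1", 1, "b2"),
                new Tagged("k2", 1, "b3"), new Tagged("k3", 1, "b4"));
        run(mapOutput).forEach(System.out::println);
    }
}
```

In a Hadoop job this would correspond to a custom partitioner, sort comparator, and grouping comparator (`Job.setPartitionerClass`, `Job.setSortComparatorClass`, `Job.setGroupingComparatorClass`). As written it is an inner join, so a key with no table1 rows (`k3` here) produces nothing.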
