Re: How can I record some position of context in Reduce()?

Michel Segel Fri, 12 Apr 2013 05:51:44 -0700

Theta joins don't do well in any system.

Are both tables large?
If not, it's a map-side join and the reducer will be just an ordinary reducer.
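
A minimal sketch of the map-side case, assuming the smaller table fits in memory. It is plain Java rather than Hadoop API code, and the names MapSideThetaJoin and joinRow are hypothetical. Each row streamed from the large table is compared against every cached row of the small table, using the table1.attr < table2.attr predicate from the SQL quoted below:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapSideThetaJoin {
    /**
     * Joins one streamed row of the large table against every cached row
     * of the small table, keeping the pairs where leftAttr < rightAttr.
     */
    static List<int[]> joinRow(int leftAttr, List<Integer> smallTable) {
        List<int[]> out = new ArrayList<>();
        for (int rightAttr : smallTable) {
            if (leftAttr < rightAttr) {          // the theta predicate
                out.add(new int[] {leftAttr, rightAttr});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: table2 is the small table, loaded once per map task.
        List<Integer> table2 = Arrays.asList(2, 5, 9);
        for (int left : new int[] {1, 5, 7}) {   // rows streamed from table1
            for (int[] pair : joinRow(left, table2)) {
                System.out.println(pair[0] + " < " + pair[1]);
            }
        }
    }
}
```

In a real job the comparison would sit inside map(), with the small table loaded during task setup; no shuffle or reduce-side logic is needed for the join itself.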



Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 11, 2013, at 12:18 AM, Vikas Jadhav <[email protected]> wrote:

> I will express it in SQL form:
>  
> select * from table1, table2 where table1.attr < table2.attr
>  
> This is also called a theta join, where theta can be <, >, <=, >=, or !=.
>  
> 
> 
> On Wed, Apr 10, 2013 at 9:35 PM, Michel Segel <[email protected]> 
> wrote:
>> Not sure what is meant by a non equi join.
>> 
>> Are you saying something like for every row in X, join it to all of the rows 
>> in Y where Y.a < something?
>> 
>> Is that what you are suggesting?
>> 
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On Apr 10, 2013, at 9:11 AM, Vikas Jadhav <[email protected]> wrote:
>> 
>>> How are you going to support a non-equi join using MapReduce?
>>> As I understand it, the only way to do this is to bring all the data
>>> to one reducer, so that the reducer can compare lesser and greater
>>> values correctly.
>>> Correct me if I am wrong.
>>> Thank You.
>>>  
>>>   Regards,
>>>   Vikas
>>>  
>>> 
>>> 
>>> On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel <[email protected]> 
>>> wrote:
>>>> Can you show an example of your join?
>>>> All joins are an equality in that the key has to match.
>>>> Whether it's one-to-one, one-to-many, or many-to-many remains to be seen.
>>>> 
>>>> 
>>>> Sent from a remote device. Please excuse any typos...
>>>> 
>>>> Mike Segel
>>>> 
>>>> On Apr 9, 2013, at 10:35 AM, Effyroth Gu <[email protected]> wrote:
>>>> 
>>>>> Only equality joins, outer joins, and left semi joins are supported in 
>>>>> Hive. Hive does not support join conditions that are not equality 
>>>>> conditions as it is very difficult to express such conditions as a 
>>>>> map/reduce job. Also, more than two tables can be joined in Hive.
>>>>> 
>>>>> 
>>>>> 2013/4/9 Michael Segel <[email protected]>
>>>>>> Hi,
>>>>>> 
>>>>>> Your cross join is supported in both Pig and Hive (cross and theta
>>>>>> joins).
>>>>>> 
>>>>>> So there must be code to do this. 
>>>>>> 
>>>>>> Essentially in the reducer you would have your key and then the set of 
>>>>>> rows that match the key. You would then perform the cross product on the 
>>>>>> key's result set and output them to the collector as separate rows. 
>>>>>> 
>>>>>> I'm not sure why you would need the reduce context. 
>>>>>> 
>>>>>> But then again, I'm still on my first cup of coffee. ;-)
>>>>>> 
>>>>>> 
>>>>>> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <[email protected]> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> I am also working on joins using MapReduce.
>>>>>>> I think that instead of finding the position of the table in the
>>>>>>> RawKeyValueIterator, we can modify the context.write method to always
>>>>>>> write the table name or id as the key. Then we don't need to find the
>>>>>>> position; we can get the key and value from "reducerContext".
>>>>>>>  
>>>>>>> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add
>>>>>>> a join method to the Reducer class in Reducer.java and call
>>>>>>> reducer.join(reduceContext).
>>>>>>>  
>>>>>>>  
>>>>>>> I just wonder how you are going to support a non-equi join.
>>>>>>>  
>>>>>>> I also have the same problem: how to do the join if the datasets can't
>>>>>>> fit into memory.
>>>>>>>  
>>>>>>>  
>>>>>>> For now I am cloning the key and values using the following code:
>>>>>>>  
>>>>>>>  
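>>>>>>> // Needs org.apache.hadoop.conf.Configuration,
>>>>>>> // org.apache.hadoop.util.ReflectionUtils, java.util.ArrayList, java.util.List.
>>>>>>> // Hadoop reuses the key and value objects between iterations,
>>>>>>> // so each one is deep-copied before it is cached.
>>>>>>> Configuration conf = context.getConfiguration();
>>>>>>> KEYIN key = context.getCurrentKey();
>>>>>>> @SuppressWarnings("unchecked")
>>>>>>> KEYIN outKey = (KEYIN) ReflectionUtils.newInstance(key.getClass(), conf);
>>>>>>> ReflectionUtils.copy(conf, key, outKey);   // may throw IOException
>>>>>>> 
>>>>>>> Iterable<VALUEIN> values = context.getValues();
>>>>>>> List<VALUEIN> myValues = new ArrayList<VALUEIN>();
>>>>>>> for (VALUEIN value : values) {
>>>>>>>     @SuppressWarnings("unchecked")
>>>>>>>     VALUEIN outValue = (VALUEIN) ReflectionUtils.newInstance(value.getClass(), conf);
>>>>>>>     ReflectionUtils.copy(conf, value, outValue);
>>>>>>>     myValues.add(outValue);   // keep the copy; it was previously discarded
>>>>>>> }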
>>>>>>>  
>>>>>>>  
>>>>>>> If you have found any other solution, please feel free to share it.
>>>>>>>  
>>>>>>> Thank You.
>>>>>>>  
>>>>>>>        
>>>>>>>  
>>>>>>>  
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <[email protected]> wrote:
>>>>>>>> In reduce() we have:
>>>>>>>> 
>>>>>>>> key1 values1
>>>>>>>> key2 values2
>>>>>>>> ...
>>>>>>>> keyn valuesn
>>>>>>>> 
>>>>>>>> So what I want to do is join all the values, as in this SQL:
>>>>>>>> 
>>>>>>>> select * from values1,values2...valuesn;
>>>>>>>> 
>>>>>>>> If there is not enough memory to cache the values, how can I complete the
>>>>>>>> join operation?
>>>>>>>> My idea is to clone the ReduceContext, but that may not be easy.
>>>>>>>> 
>>>>>>>> Any help will be appreciated.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2013/3/13 Roth Effy <[email protected]>
>>>>>>>>> I want an n:n join, i.e. a Cartesian product, but DataJoinReducerBase
>>>>>>>>> looks like it only supports equi-joins.
>>>>>>>>> I want a non-equi join, but I have no idea how to do it yet.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2013/3/13 Azuryy Yu <[email protected]>
>>>>>>>>>> Do you want an n:n join or a 1:n join?
>>>>>>>>>> 
>>>>>>>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <[email protected]> wrote:
>>>>>>>>>>> I want to join data from two tables in the reducer, so I need to find
>>>>>>>>>>> the start of the table.
>>>>>>>>>>> Someone said DataJoinReducerBase can help me. Is that right?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 2013/3/13 Azuryy Yu <[email protected]>
>>>>>>>>>>>> You cannot use a RecordReader in a Reducer.
>>>>>>>>>>>>  
>>>>>>>>>>>> What do you mean by getting the record position? I don't
>>>>>>>>>>>> understand; can you give a simple example?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <[email protected]> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Sorry, I still can't understand how to use a RecordReader in
>>>>>>>>>>>>> reduce(), because the input is a RawKeyValueIterator in the class
>>>>>>>>>>>>> ReduceContext, so I'm confused.
>>>>>>>>>>>>> Anyway, thank you.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2013/3/12 samir das mohapatra <[email protected]>
>>>>>>>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <[email protected]> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>> I want to join the k-v pairs in reduce(), but how do I get the
>>>>>>>>>>>>>>> record position?
>>>>>>>>>>>>>>> My current idea is to save the context's state, but the class
>>>>>>>>>>>>>>> Context doesn't provide a clone method or copy constructor.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Any help will be appreciated.
>>>>>>>>>>>>>>> Thank you very much.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> 
>>>>>>> 
>>>>>>> Thanx and Regards
>>>>>>>  Vikas Jadhav
>>> 
>>> 
>>> 
>>> -- 
>>> 
>>> 
>>> Thanx and Regards
>>>  Vikas Jadhav
> 
> 
> 
> -- 
> 
> 
>   Regards,
>    Vikas
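
The reducer-side approach discussed in the quoted thread (tag each row with its table id, then combine the rows inside the reducer) might look roughly like the sketch below. It is plain Java, not Hadoop API code; TaggedRow, ReduceSideThetaJoin and join are hypothetical names. It assumes that every row reaches a single reduce call and that all table-1 rows arrive before any table-2 row (for example through a sort on the tag), so only table 1 has to be held in memory while table 2 is streamed once:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class TaggedRow {
    final int table;   // 1 or 2: hypothetical tag written by the mapper
    final int attr;    // the join attribute
    TaggedRow(int table, int attr) { this.table = table; this.attr = attr; }
}

class ReduceSideThetaJoin {
    /**
     * Buffers the table-1 rows, then streams the table-2 rows and emits
     * every pair where table1.attr < table2.attr.
     * Assumes table-1 rows come first in the iterator.
     */
    static List<String> join(Iterator<TaggedRow> rows) {
        List<Integer> left = new ArrayList<>();
        List<String> out = new ArrayList<>();
        while (rows.hasNext()) {
            TaggedRow r = rows.next();
            if (r.table == 1) {
                left.add(r.attr);                 // cache table 1 only
            } else {
                for (int l : left) {
                    if (l < r.attr) {             // the theta predicate
                        out.add(l + "," + r.attr);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<TaggedRow> rows = Arrays.asList(
            new TaggedRow(1, 1), new TaggedRow(1, 7),
            new TaggedRow(2, 5), new TaggedRow(2, 9));
        for (String line : join(rows.iterator())) {
            System.out.println(line);
        }
    }
}
```

This still needs one of the two tables to fit in memory; if neither does, the buffered side would have to be spilled to disk or the input partitioned further, which the sketch does not attempt.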
