Re: Work around for using OR in Joins

2011-03-23 Thread MIS
Here is my query :*
select table1.column10, table1.column11, count(distinct table2.column3) as
total from table1 JOIN table2 ON (table1.column1='value1' and
to_date(table1.column2) >= '2011-01-06'  and to_date(table1.column2) <
'2011-01-07' and  table2.column3!='' and table2.column3 is NOT NULL and
table1.coulmn4 !='A' and table1.column5 != '' and
table2.column3=table1.column3 and ((table1.coulmn6 in ('val1') and
to_date(table2.column7)='2011-01-06') or (table1.coulmn6 in ('val2') and
to_date(table2.column8)='2011-01-06') or (table1.coulmn6 in ('val3') and
to_date(table2.column9)='2011-01-06'))) where group by table1.column10,
table1.column11;*

Thanks,
MIS.


On Wed, Mar 23, 2011 at 12:54 PM, Ning Zhang  wrote:

> What is your OR conditions? Are they involving both tables? Can you post
> your query here?
>
> On Mar 23, 2011, at 12:04 AM, MIS wrote:
>
> > Ning, Thanks for the reply.
> > Yes. you are right. Using NOT and AND didn't work as expected.
> > I'll give a try in implementing nested-loop map-side join.
> >
> > In the meanwhile, I brought out the expression using OR from inside the
> JOIN
> > expression to be used in the filtering expression {in the WHERE clause },
> > but I see some difference in the results produced with what is expected.
> > Since,  I'm not using an OUTER join, I expected desired results. Any idea
> > why is the OR expression in filtering not working as desired ? Any
> thought
> > on this are welcome.
> >
> > Thanks,
> > MIS.
> >
> >
> > On Wed, Mar 23, 2011 at 10:28 AM, Ning Zhang  wrote:
> >
> >> Joins with OR conditions are not supported by Hive currently. I think
> even
> >> though you rewrite the condition to use NOT and AND only, the results
> may be
> >> wrong.
> >>
> >> It is quite hard to implement joins of any tables with OR conditions in
> a
> >> MapReduce framework. it is straightforward to implement it in
> nested-loop
> >> join, but due to the nature of distributed processing, nested loop join
> >> cannot be implemented in an efficient and scalable way in MapReduce. In
> >> nested-loop join, each mapper need to join a split of LHS table with the
> >> whole RHS table which could be terabytes.
> >>
> >> The regular (reduce-side) join in Hive is essentially a sort-merge join
> >> operator. With that in mind, it's hard to implement OR conditions in the
> >> sort-merge join.
> >>
> >> One exception is the map-side join, which assumes the RHS table is small
> >> and will be read fully into each mapper. Currently map-side join in Hive
> is
> >> a hash-based join operator. You can implement a nested-loop map-side
> join
> >> operator to enable any join conditions including OR.
> >>
> >> On Mar 22, 2011, at 1:39 AM, MIS wrote:
> >>
> >>> Found it at  *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java*
> line
> >>> no. 1122
> >>> There is some concern mentioned that supporting OR would lead to data
> >>> explosion. Is it discussed/documneted in a little more detail somewhere
> ?
> >> If
> >>> so, some pointers towards the same will be helpful.
> >>>
> >>> Thanks,
> >>> MIS.
> >>>
> >>> On Tue, Mar 22, 2011 at 1:19 PM, MIS  wrote:
> >>>
>  I want to use OR in the join expression, but it seems only AND is
> >> supported
>  as of now.
>  I have a work around though to use DeMorgan's law {C1 OR C2 = !(!C1
> AND
>  !C2))} , but it would be nice if somebody can point me to the location
> >> in
>  code base that would need modification to support the OR in the join
>  expression.
> 
>  Thanks,
>  MIS.
> 
> >>
> >>
>
>


Re: Work around for using OR in Joins

2011-03-23 Thread Ning Zhang
What is your OR conditions? Are they involving both tables? Can you post your 
query here?

On Mar 23, 2011, at 12:04 AM, MIS wrote:

> Ning, Thanks for the reply.
> Yes. you are right. Using NOT and AND didn't work as expected.
> I'll give a try in implementing nested-loop map-side join.
> 
> In the meanwhile, I brought out the expression using OR from inside the JOIN
> expression to be used in the filtering expression {in the WHERE clause },
> but I see some difference in the results produced with what is expected.
> Since,  I'm not using an OUTER join, I expected desired results. Any idea
> why is the OR expression in filtering not working as desired ? Any thought
> on this are welcome.
> 
> Thanks,
> MIS.
> 
> 
> On Wed, Mar 23, 2011 at 10:28 AM, Ning Zhang  wrote:
> 
>> Joins with OR conditions are not supported by Hive currently. I think even
>> though you rewrite the condition to use NOT and AND only, the results may be
>> wrong.
>> 
>> It is quite hard to implement joins of any tables with OR conditions in a
>> MapReduce framework. it is straightforward to implement it in nested-loop
>> join, but due to the nature of distributed processing, nested loop join
>> cannot be implemented in an efficient and scalable way in MapReduce. In
>> nested-loop join, each mapper need to join a split of LHS table with the
>> whole RHS table which could be terabytes.
>> 
>> The regular (reduce-side) join in Hive is essentially a sort-merge join
>> operator. With that in mind, it's hard to implement OR conditions in the
>> sort-merge join.
>> 
>> One exception is the map-side join, which assumes the RHS table is small
>> and will be read fully into each mapper. Currently map-side join in Hive is
>> a hash-based join operator. You can implement a nested-loop map-side join
>> operator to enable any join conditions including OR.
>> 
>> On Mar 22, 2011, at 1:39 AM, MIS wrote:
>> 
>>> Found it at  *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java* line
>>> no. 1122
>>> There is some concern mentioned that supporting OR would lead to data
>>> explosion. Is it discussed/documneted in a little more detail somewhere ?
>> If
>>> so, some pointers towards the same will be helpful.
>>> 
>>> Thanks,
>>> MIS.
>>> 
>>> On Tue, Mar 22, 2011 at 1:19 PM, MIS  wrote:
>>> 
 I want to use OR in the join expression, but it seems only AND is
>> supported
 as of now.
 I have a work around though to use DeMorgan's law {C1 OR C2 = !(!C1 AND
 !C2))} , but it would be nice if somebody can point me to the location
>> in
 code base that would need modification to support the OR in the join
 expression.
 
 Thanks,
 MIS.
 
>> 
>> 



Re: Work around for using OR in Joins

2011-03-23 Thread MIS
Ning, Thanks for the reply.
Yes. you are right. Using NOT and AND didn't work as expected.
I'll give a try in implementing nested-loop map-side join.

In the meanwhile, I brought out the expression using OR from inside the JOIN
expression to be used in the filtering expression {in the WHERE clause },
but I see some difference in the results produced with what is expected.
Since,  I'm not using an OUTER join, I expected desired results. Any idea
why is the OR expression in filtering not working as desired ? Any thought
on this are welcome.

Thanks,
MIS.


On Wed, Mar 23, 2011 at 10:28 AM, Ning Zhang  wrote:

> Joins with OR conditions are not supported by Hive currently. I think even
> though you rewrite the condition to use NOT and AND only, the results may be
> wrong.
>
> It is quite hard to implement joins of any tables with OR conditions in a
> MapReduce framework. it is straightforward to implement it in nested-loop
> join, but due to the nature of distributed processing, nested loop join
> cannot be implemented in an efficient and scalable way in MapReduce. In
> nested-loop join, each mapper need to join a split of LHS table with the
> whole RHS table which could be terabytes.
>
> The regular (reduce-side) join in Hive is essentially a sort-merge join
> operator. With that in mind, it's hard to implement OR conditions in the
> sort-merge join.
>
> One exception is the map-side join, which assumes the RHS table is small
> and will be read fully into each mapper. Currently map-side join in Hive is
> a hash-based join operator. You can implement a nested-loop map-side join
> operator to enable any join conditions including OR.
>
> On Mar 22, 2011, at 1:39 AM, MIS wrote:
>
> > Found it at  *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java* line
> > no. 1122
> > There is some concern mentioned that supporting OR would lead to data
> > explosion. Is it discussed/documneted in a little more detail somewhere ?
> If
> > so, some pointers towards the same will be helpful.
> >
> > Thanks,
> > MIS.
> >
> > On Tue, Mar 22, 2011 at 1:19 PM, MIS  wrote:
> >
> >> I want to use OR in the join expression, but it seems only AND is
> supported
> >> as of now.
> >> I have a work around though to use DeMorgan's law {C1 OR C2 = !(!C1 AND
> >> !C2))} , but it would be nice if somebody can point me to the location
> in
> >> code base that would need modification to support the OR in the join
> >> expression.
> >>
> >> Thanks,
> >> MIS.
> >>
>
>


Re: Work around for using OR in Joins

2011-03-22 Thread Ning Zhang
Joins with OR conditions are not supported by Hive currently. I think even 
though you rewrite the condition to use NOT and AND only, the results may be 
wrong. 
 
It is quite hard to implement joins of any tables with OR conditions in a 
MapReduce framework. it is straightforward to implement it in nested-loop join, 
but due to the nature of distributed processing, nested loop join cannot be 
implemented in an efficient and scalable way in MapReduce. In nested-loop join, 
each mapper need to join a split of LHS table with the whole RHS table which 
could be terabytes. 

The regular (reduce-side) join in Hive is essentially a sort-merge join 
operator. With that in mind, it's hard to implement OR conditions in the 
sort-merge join. 

One exception is the map-side join, which assumes the RHS table is small and 
will be read fully into each mapper. Currently map-side join in Hive is a 
hash-based join operator. You can implement a nested-loop map-side join 
operator to enable any join conditions including OR. 

On Mar 22, 2011, at 1:39 AM, MIS wrote:

> Found it at  *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java* line
> no. 1122
> There is some concern mentioned that supporting OR would lead to data
> explosion. Is it discussed/documneted in a little more detail somewhere ? If
> so, some pointers towards the same will be helpful.
> 
> Thanks,
> MIS.
> 
> On Tue, Mar 22, 2011 at 1:19 PM, MIS  wrote:
> 
>> I want to use OR in the join expression, but it seems only AND is supported
>> as of now.
>> I have a work around though to use DeMorgan's law {C1 OR C2 = !(!C1 AND
>> !C2))} , but it would be nice if somebody can point me to the location in
>> code base that would need modification to support the OR in the join
>> expression.
>> 
>> Thanks,
>> MIS.
>> 



Re: Work around for using OR in Joins

2011-03-22 Thread MIS
Found it at  *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java* line
no. 1122
There is some concern mentioned that supporting OR would lead to data
explosion. Is it discussed/documneted in a little more detail somewhere ? If
so, some pointers towards the same will be helpful.

Thanks,
MIS.

On Tue, Mar 22, 2011 at 1:19 PM, MIS  wrote:

> I want to use OR in the join expression, but it seems only AND is supported
> as of now.
> I have a work around though to use DeMorgan's law {C1 OR C2 = !(!C1 AND
> !C2))} , but it would be nice if somebody can point me to the location in
> code base that would need modification to support the OR in the join
> expression.
>
> Thanks,
> MIS.
>


Work around for using OR in Joins

2011-03-22 Thread MIS
I want to use OR in the join expression, but it seems only AND is supported
as of now.
I have a work around though to use DeMorgan's law {C1 OR C2 = !(!C1 AND
!C2))} , but it would be nice if somebody can point me to the location in
code base that would need modification to support the OR in the join
expression.

Thanks,
MIS.