Ugly as hell, but should work.
Dudu
SELECT r_id,
CASE WHEN table1.property_value = 'False' THEN FALSE
WHEN table1.property_value = 'True' THEN TRUE
WHEN r.rea < rg.laa THEN FALSE
WHEN r.rea >= rg.laa THEN TRUE
ELSE FALSE END AS flag
FROM rs r
LEFT JOIN public.di_re rg
ON r.re = rg.re
LEFT JOIN (select r.r_id, table1.property_value
from rs r
join public.tbl table1
ON r.re = table1.re
where table1.property_name = 'xxxx'
AND r.rea BETWEEN table1.begin_time AND table1.end_time
) table1
on r.r_id = table1.r_id
From: Goden Yao [mailto:[email protected]]
Sent: Saturday, November 05, 2016 9:22 AM
To: [email protected]
Subject: Hive Left Join inequality condition
Hello!
Lately we have ran into the need to implement inequality JOIN in Hive, and we
could have easily done that with WHERE clause, if it was not the LEFT join.
Basically, we wonder how people implement LEFT/RIGHT JOIN with inequality
conditions in Hive without loss of efficiency.
Thank you.
Example:
SELECT r_id,
CASE WHEN table1.property_value = 'False' THEN FALSE
WHEN table1.property_value = 'True' THEN TRUE
WHEN r.rea < rg.laa THEN FALSE
WHEN r.rea >= rg.laa THEN TRUE
ELSE FALSE END AS flag
FROM rs r
LEFT JOIN public.di_re rg
ON r.re<http://r.re> = rg.re<http://rg.re>
LEFT JOIN public.tbl table1
ON r.re<http://r.re> = table1.re<http://table1.re>
AND table1.property_name = 'xxxx'
AND r.rea BETWEEN table1.begin_time AND table1.end_time
Error:
FAILED: SemanticException Line 0:-1 Both left and right aliases encountered in
JOIN ...
Ways to resolve:
· Move inequality condition in WHERE clause:
· WHERE r.rea BETWEEN table1.begin_time AND table1.end_time
· WARNING: Affects query logic - filters all the table instead of filtering
LEFT JOIN clause only;
· Move condition into SELECT field with CASE statement (if possible):
· SELECT r_id,
· CASE WHEN table1.property_value = 'False'
· AND r.rea BETWEEN table1.begin_time AND table1.end_time
THEN FALSE
· WHEN table1.property_value = 'True'
· AND r.rea BETWEEN table1.begin_time AND table1.end_time
THEN TRUE
Not possible in every case;
· Divide queries into two separate statements and UNION them: one query
with WHERE filter and another query totally omitting the JOIN to table that
needed inequality as well as omitting the ids from the first query:
· WITH stage AS (
· SELECT r_id,
· CASE WHEN table1.property_value = 'False' THEN FALSE
· WHEN table1.property_value = 'True' THEN TRUE
· WHEN r.rea < rg.laa THEN FALSE
· WHEN r.rea >= rg.laa THEN TRUE
· ELSE FALSE END as flag
· FROM rs r
· LEFT JOIN public.di_re rg
· ON r.re<http://r.re> = rg.re<http://rg.re>
· LEFT JOIN public.tbl table1
· ON r.region = table1.region
· AND table1.property_name = 'xxxx'
· WHERE r.rea BETWEEN table1.begin_time AND table1.end_time
· )
· SELECT * FROM stage
· UNION
· SELECT r_id,
· CASE WHEN r.rea < rg.laa THEN FALSE
· WHEN r.rea >= rg.laa THEN TRUE
· ELSE FALSE END as flag
· FROM rs r
· LEFT JOIN public.di_re rg
· ON r.re<http://r.re> = rg.re<http://rg.re>
· WHERE r_id NOT IN (SELECT DISTINCT r_id from stage)
Very expensive in terms of calculation, but in some cases inevitable.