The advice I have always seen for your case is to transform the subquery in
the WHERE clause into a LEFT OUTER JOIN.




On Thu, Jul 25, 2013 at 11:04 AM, Edson Ramiro <erlfi...@gmail.com> wrote:

> AFAIK,
>
> Hive supports subqueries only in the FROM clause.
>
> Maybe you have to split you query into more queries...
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
>
>
>
>
>        Edson Ramiro
>
>
> On Thu, Jul 25, 2013 at 9:31 AM, Jérôme Verdier <
> verdier.jerom...@gmail.com> wrote:
>
>> Hi Bennie,
>>
>> I was trying some solutions to pass through my problem, and a problem
>> occurs
>>
>> here is the error :
>>
>> FAILED: ParseException line 26:14 cannot recognize input near 'SELECT'
>> 'cal' '.' in expression specification
>>
>> Is AND...BETWEEN ( SELECT..... is possible in Hive?
>>
>>
>> 2013/7/25 Bennie Schut <bsc...@ebuddy.com>
>>
>>>  Hi Jerome,
>>>
>>> Yes it looks like you could stop using GET_SEMAINE  and directly joining
>>> "calendrier_hebdo" with "calendrier" for example. For "FCALC_IDJOUR" you
>>> will have to make a udf so I hope you have some java skills :)
>>> The "calendrier" tables suggests you have star schema with a calendar
>>> table. If on oracle you partitioned on a date and use a subquery to get the
>>> dates you want from the fact table you can expect this to be a problem in
>>> hive. Partition pruning works during planning it will not know which
>>> partitioned to prune and thus run on all the data in the fact table and
>>> filter after it's done making partitioning useless. There are ways of
>>> working around this, it seems most people decide to use a "deterministic"
>>> udf which produces the dates and this causes the udfs to be run during
>>> planning and partition pruning magically works again.
>>> Hope this helps.
>>>
>>> Bennie.
>>>
>>> Op 25-7-2013 09:50, Jérôme Verdier schreef:
>>>
>>>    Hi,
>>>
>>>  I need some help to translate a PL/SQL script in HiveQL.
>>>
>>>  Problem : my PL/SQL script is calling two functions.
>>>
>>>  you can see the script below :
>>>
>>>     SELECT
>>>       in_co_societe                             as co_societe,
>>>       'SEMAINE'                                 as co_type_periode,
>>>       a.type_entite                             as type_entite,
>>>       a.code_entite                             as code_entite,
>>>       a.type_rgrp_produits                      as type_rgrp_produits,
>>>       a.co_rgrp_produits                        as co_rgrp_produits,
>>>       SUM(a.MT_CA_NET_TTC)                      as MT_CA_NET_TTC,
>>>       SUM(a.MT_OBJ_CA_NET_TTC)                  as MT_OBJ_CA_NET_TTC,
>>>       SUM(a.NB_CLIENTS)                         as NB_CLIENTS,
>>>       SUM(a.MT_CA_NET_TTC_COMP)                 as MT_CA_NET_TTC_COMP,
>>>       SUM(a.MT_OBJ_CA_NET_TTC_COMP)             as
>>> MT_OBJ_CA_NET_TTC_COMP,
>>>       SUM(a.NB_CLIENTS_COMP)                    as NB_CLIENTS_COMP
>>>     from
>>>       kpi.thm_ca_rgrp_produits_jour/*@o_bi.match.eu*/ a
>>>     WHERE
>>>         a.co_societe = in_co_societe
>>>     AND a.dt_jour between
>>>       (
>>>         SELECT
>>>           cal.dt_jour_deb
>>>         FROM ods.calendrier_hebdo cal
>>>         WHERE cal.co_societe = in_co_societe
>>>         AND cal.co_an_semaine = ods.package_date.get_semaine(
>>>           ods.package_date.fcalc_idjour(
>>>             CASE
>>>               WHEN TO_CHAR(D_Dernier_Jour,'YYYY') =
>>> TO_CHAR(D_Dernier_Jour-364,'YYYY') THEN
>>>                 NEXT_DAY(D_Dernier_Jour-364,1)-7
>>>               ELSE
>>>                 D_Dernier_Jour-364
>>>             END
>>>           )
>>>         )
>>>       )
>>>       AND D_Dernier_Jour-364
>>>     -- On ne calcule rien si la semaine est compl¿¿te
>>>     AND (
>>>           TO_CHAR(D_Dernier_Jour,'DDMM') <> '3112'
>>>       AND TO_CHAR(D_Dernier_Jour,'D') <> '7'
>>>     )
>>>     GROUP BY
>>>       a.type_entite,
>>>       a.code_entite,
>>>       a.type_rgrp_produits,
>>>       a.co_rgrp_produits;
>>>
>>>  The function ods.package_date.get_semaine is :
>>>
>>> FUNCTION GET_SEMAINE
>>>        (ID_DEB  IN NUMBER)
>>>   RETURN NUMBER
>>>   IS
>>>     SEMAINE  NUMBER(10);
>>>   BEGIN
>>>     SELECT CO_AN_SEMAINE
>>>     INTO   SEMAINE
>>>     FROM   CALENDRIER
>>>     WHERE  ID_JOUR = ID_DEB;
>>>
>>>     RETURN (SEMAINE);
>>>   EXCEPTION
>>>     WHEN NO_DATA_FOUND THEN
>>>       RETURN (0);
>>>     WHEN OTHERS THEN
>>>       RETURN (0);
>>>   END;
>>>
>>>  The function ods.package_date.fcalc_idjour is below :
>>>
>>> FUNCTION FCALC_IDJOUR
>>>        (DATE_REFERENCE  IN DATE)
>>>   RETURN NUMBER
>>>   IS
>>>     NM_ANNEE        NUMBER := TO_NUMBER(TO_CHAR(DATE_REFERENCE,'YYYY'));
>>>     NM_MOIS         NUMBER :=
>>> TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMM'),5,2));
>>>     NM_JOUR         NUMBER :=
>>> TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMMDD'),7,2));
>>>     IDJOUR_CALCULE  NUMBER := 0;
>>>   BEGIN
>>>     IF NM_ANNEE < 1998
>>>         OR DATE_REFERENCE IS NULL THEN
>>>       IDJOUR_CALCULE := 0;
>>>     ELSE
>>>       IDJOUR_CALCULE := ((NM_ANNEE - 1998) * 600) + ((NM_MOIS - 01) *
>>> 50) + NM_JOUR;
>>>     END IF;
>>>
>>>     RETURN IDJOUR_CALCULE;
>>>     DBMS_OUTPUT.PUT_LINE(IDJOUR_CALCULE);
>>>   END FCALC_IDJOUR;
>>>
>>>  Is it possible to translate this in one  HiveQL script ?
>>>
>>>
>>>
>>
>>
>> --
>> *Jérôme VERDIER*
>> 06.72.19.17.31
>> verdier.jerom...@gmail.com
>>
>>
>


-- 


https://github.com/bearrito
@deepbearrito

Reply via email to