I suspect very few people are still using Hive 0.6 or older. Try upgrading.


From: Florin Diaconeasa [mailto:[email protected]]
Sent: Monday, October 31, 2011 6:37 AM
To: [email protected]
Subject: High number of input files problems

Hello,

Lately our user base has grown, so our input files have increased considerably
in both size and number.

One of our processing steps runs a query of the form shown at the end of this
email. The problem is that the processing sometimes appears to miss some of the
input files (in most cases, those read by the second SELECT).

I'm using Hive 0.6 and Hadoop 0.20.2 on 64-bit Debian 5, and we connect to a
Hive server instance over JDBC. Does anyone know which parameters I could tune,
or of any tickets that have been opened for this problem? I have searched the
Hive JIRA but found nothing so far. The only issue I think might be related is
https://issues.apache.org/jira/browse/HIVE-1884

SELECT
            t.a,
            sum(t.b),
            sum(t.c),
            sum(t.d)
FROM
(
            SELECT
                        a,
                        sum(x) as b,
                        sum(y) as c,
                        sum(z) as d
            FROM T1
            WHERE ...
            GROUP BY ...

UNION ALL

            SELECT
                        a,
                        sum(x) as b,
                        sum(y) as c,
                        sum(z) as d
            FROM T2
            WHERE ...
            GROUP BY ...

UNION ALL

            SELECT
                        a,
                        sum(x) as b,
                        sum(y) as c,
                        sum(z) as d
            FROM T3
            WHERE ...
            GROUP BY ...
) t

GROUP BY ...
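One way to confirm whether input is really being skipped is to run each branch
on its own, re-aggregate those per-key results yourself, and compare them with
the output of the full UNION ALL query. Below is a minimal Python sketch of that
comparison. It assumes that every GROUP BY in the query is on `a` (the query
above elides them). The function names and sample rows are hypothetical.

```python
from collections import defaultdict

def group_sums(rows):
    """Sum each value column per key; rows are (key, v1, v2, v3) tuples.

    Models one inner SELECT (sum(x), sum(y), sum(z) per a) and also the
    outer SELECT over the UNION ALL, ASSUMING both group by `a`.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for key, *values in rows:
        for i, v in enumerate(values):
            totals[key][i] += v
    return {k: tuple(v) for k, v in totals.items()}

def expected_result(*branches):
    """UNION ALL the per-branch results, then re-aggregate them per key."""
    unioned = [(k, *v) for branch in branches for k, v in branch.items()]
    return group_sums(unioned)

def mismatches(actual, expected):
    """Keys whose totals differ; a shortfall suggests skipped input."""
    keys = actual.keys() | expected.keys()
    return sorted(k for k in keys if actual.get(k) != expected.get(k))

# Hypothetical per-branch results, as if each SELECT had been run alone.
t1 = group_sums([("u1", 1, 2, 3), ("u2", 4, 5, 6)])
t2 = group_sums([("u1", 10, 20, 30)])
t3 = group_sums([("u2", 1, 1, 1)])
expected = expected_result(t1, t2, t3)

# Simulate a union job that silently dropped T2's input.
actual = expected_result(t1, t3)
print(mismatches(actual, expected))  # ['u1']
```

The sample uses integer sums so that exact comparison is safe. With
floating-point columns, compare with a small tolerance instead.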



--


Florin
