I typically rewrite my query so that it selects from a limited subset of the table instead of the whole table.
Change

  select really_expensive_select_clause
  from really_big_table
  where something=something
  group by something=something

to

  select really_expensive_select_clause
  from (
    select * from really_big_table limit 100
  ) t
  where something=something
  group by something=something

On Tue, Mar 5, 2013 at 10:57 AM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:
> Unfortunately, it will still go through the whole thing, then just limit the
> output. However, there's a flag that I think only works in more recent Hive
> releases:
>
> set hive.limit.optimize.enable=true
>
> This is supposed to apply limiting earlier in the data stream, so it will
> give different results than limiting just the output.
>
> Like Chuck said, you might consider sampling, but unless your table is
> organized into buckets, you'll at least scan the whole table, but maybe not
> do all computation over it ??
>
> Also, if you have a small sample data set:
>
> set hive.exec.mode.local.auto=true
>
> will cause Hive to bypass the Job and Task Trackers, calling APIs directly,
> when it can do the whole thing in a single process. Not "lightning fast",
> but faster.
>
> dean
>
> On Tue, Mar 5, 2013 at 12:48 PM, Joey D'Antoni <jdant...@yahoo.com> wrote:
>>
>> Just add a limit 1 to the end of your query.
>>
>>
>> On Mar 5, 2013, at 1:45 PM, Kyle B <kbi...@gmail.com> wrote:
>>
>> Hello,
>>
>> I was wondering if there is a way to quick-verify a Hive query before it
>> is run against a big dataset? The tables I am querying against have millions
>> of records, and I'd like to verify my Hive query before I run it against all
>> records.
>>
>> Is there a way to test the query against a small subset of the data,
>> without going into full MapReduce? As silly as this sounds, is there a way
>> to MapReduce without the overhead of MapReduce? That way I can check my
>> query is doing what I want before I run it against all records.
>>
>> Thanks,
>>
>> -Kyle
>
>
> --
> Dean Wampler, Ph.D.
> thinkbiganalytics.com
> +1-312-339-1330
>
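
For what it's worth, here is a rough sketch of how the subquery rewrite and the local-mode setting Dean mentions might be combined in one session. The table and column names (page_views, country, dt) are made up for illustration, and whether local mode actually kicks in depends on your Hive version and its input-size thresholds, so treat this as an assumption rather than a recipe:

```sql
-- Let Hive run small jobs in a single local process when it can
-- (setting taken from Dean's reply in this thread).
set hive.exec.mode.local.auto=true;

-- Hypothetical table and columns, for illustration only.
-- The inner limit caps the rows handed to the outer query.
select country, count(*) as views
from (
  select * from page_views limit 100
) t
where dt = '2013-03-05'
group by country;
```

One thing to watch: the where clause here is applied after the limit, and the 100 rows are arbitrary, so the sample may contain few or no matching rows. If that happens, moving the filter inside the subquery (before the limit) gives you up to 100 rows that do match. Either way this only checks that the query does what you expect, not what the real numbers will be.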