As you expected, there is no documentation on it (like other optimizers) Javadoc of the class might be helpful but seemed not in detail enough.
2013/7/2 Peter Marron <peter.mar...@trilliumsoftware.com>: > Thanks Navis, > > This is a very interesting class which I feel pretty sure that I would never > have found. > Are there any descriptions, motivations, documentation or examples anywhere? > I suspect that there's nothing other than the source itself, but I had to ask. > > Regards, > > Z > -----Original Message----- > From: Navis류승우 [mailto:navis....@nexr.com] > Sent: 02 July 2013 08:50 > To: user@hive.apache.org > Subject: Re: Override COUNT() function > > MetadataOnlyOptimizer changes GBY on partition columns to simple TableScan > with one line dummy. > > I think similar things can be done with stats. > > 2013/6/28 Peter Marron <peter.mar...@trilliumsoftware.com>: >> Hi, >> >> >> >> I feel sure that someone has asked for this before, but here goes… >> >> >> >> In the case where I have the query >> >> >> >> SELECT COUNT(*) FROM table; >> >> >> >> There are many cases where I can determine the count immediately. >> >> (For example if I have run something like: >> >> >> >> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], >> ...)] COMPUTE STATISTICS [noscan]; >> >> >> >> then there seems to be a table property “numRows” which holds a count >> of the number of rows. >> >> Now I know that the COUNT function can’t always be determined easily. >> >> If the query is more complicated, like >> >> >> >> SELECT COUNT(*) FROM table GROUP BY column; >> >> >> >> then obviously a simple scalar count is of no real use. But is there >> some way >> >> to intercept the simple case and avoid running a table scan? >> >> >> >> One problem that I see is that the COUNT function is a UDAF and I am >> >> assuming that the presence of any aggregate function like this is >> enough >> >> to force the query planner to require a Map/Reduce. Is there anyway >> >> to make the function look like a simple UDF for some queries? Or >> >> just for some tables? I guess that I’d be prepared to sacrifice the >> full >> >> generality of the normal COUNT function for one which >> >> only functions correctly for the simple query on my tables. >> >> >> >> So is it possible to have a different COUNT function only on certain tables? >> >> >> >> Regards, >> >> >> >> Z >> >> >