Re: [VOTE] Sponsoring Howl as an Apache Incubator project
Hi John,

Just to clarify where I was going with my line of questioning.

There's no Apache policy that prevents dependencies on an incubator project, whether it's releases, snapshots or even home-made, hacked-together packaging of an incubator project. It's been done before, and as long as the incubator code's IP has been cleared and the packaging isn't represented as an official release when it isn't one, there's nothing wrong with doing that.

Now, whether the project chooses to use and release with an incubator dependency is a matter of judgment (and ultimately a vote by committers if there is no consensus). I just wanted to make sure there were no incorrect assumptions made.

alex

On Thu, Feb 3, 2011 at 4:07 PM, John Sichi jsi...@fb.com wrote:
> I was going off of what I read in HADOOP-3676 (which lacks a reference as well). But I guess if a release can be made from the incubator, then it's not a blocker.
>
> JVS
>
> On Feb 3, 2011, at 3:29 PM, Alex Boisvert wrote:
>> On Thu, Feb 3, 2011 at 11:38 AM, John Sichi jsi...@fb.com wrote:
>>> Besides the fact that the refactoring required is significant, I don't think this is possible to do quickly since:
>>> 1) Hive (unlike Pig) requires a metastore
>>> 2) Hive releases can't depend on an incubator project
>>
>> I'm not sure what you mean by "can't depend on an incubator project" here. AFAIK, there is no policy at Apache that projects should not depend on incubator projects. Can you clarify what you mean and why you think such a restriction exists?
>>
>> alex
Re: Help with last 30 day unique user query
As far as I know, Hive has no built-in support for sliding-window analytics. There is an enhancement request here:
https://issues.apache.org/jira/browse/HIVE-896

Without such support, the brute-force way of doing things is:

    SELECT COUNT(DISTINCT user_id) FROM events WHERE event_date > start_date AND event_date <= end_date;

(repeated N times to cover each day of your time window).

alex

On Thu, Oct 14, 2010 at 11:36 PM, Vijay tec...@gmail.com wrote:
> Hi,
>
> I need help with this scenario. We have a table of events which has columns date, event (not important for this discussion), and user_id. It is obviously easy to find the number of unique users for each day. I also need to find the number of unique users in the last 30 days for each day. This is also quite simple to do for one day. However, I cannot figure out how to do this for a range of days. Something like this is pretty straightforward in most RDBMSs, but with HiveQL I'm finding this hard. I might be missing something simple though.
>
> Any help is appreciated. Ideally the query should also be as optimized as possible, as this table could be huge.
>
> Thanks,
> Vijay
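To make the brute-force approach from the reply in this thread concrete, here is a minimal Python sketch of what running one distinct count per day computes. It is only an illustration outside Hive, not something from the thread: the function name `trailing_unique_users`, the `window_days` parameter, and the in-memory list of `(event_date, user_id)` pairs are all hypothetical, and the window is assumed to be the `window_days` calendar days ending on each day, inclusive.

```python
from datetime import date, timedelta


def trailing_unique_users(events, window_days=30):
    """Count distinct users in the trailing window ending on each day.

    events: iterable of (event_date, user_id) pairs (hypothetical input shape).
    Returns a dict mapping each day between the first and last event date
    to the number of distinct users seen in the window_days days ending on
    that day, inclusive.
    """
    # Group user ids by day once, so each window is a union of small sets.
    users_by_day = {}
    for event_date, user_id in events:
        users_by_day.setdefault(event_date, set()).add(user_id)
    if not users_by_day:
        return {}

    first, last = min(users_by_day), max(users_by_day)
    one_day = timedelta(days=1)
    result = {}
    day = first
    while day <= last:
        # Brute force: rebuild the window from scratch for every day,
        # which mirrors issuing one COUNT(DISTINCT ...) query per day.
        seen = set()
        d = day - timedelta(days=window_days - 1)
        while d <= day:
            seen |= users_by_day.get(d, set())
            d += one_day
        result[day] = len(seen)
        day += one_day
    return result
```

Calling `trailing_unique_users(events)` on a list of `(date, user_id)` pairs returns one count per calendar day. The cost grows with the number of days times the window length, which is the same trade-off as repeating the query N times.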
UDAF modes
Hi,

I'm writing a UDAF and I'm a little unclear about the PARTIAL1, PARTIAL2, FINAL and COMPLETE modes. I've read the extent of the Javadoc ;) and looked at some of the built-in UDAFs in the Hive source tree, and I'm still unclear about the properties of the input data in each aggregation step.

Could anybody elaborate a little on the input data in each mode? Say, what are the safe assumptions for each mode given, e.g., a CLUSTERED BY clause?

thanks!
alex