On Tue, 11 Feb 2003, Joe Hellerstein wrote:

> Hi all:
>
> I emailed Marc Fournier on this topic some weeks back, but haven't
> heard from him.

My most public apologies for that ... this past month has been a
complete nightmare all around ... we're just finishing up moving our
office, and finally have phone lines again, and hope to have internet
again starting tomorrow ... :(

> 1) We changed the course projects to make the students hack PostgreSQL
>    internals, rather than the "minibase" eduware
> 2) We are coordinating the class with a class at CMU being taught by
>    Prof. Anastassia ("Natassa") Ailamaki
>
> Our "Homework 2", which is being passed out this week, will ask the
> students to implement a hash-based grouping that spills to disk. I
> understand this topic has been batted about the pgsql-hackers list
> recently. The TAs who've prepared the assignment (Sailesh Krishnamurthy
> at Berkeley and Spiros Papadimitriou at CMU) have also implemented a
> reference solution to the assignment. Once we've got the students'
> projects all turned in, we'll be very happy to contribute our code back
> to the PostgreSQL project.
>
> I'm hopeful this will lead to many good things:
>
> 1) Each year we can pick another feature to assign in class, and
>    contribute back. We'll need to come up with well-scoped engine
>    features that exercise concepts from the class -- eventually we'll
>    run out of tractable things that PGSQL needs, but not in the next
>    couple of years, I bet.
>
> 2) We'll raise a crop of good students who know Postgres internals.
>    Roughly half the Berkeley EECS undergrads take the DB class, and
>    all of them will be post-hackers! (Again, I don't know the stats
>    at CMU.)
>
> So consider this a heads-up on the hash-agg front, and on the future
> contributions front. I'll follow up with another email on
> PostgreSQL-centered research in our group at Berkeley as well.
>
> Another favor I'd ask is that people on the list be a bit hesitant
> about helping our students with their homework! We would like them to
> do it themselves, more or less :-)
>
> Regards,
> Joe Hellerstein
>
> --
> Joseph M. Hellerstein
> Professor, EECS Computer Science Division
> UC Berkeley
> http://www.cs.berkeley.edu/~jmh
>
>
> On Tuesday, February 11, 2003, at 06:54 PM, Sailesh Krishnamurthy wrote:
>
> > From: Hannu Krosing <[EMAIL PROTECTED]>
> > Date: Tue Feb 11, 2003 12:21:26 PM US/Pacific
> > To: Tom Lane <[EMAIL PROTECTED]>
> > Cc: Bruno Wolff III <[EMAIL PROTECTED]>, Greg Stark <[EMAIL PROTECTED]>,
> >     [EMAIL PROTECTED]
> > Subject: Re: [HACKERS] Hash grouping, aggregates
> >
> > Tom Lane wrote on Tue, 11.02.2003 at 18:39:
> >> Bruno Wolff III <[EMAIL PROTECTED]> writes:
> >>> Tom Lane <[EMAIL PROTECTED]> wrote:
> >>>> Greg Stark <[EMAIL PROTECTED]> writes:
> >>>>> The neat thing is that hash aggregates would allow grouping on
> >>>>> data types that have = operators but no useful < operator.
> >>>>
> >>>> Hm. Right now I think that would barf on you, because the parser
> >>>> wants to find the '<' operator to label the grouping column with,
> >>>> even if the planner later decides not to use it. It'd take some
> >>>> redesign of the query data structure (specifically
> >>>> SortClause/GroupClause) to avoid that.
> >>
> >>> I think another issue is that for some = operators you still might
> >>> not be able to use a hash. I would expect the discussion for hash
> >>> joins in
> >>> http://developer.postgresql.org/docs/postgres/xoper-optimization.html
> >>> would apply to hash aggregates as well.
> >>
> >> Right, the = operator must be hashable or you're out of luck. But we
> >> could imagine tweaking the parser to allow GROUP BY if it finds a
> >> hashable = operator and no sort operator. The only objection I can
> >> see to this is that it means the planner *must* use hash aggregation,
> >> which might be a bad move if there are too many distinct groups.
> >
> > If we run out of sort memory, we can always bail out later, preferably
> > with a descriptive error message. It is not as elegant as erring out
> > at parse (or even plan/optimise) time, but the result is /almost/ the
> > same.
> >
> > Relying on hash aggregation will become essential if we are ever going
> > to implement the "other" groupings (CUBE, ROLLUP, (), ...), so it
> > would be nice if hash aggregation could also overflow to disk - I
> > suspect that this will still be faster than running an independent
> > scan for each GROUP BY grouping and merging the results.
> >
> > -----
> > Hannu
>
> > --
> > Pip-pip
> > Sailesh
> > http://www.cs.berkeley.edu/~sailesh
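[Editor's note: the hybrid-hash idea discussed in the thread -- aggregate in
memory until the hash table fills, then partition non-resident keys to temp
files and aggregate each partition separately -- can be sketched as below.
This is a hypothetical Python illustration, not PostgreSQL's actual
implementation; the names `hash_group_sum`, `NPARTS`, and `max_groups` are
invented for the example. Note the keys only need `=` and a hash, no `<`
ordering, which is exactly Greg Stark's point.]

```python
import pickle
import tempfile

NPARTS = 8  # number of on-disk spill partitions (invented for this sketch)


def hash_group_sum(rows, max_groups=1024):
    """GROUP BY key, SUM(value) with a hash table that spills to disk.

    Pass 1: aggregate into an in-memory hash table until it holds
    max_groups distinct keys; after that, rows whose key is not already
    resident are hash-partitioned into temp files (resident keys keep
    updating in memory, so spilled keys are disjoint from resident ones).
    Pass 2: aggregate each spill partition independently and union the
    results. A real implementation would recurse with a different hash
    function if a partition itself still has too many distinct keys.
    """
    table = {}
    parts = None  # spill files, created lazily on first overflow
    for key, value in rows:
        if key in table:
            table[key] += value           # resident key: stay in memory
        elif parts is None and len(table) < max_groups:
            table[key] = value            # room for a new resident key
        else:
            if parts is None:             # first overflow: open temp files
                parts = [tempfile.TemporaryFile() for _ in range(NPARTS)]
            pickle.dump((key, value), parts[hash(key) % NPARTS])

    if parts is not None:
        for f in parts:
            f.seek(0)
            sub = {}
            while True:
                try:
                    k, v = pickle.load(f)
                except EOFError:
                    break
                sub[k] = sub.get(k, 0) + v
            f.close()
            table.update(sub)  # safe: spilled keys never entered `table`
    return table
```

Because only equality and hashing are used, this works for key types with a
hashable `=` operator but no usable `<`; a sort-based GROUP BY on such keys
has no ordering to fall back on.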