On Mon, May 18, 2009 at 6:47 AM, Kenneth Marshall <k...@rice.edu> wrote:
> On Sun, May 17, 2009 at 10:04:46PM -0600, Rob Sargent wrote:
> > Can one extrapolate future performance expectations for ever-growing
> > tables from a given (non-trivial) data set, and if so with what curve?
> > Corollary: what would one expect a performance curve to approximate in
> > terms of query execution time v. number of data rows (hardware, load
> > staying constant)?
> >
> > I have user and group information on system usage. I would like to be
> > able to do year-to-date counts per user given a single group id, but in
> > the data for one business quarter the query is taking between 10 and 60+
> > seconds, depending both on the size of the group and on the group's total
> > usage. Groups typically have 10-100 users and consume 20K - 80K records
> > in a 9M-record data set. The group id column is indexed, but it is not
> > the primary index. (Sad note: two pseudo-groups account for 50 percent of
> > the total records, IIRC, and will never be used for the usage-by-group
> > query below.)
> >
> > This is a single-table query:
> >
> > select user_id, element_type, count(*)
> > from dataset
> > where group_id = N
> > group by user_id, element_type
> > order by user_id, element_type
> >
> > Is this the sort of situation which might benefit from increasing the
> > number of histogram bins (alter table alter column statistics (N>10))?
> >
> > Any and all pointers appreciated,
> > Rob
>
> Rob,
>
> What about partitioning the table based on the group? Then you could
> put the two pseudo-groups in their own separate child table.
>
> Cheers,
> Ken

Ken,

Interesting point. I'm hoping the pseudo-groups are on the verge of
extinction but will certainly investigate the possibility.

R.
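For readers following the thread, the two ideas discussed above can be sketched in SQL. This is only an illustration, not Rob's actual schema: it assumes the table and columns from Rob's query, uses the inheritance-based partitioning available in the PostgreSQL releases current at the time (8.x), and invents pseudo-group ids (1 and 2) purely as placeholders.

```sql
-- Rob's question first: raise the statistics target (histogram bins) for
-- group_id above the 8.x default of 10, then re-analyze so the planner
-- sees the skewed distribution. The value 100 is just an example.
ALTER TABLE dataset ALTER COLUMN group_id SET STATISTICS 100;
ANALYZE dataset;

-- Ken's suggestion: inheritance partitioning that isolates the two
-- pseudo-groups (ids 1 and 2 are hypothetical) in their own child table,
-- so the usage-by-group query only touches real groups.
CREATE TABLE dataset_pseudo (
    CHECK (group_id IN (1, 2))
) INHERITS (dataset);

CREATE TABLE dataset_real (
    CHECK (group_id NOT IN (1, 2))
) INHERITS (dataset);

CREATE INDEX dataset_real_group_idx ON dataset_real (group_id);

-- With constraint_exclusion enabled, the planner can prove that a query
-- such as  WHERE group_id = 42  contradicts dataset_pseudo's CHECK
-- constraint and skip that child table entirely.
SET constraint_exclusion = on;
```

Rows would also need to be routed into the correct child table (via a trigger or by loading them directly), and existing data migrated out of the parent; those steps are omitted here.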