[GENERAL] Statistics and PostgreSQL: Streaming Webcast tonight
Folks, For those of you who can't attend in person, we'll be streaming audio and video and having a chat for tonight's SFPUG meeting on how the planner uses statistics. Video: http://media.postgresql.org/sfpug/streaming Chat: irc://irc.freenode.net/sfpug Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Statistics with PostgreSQL
Mean is just sum(col)/count(col) You can also just use avg(col). Either way, be careful because nulls may not be treated as you want for such calculations. The stats package R can access Postgres databases, and can be used for robust statistical analyses of the data. See: http://sourceforge.net/projects/rdbi/ Cheers, Brent Wood ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Statistics with PostgreSQL
Hi All, Is there a way to simple statistics like mean/median/mode in PostgreSQL. I have tables like PsetID | IntensityValue. I want to find out mean (intensityValue) of some PsetID(s)?! Any urls/pointers/books would be a big help. Thanks, Hrishi ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Statistics with PostgreSQL
On Fri, Mar 18, 2005 at 01:37:10PM -0500, Hrishikesh Deshmukh wrote: Hi All, Is there a way to simple statistics like mean/median/mode in PostgreSQL. I have tables like PsetID | IntensityValue. I want to find out mean (intensityValue) of some PsetID(s)?! Any urls/pointers/books would be a big help. Hrishi, For statistics beyond avg() and stddev(), check out PL/R http://www.joeconway.com/plr/ Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote! ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Statistics with PostgreSQL
Mean is just sum(col)/count(col) Mode can be calculated with having, max, count Median can be computed by sorting, using a cursor, and going to the middle. There are more efficient and better (more accurate) ways to do it, but those have to be implemented at a low level. Of course, since you have libpq, anything is possible. If you want to implement these things at a low level to get better answers, Kahan (or compenstated) summation is a good idea, and do the summation into a larger type to prevent loss of precision. Here are some statistical templates I wrote that are free for any purpose you like: ftp://cap.connx.com/pub/tournament_software/Kahan.Hpp ftp://cap.connx.com/pub/tournament_software/STATS.HPP The Cephes collection by Moshier has good extended precision types, if you need to carefully avoid any PLOSS. For median, the QuickSelect algorithm is very good. Here is an implementation I wrote in C++: #include cstdio #include cstdlib #include iostream using namespace std; /* ** ** In the following code, every reference to CLR means: ** **Introduction to Algorithms **By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest **ISBN 0-07-013143-0 */ /* ** CLR, page 187 */ template class Etype Etype RandomSelect(Etype A[], size_t p, size_t r, size_t i) { size_t q, k; if (p == r) return A[p]; q = RandomPartition(A, p, r); k = q - p + 1; if (i = k) return RandomSelect(A, p, q, i); else return RandomSelect(A, q + 1, r, i - k); } size_t RandRange(size_t a, size_t b) { size_t c = (size_t) ((double) rand() / ((double) RAND_MAX + 1) * (b - a)); return c + a; } /* ** CLR, page 162 */ template class Etype size_t RandomPartition(Etype A[], size_t p, size_t r) { size_t i = RandRange(p, r); Etype Temp; Temp = A[p]; A[p] = A[i]; A[i] = Temp; return Partition(A, p, r); } /* ** CLR, page 154 */ template class Etype size_t Partition(Etype A[], size_t p, size_t r) { Etype x, temp; size_t i, j; x = A[p]; i = p - 1; j = r + 1; for (;;) { do { j--; } while (!(A[j] = x)); do { i++; } while (!(A[i] = x)); if (i j) { temp = A[i]; A[i] = A[j]; A[j] = temp; } else return j; } } double data[30]; int main(void) { size_t i; size_t size = sizeof(data) / sizeof(data[0]); for (i = 0; i size; i++) { data[i] = rand(); } for (i = 0; i size; i++) { cout data[i] endl; } cout 1st item is RandomSelect(data, 0, size - 1, 0) endl; cout 2nd item is RandomSelect(data, 0, size - 1, 1) endl; cout 3rd item is RandomSelect(data, 0, size - 1, 2) endl; for (i = 4; i size; i++) cout i th item is RandomSelect(data, 0, size - 1, i) endl; return 0; } -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hrishikesh Deshmukh Sent: Friday, March 18, 2005 10:37 AM To: Postgresql-General Subject: [GENERAL] Statistics with PostgreSQL Hi All, Is there a way to simple statistics like mean/median/mode in PostgreSQL. I have tables like PsetID | IntensityValue. I want to find out mean (intensityValue) of some PsetID(s)?! Any urls/pointers/books would be a big help. Thanks, Hrishi ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Statistics with PostgreSQL
/* On the test stub: */ [snip] double data[30]; int main(void) { size_t i; size_t size = sizeof(data) / sizeof(data[0]); for (i = 0; i size; i++) { data[i] = rand(); } for (i = 0; i size; i++) { cout data[i] endl; } cout 1st item is RandomSelect(data, 0, size - 1, 0) endl; cout 2nd item is RandomSelect(data, 0, size - 1, 1) endl; cout 3rd item is RandomSelect(data, 0, size - 1, 2) endl; for (i = 4; i size; i++) cout i th item is RandomSelect(data, 0, size - 1, i) endl; return 0; } /* The positions are cardinal, so the first three queries above should be: cout 1st item is RandomSelect(data, 0, size - 1, 1) endl; cout 2nd item is RandomSelect(data, 0, size - 1, 2) endl; cout 3rd item is RandomSelect(data, 0, size - 1, 3) endl; */ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly