[GENERAL] Statistics and PostgreSQL: Streaming Webcast tonight

2009-09-08 Thread David Fetter
Folks,

For those of you who can't attend in person, we'll be streaming audio
and video and having a chat for tonight's SFPUG meeting on how the
planner uses statistics.

Video:

http://media.postgresql.org/sfpug/streaming

Chat:

irc://irc.freenode.net/sfpug

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Statistics with PostgreSQL

2005-03-20 Thread Brent Wood


 Mean is just sum(col)/count(col)

You can also just use avg(col).

Either way, be careful because nulls may not be treated as you want for
such calculations.

The stats package R can access Postgres databases, and can be used for
robust statistical analyses of the data.

See:
http://sourceforge.net/projects/rdbi/


Cheers,

  Brent Wood

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[GENERAL] Statistics with PostgreSQL

2005-03-18 Thread Hrishikesh Deshmukh
Hi All,

Is there a way to simple statistics like mean/median/mode in PostgreSQL.
I have tables like PsetID | IntensityValue. I want to find out mean
(intensityValue) of some PsetID(s)?!
Any urls/pointers/books would be a big help.

Thanks,
Hrishi

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Statistics with PostgreSQL

2005-03-18 Thread David Fetter
On Fri, Mar 18, 2005 at 01:37:10PM -0500, Hrishikesh Deshmukh wrote:
 Hi All,
 
 Is there a way to simple statistics like mean/median/mode in
 PostgreSQL.  I have tables like PsetID | IntensityValue. I want to
 find out mean (intensityValue) of some PsetID(s)?!

 Any urls/pointers/books would be a big help.

Hrishi,

For statistics beyond avg() and stddev(), check out PL/R
http://www.joeconway.com/plr/

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] Statistics with PostgreSQL

2005-03-18 Thread Dann Corbit
Mean is just sum(col)/count(col)
Mode can be calculated with having, max, count
Median can be computed by sorting, using a cursor, and going to the
middle.

There are more efficient and better (more accurate) ways to do it, but
those have to be implemented at a low level.  Of course, since you have
libpq, anything is possible.

If you want to implement these things at a low level to get better
answers, Kahan (or compenstated) summation is a good idea, and do the
summation into a larger type to prevent loss of precision.

Here are some statistical templates I wrote that are free for any
purpose you like:
ftp://cap.connx.com/pub/tournament_software/Kahan.Hpp
ftp://cap.connx.com/pub/tournament_software/STATS.HPP

The Cephes collection by Moshier has good extended precision types, if
you need to carefully avoid any PLOSS.

For median, the QuickSelect algorithm is very good.  Here is an
implementation I wrote in C++:

#include cstdio
#include cstdlib
#include iostream

using namespace std;

/*
**
** In the following code, every reference to CLR means:
**
**Introduction to Algorithms
**By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
**ISBN 0-07-013143-0
*/


/*
** CLR, page 187
*/
template  class Etype 
Etype   
RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
size_t  q,
k;
if (p == r)
return A[p];
q = RandomPartition(A, p, r);
k = q - p + 1;

if (i = k)
return RandomSelect(A, p, q, i);
else
return RandomSelect(A, q + 1, r, i - k);
}

size_t  RandRange(size_t a, size_t b)
{
size_t  c = (size_t) ((double) rand() / ((double) RAND_MAX +
1) * (b - a));
return c + a;
}

/*
** CLR, page 162
*/
template  class Etype 
size_t  
RandomPartition(Etype A[], size_t p, size_t r)
{
size_t  i = RandRange(p, r);
Etype   Temp;
Temp = A[p];
A[p] = A[i];
A[i] = Temp;
return Partition(A, p, r);
}

/*
** CLR, page 154
*/
template  class Etype 
size_t  
Partition(Etype A[], size_t p, size_t r)
{
Etype   x,
temp;
size_t  i,
j;

x = A[p];
i = p - 1;
j = r + 1;

for (;;) {
do {
j--;
} while (!(A[j] = x));
do {
i++;
} while (!(A[i] = x));
if (i  j) {
temp = A[i];
A[i] = A[j];
A[j] = temp;
} else
return j;
}
}

double   data[30];
int main(void)
{
size_t  i;
size_t  size = sizeof(data) / sizeof(data[0]);
for (i = 0; i  size; i++) {
data[i] = rand();
}


for (i = 0; i  size; i++) {
cout  data[i]  endl;
}

cout  1st item is   RandomSelect(data, 0, size - 1, 0) 
endl;
cout  2nd item is   RandomSelect(data, 0, size - 1, 1) 
endl;
cout  3rd item is   RandomSelect(data, 0, size - 1, 2) 
endl;
for (i = 4; i  size; i++)
cout   i  th item is   RandomSelect(data, 0, size - 1,
i)  endl;
return 0;
}

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Hrishikesh
Deshmukh
Sent: Friday, March 18, 2005 10:37 AM
To: Postgresql-General
Subject: [GENERAL] Statistics with PostgreSQL

Hi All,

Is there a way to simple statistics like mean/median/mode in PostgreSQL.
I have tables like PsetID | IntensityValue. I want to find out mean
(intensityValue) of some PsetID(s)?!
Any urls/pointers/books would be a big help.

Thanks,
Hrishi

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] Statistics with PostgreSQL

2005-03-18 Thread Dann Corbit
/* On the test stub: */
[snip]
double   data[30];
int main(void)
{
size_t  i;
size_t  size = sizeof(data) / sizeof(data[0]);
for (i = 0; i  size; i++) {
data[i] = rand();
}


for (i = 0; i  size; i++) {
cout  data[i]  endl;
}

cout  1st item is   RandomSelect(data, 0, size - 1, 0) 
endl;
cout  2nd item is   RandomSelect(data, 0, size - 1, 1) 
endl;
cout  3rd item is   RandomSelect(data, 0, size - 1, 2) 
endl;
for (i = 4; i  size; i++)
cout   i  th item is   RandomSelect(data, 0, size - 1,
i)  endl;
return 0;
}

/*
The positions are cardinal, so the first three queries above should be:
cout  1st item is   RandomSelect(data, 0, size - 1, 1) 
endl;
cout  2nd item is   RandomSelect(data, 0, size - 1, 2) 
endl;
cout  3rd item is   RandomSelect(data, 0, size - 1, 3) 
endl;
*/


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly