[SQL] Question on partitioning

2008-08-21 Thread Oliveiros Cristina
Hello , All.

I am not sure if this is the right mailing list to place this question.
If it doesn't, please kindly redirect me to the right list.

I have a giant table with about 10,6 million records.

Queries on it are usually slow, and if I try to do something more elaborate 
like an INNER JOIN with itself it becomes unnacceptably slow.

I am looking for a way to improve performance.
One of the columns is of type date. Each "day" includes about a few tens of 
thousands records
And the older  a date is the less likely I am to do queries on it.

The objective of the "self join" is to compare data from two different days, 
looking for diferences.

Ive read that one of the benefits of partitioning is to speed up queries by 
separating less used records.

My question is if partitioning can be a good way to make the queries faster 
(specially the self joins) or if it isn't worth trying because it doesn't help 
on my particular situation.

Please kindly advice me on this

Many thanks in advance for your kind help

Best,
Oliveiros


Re: [SQL] Question on partitioning

2008-08-21 Thread Scott Marlowe
On Thu, Aug 21, 2008 at 8:25 AM, Oliveiros Cristina
<[EMAIL PROTECTED]> wrote:
> Hello , All.
>
> I am not sure if this is the right mailing list to place this question.
> If it doesn't, please kindly redirect me to the right list.
>
> I have a giant table with about 10,6 million records.

10.6 million rows isn't really gigantic.  It's a good size, but unless
you're hitting a lot of the records at once, you should be able to get
pretty good performance if it's indexed properly.

> Queries on it are usually slow, and if I try to do something more elaborate
> like an INNER JOIN with itself it becomes unnacceptably slow.
>
> I am looking for a way to improve performance.
> One of the columns is of type date. Each "day" includes about a few tens of
> thousands records
> And the older  a date is the less likely I am to do queries on it.

Well, first let's look at your indexes and if they're being used.
Got an explain analyze output of your slow queries for us to look at?

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Question on partitioning

2008-08-21 Thread Mark Roberts

On Thu, 2008-08-21 at 15:25 +0100, Oliveiros Cristina wrote:
> Hello , All.
>  
> I am not sure if this is the right mailing list to place this
> question.
> If it doesn't, please kindly redirect me to the right list.
>  
> I have a giant table with about 10,6 million records.
>  
> Queries on it are usually slow, and if I try to do something more
> elaborate like an INNER JOIN with itself it becomes unnacceptably
> slow.
>  
> I am looking for a way to improve performance.
> One of the columns is of type date. Each "day" includes about a few
> tens of thousands records
> And the older  a date is the less likely I am to do queries on it.
>  
> The objective of the "self join" is to compare data from two different
> days, looking for diferences.
>  
> Ive read that one of the benefits of partitioning is to speed up
> queries by separating less used records.
>  
> My question is if partitioning can be a good way to make the queries
> faster (specially the self joins) or if it isn't worth trying because
> it doesn't help on my particular situation.
>  
> Please kindly advice me on this
>  
> Many thanks in advance for your kind help
>  
> Best,
> Oliveiros

I would expect partitioning to work.  I've heard tell that fine grained
partitioning coupled with check constraints can even eliminate the need
for certain indexes.  I do know that check constraints on the date will
help you tremendously if you decide to partition.

-Mark


-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Re: Efficiently determining the number of bits set in the contents of, a VARBIT field

2008-08-21 Thread Allan Kamau
Thank you TJ and everyone else for the advise and the c code. Today I 
did finally return to the 'number of bits set challenge' and managed to 
compile and link the nbits c function which went smoothly. However the 
function does crash my postgres server installation (8.3.3) with a 
segmentation fault each time I call it for example SELECT 
nbits_set(B'1101');
My C skills are very sparse and am unable to debug the function, I have 
included the C code of this function. Is there something I may have left 
out?




#include "postgres.h"
#include "utils/varbit.h"
#include "fmgr.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(nbits_set);
Datum
nbits_set(PG_FUNCTION_ARGS)
{
/* how many bits are set in a bitstring? */
   VarBit *a = PG_GETARG_VARBIT_P(0);
   int n=0;
   int i;
   unsigned char *ap = VARBITS(a);
   unsigned char aval;
   for (i=0; i < VARBITBYTES(a); ++i) {
   aval = *ap; ++ap;
   if (aval == 0) continue;
   if (aval & 1) ++n;
   if (aval & 2) ++n;
   if (aval & 4) ++n;
   if (aval & 8) ++n;
   if (aval & 16) ++n;
   if (aval & 32) ++n;
   if (aval & 64) ++n;
   if (aval & 128) ++n;
   }
   PG_RETURN_INT32(n);
}

Allan

Bruce Momjian wrote:

Jean-David Beyer wrote:
  

TJ O'Donnell wrote:


I use a c function, nbits_set that will do what you need.
I've posted the code in this email.

TJ O'Donnell
http://www.gnova.com

#include "postgres.h"
#include "utils/varbit.h"

Datum   nbits_set(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(nbits_set);
Datum
nbits_set(PG_FUNCTION_ARGS)
{
/* how many bits are set in a bitstring? */

 VarBit *a = PG_GETARG_VARBIT_P(0);
 int n=0;
 int i;
 unsigned char *ap = VARBITS(a);
 unsigned char aval;
 for (i=0; i < VARBITBYTES(a); ++i) {
 aval = *ap; ++ap;
 if (aval == 0) continue;
 if (aval & 1) ++n;
 if (aval & 2) ++n;
 if (aval & 4) ++n;
 if (aval & 8) ++n;
 if (aval & 16) ++n;
 if (aval & 32) ++n;
 if (aval & 64) ++n;
 if (aval & 128) ++n;
 }
 PG_RETURN_INT32(n);
}



  

Hi all,
Am looking for a fast and efficient way to count the number of bits set 
(to 1) in a VARBIT field. I am currently using 
"LENGTH(REGEXP_REPLACE(CAST(a.somefield_bit_code AS TEXT),'0','','g'))".


Allan.

  

When I had to do that, in days with smaller amounts of RAM, but very long
bit-vectors, I used a faster function sort-of like this:

static char table[256] = {
0,1,1,2,1,2,2,3,1,.
};

Then like above, but instead of the loop,

n+= table[aval];


You get the idea.



Uh, I was kind of confused by this, even when I saw a full
implementation:

http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable

Actually, this looks even better:


http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan

  



--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql