Re: [SQL] Is there a faster way to do this?

Jean-Luc Lachance Tue, 15 Jun 2004 13:44:18 -0700

One way to do it would be to:

Not put the percentile in the sales table;

Create an percentile table with a foreign key to the sales table primary key and percentile int4:

CREATE TABLE percentiles(
fkey <whatever> PRIMARY KEY REFERENCES sales( <primary key>),
percentile INT4 );

Create a sequence for that ancillary table:
CREATE SEQUENCE percent_seq;

When ready to create the percentiles, truncate the percentile table and reset the sequence next value:

TRUNCATE percentiles; setval( 'percent_seq', 0);

Now query your sales with the proper order by:

INSERT INTO percentiles SELECT pk_sales, nextval( 'percent_seq') / tile_size FROM sales ORDER BY sales_value;

HTH

Patrick Hatcher wrote:


pg: 7.4.2
RedHat 7.2

Can I get some advice on a possible faster way of doing this:

Scenario:  Each day I update a column in a table with an internal
percentile value.  To arrive at  this value, I'll get a count of records
with sales > 0 and then divide this count by the total number of tile
groups I want.  So for example:
Total records w/sales > 0 = 730,000
 tile# = 100
total percentile groups (730,000 / 100):7300

Now that I have the total number of groups I need, I cycle through my
recordset, grab the key field and the current percentile number and stuff
the values into a temp table. I mod the current row counter against the
total percentile group number.  If it is 0 then I add 1 to the  current
percentile number .  After inserting records into the temp file I then
update the main table.

Using the example above, the first 7300 records would get a  percentile
number of 1, the next 7300  records would get  a percentile number of 2,
then 3,4,5,etc.

Unfortunately, I am going record by record in a loop and the process takes
upwards of 20mins.  Is there a faster way to do this?  I thought about
using limit and offset, but I'm not sure how I would accomplish it.

Below is the function I currently use.  Thanks for any help provided

CREATE OR REPLACE FUNCTION cdm.percentile_calc()
  RETURNS text AS
'DECLARE
     v_interval int4;
     v_tile int4;
     v_percentile int4;
     v_check int4;
     v_count int4;
     v_rowcount int4;
     myRec  RECORD;

BEGIN
     v_count:=0;
     v_tile:= 100;
     v_percentile:=1;
     v_rowcount :=1;
     v_check:=0;


     /* Get count of records with val_purch > 0 */
     select into v_count count(*)  from cdm.cdm_indiv_mast where
val_purch_com >0;

     /* this number will be used as part of our MOD to tell when to add one
to our percentile */
     v_interval := v_count / v_tile;

 CREATE TEMP TABLE cdmperct (f1 int8, f2 int2);


     FOR myRec IN  select indiv_key from cdm.cdm_indiv_mast where
val_purch_com  >0 order by val_purch_com desc  LOOP
      INSERT INTO cdmperct values (myRec.indiv_key,v_percentile);
      v_check = mod(v_rowcount,v_interval);
      IF v_check = 0 THEN
       v_percentile:=v_percentile+1;
      END IF;
      v_rowcount:= v_rowcount+1;
     END LOOP;


       UPDATE cdm.cdm_indiv_mast SET percentiler = f2 from  cdmperct where
indiv_key = f1;


     DROP TABLE cdmperct;
     RETURN  \'DONE\';
END; '
  LANGUAGE 'plpgsql' IMMUTABLE;

Patrick Hatcher
Macys.Com


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: [SQL] Is there a faster way to do this?

Reply via email to