Re: [SQL] Is there a faster way to do this?

Richard Huxton Tue, 15 Jun 2004 13:29:13 -0700

Patrick Hatcher wrote:


Scenario:  Each day I update a column in a table with an internal
percentile value.  To arrive at  this value, I'll get a count of records
with sales > 0 and then divide this count by the total number of tile
groups I want.  So for example:
Total records w/sales > 0 = 730,000
 tile# = 100
total percentile groups (730,000 / 100):7300

Now that I have the total number of groups I need, I cycle through my
recordset, grab the key field and the current percentile number and stuff
the values into a temp table. I mod the current row counter against the
total percentile group number.  If it is 0 then I add 1 to the  current
percentile number .  After inserting records into the temp file I then
update the main table.

Using the example above, the first 7300 records would get a  percentile
number of 1, the next 7300  records would get  a percentile number of 2,
then 3,4,5,etc.

Unfortunately, I am going record by record in a loop and the process takes
upwards of 20mins.  Is there a faster way to do this?  I thought about
using limit and offset, but I'm not sure how I would accomplish it.

Idea #1: Well, if you ordered by something unique, you could use ORDER BY, LIMIT and OFFSET

UPDATE my_table SET my_group=1 WHERE val > 0 ORDER BY val, my_id_column OFFSET 0 LIMIT 7300; UPDATE my_table SET my_group=2 WHERE val > 0 ORDER BY val, my_id_column OFFSET 7300 LIMIT 7300; etc.

You'll need to make sure the order is guaranteed for all rows when "val" is the same though.

Now, the problem with this is that by the time you get to group 99, you've had to scan over group 1 98 times - not good. If you knew how many rows there were for any given "val" then you could restrict it much more though.

Idea #2: Turn your function around. Declare a cursor on the sorted SELECT of target-rows. SKIP 7300 rows and read the primary-key. Use that to issue an update of all rows between two values. That's only 100 updates issued rather than one for each target row.

Idea #3: Someone might well be able to come up with a clever idea involving a join against a set-returning function, but I'm not sure about guaranteeing the order of the join vs the returned set (and it's getting late here). Any ideas people?

Maybe one of those is some use
--
  Richard Huxton
  Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
     subscribe-nomail command to [EMAIL PROTECTED] so that your
     message can get through to the mailing list cleanly

Re: [SQL] Is there a faster way to do this?

Reply via email to