Re: [PERFORM] Highly Efficient Custom Sorting

2010-07-03 Thread Merlin Moncure
On Fri, Jul 2, 2010 at 11:17 PM, Eliot Gable
egable+pgsql-performa...@gmail.com wrote:
 Well, I re-wrote the algorithm in Perl. However, it did not solve the speed
 issue. Running time now is a whopping 240+ ms instead of the 31.8ms I was
 getting before (15.2 of which is sorting). Here is the Perl code for the
 sorting. I won't post the pl/pgsql code, because this is far clearer (in
 my opinion) about what the algorithm does:
 DROP TYPE IF EXISTS glbtype CASCADE;
 CREATE TYPE glbtype AS (
 id INTEGER,
 "group" TEXT, -- "group" is a reserved word and must be quoted
 priority INTEGER,
 weight INTEGER
 );
 DROP TYPE IF EXISTS glbtype_result CASCADE;
 CREATE TYPE glbtype_result AS (
 id INTEGER,
 priority INTEGER,
 weight INTEGER,
 "order" BIGINT -- likewise reserved
 );
 CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF
 glbtype_result AS

ok, I didn't take the time to read your implementation and completely
understand it, but it looks like you're looking at an N^2 sort at
best.

You probably want to do something like this (it might not be quite
right, you need to explain what each of your input array fields is
supposed to represent):
CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF
glbtype_result AS
$$
  with g as (select unnest($1) as t)
  select ((t).id, (t).priority, (t).weight, 0)::glbtype_result
  from g order by (t)."group", (t).priority, random() * (t).weight;
$$ language sql;

(not sure what "order" is -- is that the row number? can't that just be
inferred from the array position?)

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Highly Efficient Custom Sorting

2010-07-03 Thread Eliot Gable
Read RFC 2782 on random weighted load balancing of SRV records inside DNS.
That algorithm is what I need implemented, but with an extension. I have
groups of records I need to have the algorithm applied to where each group
is treated separately from the others. I understand the computational
complexity of what I'm doing. It is more like N^3, or more precisely G*P*W,
where G is the number of groups, P the number of priorities per group, and W
the number of different weights per priority. But the complexity of the
algorithm means nothing in terms of performance or run time, because it will
only ever deal with very small sets of records (maybe 20 rows of data,
tops). Even if the algorithm were N^4, it wouldn't matter with that few
records. But, more importantly, there are constraints on how the data is
sub-divided. Primarily, G > P > W. Further, G and P are groupings which
subdivide the entire set of data, and the groups do not have overlapping
data. So maybe it's more like N^2.2 or something. But regardless, we're
only talking about 20 rows, tops.
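For readers following along, the selection being described can be sketched in plain C roughly as below. This is a hypothetical standalone illustration (record layout, field names, and group ordering are assumptions, not the poster's actual code): groups are processed in order of first appearance, priorities ascending within a group, and within a priority a record is drawn with probability proportional to its weight, RFC 2782 style. Linear scans are fine at ~20 rows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record mirroring the glbtype fields. */
typedef struct {
    int id;
    const char *grp;   /* group name */
    int priority;
    int weight;
    int picked;        /* set once the record has been selected */
} Rec;

/* RFC 2782-style draw: within one (group, priority) bucket, select one
 * unpicked record with probability proportional to its weight.
 * Returns the record's index, or -1 if the bucket is exhausted. */
static int pick_one(Rec *recs, int n, const char *grp, int priority)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        if (!recs[i].picked && recs[i].priority == priority &&
            strcmp(recs[i].grp, grp) == 0)
            total += recs[i].weight;

    int r = total > 0 ? rand() % total : 0, run = 0;
    for (int i = 0; i < n; i++) {
        if (recs[i].picked || recs[i].priority != priority ||
            strcmp(recs[i].grp, grp) != 0)
            continue;
        run += recs[i].weight;
        if (r < run || total == 0) {   /* all-zero-weight bucket: take first */
            recs[i].picked = 1;
            return i;
        }
    }
    return -1;
}

/* Fill out[] with record indices in selection order: groups in order of
 * first appearance, priorities ascending, weighted-random within a
 * priority. Returns the number of indices written. */
static int grouped_weighted_order(Rec *recs, int n, int *out)
{
    int filled = 0;
    for (int g = 0; g < n; g++) {
        int dup = 0;                   /* group already handled? */
        for (int j = 0; j < g; j++)
            if (strcmp(recs[j].grp, recs[g].grp) == 0) { dup = 1; break; }
        if (dup) continue;

        for (;;) {
            int best = -1;             /* lowest priority still unpicked */
            for (int i = 0; i < n; i++)
                if (!recs[i].picked && strcmp(recs[i].grp, recs[g].grp) == 0 &&
                    (best < 0 || recs[i].priority < best))
                    best = recs[i].priority;
            if (best < 0) break;       /* group exhausted */

            int k;
            while ((k = pick_one(recs, n, recs[g].grp, best)) >= 0)
                out[filled++] = k;
        }
    }
    return filled;
}
```

The `picked` flag stands in for the linked-list removal discussed below; with tiny inputs the repeated scans cost nothing.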

The issue is how efficiently the languages can deal with arrays. In Perl, I
have to parse a string into an array of data, then break it up into
sub-arrays inside associative arrays just to work with the input. I also have
to splice the array to remove elements, which I don't think is very
efficient. Every way I could come up with to remove elements involved
rebuilding the entire array. The same goes for pl/pgsql: dealing with arrays
there is also not very efficient. I do a lot of constructing of arrays from
sets of data using myvar := array(select blah);. While pl/pgsql was
considerably faster than Perl, it cannot come close to what I did in C++
using a hash of a hash of a linked list. The two hash tables provide my
groupings and the linked list gives me something that is highly efficient for
removing elements as I pick them.
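The removal cost being contrasted here is easy to see in a minimal sketch (names and layout are illustrative, not the poster's C++): unlinking a node from a doubly linked list touches only its two neighbors, whereas splicing an array shifts every following element.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal doubly linked list: removing a node as it is picked is O(1). */
typedef struct Node {
    int id;
    struct Node *prev, *next;
} Node;

typedef struct { Node *head; int len; } List;

/* Push a new node at the head; returns it so the caller can remove it later. */
static Node *list_push(List *l, int id)
{
    Node *n = calloc(1, sizeof(Node));
    n->id = id;
    n->next = l->head;
    if (l->head) l->head->prev = n;
    l->head = n;
    l->len++;
    return n;
}

/* O(1) removal: relink the neighbors, no element shifting. */
static void list_remove(List *l, Node *n)
{
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev;
    l->len--;
    free(n);
}
```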

I've looked through the documentation on how to re-write this in C, but I
cannot seem to find good documentation on working with the input array
(which is an array of a complex type). I also don't see good documentation
for working with the complex type. I found stuff that talks about
constructing a complex type in C and returning it. However, I'm not sure how
to take an input complex type and deconstruct it into something I can work
with in C. Also, the memory context management stuff is not entirely clear.
Specifically, how do I go about preserving the pointers to the data that I
allocate in multi-call memory context so that they still point to the data
on the next call to the function for the next result row? Am I supposed to
set up some global variables to do that, or am I supposed to take a
different approach? If I need to use global variables, then how do I deal
with concurrency?
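[For reference, the usual shape of a C set-returning function is sketched below. This is an untested outline against PostgreSQL's server API, with the result-tuple construction elided: state that must survive between calls is allocated in funcctx->multi_call_memory_ctx and stashed in funcctx->user_fctx, which the executor preserves across calls -- no global variables, and therefore no concurrency problem.]

```c
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "utils/array.h"

PG_FUNCTION_INFO_V1(grouped_random_weighted_lb);

Datum
grouped_random_weighted_lb(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;

    if (SRF_IS_FIRSTCALL())
    {
        funcctx = SRF_FIRSTCALL_INIT();

        /* Allocations made in this context live until the last row has
         * been returned, so pointers stashed in user_fctx stay valid. */
        MemoryContext oldctx =
            MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

        ArrayType *arr = PG_GETARG_ARRAYTYPE_P(0);
        Datum     *elems;
        bool      *nulls;
        int        nelems;

        /* Split the glbtype[] argument into one Datum per element;
         * -1/false/'d' are the typlen/typbyval/typalign of a row type. */
        deconstruct_array(arr, ARR_ELEMTYPE(arr), -1, false, 'd',
                          &elems, &nulls, &nelems);

        /* ... sort/select here; each element is a HeapTupleHeader whose
         * fields can be read with GetAttributeByName() ... */

        funcctx->user_fctx = elems;     /* preserved across calls */
        funcctx->max_calls = nelems;

        MemoryContextSwitchTo(oldctx);
    }

    funcctx = SRF_PERCALL_SETUP();      /* re-fetch the state each call */

    if (funcctx->call_cntr < funcctx->max_calls)
    {
        Datum *elems = (Datum *) funcctx->user_fctx;

        /* A real implementation would build a glbtype_result tuple here
         * (get_call_result_type + heap_form_tuple); the raw input element
         * is returned only to keep the sketch short. */
        SRF_RETURN_NEXT(funcctx, elems[funcctx->call_cntr]);
    }

    SRF_RETURN_DONE(funcctx);
}
```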

On Sat, Jul 3, 2010 at 2:08 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Fri, Jul 2, 2010 at 11:17 PM, Eliot Gable
 egable+pgsql-performa...@gmail.com
 wrote:
  Well, I re-wrote the algorithm in Perl. However, it did not solve the
 speed
  issue. Running time now is a whopping 240+ ms instead of the 31.8ms I was
  getting before (15.2 of which is sorting). Here is the Perl code on the
  sorting. I won't post the pl/pgsql code, because this is far more clear
 (in
  my opinion) on what the algorithm does:
  DROP TYPE IF EXISTS glbtype CASCADE;
  CREATE TYPE glbtype AS (
  id INTEGER,
  "group" TEXT,
  priority INTEGER,
  weight INTEGER
  );
  DROP TYPE IF EXISTS glbtype_result CASCADE;
  CREATE TYPE glbtype_result AS (
  id INTEGER,
  priority INTEGER,
  weight INTEGER,
  "order" BIGINT
  );
  CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS
 SETOF
  glbtype_result AS

 ok, I didn't take the time to read your implementation and completely
 understand it, but it looks like you're looking at an N^2 sort at
 best.

 You probably want to do something like this (it might not be quite
 right, you need to explain what each of your input array fields is
 supposed to represent):
 CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF
 glbtype_result AS
 $$
   with g as (select unnest($1) as t)
   select ((t).id, (t).priority, (t).weight, 0)::glbtype_result
   from g order by (t)."group", (t).priority, random() * (t).weight;
 $$ language sql;

 (not sure what "order" is -- is that the row number? can't that just be
 inferred from the array position?)

 merlin




-- 
Eliot Gable

We do not inherit the Earth from our ancestors: we borrow it from our
children. ~David Brower

I decided the words were too conservative for me. We're not borrowing from
our children, we're stealing from them--and it's not even considered to be a
crime. ~David Brower

Esse oportet ut vivas, non vivere ut edas. (Thou shouldst eat to live; not
live to eat.) ~Marcus Tullius Cicero


Re: [PERFORM] Highly Efficient Custom Sorting

2010-07-03 Thread Merlin Moncure
On Sat, Jul 3, 2010 at 4:17 PM, Eliot Gable
egable+pgsql-performa...@gmail.com wrote:
 Read RFC 2782 on random weighted load balancing of SRV records inside DNS.
 That algorithm is what I need implemented, but with an extension. I have
 groups of records I need to have the algorithm applied to where each group
 is treated separately from the others. I understand the operational
 complexity of what I'm doing. It is more like N^3, or more precisely G*P*W
 where G is the number of groups, P the number of priorities per group, and W
 the number of different weights per priority. But, the complexity of the
 algorithm means nothing in terms of performance or run time because it will
 only ever deal with very small sets of records (maybe 20 rows of data,
 tops). Even if the algorithm were N^4, it wouldn't matter with that few
 records. But, more importantly, there are constraints in how the data is
 sub-divided. Primarily, G > P > W. Further, G and P are groupings which
 subdivide the entire set of data and the groups do not have overlapping
 data. So, maybe it's more like N^2.2 or something. But, regardless, we're
 only talking about 20 rows, tops.
 The issue is how efficiently the languages can deal with arrays. In Perl, I
 have to parse a string into an array of data, then break it up into sub
 arrays inside associative arrays just to work with the input. I also have to
 splice the array to remove elements, which I don't think is very efficient.
 Any way I could come up with of removing elements involved rebuilding the
 entire array. The same thing goes for pl/pgsql. Dealing with arrays there is
 also not very efficient. I do a lot of constructing of arrays from sets of
 data using myvar = array(select blah);. While pl/pgsql was considerably
 faster than Perl, it cannot come close to what I did in C++ using a hash of
 a hash of a linked list. The two hash tables provide my groupings and the
 linked list gives me something that is highly efficient for removing
 elements as I pick them.
 I've looked through the documentation on how to re-write this in C, but I
 cannot seem to find good documentation on working with the input array
 (which is an array of a complex type). I also don't see good documentation
 for working with the complex type. I found stuff that talks about
 constructing a complex type in C and returning it. However, I'm not sure how
 to take an input complex type and deconstruct it into something I can work
 with in C. Also, the memory context management stuff is not entirely clear.
 Specifically, how do I go about preserving the pointers to the data that I
 allocate in multi-call memory context so that they still point to the data
 on the next call to the function for the next result row? Am I supposed to
 set up some global variables to do that, or am I supposed to take a
 different approach? If I need to use global variables, then how do I deal
 with concurrency?

please stop top posting.

What about my suggestion doesn't work for your requirements?  (btw,
let me disagree with my peers and state pl/perl is lousy for this type
of job; only SQL and pl/pgsql can interact with postgresql variables
natively, for the most part).

merlin



Re: [PERFORM] Highly Efficient Custom Sorting

2010-07-03 Thread Alvaro Herrera
Excerpts from Merlin Moncure's message of Sat Jul 03 18:53:46 -0400 2010:

 What about my suggestion doesn't work for your requirements?  (btw,
 let me disagree with my peers and state pl/perl is lousy for this type
 of job; only SQL and pl/pgsql can interact with postgresql variables
 natively, for the most part).

IIRC the other reason pl/perl sucks for this kind of thing is that it
forces a subtransaction to be created before the function call, which is
expensive.  (I might be misremembering and what actually causes a
subtransaction is a SPI call inside a PL/Perl function, which wouldn't
apply here.)
