AlexK987 <alex.cue....@gmail.com> writes:
> This is a realistic case: everyone has Python and Java skills, but PostGIS
> and Haskell and Clojure are rare. If we are looking for a person that has
> all the skills required for a task (array[1, 15]), that is "skills <@
> array[1, 15]" and not the opposite, right?

One of us has this backwards.  It might be me, but I don't think so.
Consider a person who has the two desired skills plus skill #42:

regression=# select array[1,15,42] <@ array[1,15];
 ?column? 
----------
 f
(1 row)

regression=# select array[1,15,42] @> array[1,15];
 ?column? 
----------
 t
(1 row)
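
To make the direction concrete, here is a minimal Python model of the two
operators. It assumes set semantics: PostgreSQL's containment operators do
not treat duplicates specially, so ARRAY[1] and ARRAY[1,1] each contain the
other. NULL elements are not modeled. The names contains, contained_by,
person_skills and task_skills are hypothetical. A person qualifies for the
task when their skills contain the required ones, i.e. skills @> array[1, 15].

```python
# Toy model of PostgreSQL array containment (assumption: set semantics,
# NULL handling not modeled). Not PostgreSQL's implementation.

def contains(a, b):
    """Model of `a @> b`: true when every element of b appears in a."""
    return set(b) <= set(a)


def contained_by(a, b):
    """Model of `a <@ b`: true when every element of a appears in b."""
    return contains(b, a)


# Hypothetical data: one person's skills and a task's required skills.
person_skills = [1, 15, 42]
task_skills = [1, 15]

print(contained_by(person_skills, task_skills))  # False, like array[1,15,42] <@ array[1,15]
print(contains(person_skills, task_skills))      # True, like array[1,15,42] @> array[1,15]
```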

> Also, can you explain why "entries for 0 and 1 swamp everything else so
> that the planner doesn't know that e.g. 15 is really rare"? I thought that
> if a value is not found in the histogram, then clearly that value is rare,
> correct? What am I missing here?

The problem is *how* rare.  The planner will take the lowest frequency
seen among the most common elements as an upper bound for the frequency of
unlisted elements --- but if all you have in the stats array is 0 and 1,
and they both have frequency 1.0, that doesn't tell you anything.  And
that's what I see for this example:

regression=# select most_common_elems,most_common_elem_freqs from pg_stats 
where tablename = 'talent' and attname = 'skills';
 most_common_elems | most_common_elem_freqs 
-------------------+------------------------
 {0,1}             | {1,1,1,1,0}
(1 row)

With a less skewed distribution, that rule of thumb would work better :-(
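
A toy Python model of the rule of thumb above, assuming the layout documented
for pg_stats.most_common_elem_freqs: the per-element frequencies, then their
minimum and maximum, then optionally the frequency of null elements. That is
why {0,1} pairs with five numbers {1,1,1,1,0}. The function names and the
"less skewed" statistics are hypothetical, and PostgreSQL's real estimator
may apply further adjustments that are not modeled here.

```python
# Simplified model of the "unlisted elements are no more common than the
# least common listed element" bound. NOT PostgreSQL's estimator code.

def split_elem_freqs(elems, freqs):
    """Split most_common_elem_freqs into per-element freqs and the trailing
    min, max and (optional) null-element frequency."""
    n = len(elems)
    per_elem = dict(zip(elems, freqs[:n]))
    min_freq, max_freq = freqs[n], freqs[n + 1]
    null_freq = freqs[n + 2] if len(freqs) > n + 2 else None
    return per_elem, min_freq, max_freq, null_freq


def unlisted_elem_bound(elems, freqs):
    """Upper bound on the frequency of an element missing from the list."""
    _, min_freq, _, _ = split_elem_freqs(elems, freqs)
    return min_freq


# Statistics from the talent.skills example: the bound is 100%, so it
# rules nothing out and "15" is not recognized as rare.
skewed = unlisted_elem_bound([0, 1], [1.0, 1.0, 1.0, 1.0, 0.0])
print(skewed)  # 1.0

# Hypothetical less skewed statistics: a rarer element made the list,
# so unlisted elements are bounded at 2% of rows.
less_skewed = unlisted_elem_bound([0, 1, 7], [1.0, 0.9, 0.02, 0.02, 1.0, 0.0])
print(less_skewed)  # 0.02
```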

                        regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
