[HACKERS] How do I bump a row to the front of sort efficiently

2015-02-01 Thread Sam Saffron
I have this query:

select * from topics
order by case when id=1 then 0 else 1 end, bumped_at desc
limit 30

It works fine and bumps id 1 to the front of the sort, but it is
terribly inefficient: the CASE expression defeats any index, so the
whole table gets scanned and sorted.

OTOH:

"select * from topics where id = 1" is super fast

"select * from topics order by bumped_at desc limit 30" is super fast

Even this is fast, and it is logically equivalent since id is the unique primary key:

select * from topics
where id = 1000
union all
select * from (
  select * from topics
  where id <> 1000
  order by bumped_at desc
  limit 30
) as x
limit 30


However, the contortions in the query above make it very
un-ORM-friendly: I would need to define a view for it, but then I
would have no clean way to pass limits and offsets in.
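
One workaround I can imagine (sketch only, nothing I have in
production) is wrapping the UNION ALL form in a set-returning function
so that limit and offset become plain arguments. All names below are
made up, and it leans on UNION ALL's append order surviving the outer
LIMIT, which holds for simple plans like this but is not guaranteed by
the standard:

-- hypothetical wrapper: bump one topic to the front, with
-- limit/offset passed in as parameters instead of baked into a view
create function topics_bumped(bump_id int, _limit int, _offset int)
returns setof topics
language sql stable as $$
  select * from (
    select * from topics where id = bump_id
    union all
    select * from (
      select * from topics
      where id <> bump_id
      order by bumped_at desc
      limit _limit + _offset   -- fetch enough rows to survive the offset
    ) recent
  ) combined
  limit _limit offset _offset;
$$;

-- usage: select * from topics_bumped(1000, 30, 0);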

Is there any clean technique to bump up particular rows to the front
of a sort if a certain condition is met without paying a huge
performance hit?




Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

2014-11-17 Thread Sam Saffron
One interesting option would be kicking in an extra, more expensive
planning cycle after the Nth run of the query. In general a lot of
these prepared queries run thousands of times, so if you add some
extra cost at, say, run 100 it may not be prohibitive cost-wise.

On Tue, Nov 18, 2014 at 8:27 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane  wrote:
>>> One thing that occurs to me is that if the generic plan estimate comes
>>> out much cheaper than the custom one, maybe we should assume that the
>>> generic's cost estimate is bogus.  Right offhand I can't think of a reason
>>> for a custom plan to look worse than a generic one, unless there's a
>>> statistical quirk like this one.
>
>> That's an interesting idea, but what do we do after deciding that it's
>> bogus?
>
> Keep using custom plans.  It's possible that the estimate that's in error
> is the custom one, but that's not the way to bet IMO, since the custom
> plan estimate is based on better information.
>
>> The generic plan really can't be cheaper than the custom plan,
>> but it could be the same price, or as close as makes no difference.
>
> Right, and what we want to do is use the generic plan as long as it's
> close to the same cost (close enough to not justify replanning effort).
> The trick here is to not be fooled by estimation errors.  Can we assume
> that generic cost < custom cost is always an estimation error?
>
> Another idea that occurred to me is to run a planning cycle in which the
> actual parameter values are made available to the planner, but as
> estimates not hard constants (this facility already exists, it's just not
> being used by plancache.c).  This would yield cost estimates that are more
> safely comparable to the custom plan.  But I'm not sure that we'd want to
> expend yet another planning cycle to do this, nor am I sure that we'd want
> to use such a plan as The Generic Plan.
>
> regards, tom lane




[HACKERS] Re: [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

2014-11-13 Thread Sam Saffron
Thank you so much!

So to recap the general way to reproduce this issue is:

create table products(id int primary key, type varchar);
insert into products select generate_series(1,10000), 'aaa';
insert into products select generate_series(10001,20000), 'bbb';
create index idx on products(type);
prepare stmt as select * from products where type = $1 limit 1;

Which quickly devolves into:

explain analyze execute stmt ('ccc');

                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.03 rows=1 width=8) (actual time=1.821..1.821 rows=0 loops=1)
   ->  Seq Scan on products  (cost=0.00..339.00 rows=1 width=8) (actual time=1.819..1.819 rows=0 loops=1)
         Filter: ((type)::text = $1)
         Rows Removed by Filter: 20000
 Total runtime: 1.843 ms
(5 rows)
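
For completeness, the flip above is not instant: as far as I can tell
from plancache.c, the first handful of executions are planned with the
real parameter value (I believe the threshold is five, but treat that
number as my observation, not a contract):

-- the first several executions use a custom plan (index scan);
-- after that the cached generic plan (the seq scan above) takes over
explain analyze execute stmt ('ccc');   -- index scan
explain analyze execute stmt ('ccc');
explain analyze execute stmt ('ccc');
explain analyze execute stmt ('ccc');
explain analyze execute stmt ('ccc');
explain analyze execute stmt ('ccc');   -- seq scan from here on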



So if I am searching for 'ccc', the prepared plan eventually
"optimises" into what the statistics suggest is the better mechanism:
just scanning the table until the first hit.

However, a fairly common pattern for me is to check for the "lack of
presence" of a value. For example: if the product type 'ccc' is not
in the table, do this. Unfortunately the optimiser deoptimises this
class of operation.
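
Concretely, the pattern looks something like this (an illustrative
sketch, not the exact production query):

-- existence probe: returns a row iff the type is present; fast as an
-- index scan, but pathological under the generic seq-scan plan when
-- the value is absent, since the scan visits every row before giving up
prepare present as select 1 from products where type = $1 limit 1;
execute present ('ccc');   -- zero rows => 'ccc' absent => take fallback path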

I tried the exact example above with an "int" instead of a "varchar"
in the "type" column and was not able to reproduce the issue. I wonder
if there is some different handling for strings vs. numbers.
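
For reference, the integer variant I tried looks roughly like this
(table and index names here are mine, not from the script above):

create table products_i(id int primary key, type int);
insert into products_i select g, 1 from generate_series(1,10000) g;
insert into products_i select g, 2 from generate_series(10001,20000) g;
create index idx_i on products_i(type);
prepare stmt_i as select * from products_i where type = $1 limit 1;
explain analyze execute stmt_i (3);   -- 3 is absent; the plan stayed sane here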

Unfortunately my actual table in play has a rather bad schema, the
"archetype" column really should be an int. That said we only have 2
general archetypes at the moment (private message and topic) and the
occasional single "banner" outlier, which may or may not be there. So
the data modelling is pretty hostile to performance. I have some ideas
on how to solve my particular problem but I do have some general
concerns.

Ruby on Rails is just about to ship a new version that heavily changes
the mechanics of query execution. For example, Product.where(name:
"foo").first will now result in a prepared query whereas in the past
it would just send the raw query. Overall this approach is better and
saner, but my general concern is that our API offers no "escape hatch"
for these outlier conditions: you can disable prepared statements
globally, but you cannot disable them for a single call. I will raise
this particular concern with the team.

My second question/concern is that I feel I am totally
misunderstanding the changes to 'plancache.c'. I thought the decision
about which plan to use was based purely on the value sent to the
prepared query, yet the planner seems to completely ignore the value
in some steps. (For example, I was expecting "aaa" and "ccc" to
result in completely different plans.)

Thank you so much for your time, patience and general awesomeness




On Fri, Nov 14, 2014 at 11:34 AM, Tom Lane  wrote:
> Sam Saffron  writes:
>> I have hit a rather odd issue with prepared queries on both pg 9.3 and 9.4 beta.
>> I have this table (copy at http://samsaffron.com/testing.db.gz) with a
>> very odd performance profile:
>
> Interesting case.  The issue seems to be that your statistics look like
> this:
>
>  select * from pg_stats where tablename = 'topics' and attname = 'archetype';
> -[ RECORD 1 ]----------+--------------------------
> schemaname             | public
> tablename              | topics
> attname                | archetype
> inherited              | f
> null_frac              | 0
> avg_width              | 12
> n_distinct             | 2
> most_common_vals       | {private_message,regular}
> most_common_freqs      | {0.604957,0.395043}
> histogram_bounds       |
> correlation            | 0.612985
> most_common_elems      |
> most_common_elem_freqs |
> elem_count_histogram   |
>
> That is, archetype consists of 60% 'private_message', 40% 'regular', and
> absolutely nothing else.  So the condition archetype = 'banner' is very
> selective, and a plan built knowing that that is the parameter value will
> use the index:
>
> # explain select * from topics where archetype = 'banner' limit 1;
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  Limit  (cost=0.29..651.49 rows=1 width=520)
>    ->  Index Scan using idx11 on topics  (cost=0.29..651.49 rows=1 width=520)
>          Index Cond: ((archetype)::text = 'banner'::text)