[PERFORM] Slow Count-Distinct Query

2014-04-06 Thread Jeff Janes
On Friday, April 4, 2014, Varadharajan Mukundan srinath...@gmail.com
wrote:

 Hi Jeff,

 It looks like the original emailer wrote a query that the planner is not
 smart enough to plan properly (A known limitation of that kind of query).
  He then made a bunch of changes, none of which worked.  He then re-wrote
 the query into a form for which the planner does a better job on.  What we
 do not know is, what would happen if he undoes all of those other changes
 and *just* uses the new form of the query?



 I was also pairing up with Chris (original emailer) on this issue and in
 order to reproduce it, i've a created a two column table with following
 schema with 1.9 Million dummy rows:

 =

 a=# \d participants
  Table public.participants
  Column |  Type  | Modifiers

 ++---
  id | integer| not null default
 nextval('participants_id_seq'::regclass)
  email  | character varying(255) |

 I've tried out various scenarios on this table and recorded it as a
 transcript below: (Please read it as we read a shell script from top to
 bottom continuously to get the whole idea):

 *; Create table and Insert 1.9 Million rows*

  a=# create table participants(id serial, email varchar(255));
 NOTICE:  CREATE TABLE will create implicit sequence participants_id_seq
 for serial column participants.id
 CREATE TABLE
 a=# \d participants
  Table public.participants
  Column |  Type  | Modifiers

 ++---
  id | integer| not null default
 nextval('participants_id_seq'::regclass)
  email  | character varying(255) |

 a=# copy participants from '/tmp/a.csv' with csv;
 COPY 135


Thanks for the detailed response.  I don't have access to your /tmp/a.csv
of course, I so I just used this:

insert into participants (email) select md5(floor(random()*100)::text)
from generate_series(1,200);

This gives each email showing up about twice.




 a=# EXPLAIN (ANALYZE) select count(1) from (select email from participants
 where email=email group by email) x;
 QUERY PLAN

 ---
  Aggregate  (cost=37874.94..37874.96 rows=1 width=0) (actual
 time=1549.258..1549.258 rows=1 loops=1)
-  HashAggregate  (cost=37763.19..37812.86 rows=4967 width=16) (actual
 time=1168.114..1461.672 rows=100 loops=1)
  -  Seq Scan on participants  (cost=0.00..37738.19 rows=1
 width=16) (actual time=0.045..411.267 rows=135 loops=1)
Filter: ((email)::text = (email)::text)
  Total runtime: 1567.586 ms
 (5 rows)


What you have done here is trick the planner into thinking your query will
be 200 times smaller than it actually is, and thus the hash table will be
200 times smaller than it actually is and therefore will fit in allowed
memory.  This is effective at getting the more efficient hash agg.  But it
no more safe than just giving it explicit permission to use that much
memory for this query by increasing work_mem by 200 fold.

I am kind of surprised that the planner is so easily fooled by that.




 *; Creation of idx on email field*

 a=# create index email_idx on participants(email);
 CREATE INDEX



 a=#  EXPLAIN (ANALYZE)  select count(1) from (select email from
 participants group by email) x;
  QUERY PLAN

 -
  Aggregate  (cost=48622.59..48622.60 rows=1 width=0) (actual
 time=1242.718..1242.718 rows=1 loops=1)
-  HashAggregate  (cost=26273.09..36206.20 rows=993311 width=16)
 (actual time=855.215..1150.781 rows=100 loops=1)
  -  Seq Scan on participants  (cost=0.00..21273.25 rows=135
 width=16) (actual time=0.058..217.105 rows=135 loops=1)
  Total runtime: 1264.234 ms
 (4 rows)



I can't reproduce this at all, except by increasing work_mem.   The hash
table needed for this is no smaller than the hash table needed before the
index was built.  Did you increase work_mem before the above plan?

Instead what I get is the index only scan (to provide order) feeding into a
Group.



 a=# drop index email_idx;
 DROP INDEX

 *; Creation of partial index on email *

 a=# create index email_idx on participants(email) where email=email;
 CREATE INDEX

 a=#  EXPLAIN (ANALYZE)  select count(distinct email) from participants
 where email=email;
QUERY PLAN

 

Re: [PERFORM] Slow Count-Distinct Query

2014-04-06 Thread Varadharajan Mukundan
Hi Jeff,

Instead what I get is the index only scan (to provide order) feeding into a
 Group.


That's interesting. We tested out in two versions of Postgres (9.2 and 9.3)
in different Mac machines and ended up with index-only scan only after the
partial index. I remember doing a vacuum full analyse after each and every
step.


 I usually get this plan without the cluster, to.  I think it depends on
 the luck of the sampling in the autoanalyze.


That's interesting as well. I think something like increasing the sample
size would make it much better? Because, we had to perform so many steps to
get the index-only scan working whereas its really obvious for anyone to
guess that it should be the right approach. Also in a far corner of my
mind, i'm also thinking whether any OS specific parameter would be
considered (and is different in your system compared to my system) for
coming up plans and choosing one of them.

-- 
Thanks,
M. Varadharajan



Experience is what you get when you didn't get what you wanted
   -By Prof. Randy Pausch in The Last Lecture

My Journal :- www.thinkasgeek.wordpress.com


Re: [PERFORM] Slow Count-Distinct Query

2014-04-01 Thread bricklen
On Sun, Mar 30, 2014 at 12:45 PM, Christopher Jackson crjac...@gmail.comwrote:

   Hi all,

   tl;dr - How can I speed up my count-distinct query?


Depending on how often you need to run that query and how important it is
to you, if you are willing to accept a performance hit on
INSERT/UPDATE/DELETE of the participants table, you could create a
summary table containing just the count of unique email addresses or the
list of unique email addresses populated via trigger on
INSERT/UPDATE/DELETE of the  participants table. Another option is try out
the new Materialized views (
http://www.postgresql.org/docs/current/static/sql-creatematerializedview.html)
available in 9.3.


Re: [PERFORM] Slow Count-Distinct Query

2014-04-01 Thread Christopher Jackson
Hi Bricklen,

Thanks for the feedback.  I'll play around with materialized views.  My
understanding is they have to be manually triggered for refresh and there's
an exclusive lock on the view while the refresh is taking place.  Is this
your understanding as well?  I'm using PG 9.3.3.  If this is true, I'm
curious what clever ways people have come up with to mitigate any issues
with the lock.

   Thanks again,
  Chris


On Tue, Apr 1, 2014 at 7:34 PM, bricklen brick...@gmail.com wrote:


 On Sun, Mar 30, 2014 at 12:45 PM, Christopher Jackson 
 crjac...@gmail.comwrote:

   Hi all,

   tl;dr - How can I speed up my count-distinct query?


 Depending on how often you need to run that query and how important it is
 to you, if you are willing to accept a performance hit on
 INSERT/UPDATE/DELETE of the participants table, you could create a
 summary table containing just the count of unique email addresses or the
 list of unique email addresses populated via trigger on
 INSERT/UPDATE/DELETE of the  participants table. Another option is try out
 the new Materialized views (
 http://www.postgresql.org/docs/current/static/sql-creatematerializedview.html)
 available in 9.3.




Re: [PERFORM] Slow Count-Distinct Query

2014-04-01 Thread Michael Paquier
On Wed, Apr 2, 2014 at 1:22 PM, Christopher Jackson crjac...@gmail.com wrote:

 Hi Bricklen,

 Thanks for the feedback.  I'll play around with materialized views.  My
 understanding is they have to be manually triggered for refresh
Yep.

 and there's an exclusive lock on the view while the refresh is taking place.  
 Is this
 your understanding as well?
Re-yep.

 I'm using PG 9.3.3.  If this is true, I'm
 curious what clever ways people have come up with to mitigate any issues
 with the lock.
Kevin Grittner has implemented REFRESH MATERIALIZED VIEW CONCURRENTLY
in 9.4. A unique index is needed on the materialized view as well to
authorize this concurrent operation. It has the merit to allow SELECT
operations on the matview during the refresh.
-- 
Michael


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Count-Distinct Query

2014-04-01 Thread Christopher Jackson
Hi Bricklen,

   Thanks again for the feedback.  The concurrent refresh sounds cool.  I
just saw the 9.4 release is tentatively scheduled for later this year.  Do
you know what people have been doing for view refreshes in the meantime?

   Thanks


On Tue, Apr 1, 2014 at 11:48 PM, Michael Paquier
michael.paqu...@gmail.comwrote:

 On Wed, Apr 2, 2014 at 1:22 PM, Christopher Jackson crjac...@gmail.com
 wrote:
 
  Hi Bricklen,
 
  Thanks for the feedback.  I'll play around with materialized views.
  My
  understanding is they have to be manually triggered for refresh
 Yep.

  and there's an exclusive lock on the view while the refresh is taking
 place.  Is this
  your understanding as well?
 Re-yep.

  I'm using PG 9.3.3.  If this is true, I'm
  curious what clever ways people have come up with to mitigate any issues
  with the lock.
 Kevin Grittner has implemented REFRESH MATERIALIZED VIEW CONCURRENTLY
 in 9.4. A unique index is needed on the materialized view as well to
 authorize this concurrent operation. It has the merit to allow SELECT
 operations on the matview during the refresh.
 --
 Michael



Re: [PERFORM] Slow Count-Distinct Query

2014-03-31 Thread Shaun Thomas
  tl;dr - How can I speed up my count-distinct query?  

You can't.

Doing a count(distinct x) is much different than a count(1), which can simply 
scan available indexes. To build a distinct, it has to construct an in-memory 
hash of every valid email, and count the distinct values therein. This will 
pretty much never be fast, especially with 2M rows involved.

I could be wrong about this, and the back-end folks might have a different 
answer, but I wouldn't hold my breath.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd | Suite 400 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Count-Distinct Query

2014-03-31 Thread Tom Lane
Christopher Jackson crjac...@gmail.com writes:
   tl;dr - How can I speed up my count-distinct query?

EXPLAIN doesn't provide a lot of visibility into what the Aggregate plan
node is doing, but in this case what it's doing is an internal sort/uniq
operation to implement the DISTINCT.  You didn't say what value of
work_mem you're using, but it'd need to be probably 50-100MB to prevent
that sort from spilling to disk (and therefore being slow).

Note that the indexscan is actually *slower* than the seqscan so far as
the table access is concerned; if the table were big enough to not fit
in RAM, this would get very much worse.  So I'm not impressed with trying
to force the optimizer's hand as you've done here --- it might be a nice
plan now, but it's brittle.  See if a bigger work_mem improves matters
enough with the regular plan.

 *I'm concerned about setting the enable_bitmapscan and seq_page_cost values
 because I'm not yet sure what the consequences are.  Can anyone enlighten
 me on the recommended way to speed up this query?*

Turning off enable_bitmapscan globally would be a seriously bad idea.
Changing the cost settings to these values globally might be all right;
it would amount to optimizing for all-in-memory cases, which might or
might not be a good idea for your situation.  For that matter, greatly
increasing work_mem globally is usually not thought to be smart either;
remember that it's a per-sort-operation setting so you may need to
provision a considerable multiple of the setting as physical RAM,
depending on how many queries you expect to run concurrently.  So all in
all you might be well advised to just set special values for this one
query, whichever solution approach you use.

I doubt you need the where email=email hack, in any case.  That isn't
forcing the optimizer's decision in any meaningful fashion.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Slow Count-Distinct Query

2014-03-30 Thread Christopher Jackson
  Hi all,

  tl;dr - How can I speed up my count-distinct query?

  I apologize in advance if this question has been asked already.  I'm
finding the mailing list hard to navigate.  I'm trying to speed up a query
that will find a count of distinct emails with in a table using Postgres
9.3.3.  The name of the table is participants.  Our domain is set up such
that duplicate emails are allowed so long as a particular corresponding
column value is unique.


*TABLE participants*

*  id serial NOT NULL (primary key)*

*  email character varying(255)*

*  (other columns omitted)*



I have the following index defined:

*index_participants_on_email ON participants USING btree (email COLLATE
pg_catalog.default);*

The query I'm trying to run is select count(distinct email) from
participants.  I've also tried the *group by* equivalent.  *On a table size
of 2 million rows, the query takes about 1 minute to return.*  This is way
too long.  After running analyze, I see that the index is being ignored and
a full table scan is performed.

So, I tried running the following after dropping the index:
create index email_idx on participants(email) where email=email;
set enable_bitmapscan = false;
set seq_page_cost = 0.1;
set random_page_cost = 0.2;
create index email_idx_2 on participants(email);
cluster participants using email_idx_2;

With these settings in place, if I run *select count(distinct email) from
participants* I get

Aggregate  (cost=29586.20..29586.21 rows=1 width=18) (actual
time=54243.643..54243.644 rows=1 loops=1)
  -  Seq Scan on participants  (cost=0.00..24586.18 rows=208
width=18) (actual time=0.030..550.296 rows=208 loops=1)
*Total runtime: 54243.669 ms*

When I run the following, I get MUCH better results
*select count(1) from (select email from participants where email=email
group by email) x;*

Aggregate  (cost=1856.36..1856.37 rows=1 width=0) (actual
time=1393.573..1393.573 rows=1 loops=1)
  Output: count(1)
  -  Group  (cost=0.43..1731.36 rows=1 width=18) (actual
time=0.052..1205.977 rows=208 loops=1)
Output: participants.email
-  Index Only Scan using email_idx on public.participants
 (cost=0.43..1706.36 rows=1 width=18) (actual time=0.050..625.248
rows=208 loops=1)
  Output: participants.email
  Heap Fetches: 208
*Total runtime: 1393.599 ms*

This query has a weird where clause (email=email) because I'm trying to
force the analyzer's hand to use the index.

*I'm concerned about setting the enable_bitmapscan and seq_page_cost values
because I'm not yet sure what the consequences are.  Can anyone enlighten
me on the recommended way to speed up this query?*

 Thanks


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-09 Thread Gorshkov

On 2011-02-03 22:48, Scott Marlowe wrote:

On Thu, Feb 3, 2011 at 8:40 PM, Greg Smithg...@2ndquadrant.com  wrote:

Scott Marlowe wrote:


Yes they're useful, but like a plastic bad covering a broken car window,
they're useful because they cover something that's inherently broken.



Awesome.  Now we have a car anology, with a funny typo no less.  Plastic
bad, I love it.  This is real progress toward getting all the common list
argument idioms aired out.  All we need now is a homage to Mike Godwin and
we can close this down.


It's not so much a car analogy as a plastic bad analogy.




Don't be such an analogy Nazi.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-06 Thread Andrew Dunstan



On 02/04/2011 02:32 AM, da...@lang.hm wrote:


when a copy command is issued, I assume that there is some indication 
of how much data is going to follow.





No of course there isn't. How would we do that with a stream like STDIN? 
Read the code.


cheers

andrew

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-06 Thread Nick Lello
Informix IDS supports hints as well; normally the only need for hints in
this engine is when the Table/Index statistics are not being updated on a
regular basis (ie: lazy DBA).


On 3 February 2011 22:17, Mark Kirkwood mark.kirkw...@catalyst.net.nzwrote:

 On 04/02/11 11:08, Josh Berkus wrote:

 I don't think that's actually accurate.  Can you give me a list of
 DBMSes which support hints other than Oracle?

  DB2 LUW (Linux, Unix, Win32 code base) has hint profiles:


 http://justdb2chatter.blogspot.com/2008/06/db2-hints-optimizer-selection.html



 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 


Nick Lello | Web Architect
o +44 (0) 8433309374 | m +44 (0) 7917 138319
Email: nick.lello at rentrak.com
RENTRAK | www.rentrak.com | NASDAQ: RENT


Re: Does auto-analyze work on dirty writes? (was: Re: [HACKERS] [PERFORM] Slow count(*) again...)

2011-02-06 Thread Tom Lane
Mark Mielke m...@mark.mielke.cc writes:
 My understanding is:

 1) Background daemon wakes up and checks whether a number of changes 
 have happened to the database, irrelevant of transaction boundaries.

 2) Background daemon analyzes a percentage of rows in the database for 
 statistical data, irrelevant of row visibility.

 3) Analyze is important for both visible rows and invisible rows, as 
 plan execution is impacted by invisible rows. As long as they are part 
 of the table, they may impact the queries performed against the table.

 4) It doesn't matter if the invisible rows are invisible because they 
 are not yet committed, or because they are not yet vacuumed.

 Would somebody in the know please confirm the above understanding for my 
 own piece of mind?

No.

1. Autovacuum fires when the stats collector's insert/update/delete
counts have reached appropriate thresholds.  Those counts are
accumulated from messages sent by backends at transaction commit or
rollback, so they take no account of what's been done by transactions
still in progress.

2. Only live rows are included in the stats computed by ANALYZE.
(IIRC it uses SnapshotNow to decide whether rows are live.)

Although the stats collector does track an estimate of the number of
dead rows for the benefit of autovacuum, this isn't used by planning.
Table bloat is accounted for only in terms of growth of the physical
size of the table in blocks.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-05 Thread Scott Marlowe
On Fri, Feb 4, 2011 at 11:37 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Feb 5, 2011 at 12:46 AM,  da...@lang.hm wrote:
 Actually for me the main con with streaming analyze is that it adds
 significant CPU burden to already not too fast load process. Especially if
 it's automatically done for any load operation performed (and I can't see
 how it can be enabled on some threshold).

 two thoughts

 1. if it's a large enough load, itsn't it I/O bound?

 Sometimes.  Our COPY is not as cheap as we'd like it to be.

With a 24 drive RAID-10 array that can read at ~1GB/s I am almost
always CPU bound during copies.  This isn't wholly bad as it leaves
spare IO for the rest of the machine so regular work carries on just
fine.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-05 Thread Greg Smith

Scott Marlowe wrote:

With a 24 drive RAID-10 array that can read at ~1GB/s I am almost
always CPU bound during copies.  This isn't wholly bad as it leaves
spare IO for the rest of the machine so regular work carries on just
fine.
  


And you don't need nearly that much I/O bandwidth to reach that point.  
I've hit being CPU bound on COPY...FROM on systems with far less drives 
than 24.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Torsten Zühlsdorff

Mladen Gogala schrieb:

Well, the problem will not go away.  As I've said before, all other 
databases have that feature and none of the reasons listed here 
convinced me that everybody else has a crappy optimizer.  The problem 
may go away altogether if people stop using PostgreSQL.


A common problem of programmers is, that they want a solution they 
already know for a problem they already know, even if it is the worst 
solution the can choose.


There are so many possibilities to solve a given problem and you even 
have time to do this before your application get released.


Also: if you rely so heavily on hints, then use a database which 
supports hints. A basic mantra in every training i have given is: use 
the tool/technic/persons which fits best for the needs of the project. 
There are many databases out there - choose for every project the one, 
which fits best!


Greetings from Germany,
Torsten
--
http://www.dddbl.de - ein Datenbank-Layer, der die Arbeit mit 8 
verschiedenen Datenbanksystemen abstrahiert,
Queries von Applikationen trennt und automatisch die Query-Ergebnisse 
auswerten kann.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Grant Johnson



Yes.  And this has little to do with hints.  It has to do with years
of development lead with THOUSANDS of engineers who can work on the
most esoteric corner cases in their spare time.  Find the pg project a
couple hundred software engineers and maybe we'll catch Oracle a
little quicker.  Otherwise we'll have to marshall our resources to do
the best we can on the project ,and that means avoiding maintenance
black holes and having the devs work on the things that give the most
benefit for the cost.  Hints are something only a tiny percentage of
users could actually use and use well.

Write a check, hire some developers and get the code done and present
it to the community.  If it's good and works it'll likely get
accepted.  Or use EDB, since it has oracle compatibility in it.

I have to disagree with you here.   I have never seen Oracle outperform 
PostgreSQL on complex joins, which is where the planner comes in.  
Perhaps on certain throughput things, but this is likely do to how we 
handle dead rows, and counts, which is definitely because of how dead 
rows are handled, but the easier maintenance makes up for those.  Also 
both of those are by a small percentage.


I have many times had Oracle queries that never finish (OK maybe not 
never, but not over a long weekend) on large hardware, but can be 
finished on PostgreSQL in a matter or minutes on cheap hardware.   This 
happens to the point that often I have set up a PostgreSQL database to 
copy the data to for querying and runnign the complex reports, even 
though the origin of the data was Oracle, since the application was 
Oracle specific.   It took less time to duplicate the database and run 
the query on PostgreSQL than it did to just run it on Oracle.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Kenneth Marshall
On Thu, Feb 03, 2011 at 04:39:12PM -0800, da...@lang.hm wrote:
 On Thu, 3 Feb 2011, Robert Haas wrote:

 On Thu, Feb 3, 2011 at 3:54 PM,  da...@lang.hm wrote:
 with the current code, this is a completely separate process that knows
 nothing about the load, so if you kick it off when you start the load, it
 makes a pass over the table (competing for I/O), finishes, you continue 
 to
 update the table, so it makes another pass, etc. As you say, this is a 
 bad
 thing to do. I am saying to have an option that ties the two togeather,
 essentially making the data feed into the Analyze run be a fork of the 
 data
 comeing out of the insert run going to disk. So the Analyze run doesn't 
 do
 any I/O and isn't going to complete until the insert is complete. At 
 which
 time it will have seen one copy of the entire table.

 Yeah, but you'll be passing the entire table through this separate
 process that may only need to see 1% of it or less on a large table.
 If you want to write the code and prove it's better than what we have
 now, or some other approach that someone else may implement in the
 meantime, hey, this is an open source project, and I like improvements
 as much as the next guy.  But my prediction for what it's worth is
 that the results will suck.  :-)

 I will point out that 1% of a very large table can still be a lot of disk 
 I/O that is avoided (especially if it's random I/O that's avoided)

 David Lang


In addition, the streaming ANALYZE can provide better statistics at
any time during the load and it will be complete immediately. As far
as passing the entire table through the ANALYZE process, a simple
counter can be used to only send the required samples based on the
statistics target. Where this would seem to help the most is in
temporary tables which currently do not work with autovacuum but it
would streamline their use for more complicated queries that need
an analyze to perform well.

Regards,
Ken

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Vitalii Tymchyshyn

04.02.11 16:33, Kenneth Marshall написав(ла):


In addition, the streaming ANALYZE can provide better statistics at
any time during the load and it will be complete immediately. As far
as passing the entire table through the ANALYZE process, a simple
counter can be used to only send the required samples based on the
statistics target. Where this would seem to help the most is in
temporary tables which currently do not work with autovacuum but it
would streamline their use for more complicated queries that need
an analyze to perform well.

Actually for me the main con with streaming analyze is that it adds 
significant CPU burden to already not too fast load process. Especially 
if it's automatically done for any load operation performed (and I can't 
see how it can be enabled on some threshold).
And you can't start after some threshold of data passed by since you may 
loose significant information (like minimal values).


Best regards, Vitalii Tymchyshyn

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Kenneth Marshall
On Thu, Feb 03, 2011 at 09:05:52PM -0500, Robert Haas wrote:
 On Thu, Feb 3, 2011 at 8:37 PM,  da...@lang.hm wrote:
  On Thu, 3 Feb 2011, Robert Haas wrote:
 
  On Thu, Feb 3, 2011 at 7:39 PM, ?da...@lang.hm wrote:
 
  Yeah, but you'll be passing the entire table through this separate
  process that may only need to see 1% of it or less on a large table.
  If you want to write the code and prove it's better than what we have
  now, or some other approach that someone else may implement in the
  meantime, hey, this is an open source project, and I like improvements
  as much as the next guy. ?But my prediction for what it's worth is
  that the results will suck. ?:-)
 
  I will point out that 1% of a very large table can still be a lot of disk
  I/O that is avoided (especially if it's random I/O that's avoided)
 
  Sure, but I think that trying to avoid it will be costly in other ways
  - you'll be streaming a huge volume of data through some auxiliary
  process, which will have to apply some algorithm that's very different
  from the one we use today. ?The reality is that I think there's little
  evidence that the way we do ANALYZE now is too expensive. ?It's
  typically very cheap and works very well. ?It's a bit annoying when it
  fires off in the middle of a giant data load, so we might need to
  change the time of it a little, but if there's a problem with the
  operation itself being too costly, this is the first I'm hearing of
  it. ?We've actually worked *really* hard to make it cheap.
 
  I could be misunderstanding things here, but my understanding is that it's
  'cheap' in that it has little impact on the database while it is running.
 
 I mean that it's cheap in that it usually takes very little time to complete.
 
  the issue here is that the workflow is
 
  load data
  analyze
  start work
 
  so the cost of analyze in this workflow is not 1% impact on query speed for
  the next X time, it's the database can't be used for the next X time while
  we wait for analyze to finish running
 
 OK.
 
  I don't understand why the algorithm would have to be so different than
  what's done today, surely the analyze thread could easily be tweaked to
  ignore the rest of the data (assuming we don't have the thread sending the
  data to analyze do the filtering)
 
 If you want to randomly pick 10,000 rows out of all the rows that are
 going to be inserted in the table without knowing in advance how many
 there will be, how do you do that?  Maybe there's an algorithm, but
 it's not obvious to me.  But mostly, I question how expensive it is to
 have a second process looking at the entire table contents vs. going
 back and rereading a sample of rows at the end.  I can't remember
 anyone ever complaining ANALYZE took too long to run.  I only
 remember complaints of the form I had to remember to manually run it
 and I wish it had just happened by itself.
 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

Probably doomed to be shot down, but since you are effectively inline,
you could sample assuming a range of table row counts. Start at the
size of a table where random (index) lookups are faster than a sequential
scan and then at appropriate multiples, 100x, 100*100X,... then you should
be able to generate appropriate statistics. I have not actually looked at
how that would happen, but it would certainly allow you to process far, far
fewer rows than the entire table.

Regards,
Ken

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Scott Marlowe
On Fri, Feb 4, 2011 at 6:05 AM, Grant Johnson gr...@amadensor.com wrote:

 Yes.  And this has little to do with hints.  It has to do with years
 of development lead with THOUSANDS of engineers who can work on the
 most esoteric corner cases in their spare time.  Find the pg project a
 couple hundred software engineers and maybe we'll catch Oracle a
 little quicker.  Otherwise we'll have to marshall our resources to do
 the best we can on the project ,and that means avoiding maintenance
 black holes and having the devs work on the things that give the most
 benefit for the cost.  Hints are something only a tiny percentage of
 users could actually use and use well.

 Write a check, hire some developers and get the code done and present
 it to the community.  If it's good and works it'll likely get
 accepted.  Or use EDB, since it has oracle compatibility in it.

 I have to disagree with you here.   I have never seen Oracle outperform
 PostgreSQL on complex joins, which is where the planner comes in.  Perhaps
 on certain throughput things, but this is likely do to how we handle dead
 rows, and counts, which is definitely because of how dead rows are handled,
 but the easier maintenance makes up for those.  Also both of those are by a
 small percentage.

 I have many times had Oracle queries that never finish (OK maybe not never,
 but not over a long weekend) on large hardware, but can be finished on
 PostgreSQL in a matter or minutes on cheap hardware.   This happens to the
 point that often I have set up a PostgreSQL database to copy the data to for
 querying and runnign the complex reports, even though the origin of the data
 was Oracle, since the application was Oracle specific.   It took less time
 to duplicate the database and run the query on PostgreSQL than it did to
 just run it on Oracle.

It very much depends on the query.  With lots of tables to join, and
with pg 8.1 which is what I used when we were running Oracle 9, Oracle
won.  With fewer tables to join in an otherwise complex reporting
query PostgreSQL won.  I did the exact thing you're talking about. I
actually wrote a simple replication system fro Oracle to PostgreSQL
(it was allowed to be imperfect because it was stats data and we could
recreate at a moment).

PostgreSQL on a PIV workstation with 2G ram and 4 SATA drives in
RAID-10 stomped Oracle on much bigger Sun hardware into the ground for
reporting queries.  Queries that ran for hours or didn't finish in
Oracle ran in 5 to 30 minutes on the pg box.

But not all queries were like that.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Robert Haas
On Fri, Feb 4, 2011 at 9:38 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
 Actually for me the main con with streaming analyze is that it adds
 significant CPU burden to already not too fast load process.

Exactly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Bruce Momjian
Greg Smith wrote:
 Check out 
 http://www.indeed.com/jobtrends?q=postgres%2C+mysql%2C+oraclerelative=1relative=1
  
 if you want to see the real story here.  Oracle has a large installed 
 base, but it's considered a troublesome legacy product being replaced 

+1 for Oracle being a troublesome legacy product.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Bruce Momjian
Mladen Gogala wrote:
 Chris Browne wrote:
  Well, the community declines to add hints until there is actual
  consensus on a good way to add hints.

 OK. That's another matter entirely.   Who should make that decision? Is 
 there a committee or a person who would be capable of making that decision?
 
  Nobody has ever proposed a way to add hints where consensus was arrived
  at that the way was good, so...

 
 So, I will have to go back on my decision to use Postgres and 
 re-consider MySQL? I will rather throw away the effort invested in 

You want to reconsider using MySQL because Postgres doesn't have hints. 
Hard to see how that logic works.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Bruce Momjian
Mladen Gogala wrote:
 Actually, it is not unlike a religious dogma, only stating that hints 
 are bad. It even says so in the wiki. The arguments are
 1) Refusal to implement hints is motivated by distrust toward users, 
 citing that some people may mess things up.
 Yes, they can, with and without hints.
 2) All other databases have them. This is a major feature and if I were 
 in the MySQL camp, I would use it as an
argument. Asking me for some proof is missing the point. All other 
 databases have hints precisely because
they are useful. Assertion that only Postgres is so smart that can 
 operate without hints doesn't match the
reality. As a matter of fact, Oracle RDBMS on the same machine will 
 regularly beat PgSQL in performance.
That has been my experience so far.   I even posted counting query 
 results.
 3) Hints are make it or break it feature. They're absolutely needed in 
 the fire extinguishing situations.
 
 I see no arguments to say otherwise and until that ridiculous we don't 
 want hints dogma is on wiki, this is precisely what it is:  a dogma. 

Uh, that is kind of funny considering that text is on a 'wiki', meaning
everything there is open to change if the group agrees.

 Dogmas do not change and I am sorry that you don't see it that way. 
 However, this discussion
 did convince me that I need to take another look at MySQL and tone down 
 my engagement with PostgreSQL community. This is my last post on the 
 subject because posts are becoming increasingly personal. This level of 
 irritation is also
 characteristic of a religious community chastising a sinner. Let me 
 remind you again: all other major databases have that possibility: 
 Oracle, MySQL, DB2, SQL Server and Informix. Requiring burden of proof 
 about hints is equivalent to saying that all these databases are 
 developed by idiots and have a crappy optimizer.

You need to state the case for hints independent of what other databases
do, and indepdendent of fixing the problems where the optimizer doesn't
match reatility.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Scott Marlowe
On Fri, Feb 4, 2011 at 5:17 PM, Bruce Momjian br...@momjian.us wrote:
 Mladen Gogala wrote:
 characteristic of a religious community chastising a sinner. Let me
 remind you again: all other major databases have that possibility:
 Oracle, MySQL, DB2, SQL Server and Informix. Requiring burden of proof
 about hints is equivalent to saying that all these databases are
 developed by idiots and have a crappy optimizer.

 You need to state the case for hints independent of what other databases
 do, and indepdendent of fixing the problems where the optimizer doesn't
 match reatility.

And that kind of limits to an area where we would the ability to nudge
costs instead of just set them for an individual part of a query.
i.e. join b on (a.a=b.b set selectivity=0.01) or (a.a=b.b set
selectivity=1.0) or something like that.  i.e. a.a and b.b have a lot
of matches or few, etc.  If there's any thought of hinting it should
be something that a DBA, knowing his data model well, WILL know more
than the current planner because the planner can't get cross table
statistics yet.

But then, why not do something to allow cross table indexes and / or
statistics?  To me that would go much further to helping fix the
issues where the current planner flies blind.

-- 
To understand recursion, one must first understand recursion.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread david

On Fri, 4 Feb 2011, Vitalii Tymchyshyn wrote:


04.02.11 16:33, Kenneth Marshall ???(??):


In addition, the streaming ANALYZE can provide better statistics at
any time during the load and it will be complete immediately. As far
as passing the entire table through the ANALYZE process, a simple
counter can be used to only send the required samples based on the
statistics target. Where this would seem to help the most is in
temporary tables which currently do not work with autovacuum but it
would streamline their use for more complicated queries that need
an analyze to perform well.

Actually for me the main con with streaming analyze is that it adds 
significant CPU burden to already not too fast load process. Especially if 
it's automatically done for any load operation performed (and I can't see how 
it can be enabled on some threshold).


two thoughts

1. if it's a large enough load, itsn't it I/O bound?


2. this chould be done in a separate process/thread than the load itself, 
that way the overhead of the load is just copying the data in memory to 
the other process.


with a multi-threaded load, this would eat up some cpu that could be used 
for the load, but cores/chip are still climbing rapidly so I expect that 
it's still pretty easy to end up with enough CPU to handle the extra load.


David Lang

And you can't start after some threshold of data passed by since you may 
loose significant information (like minimal values).


Best regards, Vitalii Tymchyshyn



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Robert Haas
On Sat, Feb 5, 2011 at 12:46 AM,  da...@lang.hm wrote:
 Actually for me the main con with streaming analyze is that it adds
 significant CPU burden to already not too fast load process. Especially if
 it's automatically done for any load operation performed (and I can't see
 how it can be enabled on some threshold).

 two thoughts

 1. if it's a large enough load, itsn't it I/O bound?

Sometimes.  Our COPY is not as cheap as we'd like it to be.

 2. this chould be done in a separate process/thread than the load itself,
 that way the overhead of the load is just copying the data in memory to the
 other process.

I think that's more expensive than you're giving it credit for.

But by all means implement it and post the patch if it works out...!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Vitalii Tymchyshyn

02.02.11 20:32, Robert Haas написав(ла):


Yeah.  Any kind of bulk load into an empty table can be a problem,
even if it's not temporary.  When you load a bunch of data and then
immediately plan a query against it, autoanalyze hasn't had a chance
to do its thing yet, so sometimes you get a lousy plan.


May be introducing something like 'AutoAnalyze' threshold will help? I 
mean that any insert/update/delete statement that changes more then x% 
of table (and no less then y records) must do analyze right after it was 
finished.

Defaults like x=50 y=1 should be quite good as for me.

Best regards, Vitalii Tymchyshyn

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread david

On Thu, 3 Feb 2011, Vitalii Tymchyshyn wrote:


02.02.11 20:32, Robert Haas ???(??):


Yeah.  Any kind of bulk load into an empty table can be a problem,
even if it's not temporary.  When you load a bunch of data and then
immediately plan a query against it, autoanalyze hasn't had a chance
to do its thing yet, so sometimes you get a lousy plan.


May be introducing something like 'AutoAnalyze' threshold will help? I mean 
that any insert/update/delete statement that changes more then x% of table 
(and no less then y records) must do analyze right after it was finished.

Defaults like x=50 y=1 should be quite good as for me.


If I am understanding things correctly, a full Analyze is going over all 
the data in the table to figure out patterns.


If this is the case, wouldn't it make sense in the situation where you are 
loading an entire table from scratch to run the Analyze as you are 
processing the data? If you don't want to slow down the main thread that's 
inserting the data, you could copy the data to a second thread and do the 
analysis while it's still in RAM rather than having to read it off of disk 
afterwords.


this doesn't make sense for updates to existing databases, but the use 
case of loading a bunch of data and then querying it right away isn't 
_that_ uncommon.


David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Kenneth Marshall
On Thu, Feb 03, 2011 at 02:11:58AM -0800, da...@lang.hm wrote:
 On Thu, 3 Feb 2011, Vitalii Tymchyshyn wrote:

 02.02.11 20:32, Robert Haas ???(??):
 Yeah.  Any kind of bulk load into an empty table can be a problem,
 even if it's not temporary.  When you load a bunch of data and then
 immediately plan a query against it, autoanalyze hasn't had a chance
 to do its thing yet, so sometimes you get a lousy plan.

 May be introducing something like 'AutoAnalyze' threshold will help? I 
 mean that any insert/update/delete statement that changes more then x% of 
 table (and no less then y records) must do analyze right after it was 
 finished.
 Defaults like x=50 y=1 should be quite good as for me.

 If I am understanding things correctly, a full Analyze is going over all 
 the data in the table to figure out patterns.

 If this is the case, wouldn't it make sense in the situation where you are 
 loading an entire table from scratch to run the Analyze as you are 
 processing the data? If you don't want to slow down the main thread that's 
 inserting the data, you could copy the data to a second thread and do the 
 analysis while it's still in RAM rather than having to read it off of disk 
 afterwords.

 this doesn't make sense for updates to existing databases, but the use case 
 of loading a bunch of data and then querying it right away isn't _that_ 
 uncommon.

 David Lang


+1 for in-flight ANALYZE. This would be great for bulk loads of
real tables as well as temp tables.

Cheers,
Ken

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Jon Nelson
On Thu, Feb 3, 2011 at 7:41 AM, Kenneth Marshall k...@rice.edu wrote:
 On Thu, Feb 03, 2011 at 02:11:58AM -0800, da...@lang.hm wrote:
 On Thu, 3 Feb 2011, Vitalii Tymchyshyn wrote:

 02.02.11 20:32, Robert Haas ???(??):
 Yeah.  Any kind of bulk load into an empty table can be a problem,
 even if it's not temporary.  When you load a bunch of data and then
 immediately plan a query against it, autoanalyze hasn't had a chance
 to do its thing yet, so sometimes you get a lousy plan.

 May be introducing something like 'AutoAnalyze' threshold will help? I
 mean that any insert/update/delete statement that changes more then x% of
 table (and no less then y records) must do analyze right after it was
 finished.
 Defaults like x=50 y=1 should be quite good as for me.

 If I am understanding things correctly, a full Analyze is going over all
 the data in the table to figure out patterns.

 If this is the case, wouldn't it make sense in the situation where you are
 loading an entire table from scratch to run the Analyze as you are
 processing the data? If you don't want to slow down the main thread that's
 inserting the data, you could copy the data to a second thread and do the
 analysis while it's still in RAM rather than having to read it off of disk
 afterwords.

 this doesn't make sense for updates to existing databases, but the use case
 of loading a bunch of data and then querying it right away isn't _that_
 uncommon.

 David Lang


 +1 for in-flight ANALYZE. This would be great for bulk loads of
 real tables as well as temp tables.

Yes, please, that would be really nice.




-- 
Jon

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Robert Haas
On Thu, Feb 3, 2011 at 4:54 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
 02.02.11 20:32, Robert Haas написав(ла):

 Yeah.  Any kind of bulk load into an empty table can be a problem,
 even if it's not temporary.  When you load a bunch of data and then
 immediately plan a query against it, autoanalyze hasn't had a chance
 to do its thing yet, so sometimes you get a lousy plan.

 May be introducing something like 'AutoAnalyze' threshold will help? I mean
 that any insert/update/delete statement that changes more then x% of table
 (and no less then y records) must do analyze right after it was finished.
 Defaults like x=50 y=1 should be quite good as for me.

That would actually be a pessimization for many real world cases.  Consider:

COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
SELECT

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Robert Haas
On Thu, Feb 3, 2011 at 5:11 AM,  da...@lang.hm wrote:
 If I am understanding things correctly, a full Analyze is going over all the
 data in the table to figure out patterns.

No.  It's going over a small, fixed-size sample which depends on
default_statistics_target but NOT on the table size.  It's really
important to come up with a solution that's not susceptible to running
ANALYZE over and over again, in many cases unnecessarily.

 If this is the case, wouldn't it make sense in the situation where you are
 loading an entire table from scratch to run the Analyze as you are
 processing the data? If you don't want to slow down the main thread that's
 inserting the data, you could copy the data to a second thread and do the
 analysis while it's still in RAM rather than having to read it off of disk
 afterwords.

Well that's basically what autoanalyze is going to do anyway, if the
table is small enough to fit in shared_buffers.  And it's actually
usually BAD if it starts running while you're doing a large bulk load,
because it competes for I/O bandwidth and the buffer cache and slows
things down.  Especially when you're bulk loading for a long time and
it tries to run over and over.  I'd really like to suppress all those
asynchronous ANALYZE operations and instead do ONE synchronous one at
the end, when we try to use the data.

Of course, the devil is in the nontrivial details.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Vitalii Tymchyshyn

03.02.11 17:31, Robert Haas написав(ла):



May be introducing something like 'AutoAnalyze' threshold will help? I mean
that any insert/update/delete statement that changes more then x% of table
(and no less then y records) must do analyze right after it was finished.
Defaults like x=50 y=1 should be quite good as for me.

That would actually be a pessimization for many real world cases.  Consider:

COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
SELECT

If all the copies are ~ same in size and large this will make it:

COPY
ANALYZE
COPY
ANALYZE
COPY
COPY
ANALYZE
COPY
COPY
COPY
COPY
ANALYZE
COPY
COPY
COPY
COPY
COPY
SELECT

instead of

COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
ANALYZE (manual, if one is clever enough)
SELECT

So, yes this will add 3 more analyze, but
1) Analyze is pretty cheap comparing to large data loading. I'd say this 
would add few percent of burden. And NOT doing analyze manually before 
select can raise select costs orders of magnitude.
2) How often in real world a single table is loaded in many COPY 
statements? (I don't say it's not often, I really don't know). At least 
for restore it is not the case, is not it?
3) default thresholds are things to discuss. You can make x=90 or x=200 
(latter will make it run only for massive load/insert operations). You 
can even make it disabled by default for people to test. Or enable by 
default for temp tables only (and have two sets of thresholds)
4) As most other settings, this threshold can be changed on up to 
per-query basis.


P.S. I would also like to have index analyze as part of any create index 
process.


Best regards, Vitalii Tymchyshyn


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Greg Smith wrote:

Mladen Gogala wrote:
  
The techies at big companies are the guys who will or will not make it 
happen. And these guys are not beginners.  Appeasing them may actually 
go a long way.



The PostgreSQL community isn't real big on appeasing people if it's at 
the expense of robustness or correctness, and this issue falls into that 
category.  
With all due respect, I don't see how does the issue of hints fall into 
this category? As I explained, the mechanisms are already there, they're 
just not elegant enough. The verb appease doesn't convey the meaning 
that I had in mind quite correctly. The phrase target population would 
have  described what I wanted to say in a much better way .
There are downsides to that, but good things too.  Chasing 
after whatever made people happy regardless of its impact on the server 
code itself has in my mind contributed to why Oracle is so bloated and 
MySQL so buggy, to pick two examples from my favorite horse to whip.  
  
Well, those two databases are also used much more widely than Postgres, 
which means that they're doing something better than Postgres.


Hints are not even that complicated to program. The SQL parser should 
compile the list of hints into a table and optimizer should check 
whether any of the applicable access methods exist in the table. If it 
does - use it. If not, ignore it. This looks to me like a philosophical 
issue, not a programming issue. Basically, the current Postgres 
philosophy can be described like this: if the database was a gas stove, 
it would occasionally catch fire. However, bundling a fire extinguisher 
with the stove is somehow seen as bad. When the stove catches fire, 
users is expected to report the issue and wait for a better stove to be 
developed. It is a very rough analogy, but rather accurate one, too.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Bruce Momjian
Mladen Gogala wrote:
 Greg Smith wrote:
  Mladen Gogala wrote:

  The techies at big companies are the guys who will or will not make it 
  happen. And these guys are not beginners.  Appeasing them may actually 
  go a long way.
  
 
  The PostgreSQL community isn't real big on appeasing people if it's at 
  the expense of robustness or correctness, and this issue falls into that 
  category.  

 With all due respect, I don't see how does the issue of hints fall into 
 this category? As I explained, the mechanisms are already there, they're 
 just not elegant enough. The verb appease doesn't convey the meaning 
 that I had in mind quite correctly. The phrase target population would 
 have  described what I wanted to say in a much better way .

The settings are currently there to better model the real world
(random_page_cost), or for testing (enable_seqscan).  They are not there
to force certain plans.  They can be used for that, but that is not
their purpose and they would not have been added if that was their
purpose.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Bruce Momjian
Mladen Gogala wrote:
 Hints are not even that complicated to program. The SQL parser should 
 compile the list of hints into a table and optimizer should check 
 whether any of the applicable access methods exist in the table. If it 
 does - use it. If not, ignore it. This looks to me like a philosophical 
 issue, not a programming issue. Basically, the current Postgres 
 philosophy can be described like this: if the database was a gas stove, 
 it would occasionally catch fire. However, bundling a fire extinguisher 
 with the stove is somehow seen as bad. When the stove catches fire, 
 users is expected to report the issue and wait for a better stove to be 
 developed. It is a very rough analogy, but rather accurate one, too.

That might be true.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Shaun Thomas

On 02/03/2011 10:38 AM, Mladen Gogala wrote:


With all due respect, I don't see how does the issue of hints fall
into this category?


You have a few good arguments, and if you hadn't said this, it wouldn't 
have been so obvious that there was a fundamental philosophical 
disconnect. I asked this same question almost ten years ago, and the 
answer Tom gave me was more than sufficient.


It all boils down to the database. Hints, whether they're 
well-intentioned or not, effectively cover up bugs in the optimizer, 
planner, or some other approach the database is using to build its 
execution. Your analogy is that PG is a gas stove, so bundle a fire 
extinguisher. Well, the devs believe that the stove should be upgraded 
to electric or possibly even induction to remove the need for the 
extinguisher.


If they left hints in, it would just be one more thing to deprecate as 
the original need for the hint was removed. If you really need hints 
that badly, EnterpriseDB cobbled the Oracle syntax into the planner, and 
it seems to work alright. That doesn't mean it's right, just that it 
works. EnterpriseDB will now have to support those query hints forever, 
even if the planner gets so advanced they're effectively useless.



Well, those two databases are also used much more widely than
Postgres, which means that they're doing something better than
Postgres.


Please don't make arguments like this. Better is such a subjective 
evaluation it means nothing. Are Honda Accords better than Lamborghini 
Gallardos because more people buy Accords? The MySQL/PostgreSQL flame 
war is a long and sometimes bitter one, and bringing it up to try and 
persuade the devs to see reason is just going to backfire.



Hints are not even that complicated to program.


Then write a contrib module. It's not part of the core DB, and it 
probably never will be. This is a *very* old argument. There's literally 
nothing you can say, no argument you can bring, that hasn't been heard a 
million times in the last decade.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Tom Lane
Mladen Gogala mladen.gog...@vmsinfo.com writes:
 Hints are not even that complicated to program.

With all due respect, you don't know what you're talking about.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Robert Haas
On Thu, Feb 3, 2011 at 11:56 AM, Bruce Momjian br...@momjian.us wrote:
 The settings are currently there to better model the real world
 (random_page_cost), or for testing (enable_seqscan).  They are not there
 to force certain plans.  They can be used for that, but that is not
 their purpose and they would not have been added if that was their
 purpose.

Sure.  But Mladen's point is that this is rather narrow-minded.  I
happen to agree.  We are not building an ivory tower.  We are building
a program that real people will use to solve real problems, and it is
not our job to artificially prevent them from achieving their
objectives so that we remain motivated to improve future versions of
the code.

I don't, however, agree with his contention that this is easy to
implement.  It would be easy to implement something that sucked.  It
would be hard to implement something that actually helped in the cases
where the existing settings aren't already sufficient.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I don't, however, agree with his contention that this is easy to
 implement.  It would be easy to implement something that sucked.  It
 would be hard to implement something that actually helped in the cases
 where the existing settings aren't already sufficient.

Exactly.  A hint system that actually did more good than harm would be a
very nontrivial project.  IMO such effort is better spent on making the
optimizer smarter.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Chris Browne
mladen.gog...@vmsinfo.com (Mladen Gogala) writes:
 Hints are not even that complicated to program. The SQL parser should
 compile the list of hints into a table and optimizer should check
 whether any of the applicable access methods exist in the table. If it
 does - use it. If not, ignore it. This looks to me like a
 philosophical issue, not a programming issue.

It's worth looking back to what has already been elaborated on in the
ToDo.

http://wiki.postgresql.org/wiki/Todo
---
Optimizer hints (not wanted)

Optimizer hints are used to work around problems in the optimizer and
introduce upgrade and maintenance issues. We would rather have the
problems reported and fixed. We have discussed a more sophisticated
system of per-class cost adjustment instead, but a specification remains
to be developed.
---

The complaint is that kludging hints into a particular query attacks the
problem from the wrong direction.

The alternative recommended is to collect some declarative information,
that *won't* be part of the query, that *won't* be processed by the
parser, and that *won't* kludge up the query with information that is
liable to turn into crud over time.

Tom Lane was pretty specific about some kinds of declarative information
that seemed useful:
   http://archives.postgresql.org/pgsql-hackers/2006-10/msg00663.php

On Jeapordy, participants are expected to phrase one's answers in the
form of a question, and doing so is rewarded.

Based on the presence of query hints on the Not Wanted portion of the
ToDo list, it's pretty clear that participants here are expected to
propose optimizer hints in ways that do NOT involve decorating queries
with crud.  You'll get a vastly friendlier response if you at least make
an attempt to attack the problem in the declarative information
fashion.

Perhaps we're all wrong in believing that pushing query optimization
information into application queries by decorating the application with
hints, is the right idea but it's a belief that certainly seems to be
regularly agreed upon by gentle readers.
-- 
cbbrowne,@,linuxdatabases.info
The people's revolutionary committee has  decided that the name e is
retrogressive, unmulticious   and reactionary, and  has  been flushed.
Please update your abbrevs.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Greg Smith

Mladen Gogala wrote:
With all due respect, I don't see how does the issue of hints fall 
into this category? As I explained, the mechanisms are already there, 
they're just not elegant enough.


You're making some assumptions about what a more elegant mechanism would 
look to develop that are simplifying the actual situation here.  If you 
take a survey of everyone who ever works on this area of the code, and 
responses to this thread are already approaching a significant 
percentage of such people, you'll discover that doing what you want is 
more difficult--and very much not elegant enough from the perspective 
of the code involved--than you think it would be.


It's actually kind of funny...I've run into more than one person who 
charged into the PostgreSQL source code with the goal of I'm going to 
add good hinting!  But it seems like the minute anyone gets enough 
understanding of how it fits together to actually do that, they realize 
there are just plain better things to be done in there instead.  I used 
to be in the same situation you're in--thinking that all it would take 
is a better UI for tweaking the existing parameters.  But now that I've 
actually done such tweaking for long enough to get a feel for what's 
really wrong with the underlying assumptions, I can name 3 better uses 
of development resources that I'd rather work on instead.  I mentioned 
incorporating cache visibility already, Robert has talked about 
improvements to the sensitivity estimates, and the third one is 
improving pooling of work_mem so individual clients can get more of it 
safely.


Well, those two databases are also used much more widely than 
Postgres, which means that they're doing something better than Postgres.


Starting earlier is the only better here.  Obviously Oracle got a 
much earlier start than either open-source database.  The real 
divergence in MySQL adoption relative to PostgreSQL was when they 
released a Windows port in January of 1998.  PostgreSQL didn't really 
match that with a fully native port until January of 2005.


Check out 
http://www.indeed.com/jobtrends?q=postgres%2C+mysql%2C+oraclerelative=1relative=1 
if you want to see the real story here.  Oracle has a large installed 
base, but it's considered a troublesome legacy product being replaced 
whenever possible now in every place I visit.  Obviously my view of the 
world as seen through my client feedback is skewed a bit toward 
PostgreSQL adoption.  But you would be hard pressed to support any view 
that suggests Oracle usage is anything other than flat or decreasing at 
this point.  When usage of one product is growing at an expontential 
rate and the other is not growing at all, eventually the market share 
curves always cross too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Chris Browne wrote:

It's worth looking back to what has already been elaborated on in the
ToDo.
  


And that precisely is what I am trying to contest.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Ross J. Reedstrom
On Thu, Feb 03, 2011 at 12:44:23PM -0500, Chris Browne wrote:
 mladen.gog...@vmsinfo.com (Mladen Gogala) writes:
  Hints are not even that complicated to program. The SQL parser should
  compile the list of hints into a table and optimizer should check
  whether any of the applicable access methods exist in the table. If it
  does - use it. If not, ignore it. This looks to me like a
  philosophical issue, not a programming issue.
 
 It's worth looking back to what has already been elaborated on in the
 ToDo.
 
 http://wiki.postgresql.org/wiki/Todo
 ---
 Optimizer hints (not wanted)
 
 Optimizer hints are used to work around problems in the optimizer and
 introduce upgrade and maintenance issues. We would rather have the
 problems reported and fixed. We have discussed a more sophisticated
 system of per-class cost adjustment instead, but a specification remains
 to be developed.

And as to the 'wait around for a new version to fix that': there are
constantly excellent examples of exactly this happening, all the time
with PostgreSQL - most recent example I've seen -
http://archives.postgresql.org/pgsql-performance/2011-01/msg00337.php

The wait often isn't long, at all.

Ross
-- 
Ross Reedstrom, Ph.D. reeds...@rice.edu
Systems Engineer  Admin, Research Scientistphone: 713-348-6166
Connexions  http://cnx.orgfax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread david

On Thu, 3 Feb 2011, Robert Haas wrote:


On Thu, Feb 3, 2011 at 5:11 AM,  da...@lang.hm wrote:

If I am understanding things correctly, a full Analyze is going over all the
data in the table to figure out patterns.


No.  It's going over a small, fixed-size sample which depends on
default_statistics_target but NOT on the table size.  It's really
important to come up with a solution that's not susceptible to running
ANALYZE over and over again, in many cases unnecessarily.


If this is the case, wouldn't it make sense in the situation where you are
loading an entire table from scratch to run the Analyze as you are
processing the data? If you don't want to slow down the main thread that's
inserting the data, you could copy the data to a second thread and do the
analysis while it's still in RAM rather than having to read it off of disk
afterwords.


Well that's basically what autoanalyze is going to do anyway, if the
table is small enough to fit in shared_buffers.  And it's actually
usually BAD if it starts running while you're doing a large bulk load,
because it competes for I/O bandwidth and the buffer cache and slows
things down.  Especially when you're bulk loading for a long time and
it tries to run over and over.  I'd really like to suppress all those
asynchronous ANALYZE operations and instead do ONE synchronous one at
the end, when we try to use the data.


If the table is not large enough to fit in ram, then it will compete for 
I/O, and the user will have to wait.


what I'm proposing is that as the records are created, the process doing 
the creation makes copies of the records (either all of them, or some of 
them if not all are needed for the analysis, possibly via shareing memory 
with the analysis process), this would be synchronous with the load, not 
asynchronous.


this would take zero I/O bandwidth, it would take up some ram, memory 
bandwidth, and cpu time, but a load of a large table like this is I/O 
contrained.


it would not make sense for this to be the default, but as an option it 
should save a significant amount of time.


I am making the assumption that an Analyze run only has to go over the 
data once (a seqential scan of the table if it's  ram for example) and 
gathers stats as it goes.


with the current code, this is a completely separate process that knows 
nothing about the load, so if you kick it off when you start the load, it 
makes a pass over the table (competing for I/O), finishes, you continue to 
update the table, so it makes another pass, etc. As you say, this is a bad 
thing to do. I am saying to have an option that ties the two togeather, 
essentially making the data feed into the Analyze run be a fork of the 
data comeing out of the insert run going to disk. So the Analyze run 
doesn't do any I/O and isn't going to complete until the insert is 
complete. At which time it will have seen one copy of the entire table.


David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Shaun Thomas wrote:

On 02/03/2011 10:38 AM, Mladen Gogala wrote:

  
It all boils down to the database. Hints, whether they're 
well-intentioned or not, effectively cover up bugs in the optimizer, 
planner, or some other approach the database is using to build its 
execution. 
Hints don't cover up bugs, they simply make it possible for the user to 
circumvent the bugs and keep the users happy. As I hinted before, this 
is actually a purist argument which was made by someone who has never 
had to support a massive production database with many users for living.
Your analogy is that PG is a gas stove, so bundle a fire 
extinguisher. Well, the devs believe that the stove should be upgraded 
to electric or possibly even induction to remove the need for the 
extinguisher.
  
In the meantime, the fire is burning. What should the hapless owner of 
the database application do in the meantime? Tell the users that it will 
be better in the next version? As I've said before: hints are make it or 
break it point. Without hints, I cannot consider Postgres for the 
mission critical projects. I am managing big databases for living and I 
flatter myself that after more than two decades of doing it, I am not 
too bad at it.


If they left hints in, it would just be one more thing to deprecate as 
the original need for the hint was removed. If you really need hints 
that badly, EnterpriseDB cobbled the Oracle syntax into the planner, and 
it seems to work alright. That doesn't mean it's right, just that it 
works. EnterpriseDB will now have to support those query hints forever, 
even if the planner gets so advanced they're effectively useless.
  


I don't foresee that to happen in my lifetime. And I plan to go on for 
quite a while. There will always be optimizer bugs, users will be 
smarter and know more about their data than computer programs in 
foreseeable future. What this attitude boils down to is that developers 
don't trust their users enough to give them control of the execution 
path. I profoundly disagree with that type of philosophy. DB2 also has 
hints: http://tinyurl.com/48fv7w7
So does SQL Server: 
http://www.sql-server-performance.com/tips/hints_general_p1.aspx
Finally, even the Postgres greatest  open source competitor MySQL 
supports hints: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html


I must say that this purist attitude is extremely surprising to me. All 
the major DB vendors support optimizer hints, yet in the Postgres 
community, they are considered bad with almost religious fervor.

Postgres community is quite unique with the fatwa against hints.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Chris Browne
mladen.gog...@vmsinfo.com (Mladen Gogala) writes:
 I must say that this purist attitude is extremely surprising to
 me. All the major DB vendors support optimizer hints, yet in the
 Postgres community, they are considered bad with almost religious
 fervor.
 Postgres community is quite unique with the fatwa against hints.

Well, the community declines to add hints until there is actual
consensus on a good way to add hints.

Nobody has ever proposed a way to add hints where consensus was arrived
at that the way was good, so...
-- 
http://linuxfinances.info/info/nonrdbms.html
Rules of the Evil Overlord #192.  If I appoint someone as my consort,
I will  not subsequently inform  her that she  is being replaced  by a
younger, more attractive woman.  http://www.eviloverlord.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Kevin Grittner
Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 
 In the meantime, the fire is burning. What should the hapless
 owner of the database application do in the meantime? Tell the
 users that it will be better in the next version? As I've said
 before: hints are make it or break it point. Without hints, I
 cannot consider Postgres for the mission critical projects. I am
 managing big databases for living and I flatter myself that after
 more than two decades of doing it, I am not too bad at it.
 
Well, I've been at it since 1972, and I'm OK with the current
situation because I push hard for *testing* in advance of production
deployment.  So I generally discover that leaving a pan of grease on
maximum flame unattended is a bad idea in the test lab, where no
serious damage is done.  Then I take steps to ensure that this
doesn't happen in the user world.
 
We've got about 100 production databases, some at 2TB and growing,
and 100 development, testing, and staging databases.  About 3,000
directly connected users and millions of web hits per day generating
tens of millions of queries.  Lots of fun replication and automated
interfaces to business partners -- DOT, county sheriffs, local
police agencies, district attorneys, public defenders offices,
Department of Revenue (for tax intercept collections), Department of
Justice, etc.  (That was really just the tip of the iceberg.)
 
Almost all of this was converted inside of a year with minimal fuss
and only a one user complaint that I can recall.  Most users
described it as a non-event, with the only visible difference
being that applications were snappier than under the commercial
database product.  One type of query was slow in Milwaukee County
(our largest).  We tuned seq_page_cost and random_page_cost until
all queries were running with good plans.  It did not require any
down time to sort this out and fix it -- same day turnaround.  This
is not a matter of hinting; it's a matter of creating a cost model
for the planner which matches reality.  (We don't set this or any
other hint per query, we tune the model.)  When the cost estimates
mirror reality, good plans are chosen.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Shaun Thomas

On 02/03/2011 03:01 PM, Mladen Gogala wrote:


As I hinted before, this is actually a purist argument which was made
by someone who has never had to support a massive production database
with many users for living.


Our database handles 9000 transactions per second and over 200-million 
transactions per day just fine, thanks. It may not be a real database 
in your world, but it's real enough for us.



I must say that this purist attitude is extremely surprising to me.
All the major DB vendors support optimizer hints, yet in the
Postgres community, they are considered bad with almost religious
fervor. Postgres community is quite unique with the fatwa against
hints.


You missed the argument. The community, or at least the devs, see hints 
as an ugly hack. Do I agree? Not completely, but I can definitely 
understand the perspective. Saying every other vendor has hints is 
really just admitting every other vendor has a crappy optimizer. Is that 
something to be proud of?


In almost every single case I've seen a query with bad performance, it's 
the fault of the author or the DBA. Not enough where clauses; not paying 
attention to cardinality or selectivity; inappropriate or misapplied 
indexes; insufficient table statistics... the list of worse grievances 
out there is endless.


And here's something I never saw you consider: hints making performance 
worse. Sure, for now, forcing a sequence scan or forcing it to use 
indexes on a specific table is faster for some specific edge-case. But 
hints are like most code, and tend to miss frequent refactor. As the 
optimizer improves, hints likely won't, meaning code is likely to be 
slower than if the hints didn't exist. This of course ignores the 
contents of a table are likely to evolve or grow in volume, which can 
also drastically alter the path the optimizer would choose, but can't 
because a hint is forcing it to take a specific path.


Want to remove a reverse index scan? Reindex with DESC on the column 
being reversed. That was added in 8.3. Getting too many calls for nested 
loops when a merge or hash would be faster? Increase the statistics 
target for the column causing the problems and re-analyze. Find an 
actual bug in the optimizer? Tell the devs and they'll fix it. Just stay 
current, and you get all those benefits. This is true for any database; 
bugs get fixed, things get faster and more secure.


Or like I said, if you really need hints that badly, use EnterpriseDB 
instead. It's basically completely Oracle-compatible at this point. But 
pestering the PostgreSQL dev community about how inferior they are, and 
how they're doing it wrong, and how they're just another vendor making a 
database product that can't support massive production databases, is 
doing nothing but ensuring they'll ignore you. Flies, honey, vinegar, etc.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Chris Browne wrote:

Well, the community declines to add hints until there is actual
consensus on a good way to add hints.
  
OK. That's another matter entirely.   Who should make that decision? Is 
there a committee or a person who would be capable of making that decision?



Nobody has ever proposed a way to add hints where consensus was arrived
at that the way was good, so...
  


So, I will have to go back on my decision to use Postgres and 
re-consider MySQL? I will rather throw away the effort invested in 
studying Postgres than to risk an unfixable application downtime.  I am 
not sure about the world domination thing, though. Optimizer hints are a 
big feature that everybody else has and Postgres does not have because 
of religious reasons.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mark Kirkwood

On 04/02/11 10:01, Mladen Gogala wrote:
In the meantime, the fire is burning. What should the hapless owner of 
the database application do in the meantime? Tell the users that it 
will be better in the next version? As I've said before: hints are 
make it or break it point. Without hints, I cannot consider Postgres 
for the mission critical projects. I am managing big databases for 
living and I flatter myself that after more than two decades of doing 
it, I am not too bad at it.


This is somewhat of a straw man argument. This sort of query that the 
optimizer does badly usually gets noticed during the test cycle i.e 
before production, so there is some lead time to get a fix into the 
code, or add/subtract indexes/redesign the query concerned.


The cases I've seen in production typically involve outgrowing 
optimizer parameter settings: (e.g work_mem, effective_cache_size) as 
the application dataset gets bigger over time. I would note that this is 
*more* likely to happen with hints, as they lobotomize the optimizer so 
it *cannot* react to dataset size or distribution changes.


regards

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Shaun Thomas wrote:
You missed the argument. The community, or at least the devs, see hints 
as an ugly hack. Do I agree? Not completely, but I can definitely 
understand the perspective. Saying every other vendor has hints is 
really just admitting every other vendor has a crappy optimizer. Is that 
something to be proud of?
  
This is funny? Everybody else has a crappy optimizer? That's a funny way 
of looking at the fact that every other major database supports hints. I 
would be tempted to call that a major missing feature, but the statement 
that everybody else has a crappy optimizer sounds kind of funny. No 
disrespect meant. It's not unlike claiming that the Earth is 6000 years old.




And here's something I never saw you consider: hints making performance 
worse. 
  
Sure. If you give me the steering wheell, there is a chance that I might 
get car off the cliff or even run someone over, but that doesn't mean 
that there is no need for having one. After all, we're talking about the 
ability to control the optimizer decision.


Want to remove a reverse index scan? Reindex with DESC on the column 
being reversed. That was added in 8.3. Getting too many calls for nested 
loops when a merge or hash would be faster? Increase the statistics 
target for the column causing the problems and re-analyze. Find an 
actual bug in the optimizer? Tell the devs and they'll fix it. Just stay 
current, and you get all those benefits. This is true for any database; 
bugs get fixed, things get faster and more secure.
  
In the meantime, the other databases provide hints which help me bridge 
the gap. As I said before: hints are there, even if they were not meant 
to be used that way. I can do things in a way that I consider very 
non-elegant. The hints are there because they are definitely needed. 
Yet, there is a religious zeal and a fatwa against them.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Ben Chobot
On Feb 3, 2011, at 1:50 PM, Mladen Gogala wrote:

 So, I will have to go back on my decision to use Postgres and re-consider 
 MySQL? I will rather throw away the effort invested in studying Postgres than 
 to risk an unfixable application downtime.  I am not sure about the world 
 domination thing, though. Optimizer hints are a big feature that everybody 
 else has and Postgres does not have because of religious reasons.

As always, you should use the tool you consider best for the job. If you think 
MySQL as both a product and a community has a better chance of giving you what 
you want, then you should use MySQL.
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Pavel Stehule
2011/2/3 Mladen Gogala mladen.gog...@vmsinfo.com:
 Chris Browne wrote:

 Well, the community declines to add hints until there is actual
 consensus on a good way to add hints.


 OK. That's another matter entirely.   Who should make that decision? Is
 there a committee or a person who would be capable of making that decision?


Because there are not consensus about hints, then hints are not in pg.

And community development must be based on consensus. There are not second way.

Hints are not a win from some reasons.

Situation isn't immutable. There are a lot of features, that was
rejected first time - like replication. But it needs a different
access. You have to show tests, use cases, code and you have to
satisfy all people, so your request is good and necessary. Argument,
so other databases has this feature is a last on top ten.

 Nobody has ever proposed a way to add hints where consensus was arrived
 at that the way was good, so...


 So, I will have to go back on my decision to use Postgres and re-consider
 MySQL? I will rather throw away the effort invested in studying Postgres
 than to risk an unfixable application downtime.  I am not sure about the
 world domination thing, though. Optimizer hints are a big feature that
 everybody else has and Postgres does not have because of religious reasons.

it's not correct from you. There is a real arguments against hints.



you can try a edb. There is a other external modul

http://postgresql.1045698.n5.nabble.com/contrib-plantuner-enable-PostgreSQL-planner-hints-td1924794.html

Regards

Pavel Stehule


 --

 Mladen Gogala Sr. Oracle DBA
 1500 Broadway
 New York, NY 10036
 (212) 329-5251
 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions




 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Josh Berkus
On 2/3/11 1:18 PM, Chris Browne wrote:
 mladen.gog...@vmsinfo.com (Mladen Gogala) writes:
 I must say that this purist attitude is extremely surprising to
 me. All the major DB vendors support optimizer hints,

I don't think that's actually accurate.  Can you give me a list of
DBMSes which support hints other than Oracle?

 Well, the community declines to add hints until there is actual
 consensus on a good way to add hints.
 
 Nobody has ever proposed a way to add hints where consensus was arrived
 at that the way was good, so...

Well, we did actually have some pretty good proposals (IIRC) for
selectively adjusting the cost model to take into account DBA knowledge.
 These needed some refinement, but in general seem like the right way to go.

However, since this system wasn't directly compatible with Oracle Hints,
folks pushing for hints dropped the solution as unsatisfactory. This is
the discussion we have every time: the users who want hints specifically
want hints which work exactly like Oracle's, and aren't interested in a
system designed for PostgreSQL.  It's gotten very boring; it's like the
requests to support MySQL-only syntax.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Pavel Stehule
 In the meantime, the other databases provide hints which help me bridge the
 gap. As I said before: hints are there, even if they were not meant to be
 used that way. I can do things in a way that I consider very non-elegant.
 The hints are there because they are definitely needed. Yet, there is a
 religious zeal and a fatwa against them.


Other databases has different development model. It isn't based on
consensus. The are not any commercial model for PostgreSQL. There are
not possible to pay programmers. So you can pay and as customer, you
are boss or use it freely and search a consensus - a common talk.

Regards

Pavel Stehule

 --

 Mladen Gogala Sr. Oracle DBA
 1500 Broadway
 New York, NY 10036
 (212) 329-5251
 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions




 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Josh Berkus wrote:

However, since this system wasn't directly compatible with Oracle Hints,
folks pushing for hints dropped the solution as unsatisfactory. This is
the discussion we have every time: the users who want hints specifically
want hints which work exactly like Oracle's, and aren't interested in a
system designed for PostgreSQL.  It's gotten very boring; it's like the
requests to support MySQL-only syntax.
  
Actually, I don't want Oracle hints. Oracle hints are ugly and 
cumbersome. I would prefer something like this:


http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

That should also answer the question about other databases supporting hints.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Mladen Gogala wrote:
Actually, I don't want Oracle hints. Oracle hints are ugly and 
cumbersome. I would prefer something like this:


http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

That should also answer the question about other databases supporting hints.
  


Sorry. I forgot that MySQL too is now an Oracle product.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Maciek Sakrejda
 The hints are there because they are definitely needed. Yet, there is a
 religious zeal and a fatwa against them.

The opposition is philosophical, not religious. There is no fatwa.
If you want a serious discussion, avoid inflammatory terms.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
www.truviso.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mark Kirkwood

On 04/02/11 11:08, Josh Berkus wrote:

I don't think that's actually accurate.  Can you give me a list of
DBMSes which support hints other than Oracle?


DB2 LUW (Linux, Unix, Win32 code base) has hint profiles:

http://justdb2chatter.blogspot.com/2008/06/db2-hints-optimizer-selection.html


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Greg Smith

da...@lang.hm wrote:
I am making the assumption that an Analyze run only has to go over the 
data once (a seqential scan of the table if it's  ram for example) 
and gathers stats as it goes.


And that's the part there's some confusion about here.  ANALYZE grabs a 
random set of samples from the table, the number of which is guided by 
the setting for default_statistics_target.  The amount of time it takes 
is not proportional to the table size; it's only proportional to the 
sampling size.  Adding a process whose overhead is proportional to the 
table size, such as the continuous analyze idea you're proposing, is 
quite likely to be a big step backwards relative to just running a 
single ANALYZE after the loading is finished.


What people should be doing if concerned about multiple passes happening 
is something like this:


CREATE TABLE t (s serial, i integer) WITH (autovacuum_enabled=off);
[populate table]
ANALYZE t;
ALTER TABLE t SET (autovacuum_enabled=on);

I'm not optimistic the database will ever get smart enough to recognize 
bulk loading and do this sort of thing automatically, but as the 
workaround is so simple it's hard to get motivated to work on trying.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Maciek Sakrejda wrote:

The hints are there because they are definitely needed. Yet, there is a
religious zeal and a fatwa against them.



The opposition is philosophical, not religious. There is no fatwa.
If you want a serious discussion, avoid inflammatory terms.


  
I don't want to insult anybody but the whole thing does look strange. 
Maybe we can agree to remove that ridiculous we don't want hints note 
from Postgresql wiki? That would make it look less like , hmph, 
philosophical issue and more not yet implemented issue, especially if 
we have in mind that hints are already here, in the form of 
enable_method switches.



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Mark Kirkwood wrote:

On 04/02/11 11:08, Josh Berkus wrote:
  

I don't think that's actually accurate.  Can you give me a list of
DBMSes which support hints other than Oracle?



DB2 LUW (Linux, Unix, Win32 code base) has hint profiles:

http://justdb2chatter.blogspot.com/2008/06/db2-hints-optimizer-selection.html


  

SQL Server and MySQL too.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Michael Glaesemann

On Feb 3, 2011, at 17:08, Josh Berkus wrote:

 On 2/3/11 1:18 PM, Chris Browne wrote:
 mladen.gog...@vmsinfo.com (Mladen Gogala) writes:
 I must say that this purist attitude is extremely surprising to
 me. All the major DB vendors support optimizer hints,
 
 I don't think that's actually accurate.  Can you give me a list of
 DBMSes which support hints other than Oracle?

1 minute of Googling shows results for:

db2:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db2.doc.admin/p9li375.htm

informix:
http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/0502fan/0502fan.html

sybase:
http://searchenterpriselinux.techtarget.com/answer/Query-hints-to-override-optimizer

mysql:
http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

I haven't read much of the rest of this thread, so others may have brought 
these up before.

Michael Glaesemann
grzm seespotcode net




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Josh Berkus

 I don't want to insult anybody but the whole thing does look strange.
 Maybe we can agree to remove that ridiculous we don't want hints note
 from Postgresql wiki? That would make it look less like , hmph,
 philosophical issue and more not yet implemented issue, especially if
 we have in mind that hints are already here, in the form of
 enable_method switches.

Link? There's a lot of stuff on the wiki.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Kevin Grittner
Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 
 Maybe we can agree to remove that ridiculous we don't want hints
 note from Postgresql wiki?
 
I'd be against that.  This is rehashed less frequently since that
went in.  Less wasted time and bandwidth with it there.
 
 That would make it look less like , hmph, philosophical issue and
 more not yet implemented issue,
 
Exactly what we don't want.
 
 especially if we have in mind that hints are already here, in the
 form of enable_method switches.
 
Those aren't intended as hints for production use.  They're there
for diagnostic purposes.  In our shop we've never used any of those
flags in production.
 
That said, there are ways to force an optimization barrier when
needed, which I have occasionally seen people find useful.  And
there are sometimes provably logically equivalent ways to write a
query which result in different plans with different performance. 
It's rare that someone presents a poorly performing query on the
list and doesn't get a satisfactory resolution fairly quickly -- if
they present sufficient detail and work nicely with others who are
volunteering their time to help.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Justin Pitts
 With all
 due respect, I consider myself smarter than the optimizer.  I'm 6'4, 235LBS
 so telling me that you disagree and that I am more stupid than a computer
 program,  would not be a smart thing to do. Please, do not misunderestimate
 me.

I don't see computer programs make thinly veiled threats, especially
in a public forum.
I'll do what you claim is not the smart thing and disagree with you.
You are wrong.
You are dragging the signal-to-noise ratio of this discussion down.
You owe Greg an apology.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Justin Pitts
Thank you.

It appears I owe an apology also, for jumping to that conclusion. It
was rash and unfair of me. I am sorry.

On Wed, Feb 2, 2011 at 5:03 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 Justin Pitts wrote:

 With all
 due respect, I consider myself smarter than the optimizer.  I'm 6'4,
 235LBS
 so telling me that you disagree and that I am more stupid than a computer
 program,  would not be a smart thing to do. Please, do not
 misunderestimate
 me.


 I don't see computer programs make thinly veiled threats, especially
 in a public forum.
 I'll do what you claim is not the smart thing and disagree with you.
 You are wrong.
 You are dragging the signal-to-noise ratio of this discussion down.
 You owe Greg an apology.


 I apologize if that was understood as a threat. It was actually a joke. I
 thought that my using of the word misunderestimate has made it abundantly
 clear. Apparently, G.W. doesn't have as many fans as I have previously
 thought. Once again, it was a joke, I humbly apologize if that was
 misunderstood.

 --

 Mladen Gogala Sr. Oracle DBA
 1500 Broadway
 New York, NY 10036
 (212) 329-5251
 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions





-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Andrew Dunstan



On 02/02/2011 07:17 PM, Greg Smith wrote:
I direct anyone who thought Mladen was making a serious comment to 
http://www.nydailynews.com/news/politics/2009/01/08/2009-01-08_misunderestimate_tops_list_of_notable_bu-3.html 
if you want to get his little joke there.  I plan to start using 
misunderestimate more in the future when talking about planner 
errors.  Might even try to slip it into the docs at some point in the 
future and see if anybody catches it.


My wings take dream ...


cheers

andrew

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Josh Berkus wrote:

I don't want to insult anybody but the whole thing does look strange.
Maybe we can agree to remove that ridiculous we don't want hints note
from Postgresql wiki? That would make it look less like , hmph,
philosophical issue and more not yet implemented issue, especially if
we have in mind that hints are already here, in the form of
enable_method switches.



Link? There's a lot of stuff on the wiki.


  

http://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want

No. 2 on the list.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Robert Haas
On Thu, Feb 3, 2011 at 3:54 PM,  da...@lang.hm wrote:
 with the current code, this is a completely separate process that knows
 nothing about the load, so if you kick it off when you start the load, it
 makes a pass over the table (competing for I/O), finishes, you continue to
 update the table, so it makes another pass, etc. As you say, this is a bad
 thing to do. I am saying to have an option that ties the two togeather,
 essentially making the data feed into the Analyze run be a fork of the data
 comeing out of the insert run going to disk. So the Analyze run doesn't do
 any I/O and isn't going to complete until the insert is complete. At which
 time it will have seen one copy of the entire table.

Yeah, but you'll be passing the entire table through this separate
process that may only need to see 1% of it or less on a large table.
If you want to write the code and prove it's better than what we have
now, or some other approach that someone else may implement in the
meantime, hey, this is an open source project, and I like improvements
as much as the next guy.  But my prediction for what it's worth is
that the results will suck.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Kevin Grittner wrote:

Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 
  

Maybe we can agree to remove that ridiculous we don't want hints
note from Postgresql wiki?

 
I'd be against that.  This is rehashed less frequently since that

went in.  Less wasted time and bandwidth with it there.
  


Well, the problem will not go away.  As I've said before, all other 
databases have that feature and none of the reasons listed here 
convinced me that everybody else has a crappy optimizer.  The problem 
may go away altogether if people stop using PostgreSQL.
 
  

That would make it look less like , hmph, philosophical issue and
more not yet implemented issue,

 
Exactly what we don't want.
  

Who is we?


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Joshua D. Drake
On Thu, 2011-02-03 at 18:33 -0500, Mladen Gogala wrote:
  
   
  Exactly what we don't want.

 Who is we?

The majority of long term hackers.

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Robert Haas
On Thu, Feb 3, 2011 at 6:33 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 Kevin Grittner wrote:
 Mladen Gogala mladen.gog...@vmsinfo.com wrote:

 Maybe we can agree to remove that ridiculous we don't want hints
 note from Postgresql wiki?


  I'd be against that.  This is rehashed less frequently since that
 went in.  Less wasted time and bandwidth with it there.

 Well, the problem will not go away.  As I've said before, all other
 databases have that feature and none of the reasons listed here convinced me
 that everybody else has a crappy optimizer.  The problem may go away
 altogether if people stop using PostgreSQL.

You seem to be asserting that without hints, problem queries can't be
fixed.  But you haven't offered any evidence for that proposition, and
it doesn't match my experience, or the experience of other people on
this list who have been using PostgreSQL for a very long time.  If you
want to seriously advance this conversation, you should (1) learn how
people who use PostgreSQL solve these problems and then (2) if you
think there are cases where those methods are inadequate, present
them, and let's have a discussion about it.  People in this community
DO change their mind about things - but they do so in response to
*evidence*.  You haven't presented one tangible example of where the
sort of hints you seem to want would actually help anything, and yet
you're accusing the people who don't agree with you of being engaged
in a religious war.  It seems to me that the shoe is on the other
foot.  Religion is when you believe something first and then look for
evidence to support it.  Science goes the other direction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Joshua D. Drake wrote:

On Thu, 2011-02-03 at 18:33 -0500, Mladen Gogala wrote:
  
 

 
Exactly what we don't want.
  
  

Who is we?



The majority of long term hackers.

  
If that is so,  I don't see world domination in the future, exactly 
the opposite. Database whose creators don't trust their users cannot 
count on the very bright future. All other databases do have that 
feature. I must say, this debate gave me a good deal of stuff to think 
about.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Craig James

On 2/3/11 1:34 PM, Shaun Thomas wrote:

I must say that this purist attitude is extremely surprising to me.
All the major DB vendors support optimizer hints, yet in the
Postgres community, they are considered bad with almost religious
fervor. Postgres community is quite unique with the fatwa against
hints.


You missed the argument. The community, or at least the devs, see hints
 as an ugly hack.


Let's kill the myth right now that Postgres doesn't have hints.  It DOES have 
hints.

Just read this forum for a few days and see how many time there are suggestions like disable nested loops 
or disable seqscan, or change the random page cost, or change the join collapse 
limit.

All of these options are nothing more than a way of altering the planner's 
choices so that it will pick the plan that the designer already suspects is 
more optimal.

If that's not a hint, I don't know what is.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Shaun Thomas

 All other databases do have that feature. I must say, this
 debate gave me a good deal of stuff to think about.

And, I think we're done here. The idea that the lack of hints will kill
PostgreSQL is already demonstrably false. This is sounding more and
more like a petulant tantrum.

Folks, I apologize for ever taking part in this conversation and contributing
to the loss of signal to noise. Please forgive me.

--
Shaun Thomas
Peak6 | 141 W. Jackson Blvd. | Suite 800 | Chicago, IL 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread david

On Thu, 3 Feb 2011, Robert Haas wrote:


On Thu, Feb 3, 2011 at 3:54 PM,  da...@lang.hm wrote:

with the current code, this is a completely separate process that knows
nothing about the load, so if you kick it off when you start the load, it
makes a pass over the table (competing for I/O), finishes, you continue to
update the table, so it makes another pass, etc. As you say, this is a bad
thing to do. I am saying to have an option that ties the two togeather,
essentially making the data feed into the Analyze run be a fork of the data
comeing out of the insert run going to disk. So the Analyze run doesn't do
any I/O and isn't going to complete until the insert is complete. At which
time it will have seen one copy of the entire table.


Yeah, but you'll be passing the entire table through this separate
process that may only need to see 1% of it or less on a large table.
If you want to write the code and prove it's better than what we have
now, or some other approach that someone else may implement in the
meantime, hey, this is an open source project, and I like improvements
as much as the next guy.  But my prediction for what it's worth is
that the results will suck.  :-)


I will point out that 1% of a very large table can still be a lot of disk 
I/O that is avoided (especially if it's random I/O that's avoided)


David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Robert Haas wrote:

On Thu, Feb 3, 2011 at 6:33 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote:
  

Kevin Grittner wrote:


Mladen Gogala mladen.gog...@vmsinfo.com wrote:
  

Maybe we can agree to remove that ridiculous we don't want hints
note from Postgresql wiki?



 I'd be against that.  This is rehashed less frequently since that
went in.  Less wasted time and bandwidth with it there.
  

Well, the problem will not go away.  As I've said before, all other
databases have that feature and none of the reasons listed here convinced me
that everybody else has a crappy optimizer.  The problem may go away
altogether if people stop using PostgreSQL.



You seem to be asserting that without hints, problem queries can't be
fixed.  But you haven't offered any evidence for that proposition, and
it doesn't match my experience, or the experience of other people on
this list who have been using PostgreSQL for a very long time.  If you
want to seriously advance this conversation, you should (1) learn how
people who use PostgreSQL solve these problems and then (2) if you
think there are cases where those methods are inadequate, present
them, and let's have a discussion about it.  People in this community
DO change their mind about things - but they do so in response to
*evidence*.  You haven't presented one tangible example of where the
sort of hints you seem to want would actually help anything, and yet
you're accusing the people who don't agree with you of being engaged
in a religious war.  It seems to me that the shoe is on the other
foot.  Religion is when you believe something first and then look for
evidence to support it.  Science goes the other direction.

  
Actually, it is not unlike a religious dogma, only stating that hints 
are bad. It even says so in the wiki. The arguments are
1) Refusal to implement hints is motivated by distrust toward users, 
citing that some people may mess things up.

   Yes, they can, with and without hints.
2) All other databases have them. This is a major feature and if I were 
in the MySQL camp, I would use it as an
  argument. Asking me for some proof is missing the point. All other 
databases have hints precisely because
  they are useful. Assertion that only Postgres is so smart that can 
operate without hints doesn't match the
  reality. As a matter of fact, Oracle RDBMS on the same machine will 
regularly beat PgSQL in performance.
  That has been my experience so far.   I even posted counting query 
results.
3) Hints are make it or break it feature. They're absolutely needed in 
the fire extinguishing situations.


I see no arguments to say otherwise and until that ridiculous we don't 
want hints dogma is on wiki, this is precisely what it is:  a dogma. 
Dogmas do not change and I am sorry that you don't see it that way. 
However, this discussion
did convince me that I need to take another look at MySQL and tone down 
my engagement with PostgreSQL community. This is my last post on the 
subject because posts are becoming increasingly personal. This level of 
irritation is also
characteristic of a religious community chastising a sinner. Let me 
remind you again: all other major databases have that possibility: 
Oracle, MySQL, DB2, SQL Server and Informix. Requiring burden of proof 
about hints is equivalent to saying that all these databases are 
developed by idiots and have a crappy optimizer.
I am not going to back down, but I may stop using Postgres altogether. 
If that was your goal, you almost achieved it. Oh yes, and good luck 
with the world domination. If there is not enough common sense even to 
take down  that stupid dogma on the wiki, there isn't much hope left.
With this post, my participation in this group is finished, for the 
foreseeable future.



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Jeremy Harris

On 2011-02-03 23:29, Robert Haas wrote:

Yeah, but you'll be passing the entire table through this separate
process that may only need to see 1% of it or less on a large table.


It doesn't sound too impossible to pass only a percentage, starting high
and dropping towards 1% once the loaded size has become large.
--
Jeremy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Jeremy Harris

On 2011-02-03 21:51, Mark Kirkwood wrote:

The cases I've seen in production typically involve outgrowing optimizer 
parameter settings: (e.g work_mem, effective_cache_size) as the application dataset gets 
bigger over time.


An argument in favour of the DBMS maintaining a running estimate of such things.
--
Jeremy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Grant Johnson



On PostgreSQL, the difference in no hints and hints for that one query 
with skewed data is that the query finishes a little faster.   On some 
others, which shall remain nameless, it is the difference between 
finishing in seconds or days, or maybe never.  Hints can be useful, but 
I can also see why they are not a top priority.  They are rarely needed, 
and only when working around a bug.  If you want them so badly, you have 
the source, write a contrib module  (can you do that on Oracle or 
MSSQL?)  If I have a choice between the developers spending time on 
implementing hints, and spending time on improving the optimiser, I'll 
take the optimiser.


Tom Kyte agrees:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0P11_QUESTION_ID:8912905298920
http://tkyte.blogspot.com/2006/08/words-of-wisdom.html



Oracle can be faster on count queries, but that is the only case I have 
seen.   Generally on most other queries, especially when it involves 
complex joins, or indexes on text fields, PostgreSQL is faster on the 
same hardware.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mark Kirkwood

On 04/02/11 13:49, Jeremy Harris wrote:

On 2011-02-03 21:51, Mark Kirkwood wrote:
The cases I've seen in production typically involve outgrowing 
optimizer parameter settings: (e.g work_mem, effective_cache_size) as 
the application dataset gets bigger over time.


An argument in favour of the DBMS maintaining a running estimate of 
such things.


That is an interesting idea - I'm not quite sure how it could apply to 
server config settings (e.g work_mem) for which it would be dangerous to 
allow the server to increase on the fly, but it sure would be handy to 
have some sort of query execution memory so that alerts like:


STATEMENT: SELECT blah  : PARAMETERS blah: using temp file(s), last 
execution used memory


could be generated (this could be quite complex I guess, requiring some 
sort of long lived statement plan cache).


Cheers

Mark


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Robert Haas
On Thu, Feb 3, 2011 at 7:39 PM,  da...@lang.hm wrote:
 Yeah, but you'll be passing the entire table through this separate
 process that may only need to see 1% of it or less on a large table.
 If you want to write the code and prove it's better than what we have
 now, or some other approach that someone else may implement in the
 meantime, hey, this is an open source project, and I like improvements
 as much as the next guy.  But my prediction for what it's worth is
 that the results will suck.  :-)

 I will point out that 1% of a very large table can still be a lot of disk
 I/O that is avoided (especially if it's random I/O that's avoided)

Sure, but I think that trying to avoid it will be costly in other ways
- you'll be streaming a huge volume of data through some auxiliary
process, which will have to apply some algorithm that's very different
from the one we use today.  The reality is that I think there's little
evidence that the way we do ANALYZE now is too expensive.  It's
typically very cheap and works very well.  It's a bit annoying when it
fires off in the middle of a giant data load, so we might need to
change the time of it a little, but if there's a problem with the
operation itself being too costly, this is the first I'm hearing of
it.  We've actually worked *really* hard to make it cheap.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Robert Haas
On Thu, Feb 3, 2011 at 7:39 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote:
  reality. As a matter of fact, Oracle RDBMS on the same machine will
 regularly beat PgSQL in performance.
  That has been my experience so far.   I even posted counting query results.

It sure is, but those count queries didn't run faster because of query
planner hints.  They ran faster because of things like index-only
scans, fast full index scans, asynchronous I/O, and parallel query.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread david

On Thu, 3 Feb 2011, Robert Haas wrote:


On Thu, Feb 3, 2011 at 7:39 PM,  da...@lang.hm wrote:

Yeah, but you'll be passing the entire table through this separate
process that may only need to see 1% of it or less on a large table.
If you want to write the code and prove it's better than what we have
now, or some other approach that someone else may implement in the
meantime, hey, this is an open source project, and I like improvements
as much as the next guy.  But my prediction for what it's worth is
that the results will suck.  :-)


I will point out that 1% of a very large table can still be a lot of disk
I/O that is avoided (especially if it's random I/O that's avoided)


Sure, but I think that trying to avoid it will be costly in other ways
- you'll be streaming a huge volume of data through some auxiliary
process, which will have to apply some algorithm that's very different
from the one we use today.  The reality is that I think there's little
evidence that the way we do ANALYZE now is too expensive.  It's
typically very cheap and works very well.  It's a bit annoying when it
fires off in the middle of a giant data load, so we might need to
change the time of it a little, but if there's a problem with the
operation itself being too costly, this is the first I'm hearing of
it.  We've actually worked *really* hard to make it cheap.


I could be misunderstanding things here, but my understanding is that it's 
'cheap' in that it has little impact on the database while it is running.


the issue here is that the workflow is

load data
analyze
start work

so the cost of analyze in this workflow is not 1% impact on query speed 
for the next X time, it's the database can't be used for the next X time 
while we wait for analyze to finish running


I don't understand why the algorithm would have to be so different than 
what's done today, surely the analyze thread could easily be tweaked to 
ignore the rest of the data (assuming we don't have the thread sending the 
data to analyze do the filtering)


David Lang
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Robert Haas
On Thu, Feb 3, 2011 at 8:37 PM,  da...@lang.hm wrote:
 On Thu, 3 Feb 2011, Robert Haas wrote:

 On Thu, Feb 3, 2011 at 7:39 PM,  da...@lang.hm wrote:

 Yeah, but you'll be passing the entire table through this separate
 process that may only need to see 1% of it or less on a large table.
 If you want to write the code and prove it's better than what we have
 now, or some other approach that someone else may implement in the
 meantime, hey, this is an open source project, and I like improvements
 as much as the next guy.  But my prediction for what it's worth is
 that the results will suck.  :-)

 I will point out that 1% of a very large table can still be a lot of disk
 I/O that is avoided (especially if it's random I/O that's avoided)

 Sure, but I think that trying to avoid it will be costly in other ways
 - you'll be streaming a huge volume of data through some auxiliary
 process, which will have to apply some algorithm that's very different
 from the one we use today.  The reality is that I think there's little
 evidence that the way we do ANALYZE now is too expensive.  It's
 typically very cheap and works very well.  It's a bit annoying when it
 fires off in the middle of a giant data load, so we might need to
 change the time of it a little, but if there's a problem with the
 operation itself being too costly, this is the first I'm hearing of
 it.  We've actually worked *really* hard to make it cheap.

 I could be misunderstanding things here, but my understanding is that it's
 'cheap' in that it has little impact on the database while it is running.

I mean that it's cheap in that it usually takes very little time to complete.

 the issue here is that the workflow is

 load data
 analyze
 start work

 so the cost of analyze in this workflow is not 1% impact on query speed for
 the next X time, it's the database can't be used for the next X time while
 we wait for analyze to finish running

OK.

 I don't understand why the algorithm would have to be so different than
 what's done today, surely the analyze thread could easily be tweaked to
 ignore the rest of the data (assuming we don't have the thread sending the
 data to analyze do the filtering)

If you want to randomly pick 10,000 rows out of all the rows that are
going to be inserted in the table without knowing in advance how many
there will be, how do you do that?  Maybe there's an algorithm, but
it's not obvious to me.  But mostly, I question how expensive it is to
have a second process looking at the entire table contents vs. going
back and rereading a sample of rows at the end.  I can't remember
anyone ever complaining ANALYZE took too long to run.  I only
remember complaints of the form I had to remember to manually run it
and I wish it had just happened by itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Conor Walsh
 I can't remember
 anyone ever complaining ANALYZE took too long to run.  I only
 remember complaints of the form I had to remember to manually run it
 and I wish it had just happened by itself.

Robert,

This sounds like an argument in favor of an implicit ANALYZE after all
COPY statements, and/or an implicit autoanalyze check after all
INSERT/UPDATE statements.

-Conor

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Joshua D. Drake
On Thu, 2011-02-03 at 18:12 -0800, Conor Walsh wrote:
  I can't remember
  anyone ever complaining ANALYZE took too long to run.  I only
  remember complaints of the form I had to remember to manually run it
  and I wish it had just happened by itself.
 
 Robert,
 
 This sounds like an argument in favor of an implicit ANALYZE after all
 COPY statements, and/or an implicit autoanalyze check after all
 INSERT/UPDATE statements.

Well that already happens. Assuming you insert/update or copy in a
greater amount than the threshold for the 

autovacuum_analyze_scale_factor

Then autovacuum is going to analyze on the next run. The default is .1
so it certainly doesn't take much.

JD

 
 -Conor
 

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Conor Walsh
On Thu, Feb 3, 2011 at 6:33 PM, Joshua D. Drake j...@commandprompt.com wrote:
 Well that already happens...

My understanding is that auto-analyze will fire only after my
transaction is completed, because it is a seperate daemon.  If I do
like so:

BEGIN;
COPY ...;
-- Dangerously un-analyzed
SELECT complicated-stuff ...;
END;

Auto-analyze does not benefit me, or might not because it won't fire
often enough.  I agree that analyze is very fast, and it often seems
to me like the cost/benefit ratio suggests making auto-analyze even
more aggressive.

Disclaimer/disclosure: I deal exclusively with very large data sets
these days, so analyzing all the time is almost a highly effective
worst-case amortization.  I understand that constant analyze is not so
great in, say, an OLTP setting.  But if the check is cheap, making
auto-analyze more integrated and less daemon-driven might be a net
win.  I'm not sure.

-Conor

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Scott Marlowe
On Thu, Feb 3, 2011 at 5:39 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 Actually, it is not unlike a religious dogma, only stating that hints are
 bad. It even says so in the wiki. The arguments are

There's been considerably more output than hints bad!  Hulk Smash!

 1) Refusal to implement hints is motivated by distrust toward users, citing
 that some people may mess things up.

It's more about creating a knob that will create more problems than it
solves.  Which I get.  And making sure that if you make such a knob
that it'll do the least damage and give the most usefulness.  Until a
good proposal and some code to do it shows up, we're all just waving
our hands around describing different parts of the elephant.

 2) All other databases have them. This is a major feature and if I were in
 the MySQL camp, I would use it as an
  argument. Asking me for some proof is missing the point. All other
 databases have hints precisely because
  they are useful.

Uh, two points.  1: Argumentum Ad Populum.  Just because it's popular
doesn't mean it's right. 2: Other databases have them because their
optimizers can't make the right decision even most of the time.  Yes
they're useful, but like a plastic bad covering a broken car window,
they're useful because they cover something that's inherently broken.


 Assertion that only Postgres is so smart that can operate
 without hints doesn't match the
  reality.

Again, you're twisting what people have said.  the point being that
while postgresql makes mistakes, we'd rather concentrate on making the
planner smarter than giving it a lobotomy and running it remotely like
a robot.


 As a matter of fact, Oracle RDBMS on the same machine will
 regularly beat PgSQL in performance.

Yes.  And this has little to do with hints.  It has to do with years
of development lead with THOUSANDS of engineers who can work on the
most esoteric corner cases in their spare time.  Find the pg project a
couple hundred software engineers and maybe we'll catch Oracle a
little quicker.  Otherwise we'll have to marshall our resources to do
the best we can on the project ,and that means avoiding maintenance
black holes and having the devs work on the things that give the most
benefit for the cost.  Hints are something only a tiny percentage of
users could actually use and use well.

Write a check, hire some developers and get the code done and present
it to the community.  If it's good and works it'll likely get
accepted.  Or use EDB, since it has oracle compatibility in it.

  That has been my experience so far.   I even posted counting query results.
 3) Hints are make it or break it feature. They're absolutely needed in the
 fire extinguishing situations.

I've been using pg since 6.5.2.  I've used Oracle since version 8 or
so.  I have never been in a situation with postgresql where I couldn't
fix the problem with either tuning, query editing, or asking Tom for a
patch for a problem I found in it.  Turnaround time on the last patch
that was made to fix my problem was somewhere in the 24 hour range.
If Oracle can patch their planner that fast, let me know.

 I see no arguments to say otherwise and until that ridiculous we don't want
 hints dogma is on wiki, this is precisely what it is:  a dogma. Dogmas do
 not change and I am sorry that you don't see it that way. However, this
 discussion

No, it's not dogma, you need to present a strong coherent argument,
not threaten people on the list etc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Jeff Davis
On Thu, 2011-02-03 at 16:50 -0500, Mladen Gogala wrote:
 Chris Browne wrote:
  Well, the community declines to add hints until there is actual
  consensus on a good way to add hints.

 OK. That's another matter entirely.   Who should make that decision? Is 
 there a committee or a person who would be capable of making that decision?

Admittedly I haven't read this whole discussion, but it seems like
hints might be too poorly defined right now.

If by hints we mean some mechanism to influence the planner in a more
fine-grained way, I could imagine that some proposal along those lines
might gain significant support.

But, as always, it depends on the content and quality of the proposal
more than the title. If someone has thoughtful proposal that tries to
balance things like:
* DBA control versus query changes/comments
* compatibility across versions versus finer plan control
* allowing the existing optimizer to optimize portions of the
  query while controlling other portions
* indicating costs and cardinalities versus plans directly

I am confident that such a proposal will gain traction among the
community as a whole.

However, a series proposals for individual hacks for specific purposes
will probably be rejected. I am in no way implying that you are
approaching it this way -- I am just trying to characterize an approach
that won't make progress.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Scott Marlowe
On Thu, Feb 3, 2011 at 7:05 PM, Robert Haas robertmh...@gmail.com wrote:
 If you want to randomly pick 10,000 rows out of all the rows that are
 going to be inserted in the table without knowing in advance how many
 there will be, how do you do that?

Maybe you could instead just have it use some % of the rows going by?
Just a guess.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Greg Smith

Scott Marlowe wrote:

Yes they're useful, but like a plastic bad covering a broken car window,
they're useful because they cover something that's inherently broken.
  


Awesome.  Now we have a car anology, with a funny typo no less.  
Plastic bad, I love it.  This is real progress toward getting all the 
common list argument idioms aired out.  All we need now is a homage to 
Mike Godwin and we can close this down.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Does auto-analyze work on dirty writes? (was: Re: [HACKERS] [PERFORM] Slow count(*) again...)

2011-02-03 Thread Mark Mielke

On 02/03/2011 09:45 PM, Conor Walsh wrote:

My understanding is that auto-analyze will fire only after my
transaction is completed, because it is a seperate daemon.  If I do
like so:

BEGIN;
COPY ...;
-- Dangerously un-analyzed
SELECT complicated-stuff ...;
END;

Auto-analyze does not benefit me, or might not because it won't fire
often enough.  I agree that analyze is very fast, and it often seems
to me like the cost/benefit ratio suggests making auto-analyze even
more aggressive.


The count discussion is boring. Nothing new there. But auto-analyze on 
dirty writes does interest me. :-)


My understanding is:

1) Background daemon wakes up and checks whether a number of changes 
have happened to the database, irrelevant of transaction boundaries.


2) Background daemon analyzes a percentage of rows in the database for 
statistical data, irrelevant of row visibility.


3) Analyze is important for both visible rows and invisible rows, as 
plan execution is impacted by invisible rows. As long as they are part 
of the table, they may impact the queries performed against the table.


4) It doesn't matter if the invisible rows are invisible because they 
are not yet committed, or because they are not yet vacuumed.


Would somebody in the know please confirm the above understanding for my 
own piece of mind?


Thanks,
mark

--
Mark Mielkem...@mielke.cc


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Scott Marlowe
On Thu, Feb 3, 2011 at 8:40 PM, Greg Smith g...@2ndquadrant.com wrote:
 Scott Marlowe wrote:

 Yes they're useful, but like a plastic bad covering a broken car window,
 they're useful because they cover something that's inherently broken.


 Awesome.  Now we have a car anology, with a funny typo no less.  Plastic
 bad, I love it.  This is real progress toward getting all the common list
 argument idioms aired out.  All we need now is a homage to Mike Godwin and
 we can close this down.

It's not so much a car analogy as a plastic bad analogy.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Greg Smith

Scott Marlowe wrote:

It's not so much a car analogy as a plastic bad analogy.
  


Is that like a Plastic Ono Band?  Because I think one of those is the 
only thing holding the part of my bumper I smashed in the snow on right 
now.  I could be wrong about the name.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Scott Marlowe
On Thu, Feb 3, 2011 at 8:56 PM, Greg Smith g...@2ndquadrant.com wrote:
 Scott Marlowe wrote:

 It's not so much a car analogy as a plastic bad analogy.


 Is that like a Plastic Ono Band?  Because I think one of those is the only
 thing holding the part of my bumper I smashed in the snow on right now.  I
 could be wrong about the name.

No, that's a plastic oh no! band you have.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


  1   2   3   >