Re: [PERFORM] seqscan instead of index scan

2004-09-01 Thread Chester Kustarz
On Mon, 30 Aug 2004, Martin Sarsale wrote:
> "Multicolumn indexes can only be used if the clauses involving the
> indexed columns are joined with AND. For instance,
>
> SELECT name FROM test2 WHERE major = constant OR minor = constant;

You can use DeMorgan's Theorem to transform an OR clause to an AND clause.

In general:
A OR B <=> NOT ((NOT A) AND (NOT B))

So:

> But I need something like:
>
> select * from t where c<>0 or d<>0;

select * from t where not (c=0 and d=0);

I haven't actually tried to see if postgresql would do anything
interesting after such a transformation.




---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-21 Thread Chester Kustarz
On Fri, 21 Nov 2003, Matthew T. O'Connor wrote:
> >> Do you know of an easy way to get a
> >>count of the total pages used by a whole cluster?
> >
> >Select sum(relpages) from pg_class.

You might want to exclude indexes from this calculation. Some large
read only tables might have indexes larger than the tables themselves.



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Chester Kustarz
On Thu, 20 Nov 2003, Tom Lane wrote:
> Those claims cannot both be true.  In any case, plain vacuum cannot grow
> the indexes --- only a VACUUM FULL that moves a significant number of
> rows could cause index growth.

er, yeah. you're right of course. having flashbacks of vacuum full.



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Chester Kustarz
On Thu, 20 Nov 2003, Josh Berkus wrote:
> Additionally, you are not thinking of this in terms of an overall database
> maintanence strategy.   Lazy Vacuum needs to stay below the threshold of the
> Free Space Map (max_fsm_pages) to prevent creeping bloat from setting in to
> your databases.   With proper configuration of pg_avd, vacuum_mem and FSM
> values, it should be possible to never run a VACUUM FULL again, and as of 7.4
> never run an REINDEX again either.

is there any command you can run to see how much of the FSM is filled? is
there any way to tell which tables are filling it?

> Analyze is needed only as often as the *aggregate distribution* of data in the
> tables changes.   Depending on the application, this could be frequently, but
> far more often (in my experience running multiple databases for several
> clients) the data distribution of very large tables changes very slowly over
> time.

analyze does 2 things for me:
1. gets reasonable aggregate statistics
2. generates STATISTICS # of bins for the most frequent hitters

(2) is very important for me. my values typically seem to have power-law
like distributions. i need enough bins to reach a "cross-over" point where
the last bin is frequent enough to make an index scan useful. also,
i want enough bins so that the planner can choose index a or b for:
select * from foo where a=n and b=m;

the selectivity of either index depends not only on the average selectivity
of index a or index b, but on n and m as well. for example, 1M row table:

value   % of rows
v1  23
v2  12
v3  4.5
v4  4
v5  3.5
...

you can see that picking an index for =v1 would be poor. picking the
20th most common value would be 0.5% selective. much better. of course
this breaks down for more complex operators, but = is fairly common.

> So if you're going to have a seperate ANALYZE schedule at all, it should be
> slightly less frequent than VACUUM for large tables.   Either that, or drop
> the idea, and simplify pg_avd by running VACUUM ANALYZE all the time instead
> of having 2 seperate schedules.

i have some tables which are insert only. i do not want to vacuum them
because there are never any dead tuples in them and the vacuum grows the
indexes. plus it is very expensive (they tables grow rather large.) after they
expire i drop the whole table to make room for a newer one (making sort
of a rolling log with many large tables.)

i need to analyze them every so often so that the planner knows that
there is 1 row, 100 rows, 100k rows, 1M. the funny thing is
that because i never vacuum the tables, the relpages on the index never
grows. don't know if this affects anything (this is on 7.2.3).

vacuum is to reclaim dead tuples. this means it depends on update and
delete. analyze depends on data values/distribution. this means it depends on
insert, update, and delete. thus the dependencies are slightly different
between the 2 operations, an so you can come up with use-cases that
justify running either more frequently.

i am not sure how failed transactions fit into this though, not that i think
anybody ever has very many. maybe big rollbacks during testing?



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] index creation order?

2003-10-31 Thread Chester Kustarz
is there any way to update the stats inside a transaction? what i have is
something like:

select count(*) from foo;
-> 0

begin;

copy foo from '/tmp/foo'; -- about 100k rows

-- run some queries on foo which perform horribly because the stats
-- are way off (100k rows v. 0 rows)

commit;


it seems that you cannot run analyze inside a transaction:

begin;
analyze foo;
ERROR:  ANALYZE cannot run inside a BEGIN/END block

i am using version 7.2.3.

any work-a-rounds? should i try updating pg_statistic manually?

On Fri, 31 Oct 2003, Josh Berkus wrote:
> Among other things, ANALYZE tells postgres how many rows are in the table.  So
> if you add a PK constraint after loading 10 million rows without ANALYZE,
> PostgreSQL is likely to think that there is only one row in the table ... and
> choose a nested loop or some other really inefficient method of checking for
> uniqueness.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings