[PERFORM] Number of tables

2009-08-20 Thread Fabio La Farcioli

Hi to all,

I am developing a web app for thousands of users (1,000-2,000).

Each user has 2 work tables, so I end up with 2,000 (users) x 2 tables = 4,000 tables!


Does Postgres support such a high number of tables?
Will I have performance problems?


Thanks

Sorry for my English.



Re: [PERFORM] Number of tables

2009-08-20 Thread Fabio La Farcioli

Craig Ringer wrote:

On Thu, 2009-08-20 at 09:01 +0200, Fabio La Farcioli wrote:

Each user has 2 work tables, so I end up with 2,000 (users) x 2 tables = 4,000 tables!


Hmm, ok. Does each user really need two tables? Why?

Does the set of tables for each user have a different structure? Or are
you separating them so you can give each user a separate database role
and ownership of their own tables?


No no...


Will I have performance problems?


Yes, you probably will. There is a cost to having _lots_ of tables in
PostgreSQL in terms of maintaining table statistics, autovacuum work,
etc. I doubt it'll be too bad at 4000 tables, but if your user numbers
keep growing it could become a problem.


The number of users will probably increase over time...


Other concerns are that it'll also be hard to maintain your design,
difficult to write queries that read data from more than one user, etc.
If you need to change the schema of your user tables you're going to
have to write custom tools to automate it. It could get very clumsy.


That's true... I hadn't thought about this problem.



Note that whether this is a good idea DOES depend on how much data
you're going to have. If each user table will have _lots_ of data, then
individual tables might be a better approach after all. It's also a
benefit if you do intend to give each user their own database role.


Every table has between 1,000 and 100,000 (max) records...

Do you think I will avoid performance problems?
Each user only views the records with their own user_id.

I am thinking of redesigning the DB.




Re: [PERFORM] Number of tables

2009-08-20 Thread Jochen Erwied
Thursday, August 20, 2009, 9:01:30 AM you wrote:

 I am developing a web app for thousands of users (1,000-2,000).

 Each user has 2 work tables, so I end up with 2,000 (users) x 2 tables = 4,000 tables!

If all the tables have the same structure, I would rethink the design. Instead
of using 2 tables per user, I'd use 2 shared tables with one column specifying
the user id.

In particular, any change to the table layout would require you to alter up to
2,000 tables, which is error-prone...
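
A minimal sketch of that shared layout (the column names here are placeholders,
not from the original post):

CREATE TABLE work_a (
    user_id integer NOT NULL,
    item_id integer NOT NULL,
    payload text,                      -- whatever the per-user tables held
    PRIMARY KEY (user_id, item_id)
);

CREATE TABLE work_b (
    user_id integer NOT NULL,
    item_id integer NOT NULL,
    payload text,
    PRIMARY KEY (user_id, item_id)
);

Every query then filters on user_id instead of picking a per-user table name.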

-- 
Jochen Erwied |   home: joc...@erwied.eu +49-208-38800-18, FAX: -19
Sauerbruchstr. 17 |   work: j...@mbs-software.de  +49-2151-7294-24, FAX: -50
D-45470 Muelheim  | mobile: jochen.erw...@vodafone.de   +49-173-5404164




Re: [PERFORM] Number of tables

2009-08-20 Thread Craig Ringer
On Thu, 2009-08-20 at 09:01 +0200, Fabio La Farcioli wrote:

 Each user has 2 work tables, so I end up with 2,000 (users) x 2 tables = 4,000 tables!

Hmm, ok. Does each user really need two tables? Why?

Does the set of tables for each user have a different structure? Or are
you separating them so you can give each user a separate database role
and ownership of their own tables?


 Does Postgres support such a high number of tables?

Thousands? Sure.

 Will I have performance problems?
 
Yes, you probably will. There is a cost to having _lots_ of tables in
PostgreSQL in terms of maintaining table statistics, autovacuum work,
etc. I doubt it'll be too bad at 4000 tables, but if your user numbers
keep growing it could become a problem.

Other concerns are that it'll also be hard to maintain your design,
difficult to write queries that read data from more than one user, etc.
If you need to change the schema of your user tables you're going to
have to write custom tools to automate it. It could get very clumsy.

Instead of one or two tables per user, perhaps you should keep the data
in just a few tables, with a composite primary key that includes the
user ID. E.g., given the user table:

CREATE TABLE "user" (   -- "user" is a reserved word, so it must be quoted
  id SERIAL PRIMARY KEY,
  name text
);

instead of:

CREATE TABLE user1_tablea(
  id INTEGER PRIMARY KEY,
  blah text,
  blah2 integer
);

CREATE TABLE user2_tablea(
  id INTEGER PRIMARY KEY,
  blah text,
  blah2 integer
);

... etc ...


you might write:

CREATE TABLE tablea (
  user_id INTEGER REFERENCES "user"(id),
  id INTEGER,
  PRIMARY KEY(user_id, id),
  blah text,
  blah2 integer
);


You can, of course, partition this table into blocks of user-IDs behind
the scenes, but your partitioning is invisible to your web app and can
be done solely for performance reasons. You don't have to try juggling
all these little tables.
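
As a rough illustration of such behind-the-scenes partitioning (8.x
constraint-exclusion style; the boundaries and child table names below are
made up):

-- child partitions, each holding a range of user IDs
CREATE TABLE tablea_u0 (CHECK (user_id >= 0    AND user_id < 1000)) INHERITS (tablea);
CREATE TABLE tablea_u1 (CHECK (user_id >= 1000 AND user_id < 2000)) INHERITS (tablea);

-- with constraint exclusion on, the planner skips partitions that cannot
-- contain the requested user_id (inserts still need to be routed to the
-- right child via a rule or trigger, not shown here)
SET constraint_exclusion = on;
SELECT * FROM tablea WHERE user_id = 1234;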


Note that whether this is a good idea DOES depend on how much data
you're going to have. If each user table will have _lots_ of data, then
individual tables might be a better approach after all. It's also a
benefit if you do intend to give each user their own database role.

--
Craig Ringer





Re: [PERFORM] Query tuning

2009-08-20 Thread Віталій Тимчишин
2009/8/19 Kevin Kempter <kev...@consistentstate.com>


 We do have an index on url_hits.time

 not sure why timestamps were not used, I was not here for the design phase.


What's the type of the time column? I don't like that it is cast to double in
the explain output.
If it is an integer, maybe you need to change

and time >= extract ('epoch' from timestamp '2009-08-12')
and time <  extract ('epoch' from timestamp '2009-08-13')

to

and time >= extract ('epoch' from timestamp '2009-08-12')::int4
and time <  extract ('epoch' from timestamp '2009-08-13')::int4

for the index to be used?
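
For example (a sketch, assuming url_hits.time really is an int4 epoch column as
guessed above), the rewritten predicate in context would be:

-- casting the boundary values to int4 keeps the comparison int-vs-int,
-- so a plain index on url_hits.time can be used
select count(*)
  from url_hits
 where time >= extract('epoch' from timestamp '2009-08-12')::int4
   and time <  extract('epoch' from timestamp '2009-08-13')::int4;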


Re: [PERFORM] number of rows estimation for bit-AND operation

2009-08-20 Thread Robert Haas
On Tue, Aug 18, 2009 at 6:34 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
 2009/8/18 Slava Moudry <smou...@4info.net>:
 increase default stats target, analyze, try again.
 This field has only 5 values. I had put values/frequencies in my first post.

 Sorry, kinda missed that.  Anyway, there's no way for pg to know which
 operation is gonna match.  Without an index on it.  So my guess is
 that it just guesses some fixed value.  With an index it might be able
 to get it right, but you'll need an index for each type of match
 you're looking for.  I think.  Maybe someone else on the list has a
 better idea.

The best way to handle this is probably to not cram multiple values
into a single field.  Just use one boolean for each flag.  It won't
even cost you any space, because right now you are using 8 bytes to
store 5 booleans, and 5 booleans will (I believe) only require 5
bytes.  Even if you were using enough of the bits for the space usage
to be higher with individual booleans, the overall performance is
likely to be better that way.
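
A minimal sketch of that layout (the flag names are borrowed from later in this
thread and are otherwise illustrative):

CREATE TABLE fact_flags (
    id    bigint PRIMARY KEY,
    f_2   boolean NOT NULL DEFAULT false,  -- one boolean per flag instead of
    f_4   boolean NOT NULL DEFAULT false,  -- packing them all into one int8
    f_128 boolean NOT NULL DEFAULT false
);

-- the planner can then estimate a predicate like this from ordinary
-- per-column statistics:
-- SELECT ... FROM fact_flags WHERE NOT f_2 AND NOT f_4 AND NOT f_128;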

This is sort of stating the obvious, but it doesn't make it any less
true.  Unfortunately, PG's selectivity estimator can't handle cases
like this.  Tom Lane recently made some noises about trying to improve
it, but it's not clear whether that will go anywhere, and in any event
it won't happen before 8.5.0 comes out next spring/summer.

...Robert



Re: [PERFORM] Number of tables

2009-08-20 Thread Craig James

Fabio La Farcioli wrote:

I am developing a web app for thousands of users (1,000-2,000).

Each user has 2 work tables, so I end up with 2,000 (users) x 2 tables = 4,000 tables!


Does Postgres support such a high number of tables?
Will I have performance problems?


We have run databases with over 100,000 tables with no problems.

However, we found that it's not a good idea to have a table-per-user design.  
As you get more users, it is hard to maintain the database.  Most of the time 
there are only a few users active.

So, we create a single large archive table, identical to the per-user table except that 
it also has a user-id column.  When a user hasn't logged in for a few hours, a cron process copies 
their tables into the large archive table, and returns their personal tables to a pool 
of available tables.

When the user logs back in, a hidden part of the login process gets a table 
from the pool of available tables, assigns it to this user, and copies the 
user's  data from the archive into this personal table.  They are now ready to 
work. This whole process takes just a fraction of a second for most users.

We keep a pool of about 200 tables, which automatically will expand (create 
more tables) if needed, but we've never had more than 200 users active at one 
time.
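
A rough sketch of those two steps, using a deliberately simplified, assumed
schema (none of these names come from the original system):

  -- assumed schema
  create table archive_table (user_id integer, row_id integer, data text);
  create table user_table_017 (row_id integer, data text);  -- one table from the pool

  -- cron job: archive an idle user's rows, then hand the table back to the pool
  insert into archive_table select 1234, row_id, data from user_table_017;
  truncate user_table_017;

  -- login: copy the user's rows back into a freshly assigned pool table
  insert into user_table_017 select row_id, data from archive_table where user_id = 1234;
  delete from archive_table where user_id = 1234;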

Craig



Re: [PERFORM] Number of tables

2009-08-20 Thread Greg Stark
On Thu, Aug 20, 2009 at 9:16 PM, Craig James <craig_ja...@emolecules.com> wrote:
 Fabio La Farcioli wrote:

 I am developing a web app for thousands of users (1,000-2,000).

 Each user has 2 work tables, so I end up with 2,000 (users) x 2 tables = 4,000 tables!

 Does Postgres support such a high number of tables?
 Will I have performance problems?

What you want is a multi-column primary key where userid is part of
the key. You don't want to have a separate table for each user unless
each user has their own unique set of columns.


 When the user logs back in, a hidden part of the login process gets a table
 from the pool of available tables, assigns it to this user, and copies the
 user's  data from the archive into this personal table.  They are now ready
 to work. This whole process takes just a fraction of a second for most
 users.

And what does all this accomplish?


-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [PERFORM] Number of tables

2009-08-20 Thread Craig James

Greg Stark wrote:

What you want is a multi-column primary key where userid is part of
the key. You don't want to have a separate table for each user unless
each user has their own unique set of columns.


Not always true.


When the user logs back in, a hidden part of the login process gets a table
from the pool of available tables, assigns it to this user, and copies the
user's  data from the archive into this personal table.  They are now ready
to work. This whole process takes just a fraction of a second for most
users.


And what does all this accomplish?


The primary difference is between

 delete from big_table where userid = xx

versus

 truncate user_table

There are also significant differences in performance for large inserts, 
because a single-user table almost never needs indexes at all, whereas a big 
table for everyone has to have at least one user-id column that's indexed.

In our application, the per-user tables are "hitlists" -- scratch lists that 
are populated something like this:

  create sequence hitlist_seq_xxx;   -- one sequence per hitlist table

  create table hitlist_xxx (
    row_id integer,
    sortorder integer default nextval('hitlist_seq_xxx')
  );

  -- repopulating the list:
  truncate table hitlist_xxx;
  select setval('hitlist_seq_xxx', 1, false);
  insert into hitlist_xxx (row_id)
    (select some_id from ... where ... order by ...);

Once the hitlist is populated, the user can page through it quickly with no 
further searching, e.g. using a web app.
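
Paging through the populated list is then a cheap order/limit scan, e.g.
(a sketch; the page size is arbitrary):

  -- fetch the third page of 20 rows
  select row_id
    from hitlist_xxx
   order by sortorder
   limit 20 offset 40;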

We tested the performance using a single large table in Postgres, and it was 
not nearly what we needed.  These hitlists tend to be transitory, and the 
typical operation is to discard the entire list and create a new one.  
Sometimes the user will sort the entire list based on some criterion, which 
also requires a copy/delete/re-insert using a new order-by.

With both Oracle and Postgres, truncate is MUCH faster than delete, and the 
added index needed for a single large table only makes it worse.  With 
Postgres, the repeated large delete/insert makes for tables that need a lot of 
vacuuming and index bloat, further hurting performance.

Craig



[PERFORM] improving my query plan

2009-08-20 Thread Kevin Kempter
Hi all;

I have a simple query against two very large tables (> 800 million rows in 
the url_hits_category_jt table and 9.2 million in the url_hits_klk1 table).

I have indexes on the join columns and I've run an explain.
Also, I've set the statistics target to 250 for both join columns. I get a 
very high overall query cost:


explain
select
  category_id,
  url_hits_id
from
  url_hits_klk1 a,
  pwreport.url_hits_category_jt b
where
  a.id = b.url_hits_id
;

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Hash Join  (cost=296959.90..126526916.55 rows=441764338 width=8)
   Hash Cond: (b.url_hits_id = a.id)
   ->  Seq Scan on url_hits_category_jt b  (cost=0.00..62365120.22 rows=432343 width=8)
   ->  Hash  (cost=179805.51..179805.51 rows=9372351 width=4)
         ->  Seq Scan on url_hits_klk1 a  (cost=0.00..179805.51 rows=9372351 width=4)
(5 rows)



If I turn off sequential scans, I get an even higher query cost:

set enable_seqscan = off;
SET
explain
 select
 category_id,
 url_hits_id
 from
 url_hits_klk1 a ,
 pwreport.url_hits_category_jt b
where
 a.id = b.url_hits_id
 ;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Merge Join  (cost=127548504.83..133214707.19 rows=441791932 width=8)
   Merge Cond: (a.id = b.url_hits_id)
   ->  Index Scan using klk1 on url_hits_klk1 a  (cost=0.00..303773.29 rows=9372351 width=4)
   ->  Index Scan using mt_url_hits_category_jt_url_hits_id_index on url_hits_category_jt b  (cost=0.00..125058243.39 rows=4323702284 width=8)
(4 rows)


Thoughts?


Thanks in advance




Re: [PERFORM] Number of tables

2009-08-20 Thread Greg Stark
On Thu, Aug 20, 2009 at 11:18 PM, Craig James <craig_ja...@emolecules.com> wrote:
 Greg Stark wrote:

 What you want is a multi-column primary key where userid is part of
 the key. You don't want to have a separate table for each user unless
 each user has their own unique set of columns.
 Not always true.
...
 The primary difference is between
  delete from big_table where userid = xx
  versus
  truncate user_table


This is a valid point but it's a fairly special case. For most
applications the overhead of deleting records and having to run vacuum
will be manageable and a small contribution to the normal vacuum
traffic. Assuming up front that the above is necessary is premature
optimization; it probably isn't needed.


 There are also significant differences in performance for large inserts,
 because a single-user table almost never needs indexes at all, whereas a big
 table for everyone has to have at least one user-id column that's indexed.

Maintaining indexes isn't free but one index is hardly going to be a
dealbreaker.

 Once the hitlist is populated, the user can page through it quickly with no
 further searching, e.g. using a web app.

The traditional approach to this would be a temporary table. However, in
the modern world of web applications, where the user session does not map
directly to a database session, that no longer works (well, it never really
worked in Postgres, where temporary tables are not so lightweight :( ).
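
For reference, the traditional per-database-session version would be something
like this, which is exactly what stops working once the web session outlives
the connection:

CREATE TEMPORARY TABLE hitlist (
    row_id    integer,
    sortorder serial
);
-- visible only to the backend that created it; gone when that session ends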

It would be nice to have a solution to that where you could create
lightweight temporary objects which belong to an application session
which can be picked up by a different database connection each go
around.

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [PERFORM] Number of tables

2009-08-20 Thread Alvaro Herrera
Greg Stark wrote:

 It would be nice to have a solution to that where you could create
 lightweight temporary objects which belong to an application session
 which can be picked up by a different database connection each go
 around.

It would be useful:

CREATE SCHEMA session1234 UNLOGGED
  CREATE TABLE hitlist ( ... );

Each table in the session1234 schema would not be WAL-logged, and
would be automatically dropped on crash recovery (actually the whole
schema would be).  But while the server is live it behaves like a
regular schema/table and can be seen by all backends (i.e. not temp)

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] Number of tables

2009-08-20 Thread Greg Stark
On Fri, Aug 21, 2009 at 1:38 AM, Alvaro Herrera <alvhe...@commandprompt.com> wrote:
 Greg Stark wrote:

 It would be nice to have a solution to that where you could create
 lightweight temporary objects which belong to an application session
 which can be picked up by a different database connection each go
 around.

 It would be useful:

 CREATE SCHEMA session1234 UNLOGGED
  CREATE TABLE hitlist ( ... );

 Each table in the session1234 schema would not be WAL-logged, and
 would be automatically dropped on crash recovery (actually the whole
 schema would be).  But while the server is live it behaves like a
 regular schema/table and can be seen by all backends (i.e. not temp)

I don't think unlogged is the only, and perhaps not even the most
important, desirable property.

I would want these objects not to cause catalog churn. I might have
thousands of sessions being created all the time, and creating new catalog
rows and index pointers which then have to be vacuumed would be a headache.

I would actually want the objects to be invisible to other sessions,
at least by default. You would have to have the handle for the
application session to put them into your scope and then you would get
them all en masse. This isn't so much for security -- I would be fine
if there was a back door if you have the right privileges -- but for
application design, so application queries could use prepared plans
without having to modify the query to hard-code the session information
and be replanned.

I'm not sure if they should use shared buffers or local buffers. As
long as only one backend at a time could access them it would be
possible to use local buffers and evict them all when the handle is
given up. But that means giving up any caching benefit across
sessions. On the other hand it means they'll be much lighter weight
and easier to make safely unlogged than if they lived in shared
buffers.

These are just some brainstorming ideas, I don't have a clear vision
of how to achieve all this yet. This does sound a lot like the SQL
standard temp table discussion and I think Tom and I are still at odds
on that. Creating new catalog entries for them gives up -- what I
think is the whole point of their design -- their lack of DDL
overhead. But my design above means problems for transactional
TRUNCATE and other DDL.


-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [PERFORM] number of rows estimation for bit-AND operation

2009-08-20 Thread Scott Marlowe
2009/8/20 Slava Moudry <smou...@4info.net>:
 Hi,
 Yes, I thought about putting the bit-flags in separate fields.
 Unfortunately - I expect to have quite a lot of these and space is an issue 
 when you are dealing with billions of records in fact table, so I prefer to 
 pack them into one int8.

For giggles I created two test tables, one with a single int, one with
8 bools, and put 100M entries in each.  The table with 8 bools took up
approx. 3560616 bytes, while the one with a single int took up approx.
3544212.

I.e. they're about the same.  You should really test whether having a
lot of bools costs more than mangling ints around.  I'm guessing that,
due to alignment, I could fit a lot more than just 8 bools in the test
table.
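
A rough reconstruction of that comparison (not the original script; the row
count is reduced here to keep it quick):

CREATE TABLE t_int8  (flags int8);
CREATE TABLE t_bools (f1 bool, f2 bool, f3 bool, f4 bool,
                      f5 bool, f6 bool, f7 bool, f8 bool);

INSERT INTO t_int8  SELECT 0 FROM generate_series(1, 1000000);
INSERT INTO t_bools SELECT false, false, false, false,
                           false, false, false, false
                      FROM generate_series(1, 1000000);

SELECT relname, pg_size_pretty(pg_total_relation_size(oid))
  FROM pg_class
 WHERE relname IN ('t_int8', 't_bools');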



Re: [PERFORM] improving my query plan

2009-08-20 Thread Chris

Kevin Kempter wrote:

Hi all;


I have a simple query against two very large tables (> 800 million rows 
in the url_hits_category_jt table and 9.2 million in the url_hits_klk1 
table).



I have indexes on the join columns and I've run an explain.
Also, I've set the statistics target to 250 for both join columns. I get 
a very high overall query cost:


If you had an extra where condition it might be different, but you're 
just returning the results from both tables that match up, so doing a 
sequential scan is going to be the fastest way anyway.


--
Postgresql & php tutorials
http://www.designmagick.com/




Re: [PERFORM] improving my query plan

2009-08-20 Thread Scott Carey


On 8/20/09 4:09 PM, Kevin Kempter <kev...@consistentstate.com> wrote:

 Hi all;
 
 
 I have a simple query against two very large tables (> 800 million rows in
 the url_hits_category_jt table and 9.2 million in the url_hits_klk1 table).
 
 
 I have indexes on the join columns and I've run an explain.
 Also, I've set the statistics target to 250 for both join columns. I get a
 very high overall query cost:
 
 

What about the actual times?  The latter plan has higher cost, but perhaps
it is actually faster?  If so, you can change the estimated cost by changing
the db cost parameters.

However, the second plan will surely be slower if the table is not in memory
and causes random disk access.

Note that EXPLAIN ANALYZE for the hash plan will take noticeably longer than
a plain query due to the cost of analysis on hashes.
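
For reference, comparing the actual run times of the two plans would look
something like this (same query as in the original post):

explain analyze
select category_id, url_hits_id
  from url_hits_klk1 a, pwreport.url_hits_category_jt b
 where a.id = b.url_hits_id;

set enable_seqscan = off;

explain analyze
select category_id, url_hits_id
  from url_hits_klk1 a, pwreport.url_hits_category_jt b
 where a.id = b.url_hits_id;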


 
 



[PERFORM] limiting results makes the query slower

2009-08-20 Thread Jaime Casanova
Hi,

In a web app we have a query whose results we want to show a limited
number at a time. It executes in 10 seconds if I use LIMIT, but in
300ms if I remove it.
Why is that happening? The query is using an index to avoid the sort,
so the nested loop should only need the first 20 records of
tgen_persona, no?
Some more info below:

postgresql 8.3.7
ram 32GB
shared_buffers 8GB
work_mem 8MB

tgen_persona has 185,732 records, and tcom_invitacion is a table
partitioned by date (one month per partition) with more than 29 million
records across the partitions.

explain analyze here: http://explain.depesz.com/s/B4

The situation improves if I disable nested loops; explain analyze with
enable_nestloop off here: http://explain.depesz.com/s/Jv

select Per.razon_social as MAIL,inv.cata_esta_calificacion,
   inv.observa_calificacion,
   to_char(inv.fech_crea,':MM:DD') as fech_crea,
   case when (( select cod_estado FROM TPRO_PROVEEDOR
 WHERE id_proveedor = (select max(a.id_proveedor)
 from tpro_proveedor a
where persona_id = Inv.persona_id )
  )='Habilitado')
then 'Habilitado'
else 'Deshabilitado'
   end as empresa_id
  from tgen_persona Per, tcom_invitacion Inv
 where Per.persona_id = Inv.persona_id
   and inv.id_soli_compra = '60505'
 ORDER BY Per.razon_social asc limit 20 offset 0
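
A minimal way to apply that nestloop-off workaround to just this query rather
than the whole session (a sketch, standard 8.3 syntax):

begin;
set local enable_nestloop = off;   -- only affects this transaction
-- ... run the query above ...
commit;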


-- 
Regards,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157



Re: [PERFORM] number of rows estimation for bit-AND operation

2009-08-20 Thread Slava Moudry
Hi,
Yes, I thought about putting the bit-flags in separate fields.
Unfortunately, I expect to have quite a lot of these, and space is an issue
when you are dealing with billions of records in the fact table, so I prefer
to pack them into one int8.
For users it's also much easier to write "where mt_flags & 134 = 0" instead
of "where f_2=false and f_4=false and f_128=false".
In Teradata that worked just fine, but Teradata costs millions vs. zero cost
for Postgres, so I am not really complaining out loud :)
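
Not suggested in the thread, but one workaround that exists today is an
expression index on the bit-AND test itself; ANALYZE then gathers statistics
for that expression, which can give the planner a much better row estimate
(table and column names below are assumptions):

create index fact_mt_flags_134_idx on fact_table ((mt_flags & 134));
analyze fact_table;

-- the estimate for "where mt_flags & 134 = 0" can now come from the
-- statistics collected on the indexed expression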

Hopefully Tom or other bright folks at PG could take a look at this for the 
next patch/release.
Btw, can you send me the link to  PG's selectivity estimator discussion - I'd 
like to provide feedback if I can.
Thanks,
-Slava.





[PERFORM] [PERFORMANCE] how to set wal_buffers

2009-08-20 Thread Jeff Janes
 -- Forwarded message --
 From: Jaime Casanova <jcasa...@systemguards.com.ec>
 To: psql performance list <pgsql-performance@postgresql.org>
 Date: Wed, 19 Aug 2009 19:25:11 -0500
 Subject: [PERFORMANCE] how to set wal_buffers
 Hi,

 Our fine manual says:
 
 The amount of memory used in shared memory for WAL data. The default
 is 64 kilobytes (64kB). The setting need only be large enough to hold
 the amount of WAL data generated by one typical transaction, since the
 data is written out to disk at every transaction commit. This
 parameter can only be set at server start.
 

I don't care for that description for several reasons, but haven't
been able to come up with a good alternative.

One problem is as you note.  How is the average user supposed to know
what is the size of the redo that is generated by a typical
transaction?

But other than that, I still think it is not good advice.  If your typical
transaction runs for 5 minutes and generates 100's of MB of WAL and
there is only one of them at a time, there is certainly no reason to
have several hundred MB of wal_buffers.  It will merrily run around
the buffer ring of just a few MB.

On the other extreme, if you have many connections rapidly firing
small transactions, and you hope for the WAL of many of them to all
get flushed down to disk with a single fsync,
then your wal_buffers should be big enough to hold the WAL data of all
those transactions.  Running out of WAL space has a nasty effect on
group commits.

The default value of wal_buffers is low because many older systems
have a low default value for the kernel setting shmmax.  On any
decent-sized server, I'd just automatically increase wal_buffers to 1
or 2 MB.  It might help a lot, and it is unlikely to hurt.
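
For example, a possible postgresql.conf starting point along those lines
(remember it only takes effect at server start):

# default is 64kB; a couple of MB is cheap on a decent-sized server
wal_buffers = 2MB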

Jeff
