[PERFORM] Query's fast standalone - slow as a subquery.

2005-03-04 Thread Ron Mayer

I have a query that runs quite quickly using a hash join when run
standalone.

When I use this query as a subquery, the planner always seems to
pick a different plan with an order of magnitude worse performance.

This bad plan is chosen even when the outer sql statement is
a trivial expression like this:
   select * from (query) as a;
which I believe should be a no-op.


Should the optimizer have noticed that it could have used a hash
join in this case?   Anything I can do to help convince it to?
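
One diagnostic I can think of trying (a sketch only -- not something I'd
want to rely on in production) is to disable nested loops for the session
and see whether the hash join comes back:

   set enable_nestloop = off;
   explain analyze select * from (select * from userfeatures.points
       join icons using (iconid)
       where the_geom && setSRID('BOX3D(-123.40 25.66,-97.87 43.17)'::BOX3D, -1)
   ) as upf;
   reset enable_nestloop;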

  Explain analyze output follows.
  Thanks,
  Ron





fli=# explain analyze SELECT * from (select * from userfeatures.points join 
icons using (iconid) where the_geom && setSRID('BOX3D(-123.40 25.66,-97.87 
43.17)'::BOX3D, -1 )) as upf ;

  QUERY PLAN
  
--
 Nested Loop  (cost=0.00..446.42 rows=1 width=120) (actual 
time=0.096..7928.546 rows=15743 loops=1)
   Join Filter: (outer.iconid = inner.iconid)
   ->  Seq Scan on points  (cost=0.00..444.43 rows=1 width=82) (actual 
time=0.096..132.255 rows=15743 loops=1)
 Filter: (the_geom && 
'010300010005009AD95EC0295C8FC2F5A839409AD95EC0F6285C8FC295454048E17A14AE7758C0F6285C8FC295454048E17A14AE7758C0295C8FC2F5A839409AD95EC0295C8FC2F5A83940'::geometry)
   ->  Seq Scan on icons  (cost=0.00..1.44 rows=44 width=42) (actual 
time=0.006..0.242 rows=44 loops=15743)
 Total runtime: 8005.766 ms
(6 rows)

fli=# explain analyze   select * from userfeatures.points join 
icons using (iconid) where the_geom && setSRID('BOX3D(-123.40 25.66,-97.87 
43.17)'::BOX3D, -1 );

  QUERY PLAN
  
--
 Hash Join  (cost=1.55..682.84 rows=15789 width=120) (actual 
time=0.641..320.002 rows=15743 loops=1)
   Hash Cond: (outer.iconid = inner.iconid)
   ->  Seq Scan on points  (cost=0.00..444.43 rows=15794 width=82) (actual 
time=0.067..94.307 rows=15743 loops=1)
 Filter: (the_geom && 
'010300010005009AD95EC0295C8FC2F5A839409AD95EC0F6285C8FC295454048E17A14AE7758C0F6285C8FC295454048E17A14AE7758C0295C8FC2F5A839409AD95EC0295C8FC2F5A83940'::geometry)
   ->  Hash  (cost=1.44..1.44 rows=44 width=42) (actual time=0.530..0.530 
rows=0 loops=1)
 ->  Seq Scan on icons  (cost=0.00..1.44 rows=44 width=42) (actual 
time=0.026..0.287 rows=44 loops=1)
 Total runtime: 397.003 ms
(7 rows)








Re: [PERFORM] Query's fast standalone - slow as a subquery.

2005-03-04 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes:
->  Seq Scan on points  (cost=0.00..444.43 rows=1 width=82) (actual 
 time=0.096..132.255 rows=15743 loops=1)
  Filter: (the_geom && 
 '010300010005009AD95EC0295C8FC2F5A839409AD95EC0F6285C8FC295454048E17A14AE7758C0F6285C8FC295454048E17A14AE7758C0295C8FC2F5A839409AD95EC0295C8FC2F5A83940'::geometry)

->  Seq Scan on points  (cost=0.00..444.43 rows=15794 width=82) (actual 
 time=0.067..94.307 rows=15743 loops=1)
  Filter: (the_geom && 
 '010300010005009AD95EC0295C8FC2F5A839409AD95EC0F6285C8FC295454048E17A14AE7758C0F6285C8FC295454048E17A14AE7758C0295C8FC2F5A839409AD95EC0295C8FC2F5A83940'::geometry)

Apparently the selectivity of the && condition is misestimated in the
first case (note the radically wrong rowcount estimate), leading to an
inefficient join plan choice.  I suppose this is a bug in the postgis
selectivity routines --- better complain to them.
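
If a stopgap is needed while that gets sorted out, one thing that is
sometimes worth experimenting with (no guarantee the PostGIS estimator
even looks at these stats) is a larger statistics target on the geometry
column, e.g.

   ALTER TABLE userfeatures.points ALTER COLUMN the_geom SET STATISTICS 100;
   ANALYZE userfeatures.points;

but the real fix has to come from the PostGIS side.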

regards, tom lane



Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-04 Thread Ken
Richard,
What do you mean by summary table?  Basically a cache of the query into a 
table with replicated column names of all the joins?  I'd probably have to 
wipe out the table every minute and re-insert the data for each carrier in 
the system.  I'm not sure how expensive this operation would be, but I'm 
guessing it would be fairly heavy-weight.  And maintenance would be a lot 
harder because of the duplicated columns, making refactorings on the 
database more error-prone.  Am I understanding your suggestion correctly? 
Please correct me if I am.

Can you turn the problem around? Calculate what you want for all users 
(once every 60 seconds) and stuff those results into a summary table. Then 
let the users query the summary table as often as they like (with the 
understanding that the figures aren't going to update any faster than once 
a minute)



Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-04 Thread John Arbash Meinel
Ken wrote:
Richard,
What do you mean by summary table?  Basically a cache of the query
into a table with replicated column names of all the joins?  I'd
probably have to wipe out the table every minute and re-insert the
data for each carrier in the system.  I'm not sure how expensive this
operation would be, but I'm guessing it would be fairly heavy-weight.
And maintenance would be a lot harder because of the duplicated
columns, making refactorings on the database more error-prone.  Am I
understanding your suggestion correctly? Please correct me if I am.
Can you turn the problem around? Calculate what you want for all
users (once every 60 seconds) and stuff those results into a summary
table. Then let the users query the summary table as often as they
like (with the understanding that the figures aren't going to update
any faster than once a minute)

It's the same idea as a materialized view, or possibly just a lazy cache.
Just try this query:
CREATE TABLE cachedview AS
select p.id as person_id, s.*, ss.*
from shipment s
inner join shipment_status ss on s.current_status_id=ss.id
inner join release_code rc on ss.release_code_id=rc.id
left outer join driver d on s.driver_id=d.id
left outer join carrier_code cc on s.carrier_code_id=cc.id
where s.carrier_code_id in (
select cc.id
from person p
 inner join carrier_to_person ctp on p.id=ctp.person_id
 inner join carrier c on ctp.carrier_id=c.id
 inner join carrier_code cc on cc.carrier_id = c.id
)
and s.current_status_id is not null
and s.is_purged=false
and (rc.number = '9')
and (ss.date >= current_date - 31)
order by ss.date desc;
Notice that I took out the internal p.id = blah.
Then you can do:
CREATE INDEX cachedview_person_id_idx ON cachedview(person_id);
Then from the client side, you can just run:
SELECT * from cachedview WHERE person_id = id;
Now, this assumes that rc.number='9' is what you always want. If that
isn't the case, you could refactor a little bit.
This unrolls all of the work into a table which should be really fast to
query. If this query takes less than 10s to generate, then just have a
service run it every 60s. I think for refreshing, it is actually faster
to drop the table and recreate it, rather than deleting the entries.
Dropping also has the advantage that if you ever add more rows to s or
ss, then the table automatically gets the new entries.
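If you want readers to always see a complete cachedview while the refresh
runs, one variation (just a sketch; cachedview_new is a scratch name) is to
build the new copy first and swap it in inside a short transaction:
CREATE TABLE cachedview_new AS
    SELECT ... ;    -- the same big query as above
CREATE INDEX cachedview_new_person_id_idx ON cachedview_new(person_id);
BEGIN;
DROP TABLE cachedview;
ALTER TABLE cachedview_new RENAME TO cachedview;
COMMIT;
The index keeps its _new name after the rename, which is harmless.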
Another possibility, is to have the cachedview not use s.*, ss.*,
but instead just include whatever the primary keys are for those tables.
Then your final query becomes:
SELECT s.*, ss.* FROM cachedview cv, s, ss WHERE cv.person_id = id
  AND cv.s_id = s.pkey AND cv.ss_id = ss.pkey;
Again, this should be really fast, because you should have an index on
cv.person_id and only have say 300 rows there, and then you are just
fetching a few rows from s and ss. You can also use this time to do some
of your left joins against other tables.
Does this make sense? The biggest advantage you have is your 60s
statement. With that in hand, I think you can do a lot of caching
optimizations.
John
=:-




Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-04 Thread John Arbash Meinel
Ken Egervari wrote:
Josh,
...
I thought about this, but it's very important since shipment and
shipment_status are both updated in real time 24/7/365.  I think I
might be able to cache it within the application for 60 seconds at
most, but it would make little difference since people tend to refresh
within that time anyway. It's very important that real-time
information exists though.
Is 60s real-time enough for you? That's what it sounds like. It would be
nice if you could have 1hr, but there's still a lot of extra work you
can do in 60s.
You could also always throw more hardware at it. :) If the
shipment_status is one of the bottlenecks, create a 4-disk raid10 and
move the table over.
I don't remember what your hardware is, but I don't remember it being a
quad opteron with 16GB ram, and 20 15k SCSI disks, with the transaction
log on a solid state disk. :)

That sounds like an awesome system.  I'd love to have something like
that. Unfortunately, the production server is just a single processor
machine with 1 GB ram.  I think throwing more disks at it is probably
the best bet, moving the shipment and shipment_status tables over as
you suggested. That's great advice.
Well, disk I/O is one side, but probably sticking another 1GB (2GB
total) also would be a fairly economical upgrade for performance.
You are looking for query performance, not really update performance,
right? So buy a 4-port SATA controller, and some WD Raptor 10k SATA
disks. With this you can create a RAID10 for < $2k (probably like $1k).
30ms is a good target, although I guess I was naive for setting that
goal perhaps.  I've just taken queries that ran at 600ms and with 1 or
2 indexes, they went down to 15ms.
It all depends on your query. If you have a giant table (1M rows), and
you are doing a seqscan for only 5 rows, then adding an index will give
you enormous productivity gains. But you are getting 30k rows, and
combining them with 6k rows, plus a bunch of other stuff. I think we've
tuned the query about as far as we can.
Let's say we have 200 users signed into the application at the same
time. The application refreshes their shipment information
automatically to make sure it's up to date on the user's screen.  The
application will execute the query we are trying to tune every 60
seconds for most of these users.  Users can set the refresh time to be
higher, but 60 is the lowest amount so I'm just assuming everyone has
it at 60.
Anyway, if you have 200 users logged in, that's 200 queries in the 60
second period, which is about 3-4 queries every second.  As you can
see, it's getting maxed out, and because of bad luck, the queries are
bunched together and are being called at the same time, making 8-9
queries in the same second and that's where the performance is
starting to degrade.  I just know that if I could get this down to 30
ms, or even 100, we'd be okay for a few months without throwing
hardware at the problem.   Also keep in mind that other application
logic and Hibernate mapping is occurring too, so 3-4 queries a second is
already no good when everything is running on a single machine.
The other query I just sent, where you do the query for all users at
once, and then cache the result, *might* be cheaper than doing a bunch
of different queries.
However, you may find that doing the query for *all* users takes too
long. So you could keep another table indicating who the most recent
people logged in are, and then only cache the info for those people.
This does start getting a little more involved, so see if you can do all
users before heading down this road.
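A rough sketch of the only-cache-the-active-users idea (recent_logins and
its columns are names I just made up; the app would touch last_seen on
every request):
CREATE TABLE recent_logins (
    person_id integer PRIMARY KEY,
    last_seen timestamp NOT NULL DEFAULT now()
);
-- then the cache-building query gets one extra predicate, something like:
--   AND p.id IN (SELECT person_id FROM recent_logins
--                WHERE last_seen > now() - interval '15 minutes')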
This isn't the best setup, but it's the best we can afford.  We are
just a new startup company.  Cheaper servers and open source keep our
costs low. But money is starting to come in after 10 months of hard
work, so we'll be able to replace our server within the next 2
months.  It'll be a necessity because we are signing on some big
clients now and they'll have 40 or 50 users for a single company.  If
they are all logged in at the same time, that's a lot of queries.
Sure. Just realize you can't really support 200 concurrent connections
with a single P4 and 1GB of ram.
John
=:-




[PERFORM] Select in FOR LOOP Performance

2005-03-04 Thread Charles Joseph
I face a problem when running the following plpgsql
function. The problem is it takes more than 24 hours to
complete the calculation. The EMP table has about 200,000
records. I execute the function through psql:
select calculate();
(There is no cyclic link inside the data).

Computer used: IBM xSeries 225, RAM 1GB, SCSI 36GB
O/S : RedHat Linux Enterprise 3.0 AS
PostgreSQL version 8.0.1
fsync=false

I would very much appreciate it if anyone can help find
out what the problem is, or suggest any other way to improve
the performance of the function.

Is there any difference between a select in a FOR LOOP
and a CURSOR in terms of performance?

EMP Table
GEN  char(3),
CODE varchar(20),
PARENT varchar(20),
POSITION INT4 DEFAULT 0,
PG NUMERIC(15,2) DEFAULT 0,
P  NUMERIC(15,2) DEFAULT 0,
QUA CHAR(1) DEFAULT '0',
.
.
. 
   create index EMP_GEN on EMP (GEN);
   create index EMP_CODE on EMP (CODE);
   create index EMP_PARENT on EMP (PARENT);

Sample EMP DATA:
  GEN   CODE   PARENT   POSITION    P     PG   QUA
  ================================================
  000   A001              3         100    0   '1'
  001   A002   A001       2          50    0   '1'
  001   A003   A001       1          50    0   '1'
  001   A004   A001       1          20    0   '1'
  002   A005   A003       2          20    0   '1'
  002   A006   A004       3          30    0   '1'
   ...
   ...
   

for vTMP_ROW in select CODE,PARENT,POSITION from
EMP order by GEN desc loop
vCODE := vTMP_ROW.CODE;
vPARENT   := vTMP_ROW.PARENT;
nPOSITION := vTMP_ROW.POSITION;

update EMP set PG=PG+P where CODE = vCODE;

select into vCURR_ROW PG,POSITION from EMP
where CODE = vCODE;

nPG   := vCURR_ROW.PG;
nPOSITION := vCURR_ROW.POSITION;

vUPL := vPARENT;

loop
   select into vUPL_ROW
CODE,PARENT,POSITION,P,QUA from EMP where CODE = vUPL;
   if found then
  if vUPL_ROW.POSITION  nPOSITION and
vUPL_ROW.QUA = ''1'' then
 update EMP set PG=PG+nPG where CODE =
vUPL;
 exit;
  end if;
   else 
  exit;   
   end if;
   vUPL := vUPL_ROW.PARENT;
end loop;
end loop;
 
.
.
.

Thank You
  







Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-04 Thread Josh Berkus
Ken,

 I did everything you said and my query does perform a bit better.  I've
 been getting speeds from 203 to 219 to 234 milliseconds now.   I tried
 increasing the work mem and the effective cache size from the values you
 provided, but I didn't see any more improvement.  I've tried looking
 into setting the shared buffers for Windows XP, but I'm not sure how to do
 it.  I'm looking in the manual at:

Now that you know how to change the shared_buffers, want to go ahead and run 
the query again?

I'm pretty concerned about your case, because based on your description I 
would expect < 100ms on a Linux machine.  So I'm wondering if this is a 
problem with WindowsXP performance, or if it's something we can fix through 
tuning.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Select in FOR LOOP Performance

2005-03-04 Thread Tom Lane
Charles Joseph [EMAIL PROTECTED] writes:
 I face a problem when running the following plpgsql
 function. The problem is it takes more than 24 hours to
 complete
 the calculation. The EMP table has about 200,000
 records.

Sure there are no infinite loops of PARENT links in your table?

Also, if CODE is supposed to be unique, you should probably declare
its index that way.  Or at least make sure the planner knows it's
unique (have you ANALYZEd the table lately?)
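
A minimal sketch of that, assuming CODE really is unique in EMP:

   DROP INDEX EMP_CODE;
   CREATE UNIQUE INDEX EMP_CODE ON EMP (CODE);
   ANALYZE EMP;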

regards, tom lane



Re: [PERFORM] Help with tuning this query (with explain analyze finally)

2005-03-04 Thread John A Meinel
John Arbash Meinel wrote:
Ken wrote:
Richard,
What do you mean by summary table?  Basically a cache of the query
into a table with replicated column names of all the joins?  I'd
probably have to wipe out the table every minute and re-insert the
data for each carrier in the system.  I'm not sure how expensive this
operation would be, but I'm guessing it would be fairly heavy-weight.
And maintenance would be a lot harder because of the duplicated
columns, making refactorings on the database more error-prone.  Am I
understanding your suggestion correctly? Please correct me if I am.
Can you turn the problem around? Calculate what you want for all
users (once every 60 seconds) and stuff those results into a summary
table. Then let the users query the summary table as often as they
like (with the understanding that the figures aren't going to update
any faster than once a minute)

It's the same idea as a materialized view, or possibly just a lazy cache.
...
This unrolls all of the work into a table which should be really fast to
query. If this query takes less than 10s to generate, then just have a
service run it every 60s. I think for refreshing, it is actually faster
to drop the table and recreate it, rather than deleting the entries.
Dropping also has the advantage that if you ever add more rows to s or
ss, then the table automatically gets the new entries.
Just as a small update: if completely regenerating the cache takes too 
long, the other way to do it is to create insert and update triggers on 
s and ss, such that as they change, they also update the cachedview table.

Something like
CREATE TRIGGER on_ss_ins AFTER INSERT ON ss FOR EACH ROW EXECUTE
   INSERT INTO cached_view SELECT p.id as person_id, s.*, ss.* FROM 
<the big stuff> WHERE s.id = NEW.id;

This runs the same query, but notice that the WHERE means it only allows 
the new row. So this query should run fast. It is a little bit of 
overhead on each of your inserts, but it should keep the cache 
up-to-date. With something like this, I would have the final client 
query still include the date restriction, since you accumulate older 
rows into the cached view. But you can run a daily process that prunes 
out everything older than 31 days, which keeps the cachedview from 
getting really large.
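For completeness, since CREATE TRIGGER can only call a function, the sketch 
above would really be spelled out something like this (the big join is still 
elided, and the function name is one I made up):

CREATE OR REPLACE FUNCTION cachedview_ss_ins() RETURNS trigger AS '
BEGIN
    INSERT INTO cached_view
        SELECT p.id AS person_id, s.*, ss.*
        FROM  <the big stuff>
        WHERE ss.id = NEW.id;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER on_ss_ins AFTER INSERT ON shipment_status
    FOR EACH ROW EXECUTE PROCEDURE cachedview_ss_ins();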

John
=:-




[PERFORM] MAIN vs. PLAIN

2005-03-04 Thread Dave Held
I notice that by default, postgres sets numeric fields to
storage MAIN.  What exactly does that mean?  Does that mean
it stores it in some type of compressed BCD format?  If so,
how much performance gain can I expect by setting the storage
to PLAIN?  Also, the docs say that char(n) is implemented more
or less the same way as text.  Does that mean that setting
a field to, say, char(2) PLAIN is not going be any faster
than text PLAIN?  That seems a bit counter-intuitive.  I
would hope that a char(2) PLAIN would just reserve two chars
in the record structure without any overhead of pointers to
external data.  Is there a reason this isn't supported?
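For concreteness, the kind of experiment I have in mind (t and amount are
just made-up names) would be

   ALTER TABLE t ALTER COLUMN amount SET STORAGE PLAIN;

versus leaving the column at its default MAIN storage.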

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129



Re: [PERFORM] name search query speed

2005-03-04 Thread stig erikson
Jeremiah Jahn wrote:
I have about 5M names stored on my DB. Currently the searches are very
quick unless, they are on a very common last name ie. SMITH. The Index
is always used, but I still hit 10-20 seconds on a SMITH or Jones
search, and I average about 6 searches a second and max out at about
30/s. Any suggestions on how I could arrange things to make this search
quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
can increase this speed w/o a HW upgrade.
thanx,
-jj-
Is there a chance you could benefit from indices spanning multiple columns?
Maybe the user that searches for SMITH knows more than the last name, i.e. first 
name, location (zip code, name of city, etc.)?
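
A rough sketch of what I mean, with invented table and column names:

create index names_last_first_idx on names (last_name, first_name);

-- a search that supplies both values can then use the index to skip
-- past the tens of thousands of SMITH rows:
select * from names where last_name = 'SMITH' and first_name = 'JOHN';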



Re: [PERFORM] What is the postgres sql command for last_user_id ???

2005-03-04 Thread stig erikson
[EMAIL PROTECTED] wrote:
I would like to know whether there is any command by which the server will give
the record ID back to the client when the client inserts data and the server
generates an autoincrement ID for that record.
For example, many clients may send payment data to the server, and each record
from each client gets its own record ID from the server's autoincrement process
[x+1]; I don't want to lock the record since that would slow the system down.
Each client will then want to know which ID the server gave to its record in
order to select that record and print the receipt [bill].
I know that in mysql there is a command last_record_id which acts the same as
I describe above. Does anybody know how to do this in postgres? Please give me the details.
Amrit,Thailand

http://www.postgresql.org/docs/8.0/static/functions-sequence.html
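
In short, currval() on the sequence behind your serial column does what you
describe, and it is local to your session, so no locking is needed. A quick
sketch with made-up table and sequence names:

insert into payments (amount) values (100.00);
select currval('payments_id_seq');  -- the id just generated in this session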


Re: [PERFORM] multi billion row tables: possible or insane?

2005-03-04 Thread Jim C. Nasby
On Tue, Mar 01, 2005 at 10:34:29AM +0100, Ramon Bastiaans wrote:
 Hi all,
 
 I am doing research for a project of mine where I need to store several 
 billion values for a monitoring and historical tracking system for a big 
  computer system. My current estimate is that I have to store (somehow) 
 around 1 billion values each month (possibly more).

On a side-note, do you need to keep the actual row-level details for
history? http://rrs.decibel.org might be of some use.

Other than that, what others have said. Lots and lots of disks in
RAID10, and opterons (though I would choose opterons not for memory size
but because of memory *bandwidth*).
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [PERFORM] multi billion row tables: possible or insane?

2005-03-04 Thread Alex Turner
Not true - with fsync on I get nearly 500 tx/s, with it off I'm as
high as 1600/sec with dual opteron and 14xSATA drives and 4GB RAM on a
3ware Escalade.  Database has 3 million rows.

As long as queries use indexes, multi-billion-row tables shouldn't be too
bad.  Full table scans will suck though.

Alex Turner
netEconomist


On Tue, 1 Mar 2005 16:40:29 +0100, Vig, Sandor (G/FI-2)
[EMAIL PROTECTED] wrote:
 385 transaction/sec?
 
 fsync = false
 
 risky but fast.
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of John Arbash
 Meinel
 Sent: Tuesday, March 01, 2005 4:19 PM
 To: Ramon Bastiaans
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] multi billion row tables: possible or insane?
 
 Ramon Bastiaans wrote:
 
  Hi all,
 
  I am doing research for a project of mine where I need to store
  several billion values for a monitoring and historical tracking system
  for a big computer system. My current estimate is that I have to store
  (somehow) around 1 billion values each month (possibly more).
 
 If you have that 1 billion perfectly distributed over all hours of the
 day, then you need 1e9/30/24/3600 = 385 transactions per second.
 
 Which I'm pretty sure is possible with postgres, you just need pretty
 beefy hardware. And like Jeff said, lots of disks for lots of IO.
 Like a quad opteron, with 16GB of ram, and around 14-20 very fast disks.
 raid10 not raid5, etc. To improve query performance, you can do some
 load balancing by having replication machines by using Slony.
 
 Or if you can do batch processing, you could split up the work into a
 few update machines, which then do bulk updates on the master database.
 This lets you get more machines into the job, since you can't share a
 database across multiple machines.
 
  I was wondering if anyone has had any experience with these kind of
  big numbers of data in a postgres sql database and how this affects
  database design and optimization.
 
 Well, one of the biggest things is if you can get bulk updates, or if
 clients can handle data being slightly out of date, so you can use
 caching. Can you segregate your data into separate tables as much as
 possible? Are your clients okay if aggregate information takes a little
 while to update?
 
 One trick is to use semi-lazy materialized views to get your updates to
 be fast.
 
  What would be important issues when setting up a database this big,
  and is it at all doable? Or would it be a insane to think about
  storing up to 5-10 billion rows in a postgres database.
 
 I think if you can design the db properly, it is doable. But if you
 have a client saying I need up-to-the-second information on 1 billion
 rows, you're never going to get it.
 
 
  The database's performance is important. There would be no use in
  storing the data if a query will take ages. Query's should be quite
  fast if possible.
 
 Again, it depends on the queries being done.
 There are some nice tricks you can use, like doing a month-by-month
 partitioning (if you are getting 1G inserts, you might want week-by-week
 partitioning), and then with a date column index, and a union all view
 you should be able to get pretty good insert speed, and still keep fast
 *recent* queries. Going through 1billion rows is always going to be
 expensive.
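  A bare-bones sketch of that layout (invented table and column names; one
  table per month plus a view over them):
 
  CREATE TABLE readings_2005_02 (ts timestamp NOT NULL, val numeric);
  CREATE TABLE readings_2005_03 (ts timestamp NOT NULL, val numeric);
  CREATE INDEX readings_2005_03_ts ON readings_2005_03 (ts);
  CREATE VIEW readings AS
      SELECT * FROM readings_2005_02
      UNION ALL
      SELECT * FROM readings_2005_03;
 
  Inserts go into the current month's table, and queries for recent data
  mostly touch just that table and its index.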
 
   I would really like to hear people's thoughts/suggestions, or "go see a
   shrink, you must be mad" statements ;)
 
  Kind regards,
 
  Ramon Bastiaans
 
 I think it would be possible, but there are a lot of design issues with
 a system like this. You can't go into it thinking that you can design a
 multi billion row database the same way you would design a million row db.
 
 John
 =:-
 
 




Re: [PERFORM] MAIN vs. PLAIN

2005-03-04 Thread Tom Lane
Dave Held [EMAIL PROTECTED] writes:
 I notice that by default, postgres sets numeric fields to
 storage MAIN.  What exactly does that mean?

See http://developer.postgresql.org/docs/postgres/storage-toast.html

There isn't any amazingly strong reason why numeric defaults to MAIN
rather than EXTENDED, which is the default for every other toastable
datatype --- except that I thought it'd be a good idea to have at
least one type that did so, just to exercise that code path in the
tuple toaster.  And numeric shouldn't ordinarily be large enough to
need out-of-line storage anyway.  It's unlikely even to need
compression, really, but as long as it's a varlena type the overhead
to support toasting is nearly nil.
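
(If you want to see what a column is currently set to, attstorage in
pg_attribute shows it -- 'p' = PLAIN, 'm' = MAIN, 'x' = EXTENDED,
'e' = EXTERNAL; t here is a made-up table name:

   SELECT attname, attstorage FROM pg_attribute
   WHERE attrelid = 't'::regclass AND attnum > 0;
)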

regards, tom lane
