Re: [PERFORM] create index with substr function

2004-10-21 Thread Joshua D. Drake




Tom Lane wrote:

  "Ray" [EMAIL PROTECTED] writes:
  
  
CREATE INDEX idx_doc_substr_doc_urn ON doc USING btree (SUBSTR(doc_urn,10));
ERROR:  parser: parse error at or near "10" at character 68

  
  
This will work in 7.4, but not older releases.

  

Can't you just use a SQL function that calls the substr function? I
have done that with date functions before
like:
CREATE OR REPLACE FUNCTION get_month(text) returns double precision AS '
   SELECT date_part(''month'', $1);
' LANGUAGE 'SQL' IMMUTABLE;

CREATE INDEX get_month_idx on foo(get_month(date_field));
Or in this case:

CREATE OR REPLACE FUNCTION sub_text(text) returns text AS '
 SELECT SUBSTR($1,10);
' LANGUAGE 'SQL' IMMUTABLE;

CREATE INDEX sub_text_idx ON foo(sub_text(doc_urn));

This works on 7.3.6???

Sincerely,

Joshua D. Drake





  			regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
  



-- 
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL




[PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Matt Clark
I suppose I'm just idly wondering really.  Clearly it's against PG
philosophy to build an FS or direct IO management into PG, but now that it's so
relatively easy to plug filesystems into the main open-source OSes, it
struck me that there might be some useful changes to, say, XFS or ext3, that
could be made that would help PG out.

I'm thinking along the lines of an FS that's aware of PG's strategies and
requirements and therefore optimised to make those activities as efficient
as possible - possibly even being aware of PG's disk layout and treating
files differently on that basis.

Not being an FS guru I'm not really clear on whether this would help much
(enough to be worth it anyway) or not - any thoughts?  And if there were
useful gains to be had, would it need a whole new FS or could an existing
one be modified?

So there might be (as I said, I'm not an FS guru...):
* great append performance for the WAL?
* optimised scattered writes for checkpointing?
* Knowledge that FSYNC is being used for preserving ordering a lot of the
time, rather than requiring actual writes to disk (so long as the writes
eventually happen in order...)?


Matt



Matt Clark
Ymogen Ltd
P: 0845 130 4531
W: https://ymogen.net/
M: 0774 870 1584
 


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Free PostgreSQL Training, Philadelphia, Oct 30

2004-10-21 Thread Alban Medici (NetCentrex)
Nobody got a plane to come from Europe? :-)
As a poor Frenchie I will not come ...
Have a good time

Alban
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Aaron Mulder
Sent: Wednesday, 20 October 2004 15:11
To: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Free PostgreSQL Training, Philadelphia, Oct 30

If anyone is going to take the train all the way, please e-mail me
offline.  There is a train station relatively close to the event (NY to
Philly then the R5 to Malvern), but it's not within walking distance, so
we'll figure out some way to pick people up from there.

Thanks,
Aaron

On Wed, 20 Oct 2004, Aaron Werman wrote:
 I'm driving from Tenafly NJ and going to both sessions. If you're able 
 to get to the George Washington Bridge (A train to 178th Street [Port 
 Authority North] and a bus over the bridge), I can drive you down. I'm 
 not sure right now about the return because I have confused plans to 
 meet someone.
 
 /Aaron
 
 
 On Tue, 19 Oct 2004 14:43:29 -0400, Max Baker [EMAIL PROTECTED] wrote:
  On Wed, Oct 13, 2004 at 12:21:27PM -0400, Aaron Mulder wrote:
   All,
 My company (Chariot Solutions) is sponsoring a day of free 
   PostgreSQL training by Bruce Momjian (one of the core PostgreSQL 
   developers).  The day is split into 2 sessions (plus a QA session):
  
* Mastering PostgreSQL Administration
* PostgreSQL Performance Tuning
  
 Registration is required, and space is limited.  The 
   location is Malvern, PA (suburb of Philadelphia) and it's on 
   Saturday Oct 30.  For more information or to register, see
  
   http://chariotsolutions.com/postgresql.jsp
  
  I'm up in New York City and would be taking the train down to 
  Philly.  Is anyone coming from Philly or New York that would be able 
  to give me a lift to/from the train station?  Sounds like a great event.
  
  Cheers,
  -m
  
  ---(end of 
  broadcast)---
  TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
  
 
 
 --
 
 Regards,
 /Aaron
 
 ---(end of 
 broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faqs/FAQ.html
 

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Leeuw van der, Tim
Hiya,

Looking at that list, I got the feeling that you'd want to push that PG-awareness down 
into the block-io layer as well, so as to be able to optimise for (perhaps) 
conflicting goals depending on what the app does. For the IO system to be able to read 
the app's mind it needs to have some knowledge of what the app is / needs / wants, and I 
get the impression that this awareness needs to go deeper than the FS only.

--Tim

(But you might have time to rewrite Linux/BSD as a PG-OS? just kidding!)

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Matt Clark
Sent: Thursday, October 21, 2004 9:58 AM
To: [EMAIL PROTECTED]
Subject: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?


I suppose I'm just idly wondering really.  Clearly it's against PG
philosophy to build an FS or direct IO management into PG, but now that it's so
relatively easy to plug filesystems into the main open-source OSes, it
struck me that there might be some useful changes to, say, XFS or ext3, that
could be made that would help PG out.

I'm thinking along the lines of an FS that's aware of PG's strategies and
requirements and therefore optimised to make those activities as efficient
as possible - possibly even being aware of PG's disk layout and treating
files differently on that basis.

Not being an FS guru I'm not really clear on whether this would help much
(enough to be worth it anyway) or not - any thoughts?  And if there were
useful gains to be had, would it need a whole new FS or could an existing
one be modified?

So there might be (as I said, I'm not an FS guru...):
* great append performance for the WAL?
* optimised scattered writes for checkpointing?
* Knowledge that FSYNC is being used for preserving ordering a lot of the
time, rather than requiring actual writes to disk (so long as the writes
eventually happen in order...)?


Matt



Matt Clark
Ymogen Ltd
P: 0845 130 4531
W: https://ymogen.net/
M: 0774 870 1584
 


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Pierre-Frédéric Caillaud
Reiser4 ?
On Thu, 21 Oct 2004 08:58:01 +0100, Matt Clark [EMAIL PROTECTED] wrote:
I suppose I'm just idly wondering really.  Clearly it's against PG
philosophy to build an FS or direct IO management into PG, but now that it's so
relatively easy to plug filesystems into the main open-source OSes, it
struck me that there might be some useful changes to, say, XFS or ext3, that
could be made that would help PG out.

I'm thinking along the lines of an FS that's aware of PG's strategies and
requirements and therefore optimised to make those activities as efficient
as possible - possibly even being aware of PG's disk layout and treating
files differently on that basis.

Not being an FS guru I'm not really clear on whether this would help much
(enough to be worth it anyway) or not - any thoughts?  And if there were
useful gains to be had, would it need a whole new FS or could an existing
one be modified?
So there might be (as I said, I'm not an FS guru...):
* great append performance for the WAL?
* optimised scattered writes for checkpointing?
* Knowledge that FSYNC is being used for preserving ordering a lot of the
time, rather than requiring actual writes to disk (so long as the writes
eventually happen in order...)?
Matt

Matt Clark
Ymogen Ltd
P: 0845 130 4531
W: https://ymogen.net/
M: 0774 870 1584
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Steinar H. Gunderson
On Thu, Oct 21, 2004 at 08:58:01AM +0100, Matt Clark wrote:
 I suppose I'm just idly wondering really.  Clearly it's against PG
 philosophy to build an FS or direct IO management into PG, but now that it's so
 relatively easy to plug filesystems into the main open-source OSes, it
 struck me that there might be some useful changes to, say, XFS or ext3, that
 could be made that would help PG out.

This really sounds like a poor replacement for just making PostgreSQL use raw
devices to me. (I have no idea why that isn't done already, but presumably it
isn't all that easy to get right. :-) )

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Leeuw van der, Tim
Hi,

I guess the difference is in 'severe hacking inside PG' vs. 'some unknown amount of 
hacking that doesn't touch PG code'.

Hacking PG internally to handle raw devices will meet with strong resistance from 
large portions of the development team. I don't expect (m)any core devs of PG will be 
excited about rewriting the entire I/O architecture of PG and duplicating large 
amounts of OS type of code inside the application, just to try to attain an unknown 
performance benefit.

PG doesn't use one big file, as some databases do, but many small files. Now PG would 
need to be able to do file-management, if you put the PG database on a raw disk 
partition! That's icky stuff, and you'll find much resistance against putting such 
code inside PG.
So why not try to have the external FS know a bit about PG and its directory-layout, 
and its IO requirements? Then such type of code can at least be maintained outside 
the application, and will not be as much of a burden to the rest of the application.

(I'm not sure if it's a good idea to create a PG-specific FS in your OS of choice, but 
it's certainly gonna be easier than getting FS code inside of PG)

cheers,

--Tim

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Steinar H. Gunderson
Sent: Thursday, October 21, 2004 12:27 PM
To: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?


On Thu, Oct 21, 2004 at 08:58:01AM +0100, Matt Clark wrote:
 I suppose I'm just idly wondering really.  Clearly it's against PG
 philosophy to build an FS or direct IO management into PG, but now that it's so
 relatively easy to plug filesystems into the main open-source OSes, it
 struck me that there might be some useful changes to, say, XFS or ext3, that
 could be made that would help PG out.

This really sounds like a poor replacement for just making PostgreSQL use raw
devices to me. (I have no idea why that isn't done already, but presumably it
isn't all that easy to get right. :-) )

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Aaron Werman
The intuitive thing would be to put pg into a file system. 

/Aaron

On Thu, 21 Oct 2004 12:44:10 +0200, Leeuw van der, Tim
[EMAIL PROTECTED] wrote:
 Hi,
 
 I guess the difference is in 'severe hacking inside PG' vs. 'some unknown amount of 
 hacking that doesn't touch PG code'.
 
 Hacking PG internally to handle raw devices will meet with strong resistance from 
 large portions of the development team. I don't expect (m)any core devs of PG will 
 be excited about rewriting the entire I/O architecture of PG and duplicating large 
 amounts of OS type of code inside the application, just to try to attain an unknown 
 performance benefit.
 
 PG doesn't use one big file, as some databases do, but many small files. Now PG 
 would need to be able to do file-management, if you put the PG database on a raw 
 disk partition! That's icky stuff, and you'll find much resistance against putting 
 such code inside PG.
 So why not try to have the external FS know a bit about PG and its 
 directory-layout, and its IO requirements? Then such type of code can at least be 
 maintained outside the application, and will not be as much of a burden to the rest 
 of the application.
 
 (I'm not sure if it's a good idea to create a PG-specific FS in your OS of choice, 
 but it's certainly gonna be easier than getting FS code inside of PG)
 
 cheers,
 
 --Tim
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Steinar H. Gunderson
 Sent: Thursday, October 21, 2004 12:27 PM
 To: [EMAIL PROTECTED]
 Subject: Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
 
 On Thu, Oct 21, 2004 at 08:58:01AM +0100, Matt Clark wrote:
  I suppose I'm just idly wondering really.  Clearly it's against PG
  philosophy to build an FS or direct IO management into PG, but now that it's so
  relatively easy to plug filesystems into the main open-source OSes, it
  struck me that there might be some useful changes to, say, XFS or ext3, that
  could be made that would help PG out.
 
 This really sounds like a poor replacement for just making PostgreSQL use raw
 devices to me. (I have no idea why that isn't done already, but presumably it
 isn't all that easy to get right. :-) )
 
 /* Steinar */
 --
 Homepage: http://www.sesse.net/
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
 
 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings
 


-- 

Regards,
/Aaron

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Neil Conway
Matt Clark wrote:
I'm thinking along the lines of an FS that's aware of PG's strategies and
requirements and therefore optimised to make those activities as efficient
as possible - possibly even being aware of PG's disk layout and treating
files differently on that basis.
As someone else noted, this doesn't belong in the filesystem (rather the 
kernel's block I/O layer/buffer cache). But I agree, an API by which we 
can tell the kernel what kind of I/O behavior to expect would be good. 
The kernel needs to provide good behavior for a wide range of 
applications, but the DBMS can take advantage of a lot of 
domain-specific information. In theory, being able to pass that 
domain-specific information on to the kernel would mean we could get 
better performance without needing to reimplement large chunks of 
functionality that really ought to be done by the kernel anyway (as 
implementing raw I/O would require, for example). On the other hand, it 
would probably mean adding a fair bit of OS-specific hackery, which 
we've largely managed to avoid in the past.

The closest API to what you're describing that I'm aware of is 
posix_fadvise(). While that is technically-speaking a POSIX standard, it 
is not widely implemented (I know Linux 2.6 implements it; based on some 
quick googling, it looks like AIX does too). Using posix_fadvise() has 
been discussed in the past, so you might want to search the archives. We 
could use FADV_SEQUENTIAL to request more aggressive readahead on a file 
that we know we're about to sequentially scan. We might be able to use 
FADV_NOREUSE on the WAL. We might be able to get away with specifying 
FADV_RANDOM for indexes all of the time, or at least most of the time. 
One question is how this would interact with concurrent access (AFAICS 
there is no way to fetch the current advice on an fd...)

Also, I would imagine Win32 provides some means to inform the kernel 
about your expected I/O pattern, but I haven't checked. Does anyone know 
of any other relevant APIs?

-Neil
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Steinar H. Gunderson
On Thu, Oct 21, 2004 at 12:44:10PM +0200, Leeuw van der, Tim wrote:
 Hacking PG internally to handle raw devices will meet with strong
 resistance from large portions of the development team. I don't expect
 (m)any core devs of PG will be excited about rewriting the entire I/O
 architecture of PG and duplicating large amounts of OS type of code inside
 the application, just to try to attain an unknown performance benefit.

Well, at least I see people claiming 30% difference between different file
systems, but no, I'm not shouting "bah, you'd better do this or I'll warez
Oracle" :-) I have no idea how much you can improve over the best
filesystems out there, but having two layers of journalling (both WAL _and_
FS journalling) on top of each other doesn't make all that much sense to me.
:-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Tom Lane
Steinar H. Gunderson [EMAIL PROTECTED] writes:
 ... I have no idea how much you can improve over the best
 filesystems out there, but having two layers of journalling (both WAL _and_
 FS journalling) on top of each other doesn't make all that much sense to me.

Which is why setting the FS to journal metadata but not file contents is
often suggested as best practice for a PG-only filesystem.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] create index with substr function

2004-10-21 Thread george young
As previously suggested by Stephan Szabo, you need to create a helper
function, e.g.:
create or replace function after9(text) returns text language plpgsql immutable as '
   begin
   return substr($1, 10);
   end;
';

The immutable specification is needed to allow the
function's use in an index.

Then use this function in the index creation:

CREATE INDEX idx_doc_substr_doc_urn ON doc USING btree (after9(doc_urn));

I think that should do it.
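
For illustration (the literal value is made up), a query then has to reference the
same function call for the planner to be able to use the new index:

SELECT * FROM doc WHERE after9(doc_urn) = 'SOME-VALUE';

Once the table has been ANALYZEd, EXPLAIN on a query like that should be able to
pick an index scan on idx_doc_substr_doc_urn instead of a seq scan (assuming the
match is selective enough).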


-- George

On Thu, 21 Oct 2004 11:37:26 +0800
Ray [EMAIL PROTECTED] threw this fish to the penguins:

 sorry, it doesn't work, as my postgres is 7.3, not 7.4. Any other alternative
 solution for versions before 7.4?
 
 Thank
 Ray : )
 
 - Original Message - 
 From: Rosser Schwarz [EMAIL PROTECTED]
 To: Ray [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Sent: Thursday, October 21, 2004 11:34 AM
 Subject: Re: [PERFORM] create index with substr function
 
 
  while you weren't looking, Ray wrote:
 
   CREATE INDEX idx_doc_substr_doc_urn ON doc USING btree
 (SUBSTR(doc_urn,10));
 
  CREATE INDEX idx_doc_substr_doc_urn ON doc USING btree
 ((SUBSTR(doc_urn,10)));
 
  You need an additional set of parens around the SUBSTR() call.
 
  /rls
 
  -- 
  :wq
 
 
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 


-- 
Are the gods not just?  Oh no, child.
What would become of us if they were? (CSL)

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Simple machine-killing query!

2004-10-21 Thread Victor Ciurus
Hi all,

I'm writing this because I've reached the limit of my imagination and
patience! So here is it...

2 tables:
1 containing 27 million variable length, alpha-numeric records
(strings) in 1 (one) field. (10 - 145 char length per record)
1 containing 2.5 million variable length, alpha-numeric records
(strings) in 1 (one) field.

tables were created using:
CREATE TABLE public.BIGMA (string VARCHAR(255) NOT NULL) WITH OIDS; +
CREATE INDEX BIGMA_INDEX ON public.BIGMA USING btree (string);
and 
CREATE TABLE public.DIRTY (string VARCHAR(128) NOT NULL) WITH OIDS; +
CREATE INDEX DIRTY_INDEX ON public.DIRTY USING btree (string);

What I am requested to do is to keep all records from 'BIGMA' that do
not appear in 'DIRTY'
So far I have tried solving this by going for:

[explain] select * from BIGMA where string not in (select * from DIRTY);
   QUERY PLAN

 Seq Scan on bigma  (cost=0.00..24582291.25 rows=500 width=145)
   Filter: (NOT (subplan))
   SubPlan
 -  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
(4 rows)

AND

[explain] select * from bigma,dirty where bigma.email!=dirty.email;
  QUERY PLAN
---
 Nested Loop  (cost=20.00..56382092.13 rows=2491443185 width=227)
   Join Filter: ((inner.email)::text <> (outer.email)::text)
   -  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
   -  Materialize  (cost=20.00..30.00 rows=1000 width=145)
 -  Seq Scan on bigma  (cost=0.00..20.00 rows=1000 width=145)
(5 rows)

Now the problem is that both of my previous tries seem to last
forever! I'm not a pgsql guru so that's why I'm asking you fellas to
guide me right! I've tried this on mysql previously but there seems to
be no way mysql can handle this large query.

QUESTIONS:
What can I do in order to make this work?
Where do I make mistakes? Is there a way I can improve the performance
in table design, query style, server setting so that I can get this
monster going and producing a result?

Thanks all for your precious time and answers!

Victor C.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Jan Dittmer
Neil Conway wrote:
 Also, I would imagine Win32 provides some means to inform the kernel 
 about your expected I/O pattern, but I haven't checked. Does anyone know 
 of any other relevant APIs?

See CreateFile, Parameter dwFlagsAndAttributes

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/base/createfile.asp

There is FILE_FLAG_NO_BUFFERING, FILE_FLAG_OPEN_NO_RECALL,
FILE_FLAG_RANDOM_ACCESS and even FILE_FLAG_POSIX_SEMANTICS

Jan


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Simple machine-killing query!

2004-10-21 Thread Stephan Szabo

On Thu, 21 Oct 2004, Victor Ciurus wrote:

 Hi all,

 I'm writing this because I've reached the limit of my imagination and
 patience! So here is it...

 2 tables:
 1 containing 27 million variable length, alpha-numeric records
 (strings) in 1 (one) field. (10 - 145 char length per record)
 1 containing 2.5 million variable length, alpha-numeric records
 (strings) in 1 (one) field.

 tables were created using:
 CREATE TABLE public.BIGMA (string VARCHAR(255) NOT NULL) WITH OIDS; +
 CREATE INDEX BIGMA_INDEX ON public.BIGMA USING btree (string);
 and
 CREATE TABLE public.DIRTY (string VARCHAR(128) NOT NULL) WITH OIDS; +
 CREATE INDEX DIRTY_INDEX ON public.DIRTY USING btree (string);

 What I am requested to do is to keep all records from 'BIGMA' that do
 not appear in 'DIRTY'
 So far I have tried solving this by going for:

 [explain] select * from BIGMA where string not in (select * from DIRTY);
QUERY PLAN
 
  Seq Scan on bigma  (cost=0.00..24582291.25 rows=500 width=145)
Filter: (NOT (subplan))
SubPlan
  -  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
 (4 rows)

Have you analyzed bigma? The row counts in the two explains for that
table look suspiciously like default values.

Also, what version are you using, because there are some differences from
7.3 to 7.4 that change possible suggestions.

The first is that on 7.4, you may be able to do better with a higher
sort_mem, which could possibly switch over to the hashed implementation,
although I think it's probably going to take a pretty high value given the
size.

The second is that you might get better results (even on older versions)
from an exists or left join solution, something like (assuming no nulls in
bigma.email):

select * from bigma where not exists(select 1 from dirty where dirty.email
= bigma.email);

select bigma.* from bigma left outer join dirty on (dirty.email =
bigma.email) where dirty.email is null;

If you've got nulls in bigma.email you have to be a little more careful.

 [explain] select * from bigma,dirty where bigma.email!=dirty.email;

This *almost* certainly does not do what you want.  For most data sets
this is going to give you a number of rows very close to # of rows in
dirty * # of rows in bigma.  Needless to say, this is going to take a long
time.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Simple machine-killing query!

2004-10-21 Thread Aaron Werman
Sounds like you need some way to match a subset of the data first,
rather than trying indices that are bigger than the data. Can you add
operation indices, perhaps on the first 10 bytes of the keys in both
tables or on an integer hash of all of the strings? If so you could
join on those operation indices first and compute the exact set
difference only over the rows that match.
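
A rough sketch of that idea, with illustrative index names (expression indexes
written like this need 7.4; on 7.3 the expression would have to be wrapped in an
immutable helper function, as in the substr thread from today):

CREATE INDEX bigma_prefix_idx ON bigma ((substr(string, 1, 10)));
CREATE INDEX dirty_prefix_idx ON dirty ((substr(string, 1, 10)));

SELECT b.*
FROM bigma b
LEFT JOIN dirty d
  ON substr(d.string, 1, 10) = substr(b.string, 1, 10)
 AND d.string = b.string
WHERE d.string IS NULL;

The intent is that the join works from the much smaller 10-byte prefixes, and the
full-string comparison only has to be evaluated for rows whose prefixes collide.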

/Aaron


On Thu, 21 Oct 2004 17:34:17 +0300, Victor Ciurus [EMAIL PROTECTED] wrote:
 Hi all,
 
 I'm writing this because I've reached the limit of my imagination and
 patience! So here is it...
 
 2 tables:
 1 containing 27 million variable length, alpha-numeric records
 (strings) in 1 (one) field. (10 - 145 char length per record)
 1 containing 2.5 million variable length, alpha-numeric records
 (strings) in 1 (one) field.
 
 tables were created using:
 CREATE TABLE public.BIGMA (string VARCHAR(255) NOT NULL) WITH OIDS; +
 CREATE INDEX BIGMA_INDEX ON public.BIGMA USING btree (string);
 and
 CREATE TABLE public.DIRTY (string VARCHAR(128) NOT NULL) WITH OIDS; +
 CREATE INDEX DIRTY_INDEX ON public.DIRTY USING btree (string);
 
 What I am requested to do is to keep all records from 'BIGMA' that do
 not appear in 'DIRTY'
 So far I have tried solving this by going for:
 
 [explain] select * from BIGMA where string not in (select * from DIRTY);
QUERY PLAN
 
  Seq Scan on bigma  (cost=0.00..24582291.25 rows=500 width=145)
Filter: (NOT (subplan))
SubPlan
  -  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
 (4 rows)
 
 AND
 
 [explain] select * from bigma,dirty where bigma.email!=dirty.email;
   QUERY PLAN
 ---
  Nested Loop  (cost=20.00..56382092.13 rows=2491443185 width=227)
    Join Filter: ((inner.email)::text <> (outer.email)::text)
-  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
-  Materialize  (cost=20.00..30.00 rows=1000 width=145)
  -  Seq Scan on bigma  (cost=0.00..20.00 rows=1000 width=145)
 (5 rows)
 
 Now the problem is that both of my previous tries seem to last
 forever! I'm not a pgsql guru so that's why I'm asking you fellas to
 guide me right! I've tried this on mysql previously but there seems to
 be no way mysql can handle this large query.
 
 QUESTIONS:
 What can I do in order to make this work?
 Where do I make mistakes? Is there a way I can improve the performance
 in table design, query style, server setting so that I can get this
 monster going and producing a result?
 
 Thanks all for your precious time and answers!
 
 Victor C.
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
 


-- 

Regards,
/Aaron

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Simple machine-killing query!

2004-10-21 Thread Tom Lane
Victor Ciurus [EMAIL PROTECTED] writes:
 What I am requested to do is to keep all records from 'BIGMA' that do
 not appear in 'DIRTY'
 So far I have tried solving this by going for:

 [explain] select * from BIGMA where string not in (select * from DIRTY);
QUERY PLAN
 
  Seq Scan on bigma  (cost=0.00..24582291.25 rows=500 width=145)
Filter: (NOT (subplan))
SubPlan
  -  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
 (4 rows)

If you are using PG 7.4, you can get reasonable performance out of this
approach, but you need to jack sort_mem up to the point where the whole
DIRTY table will fit into sort_mem (so that you get a hashed-subplan
plan and not a plain subplan).  If you find yourself setting sort_mem to
more than say half of your machine's available RAM, you should probably
forget that idea.
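
For illustration only (the sort_mem value is a placeholder, and the column name
follows the plans above rather than the CREATE TABLE statements), the
session-level version of that first approach would look something like:

SET sort_mem = 500000;   -- in kilobytes on 7.4, so roughly 500MB; size it to hold DIRTY
EXPLAIN SELECT * FROM bigma WHERE email NOT IN (SELECT email FROM dirty);

If the hashed implementation kicks in, the Filter line should read
"NOT (hashed subplan)" rather than the plain "NOT (subplan)" shown earlier.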

 [explain] select * from bigma,dirty where bigma.email!=dirty.email;

This of course does not give the right answer at all.

A trick that people sometimes use is an outer join:

select * from bigma left join dirty on (bigma.email=dirty.email)
where dirty.email is null;

Understanding why this works is left as an exercise for the reader
... but it does work, and pretty well too.  If you're using pre-7.4
PG then this is about the only effective solution AFAIR.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Steinar H. Gunderson
On Thu, Oct 21, 2004 at 10:20:55AM -0400, Tom Lane wrote:
 ... I have no idea how much you can improve over the best
 filesystems out there, but having two layers of journalling (both WAL _and_
 FS journalling) on top of each other doesn't make all that much sense to me.
 Which is why setting the FS to journal metadata but not file contents is
 often suggested as best practice for a PG-only filesystem.

Mm, but you still journal the metadata. Oh well, noatime etc.. :-)

By the way, I'm probably hitting a FAQ here, but would O_DIRECT help
PostgreSQL any, given large enough shared_buffers?

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Simple machine-killing query!

2004-10-21 Thread Josh Berkus
Victor,

 [explain] select * from BIGMA where string not in (select * from DIRTY);
                                QUERY PLAN
 
  Seq Scan on bigma  (cost=0.00..24582291.25 rows=500 width=145)
    Filter: (NOT (subplan))
    SubPlan
      -  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)

This is what you call an evil query.   I'm not surprised it takes forever; 
you're telling the database "Compare every value in 2.7 million rows of text 
against 2.5 million rows of text and give me those that don't match."   There 
is simply no way, on ANY RDBMS, for this query to execute and not eat all of 
your RAM and CPU for a long time.

You're simply going to have to allocate shared_buffers and sort_mem (about 2GB 
of sort_mem would be good) to the query, and turn the computer over to the 
task until it's done.   And, for the sake of sanity, when you find the 
200,000 rows that don't match, flag them so that you don't have to do this 
again.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Sean Chittenden
As someone else noted, this doesn't belong in the filesystem (rather 
the kernel's block I/O layer/buffer cache). But I agree, an API by 
which we can tell the kernel what kind of I/O behavior to expect would 
be good.
[snip]
The closest API to what you're describing that I'm aware of is 
posix_fadvise(). While that is technically-speaking a POSIX standard, 
it is not widely implemented (I know Linux 2.6 implements it; based on 
some quick googling, it looks like AIX does too).
Don't forget about the existence/usefulness/widely implemented 
madvise(2)/posix_madvise(2) call, which can give the OS the following 
hints: MADV_NORMAL, MADV_SEQUENTIAL, MADV_RANDOM, MADV_WILLNEED, 
MADV_DONTNEED, and MADV_FREE.  :)  -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[PERFORM] Links to OSDL test results up

2004-10-21 Thread Josh Berkus
Simon, Folks,

I've put links to all of my OSDL-STP test results up on the TestPerf project:
http://pgfoundry.org/forum/forum.php?thread_id=164&forum_id=160

Share & Enjoy!

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] mmap (was First set of OSDL Shared Mem scalability results, some wierdness ...

2004-10-21 Thread Sean Chittenden
However the really major difficulty with using mmap is that it breaks
the scheme we are currently using for WAL, because you don't have any
way to restrict how soon a change in an mmap'd page will go to disk.
(No, I don't believe that mlock guarantees this.  It says that the
page will not be removed from main memory; it does not specify that,
say, the syncer won't write the contents out anyway.)
I had to think about this for a minute (now nearly a week) and reread 
the docs on WAL before I grokked what could happen here.  You're 
absolutely right in that WAL needs to be taken into account first.  How 
does this execution path sound to you?

By default, all mmap(2)'ed pages are MAP_SHARED.  There are no 
complications with regards to reads.

When a backend wishes to write a page, the following steps are taken:
1) Backend grabs a lock from the lockmgr to write to the page (exactly 
as it does now)

2) Backend mmap(2)'s a second copy of the page(s) being written to, 
this time with the MAP_PRIVATE flag set.  Mapping a copy of the page 
again is wasteful in terms of address space, but does not require any 
more memory than our current scheme.  The re-mapping of the page with 
MAP_PRIVATE prevents changes to the data that other backends are 
viewing.

3) The writing backend can then scribble on its private copy of the 
page(s) as it sees fit.

4) Once it has completed making changes and a transaction is to be committed, 
the backend WAL logs its changes.

5) Once the WAL logging is complete and it has hit the disk, the 
backend msync(2)'s its private copy of the pages to disk (ASYNC or 
SYNC, it doesn't really matter too much to me).

6) Optional(?).  I'm not sure whether or not the backend would need to 
also issue an msync(2) MS_INVALIDATE, but I suspect it would not need 
to on systems with unified buffer caches such as FreeBSD or OS-X.  On 
HPUX, or other older *NIX'es, it may be necessary.  *shrug*  I could be 
trying to be overly protective here.

7) Backend munmap(2)'s its private copy of the written on page(s).
8) Backend releases its lock from the lockmgr.
At this point, the remaining backends now are able to see the updated 
pages of data.

Let's look at what happens with a read(2) call.  To read(2) data you
have to have a block of memory to copy data into.  Assume your OS of
choice has a good malloc(3) implementation and it only needs to call
brk(2) once to extend the process's memory address after the first
malloc(3) call.  There's your first system call, which guarantees one
context switch.
Wrong.  Our reads occur into shared memory allocated at postmaster
startup, remember?
Doh.  Fair enough.  In most programs that involve read(2), a call to 
alloc(3) needs to be made.

mmap(2) is a totally different animal in that you don't ever need to
make calls to read(2): mmap(2) is used in place of those calls (With
#ifdef and a good abstraction, the rest of PostgreSQL wouldn't know it
was working with a page of mmap(2)'ed data or need to know that it 
is).
Instead, you have to worry about address space management and keeping a
consistent view of the data.
Which is largely handled by mmap() and the VM.
... If a write(2) system call is issued on a page of
mmap(2)'ed data (and your operating system supports it, I know FreeBSD
does, but don't think Linux does), then the page of data is DMA'ed by
the network controller and sent out without the data needing to be
copied into the network controller's buffer.
Perfectly irrelevant to Postgres, since there is no situation where 
we'd
ever write directly from a disk buffer to a socket; in the present
implementation there are at least two levels of copy needed in between
(datatype-specific output function and protocol message assembly).  And
that's not even counting the fact that any data item large enough to
make the savings interesting would have been sliced, diced, and
compressed by TOAST.
The biggest winners will be columns whose storage type is PLAIN or 
EXTERNAL.  writev(2) from mmap(2)'ed pages and non-mmap(2)'ed pages 
would be a nice perk too (not sure if PostgreSQL uses this or not).  
Since compression isn't happening on most tuples under 1K in size and 
most tuples in a database are going to be under that, most tuples are 
going to be uncompressed.  Total pages for the database, however, is 
likely a different story.  For large tuples that are uncompressed and 
larger than a page, it is probably beneficial to use sendfile(2) 
instead of mmap(2) + write(2)'ing the page/file.

If a large tuple is compressed, it'd be interesting to see if it'd be 
worthwhile to have the data uncompressed onto an anonymously mmap(2)'ed 
page(s) that way the benefits of zero-socket-copies could be used.

shared mem is a bastardized subsystem that works, but isn't integral 
to
any performance areas in the kernel so it gets neglected.
What performance issues do you think shared memory needs to have fixed?
We don't issue any shmem kernel calls after the initial shmget, so
comparing the 

[PERFORM] Performance Anomalies in 7.4.5

2004-10-21 Thread Thomas F.O'Connell
I'm seeing some weird behavior on a repurposed server that was wiped  
clean and set up to run as a database and application server with  
postgres and Apache, as well as some command-line PHP scripts.

The box itself is a quad processor (2.4 GHz Intel Xeons) Debian woody  
GNU/Linux (2.6.2) system.

postgres is crawling on some fairly routine queries. I'm wondering if  
this could somehow be related to the fact that this isn't a  
database-only server, but Apache is not really using any resources when  
postgres slows to a crawl.

Here's an example of analysis of a recent query:
EXPLAIN ANALYZE SELECT COUNT(DISTINCT u.id)
FROM userdata as u, userdata_history as h
WHERE h.id = '18181'
AND h.id = u.id;

QUERY PLAN
 

 Aggregate  (cost=0.02..0.02 rows=1 width=8) (actual  
time=298321.421..298321.422 rows=1 loops=1)
   -  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual  
time=1.771..298305.531 rows=2452 loops=1)
 Join Filter: (inner.id = outer.id)
 -  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8)  
(actual time=0.026..11.869 rows=2452 loops=1)
 -  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1  
width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
   Filter: (id = 18181::bigint)
 Total runtime: 298321.926 ms
(7 rows)

userdata has a primary/foreign key on id, which references  
userdata_history.id, which is a primary key.

At the time of analysis, the userdata table had  2,500 rows.  
userdata_history had  50,000 rows. I can't imagine how even a seq scan  
could result in a runtime of nearly 5 minutes in these circumstances.

Also, doing a count( * ) from each table individually returns nearly  
instantly.

I can provide details of postgresql.conf and kernel settings if  
necessary, but I'm using some pretty well tested settings that I use  
any time I admin a postgres installation these days based on box  
resources and database size. I'm more interested in knowing if there  
are any bird's eye details I should be checking immediately.

Thanks.
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Performance Anomalies in 7.4.5

2004-10-21 Thread Thomas F . O'Connell
I know, I know: I should've done this before I posted. REINDEXing and  
VACUUMing mostly fixed this problem. Which gets me back to where I was  
yesterday, reviewing an import process (that existed previously) that  
populates tables in this system that seems to allow small data sets to  
cause simple queries like this to crawl. Is there anything about  
general COPY/INSERT activity that can cause small data sets to become  
so severely slow in postgres that can be prevented other than being  
diligent about VACUUMing? I was hoping that pg_autovacuum along with  
post-import manual VACUUMs would be sufficient, but it doesn't seem to  
be the case necessarily. Granted, I haven't done a methodical and  
complete review of the process, but I'm still surprised at how quickly  
it seems able to debilitate postgres with even small amounts of data. I  
had a similar situation crawl yesterday based on a series of COPYs  
involving 5 rows!

As in, can I look for something to treat the cause rather than the  
symptoms?

If not, should I be REINDEXing manually, as well as VACUUMing manually  
after large data imports (whether via COPY or INSERT)? Or will a VACUUM  
FULL ANALYZE be enough?

Thanks!
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Oct 21, 2004, at 3:36 PM, Thomas F.O'Connell wrote:
I'm seeing some weird behavior on a repurposed server that was wiped  
clean and set up to run as a database and application server with  
postgres and Apache, as well as some command-line PHP scripts.

The box itself is a quad processor (2.4 GHz Intel Xeons) Debian woody  
GNU/Linux (2.6.2) system.

postgres is crawling on some fairly routine queries. I'm wondering if  
this could somehow be related to the fact that this isn't a  
database-only server, but Apache is not really using any resources  
when postgres slows to a crawl.

Here's an example of analysis of a recent query:
EXPLAIN ANALYZE SELECT COUNT(DISTINCT u.id)
FROM userdata as u, userdata_history as h
WHERE h.id = '18181'
AND h.id = u.id;

QUERY PLAN
--- 
--- 
--
 Aggregate  (cost=0.02..0.02 rows=1 width=8) (actual  
time=298321.421..298321.422 rows=1 loops=1)
   -  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual  
time=1.771..298305.531 rows=2452 loops=1)
 Join Filter: (inner.id = outer.id)
 -  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8)  
(actual time=0.026..11.869 rows=2452 loops=1)
 -  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1  
width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
   Filter: (id = 18181::bigint)
 Total runtime: 298321.926 ms
(7 rows)

userdata has a primary/foreign key on id, which references  
userdata_history.id, which is a primary key.

At the time of analysis, the userdata table had  2,500 rows.  
userdata_history had  50,000 rows. I can't imagine how even a seq  
scan could result in a runtime of nearly 5 minutes in these  
circumstances.

Also, doing a count( * ) from each table individually returns nearly  
instantly.

I can provide details of postgresql.conf and kernel settings if  
necessary, but I'm using some pretty well tested settings that I use  
any time I admin a postgres installation these days based on box  
resources and database size. I'm more interested in knowing if there  
are any bird's eye details I should be checking immediately.

Thanks.
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
---(end of  
broadcast)---
TIP 8: explain analyze is your friend

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Performance Anomalies in 7.4.5

2004-10-21 Thread Thomas F.O'Connell
The irony is that I had just disabled pg_autovacuum the previous day 
during analysis of a wider issue affecting imports of data into the 
system.

-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Oct 21, 2004, at 4:05 PM, Dennis Bjorklund wrote:
On Thu, 21 Oct 2004, Thomas F.O'Connell wrote:
  Aggregate  (cost=0.02..0.02 rows=1 width=8) (actual
time=298321.421..298321.422 rows=1 loops=1)
-  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual
time=1.771..298305.531 rows=2452 loops=1)
  Join Filter: (inner.id = outer.id)
  -  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8)
(actual time=0.026..11.869 rows=2452 loops=1)
  -  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1
width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
Filter: (id = 18181::bigint)
It looks like you have not run ANALYZE recently. Most people run VACUUM
ANALYZE every night (or similar) in a cron job.
--
/Dennis Björklund
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Performance Anomalies in 7.4.5

2004-10-21 Thread Tom Lane
Thomas F.O'Connell [EMAIL PROTECTED] writes:
 -  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual  
 time=1.771..298305.531 rows=2452 loops=1)
   Join Filter: (inner.id = outer.id)
   -  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8)  
 (actual time=0.026..11.869 rows=2452 loops=1)
   -  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1  
 width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
 Filter: (id = 18181::bigint)
   Total runtime: 298321.926 ms
 (7 rows)

What's killing you here is that the planner thinks these tables are
completely empty (notice the zero cost estimates, which implies the
table has zero blocks --- the fact that the rows estimate is 1 and not 0
is the result of sanity-check clamping inside costsize.c).  This leads
it to choose a nestloop, which would be the best plan if there were only
a few rows involved, but it degenerates rapidly when there are not.
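
For reference, the estimates the planner is working from here can be inspected
directly (table name taken from this thread):

SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 'userdata_history';

If relpages and reltuples are still the zeroes left behind by an ANALYZE run while
the table was empty, zero-cost plans like the one above are exactly what you'd expect.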

It's easy to fall into this trap when truncating and reloading tables;
all you need is an analyze while the table is empty.  The rule of
thumb is to analyze just after you reload the table, not just before.

I'm getting more and more convinced that we need to drop the reltuples
and relpages entries in pg_class, in favor of checking the physical
table size whenever we make a plan.  We could derive the tuple count
estimate by having ANALYZE store a tuples-per-page estimate in pg_class
and then multiply by the current table size; tuples-per-page should be
a much more stable figure than total tuple count.

One drawback to this is that it would require an additional lseek per
table while planning, but that doesn't seem like a huge penalty.

Probably the most severe objection to doing things this way is that the
selected plan could change unexpectedly as a result of the physical
table size changing.  Right now the DBA can keep tight rein on actions
that might affect plan selection (ie, VACUUM and ANALYZE), but that
would go by the board with this.  OTOH, we seem to be moving towards
autovacuum, which also takes away any guarantees in this department.

In any case this is speculation for 8.1; I think it's too late for 8.0.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Does PostgreSQL run with Oracle?

2004-10-21 Thread Jim C. Nasby
On Fri, Oct 15, 2004 at 10:19:48AM -0700, Steve Atkins wrote:
 On Fri, Oct 15, 2004 at 11:54:44AM -0500, [EMAIL PROTECTED] wrote:
  My basic question to the community is: is PostgreSQL approximately as fast
  as Oracle?
 
  I'm currently running single processor UltraSPARC workstations, and intend
  to use Intel Arch laptops and Linux.  The application is a big turnkey
  workstation app.  I know the hardware switch alone will enhance
  performance, and may do so to the point where even a slower database will
  still be adequate.
 
 I have found that PostgreSQL seems to perform poorly on Solaris/SPARC
 (less so after recent improvements, but still...) compared to x86
 systems - more so than the delta between Oracle on the two platforms.
 Just a gut impression, but it might mean that comparing the two
 databases on SPARC may not be that useful comparison if you're
 planning to move to x86.
 
As a point of reference, an IBM hardware sales rep I worked with a few
years ago told me that he got a lot of sales from Oracle shops that were
running Sun and switched to RS/6000. Basically, for a given workload, it
would take 2x the number of Sun CPUs as RS/6000 CPUs. The difference in
Oracle licensing costs was usually enough to pay for the new hardware in
one year.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] OS desicion

2004-10-21 Thread Jim C. Nasby
On Wed, Oct 20, 2004 at 09:38:51AM -0700, Josh Berkus wrote:
 Tom,
 
  You are asking the wrong question.  The best OS is the OS you (and/or
  the customer)  knows and can administer competently.  
 
 I'll have to 2nd this.

I'll 3rd but add one tidbit: FreeBSD will schedule disk I/O based on
process priority, while linux won't. This can be very handy for things
like vacuum.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Jim C. Nasby
Note that most people are now moving away from raw devices for databases
in most applications. The relatively small performance gain isn't worth
the hassles.

On Thu, Oct 21, 2004 at 12:27:27PM +0200, Steinar H. Gunderson wrote:
 On Thu, Oct 21, 2004 at 08:58:01AM +0100, Matt Clark wrote:
  I suppose I'm just idly wondering really.  Clearly it's against PG
  philosophy to build an FS or direct IO management into PG, but now that it's so
  relatively easy to plug filesystems into the main open-source OSes, it
  struck me that there might be some useful changes to, say, XFS or ext3, that
  could be made that would help PG out.
 
 This really sounds like a poor replacement for just making PostgreSQL use raw
 devices to me. (I have no idea why that isn't done already, but presumably it
 isn't all that easy to get right. :-) )
 
 /* Steinar */
 -- 
 Homepage: http://www.sesse.net/
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
 

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Performance Anomalies in 7.4.5

2004-10-21 Thread Gary Doades
On 21 Oct 2004 at 15:50, Thomas F.O'Connell wrote:

 If not, should I be REINDEXing manually, as well as VACUUMing manually  
 after large data imports (whether via COPY or INSERT)? Or will a VACUUM  
 FULL ANALYZE be enough?
 

It's not the vacuuming that's important here, just the analyze. If you import any data 
into a table, Postgres often does not *know* that until you gather the statistics on 
the table. You are simply running into the problem of the planner not knowing how much 
data is in your tables, or how that data is distributed.

If you have large imports it may be faster overall to drop the indexes first, then 
insert the 
data, then put the indexes back on, then analyze.
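
A sketch of that sequence, assuming an ordinary (non-constraint) index and with a
made-up index name, column, and file path:

DROP INDEX userdata_history_foo_idx;
COPY userdata_history FROM '/path/to/import.dat';
CREATE INDEX userdata_history_foo_idx ON userdata_history (foo);
ANALYZE userdata_history;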

Cheers,
Gary.


---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Large Database Performance suggestions

2004-10-21 Thread Joshua Marsh
Hello everyone,

I am currently working on a data project that uses PostgreSQL
extensively to store, manage and maintain the data.  We haven't had
any problems regarding database size until recently.  The three major
tables we use never get bigger than 10 million records.  With this
size, we can do things like storing the indexes or even the tables in
memory to allow faster access.

Recently, we have found customers who are wanting to use our service
with data files between 100 million and 300 million records.  At that
size, each of the three major tables will hold between 150 million and
700 million records.  At this size, I can't expect it to run queries
in 10-15 seconds (what we can do with 10 million records), but would
prefer to keep them all under a minute.

We did some original testing with a server with 8GB of RAM and
found we can do operations on data files up to 50 million records fairly well,
but performance drops dramatically after that.  Does anyone have any
suggestions on a good way to improve performance for these extra large
tables?  Things that have come to mind are Replication and Beowulf
clusters, but from what I have recently studied, these don't do so well
with singular processes.  We will have parallel processes running, but
it's more important that the speed of each process be faster than
several parallel processes at once.

Any help would be greatly appreciated! 

Thanks,

Joshua Marsh

P.S. Off-topic, I have a few invitations to gmail.  If anyone would
like one, let me know.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Large Database Performance suggestions

2004-10-21 Thread Gavin Sherry
On Thu, 21 Oct 2004, Joshua Marsh wrote:

 Recently, we have found customers who are wanting to use our service
 with data files between 100 million and 300 million records.  At that
 size, each of the three major tables will hold between 150 million and
 700 million records.  At this size, I can't expect it to run queries
 in 10-15 seconds (what we can do with 10 million records), but would
 prefer to keep them all under a minute.

To provide any useful information, we'd need to look at your table schemas
and sample queries.

The values for sort_mem and shared_buffers will also be useful.

Are you VACUUMing and ANALYZEing? (or is the data read only?)
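
For reference, those settings (and the server version, which matters here) can be
reported straight from psql:

SHOW shared_buffers;
SHOW sort_mem;
SELECT version();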

gavin

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Large Database Performance suggestions

2004-10-21 Thread Tom Lane
Joshua Marsh [EMAIL PROTECTED] writes:
 ... We did some original testing with a server with 8GB of RAM and
 found we can do operations on data files up to 50 million records fairly well,
 but performance drops dramatically after that.

What you have to ask is *why* does it drop dramatically?  There aren't
any inherent limits in Postgres that are going to kick in at that level.
I'm suspicious that you could improve the situation by adjusting
sort_mem and/or other configuration parameters; but there's not enough
info here to make specific recommendations.  I would suggest posting
EXPLAIN ANALYZE results for your most important queries both in the size
range where you are getting good results, and the range where you are not.
Then we'd have something to chew on.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] mmap (was First set of OSDL Shared Mem scalability results, some wierdness ...

2004-10-21 Thread Tom Lane
Sean Chittenden [EMAIL PROTECTED] writes:
 When a backend wishes to write a page, the following steps are taken:
 ...
 2) Backend mmap(2)'s a second copy of the page(s) being written to, 
 this time with the MAP_PRIVATE flag set.
 ...
 5) Once the WAL logging is complete and it has hit the disk, the 
 backend msync(2)'s its private copy of the pages to disk (ASYNC or 
 SYNC, it doesn't really matter too much to me).

My man page for mmap says that changes in a MAP_PRIVATE region are
private; they do not affect the file at all, msync or no.  So I don't
think the above actually works.

In any case, this scheme still forces you to flush WAL records to disk
before making the changed page visible to other backends, so I don't
see how it improves the situation.  In the existing scheme we only have
to fsync WAL at (1) transaction commit, (2) when we are forced to write
a page out from shared buffers because we are short of buffers, or (3)
checkpoint.  Anything that implies an fsync per atomic action is going
to be a loser.  It does not matter how great your kernel API is if you
only get to perform one atomic action per disk rotation :-(

The important point here is that you can't postpone making changes at
the page level visible to other backends; there's no MVCC at this level.
Consider for example two backends wanting to insert a new row.  If they
both MAP_PRIVATE the same page, they'll probably choose the same tuple
slot on the page to insert into (certainly there is nothing to stop that
from happening).  Now you have conflicting definitions for the same
CTID, not to mention probably conflicting uses of the page's physical
free space; disaster ensues.  So "atomic action" really means "lock
page, make changes, add WAL record to in-memory WAL buffers, unlock
page" with the understanding that as soon as you unlock the page the
changes you've made in it are visible to all other backends.  You
*can't* afford to put a WAL fsync in this sequence.

You could possibly buy back most of the lossage in this scenario if
there were some efficient way for a backend to hold the low-level lock
on a page just until some other backend wanted to modify the page;
whereupon the previous owner would have to do what's needed to make his
changes visible before releasing the lock.  Given the right access
patterns you don't have to fsync very often (though given the wrong
access patterns you're still in deep trouble).  But we don't have any
such mechanism and I think the communication costs of one would be
forbidding.

 [ much snipped ]
 4) Not having shared pages get lost when the backend dies (mmap(2) uses 
 refcounts and cleans itself up, no need for ipcs/ipcrm/ipcclean).

Actually, that is not a bug, that's a feature.  One of the things that
scares me about mmap is that a crashing backend is able to scribble all
over live disk buffers before it finally SEGV's (think about memcpy gone
wrong and similar cases).  In our existing scheme there's a pretty good
chance that we will be able to commit hara-kiri before any of the
trashed data gets written out.  In an mmap scheme, it's time to dig out
your backup tapes, because there simply is no distinction between
transient and permanent data --- the kernel has no way to know that you
didn't mean it.

In short, I remain entirely unconvinced that mmap is of any interest to us.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly