Re: [PERFORM] create index with substr function
Tom Lane wrote:

  "Ray" [EMAIL PROTECTED] writes:
    CREATE INDEX idx_doc_substr_doc_urn ON doc USING btree (SUBSTR(doc_urn,10));
    ERROR: parser: parse error at or near "10" at character 68

  This will work in 7.4, but not older releases.

  regards, tom lane

Can't you just use a SQL function that calls the substr function? I have done that with date functions before, like:

  CREATE OR REPLACE FUNCTION get_month(text) RETURNS double precision AS '
    SELECT date_part(''month'', $1);
  ' LANGUAGE 'SQL' IMMUTABLE;

  CREATE INDEX get_month_idx ON foo (get_month(date_field));

Or in this case:

  CREATE OR REPLACE FUNCTION sub_text(text) RETURNS text AS '
    SELECT SUBSTR($1, 10);
  ' LANGUAGE 'SQL' IMMUTABLE;

  CREATE INDEX sub_text_idx ON foo (sub_text(doc_urn));

This works on 7.3.6, doesn't it?

Sincerely,

Joshua D. Drake

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
[PERFORM] Anything to be gained from a 'Postgres Filesystem'?
I suppose I'm just idly wondering, really. Clearly it's against PG philosophy to build an FS or direct I/O management into PG, but now that it's relatively easy to plug filesystems into the main open-source OSes, it struck me that there might be some useful changes to, say, XFS or ext3 that could be made that would help PG out. I'm thinking along the lines of an FS that's aware of PG's strategies and requirements, and therefore optimised to make those activities as efficient as possible - possibly even being aware of PG's disk layout and treating files differently on that basis. Not being an FS guru, I'm not really clear on whether this would help much (enough to be worth it, anyway) - any thoughts? And if there were useful gains to be had, would it need a whole new FS, or could an existing one be modified?

So there might be (as I said, I'm not an FS guru...):

* great append performance for the WAL?
* optimised scattered writes for checkpointing?
* knowledge that fsync is being used for preserving ordering a lot of the time, rather than requiring actual writes to disk (so long as the writes eventually happen in order...)?

Matt

Matt Clark
Ymogen Ltd
P: 0845 130 4531
W: https://ymogen.net/
M: 0774 870 1584
Re: [PERFORM] Free PostgreSQL Training, Philadelphia, Oct 30
Nobody got a plane to come from Europe? :-) As a poor Frenchie I will not come... Have a good time.

Alban

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Aaron Mulder
Sent: Wednesday, 20 October 2004 15:11
To: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Free PostgreSQL Training, Philadelphia, Oct 30

If anyone is going to take the train all the way, please e-mail me offline. There is a train station relatively close to the event (NY to Philly, then the R5 to Malvern), but it's not within walking distance, so we'll figure out some way to pick people up from there.

Thanks,
Aaron

On Wed, 20 Oct 2004, Aaron Werman wrote:

  I'm driving from Tenafly, NJ and going to both sessions. If you're able to get to the George Washington Bridge (A train to 178th Street [Port Authority North] and a bus over the bridge), I can drive you down. I'm not sure right now about the return, because I have confused plans to meet someone.

  /Aaron

  On Tue, 19 Oct 2004 14:43:29 -0400, Max Baker [EMAIL PROTECTED] wrote:

    On Wed, Oct 13, 2004 at 12:21:27PM -0400, Aaron Mulder wrote:

      All,
      My company (Chariot Solutions) is sponsoring a day of free PostgreSQL training by Bruce Momjian (one of the core PostgreSQL developers). The day is split into 2 sessions (plus a Q&A session):

      * Mastering PostgreSQL Administration
      * PostgreSQL Performance Tuning

      Registration is required, and space is limited. The location is Malvern, PA (a suburb of Philadelphia) and it's on Saturday, Oct 30. For more information or to register, see http://chariotsolutions.com/postgresql.jsp

    I'm up in New York City and would be taking the train down to Philly. Is anyone coming from Philly or New York who would be able to give me a lift to/from the train station? Sounds like a great event.
    Cheers,
    -m

--
Regards,
/Aaron
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
Hiya,

Looking at that list, I get the feeling that you'd want to push that PG-awareness down into the block-I/O layer as well, so as to be able to optimise for (perhaps) conflicting goals depending on what the app does. For the I/O system to be able to read the app's mind, it needs to have some knowledge of what the app is / needs / wants, and I get the impression that this awareness needs to go deeper than the FS only.

--Tim

(But you might have time to rewrite Linux/BSD as a PG-OS? Just kidding!)

-----Original Message-----
From: Matt Clark
Sent: Thursday, October 21, 2004 9:58 AM
Subject: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

  I suppose I'm just idly wondering really. Clearly it's against PG philosophy to build an FS or direct IO management into PG, but now it's so relatively easy to plug filesystems into the main open-source OSes, it struck me that there might be some useful changes to, say, XFS or ext3, that could be made that would help PG out.

[snip]
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
Reiser4?

On Thu, 21 Oct 2004 08:58:01 +0100, Matt Clark [EMAIL PROTECTED] wrote:

  I suppose I'm just idly wondering really. Clearly it's against PG philosophy to build an FS or direct IO management into PG, but now it's so relatively easy to plug filesystems into the main open-source OSes, it struck me that there might be some useful changes to, say, XFS or ext3, that could be made that would help PG out.

[snip]
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
On Thu, Oct 21, 2004 at 08:58:01AM +0100, Matt Clark wrote:

  I suppose I'm just idly wondering really. Clearly it's against PG philosophy to build an FS or direct IO management into PG, but now it's so relatively easy to plug filesystems into the main open-source OSes, it struck me that there might be some useful changes to, say, XFS or ext3, that could be made that would help PG out.

This really sounds like a poor replacement for just making PostgreSQL use raw devices to me. (I have no idea why that isn't done already, but presumably it isn't all that easy to get right. :-) )

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
Hi,

I guess the difference is in 'severe hacking inside PG' vs. 'some unknown amount of hacking that doesn't touch PG code'.

Hacking PG internally to handle raw devices will meet with strong resistance from large portions of the development team. I don't expect (m)any core devs of PG will be excited about rewriting the entire I/O architecture of PG and duplicating large amounts of OS-type code inside the application, just to try to attain an unknown performance benefit.

PG doesn't use one big file, as some databases do, but many small files. PG would need to be able to do its own file management if you put the database on a raw disk partition. That's icky stuff, and you'll find much resistance against putting such code inside PG.

So why not try to have the external FS know a bit about PG and its directory layout, and its I/O requirements? Then such code can at least be maintained outside the application, and will not be as much of a burden to the rest of it. (I'm not sure it's a good idea to create a PG-specific FS in your OS of choice, but it's certainly gonna be easier than getting FS code inside of PG.)

cheers,

--Tim

-----Original Message-----
From: Steinar H. Gunderson
Sent: Thursday, October 21, 2004 12:27 PM
Subject: Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

[snip]

This really sounds like a poor replacement for just making PostgreSQL use raw devices to me. (I have no idea why that isn't done already, but presumably it isn't all that easy to get right.
:-) )

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
The intuitive thing would be to put PG into a filesystem.

/Aaron

On Thu, 21 Oct 2004 12:44:10 +0200, Leeuw van der, Tim [EMAIL PROTECTED] wrote:

  Hi,

  I guess the difference is in 'severe hacking inside PG' vs. 'some unknown amount of hacking that doesn't touch PG code'.

[snip]
--
Regards,
/Aaron
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
Matt Clark wrote:

  I'm thinking along the lines of an FS that's aware of PG's strategies and requirements and therefore optimised to make those activities as efficient as possible - possibly even being aware of PG's disk layout and treating files differently on that basis.

As someone else noted, this doesn't belong in the filesystem (rather, the kernel's block I/O layer / buffer cache). But I agree: an API by which we can tell the kernel what kind of I/O behavior to expect would be good. The kernel needs to provide good behavior for a wide range of applications, but the DBMS can take advantage of a lot of domain-specific information. In theory, being able to pass that domain-specific information on to the kernel would mean we could get better performance without needing to reimplement large chunks of functionality that really ought to be done by the kernel anyway (as implementing raw I/O would require, for example). On the other hand, it would probably mean adding a fair bit of OS-specific hackery, which we've largely managed to avoid in the past.

The closest API to what you're describing that I'm aware of is posix_fadvise(). While that is technically speaking a POSIX standard, it is not widely implemented (I know Linux 2.6 implements it; based on some quick googling, it looks like AIX does too). Using posix_fadvise() has been discussed in the past, so you might want to search the archives.

We could use FADV_SEQUENTIAL to request more aggressive readahead on a file that we know we're about to sequentially scan. We might be able to use FADV_NOREUSE on the WAL. We might be able to get away with specifying FADV_RANDOM for indexes all of the time, or at least most of the time. One question is how this would interact with concurrent access (AFAICS there is no way to fetch the current advice on an fd...).

Also, I would imagine Win32 provides some means to inform the kernel about your expected I/O pattern, but I haven't checked. Does anyone know of any other relevant APIs?
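For illustration, posix_fadvise() can be exercised from a few lines of Python on Linux (os.posix_fadvise wraps the C call directly). The scratch file here is a made-up stand-in for a PG heap file, not anything PostgreSQL actually does:

```python
import os
import tempfile

# Create a scratch file standing in for a data file (purely illustrative).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * 8192)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # Hint: we are about to scan this file sequentially, so the kernel
    # may schedule more aggressive readahead.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    # Hint: index-like files tend to be accessed randomly instead.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    data = os.read(fd, 8192)
finally:
    os.close(fd)
    os.unlink(path)

print(len(data))  # 8192
```

Both calls are pure hints; the kernel is free to ignore them, which is also why there is no way to read the current advice back off the fd.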
-Neil
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
On Thu, Oct 21, 2004 at 12:44:10PM +0200, Leeuw van der, Tim wrote:

  Hacking PG internally to handle raw devices will meet with strong resistance from large portions of the development team. I don't expect (m)any core devs of PG will be excited about rewriting the entire I/O architecture of PG and duplicating large amounts of OS type of code inside the application, just to try to attain an unknown performance benefit.

Well, at least I see people claiming 30% differences between filesystems, but no, I'm not shouting "bah, you'd better do this or I'll warez Oracle" :-) I have no idea how much you can improve over the best filesystems out there, but having two layers of journalling (both WAL _and_ FS journalling) on top of each other doesn't make all that much sense to me. :-)

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
Steinar H. Gunderson [EMAIL PROTECTED] writes:

  ... I have no idea how much you can improve over the best filesystems out there, but having two layers of journalling (both WAL _and_ FS journalling) on top of each other doesn't make all that much sense to me.

Which is why setting the FS to journal metadata but not file contents is often suggested as best practice for a PG-only filesystem.

  regards, tom lane
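On ext3, metadata-only journalling is the data=writeback mount option. This is a sketch only; the device and mount point below are hypothetical:

```
# /etc/fstab fragment (hypothetical device and mount point).
# data=writeback journals metadata but not file contents;
# PG's own WAL is what protects the data-file contents.
/dev/sdb1  /var/lib/pgsql  ext3  noatime,data=writeback  0  2
```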
Re: [PERFORM] create index with substr function
As previously suggested by Stephan Szabo, you need to create a helper function, e.g.:

  create or replace function after9(text) returns text
  language plpgsql immutable as '
    begin
      return substr($1, 10);
    end;
  ';

The immutable specification is needed to allow the function's use in an index. Then use this function in the index creation:

  CREATE INDEX idx_doc_substr_doc_urn ON doc USING btree (after9(doc_urn));

I think that should do it.

-- George

On Thu, 21 Oct 2004 11:37:26 +0800, Ray [EMAIL PROTECTED] threw this fish to the penguins:

  Sorry, it doesn't work, as my postgres is 7.3, not 7.4. Is there any alternative solution for versions before 7.4?

  Thanks,
  Ray :)

  ----- Original Message -----
  From: Rosser Schwarz [EMAIL PROTECTED]
  To: Ray [EMAIL PROTECTED]
  Cc: [EMAIL PROTECTED]
  Sent: Thursday, October 21, 2004 11:34 AM
  Subject: Re: [PERFORM] create index with substr function

  While you weren't looking, Ray wrote:

    CREATE INDEX idx_doc_substr_doc_urn ON doc USING btree (SUBSTR(doc_urn,10));

  CREATE INDEX idx_doc_substr_doc_urn ON doc USING btree ((SUBSTR(doc_urn,10)));

  You need an additional set of parens around the SUBSTR() call.

  /rls

  --
  :wq

--
"Are the gods not just?" "Oh no, child. What would become of us if they were?" (CSL)
[PERFORM] Simple machine-killing query!
Hi all,

I'm writing this because I've reached the limit of my imagination and patience! So here it is...

2 tables:
* one containing 27 million variable-length, alphanumeric records (strings) in 1 (one) field (10-145 chars per record);
* one containing 2.5 million variable-length, alphanumeric records (strings) in 1 (one) field.

The tables were created using:

  CREATE TABLE public.BIGMA (string VARCHAR(255) NOT NULL) WITH OIDS;
  CREATE INDEX BIGMA_INDEX ON public.BIGMA USING btree (string);

and

  CREATE TABLE public.DIRTY (string VARCHAR(128) NOT NULL) WITH OIDS;
  CREATE INDEX DIRTY_INDEX ON public.DIRTY USING btree (string);

What I am requested to do is to keep all records from 'BIGMA' that do not appear in 'DIRTY'. So far I have tried solving this by going for:

  [explain] select * from BIGMA where string not in (select * from DIRTY);
                                QUERY PLAN
  --------------------------------------------------------------------------
   Seq Scan on bigma  (cost=0.00..24582291.25 rows=500 width=145)
     Filter: (NOT (subplan))
     SubPlan
       ->  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
  (4 rows)

AND

  [explain] select * from bigma,dirty where bigma.email!=dirty.email;
                                  QUERY PLAN
  ---------------------------------------------------------------------------
   Nested Loop  (cost=20.00..56382092.13 rows=2491443185 width=227)
     Join Filter: (("inner".email)::text <> ("outer".email)::text)
     ->  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
     ->  Materialize  (cost=20.00..30.00 rows=1000 width=145)
           ->  Seq Scan on bigma  (cost=0.00..20.00 rows=1000 width=145)
  (5 rows)

Now the problem is that both of my previous tries seem to last forever! I'm not a pgsql guru, so that's why I'm asking you fellas to guide me right! I've tried this on MySQL previously, but there seems to be no way MySQL can handle this large query.

QUESTIONS:
What can I do in order to make this work? Where am I making mistakes? Is there a way I can improve the performance in table design, query style, or server settings, so that I can get this monster going and producing a result?

Thanks all for your precious time and answers!

Victor C.
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
Neil Conway wrote:

  Also, I would imagine Win32 provides some means to inform the kernel about your expected I/O pattern, but I haven't checked. Does anyone know of any other relevant APIs?

See CreateFile, parameter dwFlagsAndAttributes:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/base/createfile.asp

There is FILE_FLAG_NO_BUFFERING, FILE_FLAG_OPEN_NO_RECALL, FILE_FLAG_RANDOM_ACCESS, and even FILE_FLAG_POSIX_SEMANTICS.

Jan
Re: [PERFORM] Simple machine-killing query!
On Thu, 21 Oct 2004, Victor Ciurus wrote:

  [snip]

  What I am requested to do is to keep all records from 'BIGMA' that do not appear in 'DIRTY'. So far I have tried solving this by going for:

    [explain] select * from BIGMA where string not in (select * from DIRTY);
     Seq Scan on bigma  (cost=0.00..24582291.25 rows=500 width=145)
       Filter: (NOT (subplan))
       SubPlan
         ->  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
    (4 rows)

Have you analyzed bigma? The number of rows from the two explains for that table look suspiciously like default values. Also, what version are you using? There are some differences from 7.3 to 7.4 that change possible suggestions.

The first is that on 7.4 you may be able to do better with a higher sort_mem, which could possibly switch over to the hashed implementation, although I think it's probably going to take a pretty high value given the size.

The second is that you might get better results (even on older versions) from an exists or left join solution, something like (assuming no nulls in bigma.email):

  select * from bigma where not exists
    (select 1 from dirty where dirty.email = bigma.email);

  select bigma.* from bigma left outer join dirty
    on (dirty.email = bigma.email) where dirty.email is null;

If you've got nulls in bigma.email you have to be a little more careful.
  [explain] select * from bigma,dirty where bigma.email!=dirty.email;

This *almost* certainly does not do what you want. For most data sets this is going to give you a number of rows very close to (# of rows in dirty) * (# of rows in bigma). Needless to say, this is going to take a long time.
Re: [PERFORM] Simple machine-killing query!
Sounds like you need some way to match on a subset of the data first, rather than trying indices that are bigger than the data. Can you add helper indices, perhaps on the first 10 bytes of the keys in both tables, or on an integer hash of all of the strings? If so, you could compute the exact set difference over only the rows that survive that cheaper match.

/Aaron

On Thu, 21 Oct 2004 17:34:17 +0300, Victor Ciurus [EMAIL PROTECTED] wrote:

  Hi all,

  I'm writing this because I've reached the limit of my imagination and patience! So here it is...

[snip]
--
Regards,
/Aaron
Re: [PERFORM] Simple machine-killing query!
Victor Ciurus [EMAIL PROTECTED] writes:

  What I am requested to do is to keep all records from 'BIGMA' that do not appear in 'DIRTY'. So far I have tried solving this by going for:

    [explain] select * from BIGMA where string not in (select * from DIRTY);
     Seq Scan on bigma  (cost=0.00..24582291.25 rows=500 width=145)
       Filter: (NOT (subplan))
       SubPlan
         ->  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
    (4 rows)

If you are using PG 7.4, you can get reasonable performance out of this approach, but you need to jack sort_mem up to the point where the whole DIRTY table will fit into sort_mem (so that you get a hashed-subplan plan and not a plain subplan). If you find yourself setting sort_mem to more than say half of your machine's available RAM, you should probably forget that idea.

  [explain] select * from bigma,dirty where bigma.email!=dirty.email;

This of course does not give the right answer at all. A trick that people sometimes use is an outer join:

  select * from bigma left join dirty on (bigma.email=dirty.email)
  where dirty.email is null;

Understanding why this works is left as an exercise for the reader ... but it does work, and pretty well too. If you're using pre-7.4 PG then this is about the only effective solution AFAIR.

  regards, tom lane
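In miniature, the outer-join trick computes an anti-join: every bigma row that finds no partner in dirty gets a NULL dirty.email from the left join, and the IS NULL filter keeps exactly those rows. A toy sketch (the values are made up, not the poster's data):

```python
# Toy stand-ins for the two tables (hypothetical values).
bigma = ["a@x", "b@x", "c@x", "d@x"]
dirty = {"b@x", "d@x"}

# Equivalent of:
#   SELECT * FROM bigma LEFT JOIN dirty ON (bigma.email = dirty.email)
#   WHERE dirty.email IS NULL;
# i.e. the set difference BIGMA \ DIRTY, row by row.
kept = [email for email in bigma if email not in dirty]
print(kept)  # ['a@x', 'c@x']
```

The hashed-subplan plan on 7.4 does essentially this: it builds an in-memory hash of dirty (hence the sort_mem requirement) and probes it once per bigma row.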
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
On Thu, Oct 21, 2004 at 10:20:55AM -0400, Tom Lane wrote:

    ... I have no idea how much you can improve over the best filesystems out there, but having two layers of journalling (both WAL _and_ FS journalling) on top of each other doesn't make all that much sense to me.

  Which is why setting the FS to journal metadata but not file contents is often suggested as best practice for a PG-only filesystem.

Mm, but you still journal the metadata. Oh well, noatime etc. :-)

By the way, I'm probably hitting a FAQ here, but would O_DIRECT help PostgreSQL any, given large enough shared_buffers?

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Simple machine-killing query!
Victor,

  [explain] select * from BIGMA where string not in (select * from DIRTY);
   Seq Scan on bigma  (cost=0.00..24582291.25 rows=500 width=145)
     Filter: (NOT (subplan))
     SubPlan
       ->  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)

This is what you call an evil query. I'm not surprised it takes forever; you're telling the database "compare every value in 27 million rows of text against 2.5 million rows of text and give me those that don't match." There is simply no way, on ANY RDBMS, for this query to execute and not eat all of your RAM and CPU for a long time. You're simply going to have to allocate shared_buffers and sort_mem (about 2GB of sort_mem would be good) to the query, and turn the computer over to the task until it's done.

And, for the sake of sanity, when you find the 200,000 rows that don't match, flag them so that you don't have to do this again.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
  As someone else noted, this doesn't belong in the filesystem (rather the kernel's block I/O layer/buffer cache). But I agree, an API by which we can tell the kernel what kind of I/O behavior to expect would be good.

  [snip]

  The closest API to what you're describing that I'm aware of is posix_fadvise(). While that is technically speaking a POSIX standard, it is not widely implemented (I know Linux 2.6 implements it; based on some quick googling, it looks like AIX does too).

Don't forget the existing, useful, and widely implemented madvise(2)/posix_madvise(2) call, which can give the OS the following hints: MADV_NORMAL, MADV_SEQUENTIAL, MADV_RANDOM, MADV_WILLNEED, MADV_DONTNEED, and MADV_FREE. :)

-sc

--
Sean Chittenden
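madvise(2) is the mmap-side twin of posix_fadvise(): the same hints, but applied to a mapped region rather than an fd. A small sketch on Linux (Python's mmap exposes madvise directly; the scratch file is purely illustrative):

```python
import mmap
import os
import tempfile

# Scratch file standing in for a mapped data file (illustrative only).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * mmap.PAGESIZE)
    path = f.name

fd = os.open(path, os.O_RDWR)
m = mmap.mmap(fd, mmap.PAGESIZE)

# Advise the kernel about the expected access pattern for this mapping.
m.madvise(mmap.MADV_SEQUENTIAL)   # bump readahead for a scan
m.madvise(mmap.MADV_RANDOM)       # or curb readahead for index-like access
first = m[0]

m.close()
os.close(fd)
os.unlink(path)
print(first)  # 0
```

As with fadvise, these are hints only; MADV_DONTNEED and MADV_FREE are the more aggressive ones, telling the kernel the pages can be dropped.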
[PERFORM] Links to OSDL test results up
Simon, folks,

I've put links to all of my OSDL-STP test results up on the TestPerf project:

http://pgfoundry.org/forum/forum.php?thread_id=164&forum_id=160

Share & Enjoy!

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] mmap (was First set of OSDL Shared Mem scalability results, some wierdness ...
However the really major difficulty with using mmap is that it breaks the scheme we are currently using for WAL, because you don't have any way to restrict how soon a change in an mmap'd page will go to disk. (No, I don't believe that mlock guarantees this. It says that the page will not be removed from main memory; it does not specify that, say, the syncer won't write the contents out anyway.) I had to think about this for a minute (now nearly a week) and reread the docs on WAL before I grokked what could happen here. You're absolutely right in that WAL needs to be taken into account first. How does this execution path sound to you? By default, all mmap(2)'ed pages are MAP_SHARED. There are no complications with regards to reads. When a backend wishes to write a page, the following steps are taken:

1) Backend grabs a lock from the lockmgr to write to the page (exactly as it does now).

2) Backend mmap(2)'s a second copy of the page(s) being written to, this time with the MAP_PRIVATE flag set. Mapping a copy of the page again is wasteful in terms of address space, but does not require any more memory than our current scheme. The re-mapping of the page with MAP_PRIVATE prevents changes to the data that other backends are viewing.

3) The writing backend can then scribble on its private copy of the page(s) as it sees fit.

4) Once it has completed making changes and a transaction is to be committed, the backend WAL logs its changes.

5) Once the WAL logging is complete and it has hit the disk, the backend msync(2)'s its private copy of the pages to disk (ASYNC or SYNC, it doesn't really matter too much to me).

6) Optional(?). I'm not sure whether or not the backend would need to also issue an msync(2) MS_INVALIDATE, but I suspect it would not need to on systems with unified buffer caches such as FreeBSD or OS X. On HPUX, or other older *NIXes, it may be necessary. *shrug* I could be trying to be overly protective here. 
7) Backend munmap(2)'s its private copy of the written-on page(s).

8) Backend releases its lock from the lockmgr. At this point, the remaining backends are now able to see the updated pages of data.

Let's look at what happens with a read(2) call. To read(2) data you have to have a block of memory to copy data into. Assume your OS of choice has a good malloc(3) implementation and it only needs to call brk(2) once to extend the process's address space after the first malloc(3) call. There's your first system call, which guarantees one context switch. Wrong. Our reads occur into shared memory allocated at postmaster startup, remember? Doh. Fair enough. In most programs that involve read(2), a call to malloc(3) needs to be made. mmap(2) is a totally different animal in that you don't ever need to make calls to read(2): mmap(2) is used in place of those calls (with #ifdef and a good abstraction, the rest of PostgreSQL wouldn't know it was working with a page of mmap(2)'ed data or need to know that it is). Instead, you have to worry about address space management and keeping a consistent view of the data. Which is largely handled by mmap() and the VM. ... If a write(2) system call is issued on a page of mmap(2)'ed data (and your operating system supports it; I know FreeBSD does, but don't think Linux does), then the page of data is DMA'ed by the network controller and sent out without the data needing to be copied into the network controller's buffer. Perfectly irrelevant to Postgres, since there is no situation where we'd ever write directly from a disk buffer to a socket; in the present implementation there are at least two levels of copy needed in between (datatype-specific output function and protocol message assembly). And that's not even counting the fact that any data item large enough to make the savings interesting would have been sliced, diced, and compressed by TOAST. The biggest winners will be columns whose storage type is PLAIN or EXTERNAL. 
writev(2) from mmap(2)'ed pages and non-mmap(2)'ed pages would be a nice perk too (not sure if PostgreSQL uses this or not). Since compression isn't happening on most tuples under 1K in size, and most tuples in a database are going to be under that, most tuples are going to be uncompressed. Total pages for the database, however, is likely a different story. For large tuples that are uncompressed and larger than a page, it is probably beneficial to use sendfile(2) instead of mmap(2) + write(2)'ing the page/file. If a large tuple is compressed, it'd be interesting to see whether it'd be worthwhile to have the data uncompressed onto anonymously mmap(2)'ed pages so that the benefits of zero-socket-copies could be used. shared mem is a bastardized subsystem that works, but isn't integral to any performance areas in the kernel so it gets neglected. What performance issues do you think shared memory needs to have fixed? We don't issue any shmem kernel calls after the initial shmget, so comparing the
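The heart of the proposal (steps 2-3 above) is that scribbles on a second, MAP_PRIVATE mapping are copy-on-write and therefore invisible to anyone reading through the shared mapping. That much can be demonstrated in a few lines; a sketch using Python's mmap module as a stand-in for the C calls (file name and page size here are arbitrary):

```python
import mmap
import os
import tempfile

# A small stand-in for one data "page" on disk.
fd, path = tempfile.mkstemp()
os.write(fd, b"A" * 4096)

# The shared mapping is what every other backend would see; the
# private mapping is the writer's copy-on-write scratch copy.
shared = mmap.mmap(fd, 4096, flags=mmap.MAP_SHARED)
private = mmap.mmap(fd, 4096, flags=mmap.MAP_PRIVATE)

private[0:4] = b"BBBB"  # copy-on-write: only this mapping changes

visible_to_others = shared[0:4]  # still the original bytes
writers_view = private[0:4]      # the scribbled bytes

shared.close()
private.close()
os.close(fd)
os.unlink(path)
```

This shows the isolation property the scheme relies on; whether the later msync step can actually publish those private pages is a separate question.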
[PERFORM] Performance Anomalies in 7.4.5
I'm seeing some weird behavior on a repurposed server that was wiped clean and set up to run as a database and application server with postgres and Apache, as well as some command-line PHP scripts. The box itself is a quad processor (2.4 GHz Intel Xeons) Debian woody GNU/Linux (2.6.2) system. postgres is crawling on some fairly routine queries. I'm wondering if this could somehow be related to the fact that this isn't a database-only server, but Apache is not really using any resources when postgres slows to a crawl. Here's an example of analysis of a recent query:

EXPLAIN ANALYZE SELECT COUNT(DISTINCT u.id) FROM userdata as u, userdata_history as h WHERE h.id = '18181' AND h.id = u.id;

QUERY PLAN
Aggregate  (cost=0.02..0.02 rows=1 width=8) (actual time=298321.421..298321.422 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual time=1.771..298305.531 rows=2452 loops=1)
        Join Filter: ("inner".id = "outer".id)
        ->  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8) (actual time=0.026..11.869 rows=2452 loops=1)
        ->  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1 width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
              Filter: (id = 18181::bigint)
Total runtime: 298321.926 ms
(7 rows)

userdata has a primary/foreign key on id, which references userdata_history.id, which is a primary key. At the time of analysis, the userdata table had 2,500 rows. userdata_history had 50,000 rows. I can't imagine how even a seq scan could result in a runtime of nearly 5 minutes in these circumstances. Also, doing a count(*) from each table individually returns nearly instantly. I can provide details of postgresql.conf and kernel settings if necessary, but I'm using some pretty well tested settings that I use any time I admin a postgres installation these days based on box resources and database size. I'm more interested in knowing if there are any bird's eye details I should be checking immediately. Thanks. -tfo -- Thomas F. O'Connell Co-Founder, Information Architect Sitening, LLC http://www.sitening.com/ 110 30th Avenue North, Suite 6 Nashville, TN 37203-6320 615-260-0005 ---(end of broadcast)--- TIP 8: explain analyze is your friend
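The five-minute runtime is easier to believe once the plan's numbers are multiplied out: the inner seq scan (41,631 rows, roughly 70 ms per pass) is repeated once per outer row, and there are 2,452 outer rows. A quick back-of-the-envelope check, using the figures from the EXPLAIN ANALYZE output above:

```python
# Nested loop cost: the inner scan is rerun once per outer row
# (loops=2452 in the plan), so per-pass costs multiply.
outer_rows = 2452    # userdata rows surviving the outer scan
inner_rows = 41631   # userdata_history rows scanned on each pass
inner_ms = 70.519    # actual time of one inner pass, from the plan

comparisons = outer_rows * inner_rows     # ~102 million row comparisons
total_s = outer_rows * inner_ms / 1000.0  # ~173 s on inner scans alone
```

So the inner scans alone account for roughly three of the five minutes, before the join filter and aggregation overhead; a hash or merge join over 2,500 x 50,000 rows would be trivial by comparison.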
Re: [PERFORM] Performance Anomalies in 7.4.5
I know, I know: I should've done this before I posted. REINDEXing and VACUUMing mostly fixed this problem. Which gets me back to where I was yesterday, reviewing an import process (that existed previously) that populates tables in this system that seems to allow small data sets to cause simple queries like this to crawl. Is there anything about general COPY/INSERT activity that can cause small data sets to become so severely slow in postgres that can be prevented other than being diligent about VACUUMing? I was hoping that pg_autovacuum along with post-import manual VACUUMs would be sufficient, but it doesn't seem to be the case necessarily. Granted, I haven't done a methodical and complete review of the process, but I'm still surprised at how quickly it seems able to debilitate postgres with even small amounts of data. I had a similar situation crawl yesterday based on a series of COPYs involving 5 rows! As in, can I look for something to treat the cause rather than the symptoms? If not, should I be REINDEXing manually, as well as VACUUMing manually after large data imports (whether via COPY or INSERT)? Or will a VACUUM FULL ANALYZE be enough? Thanks! -tfo -- Thomas F. O'Connell Co-Founder, Information Architect Sitening, LLC http://www.sitening.com/ 110 30th Avenue North, Suite 6 Nashville, TN 37203-6320 615-260-0005 On Oct 21, 2004, at 3:36 PM, Thomas F.O'Connell wrote: [snip] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Performance Anomalies in 7.4.5
The irony is that I had just disabled pg_autovacuum the previous day during analysis of a wider issue affecting imports of data into the system. -tfo -- Thomas F. O'Connell Co-Founder, Information Architect Sitening, LLC http://www.sitening.com/ 110 30th Avenue North, Suite 6 Nashville, TN 37203-6320 615-260-0005 On Oct 21, 2004, at 4:05 PM, Dennis Bjorklund wrote: On Thu, 21 Oct 2004, Thomas F.O'Connell wrote:

Aggregate  (cost=0.02..0.02 rows=1 width=8) (actual time=298321.421..298321.422 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual time=1.771..298305.531 rows=2452 loops=1)
        Join Filter: ("inner".id = "outer".id)
        ->  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8) (actual time=0.026..11.869 rows=2452 loops=1)
        ->  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1 width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
              Filter: (id = 18181::bigint)

It looks like you have not run ANALYZE recently. Most people run VACUUM ANALYZE every night (or similar) in a cron job. -- /Dennis Björklund ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance Anomalies in 7.4.5
Thomas F.O'Connell [EMAIL PROTECTED] writes:

  ->  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual time=1.771..298305.531 rows=2452 loops=1)
        Join Filter: ("inner".id = "outer".id)
        ->  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8) (actual time=0.026..11.869 rows=2452 loops=1)
        ->  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1 width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
              Filter: (id = 18181::bigint)
Total runtime: 298321.926 ms
(7 rows)

What's killing you here is that the planner thinks these tables are completely empty (notice the zero cost estimates, which implies the table has zero blocks --- the fact that the rows estimate is 1 and not 0 is the result of sanity-check clamping inside costsize.c). This leads it to choose a nestloop, which would be the best plan if there were only a few rows involved, but it degenerates rapidly when there are not. It's easy to fall into this trap when truncating and reloading tables; all you need is an analyze while the table is empty. The rule of thumb is to analyze just after you reload the table, not just before. I'm getting more and more convinced that we need to drop the reltuples and relpages entries in pg_class, in favor of checking the physical table size whenever we make a plan. We could derive the tuple count estimate by having ANALYZE store a tuples-per-page estimate in pg_class and then multiply by the current table size; tuples-per-page should be a much more stable figure than total tuple count. One drawback to this is that it would require an additional lseek per table while planning, but that doesn't seem like a huge penalty. Probably the most severe objection to doing things this way is that the selected plan could change unexpectedly as a result of the physical table size changing. Right now the DBA can keep tight rein on actions that might affect plan selection (ie, VACUUM and ANALYZE), but that would go by the board with this. 
OTOH, we seem to be moving towards autovacuum, which also takes away any guarantees in this department. In any case this is speculation for 8.1; I think it's too late for 8.0. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
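The tuples-per-page idea above is simple arithmetic: ANALYZE records a density figure, and planning multiplies it by the table's current physical size. A sketch with made-up numbers (the pg_class columns mentioned are real; this function is purely illustrative):

```python
# Estimate the row count at plan time from a stored density figure,
# instead of trusting a total that was frozen at the last ANALYZE.
# Tuples-per-page stays roughly stable even while the table grows.
def estimated_rows(tuples_per_page, current_relpages):
    return tuples_per_page * current_relpages

# Suppose ANALYZE ran when the table held 120 rows per page...
density = 120.0
# ...and the table has since grown to 5000 pages on disk.
estimate = estimated_rows(density, 5000)

# Compare: a stale reltuples from an ANALYZE at 100 pages would
# still claim 12000 rows, off by a factor of 50.
stale_reltuples = 120.0 * 100
```

The estimate tracks the table's real growth for the cost of one lseek to check the current page count, which is exactly the trade-off described above.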
Re: [PERFORM] Does PostgreSQL run with Oracle?
On Fri, Oct 15, 2004 at 10:19:48AM -0700, Steve Atkins wrote: On Fri, Oct 15, 2004 at 11:54:44AM -0500, [EMAIL PROTECTED] wrote: My basic question to the community is is PostgreSQL approximately as fast as Oracle? I'm currently running single processor UltraSPARC workstations, and intend to use Intel Arch laptops and Linux. The application is a big turnkey workstation app. I know the hardware switch alone will enhance performance, and may do so to the point where even a slower database will still be adequate. I have found that PostgreSQL seems to perform poorly on Solaris/SPARC (less so after recent improvements, but still...) compared to x86 systems - more so than the delta between Oracle on the two platforms. Just a gut impression, but it might mean that comparing the two databases on SPARC may not be that useful comparison if you're planning to move to x86. As a point of reference, an IBM hardware sales rep I worked with a few years ago told me that he got a lot of sales from Oracle shops that were running Sun and switched to RS/6000. Basically, for a given workload, it would take 2x the number of Sun CPUs as RS/6000 CPUs. The difference in Oracle licensing costs was usually enough to pay for the new hardware in one year. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] OS desicion
On Wed, Oct 20, 2004 at 09:38:51AM -0700, Josh Berkus wrote: Tom, You are asking the wrong question. The best OS is the OS you (and/or the customer) knows and can administer competently. I'll have to 2nd this. I'll 3rd but add one tidbit: FreeBSD will schedule disk I/O based on process priority, while linux won't. This can be very handy for things like vacuum. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
Note that most people are now moving away from raw devices for databases in most applications. The relatively small performance gain isn't worth the hassles. On Thu, Oct 21, 2004 at 12:27:27PM +0200, Steinar H. Gunderson wrote: On Thu, Oct 21, 2004 at 08:58:01AM +0100, Matt Clark wrote: I suppose I'm just idly wondering really. Clearly it's against PG philosophy to build an FS or direct IO management into PG, but now that it's so relatively easy to plug filesystems into the main open-source OSes, it struck me that there might be some useful changes to, say, XFS or ext3, that could be made that would help PG out. This really sounds like a poor replacement for just making PostgreSQL use raw devices to me. (I have no idea why that isn't done already, but presumably it isn't all that easy to get right. :-) ) /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Performance Anomalies in 7.4.5
On 21 Oct 2004 at 15:50, Thomas F.O'Connell wrote: If not, should I be REINDEXing manually, as well as VACUUMing manually after large data imports (whether via COPY or INSERT)? Or will a VACUUM FULL ANALYZE be enough? It's not the vacuuming that's important here, just the analyze. If you import any data into a table, Postgres often does not *know* that until you gather the statistics on the table. You are simply running into the problem of the planner not knowing how much data is in your tables, or how that data is distributed. If you have large imports it may be faster overall to drop the indexes first, then insert the data, then put the indexes back on, then analyze. Cheers, Gary. ---(end of broadcast)--- TIP 8: explain analyze is your friend
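Gary's recipe is really about ordering: the statistics (and the index builds) should follow the load, not precede it. As a hypothetical sketch, a small helper that emits the statements in the recommended order (the table, index, and path names are made up, and nothing here talks to a real database):

```python
# Emit bulk-load steps in the order suggested above: drop indexes,
# load the data, rebuild the indexes, then ANALYZE last so the
# planner's statistics reflect the newly loaded table.
def bulk_load_plan(table, indexes):
    steps = [f"DROP INDEX {ix}" for ix in indexes]
    steps.append(f"COPY {table} FROM '/path/to/data'")
    steps += [f"CREATE INDEX {ix} ON {table} (...)" for ix in indexes]
    steps.append(f"ANALYZE {table}")
    return steps

plan = bulk_load_plan("foo", ["foo_a_idx"])
```

Putting ANALYZE anywhere earlier in this list reintroduces the empty-table trap Tom describes elsewhere in the thread: statistics gathered before (or during) the load describe a table that no longer exists.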
[PERFORM] Large Database Performance suggestions
Hello everyone, I am currently working on a data project that uses PostgreSQL extensively to store, manage and maintain the data. We haven't had any problems regarding database size until recently. The three major tables we use never get bigger than 10 million records. With this size, we can do things like storing the indexes or even the tables in memory to allow faster access. Recently, we have found customers who are wanting to use our service with data files between 100 million and 300 million records. At that size, each of the three major tables will hold between 150 million and 700 million records. At this size, I can't expect it to run queries in 10-15 seconds (what we can do with 10 million records), but would prefer to keep them all under a minute. We did some original testing with a server with 8GB of RAM and found we can do operations on data files up to 50 million records fairly well, but performance drops dramatically after that. Does anyone have any suggestions on a good way to improve performance for these extra large tables? Things that have come to mind are replication and Beowulf clusters, but from what I have recently studied, these don't do so well with singular processes. We will have parallel processes running, but it's more important that the speed of each process be faster than several parallel processes at once. Any help would be greatly appreciated! Thanks, Joshua Marsh P.S. Off-topic, I have a few invitations to gmail. If anyone would like one, let me know. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Large Database Performance suggestions
On Thu, 21 Oct 2004, Joshua Marsh wrote: Recently, we have found customers who are wanting to use our service with data files between 100 million and 300 million records. At that size, each of the three major tables will hold between 150 million and 700 million records. At this size, I can't expect it to run queries in 10-15 seconds (what we can do with 10 million records), but would prefer to keep them all under a minute. To provide any useful information, we'd need to look at your table schemas and sample queries. The values for sort_mem and shared_buffers will also be useful. Are you VACUUMing and ANALYZEing? (Or is the data read-only?) gavin ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Large Database Performance suggestions
Joshua Marsh [EMAIL PROTECTED] writes: ... We did some original testing with a server with 8GB of RAM and found we can do operations on data files up to 50 million records fairly well, but performance drops dramatically after that. What you have to ask is *why* does it drop dramatically? There aren't any inherent limits in Postgres that are going to kick in at that level. I'm suspicious that you could improve the situation by adjusting sort_mem and/or other configuration parameters; but there's not enough info here to make specific recommendations. I would suggest posting EXPLAIN ANALYZE results for your most important queries, both in the size range where you are getting good results and the range where you are not. Then we'd have something to chew on. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] mmap (was First set of OSDL Shared Mem scalability results, some wierdness ...
Sean Chittenden [EMAIL PROTECTED] writes: When a backend wishes to write a page, the following steps are taken: ... 2) Backend mmap(2)'s a second copy of the page(s) being written to, this time with the MAP_PRIVATE flag set. ... 5) Once the WAL logging is complete and it has hit the disk, the backend msync(2)'s its private copy of the pages to disk (ASYNC or SYNC, it doesn't really matter too much to me). My man page for mmap says that changes in a MAP_PRIVATE region are private; they do not affect the file at all, msync or no. So I don't think the above actually works. In any case, this scheme still forces you to flush WAL records to disk before making the changed page visible to other backends, so I don't see how it improves the situation. In the existing scheme we only have to fsync WAL at (1) transaction commit, (2) when we are forced to write a page out from shared buffers because we are short of buffers, or (3) checkpoint. Anything that implies an fsync per atomic action is going to be a loser. It does not matter how great your kernel API is if you only get to perform one atomic action per disk rotation :-( The important point here is that you can't postpone making changes at the page level visible to other backends; there's no MVCC at this level. Consider for example two backends wanting to insert a new row. If they both MAP_PRIVATE the same page, they'll probably choose the same tuple slot on the page to insert into (certainly there is nothing to stop that from happening). Now you have conflicting definitions for the same CTID, not to mention probably conflicting uses of the page's physical free space; disaster ensues. So atomic action really means lock page, make changes, add WAL record to in-memory WAL buffers, unlock page with the understanding that as soon as you unlock the page the changes you've made in it are visible to all other backends. You *can't* afford to put a WAL fsync in this sequence. 
You could possibly buy back most of the lossage in this scenario if there were some efficient way for a backend to hold the low-level lock on a page just until some other backend wanted to modify the page; whereupon the previous owner would have to do what's needed to make his changes visible before releasing the lock. Given the right access patterns you don't have to fsync very often (though given the wrong access patterns you're still in deep trouble). But we don't have any such mechanism and I think the communication costs of one would be forbidding. [ much snipped ] 4) Not having shared pages get lost when the backend dies (mmap(2) uses refcounts and cleans itself up, no need for ipcs/ipcrm/ipcclean). Actually, that is not a bug that's a feature. One of the things that scares me about mmap is that a crashing backend is able to scribble all over live disk buffers before it finally SEGV's (think about memcpy gone wrong and similar cases). In our existing scheme there's a pretty good chance that we will be able to commit hara-kiri before any of the trashed data gets written out. In an mmap scheme, it's time to dig out your backup tapes, because there simply is no distinction between transient and permanent data --- the kernel has no way to know that you didn't mean it. In short, I remain entirely unconvinced that mmap is of any interest to us. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
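Tom's first objection, that changes in a MAP_PRIVATE region never reach the file regardless of msync, is easy to check empirically. A sketch using Python's mmap module as a stand-in for the C calls (behavior checked against Linux; the flush/msync outcome on a private mapping may differ by kernel, so it is wrapped defensively):

```python
import mmap
import os
import tempfile

# A stand-in "data page" on disk.
fd, path = tempfile.mkstemp()
os.write(fd, b"A" * 4096)

# Scribble on a private (copy-on-write) mapping, then try to sync it.
m = mmap.mmap(fd, 4096, flags=mmap.MAP_PRIVATE)
m[0:4] = b"BBBB"
try:
    m.flush()  # msync(2); may succeed or error depending on the kernel,
except (OSError, ValueError):  # but either way the COW pages are
    pass                       # anonymous copies, not file pages
m.close()
os.close(fd)

# The file on disk is untouched: the private changes went nowhere.
with open(path, "rb") as f:
    on_disk = f.read(4)
os.unlink(path)
```

This is why step 5 of the proposed scheme cannot work as stated: the private copy is disconnected from the file the moment it is written to, so some other mechanism would still be needed to publish the change.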