Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Scott Marlowe
On Thu, Jan 7, 2010 at 11:14 PM, Carlo Stonebanks wrote: >> It might well be checkpoints. Have you tried cranking up checkpoint segments to something like 100 or more and seeing how it behaves then? > No I haven't, although it certainly makes sense - watching the process run, you get this se

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Carlo Stonebanks
Already done in an earlier post, Kevin - I have included it again below. As you can see, it's pretty well what you would expect, index scan plus a filter. One note: updates where no rows qualify run appreciably faster than the ones that do. That is, the update itself appears to be consuming a

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Carlo Stonebanks
It might well be checkpoints. Have you tried cranking up checkpoint segments to something like 100 or more and seeing how it behaves then? No I haven't, although it certainly makes sense - watching the process run, you get this sense that the system occasionally pauses to take a deep, long bre
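For reference, a sketch of the kind of postgresql.conf change being suggested here; only the checkpoint_segments = 100 figure is from the thread, the companion settings are illustrative:

    # postgresql.conf (8.3/8.4 era)
    checkpoint_segments = 100            # from the thread; default was 3, so checkpoints fire far less often
    checkpoint_completion_target = 0.9   # illustrative: spread checkpoint I/O over more of the interval
    checkpoint_timeout = 15min           # illustrative: upper bound on time between checkpoints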

[PERFORM] Array comparison

2010-01-07 Thread ramasubramanian
Hi all, I want to compare two arrays in SQL. The example below is wrong, but I want to know how to do that in postgres 8.2.5: SELECT 'x' WHERE string_to_array('the,quick,ram,fox', ',') any (string_to_array('the,quick,lazy ,fox', ',')) Regards, Ram
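A sketch of comparisons that do work (the array operators && and @> are available in 8.2; these table-free SELECTs are plain psql experiments):

    -- overlap: true if the two arrays share at least one element
    SELECT 'x' WHERE string_to_array('the,quick,ram,fox', ',')
                  && string_to_array('the,quick,lazy,fox', ',');

    -- exact equality: arrays compare element by element
    SELECT string_to_array('a,b,c', ',') = string_to_array('a,b,c', ',');  -- true

    -- test one value against an array
    SELECT 'x' WHERE 'ram' = ANY (string_to_array('the,quick,lazy,fox', ','));  -- returns no row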

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Craig Ringer
On 8/01/2010 2:11 AM, Nikolas Everett wrote: This table is totally unnormalized. Normalize it and try again. You'll probably see a huge speedup. Maybe even 10x. My mantra has always been less data stored means less data to scan means faster scans. Sometimes one intentionally denormalizes st

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Tom Lane
Craig Ringer writes: > Can Pg even read partial records? I thought it all operated on a page level, where if an index indicates that a particular value is present on a page the whole page gets read in and all records on the page are checked for the value of interest. No? The whole page

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Craig Ringer
On 7/01/2010 11:45 PM, Gurgel, Flavio wrote: The table is very wide, which is probably why the tested databases can deal with it faster than PG. You could try and narrow the table down (for instance: remove the Div* fields) to make the data more "relational-like". In real life, speedups in t

Re: [PERFORM] "large" spam tables and performance: postgres memory parameters

2010-01-07 Thread Stephen Frost
* Gary Warner (g...@cis.uab.edu) wrote: > - if you have 8 pentium cores, 12GB of RAM and "infinite" diskspace, what > sorts of memory settings would you have in your start up tables? If the PG database is the only thing on the system, I'd probably go with something like: shared_buffers = 4GB te
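A sketch of what the full set of settings might look like for a dedicated 8-core/12GB box; only shared_buffers = 4GB survives in this excerpt, the remaining values are assumptions in the same spirit:

    # postgresql.conf -- illustrative values, not from the post
    shared_buffers = 4GB           # from the post: roughly 1/3 of RAM on a dedicated server
    effective_cache_size = 8GB     # assumption: what the OS page cache will roughly hold
    work_mem = 32MB                # assumption: per sort/hash node, so keep modest with many connections
    maintenance_work_mem = 512MB   # assumption: speeds VACUUM and CREATE INDEX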

Re: [PERFORM] "large" spam tables and performance: postgres memory parameters

2010-01-07 Thread Scott Marlowe
On Thu, Jan 7, 2010 at 8:23 AM, Gary Warner wrote: > Hello, > > I've been lurking on this list a couple weeks now, and have asked some "side > questions" to some of the list members, who have been gracious, and helpful, > and encouraged me to just dive in and participate on the list. > > I'll no

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Scott Marlowe
On Thu, Jan 7, 2010 at 2:48 PM, Carlo Stonebanks wrote: > Doing the updates in smaller chunks resolved these apparent freezes - or, more specifically, when the application DID freeze, it didn't do it for more than 30 seconds. In all likelihood, this is the OS and the DB thrashing. It might we

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Greg Smith
hubert depesz lubaczewski wrote: Well, this query basically has to be slow. Correct approach to this problem is to add precalculated aggregates... The point of this data set and associated queries is to see how fast the database can do certain types of queries on its own. Some other types of

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Kevin Grittner
"Carlo Stonebanks" wrote: > An interesting idea, if I can confirm that the performance problem > is because of the WHERE clause, not the UPDATE. If you could show EXPLAIN ANALYZE output for one iteration, with related queries and maybe more info on the environment, it would take most of the gu

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread hubert depesz lubaczewski
On Thu, Jan 07, 2010 at 01:38:41PM +0100, Lefteris wrote: > airtraffic=# EXPLAIN ANALYZE SELECT "DayOfWeek", count(*) AS c FROM > ontime WHERE "Year" BETWEEN 2000 AND 2008 GROUP BY "DayOfWeek" ORDER > BY c DESC; Well, this query basically has to be slow. Correct approach to this problem is to add
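A sketch of the precalculated-aggregate approach depesz means; the summary table and its names are made up for illustration, and keeping it current (triggers or periodic rebuilds) is omitted:

    -- one row per (Year, DayOfWeek) instead of ~52M detail rows
    CREATE TABLE ontime_daily_counts (
        "Year"      int    NOT NULL,
        "DayOfWeek" int    NOT NULL,
        flights     bigint NOT NULL,
        PRIMARY KEY ("Year", "DayOfWeek")
    );

    INSERT INTO ontime_daily_counts
    SELECT "Year", "DayOfWeek", count(*) FROM ontime GROUP BY 1, 2;

    -- the benchmark query now touches a handful of rows:
    SELECT "DayOfWeek", sum(flights) AS c
      FROM ontime_daily_counts
     WHERE "Year" BETWEEN 2000 AND 2008
     GROUP BY "DayOfWeek"
     ORDER BY c DESC;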

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Lefteris
On Thu, Jan 7, 2010 at 11:57 PM, Lefteris wrote: > Hi Greg, > > thank you for your help. The changes I did on the dataset was just > removing the last comma from the CSV files as it was interpreted by pg > as an extra column. The schema I used, the load script and queries can > be found at: > > ht

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Lefteris
Hi Greg, thank you for your help. The changes I did on the dataset was just removing the last comma from the CSV files as it was interpreted by pg as an extra column. The schema I used, the load script and queries can be found at: http://homepages.cwi.nl/~lsidir/postgres/ (I understood that if I

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Greg Smith
Lefteris wrote: So we all agree that the problem is on the scans:) So the next question is why changing shared memory buffers will fix that? I only have one session with one connection, do I have like many reader workers or something? I wouldn't expect it to. Large sequential scans like th

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Carlo Stonebanks
What is the rationale behind this? How about doing 10k rows in 1 update, and committing every time? When we did 10K updates, the application would sometimes appear to have frozen, and we were concerned that there was a deadlock condition because of the number of locked rows. While we may have

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread marcin mank
> every update is a UPDATE ... WHERE id >= x AND id < x+10 and a commit is performed after every 1000 update statements, i.e. every 10000 rows. What is the rationale behind this? How about doing 10k rows in 1 update, and committing every time? You could try making the condition on the ctid col
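A sketch of the batching pattern under discussion - 10k rows per statement, committed immediately (the ctid-based variant marcin starts to describe is cut off in this excerpt; the ids below are placeholders):

    -- one batch; a driving script advances the range and repeats
    BEGIN;
    UPDATE mdx_core.audit_impt
       SET source_table = 'mdx_import.' || impt_name
     WHERE audit_impt_id >= 1000000              -- placeholder batch start
       AND audit_impt_id <  1000000 + 10000      -- 10k rows per statement
       AND coalesce(source_table, '') = '';
    COMMIT;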

Re: [PERFORM] noob inheritance question

2010-01-07 Thread Robert Haas
On Wed, Jan 6, 2010 at 6:53 PM, Zintrigue wrote: > I'm wondering if there's any performance penalty here. There definitely is. Your design sounds pretty painful to me... adding a column referencing a side-table will be much nicer. > If anyone can offer any insight as to how inheritance is ac

Re: [PERFORM] query looping?

2010-01-07 Thread Robert Haas
On Thu, Jan 7, 2010 at 1:43 PM, Brian Cox wrote: > On 01/05/2010 08:34 PM, Robert Haas [robertmh...@gmail.com] wrote: >> >> - If you have other queries where this index helps (even though it is >> hurting this one), then you're going to have to find a way to execute >> the query without using boun

Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-07 Thread Dave Crooke
Oops, I meant to mention this too: virtually all GigE and/or server-class NICs do TCP checksum offload. Dimitri - it's unlikely that you have a hardware issue on the NIC, it's more likely to be a cable problem or network congestion. What you want to look for in the tcpdump capture is things li

Re: [PERFORM] Joining on text field VS int

2010-01-07 Thread Nikolas Everett
Joining via a tinyint or something will make your large table smaller which is nice. Smaller tables = faster tables. On Wed, Jan 6, 2010 at 11:21 PM, Radhika S wrote: > Hi, > I am going to test this out but would be good to know anyways. A large > table is joined to a tiny table (8 rows) on a t

Re: [PERFORM] query looping?

2010-01-07 Thread Brian Cox
On 01/05/2010 08:34 PM, Robert Haas [robertmh...@gmail.com] wrote: - If you have other queries where this index helps (even though it is hurting this one), then you're going to have to find a way to execute the query without using bound parameters - i.e. with the actual values in there instead of

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Carlo Stonebanks
If it is possible to lock this audit table exclusively (maybe during off-peak hours) I would look into - create new_audit_table as select col1, col2, col3 ... col9, 'new_col_value' from old_audit_table; - create all indexes - drop old_audit_table - rename new_audit_table to old_audit_table That
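A sketch of that rebuild recipe in full SQL; the column list, index, and literal value are placeholders mirroring the quote:

    CREATE TABLE new_audit_table AS
        SELECT col1, col2, col3,              -- ... through col9, per the quote
               'new_col_value' AS source_table
          FROM old_audit_table;
    CREATE INDEX new_audit_id_idx ON new_audit_table (audit_impt_id);   -- recreate each index
    DROP TABLE old_audit_table;
    ALTER TABLE new_audit_table RENAME TO old_audit_table;

Rewriting the table once this way avoids the dead-tuple bloat that a 500M-row in-place UPDATE leaves behind.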

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Carlo Stonebanks
Got an explain analyze of the delete query? UPDATE mdx_core.audit_impt SET source_table = 'mdx_import.'||impt_name WHERE audit_impt_id >= 31941 AND audit_impt_id <= 319400010 AND coalesce(source_table, '') = '' Index Scan using audit_impt_pkey on audit_impt (cost=0.00..92.63 rows=1 width=
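Not something settled in this excerpt, but one standard trick for this shape of backfill is a partial index covering only the rows still to be done, so the coalesce filter gets cheaper as the job progresses - an assumption-laden sketch:

    -- hypothetical helper index; shrinks toward empty as source_table is filled in
    CREATE INDEX audit_impt_todo_idx
        ON mdx_core.audit_impt (audit_impt_id)
     WHERE coalesce(source_table, '') = '';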

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Nikolas Everett
This table is totally unnormalized. Normalize it and try again. You'll probably see a huge speedup. Maybe even 10x. My mantra has always been less data stored means less data to scan means faster scans. On Thu, Jan 7, 2010 at 12:57 PM, Gurgel, Flavio wrote: > - "Matthew Wakeling" wrote

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Gurgel, Flavio
- "Scott Marlowe" escreveu: > You do know that indexes in postgresql are not "covering" right? > I.e. > after hitting the index, the db then has to hit the table to see if > those rows are in fact visible. So there's no such thing in pgsql, > at > the moment, as an index only scan. That w

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Scott Marlowe
On Thu, Jan 7, 2010 at 10:57 AM, Gurgel, Flavio wrote: > - "Matthew Wakeling" wrote: >> On Thu, 7 Jan 2010, Gurgel, Flavio wrote: >> Postgres does not change a query plan according to the shared_buffers setting. It does not anticipate one step contributing to another step in th

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Gurgel, Flavio
- "Matthew Wakeling" escreveu: > On Thu, 7 Jan 2010, Gurgel, Flavio wrote: > Postgres does not change a query plan according to the shared_buffers > > setting. It does not anticipate one step contributing to another step > in > this way. It does however make use of the effective_cache_size s

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Greg Sabino Mullane
> If it is possible to lock this audit table exclusively (maybe during off-peak hours) I would look into > - create new_audit_table as se

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Kevin Kempter
On Thursday 07 January 2010 09:57:59 Kevin Grittner wrote: > Ludwik Dylag wrote: > > I would suggest: > > 1. turn off autovacuum > > 1a. optionally tune db for better performance for this kind of operation (can't help here) > > 2. restart database > > 3. drop all indexes > > 4. update >

Re: [PERFORM] "large" spam tables and performance: postgres memory parameters

2010-01-07 Thread Kevin Grittner
Welcome out of the shadows, Gary! ;-) Gary Warner wrote: > My biggest question mark there really has to do with how many > users I have and how that might alter the results. In benchmarks I've run with our software load, I've found that I get best throughput when I use a connection pool whic

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Kevin Grittner
Ludwik Dylag wrote: > I would suggest: > 1. turn off autovacuum > 1a. optionally tune db for better performance for this kind of operation (can't help here) > 2. restart database > 3. drop all indexes > 4. update > 5. vacuum full table > 6. create indexes > 7. turn on autovacuum I've onl

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Ludwik Dylag
I would suggest: 1. turn off autovacuum 1a. optionally tune db for better performance for this kind of operation (can't help here) 2. restart database 3. drop all indexes 4. update 5. vacuum full table 6. create indexes 7. turn on autovacuum Ludwik 2010/1/7 Leo Mannhart > Kevin Grittner wrot
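Steps 3-6 of that recipe, sketched in SQL (index names and definitions are placeholders; the real table's full set of indexes would be dropped and recreated):

    DROP INDEX mdx_core.audit_impt_source_idx;        -- step 3, repeated per index
    UPDATE mdx_core.audit_impt                        -- step 4: one full-table pass
       SET source_table = 'mdx_import.' || impt_name
     WHERE coalesce(source_table, '') = '';
    VACUUM FULL mdx_core.audit_impt;                  -- step 5: compact away the dead row versions
    CREATE INDEX audit_impt_source_idx                -- step 6, repeated per index
        ON mdx_core.audit_impt (source_table);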

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Craig James
Alvaro Herrera wrote: No amount of tinkering is going to change the fact that a seqscan is the fastest way to execute these queries. Even if you got it to be all in memory, it would still be much slower than the other systems which, I gather, are using columnar storage and thus are perfectly sui

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Matthew Wakeling
On Thu, 7 Jan 2010, Gurgel, Flavio wrote: If one single query execution had a step that brought a page to the buffercache, it's enough to increase another step speed and change the execution plan, since the data access in memory is (usually) faster than disk. Postgres does not change a query

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Lefteris
On Thu, Jan 7, 2010 at 4:57 PM, Ivan Voras wrote: > 2010/1/7 Lefteris : >> On Thu, Jan 7, 2010 at 3:51 PM, Ivan Voras wrote: >>> On 7.1.2010 15:23, Lefteris wrote: >>> I think what you all said was very helpful and clear! The only part that I still disagree/don't understand is the share

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Ivan Voras
2010/1/7 Lefteris : > On Thu, Jan 7, 2010 at 3:51 PM, Ivan Voras wrote: >> On 7.1.2010 15:23, Lefteris wrote: >> >>> I think what you all said was very helpful and clear! The only part >>> that I still disagree/don't understand is the shared_buffer option:)) >> >> Did you ever try increasing share

[PERFORM] "large" spam tables and performance: postgres memory parameters

2010-01-07 Thread Gary Warner
Hello, I've been lurking on this list a couple weeks now, and have asked some "side questions" to some of the list members, who have been gracious, and helpful, and encouraged me to just dive in and participate on the list. I'll not tell you the whole story right off the bat, but let me give y

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Leo Mannhart
Kevin Grittner wrote: > Leo Mannhart wrote: > >> You could also try to just update the whole table in one go, it is >> probably faster than you expect. > > That would, of course, bloat the table and indexes horribly. One > advantage of the incremental approach is that there is a chance for >

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Gurgel, Flavio
- "Lefteris" escreveu: > > Did you ever try increasing shared_buffers to what was suggested > (around > > 4 GB) and see what happens (I didn't see it in your posts)? > > No I did not to that yet, mainly because I need the admin of the > machine to change the shmmax of the kernel and also beca

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Kevin Grittner
Leo Mannhart wrote: > You could also try to just update the whole table in one go, it is > probably faster than you expect. That would, of course, bloat the table and indexes horribly. One advantage of the incremental approach is that there is a chance for autovacuum or scheduled vacuums to m

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Grzegorz Jaśkiewicz
On Thu, Jan 7, 2010 at 3:05 PM, Lefteris wrote: > On Thu, Jan 7, 2010 at 3:51 PM, Ivan Voras wrote: >> On 7.1.2010 15:23, Lefteris wrote: >> >>> I think what you all said was very helpful and clear! The only part >>> that I still disagree/don't understand is the shared_buffer option:)) >> >> Did

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Lefteris
On Thu, Jan 7, 2010 at 3:51 PM, Ivan Voras wrote: > On 7.1.2010 15:23, Lefteris wrote: > >> I think what you all said was very helpful and clear! The only part >> that I still disagree/don't understand is the shared_buffer option:)) > > Did you ever try increasing shared_buffers to what was sugges

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Ivan Voras
On 7.1.2010 15:23, Lefteris wrote: > I think what you all said was very helpful and clear! The only part > that I still disagree/don't understand is the shared_buffer option:)) Did you ever try increasing shared_buffers to what was suggested (around 4 GB) and see what happens (I didn't see it in

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Jochen Erwied
Thursday, January 7, 2010, 2:47:36 PM you wrote: > so I understand from all of you that you don't consider the use of 25k for sorting to be the cause of the slowdown? Probably I am missing. Maybe you are reading the plan wrong: - the sort needs only 25kB of memory, and finishes in sub-second ti
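Jochen's point, annotated on the plan numbers from this thread: each node's "actual time" is startup..total for that node, not a per-node duration to be summed (the aggregate node between the two is cut off in the archive):

    Sort (...) (actual time=371188.821..371188.823 rows=7 loops=1)
      -- startup 371188.821 ms: a sort emits nothing until its whole input has arrived and been sorted
      -- total 371188.823 ms: emitting the 7 sorted rows adds only ~0.002 ms
    Seq Scan on ontime (...) (actual time=190938.959..346180.079 ...)
      -- the scan is where the ~346 s actually goes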

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Lefteris
On Thu, Jan 7, 2010 at 3:14 PM, Alvaro Herrera wrote: > Lefteris wrote: >> Yes, I am reading the plan wrong! I thought that each row from the plan reported the total time for the operation but it actually reports the starting and ending point. >> So we all agree that the problem is on

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Alvaro Herrera
Alvaro Herrera wrote: > No amount of tinkering is going to change the fact that a seqscan is the fastest way to execute these queries. Even if you got it to be all in memory, it would still be much slower than the other systems which, I gather, are using columnar storage and thus are per

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Alvaro Herrera
Lefteris wrote: > Yes, I am reading the plan wrong! I thought that each row from the plan reported the total time for the operation but it actually reports the starting and ending point. > So we all agree that the problem is on the scans:) > So the next question is why changing shared

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Lefteris
Yes, I am reading the plan wrong! I thought that each row from the plan reported the total time for the operation but it actually reports the starting and ending point. So we all agree that the problem is on the scans:) So the next question is why changing shared memory buffers will fix that? I o

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread A. Kretschmer
In response to Lefteris: > Hi Arjen, > so I understand from all of you that you don't consider the use of 25k for sorting to be the cause of the slowdown? Probably I am missing something on the specific sort algorithm used by PG. My RAM does fill up, mainly by file buffers from Linux, but

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread A. Kretschmer
In response to Lefteris: > Thank you all for your answers! > Andrea, I see it the other way around from what you are saying: > Sort (cost=7407754.12..7407754.13 rows=4 width=2) (actual time=371188.821..371188.823 rows=7 loops=1) Seq Scan on ontime (cost=0.00..7143875.40 rows=52775727 width=2)

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Lefteris
Hi Arjen, so I understand from all of you that you don't consider the use of 25k for sorting to be the cause of the slowdown? Probably I am missing something on the specific sort algorithm used by PG. My RAM does fill up, mainly by file buffers from Linux, but the postgres process remains at 0.1% cons

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Arjen van der Meijden
On 7-1-2010 13:38 Lefteris wrote: I decided to run the benchmark over postgres to get some more experience and insights. Unfortunately, the query times I got from postgres were not the expected ones: Why were they not expected? In the given scenario, column databases have a huge advantag

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Lefteris
Thank you all for your answers! Andrea, I see it the other way around from what you are saying: Sort (cost=7407754.12..7407754.13 rows=4 width=2) (actual time=371188.821..371188.823 rows=7 loops=1) Seq Scan on ontime (cost=0.00..7143875.40 rows=52775727 width=2) (actual time=190938.959..346180.079 rows

Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Leo Mannhart
Carlo Stonebanks wrote: > Our DB has an audit table which is 500M rows and growing. (FYI the > objects being audited are grouped semantically, not individual field > values). > > Recently we wanted to add a new feature and we altered the table to add > a new column. We are backfilling this varchar

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread A. Kretschmer
In response to Lefteris: > airtraffic=# EXPLAIN ANALYZE SELECT "DayOfWeek", count(*) AS c FROM ontime WHERE "Year" BETWEEN 2000 AND 2008 GROUP BY "DayOfWeek" ORDER BY c DESC; > QUERY PLAN

Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Gurgel, Flavio
Hello - "Lefteris" escreveu: > Hi all, > > following the simple but interesting air-traffic benchmark published > at: > http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/ Quite interesting test, if you have the time to download all

[PERFORM] Air-traffic benchmark

2010-01-07 Thread Lefteris
Hi all, following the simple but interesting air-traffic benchmark published at: http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/ I decided to run the benchmark over postgres to get some more experience and insights. Unfortunately, the

Re: [PERFORM] Digesting explain analyze

2010-01-07 Thread Matthew Wakeling
On Thu, 7 Jan 2010, Jesper Krogh wrote: If disk seeks are killing you a kinda crazy idea would be to duplicate the table - clustering one by (id1) and the other one by an index on (id2) and unioning the results of each. That's doubling the disk space needs for the table. Is there any odds that
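A sketch of the duplicate-and-cluster idea quoted here (8.3-style CLUSTER syntax; all names hypothetical):

    -- two physical copies of the same rows, each laid out for one access pattern
    CREATE TABLE t_by_id1 AS SELECT * FROM t;
    CREATE INDEX t_by_id1_idx ON t_by_id1 (id1);
    CLUSTER t_by_id1 USING t_by_id1_idx;

    CREATE TABLE t_by_id2 AS SELECT * FROM t;
    CREATE INDEX t_by_id2_idx ON t_by_id2 (id2);
    CLUSTER t_by_id2 USING t_by_id2_idx;

    -- a query touching both keys unions whichever copy is cheap for each half
    SELECT * FROM t_by_id1 WHERE id1 = 42
    UNION
    SELECT * FROM t_by_id2 WHERE id2 = 42;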

Re: [PERFORM] Digesting explain analyze

2010-01-07 Thread Oleg Bartunov
Jesper, the whole idea of bitmap index scan is to optimize heap access, so it ruins any ordering returned by the index. That's why our new KNNGist, which returns ordered index tuples, doesn't support bitmap index scan (note, this is only for knn search). Oleg On Wed, 6 Jan 2010, Robert Haas wrote