Re: [PERFORM] problem with alter table add constraint......
Tom Lane wrote:
> Albe Laurenz <laurenz.a...@wien.gv.at> writes:
>> roopasatish wrote:
>>> I have an issue with the add foreign key constraint which goes for
>>> waiting and locks other queries as well.
>>>
>>> ALTER TABLE ONLY holding_positions
>>>     ADD CONSTRAINT holding_positions_stock_id_fkey
>>>     FOREIGN KEY (stock_id) REFERENCES stocks (stock_id)
>>>     MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION;
>>>
>>> The holding_positions table has no data in it as yet.
>>
>> Look in pg_catalog.pg_locks for a second transaction that holds a lock
>> on the table holding_positions.
>
> This statement also needs to get a lock on the referenced table stocks.
> An open transaction that's referenced either table will block it.
>
> regards, tom lane

I can't lock the table 'stocks' as it's used continuously by many users.
Is there a way to run the constraint in the background without affecting
the users using the database?

Thanks a lot in advance
Roopa
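[Editorial illustration] To make the pg_locks suggestion concrete, here is a minimal libpq sketch, not from the thread; the connection string is an assumption and the table names are taken from the original statement. It lists the sessions that hold or are waiting for locks on the two tables involved:

/* Minimal sketch: list sessions holding or waiting on locks on the two
 * tables involved, so the blocking transaction can be identified.
 * Assumes libpq and a connection string suited to your environment. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=mydb");   /* connection string is an assumption */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    PGresult *res = PQexec(conn,
        "SELECT l.pid, l.mode, l.granted "
        "FROM pg_catalog.pg_locks l "
        "WHERE l.relation IN ('stocks'::regclass, 'holding_positions'::regclass)");

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        for (int i = 0; i < PQntuples(res); i++)
            printf("pid=%s mode=%s granted=%s\n",
                   PQgetvalue(res, i, 0), PQgetvalue(res, i, 1),
                   PQgetvalue(res, i, 2));
    }

    PQclear(res);
    PQfinish(conn);
    return 0;
}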
Re: [PERFORM] performance for high-volume log insertion
David,

* da...@lang.hm (da...@lang.hm) wrote:
> in a recent thread about prepared statements, where it was identified
> that since the planning took place at the time of the prepare you
> sometimes have worse plans than for non-prepared statements, a proposal
> was made to have a 'pre-parsed, but not pre-planned' version of a
> prepared statement. This was dismissed as a waste of time (IIRC by Tom L)
> as the parsing time was negligible.
>
> was that just because it was a more complex query to plan?

Yes, as I believe was mentioned already, planning time for inserts is
really small. Parsing time for inserts when there's little parsing that
has to happen also isn't all *that* expensive, and the same goes for
conversions from textual representations of data to binary.

We're starting to re-hash things, in my view. The low-hanging fruit is
doing multiple things in a single transaction, either by using COPY,
multi-value INSERTs, or just multiple INSERTs in a single transaction.
That's absolutely step one. Adding in other things, where they make sense
(prepared statements, binary format for things you have as binary, etc.)
is a good idea if it can be done along the way.

	Stephen
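[Editorial illustration] As a concrete sketch of "step one", the following shows many INSERTs wrapped in a single transaction over libpq, so only one commit is paid for the whole batch. The table log_lines(message text), the connection string, and the payload are assumptions, not anything from the thread:

/* Sketch: batch many INSERTs into a single transaction with libpq. */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static void die(PGconn *conn) { fprintf(stderr, "%s", PQerrorMessage(conn)); exit(1); }

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=logs");      /* assumed connection string */
    if (PQstatus(conn) != CONNECTION_OK) die(conn);

    PQclear(PQexec(conn, "BEGIN"));
    for (int i = 0; i < 1000; i++)
    {
        const char *msg = "example log line";       /* placeholder payload */
        const char *params[1] = { msg };
        PGresult *res = PQexecParams(conn,
            "INSERT INTO log_lines (message) VALUES ($1)",
            1, NULL, params, NULL, NULL, 0);
        if (PQresultStatus(res) != PGRES_COMMAND_OK) die(conn);
        PQclear(res);
    }
    PQclear(PQexec(conn, "COMMIT"));                /* one commit for the whole batch */

    PQfinish(conn);
    return 0;
}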
Re: [PERFORM] GiST index performance
On Tue, 21 Apr 2009, Matthew Wakeling wrote:
> Unfortunately, it seems there is another bug in the picksplit function.
> My patch fixes a bug that reveals this new bug. The whole picksplit
> algorithm is fundamentally broken, and needs to be rewritten completely,
> which is what I am doing.

I have now rewritten the picksplit and penalty functions for the bioseg
data type, and they perform much better. The index size is now 164MB,
compared to 350MB or so originally, and 2400MB after my earlier bugfix.
Execution time of one of our queries (basically a nested loop join over a
sequential scan and an index lookup in this index type) has gone down from
45 minutes to two minutes.

I have abandoned Guttman's poly time split algorithm. A fundamental flaw
in the picksplit algorithm is that it would create two separate target
sets, and incrementally add entries to whichever one would grow the least
in range size. However, if the entries arrived in any sort of order, they
would all be added to the one set, growing it by a small amount each time.
This caused the picksplit algorithm to split a set of 367 entries into a
set of 366 and a set of one a high proportion of the time.

I have replaced the picksplit algorithm with a simple one. For each range
element, find the midpoint of the range. Then find the mean of all the
midpoints. All elements with a midpoint below the mean go in one set, and
the others go in the second set. This usually splits the entries in a
meaningful way.

I have also changed the penalty function. Previously, the penalty was the
amount that the range would have to expand. So, if a new element fitted
inside the existing range, then the penalty was zero. I have changed it to
create a tie-break between multiple index pages that the element would fit
in without expanding the range - the element should be inserted into the
index page with the smallest range. This prevents large elements from
messing up the index by forcing a large index page range that sucks all
the elements in the whole area into a non-selective group.

I may experiment with improving these functions further. The main problem
with this index is that I need to index ranges with a wide variety of
widths, and I have a couple more strategies yet to help with that.

I will post a patch when I have ported my bioseg code over to the seg data
type.

Matthew

-- 
Riker: Our memory pathways have become accustomed to your sensory input.
Data:  I understand - I'm fond of you too, Commander. And you too Counsellor
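[Editorial illustration] To picture the split rule described above (each range's midpoint compared against the mean of all midpoints), here is a small standalone C sketch. It is not the actual GiST picksplit support function from the patch, just the partitioning idea applied to a plain array of ranges:

/* Illustrative sketch: partition ranges by comparing each range's midpoint
 * against the mean of all midpoints, as described in the message above. */
#include <stdio.h>

typedef struct { float lower, upper; } Range;

static void midpoint_split(const Range *items, int n, int *left, int *nleft,
                           int *right, int *nright)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += (items[i].lower + items[i].upper) / 2.0f;
    float mean = sum / n;

    *nleft = *nright = 0;
    for (int i = 0; i < n; i++)
    {
        float mid = (items[i].lower + items[i].upper) / 2.0f;
        if (mid < mean)
            left[(*nleft)++] = i;     /* below the mean: first set */
        else
            right[(*nright)++] = i;   /* at or above the mean: second set */
    }
}

int main(void)
{
    Range r[] = { {0, 10}, {2, 4}, {50, 60}, {55, 80} };
    int left[4], right[4], nl, nr;
    midpoint_split(r, 4, left, &nl, right, &nr);
    printf("left=%d right=%d\n", nl, nr);   /* expect 2 and 2 */
    return 0;
}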
Re: [PERFORM] performance for high-volume log insertion
On Mon, 2009-04-20 at 14:53 -0700, da...@lang.hm wrote:
> the big win is going to be in changing the core of rsyslog so that it
> can process multiple messages at a time (bundling them into a single
> transaction)

That isn't necessarily true as a single big win.

The reason there is an overhead per transaction is because of commit
delays, which can be removed by executing

  SET synchronous_commit = off;

after connecting to PostgreSQL 8.3+. You won't need to do much else. This
can also be enabled for a PostgreSQL user without even changing the
rsyslog source code, so it should be easy enough to test. And this type of
application is *exactly* what it was designed for. Some other speedups
should also be possible, but this is the easiest.

I would guess that batching inserts will be a bigger win than simply using
prepared statements, because it will reduce network round trips to a
centralised log server. Prepared statements might show up well in tests
because people will most likely test against a local database.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
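[Editorial illustration] A minimal libpq sketch of that suggestion; the connection string and role name are assumptions. The ALTER ROLE form noted in the comment is the "without changing the rsyslog source code" route:

/* Sketch: disable synchronous commit for the session right after
 * connecting (PostgreSQL 8.3+).  The same effect can be had without
 * touching client code via:
 *   ALTER ROLE rsyslog_writer SET synchronous_commit = off;
 * (role name hypothetical). */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=logs user=rsyslog_writer"); /* assumed */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connect: %s", PQerrorMessage(conn));
        return 1;
    }

    PGresult *res = PQexec(conn, "SET synchronous_commit = off");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "SET failed: %s", PQerrorMessage(conn));
    PQclear(res);

    /* ... INSERT work proceeds as before; commits no longer wait on WAL flush ... */

    PQfinish(conn);
    return 0;
}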
Re: [PERFORM] GiST index performance
On Wed, 22 Apr 2009, Matthew Wakeling wrote:
> I will post a patch when I have ported my bioseg code over to the seg
> data type.

Here is my patch ported over to the seg contrib package, attached. Apply
it to seg.c and all should be well. A similar thing needs to be done to
cube, but I haven't looked at that.

Matthew

-- 
An optimist sees the glass as half full, a pessimist as half empty,
and an engineer as having redundant storage capacity.

--- seg.c	2006-09-10 18:36:51.000000000 +0100
+++ seg.c_new	2009-04-22 17:03:13.000000000 +0100
@@ -69,7 +69,6 @@
 bool		seg_right(SEG * a, SEG * b);
 bool		seg_over_right(SEG * a, SEG * b);
 SEG		   *seg_union(SEG * a, SEG * b);
-SEG		   *seg_inter(SEG * a, SEG * b);
 void		rt_seg_size(SEG * a, float *sz);
 float	   *seg_size(SEG * a);
@@ -269,14 +268,22 @@
 gseg_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result)
 {
 	SEG		   *ud;
+	SEG		   *origrange;
+	SEG		   *newrange;
 	float		tmp1,
 				tmp2;
 
-	ud = seg_union((SEG *) DatumGetPointer(origentry->key),
-				   (SEG *) DatumGetPointer(newentry->key));
+	origrange = (SEG *) DatumGetPointer(origentry->key);
+	newrange = (SEG *) DatumGetPointer(newentry->key);
+	ud = seg_union(origrange, newrange);
 	rt_seg_size(ud, &tmp1);
-	rt_seg_size((SEG *) DatumGetPointer(origentry->key), &tmp2);
-	*result = tmp1 - tmp2;
+	rt_seg_size(origrange, &tmp2);
+	if (tmp1 == tmp2) {
+		rt_seg_size(newrange, &tmp1);
+		*result = (tmp2 - tmp1) / tmp2;
+	} else {
+		*result = tmp1 - tmp2 + 1.0;
+	}
 
 #ifdef GIST_DEBUG
 	fprintf(stderr, "penalty\n");
@@ -297,27 +304,16 @@
 			  GIST_SPLITVEC *v)
 {
 	OffsetNumber i,
-				j;
 	SEG		   *datum_alpha,
-			   *datum_beta;
 	SEG		   *datum_l,
 			   *datum_r;
-	SEG		   *union_d,
-			   *union_dl,
-			   *union_dr;
-	SEG		   *inter_d;
-	bool		firsttime;
-	float		size_alpha,
-				size_beta,
-				size_union,
-				size_inter;
-	float		size_waste,
-				waste;
-	float		size_l,
-				size_r;
+	bool		firsttime;
+	float		lowest,
+				highest,
+				midpoint,
+				split,
+				midpointsum;
 	int			nbytes;
-	OffsetNumber seed_1 = 1,
-				seed_2 = 2;
 	OffsetNumber *left,
 			   *right;
 	OffsetNumber maxoff;
@@ -326,107 +322,64 @@
 	fprintf(stderr, "picksplit\n");
 #endif
 
-	maxoff = entryvec->n - 2;
+	maxoff = entryvec->n - 1;
 	nbytes = (maxoff + 2) * sizeof(OffsetNumber);
 	v->spl_left = (OffsetNumber *) palloc(nbytes);
 	v->spl_right = (OffsetNumber *) palloc(nbytes);
 
+	midpointsum = 0.0;
 	firsttime = true;
-	waste = 0.0;
+	lowest = 0.0;
+	highest = 0.0;
 
-	for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
+	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
 	{
 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
-		for (j = OffsetNumberNext(i); j <= maxoff; j = OffsetNumberNext(j))
-		{
-			datum_beta = (SEG *) DatumGetPointer(entryvec->vector[j].key);
-
-			/* compute the wasted space by unioning these guys */
-			/* size_waste = size_union - size_inter; */
-			union_d = seg_union(datum_alpha, datum_beta);
-			rt_seg_size(union_d, &size_union);
-			inter_d = seg_inter(datum_alpha, datum_beta);
-			rt_seg_size(inter_d, &size_inter);
-			size_waste = size_union - size_inter;
-
-			/*
-			 * are these a more promising split than what we've already seen?
-			 */
-			if (size_waste > waste || firsttime)
-			{
-				waste = size_waste;
-				seed_1 = i;
-				seed_2 = j;
-				firsttime = false;
-			}
-		}
+		midpoint = (datum_alpha->lower + datum_alpha->upper) / 2;
+		midpointsum += midpoint;
+		if (firsttime || (midpoint < lowest))
+		{
+			lowest = midpoint;
+		}
+		if (firsttime || (midpoint > highest))
+		{
Re: [PERFORM] performance for high-volume log insertion
On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <sfr...@snowman.net> wrote:
> Yes, as I believe was mentioned already, planning time for inserts is
> really small. Parsing time for inserts when there's little parsing that
> has to happen also isn't all *that* expensive and the same goes for
> conversions from textual representations of data to binary.
>
> We're starting to re-hash things, in my view. The low-hanging fruit is
> doing multiple things in a single transaction, either by using COPY,
> multi-value INSERTs, or just multiple INSERTs in a single transaction.
> That's absolutely step one.

This is all well-known, covered information, but perhaps some numbers
will help drive this home.

40000 inserts into a single-column, unindexed table; with predictable
results:

separate inserts, no transaction: 21.21s
separate inserts, same transaction: 1.89s
40 inserts, 100 rows/insert: 0.18s
one 40000-value insert: 0.16s
40 prepared inserts, 100 rows/insert: 0.15s
COPY (text): 0.10s
COPY (binary): 0.10s

Of course, real workloads will change the weights, but this is more or
less the magnitude of difference I always see--batch your inserts into
single statements, and if that's not enough, skip to COPY.

--
Glenn Maynard
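[Editorial illustration] For reference, a sketch of the two cheapest approaches in the table above as a libpq client would issue them; the table log_lines(value int) and the connection string are assumptions:

/* Sketch: multi-value INSERT and COPY in text format, using libpq. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=logs");               /* assumed */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* Multi-value INSERT: one statement, many rows. */
    PGresult *res = PQexec(conn,
        "INSERT INTO log_lines (value) VALUES (1),(2),(3),(4)");
    PQclear(res);

    /* COPY in text format: cheapest per row. */
    res = PQexec(conn, "COPY log_lines (value) FROM STDIN");
    if (PQresultStatus(res) == PGRES_COPY_IN)
    {
        for (int i = 0; i < 1000; i++)
        {
            char line[32];
            int len = snprintf(line, sizeof(line), "%d\n", i);
            PQputCopyData(conn, line, len);
        }
        PQputCopyEnd(conn, NULL);
        PQclear(PQgetResult(conn));   /* fetch the COPY completion status */
    }
    PQclear(res);

    PQfinish(conn);
    return 0;
}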
Re: [PERFORM] performance for high-volume log insertion
On Wed, 22 Apr 2009, Glenn Maynard wrote:
> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <sfr...@snowman.net> wrote:
>> Yes, as I believe was mentioned already, planning time for inserts is
>> really small. Parsing time for inserts when there's little parsing that
>> has to happen also isn't all *that* expensive and the same goes for
>> conversions from textual representations of data to binary.
>>
>> We're starting to re-hash things, in my view. The low-hanging fruit is
>> doing multiple things in a single transaction, either by using COPY,
>> multi-value INSERTs, or just multiple INSERTs in a single transaction.
>> That's absolutely step one.
>
> This is all well-known, covered information, but perhaps some numbers
> will help drive this home.
>
> 40000 inserts into a single-column, unindexed table; with predictable
> results:
>
> separate inserts, no transaction: 21.21s
> separate inserts, same transaction: 1.89s

are these done as separate round trips? i.e.

  begin <send>
  insert <send>
  insert <send>
  ..
  end <send>

or as one round trip? i.e.

  begin;insert;insert..;end

> 40 inserts, 100 rows/insert: 0.18s
> one 40000-value insert: 0.16s
> 40 prepared inserts, 100 rows/insert: 0.15s

is one of these missing a 0?

> COPY (text): 0.10s
> COPY (binary): 0.10s
>
> Of course, real workloads will change the weights, but this is more or
> less the magnitude of difference I always see--batch your inserts into
> single statements, and if that's not enough, skip to COPY.

thanks for this information, this is exactly what I was looking for. can
this get stored somewhere for reference?

David Lang
Re: [PERFORM] performance for high-volume log insertion
Glenn,

* Glenn Maynard (glennfmayn...@gmail.com) wrote:
> This is all well-known, covered information, but perhaps some numbers
> will help drive this home.
>
> 40000 inserts into a single-column, unindexed table; with predictable
> results:

Thanks for doing the work. I had been intending to but hadn't gotten to
it yet.

> separate inserts, no transaction: 21.21s
> separate inserts, same transaction: 1.89s
> 40 inserts, 100 rows/insert: 0.18s
> one 40000-value insert: 0.16s
> 40 prepared inserts, 100 rows/insert: 0.15s
> COPY (text): 0.10s
> COPY (binary): 0.10s

What about 40000 individual prepared inserts? Just curious about it.

Also, kind of surprised about COPY text vs. binary. What was the data type
in the table..? If text, that makes sense; if it was an integer or
something else, I'm kind of surprised.

	Thanks,

		Stephen
Re: [PERFORM] performance for high-volume log insertion
Stephen Frost <sfr...@snowman.net> writes:
> * Glenn Maynard (glennfmayn...@gmail.com) wrote:
>> separate inserts, no transaction: 21.21s
>> separate inserts, same transaction: 1.89s
>> 40 inserts, 100 rows/insert: 0.18s
>> one 40000-value insert: 0.16s
>> 40 prepared inserts, 100 rows/insert: 0.15s
>> COPY (text): 0.10s
>> COPY (binary): 0.10s
>
> What about 40000 individual prepared inserts? Just curious about it.

Also, just to be clear: were the 40-insert cases 40 separate transactions
or one transaction? I think the latter was meant, but it's not 100% clear.

			regards, tom lane
Re: [PERFORM] performance for high-volume log insertion
Stephen Frost wrote:
> You're re-hashing things I've already said. The big win is batching the
> inserts, however that's done, into fewer transactions. Sure, multi-row
> inserts could be used to do that, but so could dropping begin/commits in
> right now which probably takes even less effort.

Well, I think you are seriously underestimating the cost of the round-trip
compared to all the other effects (possibly bar the commits). When I
tested the union insert technique on SQLServer and Sybase I got measurable
improvements going from 100-row statements to 200-row statements, though I
suspect in that case the per-statement overheads are quite high. I
expected improvements from 10- to 20-row batches, but it carried on
getting better for a long time after that. The Sybase parser runs out of
workspace first.

> No, as was pointed out previously already, you really just need 2. A

And I'm disagreeing with that. Single row is a given, but I think you'll
find it pays to have one round trip if at all possible, and invoking
multiple prepared statements can work against this.

> see if there's really much of a performance difference between a
> 50-insert prepared statement, and 50 1-insert prepared statements. If
> they're both done in larger transactions, I don't know that there's
> really a lot of performance difference.

I think you'll be surprised, but the only way is to test it. And also the
simple 50-row single insert as text. See if you can measure the difference
between that and the prepared statement.

> storage overhead? indexing overhead? We're talking about prepared
> statements here, what additional storage requirement do you think those
> would impose? What additional indexing overhead? I don't believe we
> actually do anything differently between prepared statements and
> multi-row inserts that would change either of those.

That's my point. You will brickwall on the actual database operations for
execution early enough that the efficiency difference between
parse-and-execute and prepared statements will be hard to measure - at
least if you have multi-row statements. But this really needs testing and
timing.

> Ah, latency is a reasonable thing to bring up. Of course, if you want to
> talk about latency then you get to consider that multi-insert SQL will
> inherently have larger packet sizes which could cause them to be delayed
> in some QoS arrangements.

No, I mean latency from round trips from the client to the server process.
I don't know why you think I'd mean that.

> As I said, most of this is a re-hash of things already said. The
> low-hanging fruit here is doing multiple inserts inside of a
> transaction, rather than 1 insert per transaction. Regardless of how
> that's done, it's going to give the best bang-for-buck. It will
> complicate the client code some, regardless of how it's implemented, so
> that failures are handled gracefully (if that's something you care about
> anyway), but as there exists some queueing mechanisms in rsyslog
> already, hopefully it won't be too bad.

I think you have largely missed the point. There are two things here:

1) how many rows per commit
2) how many rows per logical RPC (ie round trip) between the client and
   server processes

We are agreed that the first is a Very Big Deal, but you seem resistant to
the idea that the second of these is a big deal once you've dealt with the
former.

My experience has been that it's much more important than any benefits of
preparing statements etc., particularly if the use of a prepared statement
can make it harder to do multi-row RPCs because the protocol doesn't allow
pipelining (at least without things getting very hairy).

Clearly 'copy' is your friend for this too, at least potentially (even if
it means streaming to a staging table).

James
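[Editorial illustration] One way to picture the staging-table variant mentioned above, as a hedged libpq sketch; the table names log_staging and log_lines are hypothetical. The whole batch streams in one COPY, and a single INSERT ... SELECT moves it, so the number of round trips stays small regardless of how many rows are sent:

/* Sketch: stream a batch into an unindexed staging table with COPY, then
 * move it into the real table in one statement.  Each input line is
 * assumed to already end in '\n'. */
#include <string.h>
#include <libpq-fe.h>

static void copy_batch(PGconn *conn, const char *const *lines, int n)
{
    PGresult *res = PQexec(conn, "COPY log_staging (message) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN) { PQclear(res); return; }
    PQclear(res);

    for (int i = 0; i < n; i++)
        PQputCopyData(conn, lines[i], (int) strlen(lines[i]));
    PQputCopyEnd(conn, NULL);
    PQclear(PQgetResult(conn));       /* final COPY status */

    /* Two more statements move the whole batch, however large it is. */
    PQclear(PQexec(conn, "INSERT INTO log_lines SELECT * FROM log_staging"));
    PQclear(PQexec(conn, "TRUNCATE log_staging"));
}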
Re: [PERFORM] performance for high-volume log insertion
On Wed, Apr 22, 2009 at 4:07 PM, <da...@lang.hm> wrote:
> are these done as separate round trips? i.e.
>
>   begin <send>
>   insert <send>
>   insert <send>
>   ..
>   end <send>
>
> or as one round trip?

All tests were done by constructing a file and using "time psql -f".

> 40 inserts, 100 rows/insert: 0.18s
> one 40000-value insert: 0.16s
> 40 prepared inserts, 100 rows/insert: 0.15s
>
> is one of these missing a 0?

Sorry, 400 * 100. All cases inserted 40000 rows, and I deleted all rows
between tests (but did not recreate the table).

--
Glenn Maynard
Re: [PERFORM] performance for high-volume log insertion
On Wed, 2009-04-22 at 21:53 +0100, James Mansion wrote:
> Stephen Frost wrote:
>> You're re-hashing things I've already said. The big win is batching the
>> inserts, however that's done, into fewer transactions. Sure, multi-row
>> inserts could be used to do that, but so could dropping begin/commits
>> in right now which probably takes even less effort.
>
> Well, I think you are seriously underestimating the cost of the
> round-trip compared

The breakdown is this:

1. Eliminate single inserts
2. Eliminate round trips

Yes, round trips are hugely expensive.

>> No, as was pointed out previously already, you really just need 2. A
>
> And I'm disagreeing with that. Single row is a given, but I think you'll
> find it pays to have one

My experience shows that you are correct. Even if you do a single BEGIN;
with 1000 inserts you are still getting a round trip for every insert
until you commit. Based on a 20ms round trip time, you are talking 20
seconds of additional overhead.

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997
Re: [PERFORM] performance for high-volume log insertion
On Wed, Apr 22, 2009 at 4:37 PM, Stephen Frost <sfr...@snowman.net> wrote:
> Thanks for doing the work. I had been intending to but hadn't gotten to
> it yet.

I'd done similar tests recently, for some batch import code, so it was
just a matter of recreating it.

>> separate inserts, no transaction: 21.21s
>> separate inserts, same transaction: 1.89s
>> 40 inserts, 100 rows/insert: 0.18s
>> one 40000-value insert: 0.16s
>> 40 prepared inserts, 100 rows/insert: 0.15s
>> COPY (text): 0.10s
>> COPY (binary): 0.10s
>
> What about 40000 individual prepared inserts? Just curious about it.

40000 inserts, one prepared statement each (constructing the prepared
statement only once), in a single transaction: 1.68s

I'm surprised that there's any win here at all.

> Also, kind of surprised about COPY text vs. binary. What was the data
> type in the table..? If text, that makes sense; if it was an integer or
> something else, I'm kind of surprised.

Each row had one integer column. I expect strings to be harder to parse,
since it's allocating buffers and parsing escapes, which is usually more
expensive than parsing an integer out of a string. I'd expect the
difference to be negligible either way, though, and I'd be interested in
hearing about use cases where binary copying is enough of a win to be
worth the development cost of maintaining the feature.

On Wed, Apr 22, 2009 at 4:49 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Also, just to be clear: were the 40-insert cases 40 separate
> transactions or one transaction? I think the latter was meant, but it's
> not 100% clear.

One transaction--the only case where I ran more than one transaction was
the first.

--
Glenn Maynard
Re: [PERFORM] performance for high-volume log insertion
On Wed, Apr 22, 2009 at 4:53 PM, James Mansion
<ja...@mansionfamily.plus.com> wrote:
> And I'm disagreeing with that. Single row is a given, but I think you'll
> find it pays to have one round trip if at all possible and invoking
> multiple prepared statements can work against this.

You're talking about round-trips to a *local* server, on the same system,
not a dedicated server with network round-trips, right?

Blocking round trips to another process on the same server should be
fairly cheap--that is, writing to a socket (or pipe, or localhost TCP
connection) where the other side is listening for it; and then blocking
in return for the response. The act of writing to an FD that another
process is waiting for will make the kernel mark the process as ready to
wake up immediately, and the act of blocking for the response will kick
the scheduler to some waiting process, so as long as there isn't something
else to compete for CPU, each write/read will wake up the other process
instantly. There's a task-switching cost, but that's too small to be
relevant here.

Doing 1000000 local round trips, over a pipe: 5.25s (5 *microseconds*
each), code attached.

The cost *should* be essentially identical for any local transport (pipes,
named pipes, local TCP connections), since the underlying scheduler
mechanisms are the same.

That's not to say that batching inserts doesn't make a difference--it
clearly does, and it would probably be a much larger difference with
actual network round-trips--but round-trips to a local server aren't
inherently slow.

I'd be (casually) interested in knowing what the actual costs are behind
an SQL command round-trip (where the command isn't blocking on I/O, e.g.
an INSERT inside a transaction, with no I/O for things like constraint
checks that need to be done before the command can return success).

--
Glenn Maynard

#include <stdio.h>
#include <unistd.h>
#include <assert.h>

int main()
{
    int parent[2];
    int ret = pipe(parent);
    assert(ret != -1);

    int child[2];
    ret = pipe(child);
    assert(ret != -1);

    int pid = fork();
    assert(pid != -1);

    if(pid != 0)
    {
        // parent
        close(parent[0]);
        close(child[1]);
        int wfd = parent[1];
        int rfd = child[0];

        printf("go\n");

        int i = 1000000;
        while(i--)
        {
            char c = 1;
            ret = write(wfd, &c, 1);
            assert(ret == 1);
            ret = read(rfd, &c, 1);
            assert(ret == 1);
        }
        printf("done\n");
    }
    else
    {
        // child
        close(parent[1]);
        close(child[0]);
        int wfd = child[1];
        int rfd = parent[0];

        int i = 1000000;
        while(i--)
        {
            char c = 1;
            ret = read(rfd, &c, 1);
            assert(ret == 1);
            ret = write(wfd, &c, 1);
            assert(ret == 1);
        }
    }
    return 0;
}
Re: [PERFORM] performance for high-volume log insertion
* Glenn Maynard (glennfmayn...@gmail.com) wrote:
>> separate inserts, no transaction: 21.21s
>> separate inserts, same transaction: 1.89s
>> 40 inserts, 100 rows/insert: 0.18s
>> one 40000-value insert: 0.16s
>> 40 prepared inserts, 100 rows/insert: 0.15s
>> COPY (text): 0.10s
>> COPY (binary): 0.10s
>>
>> What about 40000 individual prepared inserts? Just curious about it.
>
> 40000 inserts, one prepared statement each (constructing the prepared
> statement only once), in a single transaction: 1.68s
>
> I'm surprised that there's any win here at all.

For a single-column table, I wouldn't expect much either. With more
columns I think it would be a larger improvement.

> Each row had one integer column. I expect strings to be harder to parse,
> since it's allocating buffers and parsing escapes, which is usually more
> expensive than parsing an integer out of a string. I'd expect the
> difference to be negligible either way, though, and I'd be interested in
> hearing about use cases where binary copying is enough of a win to be
> worth the development cost of maintaining the feature.

I've seen it help, but I was sending everything as binary (I figured, once
I'm doing it, might as well do it all), which included dates, timestamps,
IP addresses, integers, and some text. It may have more of an impact on
dates and timestamps than on simple integers.

	Thanks!

		Stephen
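[Editorial illustration] For anyone curious what "sending everything as binary" looks like at the call level, this is a minimal sketch of one binary int4 parameter via PQexecParams; the table and column names are assumptions, and 23 is the built-in OID for int4:

/* Sketch: send a parameter in binary rather than text. */
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <libpq-fe.h>

int insert_value_binary(PGconn *conn, int32_t value)
{
    uint32_t netval = htonl((uint32_t) value);   /* int4 on the wire is big-endian */

    const char *values[1] = { (const char *) &netval };
    int lengths[1] = { sizeof(netval) };
    int formats[1] = { 1 };                      /* 1 = binary, 0 = text */
    Oid types[1]   = { 23 };                     /* INT4OID */

    PGresult *res = PQexecParams(conn,
        "INSERT INTO log_lines (value) VALUES ($1)",
        1, types, values, lengths, formats, 0);

    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
    if (!ok)
        fprintf(stderr, "%s", PQerrorMessage(conn));
    PQclear(res);
    return ok;
}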
Re: [PERFORM] performance for high-volume log insertion
On Wed, Apr 22, 2009 at 5:51 PM, Stephen Frost <sfr...@snowman.net> wrote:
> For a single-column table, I wouldn't expect much either. With more
> columns I think it would be a larger improvement.

Maybe. I'm not sure why parsing (1,2,3,4,5) in an EXECUTE parameter should
be faster than parsing the exact same thing in an INSERT, though.

> I've seen it help, but I was sending everything as binary (I figured,
> once I'm doing it, might as well do it all), which included dates,
> timestamps, IP addresses, integers, and some text. It may have more of
> an impact on dates and timestamps than on simple integers.

Of course, you still need to get it in that format. Be careful to include
any parsing you're doing to create the binary date in the benchmarks.
Inevitably, at least part of the difference will be costs simply moving
from the psql process to your own.

--
Glenn Maynard
Re: [PERFORM] performance for high-volume log insertion
On Wed, 22 Apr 2009, Glenn Maynard wrote:
> On Wed, Apr 22, 2009 at 4:53 PM, James Mansion
> <ja...@mansionfamily.plus.com> wrote:
>> And I'm disagreeing with that. Single row is a given, but I think
>> you'll find it pays to have one round trip if at all possible and
>> invoking multiple prepared statements can work against this.
>
> You're talking about round-trips to a *local* server, on the same
> system, not a dedicated server with network round-trips, right?

the use-case for a production setup for logging servers would probably
include a network hop.

David Lang

> Blocking round trips to another process on the same server should be
> fairly cheap--that is, writing to a socket (or pipe, or localhost TCP
> connection) where the other side is listening for it; and then blocking
> in return for the response. The act of writing to an FD that another
> process is waiting for will make the kernel mark the process as ready to
> wake up immediately, and the act of blocking for the response will kick
> the scheduler to some waiting process, so as long as there isn't
> something else to compete for CPU, each write/read will wake up the
> other process instantly. There's a task-switching cost, but that's too
> small to be relevant here.
>
> Doing 1000000 local round trips, over a pipe: 5.25s (5 *microseconds*
> each), code attached. The cost *should* be essentially identical for any
> local transport (pipes, named pipes, local TCP connections), since the
> underlying scheduler mechanisms are the same.
>
> That's not to say that batching inserts doesn't make a difference--it
> clearly does, and it would probably be a much larger difference with
> actual network round-trips--but round-trips to a local server aren't
> inherently slow.
>
> I'd be (casually) interested in knowing what the actual costs are behind
> an SQL command round-trip (where the command isn't blocking on I/O, e.g.
> an INSERT inside a transaction, with no I/O for things like constraint
> checks that need to be done before the command can return success).
Re: [PERFORM] performance for high-volume log insertion
* Glenn Maynard (glennfmayn...@gmail.com) wrote:
> On Wed, Apr 22, 2009 at 5:51 PM, Stephen Frost <sfr...@snowman.net> wrote:
>> For a single-column table, I wouldn't expect much either. With more
>> columns I think it would be a larger improvement.
>
> Maybe. I'm not sure why parsing (1,2,3,4,5) in an EXECUTE parameter
> should be faster than parsing the exact same thing in an INSERT, though.

Erm.. Prepared queries are about using PQexecPrepared(), not about sending
a text string as an SQL EXECUTE(). PQexecPrepared takes an array of
arguments. That gets translated into a Bind command in the protocol with a
defined number of parameters and a length for each parameter being passed.
That removes any need for scanning/parsing the string sent to the backend.
That's the savings I'm referring to.

If you weren't using PQexecPrepared() (and using psql, you wouldn't be..),
then the difference you saw was more likely planning cost.

> Of course, you still need to get it in that format. Be careful to
> include any parsing you're doing to create the binary date in the
> benchmarks. Inevitably, at least part of the difference will be costs
> simply moving from the psql process to your own.

Sure. What I recall from when I was working on this is that it wasn't
terribly hard to go from unix timestamps (epoch from 1970) to a PG
timestamp format (and there was nice example code in the backend) in terms
of CPU time.

	Thanks,

		Stephen
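[Editorial illustration] A short sketch of the PQprepare()/PQexecPrepared() path being described, with a hypothetical statement name and table; the point is that the INSERT text is parsed once, and each execution sends only Bind/Execute messages carrying the parameter values:

/* Sketch: protocol-level prepared statement reused for many inserts. */
#include <stdio.h>
#include <libpq-fe.h>

int insert_prepared(PGconn *conn)
{
    /* Parse/plan the statement once. */
    PGresult *res = PQprepare(conn, "ins_log",
                              "INSERT INTO log_lines (message) VALUES ($1)",
                              1, NULL);
    if (PQresultStatus(res) != PGRES_COMMAND_OK) { PQclear(res); return 0; }
    PQclear(res);

    /* Re-execute without re-parsing: only the parameter travels each time. */
    const char *messages[] = { "first line", "second line", "third line" };
    for (int i = 0; i < 3; i++)
    {
        const char *params[1] = { messages[i] };
        res = PQexecPrepared(conn, "ins_log", 1, params, NULL, NULL, 0);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "%s", PQerrorMessage(conn));
        PQclear(res);
    }
    return 1;
}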
Re: [PERFORM] performance for high-volume log insertion
* da...@lang.hm (da...@lang.hm) wrote:
> On Wed, 22 Apr 2009, Glenn Maynard wrote:
>> You're talking about round-trips to a *local* server, on the same
>> system, not a dedicated server with network round-trips, right?
>
> the use-case for a production setup for logging servers would probably
> include a network hop.

Sure, but there's a big difference between an rtt of 0.041ms (my dinky
home network) and 20ms (from my home network in Virginia to Boston). I
wasn't intending to discount latency; it can be a concern, but I doubt
it'll be 20ms for most people..

	Thanks,

		Stephen
Re: [PERFORM] performance for high-volume log insertion
On Wed, 22 Apr 2009, Stephen Frost wrote:
> * Glenn Maynard (glennfmayn...@gmail.com) wrote:
>> On Wed, Apr 22, 2009 at 5:51 PM, Stephen Frost <sfr...@snowman.net> wrote:
>>> For a single-column table, I wouldn't expect much either. With more
>>> columns I think it would be a larger improvement.
>>
>> Maybe. I'm not sure why parsing (1,2,3,4,5) in an EXECUTE parameter
>> should be faster than parsing the exact same thing in an INSERT, though.
>
> Erm.. Prepared queries are about using PQexecPrepared(), not about
> sending a text string as an SQL EXECUTE(). PQexecPrepared takes an array
> of arguments. That gets translated into a Bind command in the protocol
> with a defined number of parameters and a length for each parameter
> being passed. That removes any need for scanning/parsing the string sent
> to the backend. That's the savings I'm referring to.

are you sure? I thought that what goes out over the wire is always text.

David Lang

> If you weren't using PQexecPrepared() (and using psql, you wouldn't
> be..), then the difference you saw was more likely planning cost.
>
>> Of course, you still need to get it in that format. Be careful to
>> include any parsing you're doing to create the binary date in the
>> benchmarks. Inevitably, at least part of the difference will be costs
>> simply moving from the psql process to your own.
>
> Sure. What I recall from when I was working on this is that it wasn't
> terribly hard to go from unix timestamps (epoch from 1970) to a PG
> timestamp format (and there was nice example code in the backend) in
> terms of CPU time.
>
> 	Thanks,
>
> 		Stephen