Re: [PERFORM] probelm with alter table add constraint......

2009-04-22 Thread roopabenzer



Tom Lane-2 wrote:
 
 Albe Laurenz laurenz.a...@wien.gv.at writes:
 roopasatish wrote:
 I have an issue with the ADD FOREIGN KEY constraint, which goes into a
 waiting state and locks other queries as well.
 
 ALTER TABLE ONLY holding_positions ADD CONSTRAINT 
 holding_positions_stock_id_fkey FOREIGN KEY (stock_id)
 REFERENCES stocks (stock_id) MATCH SIMPLE
 ON UPDATE NO ACTION ON DELETE NO ACTION;
 
 The holding_positions table has no data in it as yet.
 
 Look in pg_catalog.pg_locks for a second transaction that
 holds a lock on the table holding_positions.
 
 This statement also needs to get a lock on the referenced table stocks.
 An open transaction that has referenced either table will block it.
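
 A minimal check along those lines, assuming you just want to see which
 backends hold locks on either table, might be:

 SELECT l.pid, l.mode, l.granted, c.relname
 FROM pg_catalog.pg_locks l
 JOIN pg_catalog.pg_class c ON c.oid = l.relation
 WHERE c.relname IN ('holding_positions', 'stocks');

 Any granted lock held by a pid other than your own session is a
 candidate blocker.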
 
   regards, tom lane
 
 
 


I can't lock the table 'stocks', as it's used continuously by many users. Is
there a way to add the constraint in the background without affecting the
users of the database?

Thanks a lot in advance
Roopa
-- 
View this message in context: 
http://www.nabble.com/probelm-with-alter-table-add-constraint..-tp22903334p23170924.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.




Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Stephen Frost
David,

* da...@lang.hm (da...@lang.hm) wrote:
 in a recent thread about prepared statements, where it was identified 
 that since the planning took place at the time of the prepare you 
 sometimes have worse plans than for non-prepared statements, a proposal 
 was made to have a 'pre-parsed, but not pre-planned' version of a 
 prepared statement. This was dismissed as a waste of time (IIRC by Tom L) 
 as the parsing time was negligible.

 was that just because it was a more complex query to plan?

Yes, as I believe was mentioned already, planning time for inserts is
really small.  Parsing time for inserts when there's little parsing that
has to happen also isn't all *that* expensive and the same goes for
conversions from textual representations of data to binary.

We're starting to re-hash things, in my view.  The low-hanging fruit is
doing multiple things in a single transaction, either by using COPY,
multi-value INSERTs, or just multiple INSERTs in a single transaction.
That's absolutely step one.
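
As a sketch, against a hypothetical log table t(msg text), those three
shapes look roughly like:

-- multiple INSERTs in one transaction
BEGIN;
INSERT INTO t (msg) VALUES ('line 1');
INSERT INTO t (msg) VALUES ('line 2');
COMMIT;

-- one multi-value INSERT
INSERT INTO t (msg) VALUES ('line 1'), ('line 2'), ('line 3');

-- COPY, e.g. driven from psql
\copy t (msg) from 'batch.txt'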

Adding in other things, where they make sense (prepared statements,
binary format for things you have as binary, etc) is a good idea if it
can be done along the way.

Stephen




Re: [PERFORM] GiST index performance

2009-04-22 Thread Matthew Wakeling

On Tue, 21 Apr 2009, Matthew Wakeling wrote:
Unfortunately, it seems there is another bug in the picksplit function. My 
patch fixes a bug that reveals this new bug. The whole picksplit algorithm is 
fundamentally broken, and needs to be rewritten completely, which is what I 
am doing.


I have now rewritten the picksplit and penalty functions for the bioseg 
data type, and they perform much better. The index size is now 164MB, 
compared to 350MB or so originally, and 2400MB after my earlier bugfix. 
Execution time of one of our queries (basically a nested loop join over 
a sequential scan and an index lookup in this index type) has gone down 
from 45 minutes to two minutes.


I have abandoned Guttman's poly time split algorithm. A fundamental flaw 
in the picksplit algorithm is that it would create two separate target 
sets, and incrementally add entries to whichever one would grow the least 
in range size. However, if the entries arrived in any sort of order, they 
would all be added to the one set, growing it by a small amount each time. 
This caused the picksplit algorithm to split a set of 367 entries into a 
set of 366 and a set of one a high proportion of the time.


I have replaced the picksplit algorithm with a simple one. For each range 
element, find the midpoint of the range. Then find the mean of all the 
midpoints. All elements with a midpoint below the mean go in one set, and 
the others go in the second set. This usually splits the entries in a 
meaningful way.


I have also changed the penalty function. Previously, the penalty was the 
amount by which the range would have to expand, so if a new element fitted 
inside the existing range, the penalty was zero. I have changed it to 
create a tie-break between multiple index pages that the element would fit 
in without expanding the range - the element should be inserted into the 
index page with the smallest range. This prevents large elements from 
messing up the index by forcing a large index page range that sucks in all 
the elements in the whole area into a non-selective group.


I may experiment with improving these functions further. The main problem 
with this index is the fact that I need to index ranges with a wide 
variety of widths, and I have a couple more strategies yet to help with 
that.


I will post a patch when I have ported my bioseg code over to the seg data 
type.


Matthew

--
Riker: Our memory pathways have become accustomed to your sensory input.
Data:  I understand - I'm fond of you too, Commander. And you too Counsellor



Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Simon Riggs

On Mon, 2009-04-20 at 14:53 -0700, da...@lang.hm wrote:

 the big win is going to be in changing the core of rsyslog so that it can 
 process multiple messages at a time (bundling them into a single 
 transaction)

That isn't necessarily true as a single big win.

The per-transaction overhead comes from commit delays, which can be
removed by executing

  SET synchronous_commit = off; 

after connecting to PostgreSQL 8.3+

You won't need to do much else. This can also be enabled for a
PostgreSQL user without even changing the rsyslog source code, so it
should be easy enough to test.
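
For example, either per session or baked into a dedicated logging role
(the role name below is only a placeholder):

  -- per session, right after connecting
  SET synchronous_commit = off;

  -- or once, for the role rsyslog connects as, so the application
  -- itself needs no change
  ALTER ROLE rsyslog_writer SET synchronous_commit = off;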

And this type of application is *exactly* what it was designed for.

Some other speedups should also be possible, but this is easiest. 

I would guess that batching inserts will be a bigger win than simply
using prepared statements because it will reduce network roundtrips to a
centralised log server. Preparing statements might show up well on tests
because people will do tests against a local database, most likely.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] GiST index performance

2009-04-22 Thread Matthew Wakeling

On Wed, 22 Apr 2009, Matthew Wakeling wrote:
I will post a patch when I have ported my bioseg code over to the seg data 
type.


Here is my patch ported over to the seg contrib package, attached. Apply 
it to seg.c and all should be well. A similar thing needs to be done to 
cube, but I haven't looked at that.


Matthew

--
An optimist sees the glass as half full, a pessimist as half empty,
and an engineer as having redundant storage capacity.

--- seg.c	2006-09-10 18:36:51.0 +0100
+++ seg.c_new	2009-04-22 17:03:13.0 +0100
@@ -69,7 +69,6 @@
 bool		seg_right(SEG * a, SEG * b);
 bool		seg_over_right(SEG * a, SEG * b);
 SEG		   *seg_union(SEG * a, SEG * b);
-SEG		   *seg_inter(SEG * a, SEG * b);
 void		rt_seg_size(SEG * a, float *sz);
 float	   *seg_size(SEG * a);
 
@@ -269,14 +268,22 @@
 gseg_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result)
 {
 	SEG		   *ud;
+	SEG		   *origrange;
+	SEG		   *newrange;
 	float		tmp1,
 				tmp2;
 
-	ud = seg_union((SEG *) DatumGetPointer(origentry->key),
-				   (SEG *) DatumGetPointer(newentry->key));
+	origrange = (SEG *) DatumGetPointer(origentry->key);
+	newrange = (SEG *) DatumGetPointer(newentry->key);
+	ud = seg_union(origrange, newrange);
 	rt_seg_size(ud, &tmp1);
-	rt_seg_size((SEG *) DatumGetPointer(origentry->key), &tmp2);
-	*result = tmp1 - tmp2;
+	rt_seg_size(origrange, &tmp2);
+	if (tmp1 == tmp2) {
+		rt_seg_size(newrange, &tmp1);
+		*result = (tmp2 - tmp1) / tmp2;
+	} else {
+		*result = tmp1 - tmp2 + 1.0;
+	}
 
 #ifdef GIST_DEBUG
 	fprintf(stderr, "penalty\n");
@@ -297,27 +304,16 @@
 			  GIST_SPLITVEC *v)
 {
 	OffsetNumber i,
-				j;
 	SEG		   *datum_alpha,
-			   *datum_beta;
 	SEG		   *datum_l,
 			   *datum_r;
-	SEG		   *union_d,
-			   *union_dl,
-			   *union_dr;
-	SEG		   *inter_d;
-	bool		firsttime;
-	float		size_alpha,
-				size_beta,
-				size_union,
-				size_inter;
-	float		size_waste,
-				waste;
-	float		size_l,
-				size_r;
+	bool		firsttime;
+	float		lowest,
+				highest,
+				midpoint,
+				split,
+				midpointsum;
 	int			nbytes;
-	OffsetNumber seed_1 = 1,
-				seed_2 = 2;
 	OffsetNumber *left,
 			   *right;
 	OffsetNumber maxoff;
@@ -326,107 +322,64 @@
 	fprintf(stderr, "picksplit\n");
 #endif
 
-	maxoff = entryvec->n - 2;
+	maxoff = entryvec->n - 1;
 	nbytes = (maxoff + 2) * sizeof(OffsetNumber);
 	v->spl_left = (OffsetNumber *) palloc(nbytes);
 	v->spl_right = (OffsetNumber *) palloc(nbytes);
 
+	midpointsum = 0.0;
 	firsttime = true;
-	waste = 0.0;
+	lowest = 0.0;
+	highest = 0.0;
 
-	for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
+	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
 	{
 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
-		for (j = OffsetNumberNext(i); j <= maxoff; j = OffsetNumberNext(j))
-		{
-			datum_beta = (SEG *) DatumGetPointer(entryvec->vector[j].key);
-
-			/* compute the wasted space by unioning these guys */
-			/* size_waste = size_union - size_inter; */
-			union_d = seg_union(datum_alpha, datum_beta);
-			rt_seg_size(union_d, &size_union);
-			inter_d = seg_inter(datum_alpha, datum_beta);
-			rt_seg_size(inter_d, &size_inter);
-			size_waste = size_union - size_inter;
-
-			/*
-			 * are these a more promising split that what we've already seen?
-			 */
-			if (size_waste > waste || firsttime)
-			{
-				waste = size_waste;
-				seed_1 = i;
-				seed_2 = j;
-				firsttime = false;
-			}
-		}
+		midpoint = (datum_alpha->lower + datum_alpha->upper) / 2;
+		midpointsum += midpoint;
+		if (firsttime || (midpoint < lowest))
+		{
+			lowest = midpoint;
+		}
+		if (firsttime || (midpoint > highest))
+		{
+

Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Glenn Maynard
On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost sfr...@snowman.net wrote:
 Yes, as I believe was mentioned already, planning time for inserts is
 really small.  Parsing time for inserts when there's little parsing that
 has to happen also isn't all *that* expensive and the same goes for
 conversions from textual representations of data to binary.

 We're starting to re-hash things, in my view.  The low-hanging fruit is
 doing multiple things in a single transaction, either by using COPY,
 multi-value INSERTs, or just multiple INSERTs in a single transaction.
 That's absolutely step one.

This is all well-known, covered information, but perhaps some numbers
will help drive this home.  40,000 inserts into a single-column,
unindexed table; with predictable results:

separate inserts, no transaction: 21.21s
separate inserts, same transaction: 1.89s
40 inserts, 100 rows/insert: 0.18s
one 40,000-value insert: 0.16s
40 prepared inserts, 100 rows/insert: 0.15s
COPY (text): 0.10s
COPY (binary): 0.10s

Of course, real workloads will change the weights, but this is more or
less the magnitude of difference I always see--batch your inserts into
single statements, and if that's not enough, skip to COPY.
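
For reference, the multi-row and prepared variants above look roughly
like this, scaled down to 5 values per statement against a hypothetical
one-column table t(n int):

-- plain multi-row INSERT
INSERT INTO t (n) VALUES (1), (2), (3), (4), (5);

-- prepared multi-row INSERT, executed repeatedly
PREPARE ins (int, int, int, int, int) AS
    INSERT INTO t (n) VALUES ($1), ($2), ($3), ($4), ($5);
EXECUTE ins (1, 2, 3, 4, 5);
EXECUTE ins (6, 7, 8, 9, 10);
DEALLOCATE ins;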

-- 
Glenn Maynard



Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread david

On Wed, 22 Apr 2009, Glenn Maynard wrote:


On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost sfr...@snowman.net wrote:

Yes, as I believe was mentioned already, planning time for inserts is
really small.  Parsing time for inserts when there's little parsing that
has to happen also isn't all *that* expensive and the same goes for
conversions from textual representations of data to binary.

We're starting to re-hash things, in my view.  The low-hanging fruit is
doing multiple things in a single transaction, either by using COPY,
multi-value INSERTs, or just multiple INSERTs in a single transaction.
That's absolutely step one.


This is all well-known, covered information, but perhaps some numbers
will help drive this home.  40,000 inserts into a single-column,
unindexed table; with predictable results:

separate inserts, no transaction: 21.21s
separate inserts, same transaction: 1.89s


are these done as separate round trips?

i.e.
begin send
insert send
insert send
..
end send

or as one round trip?

i.e.
begin;insert;insert..;end


40 inserts, 100 rows/insert: 0.18s
one 40,000-value insert: 0.16s
40 prepared inserts, 100 rows/insert: 0.15s


are one of these missing a 0?


COPY (text): 0.10s
COPY (binary): 0.10s

Of course, real workloads will change the weights, but this is more or
less the magnitude of difference I always see--batch your inserts into
single statements, and if that's not enough, skip to COPY.


thanks for this information, this is exactly what I was looking for.

can this get stored somewhere for reference?

David Lang


Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Stephen Frost
Glenn,

* Glenn Maynard (glennfmayn...@gmail.com) wrote:
 This is all well-known, covered information, but perhaps some numbers
 will help drive this home.  40,000 inserts into a single-column,
 unindexed table; with predictable results:

Thanks for doing the work.  I had been intending to but hadn't gotten to
it yet.

 separate inserts, no transaction: 21.21s
 separate inserts, same transaction: 1.89s
 40 inserts, 100 rows/insert: 0.18s
 one 40,000-value insert: 0.16s
 40 prepared inserts, 100 rows/insert: 0.15s
 COPY (text): 0.10s
 COPY (binary): 0.10s

What about 40,000 individual prepared inserts?  Just curious about it.

Also, kind of surprised about COPY text vs. binary.  What was the data
type in the table..?  If text, that makes sense; if it was an integer or
something else, I'm kind of surprised.

Thanks,

Stephen




Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Glenn Maynard (glennfmayn...@gmail.com) wrote:
 separate inserts, no transaction: 21.21s
 separate inserts, same transaction: 1.89s
 40 inserts, 100 rows/insert: 0.18s
 one 40,000-value insert: 0.16s
 40 prepared inserts, 100 rows/insert: 0.15s
 COPY (text): 0.10s
 COPY (binary): 0.10s

 What about 40,000 individual prepared inserts?  Just curious about it.

Also, just to be clear: were the 40 insert cases 40 separate
transactions or one transaction?  I think the latter was meant but
it's not 100% clear.

regards, tom lane



Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread James Mansion

Stephen Frost wrote:

You're re-hashing things I've already said.  The big win is batching the
inserts, however that's done, into fewer transactions.  Sure, multi-row
inserts could be used to do that, but so could dropping begin/commits in
right now which probably takes even less effort.
  
Well, I think you are seriously underestimating the cost of the round-trip
compared to all the other effects (possibly bar the commits).  When I tested
the union insert technique on SQL Server and Sybase I got measurable
improvements going from 100-row statements to 200-row statements, though I
suspect in that case the per-statement overheads are quite high.  I expected
improvements from 10- to 20-row batches, but it carried on getting better
for a long time after that.  The Sybase parser runs out of workspace first.



No, as was pointed out previously already, you really just need 2.  A
  
And I'm disagreeing with that.  Single row is a given, but I think 
you'll find it pays to have one
round trip if at all possible and invoking multiple prepared statements 
can work against this.



see if there's really much of a performance difference between a
50-insert prepared statement, and 50 1-insert prepared statements.  If
they're both done in larger transactions, I don't know that there's
really a lot of performance difference.
  
I think you'll be surprised, but the only way is to test it.  And also the
simple 50-row single insert as text.  See if you can measure the difference
between that and the prepared statement.

storage overhead?  indexing overhead?  We're talking about prepared
statements here, what additional storage requirement do you think those
would impose?  What additional indexing overhead?  I don't believe we
actually do anything differently between prepared statements and
multi-row inserts that would change either of those.
  
That's my point.  You will brickwall on the actual database operations for
execution early enough that the efficiency difference between
parse-and-execute and prepared statements will be hard to measure - at least
if you have multi-row statements.


But this really needs testing and timing.


Ah, latency is a reasonable thing to bring up.  Of course, if you want
to talk about latency then you get to consider that multi-insert SQL
will inherently have larger packet sizes which could cause them to be
delayed in some QoS arrangements.
  
No, I mean latency from round trips from the client to the server process.
I don't know why you think I'd mean that.

As I said, most of this is a re-hash of things already said.  The
low-hanging fruit here is doing multiple inserts inside of a
transaction, rather than 1 insert per transaction.  Regardless of how
that's done, it's going to give the best bang-for-buck.  It will
complicate the client code some, regardless of how it's implemented, so
that failures are handled gracefully (if that's something you care about
anyway), but as there exists some queueing mechanisms in rsyslog
already, hopefully it won't be too bad.
  

I think you have largely missed the point. There are two things here:
1) how many rows per commit
2) how many rows per logical RPC (ie round trip) between the client
  and server processes

We are agreed that the first is a Very Big Deal, but you seem resistant to
the idea that the second of these is a big deal once you've dealt with 
the former.


My experience has been that it's much more important than any benefits of
preparing statements etc., particularly if the use of a prepared statement
can make it harder to do multi-row RPCs because the protocol doesn't allow
pipelining (at least without things getting very hairy).

Clearly 'copy' is your friend for this too, at least potentially (even if it
means streaming to a staging table).
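
A rough sketch of that staging-table variant (table and column names are
made up):

-- stream each batch into an unindexed staging table, e.g. from psql:
\copy log_staging (logged_at, host, msg) from 'batch.tsv'

-- then fold it into the real table in one statement and clear the stage:
INSERT INTO log (logged_at, host, msg)
SELECT logged_at, host, msg FROM log_staging;
TRUNCATE log_staging;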


James




Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Glenn Maynard
On Wed, Apr 22, 2009 at 4:07 PM,  da...@lang.hm wrote:
 are these done as separate round trips?

 i.e.
 begin send
 insert send
 insert send
 ..
 end send

 or as one round trip?

All tests were done by constructing a file and using "time psql -f ...".

 40 inserts, 100 rows/insert: 0.18s
 one 40,000-value insert: 0.16s
 40 prepared inserts, 100 rows/insert: 0.15s

 are one of these missing a 0?

Sorry, 400 * 100.  All cases inserted 40,000 rows, and I deleted all
rows between tests (but did not recreate the table).

-- 
Glenn Maynard



Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Joshua D. Drake
On Wed, 2009-04-22 at 21:53 +0100, James Mansion wrote:
 Stephen Frost wrote:
  You're re-hashing things I've already said.  The big win is batching the
  inserts, however that's done, into fewer transactions.  Sure, multi-row
  inserts could be used to do that, but so could dropping begin/commits in
  right now which probably takes even less effort.

 Well, I think you are seriously underestimating the cost of the 
 round-trip compared

The breakdown is this:

1. Eliminate single inserts
2. Eliminate round trips

Yes round trips are hugely expensive. 

 
  No, as was pointed out previously already, you really just need 2.  A

 And I'm disagreeing with that.  Single row is a given, but I think 
 you'll find it pays to have one

My experience shows that you are correct. Even if you do a single BEGIN;
with 1000 inserts you are still getting a round trip for every insert
until you commit. Based on a 20 ms round-trip time, you are talking about
20 seconds of additional overhead.

Joshua D. Drake


-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Glenn Maynard
On Wed, Apr 22, 2009 at 4:37 PM, Stephen Frost sfr...@snowman.net wrote:
 Thanks for doing the work.  I had been intending to but hadn't gotten to
 it yet.

I'd done similar tests recently, for some batch import code, so it was
just a matter of recreating it.

 separate inserts, no transaction: 21.21s
 separate inserts, same transaction: 1.89s
 40 inserts, 100 rows/insert: 0.18s
  one 40,000-value insert: 0.16s
 40 prepared inserts, 100 rows/insert: 0.15s
 COPY (text): 0.10s
 COPY (binary): 0.10s

  What about 40,000 individual prepared inserts?  Just curious about it.

40,000 inserts, one prepared statement each (constructing the prepared
statement only once), in a single transaction: 1.68s

I'm surprised that there's any win here at all.

 Also, kind of surprised about COPY text vs. binary.  What was the data
 type in the table..?  If text, that makes sense; if it was an integer or
 something else, I'm kind of surprised.

Each row had one integer column.  I expect strings to be harder to
parse, since it's allocating buffers and parsing escapes, which is
usually more expensive than parsing an integer out of a string.  I'd
expect the difference to be negligible either way, though, and I'd be
interested in hearing about use cases where binary copying is enough
of a win to be worth the development cost of maintaining the feature.

On Wed, Apr 22, 2009 at 4:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Also, just to be clear: were the 40 insert cases 40 separate
 transactions or one transaction?  I think the latter was meant but
 it's not 100% clear.

One transaction--the only case where I ran more than one transaction
was the first.

-- 
Glenn Maynard



Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Glenn Maynard
On Wed, Apr 22, 2009 at 4:53 PM, James Mansion
ja...@mansionfamily.plus.com wrote:
 And I'm disagreeing with that.  Single row is a given, but I think you'll
 find it pays to have one
 round trip if at all possible and invoking multiple prepared statements can
 work against this.

You're talking about round-trips to a *local* server, on the same
system, not a dedicated server with network round-trips, right?

Blocking round trips to another process on the same server should be
fairly cheap--that is, writing to a socket (or pipe, or localhost TCP
connection) where the other side is listening for it; and then
blocking in return for the response.  The act of writing to an FD that
another process is waiting for will make the kernel mark the process
as ready to wake up immediately, and the act of blocking for the
response will kick the scheduler to some waiting process, so as long
as there isn't something else to compete for CPU for, each write/read
will wake up the other process instantly.  There's a task switching
cost, but that's too small to be relevant here.

Doing 1,000,000 local round trips over a pipe: 5.25s (about 5 *microseconds*
each), code attached.  The cost *should* be essentially identical for
any local transport (pipes, named pipes, local TCP connections), since
the underlying scheduler mechanisms are the same.

That's not to say that batching inserts doesn't make a difference--it
clearly does, and it would probably be a much larger difference with
actual network round-trips--but round-trips to a local server aren't
inherently slow.  I'd be (casually) interested in knowing what the
actual costs are behind an SQL command round-trip (where the command
isn't blocking on I/O, eg. an INSERT inside a transaction, with no I/O
for things like constraint checks that need to be done before the
command can return success).

-- 
Glenn Maynard
#include <stdio.h>
#include <unistd.h>
#include <assert.h>

int main(void)
{
    int parent[2];
    int ret = pipe(parent);
    assert(ret != -1);

    int child[2];
    ret = pipe(child);
    assert(ret != -1);

    int pid = fork();
    assert(pid != -1);

    if(pid != 0)
    {
        // parent: write a byte, then block for the child's reply
        close(parent[0]);
        close(child[1]);
        int wfd = parent[1];
        int rfd = child[0];

        printf("go\n");
        int i = 1000000;
        while(i--)
        {
            char c = 1;
            ret = write(wfd, &c, 1);
            assert(ret == 1);
            ret = read(rfd, &c, 1);
            assert(ret == 1);
        }
        printf("done\n");
    }
    else
    {
        // child: echo each byte back to the parent
        close(parent[1]);
        close(child[0]);
        int wfd = child[1];
        int rfd = parent[0];
        int i = 1000000;
        while(i--)
        {
            char c = 1;
            ret = read(rfd, &c, 1);
            assert(ret == 1);
            ret = write(wfd, &c, 1);
            assert(ret == 1);
        }
    }

    return 0;
}




Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Stephen Frost
* Glenn Maynard (glennfmayn...@gmail.com) wrote:
  separate inserts, no transaction: 21.21s
  separate inserts, same transaction: 1.89s
  40 inserts, 100 rows/insert: 0.18s
  one 40,000-value insert: 0.16s
  40 prepared inserts, 100 rows/insert: 0.15s
  COPY (text): 0.10s
  COPY (binary): 0.10s
 
  What about 40,000 individual prepared inserts?  Just curious about it.
 
 40,000 inserts, one prepared statement each (constructing the prepared
 statement only once), in a single transaction: 1.68s
 
 I'm surprised that there's any win here at all.

For a single column table, I wouldn't expect much either.  With more
columns I think it would be a larger improvement.

 Each row had one integer column.  I expect strings to be harder to
 parse, since it's allocating buffers and parsing escapes, which is
 usually more expensive than parsing an integer out of a string.  I'd
 expect the difference to be negligible either way, though, and I'd be
 interested in hearing about use cases where binary copying is enough
 of a win to be worth the development cost of maintaining the feature.

I've seen it help, but I was sending everything as binary (I figured,
once I'm doing it, might as well do it all), which included dates,
timestamps, IP addresses, integers, and some text.  It may have more of
an impact on dates and timestamps than on simple integers.

Thanks!

Stephen




Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Glenn Maynard
On Wed, Apr 22, 2009 at 5:51 PM, Stephen Frost sfr...@snowman.net wrote:
 For a single column table, I wouldn't expect much either.  With more
 columns I think it would be a larger improvement.

Maybe.  I'm not sure why parsing (1,2,3,4,5) in an EXECUTE parameter
should be faster than parsing the exact same thing in an INSERT,
though.

 I've seen it help, but I was sending everything as binary (I figured,
 once I'm doing it, might as well do it all), which included dates,
 timestamps, IP addresses, integers, and some text.  It may have more of
 an impact on dates and timestamps than on simple integers.

Of course, you still need to get it in that format.  Be careful to
include any parsing you're doing to create the binary date in the
benchmarks.  Inevitably, at least part of the difference will be costs
simply moving from the psql process to your own.

-- 
Glenn Maynard



Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread david

On Wed, 22 Apr 2009, Glenn Maynard wrote:


On Wed, Apr 22, 2009 at 4:53 PM, James Mansion
ja...@mansionfamily.plus.com wrote:

And I'm disagreeing with that.  Single row is a given, but I think you'll
find it pays to have one
round trip if at all possible and invoking multiple prepared statements can
work against this.


You're talking about round-trips to a *local* server, on the same
system, not a dedicated server with network round-trips, right?


the use-case for a production setup for logging servers would probably 
include a network hop.


David Lang


Blocking round trips to another process on the same server should be
fairly cheap--that is, writing to a socket (or pipe, or localhost TCP
connection) where the other side is listening for it; and then
blocking in return for the response.  The act of writing to an FD that
another process is waiting for will make the kernel mark the process
as ready to wake up immediately, and the act of blocking for the
response will kick the scheduler to some waiting process, so as long
as there isn't something else to compete for CPU for, each write/read
will wake up the other process instantly.  There's a task switching
cost, but that's too small to be relevant here.

Doing 1,000,000 local round trips over a pipe: 5.25s (about 5 *microseconds*
each), code attached.  The cost *should* be essentially identical for
any local transport (pipes, named pipes, local TCP connections), since
the underlying scheduler mechanisms are the same.

That's not to say that batching inserts doesn't make a difference--it
clearly does, and it would probably be a much larger difference with
actual network round-trips--but round-trips to a local server aren't
inherently slow.  I'd be (casually) interested in knowing what the
actual costs are behind an SQL command round-trip (where the command
isn't blocking on I/O, eg. an INSERT inside a transaction, with no I/O
for things like constraint checks that need to be done before the
command can return success).





Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Stephen Frost
* Glenn Maynard (glennfmayn...@gmail.com) wrote:
 On Wed, Apr 22, 2009 at 5:51 PM, Stephen Frost sfr...@snowman.net wrote:
  For a single column table, I wouldn't expect much either.  With more
  columns I think it would be a larger improvement.
 
 Maybe.  I'm not sure why parsing (1,2,3,4,5) in an EXECUTE parameter
 should be faster than parsing the exact same thing in an INSERT,
 though.

Erm..  Prepared queries are about using PQexecPrepared(), not about
sending a text string as an SQL EXECUTE().  PQexecPrepared takes an
array of arguments.  That gets translated into a Bind command in the
protocol with a defined number of parameters and a length for each
parameter being passed.  That removes any need for scanning/parsing the
string sent to the backend.  That's the savings I'm referring to.

If you weren't using PQexecPrepared() (and using psql, you wouldn't
be..), then the difference you saw was more likely planning cost.

 Of course, you still need to get it in that format.  Be careful to
 include any parsing you're doing to create the binary date in the
 benchmarks.  Inevitably, at least part of the difference will be costs
 simply moving from the psql process to your own.

Sure.  What I recall from when I was working on this is that it wasn't
terribly hard to go from unix timestamps (epoch from 1970) to a PG
timestamp format (and there was nice example code in the backend) in
terms of CPU time.

Thanks,

Stephen




Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Stephen Frost
* da...@lang.hm (da...@lang.hm) wrote:
 On Wed, 22 Apr 2009, Glenn Maynard wrote:
 You're talking about round-trips to a *local* server, on the same
 system, not a dedicated server with network round-trips, right?

 the use-case for a production setup for logging servers would probably  
 include a network hop.

Sure, but there's a big difference between an rtt of 0.041ms (my dinky
home network) and 20ms (from my home network in Virginia to Boston).  I
wasn't intending to discount latency, it can be a concern, but I doubt
it'll be 20ms for most people..

Thanks,

Stephen




Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread david

On Wed, 22 Apr 2009, Stephen Frost wrote:


* Glenn Maynard (glennfmayn...@gmail.com) wrote:

On Wed, Apr 22, 2009 at 5:51 PM, Stephen Frost sfr...@snowman.net wrote:

For a single column table, I wouldn't expect much either.  With more
columns I think it would be a larger improvement.


Maybe.  I'm not sure why parsing (1,2,3,4,5) in an EXECUTE parameter
should be faster than parsing the exact same thing in an INSERT,
though.


 Erm..  Prepared queries are about using PQexecPrepared(), not about
sending a text string as an SQL EXECUTE().  PQexecPrepared takes an
array of arguments.  That gets translated into a Bind command in the
protocol with a defined number of parameters and a length for each
parameter being passed.  That removes any need for scanning/parsing the
string sent to the backend.  That's the savings I'm referring to.


are you sure? I thought that what goes out over the wire is always text.

David Lang


If you weren't using PQexecPrepared() (and using psql, you wouldn't
be..), then the difference you saw was more likely planning cost.


Of course, you still need to get it in that format.  Be careful to
include any parsing you're doing to create the binary date in the
benchmarks.  Inevitably, at least part of the difference will be costs
simply moving from the psql process to your own.


Sure.  What I recall from when I was working on this is that it wasn't
terribly hard to go from unix timestamps (epoch from 1970) to a PG
timestamp format (and there was nice example code in the backend) in
terms of CPU time.

Thanks,

Stephen

