Re: [PATCHES] Better default_statistics_target

2007-12-05 Thread Greg Sabino Mullane


Simon spoke:
> The choice of 100 is because of the way the LIKE estimator is
> configured. Greg is not suggesting he measured it and found 100 to be
> best, he is saying that the LIKE operator is hard-coded at 100 and so
> the stats_target should reflect that.

Exactly.

> Setting it to 100 for all columns because of LIKE doesn't make much
> sense. I think we should set stats target differently depending upon the
> data type, but that's probably an 8.4 thing. Long text fields that might
> use LIKE should be set to 100. CHAR(1) and general fields should be set
> to 10.

Agreed, this would be a nice 8.4 thing. But what about 8.3 and 8.2? Is 
there a reason not to make this change? I know I've been lazy and not run 
any absolute figures, but rough tests show that raising it (from 10 to 
100) results in a very minor increase in analyze time, even for large 
databases. I think the burden of a slightly slower analyze time, which 
can be easily adjusted, both in postgresql.conf and right before running 
an analyze, is very small compared to the pain of some queries - which worked 
before - suddenly running much, much slower for no apparent reason at all.
Sure, 100 may have been chosen somewhat arbitrarily for the LIKE thing, 
but this is a current real-world performance regression (aka a bug, 
according to a nearby thread). Almost everyone agrees that 10 is too low, 
so why not make it 100, throw a big warning in the release notes, and 
then start some serious re-evaluation for 8.4?


--
Greg Sabino Mullane [EMAIL PROTECTED]
End Point Corporation
PGP Key: 0x14964AC8 200712050920
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] Better default_statistics_target

2007-12-05 Thread Guillaume Smet
On Dec 5, 2007 3:26 PM, Greg Sabino Mullane [EMAIL PROTECTED] wrote:
> Agreed, this would be a nice 8.4 thing. But what about 8.3 and 8.2? Is
> there a reason not to make this change? I know I've been lazy and not run
> any absolute figures, but rough tests show that raising it (from 10 to
> 100) results in a very minor increase in analyze time, even for large
> databases. I think the burden of a slightly slower analyze time, which
> can be easily adjusted, both in postgresql.conf and right before running
> an analyze, is very small compared to the pain of some queries - which worked
> before - suddenly running much, much slower for no apparent reason at all.

As Tom stated earlier, the ANALYZE slowdown is far from the only
consequence. The planner will also have more work to do, and
that's the hard point IMHO.

Without studying the impacts of this change on a large set of queries
in different cases, it's quite hard to know for sure that it won't
have a negative impact in a lot of cases.

It's a bit too late in the cycle to change that IMHO, especially
without any numbers.

--
Guillaume



[PATCHES] PQParam version 0.5

2007-12-05 Thread Andrew Chernow
Here is the latest pgparam patch.  It is diffed against a fresh 
checkout from 2007-12-05.


This release adds support for printf-style param puts/execs.  Instead of 
having to do a PQputX for each param, you can use a format string and 
put multiple params.  PQputf(), PQparamExecf() and PQparamSendf() 
support this.  See release notes and conversion specifiers for details.


Also changed PQputint8's prototype.  Previously, it used a void* as 
the value argument, due to the lack of a portable 64-bit type in libpq. 
We found an interesting way around this using macro and variable 
argument tricks.


Andrew  Merlin


pgparam-0.5-patch.tgz
Description: application/compressed



Re: [PATCHES] Better default_statistics_target

2007-12-05 Thread Chris Browne
[EMAIL PROTECTED] (Guillaume Smet) writes:
> On Dec 5, 2007 3:26 PM, Greg Sabino Mullane [EMAIL PROTECTED] wrote:
>> Agreed, this would be a nice 8.4 thing. But what about 8.3 and 8.2? Is
>> there a reason not to make this change? I know I've been lazy and not run
>> any absolute figures, but rough tests show that raising it (from 10 to
>> 100) results in a very minor increase in analyze time, even for large
>> databases. I think the burden of a slightly slower analyze time, which
>> can be easily adjusted, both in postgresql.conf and right before running
>> an analyze, is very small compared to the pain of some queries - which worked
>> before - suddenly running much, much slower for no apparent reason at all.
>
> As Tom stated it earlier, the ANALYZE slow down is far from being the
> only consequence. The planner will also have more work to do and
> that's the hard point IMHO.
>
> Without studying the impacts of this change on a large set of queries
> in different cases, it's quite hard to know for sure that it won't
> have a negative impact in a lot of cases.
>
> It's a bit too late in the cycle to change that IMHO, especially
> without any numbers.

I have the theory (thus far not borne out by any numbers) that it
might be a useful approach to go through the DB schema and use the
information there to come up with better numbers on a per-column
basis.

As a first order perspective on things:

 - Any columns marked unique could keep somewhat smaller numbers of
   bins in the histogram, because we know that uniqueness will keep
   values dispersed at least somewhat.

   Ditto for SERIAL types.

 - Columns NOT marked unique should imply adding some bins to the
   histogram.

 - Datestamps tend to imply temporal dispersion, ergo somewhat fewer
   bins.  Similar for floats.

 - Discrete values (integer, text) frequently see less dispersion,
   so more bins.

Then could come a second order perspective, where data would
actually get sampled from pg_statistics.

 - If we look at the number of distinct histogram bins used, for a
   particular column, and find that there are some not used, we might
   drop bins.

 - We might try doing some summary statistics to see how many unique
   values there actually are, on each column, and increase the number of
   bins if they're all in use, and there are other values that *are*
   frequently used.

   Maybe cheaper, if we find that pg_statistics tells us that all bins
   are in use, and extrapolation shows that there's a lot of the table
   NOT represented, we increase the number of bins.

There might even be a third order analysis, where you'd try to
collect additional data from the table, and analytically try to
determine appropriate numbers of bins...

Thus, we don't have a universal increase in the amount of statistics
collected - the added stats are localized to places where there is
some reason to imagine them useful.
-- 
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/nonrdbms.html
There was a young lady of Crewe
Whose limericks stopped at line two. 



Re: [PATCHES] PQParam version 0.5

2007-12-05 Thread Merlin Moncure
On Dec 5, 2007 2:44 PM, Alvaro Herrera [EMAIL PROTECTED] wrote:
> Andrew Chernow wrote:
>
>> Also changed PQputint8's prototype.  Previously, it was using a void* as
>> the value argument, due to a lack of a portable 64-bit type in libpq. We
>> found an interesting way around this by using macro and variable argument
>> tricks.
>
> I didn't read the patch, but variadic macros are not portable.  FWIW
> uint64 should be portable to all platforms that have it (and it should be
> 32 bits on platforms that don't), but you have to watch for
> INT64_IS_BUSTED.

We don't use variadic macros, just a macro wrapper around a variadic function.

merlin



Re: [PATCHES] PQParam version 0.5

2007-12-05 Thread Alvaro Herrera
Andrew Chernow wrote:

> Also changed PQputint8's prototype.  Previously, it was using a void* as 
> the value argument, due to a lack of a portable 64-bit type in libpq. We 
> found an interesting way around this by using macro and variable argument 
> tricks.

I didn't read the patch, but variadic macros are not portable.  FWIW
uint64 should be portable to all platforms that have it (and it should be
32 bits on platforms that don't), but you have to watch for
INT64_IS_BUSTED.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell. (L. Torvalds)



Re: [PATCHES] PQParam version 0.5

2007-12-05 Thread Andrew Chernow

Merlin Moncure wrote:

> On Dec 5, 2007 2:44 PM, Alvaro Herrera [EMAIL PROTECTED] wrote:
>> Andrew Chernow wrote:
>>
>>> Also changed PQputint8's prototype.  Previously, it was using a void* as
>>> the value argument, due to a lack of a portable 64-bit type in libpq. We
>>> found an interesting way around this by using macro and variable argument
>>> tricks.
>>
>> I didn't read the patch, but variadic macros are not portable.  FWIW
>> uint64 should be portable to all platforms that have it (and it should be
>> 32 bits on platforms that don't), but you have to watch for
>> INT64_IS_BUSTED.
>
> We don't use variadic macros, just a macro wrapper around a variadic function.
>
> merlin


Taken from libpq-fe.h

#define PQputint8(conn, i8) PQputint8v(conn, sizeof(i8), i8)
/* Function subject to change. Do not use directly, see PQputint8. */
extern int PQputint8v(PGconn *conn, size_t valsize, ...);

/* goal: pass by value rather than by pointer (pass-by-pointer was our
   first solution) */
PQputint8(conn, 12345678912345LL);

The problem is that libpq has no public 64-bit data type to use in the 
PQputint8 prototype.  But if we make PQputint8 a macro that wraps a 
variadic function, we get around the data type issue.


Since libpq doesn't have a public portable 64-bit data type, we figured 
that was done for a good reason, and we didn't want to break that convention.


andrew



Re: [PATCHES] Better default_statistics_target

2007-12-05 Thread Gregory Stark

Chris Browne [EMAIL PROTECTED] writes:

>  - Any columns marked unique could keep to having somewhat smaller
>    numbers of bins in the histogram because we know that uniqueness
>    will keep values dispersed at least somewhat.

I think you're on the wrong track. It's not dispersal that's significant but
how evenly the values are dispersed. If the values are evenly spread
throughout the region from low to high bound then we just need the single
bucket telling us the low and high bound and how many values there are. If
they're unevenly distributed then we need enough buckets to be able to
distinguish the dense areas from the sparse areas.

Perhaps something like starting with one bucket, splitting it into two,
and checking whether the two distributions are similar, in which case we
stop; if not, repeat for each bucket.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



[PATCHES] Wrong result with pgbench -C option?

2007-12-05 Thread Yoshiyuki Asaba
Hi,

I ran pgbench with the -C option. Here is the output.

% pgbench -C  -c 10 -t 100  bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 83.209663 (including connections establishing)
tps = 83.209663 (excluding connections establishing)

The first tps equals the second tps. I think that is wrong, because
pgbench with the -C option opens a new connection for each transaction,
so connection time should be excluded from the second figure.

The attached patch accumulates the total connection time. Here is the
output from the revised pgbench.

% ./pgbench -C  -c 10 -t 100  bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 108.410991 (including connections establishing)
tps = 228.657561 (excluding connections establishing)

Regards,
--
Yoshiyuki Asaba
[EMAIL PROTECTED]
Index: pgbench.c
===
RCS file: /projects/cvsroot/pgsql/contrib/pgbench/pgbench.c,v
retrieving revision 1.74
diff -c -r1.74 pgbench.c
*** pgbench.c   15 Nov 2007 21:14:31 -  1.74
--- pgbench.c   6 Dec 2007 03:04:35 -
***
*** 185,190 
--- 185,223 
"SELECT abalance FROM accounts WHERE aid = :aid;\n"
  };
  
+ /* Connection overhead time */
+ static struct timeval conn_total_time = {0, 0};
+ 
+ /* Calculate total time */
+ static void
+ addTime(struct timeval *t1, struct timeval *t2, struct timeval *result)
+ {
+ 	int sec = t1->tv_sec + t2->tv_sec;
+ 	int usec = t1->tv_usec + t2->tv_usec;
+ 	if (usec >= 1000000)
+ 	{
+ 		usec -= 1000000;
+ 		sec++;
+ 	}
+ 	result->tv_sec = sec;
+ 	result->tv_usec = usec;
+ }
+ 
+ /* Calculate time difference */
+ static void
+ diffTime(struct timeval *t1, struct timeval *t2, struct timeval *result)
+ {
+ 	int sec = t1->tv_sec - t2->tv_sec;
+ 	int usec = t1->tv_usec - t2->tv_usec;
+ 	if (usec < 0)
+ 	{
+ 		usec += 1000000;
+ 		sec--;
+ 	}
+ 	result->tv_sec = sec;
+ 	result->tv_usec = usec;
+ }
+ 
  static void
  usage(void)
  {
***
*** 543,548 
--- 576,584 
  
  	if (st->con == NULL)
  	{
+ 		struct timeval t1, t2, t3;
+ 
+ 		gettimeofday(&t1, NULL);
  		if ((st->con = doConnect()) == NULL)
  		{
  			fprintf(stderr, "Client %d aborted in establishing connection.\n",
***
*** 552,557 
--- 588,596 
  			st->con = NULL;
  			return;
  		}
+ 		gettimeofday(&t2, NULL);
+ 		diffTime(&t2, &t1, &t3);
+ 		addTime(&conn_total_time, &t3, &conn_total_time);
}
  
if (use_log  st-state == 0)
***
*** 1162,1169 
  static void
  printResults(
 int ttype, CState * state,
!struct timeval * tv1, struct timeval * tv2,
!struct timeval * tv3)
  {
double  t1,
t2;
--- 1201,1207 
  static void
  printResults(
 int ttype, CState * state,
!struct timeval * start_time, struct timeval * end_time)
  {
double  t1,
t2;
***
*** 1174,1183 
  	for (i = 0; i < nclients; i++)
  		normal_xacts += state[i].cnt;
  
! 	t1 = (tv3->tv_sec - tv1->tv_sec) * 1000000.0 + (tv3->tv_usec - tv1->tv_usec);
  	t1 = normal_xacts * 1000000.0 / t1;
  
! 	t2 = (tv3->tv_sec - tv2->tv_sec) * 1000000.0 + (tv3->tv_usec - tv2->tv_usec);
  	t2 = normal_xacts * 1000000.0 / t2;
  
  	if (ttype == 0)
--- 1212,1222 
  	for (i = 0; i < nclients; i++)
  		normal_xacts += state[i].cnt;
  
! 	t1 = (end_time->tv_sec - start_time->tv_sec) * 1000000.0 + (end_time->tv_usec - start_time->tv_usec);
  	t1 = normal_xacts * 1000000.0 / t1;
  
! 	t2 = (end_time->tv_sec - start_time->tv_sec - conn_total_time.tv_sec) * 1000000.0 +
! 		(end_time->tv_usec - start_time->tv_usec - conn_total_time.tv_usec);
  	t2 = normal_xacts * 1000000.0 / t2;
  
if (ttype == 0)
***
*** 1213,1222 
  
CState *state;  /* status of clients */
  
! 	struct timeval tv1;	/* start up time */
! 	struct timeval tv2;	/* after establishing all connections to the
! 						 * backend */
! 	struct timeval tv3;	/* end time */
  
int i;
  
--- 1252,1259 
  
CState *state;  /* status of clients */
  
!   struct timeval start_time;