Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2013-01-29 Thread Heikki Linnakangas

On 28.01.2013 23:30, Gurjeet Singh wrote:

On Sat, Jan 26, 2013 at 11:24 PM, Satoshi Nagayasu sn...@uptime.jp wrote:


2012/12/21 Gurjeet Singh singh.gurj...@gmail.com:

 The patch is very much what you had posted, except for a couple of
 differences due to bit-rot. (i) I didn't have to #define MAX_RANDOM_VALUE64
 since its cousin MAX_RANDOM_VALUE is not used by code anymore, and (ii) I
 used ternary operator in DDLs[] array to decide when to use bigint vs int
 columns.

 Please review.

 As for tests, I am currently running 'pgbench -i -s 21474' using
 unpatched pgbench, and am recording the time taken. Scale factor 21475 had
 actually failed to do anything meaningful using unpatched pgbench. Next I'll
 run with '-s 21475' on patched version to see if it does the right thing,
 and in acceptable time compared to '-s 21474'.

 What tests would you and others like to see, to get some confidence in
 the patch? The machine that I have access to has 62 GB RAM, 16-core
 64-hw-threads, and about 900 GB of disk space.


I have tested this patch, and have confirmed that the columns
for aid would be switched to using bigint, instead of int,
when the scale factor >= 20,000.
(aid columns would exceed the upper bound of int when sf > 21474.)

Also, I added a few fixes on it.

- Fixed to apply for the current git master.
- Fixed to suppress a few more warnings about INT64_FORMAT.
- Minor improvement in the docs. (just my suggestion)

I attached the revised one.


Looks good to me. Thanks!


Ok, committed.

At some point, we might want to have a strtoll() implementation in src/port.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2013-01-28 Thread Gurjeet Singh
On Sat, Jan 26, 2013 at 11:24 PM, Satoshi Nagayasu sn...@uptime.jp wrote:

 Hi,

 I have reviewed this patch.

 https://commitfest.postgresql.org/action/patch_view?id=1068

 2012/12/21 Gurjeet Singh singh.gurj...@gmail.com:
  The patch is very much what you had posted, except for a couple of
  differences due to bit-rot. (i) I didn't have to #define MAX_RANDOM_VALUE64
  since its cousin MAX_RANDOM_VALUE is not used by code anymore, and (ii) I
  used ternary operator in DDLs[] array to decide when to use bigint vs int
  columns.

  Please review.

  As for tests, I am currently running 'pgbench -i -s 21474' using
  unpatched pgbench, and am recording the time taken. Scale factor 21475 had
  actually failed to do anything meaningful using unpatched pgbench. Next I'll
  run with '-s 21475' on patched version to see if it does the right thing,
  and in acceptable time compared to '-s 21474'.

  What tests would you and others like to see, to get some confidence in
  the patch? The machine that I have access to has 62 GB RAM, 16-core
  64-hw-threads, and about 900 GB of disk space.

 I have tested this patch, and have confirmed that the columns
 for aid would be switched to using bigint, instead of int,
 when the scale factor >= 20,000.
 (aid columns would exceed the upper bound of int when sf > 21474.)

 Also, I added a few fixes on it.

 - Fixed to apply for the current git master.
 - Fixed to suppress a few more warnings about INT64_FORMAT.
 - Minor improvement in the docs. (just my suggestion)

 I attached the revised one.


Looks good to me. Thanks!

-- 
Gurjeet Singh

http://gurjeet.singh.im/


Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2013-01-26 Thread Satoshi Nagayasu
Hi,

I have reviewed this patch.

https://commitfest.postgresql.org/action/patch_view?id=1068

2012/12/21 Gurjeet Singh singh.gurj...@gmail.com:
 The patch is very much what you had posted, except for a couple of
 differences due to bit-rot. (i) I didn't have to #define MAX_RANDOM_VALUE64
 since its cousin MAX_RANDOM_VALUE is not used by code anymore, and (ii) I
 used ternary operator in DDLs[] array to decide when to use bigint vs int
 columns.

 Please review.

 As for tests, I am currently running 'pgbench -i -s 21474' using
 unpatched pgbench, and am recording the time taken. Scale factor 21475 had
 actually failed to do anything meaningful using unpatched pgbench. Next I'll
 run with '-s 21475' on patched version to see if it does the right thing,
 and in acceptable time compared to '-s 21474'.

 What tests would you and others like to see, to get some confidence in
 the patch? The machine that I have access to has 62 GB RAM, 16-core
 64-hw-threads, and about 900 GB of disk space.

I have tested this patch, and have confirmed that the columns
for aid would be switched to using bigint, instead of int,
when the scale factor >= 20,000.
(aid columns would exceed the upper bound of int when sf > 21474.)

Also, I added a few fixes on it.

- Fixed to apply for the current git master.
- Fixed to suppress a few more warnings about INT64_FORMAT.
- Minor improvement in the docs. (just my suggestion)

I attached the revised one.

Regards,
-- 
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC http://www.uptime.jp/


pgbench-64-v7.patch
Description: Binary data



Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2012-12-20 Thread Gurjeet Singh
On Wed, Feb 16, 2011 at 8:15 AM, Greg Smith g...@2ndquadrant.com wrote:

 Tom Lane wrote:

 I think that might be a good idea --- it'd reduce the cross-platform
 variability of the results quite a bit, I suspect.  random() is not
 to be trusted everywhere, but I think erand48 is pretty much the same
 wherever it exists at all (and src/port/ provides it elsewhere).



 Given that pgbench will run with threads in some multi-worker
 configurations, after some more portability research I think odds are good
 we'd get nailed by 
 http://sourceware.org/bugzilla/show_bug.cgi?id=10320 :
  erand48 implementation not thread safe but POSIX says it should be.
  The AIX docs have a similar warning on them, so who knows how many
 versions of that library have the same issue.

 Maybe we could make sure the one in src/port/ is thread safe and make sure
 pgbench only uses it.  This whole area continues to be messy enough that I
 think the patch needs to brew for another CF before it will all be sorted
 out properly.  I'll mark it accordingly and can pick this back up later.


Hi Greg,

I spent some time rebasing this patch to current master. Attached is
the patch, based on master as of a couple of commits ago.

Your concern about using erand48() has been resolved, since pgbench now
uses the thread-safe and concurrent pg_erand48() from src/port/.

The patch is very much what you had posted, except for a couple of
differences due to bit-rot. (i) I didn't have to #define MAX_RANDOM_VALUE64
since its cousin MAX_RANDOM_VALUE is not used by code anymore, and (ii) I
used ternary operator in DDLs[] array to decide when to use bigint vs int
columns.

Please review.

As for tests, I am currently running 'pgbench -i -s 21474' using
unpatched pgbench, and am recording the time taken. Scale factor 21475 had
actually failed to do anything meaningful using unpatched pgbench. Next
I'll run with '-s 21475' on patched version to see if it does the right
thing, and in acceptable time compared to '-s 21474'.

What tests would you and others like to see, to get some confidence in
the patch? The machine that I have access to has 62 GB RAM, 16-core
64-hw-threads, and about 900 GB of disk space.

Linux host 3.2.6-3.fc16.ppc64 #1 SMP Fri Feb 17 21:41:20 UTC 2012 ppc64
ppc64 ppc64 GNU/Linux

Best regards,

PS: The primary source of patch is this branch:
https://github.com/gurjeet/postgres/tree/64bit_pgbench
-- 
Gurjeet Singh

http://gurjeet.singh.im/


pgbencg-64-v6.patch
Description: Binary data



Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-16 Thread Greg Smith

Tom Lane wrote:

I think that might be a good idea --- it'd reduce the cross-platform
variability of the results quite a bit, I suspect.  random() is not
to be trusted everywhere, but I think erand48 is pretty much the same
wherever it exists at all (and src/port/ provides it elsewhere).
  


Given that pgbench will run with threads in some multi-worker 
configurations, after some more portability research I think odds are 
good we'd get nailed by 
http://sourceware.org/bugzilla/show_bug.cgi?id=10320 : erand48 
implementation not thread safe but POSIX says it should be.  The AIX 
docs have a similar warning on them, so who knows how many versions of 
that library have the same issue.


Maybe we could make sure the one in src/port/ is thread safe and make 
sure pgbench only uses it.  This whole area continues to be messy enough 
that I think the patch needs to brew for another CF before it will all 
be sorted out properly.  I'll mark it accordingly and can pick this back 
up later.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD





Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-16 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 Given that pgbench will run with threads in some multi-worker 
 configurations, after some more portability research I think odds are 
 good we'd get nailed by 
 http://sourceware.org/bugzilla/show_bug.cgi?id=10320 : erand48 
 implementation not thread safe but POSIX says it should be.  The AIX 
 docs have a similar warning on them, so who knows how many versions of 
 that library have the same issue.

FWIW, I think that bug report is effectively complaining that if you use
both drand48 and erand48, the former can impact the latter.  If you use
only erand48, I don't see that there's any problem.
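
To illustrate that point, here is a minimal sketch in which every worker keeps its own 48-bit state and only ever calls erand48() on it. The ThreadRng type and the function names are hypothetical and are not taken from pgbench.c; the seeding scheme is likewise only an assumption for the example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600		/* ask for the erand48() declaration */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread RNG state; each worker owns exactly one. */
typedef struct
{
	unsigned short xseed[3];
} ThreadRng;

/* Fill the private state from a seed (illustrative scheme only). */
static void
thread_rng_init(ThreadRng *rng, unsigned int seed)
{
	rng->xseed[0] = (unsigned short) (seed & 0xFFFF);
	rng->xseed[1] = (unsigned short) ((seed >> 16) & 0xFFFF);
	rng->xseed[2] = 0x330E;
}

/* Value in [min, max], drawn only from the caller's own state. */
static int64_t
thread_rng_range(ThreadRng *rng, int64_t min, int64_t max)
{
	assert(min <= max);
	/* erand48() returns a double in [0.0, 1.0) and updates only xseed */
	return min + (int64_t) ((double) (max - min + 1) * erand48(rng->xseed));
}
```

Because no call touches the hidden global state that drand48() uses, two workers seeded identically produce identical streams regardless of how their calls interleave.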

regards, tom lane



Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-15 Thread Robert Haas
On Fri, Feb 11, 2011 at 8:35 AM, Stephen Frost sfr...@snowman.net wrote:
 Greg,

 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 Greg Smith g...@2ndquadrant.com writes:
  Poking around a bit more, I just discovered another possible approach is
  to use erand48 instead of rand in pgbench, which is either provided by
  the OS or emulated in src/port/erand48.c  That's way more resolution
  than needed here, given that 2^48 pgbench accounts would be a scale of
  2.8 billion, which makes for a database of about 42 petabytes.

 I think that might be a good idea --- it'd reduce the cross-platform
 variability of the results quite a bit, I suspect.  random() is not
 to be trusted everywhere, but I think erand48 is pretty much the same
 wherever it exists at all (and src/port/ provides it elsewhere).

 Works for me.  Greg, will you be able to work on this change?  If not, I
 might be able to.

Seeing as how this patch has not been updated, I think it's time to
mark this one Returned with Feedback.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-11 Thread Stephen Frost
Greg,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Greg Smith g...@2ndquadrant.com writes:
  Poking around a bit more, I just discovered another possible approach is 
  to use erand48 instead of rand in pgbench, which is either provided by 
  the OS or emulated in src/port/erand48.c  That's way more resolution 
  than needed here, given that 2^48 pgbench accounts would be a scale of 
  2.8 billion, which makes for a database of about 42 petabytes.
 
 I think that might be a good idea --- it'd reduce the cross-platform
 variability of the results quite a bit, I suspect.  random() is not
 to be trusted everywhere, but I think erand48 is pretty much the same
 wherever it exists at all (and src/port/ provides it elsewhere).

Works for me.  Greg, will you be able to work on this change?  If not, I
might be able to.

Thanks,

Stephen




Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-10 Thread Greg Smith

Stephen Frost wrote:

Just wondering, did you consider just calling random() twice and
smashing the result together..?
  


I did.  The problem is that even within the 32 bits that random() 
returns, it's not uniformly distributed.  Combining two of them isn't 
really going to solve the distribution problem, just move it around.  
Some number of lower-order bits are less random than the others, and 
which they are is implementation dependent.


Poking around a bit more, I just discovered another possible approach is 
to use erand48 instead of rand in pgbench, which is either provided by 
the OS or emulated in src/port/erand48.c  That's way more resolution 
than needed here, given that 2^48 pgbench accounts would be a scale of 
2.8 billion, which makes for a database of about 42 petabytes.
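
The scale arithmetic can be checked with a small sketch. It assumes pgbench's 100,000 accounts rows per unit of scale; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: pgbench creates 100,000 pgbench_accounts rows per scale unit. */
#define ACCOUNTS_PER_SCALE 100000

/*
 * Largest scale factor whose account ids can all be addressed with the
 * given number of bits (2^bits distinct values).
 */
static int64_t
max_scale_for_bits(int bits)
{
	assert(bits > 0 && bits < 63);
	return (INT64_C(1) << bits) / ACCOUNTS_PER_SCALE;
}
```

Under that assumption, max_scale_for_bits(48) is 2,814,749,767, roughly 2.8 x 10^9, while max_scale_for_bits(31) is 21474, the 32-bit boundary discussed elsewhere in this thread.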


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-10 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 Poking around a bit more, I just discovered another possible approach is 
 to use erand48 instead of rand in pgbench, which is either provided by 
 the OS or emulated in src/port/erand48.c  That's way more resolution 
 than needed here, given that 2^48 pgbench accounts would be a scale of 
 2.8 billion, which makes for a database of about 42 petabytes.

I think that might be a good idea --- it'd reduce the cross-platform
variability of the results quite a bit, I suspect.  random() is not
to be trusted everywhere, but I think erand48 is pretty much the same
wherever it exists at all (and src/port/ provides it elsewhere).

regards, tom lane



Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-09 Thread Greg Smith
Attached is an updated 64-bit pgbench patch that works as expected for 
all of the most common pgbench operations, including support for scales 
above the previous boundary of just over 21,000.  Here's the patched 
version running against a 303GB database with a previously unavailable 
scale factor:


$ pgbench -T 300 -j 2 -c 4 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 25000
query mode: simple
number of clients: 4
number of threads: 2
duration: 300 s
number of transactions actually processed: 21681
tps = 72.24 (including connections establishing)
tps = 72.250610 (excluding connections establishing)

And some basic Q/A that the values it touched were in the right range:

$ psql -d pgbench -c "select min(aid),max(aid) from pgbench_accounts;"

min |max
-+

  1 | 25

$ psql -d pgbench -c "select min(aid),max(aid),count(*) from pgbench_accounts where abalance!=0"

  min  |    max     | count
-------+------------+-------
 51091 | 2499989587 | 21678

(This system was doing 300MB/s on reads while executing that count, and 
it still took 19 minutes)


The clever way Euler updated the patch, you don't pay for the larger 
on-disk data (bigint columns) unless you use a range that requires it, 
which greatly reduces the number of ways the test results can suffer 
from this change.  I felt the way that was coded was a bit more 
complicated than it needed to be though, as it computed the point where 
that switch happens at runtime, based on the true size of the 
integers.  I took that complexity out and just put a hard line in there 
instead:  if scale>=20000, you get bigints.  That's not very different 
from the real limit, and it made documenting when the switch happens 
easy to write and to remember.
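
A minimal sketch of such a hard cutoff follows. It assumes the 20,000 threshold that the later review messages in this thread report; the macro and function names are hypothetical and are not the patch's own code.

```c
#include <assert.h>
#include <string.h>

/* Assumption: cutoff of 20,000, as reported in the review of the patch. */
#define SCALE_BIGINT_THRESHOLD 20000

/* Column type to use for aid-like columns at a given scale factor. */
static const char *
aid_column_type(int scale)
{
	assert(scale > 0);
	return (scale >= SCALE_BIGINT_THRESHOLD) ? "bigint" : "int";
}
```

Databases below the cutoff keep plain int columns, so their on-disk size and test results are unaffected by the change.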


The main performance concern with this change was whether using int64 
more internally for computations would slow things down on a 32-bit 
system.  I thought I'd test that on my few years old laptop.  It turns 
out that even though I've been running an i386 Linux on here, it's 
actually a 64-bit CPU.  (I think that it has a 32-bit install may be an 
artifact of Adobe Flash install issues, sadly)  So this may not be as 
good of a test case as I'd hoped.  Regardless, running a test aimed to 
stress simple SELECTs, the thing I'd expect to suffer most from 
additional CPU overhead, didn't show any difference in performance:


$ createdb pgbench
$ pgbench -i -s 10 pgbench
$ psql -c "show shared_buffers"
 shared_buffers
----------------
 256MB
(1 row)
$ pgbench -S -j 2 -c 4 -T 60 pgbench

i386    x86_64
6932    6924
6923    6926
6923    6922
6688    6772
6914    6791
6902    6916
6917    6909
6943    6837
6689    6744

6688    6744    min
6943    6926    max
6870    6860    average

Given the noise level of pgbench tests, I'm happy saying that is the 
same speed.  I suspect the real overhead in pgbench's processing relates 
to how it is constantly parsing text to turn them into statements, and 
that how big the integers it uses are is barely detectable over that.


So...where does that leave this patch?  I feel that pgbench will become 
less relevant very quickly in 9.1 unless something like this is 
committed.  And there don't seem to be significant downsides to this in 
terms of performance.  There are however a few rough points left in here 
that might raise concern:


1) A look into the expected range of the rand() function suggests the 
glibc implementation normally provides 30 bits of resolution, so about 1 
billion numbers.  You'll have 1B rows in a pgbench database once the 
scale goes over 10,000.  So without a major overhaul of how random 
number generation is treated here, people can expect the distribution of 
rows touched by a test run to get less even once the database scale gets 
very large.  I added another warning paragraph to the end of the docs in 
this update to mention this.  Long-term, I suspect we may need to adopt 
a superior 64-bit RNG approach, something like a Mersenne Twister 
perhaps.  That's a bit more than can be chewed on during 9.1 development 
though.
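
The effect can be illustrated at toy sizes with a sketch that maps every possible RNG output onto a row and counts how many distinct rows are ever selected. The function name and the sizes used are hypothetical; the mapping has the same min-plus-scaled-value shape as getrand(), with min taken as 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Count how many of nrows rows can be selected at all when the RNG can
 * only produce rng_values distinct outputs (0 .. rng_values - 1).
 */
static int64_t
count_reachable_rows(int64_t rng_values, int64_t nrows)
{
	unsigned char *hit;
	int64_t		reachable = 0;
	int64_t		r;

	assert(rng_values > 0 && nrows > 0);
	hit = calloc((size_t) nrows, 1);
	assert(hit != NULL);

	for (r = 0; r < rng_values; r++)
	{
		/* scale the raw output into the row range, truncating */
		int64_t		row = (int64_t) (((double) nrows * (double) r) / (double) rng_values);

		if (!hit[row])
		{
			hit[row] = 1;
			reachable++;
		}
	}
	free(hit);
	return reachable;
}
```

With 8 possible outputs and 12 rows only 8 rows are reachable, so 4 rows are never touched. With 16 outputs all 12 rows are reachable, but some are chosen twice as often as others.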


2) I'd say odds are good there are one or more corner-case bugs in 
\setrandom or \setshell I haven't found yet, just from the way that code 
was converted.  Those have some changes I haven't specifically tested 
exhaustively yet.  I don't see any issues when running the most common 
two pgbench tests, but that doesn't mean every part of that 32 -> 64 
bit conversion was done correctly.


Given how I use pgbench, for data generation and rough load testing, I'd 
say neither of those concerns outweighs the need to expand the size 
range of this program.  I would be happy to see this go in, followed by 
some alpha and beta testing aimed to see if any of the rough spots I'm 
concerned about actually appear.  Unfortunately I can't fit all of those 
tests in right now, as throwing around one of these 300GB data sets is 
painful--when you're only 

Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-09 Thread Stephen Frost
Greg,

* Greg Smith (g...@2ndquadrant.com) wrote:
 I took that complexity out and just put a hard line
 in there instead:  if scale>=20000, you get bigints.  That's not
 very different from the real limit, and it made documenting when the
 switch happens easy to write and to remember.

Agreed completely on this.

 It turns out that even though I've been running an i386 Linux on
 here, it's actually a 64-bit CPU.  (I think that it has a 32-bit
 install may be an artifact of Adobe Flash install issues, sadly)  So
 this may not be as good of a test case as I'd hoped.  

Actually, I would think it'd still be sufficient..  If you're under a
32bit kernel you're not going to be using the extended registers, etc,
that would be available under a 64bit kernel..  That said, the idea that
we should care about 32-bit systems these days, in a benchmarking tool,
is, well, silly, imv.

 1) A look into the expected range of the rand() function suggests
 the glibc implementation normally provides 30 bits of resolution, so
 about 1 billion numbers.  You'll have 1B rows in a pgbench database
 once the scale goes over 10,000.  So without a major overhaul of how
 random number generation is treated here, people can expect the
 distribution of rows touched by a test run to get less even once the
 database scale gets very large.  

Just wondering, did you consider just calling random() twice and
smashing the result together..?

 I added another warning paragraph
 to the end of the docs in this update to mention this.  Long-term, I
 suspect we may need to adopt a superior 64-bit RNG approach,
 something like a Mersenne Twister perhaps.  That's a bit more than
 can be chewed on during 9.1 development though.

I tend to agree that we should be able to improve the random number
generation in the future.  Additionally, imv, we should be able to say
pg_bench version X isn't comparable to version Y in the release notes
or something, or have separate version #s for it which make it clear
what can be compared to each other and what can't.  Painting ourselves
into a corner by saying we can't ever make pgbench generate results that
can't be compared to every other released version of pgbench just isn't
practical.

 2) I'd say odds are good there are one or more corner-case bugs in
 \setrandom or \setshell I haven't found yet, just from the way that
 code was converted.  Those have some changes I haven't specifically
 tested exhaustively yet.  I don't see any issues when running the
 most common two pgbench tests, but that doesn't mean every part of
 that 32 -> 64 bit conversion was done correctly.

I'll take a look. :)

Thanks,

Stephen




Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-07 Thread Greg Smith
The update on the work to push towards a bigger pgbench is that I now 
have the patch running and generating databases larger than any 
previously possible scale:


$ time pgbench -i -s 25000 pgbench
...
25 tuples done.
...
real    258m46.350s
user    14m41.970s
sys     0m21.310s

$ psql -d pgbench -c "select pg_size_pretty(pg_relation_size('pgbench_accounts'));"

pg_size_pretty

313 GB

$ psql -d pgbench -c "select pg_size_pretty(pg_relation_size('pgbench_accounts_pkey'));"

pg_size_pretty

52 GB

$ time psql -d pgbench -c "select count(*) from pgbench_accounts"
  count   


25

real    18m48.363s
user    0m0.010s
sys     0m0.000s

The only thing in the already-posted patch that had to be fixed to reach 
this point was this line:


for (k = 0; k < naccounts * scale; k++)

Which needed a (int64) cast for the multiplied value in the middle there.
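
As a sketch of why the cast matters: with both operands declared as int, the product is computed in int and cannot represent 2,500,000,000, so one operand has to be widened before the multiplication. The helper below is hypothetical and only illustrates the idea; it is not the line from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound for a row-generation loop.  Casting one operand first
 * makes the multiplication itself happen in 64 bits; casting the
 * finished int product would be too late.
 */
static int64_t
total_accounts(int naccounts, int scale)
{
	assert(naccounts > 0 && scale > 0);
	return (int64_t) naccounts * scale;
}
```

A loop written as `for (k = 0; k < total_accounts(naccounts, scale); k++)` then also needs k itself to be a 64-bit variable.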

Unfortunately the actual test itself doesn't run yet.  Every line I see 
when running the SELECT-only test says:


client 0 sending SELECT abalance FROM pgbench_accounts WHERE aid = 1;

So something about the updated random generation code isn't quite right 
yet.  Now that I have this monster built, I'm going to leave it on the 
server until I can sort that out, which hopefully will finish up in the 
next day or so.
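
One plausible reading of that symptom, judging only from the getrand() in the patch posted earlier in this thread, is that the divisor grew to MAX_RANDOM_VALUE64 + 1.0 (2^63) while random() still returns at most 2^31 - 1. The sketch below uses the same formula but takes the raw RNG output and the divisor as parameters so the extreme case can be examined deterministically; the function name is hypothetical and this is not a confirmed diagnosis.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same shape as the patch's getrand():
 *   min + (int64) (((max - min + 1) * (double) raw) / rng_span)
 * with the raw RNG output and the divisor passed in explicitly.
 */
static int64_t
scaled_pick(int64_t min, int64_t max, long raw, double rng_span)
{
	assert(min <= max && raw >= 0 && rng_span > 0.0);
	return min + (int64_t) (((double) (max - min + 1) * (double) raw) / rng_span);
}
```

Even with the largest 31-bit output, 2147483647, dividing by 2^63 gives a quotient of about 0.58 for a range of 2.5 billion, which truncates to 0 and yields min, that is, aid = 1. Dividing by 2^31 instead spreads the same output across the whole range.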


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-02-03 Thread Greg Smith

Robert Haas wrote:

At least in my book, we need to get this committed in the next two
weeks, or wait for 9.2.
  


Yes, I was just suggesting that I was not going to get started in the 
first week or two given the other pgbench related tests I had queued up 
already.  Those are closing up nicely, and I'll start testing 
performance of this change over the weekend.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-01-30 Thread Robert Haas
On Tue, Jan 18, 2011 at 1:42 PM, Greg Smith g...@2ndquadrant.com wrote:
 Thanks for picking this up again and finishing the thing off.  I'll add this
 into my queue of performance tests to run and we can see if this is worth
 applying.  Probably take a little longer than the usual CF review time.  But
 as this doesn't interfere with other code people are working on and is sort
 of a bug fix, I don't think it will be a problem if it takes a little longer
 to get this done.

At least in my book, we need to get this committed in the next two
weeks, or wait for 9.2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-01-18 Thread Greg Smith

Euler Taveira de Oliveira wrote:
(i) If we want to support a scale factor greater than 21474 we have 
to convert some columns to bigint; it will change the test. From the 
portability point it is a pity but as we have never supported it I'm 
not too worried about it. Why? Because it will use bigint columns only 
if the scale factor is greater than 21474. Is it a problem? I don't 
think so because generally people compare tests with the same scale 
factor.


(ii) From the performance perspective, we need to test if the 
modifications don't impact performance. I didn't create another code 
path for 64-bit modifications (it is too ugly) and I'm afraid some 
modifications affect the 32-bit performance. I'm not in a position to 
test it though because I don't have a big machine ATM. Greg, could you 
lead these tests?


(iii) I decided to copy scanint8() (called strtoint64 there) from 
backend (Robert suggestion [1]) because Tom pointed out that strtoll() 
has portability issues. I replaced atoi() with strtoint64() but didn't 
do any performance tests.


(i):  Completely agreed.

(ii):  There is no such thing as a big machine that is 32 bits now; 
anything that's 32 is a tiny system here in 2011.  What I can do is 
check for degradation on the only 32-bit system I have left here, my 
laptop.  I'll pick a sensitive test case and take a look.


(iii) This is an important thing to test, particularly given it has the 
potential to impact 64-bit results too.


Thanks for picking this up again and finishing the thing off.  I'll add 
this into my queue of performance tests to run and we can see if this is 
worth applying.  Probably take a little longer than the usual CF review 
time.  But as this doesn't interfere with other code people are working 
on and is sort of a bug fix, I don't think it will be a problem if it 
takes a little longer to get this done.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-01-11 Thread Euler Taveira de Oliveira

Em 10-01-2011 05:25, Greg Smith escreveu:

Euler Taveira de Oliveira wrote:

Em 07-01-2011 22:59, Greg Smith escreveu:

setrandom: invalid maximum number -2147467296


It is failing at atoi() circa pgbench.c:1036. But it is just the first
one. There are some variables and constants that need to be converted
to int64 and some functions that must speak 64-bit such as getrand().
Are you working on a patch?


http://archives.postgresql.org/pgsql-hackers/2010-01/msg02868.php
http://archives.postgresql.org/message-id/4c326f46.4050...@2ndquadrant.com

Greg, I just improved your patch. I tried to work around the problems pointed 
out in the above threads. Also, I want to raise some points:


(i) If we want to support a scale factor greater than 21474 we have to 
convert some columns to bigint; it will change the test. From the portability 
point it is a pity but as we have never supported it I'm not too worried about 
it. Why? Because it will use bigint columns only if the scale factor is 
greater than 21474. Is it a problem? I don't think so because generally people 
compare tests with the same scale factor.


(ii) From the performance perspective, we need to test if the modifications 
don't impact performance. I didn't create another code path for 64-bit 
modifications (it is too ugly) and I'm afraid some modifications affect the 
32-bit performance. I'm not in a position to test it though because I don't 
have a big machine ATM. Greg, could you lead these tests?


(iii) I decided to copy scanint8() (called strtoint64 there) from backend 
(Robert suggestion [1]) because Tom pointed out that strtoll() has portability 
issues. I replaced atoi() with strtoint64() but didn't do any performance tests.


Comments?


[1] http://archives.postgresql.org/pgsql-hackers/2010-07/msg00173.php


--
  Euler Taveira de Oliveira
  http://www.timbira.com/
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 7c2ca6e..e9eb720 100644
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
***************
*** 60,65 ****
--- 60,67 ----
  #define INT64_MAX	INT64CONST(0x7FFFFFFFFFFFFFFF)
  #endif
  
+ #define MAX_RANDOM_VALUE64	INT64_MAX
+ 
  /*
   * Multi-platform pthread implementations
   */
*************** usage(const char *progname)
*** 364,378 ****
  		   progname, progname);
  }
  
  /* random number generator: uniform distribution from min to max inclusive */
! static int
! getrand(int min, int max)
  {
  	/*
  	 * Odd coding is so that min and max have approximately the same chance of
  	 * being selected as do numbers between them.
  	 */
! 	return min + (int) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE + 1.0));
  }
  
  /* call PQexec() and exit() on failure */
--- 366,451 ----
  		   progname, progname);
  }
  
+ /*
+  * strtoint64 -- convert a string to 64-bit integer
+  *
+  * this function is a modified version of scanint8() from
+  * src/backend/utils/adt/int8.c.
+  *
+  * XXX should it have a return value?
+  *
+  */
+ static int64
+ strtoint64(const char *str)
+ {
+ 	const char *ptr = str;
+ 	int64		result = 0;
+ 	int			sign = 1;
+ 
+ 	/*
+ 	 * Do our own scan, rather than relying on sscanf which might be broken
+ 	 * for long long.
+ 	 */
+ 
+ 	/* skip leading spaces */
+ 	while (*ptr && isspace((unsigned char) *ptr))
+ 		ptr++;
+ 
+ 	/* handle sign */
+ 	if (*ptr == '-')
+ 	{
+ 		ptr++;
+ 
+ 		/*
+ 		 * Do an explicit check for INT64_MIN.	Ugly though this is, it's
+ 		 * cleaner than trying to get the loop below to handle it portably.
+ 		 */
+ 		if (strncmp(ptr, "9223372036854775808", 19) == 0)
+ 		{
+ 			result = -INT64CONST(0x7fffffffffffffff) - 1;
+ 			ptr += 19;
+ 			goto gotdigits;
+ 		}
+ 		sign = -1;
+ 	}
+ 	else if (*ptr == '+')
+ 		ptr++;
+ 
+ 	/* require at least one digit */
+ 	if (!isdigit((unsigned char) *ptr))
+ 		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ 
+ 	/* process digits */
+ 	while (*ptr && isdigit((unsigned char) *ptr))
+ 	{
+ 		int64		tmp = result * 10 + (*ptr++ - '0');
+ 
+ 		if ((tmp / 10) != result)		/* overflow? */
+ 			fprintf(stderr, "value \"%s\" is out of range for type bigint\n", str);
+ 		result = tmp;
+ 	}
+ 
+ gotdigits:
+ 
+ 	/* allow trailing whitespace, but not other trailing chars */
+ 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
+ 		ptr++;
+ 
+ 	if (*ptr != '\0')
+ 		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ 
+ 	return ((sign < 0) ? -result : result);
+ }
+ 
  /* random number generator: uniform distribution from min to max inclusive */
! static int64
! getrand(int64 min, int64 max)
  {
  	/*
  	 * Odd coding is so that min and max have approximately the same chance of
  	 * being selected as do numbers between them.
  	 */
! 	return min + (int64) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE64 + 1.0));
  }
  
  /* call PQexec() and exit() on failure */
*************** top:
*** 887,893 ****
  		if (commands[st->state] == NULL)
  		{
  			st->state = 0;
! 			st->use_file = getrand(0,