I have developed the attached patch based on your observations.
---------------------------------------------------------------------------
On Mon, Apr 30, 2012 at 03:59:20PM -0500, Dan McGee wrote:
> Hey everyone,
>
> Reading the docs today, I came across this paragraph
> (http://www.postgresql.org/docs/devel/static/datatype-numeric.html#DATATYPE-INT),
> which goes back several major versions:
>
> > The type integer is the common choice, as it offers the best balance
> > between range, storage size, and performance. The smallint type is
> > generally only used if disk space is at a premium. The bigint type should
> > only be used if the range of the integer type is insufficient, because the
> > latter is definitely faster.
>
> A few thoughts on this.
> 1) the use of the word "latter" isn't totally clear, or at least I had
> to re-read it to realize former was 'bigint' and latter was 'integer'.
> It might just be the style of writing.
> 2) I'm less than convinced this note belongs in modern documentation,
> and set out to test that theory. My full results are below, but the
> summary is this: on a 64-bit system, there seems to be only a minimal
> measurable performance difference (< 5%) and very little size
> difference. In the case of the indexes, the size difference is zero.
> This is not true for a 32-bit system (where it is 39% slower), but the
> blanket statement doesn't hold true, which is why I'm writing all this
> up.
>
> On a final note, the following paragraph also seems like it has
> outlived its useful life:
>
> > On very minimal operating systems the bigint type might not function
> > correctly, because it relies on compiler support for eight-byte integers.
> > On such machines, bigint acts the same as integer, but still takes up eight
> > bytes of storage. (We are not aware of any modern platform where this is
> > the case.)
>
> Thanks!
> -Dan
>
>
> Table setup (only difference is type of 'id' column):
>
> Table "public.package_files"
> Column | Type | Modifiers
> --------------+------------------------+------------------------
> id | integer | not null
> pkg_id | integer | not null
> is_directory | boolean | not null default false
> directory | character varying(255) | not null
> filename | character varying(255) |
> Indexes:
> "package_files_pkey" PRIMARY KEY, btree (id)
> "package_files_pkg_id" btree (pkg_id) CLUSTER
>
>
> Table "public.package_files_int8"
> Column | Type | Modifiers
> --------------+------------------------+------------------------
> id | bigint | not null
> pkg_id | integer | not null
> is_directory | boolean | not null default false
> directory | character varying(255) | not null
> filename | character varying(255) |
> Indexes:
> "package_files_int8_pkey" PRIMARY KEY, btree (id)
> "package_files_int8_pkg_id" btree (pkg_id) CLUSTER
>
>
> # select count(*) from package_files;
> 2621418
> # select count(*) from package_files_int8 ;
> 2621418
>
>
> All runs below were done after issuing a few warm up queries, and both
> tables went through a VACUUM/CLUSTER/ANALYZE sequence.
>
> 32-bit P4 2.4 GHz (single core). no enabled CPU frequency scaling, 1GB
> total ram, shared_buffers 128MB, work_mem 4MB:
>
> relation | size
> ----------------------------------+------------
> public.package_files_int8 | 239 MB
> public.package_files | 229 MB
> public.package_files_int8_pkey | 56 MB
> public.package_files_int8_pkg_id | 45 MB
> public.package_files_pkey | 45 MB
> public.package_files_pkg_id | 45 MB
>
> archweb=> \timing on
> Timing is on.
> archweb=> \t
> Showing only tuples.
> archweb=> select count(*) from (select * from package_files where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 516.558 ms
> archweb=> select count(*) from (select * from package_files where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 519.720 ms
> archweb=> select count(*) from (select * from package_files where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 533.330 ms
> archweb=> select count(*) from (select * from package_files where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 519.095 ms
> archweb=> select count(*) from (select * from package_files where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 520.253 ms
>
> archweb=> select count(*) from (select * from package_files_int8 where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 731.194 ms
> archweb=> select count(*) from (select * from package_files_int8 where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 730.329 ms
> archweb=> select count(*) from (select * from package_files_int8 where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 724.646 ms
> archweb=> select count(*) from (select * from package_files_int8 where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 710.815 ms
> archweb=> select count(*) from (select * from package_files_int8 where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 724.519 ms
>
>
>
>
> 64-bit Core2 Quad 2.66 GHz (four core), CPU freq scaling disabled
> (performance governor used), 8GB total ram, shared_buffers 128MB,
> work_mem 4MB:
>
> relation | size
> ----------------------------------+------------
> public.package_files_int8 | 245 MB
> public.package_files | 234 MB
> public.package_files_int8_pkey | 56 MB
> public.package_files_pkg_id | 56 MB
> public.package_files_int8_pkg_id | 56 MB
> public.package_files_pkey | 56 MB
>
> dmcgee=# select count(*) from (select * from package_files where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 177.078 ms
> dmcgee=# select count(*) from (select * from package_files where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 176.109 ms
> dmcgee=# select count(*) from (select * from package_files where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 177.478 ms
> dmcgee=# select count(*) from (select * from package_files where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 176.639 ms
> dmcgee=# select count(*) from (select * from package_files where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 176.453 ms
>
> dmcgee=# select count(*) from (select * from package_files_int8 where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 185.768 ms
> dmcgee=# select count(*) from (select * from package_files_int8 where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 185.159 ms
> dmcgee=# select count(*) from (select * from package_files_int8 where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 184.407 ms
> dmcgee=# select count(*) from (select * from package_files_int8 where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 184.555 ms
> dmcgee=# select count(*) from (select * from package_files_int8 where
> pkg_id in (48024, 48025, 40343) order by id) a;
> 119325
> Time: 184.290 ms
>
>
> Overall results:
>
> i686 x86_64
> int4 int8 int4 int8
> 516.558 731.194 177.078 185.768
> 519.72 730.329 176.109 185.159
> 533.33 724.646 177.478 184.407
> 519.095 710.815 176.639 184.555
> 520.253 724.519 176.453 184.29
>
>
>
> Average 521.7912 724.3006 176.7514 184.8358
> Stddev 6.6040841681 8.1530512264 0.5359499044 0.619288059
> Ratio 1.3881042839 1.0457388173
>
> --
> Sent via pgsql-docs mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-docs
--
Bruce Momjian <[email protected]> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index 7f6e661..71cf59e
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
***************
*** 453,470 ****
The type <type>integer</type> is the common choice, as it offers
the best balance between range, storage size, and performance.
The <type>smallint</type> type is generally only used if disk
! space is at a premium. The <type>bigint</type> type should only
! be used if the range of the <type>integer</type> type is insufficient,
! because the latter is definitely faster.
! </para>
!
! <para>
! On very minimal operating systems the <type>bigint</type> type
! might not function correctly, because it relies on compiler support
! for eight-byte integers. On such machines, <type>bigint</type>
! acts the same as <type>integer</type>, but still takes up eight
! bytes of storage. (We are not aware of any modern
! platform where this is the case.)
</para>
<para>
--- 453,460 ----
The type <type>integer</type> is the common choice, as it offers
the best balance between range, storage size, and performance.
The <type>smallint</type> type is generally only used if disk
! space is at a premium. The <type>bigint</type> type is designed to be
! used when the range of the <type>integer</type> type is insufficient.
</para>
<para>
--
Sent via pgsql-docs mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs