My apologies if you are seeing this twice. I posted it last night, but
it still does not appear to have made it to the group.
Mark Dilger wrote:
Tom Lane wrote:
> Mark Dilger <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Please provide a stack trace --- AFAIK there shouldn't be any reason
>>> a pass-by-ref 3-byte type wouldn't work.
>> #0 0xb7e01d45 in memcpy () from /lib/libc.so.6
>> #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7,
>> values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "",
> Hm, are you sure you provided a valid pointer (not the integer value
> itself) as the Datum output from int3_in?
> (Looks at patch ... ) Um, I think you didn't, although that coding
> is far too cute to be actually readable ...
> regards, tom lane
Ok, I have it working on my Intel architecture machine. Here are some
of my findings. Disk usage is calculated by running 'du -b' in
/usr/local/pgsql/data before and after loading the table and taking the
difference. That directory is deleted and recreated, and initdb is
rerun, between tests. The host system is a dual-processor, dual-core
2.4 GHz system with 2 GB of DDR400 memory and a 10,000 RPM Ultra160 SCSI
hard drive, using the default postgresql.conf file as created by initdb.
The code is the stock postgresql-8.1.4 release tarball, compiled with
gcc and configured without the debug or cassert options enabled.
INT3 VS INT4
Using a table of 8 integers per row and 16777216 rows, I can drop the
disk usage from 1.2 GB down to 1.0 GB by defining those integers as int3
rather than int4. (It works out to about 70.5 bytes per row vs. 62.5
bytes per row.) However, the load time actually increases, from 197
seconds to 213 seconds, probably due to extra CPU/memory overhead.
Note that int3 is defined pass-by-reference due to a limitation in the
code that prevents pass-by-value for any data size other than 1, 2, or
4 bytes.
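
As a sanity check on the per-row figures, here is a small sketch of the
arithmetic. The function names and the TEST_ROWS macro are made up for
illustration and are not part of the patch, and the calculation assumes
that no alignment padding creeps back in between the columns.

```c
#include <assert.h>
#include <stdint.h>

/* Number of rows loaded in each test (2^24). */
#define TEST_ROWS 16777216ULL

/*
 * Bytes saved per row when every column shrinks from old_width to
 * new_width.  Assumes no padding is reintroduced between columns.
 */
uint64_t
bytes_saved_per_row(unsigned ncols, unsigned old_width, unsigned new_width)
{
    assert(old_width >= new_width);
    return (uint64_t) ncols * (old_width - new_width);
}

/* Total bytes saved over the whole table. */
uint64_t
bytes_saved_total(uint64_t nrows, unsigned ncols,
                  unsigned old_width, unsigned new_width)
{
    return nrows * bytes_saved_per_row(ncols, old_width, new_width);
}
```

For 8 columns going from 4 bytes to 3 bytes, bytes_saved_per_row(8, 4, 3)
gives 8, which matches the difference between 70.5 and 62.5 bytes per
row measured above.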
Using a table of only one integer per row, the table size is exactly the
same (down to the byte) whether I use int3 or int4. I suspect this is
due to data alignment for the row being on at least a 4 byte boundary.
Creating an index on a single column of the 8-integer-per-row table, the
index size is exactly the same whether the integers are int3 or int4.
Once again, I suspect that data alignment is eliminating the space savings.
I haven't tested this, but I suspect that if the column following an
int3 is aligned on a 4 or 8 byte boundary, the int3 column will be
padded with an extra byte and hence will save no space.
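
To make the alignment suspicion concrete, here is a rough model of how
padding could eat the savings. It is only a sketch: align_up and
row_data_size are hypothetical helpers written for this mail, not the
backend's own macros, and it assumes int3 is byte-aligned and that the
row as a whole is padded to a 4 byte boundary.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round offset up to the next multiple of alignment.
 * alignment must be a power of two.  (Hypothetical helper.)
 */
size_t
align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/*
 * Size of the data area for a row of fixed-width columns.  Column i
 * is widths[i] bytes wide and must start on an aligns[i] boundary;
 * the finished row is padded out to row_align.
 */
size_t
row_data_size(const size_t *widths, const size_t *aligns,
              size_t ncols, size_t row_align)
{
    size_t off = 0;

    for (size_t i = 0; i < ncols; i++)
    {
        off = align_up(off, aligns[i]);   /* pad before the column */
        off += widths[i];                 /* then store the column */
    }
    return align_up(off, row_align);      /* pad the end of the row */
}
```

Under those assumptions a single int3 column and a single int4 column
both come out at 4 bytes, and an int3 followed by an int4 comes out at
8 bytes, the same as two int4 columns. Eight int3 columns in a row come
out at 24 bytes against 32 for eight int4 columns, which is the 8 byte
difference seen in the first test.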
INT1 VS INT2
Once again using a table of 8 integers per row and 16777216 rows, I can
drop the disk usage from 909 MB down to 774 MB by defining those
integers as int1 rather than int2. (54 bytes per row vs 46 bytes per
row.) The load time also drops, from 179 seconds to 159 seconds. Note
that int1 is defined pass-by-value.