Hi all,

Here're a couple of patches for PostgreSQL 64bit support.  There're just
two extension to 64bit, size of shared memory and transaction ID.

Please take a look at overview.txt for this proposal and pathces, based
upon 8.0.3.

Any discussions are welcome.

-- 
-------------------------------------------
Koichi Suzuki
Open Source Engineeering Departmeent,
NTT DATA Intellilink Corporation
Phone: +81-3-5566-9628  WWW: http://www.intellilink.co.jp
------------------------------------------

Proposal: 64bit Extension in PostgreSQL 8.0.x

                                                              July 7th, 2005
                     Koichi Suzuki (NTT DATA Intellilink)

1. Background and Purpose

64-bit architecture CPU is getting more and more popular in Intel-based CPU, 
such as 
EM64T, AMD64 and IA64.    Servers based upon such CPUs can provide much more 
memory 
available.   Tens of gigabytes of memory is available in a node, typically 
16gigabytes 
or so.

Obviously, 32bit-based Linux and its applications can run on such a machine.   
However, 
from the point of each process' view, size of avilable memory is limited by 
process user 
space, typically 1GB for kernel and 3GB for each process.   PostgreSQL's kernel 
uses 
shared memory to hold shared data and much more memory should be available as 
PostgreSQL 
is going to handle bigger database.   For this purpose, we need to extend 
PostgreSQL to 
64-bit program and make shared memory management to handle shared memory beyond 
32bit 
limitation.

On the other hand, PostgreSQL is going to be used in mission-critical systems 
in 
enterprise environment.   In such application, we need to provide long period 
operation 
without stopping service.   Currently, PostgreSQL's transaction ID is limited 
by 32bit 
integer and when it is about to run out, we have to stop the database operation 
and run 
vacuum freeze to reuse the older transaction ID value.   Vacuum freeze 
operation scans 
all the database space and together with the bigger database size in oparation, 
it's 
going to take longer.   Because PostgreSQL is going to be used in busier 
systems, 
transaction IDs tend to run out more earlier.   To provide longer continuous 
operation, 
we need to make transaction ID 64bit-based.


2. How we can do?

It is basically very simple to do these two extensions.   We can locate their 
definitions 
and change them into 64-bit based ones.   However, there are much much more to 
be done.   
We have to find all the lines which deals with their value and have to modify 
them so that 
there will not be any loss of calculation precision.   Detailed result will be 
given 
later.

3. Environment

Our code will assume the following:

1) PostgreSQL server (i.e. postmaster and postgres processes) should run only 
on 64-bit 
   CPU based server machies, that is, EM64T, AMD64 and IA64.
2) Both 64bit (EM64T, AMD64 and IA64) and 32bit CPU(IA32) are allowed as 
clients.
3) Currently, we support only Linux 2.6.x kernel.

4. Specification changes

Due to these two changes, PostgreSQL's spec will change as follows:

4.1 Shared memory size

Shared memory size are specified in shared_buffers entry in postgresql.conf 
file.   
In 32-bit environment, it is limited to INTMAX/BLOCKSIZE.   Now new limitation 
is 
INTMAX/2.   This value specifies the number of blocks, so actual memory which 
can be 
specified by this parameter will be (INITMAX/2)*BLOCKSIZE.   Typical BLOCKSIZE 
is 8KB.   
Therefore, typicak maximum shared memory will be 8TB.   This is beyond the 
limitation of 
Linux user space for EM64T (512GB) and should be sufficient.

The reason why shared memory size specification is limited to INITMAX/2 is as 
follows:  
In 8.0.x buffer management, (buffer_number)*2 is used to produce buffer IDs and 
whole 
these subsystem is still based on 32bit calculation.   We'd like to keep the 
change as 
minimum as possible and having this limit will not have bad influence to the 
maxumum 
value of actual shared memory size.

4.2 Transaction ID (XID)

Type of transaction ID is pre-defined in the catalog and there are no way for 
users to 
redefine its type.   In this extension, transaction IDs are handled as follows:

1) On catalog, transaction ID's (such as XMIN and XMAX) are give the type 
"XID", as 
   in 32-bit environment.
2) If applications on 32-bit based environment trys to read the value of the 
type "XID", 
   it has to read this value as "unsigned long long" value.
3) In the case of 64-bit based environment, the value of gXIDh can be handled 
as 
   "unsigned long" value, depending on compilers.
4) The length of the type "XID" is stored in the catalog "pg_type".

4.3 Configure

W have added two configuration options:
--enable-64bit-shared-memory
        enables 64bit shared memory by defining USE_64BIT_SHARED.
--enable-64bit-transaction-id
        enables 64-bit transaction id by defining USE_64BIT_XID.

5. Acknowledgement

I'd like to thank Mr.Tom Lane, Mr.Bruce Momjan and Mr.Jan Wieck for their 
stimulating discussions.  Many people from Fujitsu Ltd., NTT DATA and NTT DATA 
Intellilink helped me.
Index: configure.in
===================================================================
--- configure.in        (.../tags/postgres-src/8.0.3/postgres-src)      
(revision 55)
+++ configure.in        (.../trunk/postgres-src)        (revision 55)
@@ -208,6 +208,36 @@
 AC_SUBST(enable_debug)
 
 #
+# 64-bit shared memory (--enable-64bit-shared-memory)
+#
+AC_MSG_CHECKING([whether to build with 64-bit shared memory.])
+PGAC_ARG_BOOL(enable, 64bit-shared-memory, no, [  --enable-64bit-shared-memory 
 enable 64-bit shared memory support],
+              [AC_DEFINE([USE_64BIT_SHARED], 1,
+                         [Define to 1 if you want 64-bit shared memory 
support. (--enable-64bit-shared-memory)])])
+AC_MSG_RESULT([$enable_64bit_shared_memory])
+
+# 64-bit shared memory support can be used only on the x86_64 linux.
+# spinlocks must be enabled when enable 64-bit shared memory support.
+if test "$enable_64bit_shared_memory" = yes ; then
+  check_64bit_shared_memory=0
+  case $host_os in
+    linux*) 
+      if test "$host_cpu" = "x86_64" -o "$host_cpu" = "ia64" ; then
+        check_64bit_shared_memory=1
+      fi
+      ;;
+  esac
+
+  if test "$check_64bit_shared_memory" = 0 ; then
+    AC_MSG_ERROR([64-bit shared memory support can be used only on x86_64 
linux or ia64 linux.])
+  else 
+    if test "$enable_spinlocks" != yes ; then
+      AC_MSG_ERROR([spinlocks must be enabled when enable 64-bit shared memory 
support.])
+    fi
+  fi
+fi
+
+#
 # C compiler
 #
 
Index: src/include/pg_config.h.in
===================================================================
--- src/include/pg_config.h.in  (.../tags/postgres-src/8.0.3/postgres-src)      
(revision 55)
+++ src/include/pg_config.h.in  (.../trunk/postgres-src)        (revision 55)
@@ -660,6 +660,10 @@
 /* Define to select unnamed POSIX semaphores. */
 #undef USE_UNNAMED_POSIX_SEMAPHORES
 
+/* Define to 1 if you want 64-bit shared memory support.
+   (--enable-64bit-shared-memory) */
+#undef USE_64BIT_SHARED
+
 /* Number of bits in a file offset, on hosts where this is settable. */
 #undef _FILE_OFFSET_BITS
 
Index: src/include/postmaster/bgwriter.h
===================================================================
--- src/include/postmaster/bgwriter.h   
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/include/postmaster/bgwriter.h   (.../trunk/postgres-src)        
(revision 55)
@@ -30,7 +30,7 @@
 extern bool ForwardFsyncRequest(RelFileNode rnode, BlockNumber segno);
 extern void AbsorbFsyncRequests(void);
 
-extern int     BgWriterShmemSize(void);
+extern Size    BgWriterShmemSize(void);
 extern void BgWriterShmemInit(void);
 
 #endif   /* _BGWRITER_H */
Index: src/include/storage/lwlock.h
===================================================================
--- src/include/storage/lwlock.h        
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/include/storage/lwlock.h        (.../trunk/postgres-src)        
(revision 55)
@@ -68,7 +68,7 @@
 extern bool LWLockHeldByMe(LWLockId lockid);
 
 extern int     NumLWLocks(void);
-extern int     LWLockShmemSize(void);
+extern Size    LWLockShmemSize(void);
 extern void CreateLWLocks(void);
 
 #endif   /* LWLOCK_H */
Index: src/include/storage/buf_internals.h
===================================================================
--- src/include/storage/buf_internals.h 
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/include/storage/buf_internals.h (.../trunk/postgres-src)        
(revision 55)
@@ -138,11 +138,11 @@
 extern void StrategyHintVacuum(bool vacuum_active);
 extern int StrategyDirtyBufferList(BufferDesc **buffers, BufferTag *buftags,
                                                int max_buffers);
-extern int     StrategyShmemSize(void);
+extern Size    StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
 
 /* buf_table.c */
-extern int     BufTableShmemSize(int size);
+extern Size    BufTableShmemSize(int size);
 extern void InitBufTable(int size);
 extern int     BufTableLookup(BufferTag *tagPtr);
 extern void BufTableInsert(BufferTag *tagPtr, int buf_id);
Index: src/include/storage/pg_shmem.h
===================================================================
--- src/include/storage/pg_shmem.h      
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/include/storage/pg_shmem.h      (.../trunk/postgres-src)        
(revision 55)
@@ -29,8 +29,8 @@
        int32           magic;                  /* magic # to identify Postgres 
segments */
 #define PGShmemMagic  679834893
        pid_t           creatorPID;             /* PID of creating process */
-       uint32          totalsize;              /* total size of segment */
-       uint32          freeoffset;             /* offset to first free space */
+       Size            totalsize;              /* total size of segment */
+       Size            freeoffset;             /* offset to first free space */
 #ifndef WIN32                                  /* Windows doesn't have useful 
inode#s */
        dev_t           device;                 /* device data directory is on 
*/
        ino_t           inode;                  /* inode number of data 
directory */
@@ -45,7 +45,7 @@
 extern void PGSharedMemoryReAttach(void);
 #endif
 
-extern PGShmemHeader *PGSharedMemoryCreate(uint32 size, bool makePrivate,
+extern PGShmemHeader *PGSharedMemoryCreate(Size size, bool makePrivate,
                                         int port);
 extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
 extern void PGSharedMemoryDetach(void);
Index: src/include/storage/bufmgr.h
===================================================================
--- src/include/storage/bufmgr.h        
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/include/storage/bufmgr.h        (.../trunk/postgres-src)        
(revision 55)
@@ -137,7 +137,7 @@
 #ifdef NOT_USED
 extern void PrintPinnedBufs(void);
 #endif
-extern int     BufferShmemSize(void);
+extern Size    BufferShmemSize(void);
 extern RelFileNode BufferGetFileNode(Buffer buffer);
 
 extern void SetBufferCommitInfoNeedsSave(Buffer buffer);
Index: src/backend/port/sysv_shmem.c
===================================================================
--- src/backend/port/sysv_shmem.c       
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/backend/port/sysv_shmem.c       (.../trunk/postgres-src)        
(revision 55)
@@ -51,7 +51,7 @@
 unsigned long UsedShmemSegID = 0;
 void      *UsedShmemSegAddr = NULL;
 
-static void *InternalIpcMemoryCreate(IpcMemoryKey memKey, uint32 size);
+static void *InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size);
 static void IpcMemoryDetach(int status, Datum shmaddr);
 static void IpcMemoryDelete(int status, Datum shmId);
 static PGShmemHeader *PGSharedMemoryAttach(IpcMemoryKey key,
@@ -71,7 +71,7 @@
  * print out an error and abort.  Other types of errors are not recoverable.
  */
 static void *
-InternalIpcMemoryCreate(IpcMemoryKey memKey, uint32 size)
+InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size)
 {
        IpcMemoryId shmid;
        void       *memAddress;
@@ -99,14 +99,14 @@
                 */
                ereport(FATAL,
                                (errmsg("could not create shared memory 
segment: %m"),
-               errdetail("Failed system call was shmget(key=%lu, size=%u, 
0%o).",
-                                 (unsigned long) memKey, size,
+               errdetail("Failed system call was shmget(key=%lu, size=%lu, 
0%o).",
+                                 (unsigned long) memKey, (unsigned long) size,
                                  IPC_CREAT | IPC_EXCL | IPCProtection),
                                 (errno == EINVAL) ?
                                 errhint("This error usually means that 
PostgreSQL's request for a shared memory "
                                                 "segment exceeded your 
kernel's SHMMAX parameter.  You can either "
                                                 "reduce the request size or 
reconfigure the kernel with larger SHMMAX.  "
-                          "To reduce the request size (currently %u bytes), 
reduce "
+                          "To reduce the request size (currently %lu bytes), 
reduce "
                   "PostgreSQL's shared_buffers parameter (currently %d) and/or 
"
                                                 "its max_connections parameter 
(currently %d).\n"
                                                 "If the request size is 
already small, it's possible that it is less than "
@@ -114,28 +114,28 @@
                                                 "reconfiguring SHMMIN is 
called for.\n"
                                                 "The PostgreSQL documentation 
contains more information about shared "
                                                 "memory configuration.",
-                                                size, NBuffers, MaxBackends) : 
0,
+                                                (unsigned long) size, 
NBuffers, MaxBackends) : 0,
                                 (errno == ENOMEM) ?
                                 errhint("This error usually means that 
PostgreSQL's request for a shared "
                           "memory segment exceeded available memory or swap 
space. "
-                          "To reduce the request size (currently %u bytes), 
reduce "
+                          "To reduce the request size (currently %lu bytes), 
reduce "
                   "PostgreSQL's shared_buffers parameter (currently %d) and/or 
"
                                                 "its max_connections parameter 
(currently %d).\n"
                                                 "The PostgreSQL documentation 
contains more information about shared "
                                                 "memory configuration.",
-                                                size, NBuffers, MaxBackends) : 
0,
+                                                (unsigned long) size, 
NBuffers, MaxBackends) : 0,
                                 (errno == ENOSPC) ?
                                 errhint("This error does *not* mean that you 
have run out of disk space. "
                                                 "It occurs either if all 
available shared memory IDs have been taken, "
                                                 "in which case you need to 
raise the SHMMNI parameter in your kernel, "
                                                 "or because the system's 
overall limit for shared memory has been "
                         "reached.  If you cannot increase the shared memory 
limit, "
-               "reduce PostgreSQL's shared memory request (currently %u 
bytes), "
+               "reduce PostgreSQL's shared memory request (currently %lu 
bytes), "
                "by reducing its shared_buffers parameter (currently %d) and/or 
"
                                                 "its max_connections parameter 
(currently %d).\n"
                                                 "The PostgreSQL documentation 
contains more information about shared "
                                                 "memory configuration.",
-                                                size, NBuffers, MaxBackends) : 
0));
+                                                (unsigned long) size, 
NBuffers, MaxBackends) : 0));
        }
 
        /* Register on-exit routine to delete the new segment */
@@ -289,7 +289,7 @@
  * zero will be passed.
  */
 PGShmemHeader *
-PGSharedMemoryCreate(uint32 size, bool makePrivate, int port)
+PGSharedMemoryCreate(Size size, bool makePrivate, int port)
 {
        IpcMemoryKey NextShmemSegID;
        void       *memAddress;
Index: src/backend/utils/misc/guc.c
===================================================================
--- src/backend/utils/misc/guc.c        
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/backend/utils/misc/guc.c        (.../trunk/postgres-src)        
(revision 55)
@@ -964,7 +964,11 @@
                        NULL
                },
                &NBuffers,
+#ifndef USE_64BIT_SHARED
                1000, 16, INT_MAX / BLCKSZ, NULL, NULL
+#else
+               1000, 16, INT_MAX / 2, NULL, NULL
+#endif /* USE_64BIT_SHARED */
        },
 
        {
Index: src/backend/postmaster/bgwriter.c
===================================================================
--- src/backend/postmaster/bgwriter.c   
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/backend/postmaster/bgwriter.c   (.../trunk/postgres-src)        
(revision 55)
@@ -458,7 +458,7 @@
  * BgWriterShmemSize
  *             Compute space needed for bgwriter-related shared memory
  */
-int
+Size
 BgWriterShmemSize(void)
 {
        /*
Index: src/backend/storage/lmgr/lwlock.c
===================================================================
--- src/backend/storage/lmgr/lwlock.c   
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/backend/storage/lmgr/lwlock.c   (.../trunk/postgres-src)        
(revision 55)
@@ -123,17 +123,17 @@
 /*
  * Compute shmem space needed for LWLocks.
  */
-int
+Size
 LWLockShmemSize(void)
 {
        int                     numLocks = NumLWLocks();
-       uint32          spaceLocks;
+       Size            spaceLocks;
 
        /* Allocate the LWLocks plus space for shared allocation counter. */
        spaceLocks = numLocks * sizeof(LWLock) + 2 * sizeof(int);
        spaceLocks = MAXALIGN(spaceLocks);
 
-       return (int) spaceLocks;
+       return (Size) spaceLocks;
 }
 
 
@@ -144,7 +144,7 @@
 CreateLWLocks(void)
 {
        int                     numLocks = NumLWLocks();
-       uint32          spaceLocks = LWLockShmemSize();
+       Size            spaceLocks = LWLockShmemSize();
        LWLock     *lock;
        int                     id;
 
Index: src/backend/storage/buffer/buf_init.c
===================================================================
--- src/backend/storage/buffer/buf_init.c       
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/backend/storage/buffer/buf_init.c       (.../trunk/postgres-src)        
(revision 55)
@@ -96,7 +96,7 @@
 
        BufferBlocks = (char *)
                ShmemInitStruct("Buffer Blocks",
-                                               NBuffers * BLCKSZ, &foundBufs);
+                                               (Size)NBuffers * BLCKSZ, 
&foundBufs);
 
        if (foundDescs || foundBufs)
        {
@@ -193,16 +193,16 @@
  * compute the size of shared memory for the buffer pool including
  * data pages, buffer descriptors, hash tables, etc.
  */
-int
+Size
 BufferShmemSize(void)
 {
-       int                     size = 0;
+       Size            size = 0;
 
        /* size of buffer descriptors */
        size += MAXALIGN(NBuffers * sizeof(BufferDesc));
 
        /* size of data pages */
-       size += NBuffers * MAXALIGN(BLCKSZ);
+       size += (Size)NBuffers * MAXALIGN(BLCKSZ);
 
        /* size of stuff controlled by freelist.c */
        size += StrategyShmemSize();
Index: src/backend/storage/buffer/buf_table.c
===================================================================
--- src/backend/storage/buffer/buf_table.c      
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/backend/storage/buffer/buf_table.c      (.../trunk/postgres-src)        
(revision 55)
@@ -40,7 +40,7 @@
  * Estimate space needed for mapping hashtable
  *             size is the desired hash table size (possibly more than 
NBuffers)
  */
-int
+Size
 BufTableShmemSize(int size)
 {
        return hash_estimate_size(size, sizeof(BufferLookupEnt));
Index: src/backend/storage/buffer/freelist.c
===================================================================
--- src/backend/storage/buffer/freelist.c       
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/backend/storage/buffer/freelist.c       (.../trunk/postgres-src)        
(revision 55)
@@ -798,12 +798,12 @@
  *
  * estimate the size of shared memory used by the freelist-related structures.
  */
-int
+Size
 StrategyShmemSize(void)
 {
        /* A1out list can hold 50% of NBuffers, per Johnson and Shasha */
        int                     nCDBs = NBuffers + NBuffers / 2;
-       int                     size = 0;
+       Size            size = 0;
 
        /* size of CDB lookup hash table */
        size += BufTableShmemSize(nCDBs);
Index: src/backend/storage/ipc/ipci.c
===================================================================
--- src/backend/storage/ipc/ipci.c      
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/backend/storage/ipc/ipci.c      (.../trunk/postgres-src)        
(revision 55)
@@ -60,7 +60,7 @@
 
        if (!IsUnderPostmaster)
        {
-               int                     size;
+               Size                    size;
                int                     numSemas;
 
                /*
@@ -86,7 +86,7 @@
                /* might as well round it off to a multiple of a typical page 
size */
                size += 8192 - (size % 8192);
 
-               elog(DEBUG3, "invoking IpcMemoryCreate(size=%d)", size);
+               elog(DEBUG3, "invoking IpcMemoryCreate(size=%lu)", (unsigned 
long) size);
 
                /*
                 * Create the shmem segment
Index: src/backend/storage/ipc/shmem.c
===================================================================
--- src/backend/storage/ipc/shmem.c     
(.../tags/postgres-src/8.0.3/postgres-src)      (revision 55)
+++ src/backend/storage/ipc/shmem.c     (.../trunk/postgres-src)        
(revision 55)
@@ -146,8 +146,8 @@
 void *
 ShmemAlloc(Size size)
 {
-       uint32          newStart;
-       uint32          newFree;
+       Size            newStart;
+       Size            newFree;
        void       *newSpace;
 
        /* use volatile pointer to prevent code rearrangement */

Attachment: pg803-xid64.tgz
Description: Binary data

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org

Reply via email to