I am attaching a patch that removes support for non-segmented mode, as
discussed during the last commit fest. Non-segmented mode is usable only on a
couple of filesystems (ZFS, XFS), and it is not safe on any OS, because every
OS also supports other filesystems.

I added a --with-segsize option (RELSEG_SIZE) to the configure script so that
it is easy to compile with a different segment size (on most filesystems 1TB
is a safe value). As a bonus, I also added a --with-blocksize option (BLCKSZ).
It is not important for this patch, but it could be useful, e.g., for
buildfarm testing with different BLCKSZ values.
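
As a rough illustration of what the segment size controls, the sketch below
maps a block number to a segment file and a seek offset within it. The seek
offset formula follows the md.c hunks in this patch; the segment-number
division is an assumption about code the hunks do not show. All EXAMPLE_* and
example_* names are hypothetical stand-ins, not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the values configure would define
 * (8 kB blocks, 1 GB segments, i.e. the defaults in this patch). */
#define EXAMPLE_BLCKSZ      8192
#define EXAMPLE_RELSEG_SIZE (1024 * 1024 * 1024LL * 1 / EXAMPLE_BLCKSZ)

typedef struct
{
    uint32_t segno;     /* which segment file holds the block */
    int64_t  seekpos;   /* byte offset of the block inside that file */
} ExampleBlockLocation;

static ExampleBlockLocation
example_locate_block(uint32_t blocknum)
{
    ExampleBlockLocation loc;

    /* Assumed: segment number is the block number divided by blocks per segment. */
    loc.segno = (uint32_t) (blocknum / EXAMPLE_RELSEG_SIZE);

    /* Same shape as the seekpos computation kept by the patch in md.c. */
    loc.seekpos = (int64_t) EXAMPLE_BLCKSZ * (blocknum % EXAMPLE_RELSEG_SIZE);
    assert(loc.seekpos < (int64_t) EXAMPLE_BLCKSZ * EXAMPLE_RELSEG_SIZE);

    return loc;
}
```

With these defaults a segment holds 131072 blocks, so
example_locate_block(131073) lands in segment 1 at byte offset 8192.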

The patch requires running autoconf and autoheader.

                Zdenek

PS: --with-segsize=1/1024 sets the segment size to 1MB, which is good for
testing.
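
The fractional value works because configure pastes the option text straight
into the RELSEG_SIZE expression. The sketch below reproduces that textual
substitution with hypothetical EXAMPLE_* macros (an 8192-byte block size is
assumed). The expansions are left unparenthesized to mirror the generated
define, so each use is wrapped in parentheses here.

```c
#include <assert.h>

#define EXAMPLE_BLCKSZ 8192

/* Mirrors 1024*1024*1024LL*${default_segsize}/BLCKSZ with the option
 * text substituted verbatim: "1" (the default) and "1/1024". */
#define EXAMPLE_RELSEG_SIZE_DEFAULT 1024*1024*1024LL*1/EXAMPLE_BLCKSZ
#define EXAMPLE_RELSEG_SIZE_1MB     1024*1024*1024LL*1/1024/EXAMPLE_BLCKSZ

/* Blocks per segment for the default 1 GB setting. */
static long long
example_relseg_blocks_default(void)
{
    return (EXAMPLE_RELSEG_SIZE_DEFAULT);
}

/* Blocks per segment for --with-segsize=1/1024. */
static long long
example_relseg_blocks_1mb(void)
{
    return (EXAMPLE_RELSEG_SIZE_1MB);
}

/* Bytes per segment file for a given block count. */
static long long
example_segment_bytes(long long relseg_blocks)
{
    assert(relseg_blocks > 0);  /* a zero-block segment would be unusable */
    return relseg_blocks * EXAMPLE_BLCKSZ;
}
```

Evaluated left to right, the 1/1024 case gives 128 blocks per segment, which
is 1MB at an 8192-byte block size; the default gives 131072 blocks (1GB).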


Index: configure.in
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/configure.in,v
retrieving revision 1.555
diff -c -r1.555 configure.in
*** configure.in	30 Mar 2008 04:08:14 -0000	1.555
--- configure.in	21 Apr 2008 15:19:59 -0000
***************
*** 220,233 ****
  #
  # Data file segmentation
  #
! PGAC_ARG_BOOL(enable, segmented-files, yes,
!               [  --disable-segmented-files disable data file segmentation (requires largefile support)])
  
  #
  # C compiler
  #
  
  # For historical reasons you can also use --with-CC to specify the C compiler
  # to use, although the standard way to do this is to set the CC environment
  # variable.
  PGAC_ARG_REQ(with, CC, [], [CC=$with_CC])
--- 220,287 ----
  #
  # Data file segmentation
  #
! AC_MSG_CHECKING([for default relation segment size])
! PGAC_ARG_REQ(with, segsize, [  --with-segsize=RELSEG_SIZE  change default relation segment size in GB [[1]]],
!              [default_segsize=$withval],
!              [default_segsize=1])
! AC_MSG_RESULT([${default_segsize}GB])
! AC_DEFINE_UNQUOTED([RELSEG_SIZE], 1024*1024*1024LL*${default_segsize}/BLCKSZ, [
!  RELSEG_SIZE is the maximum number of blocks allowed in one disk
!  file. Thus, the maximum size of a single file is RELSEG_SIZE * BLCKSZ;
!  relations bigger than that are divided into multiple files.
!  
!  RELSEG_SIZE * BLCKSZ must be less than your OS' limit on file size.
!  This is often 2 GB or 4GB in a 32-bit operating system, unless you
!  have large file support enabled.  By default, we make the limit 1
!  GB to avoid any possible integer-overflow problems within the OS.
!  A limit smaller than necessary only means we divide a large
!  relation into more chunks than necessary, so it seems best to err
!  in the direction of a small limit.  (Besides, a power-of-2 value
!  saves a few cycles in md.c.)
  
+  Changing RELSEG_SIZE requires an initdb.
+ ])
+ AC_SUBST(default_segsize)
+ 
+ #
+ # Block size
  #
+ AC_MSG_CHECKING([for default block size])
+ PGAC_ARG_REQ(with, blocksize, [  --with-blocksize=BLCKSZ change default block size (1,2,4,8,16,32 are allowed values). [[8]]],
+              [default_blocksize=$withval],
+              [default_blocksize=8])
+ case ${default_blocksize} in
+   1) default_blocksize=1024;;
+   2) default_blocksize=2048;;
+   4) default_blocksize=4096;;
+   8) default_blocksize=8192;;
+  16) default_blocksize=16384;;
+  32) default_blocksize=32768;;
+   *) AC_MSG_ERROR([Invalid block size. Allowed values are 1,2,4,8,16,32.])
+ esac
+ 
+ AC_MSG_RESULT([${default_blocksize}B])
+ AC_DEFINE_UNQUOTED([BLCKSZ], ${default_blocksize}, [
+  Size of a disk block --- this also limits the size of a tuple.  You
+  can set it bigger if you need bigger tuples (although TOAST should
+  reduce the need to have large tuples, since fields can be spread
+  across multiple tuples).
+  
+  BLCKSZ must be a power of 2.  The maximum possible value of BLCKSZ
+  is currently 2^15 (32768).  This is determined by the 15-bit widths
+  of the lp_off and lp_len fields in ItemIdData (see
+  include/storage/itemid.h).
+  
+  Changing BLCKSZ requires an initdb.
+ ]) 
+ AC_SUBST(default_blocksize)
+ 
+ 
  # C compiler
  #
  
  # For historical reasons you can also use --with-CC to specify the C compiler
+ 
  # to use, although the standard way to do this is to set the CC environment
  # variable.
  PGAC_ARG_REQ(with, CC, [], [CC=$with_CC])
***************
*** 1435,1443 ****
  
  # Check for largefile support (must be after AC_SYS_LARGEFILE)
  AC_CHECK_SIZEOF([off_t])
! 
! if test "$ac_cv_sizeof_off_t" -lt 8 -o "$enable_segmented_files" = "yes"; then 
!   AC_DEFINE([USE_SEGMENTED_FILES], 1, [Define to split data files into 1GB segments.]) 
  fi
  
  # SunOS doesn't handle negative byte comparisons properly with +/- return
--- 1489,1496 ----
  
  # Check for largefile support (must be after AC_SYS_LARGEFILE)
  AC_CHECK_SIZEOF([off_t])
! if test "$ac_cv_sizeof_off_t" -lt 8 -a "$default_segsize" != "1"; then 
!    AC_MSG_ERROR([Large file support is not enabled. Segment size cannot be larger than 1GB.]) 
  fi
  
  # SunOS doesn't handle negative byte comparisons properly with +/- return
Index: src/backend/storage/file/buffile.c
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/backend/storage/file/buffile.c,v
retrieving revision 1.30
diff -c -r1.30 buffile.c
*** src/backend/storage/file/buffile.c	10 Mar 2008 20:06:27 -0000	1.30
--- src/backend/storage/file/buffile.c	18 Apr 2008 08:13:45 -0000
***************
*** 38,45 ****
  #include "storage/buffile.h"
  
  /*
!  * We break BufFiles into gigabyte-sized segments, whether or not
!  * USE_SEGMENTED_FILES is defined.  The reason is that we'd like large
   * temporary BufFiles to be spread across multiple tablespaces when available.
   */
  #define MAX_PHYSICAL_FILESIZE	0x40000000
--- 38,44 ----
  #include "storage/buffile.h"
  
  /*
!  * We break BufFiles into gigabyte-sized segments. The reason is that we'd like large
   * temporary BufFiles to be spread across multiple tablespaces when available.
   */
  #define MAX_PHYSICAL_FILESIZE	0x40000000
Index: src/backend/storage/smgr/md.c
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/backend/storage/smgr/md.c,v
retrieving revision 1.137
diff -c -r1.137 md.c
*** src/backend/storage/smgr/md.c	18 Apr 2008 06:48:38 -0000	1.137
--- src/backend/storage/smgr/md.c	18 Apr 2008 08:12:02 -0000
***************
*** 89,106 ****
   *
   *	All MdfdVec objects are palloc'd in the MdCxt memory context.
   *
-  *	On platforms that support large files, USE_SEGMENTED_FILES can be
-  *	#undef'd to disable the segmentation logic.  In that case each
-  *	relation is a single operating-system file.
   */
  
  typedef struct _MdfdVec
  {
  	File		mdfd_vfd;		/* fd number in fd.c's pool */
  	BlockNumber mdfd_segno;		/* segment number, from 0 */
- #ifdef USE_SEGMENTED_FILES
  	struct _MdfdVec *mdfd_chain;	/* next segment, or NULL */
- #endif
  } MdfdVec;
  
  static MemoryContext MdCxt;		/* context for all md.c allocations */
--- 89,101 ----
***************
*** 162,171 ****
  static void register_unlink(RelFileNode rnode);
  static MdfdVec *_fdvec_alloc(void);
  
- #ifdef USE_SEGMENTED_FILES
  static MdfdVec *_mdfd_openseg(SMgrRelation reln, BlockNumber segno,
  			  int oflags);
- #endif
  static MdfdVec *_mdfd_getseg(SMgrRelation reln, BlockNumber blkno,
  			 bool isTemp, ExtensionBehavior behavior);
  static BlockNumber _mdnblocks(SMgrRelation reln, MdfdVec *seg);
--- 157,164 ----
***************
*** 258,266 ****
  
  	reln->md_fd->mdfd_vfd = fd;
  	reln->md_fd->mdfd_segno = 0;
- #ifdef USE_SEGMENTED_FILES
  	reln->md_fd->mdfd_chain = NULL;
- #endif
  }
  
  /*
--- 251,257 ----
***************
*** 344,350 ****
  							rnode.relNode)));
  	}
  
- #ifdef USE_SEGMENTED_FILES
  	/* Delete the additional segments, if any */
  	else
  	{
--- 335,340 ----
***************
*** 374,380 ****
  		}
  		pfree(segpath);
  	}
- #endif
  
  	pfree(path);
  
--- 364,369 ----
***************
*** 420,431 ****
  
  	v = _mdfd_getseg(reln, blocknum, isTemp, EXTENSION_CREATE);
  
- #ifdef USE_SEGMENTED_FILES
  	seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
  	Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
- #else
- 	seekpos = (off_t) BLCKSZ * blocknum;
- #endif
  
  	/*
  	 * Note: because caller usually obtained blocknum by calling mdnblocks,
--- 409,416 ----
***************
*** 469,477 ****
  	if (!isTemp)
  		register_dirty_segment(reln, v);
  
- #ifdef USE_SEGMENTED_FILES
  	Assert(_mdnblocks(reln, v) <= ((BlockNumber) RELSEG_SIZE));
- #endif
  }
  
  /*
--- 454,460 ----
***************
*** 530,539 ****
  
  	mdfd->mdfd_vfd = fd;
  	mdfd->mdfd_segno = 0;
- #ifdef USE_SEGMENTED_FILES
  	mdfd->mdfd_chain = NULL;
  	Assert(_mdnblocks(reln, mdfd) <= ((BlockNumber) RELSEG_SIZE));
- #endif
  
  	return mdfd;
  }
--- 513,520 ----
***************
*** 552,558 ****
  
  	reln->md_fd = NULL;			/* prevent dangling pointer after error */
  
- #ifdef USE_SEGMENTED_FILES
  	while (v != NULL)
  	{
  		MdfdVec    *ov = v;
--- 533,538 ----
***************
*** 564,574 ****
  		v = v->mdfd_chain;
  		pfree(ov);
  	}
- #else
- 	if (v->mdfd_vfd >= 0)
- 		FileClose(v->mdfd_vfd);
- 	pfree(v);
- #endif
  }
  
  /*
--- 544,549 ----
***************
*** 583,594 ****
  
  	v = _mdfd_getseg(reln, blocknum, false, EXTENSION_FAIL);
  
- #ifdef USE_SEGMENTED_FILES
  	seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
  	Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
- #else
- 	seekpos = (off_t) BLCKSZ * blocknum;
- #endif
  
  	if (FileSeek(v->mdfd_vfd, seekpos, SEEK_SET) != seekpos)
  		ereport(ERROR,
--- 558,565 ----
***************
*** 653,664 ****
  
  	v = _mdfd_getseg(reln, blocknum, isTemp, EXTENSION_FAIL);
  
- #ifdef USE_SEGMENTED_FILES
  	seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
  	Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
- #else
- 	seekpos = (off_t) BLCKSZ * blocknum;
- #endif
  
  	if (FileSeek(v->mdfd_vfd, seekpos, SEEK_SET) != seekpos)
  		ereport(ERROR,
--- 624,631 ----
***************
*** 708,714 ****
  {
  	MdfdVec    *v = mdopen(reln, EXTENSION_FAIL);
  
- #ifdef USE_SEGMENTED_FILES
  	BlockNumber nblocks;
  	BlockNumber segno = 0;
  
--- 675,680 ----
***************
*** 764,772 ****
  
  		v = v->mdfd_chain;
  	}
- #else
- 	return _mdnblocks(reln, v);
- #endif
  }
  
  /*
--- 730,735 ----
***************
*** 777,786 ****
  {
  	MdfdVec    *v;
  	BlockNumber curnblk;
- 
- #ifdef USE_SEGMENTED_FILES
  	BlockNumber priorblocks;
- #endif
  
  	/*
  	 * NOTE: mdnblocks makes sure we have opened all active segments, so that
--- 740,746 ----
***************
*** 804,810 ****
  
  	v = mdopen(reln, EXTENSION_FAIL);
  
- #ifdef USE_SEGMENTED_FILES
  	priorblocks = 0;
  	while (v != NULL)
  	{
--- 764,769 ----
***************
*** 866,884 ****
  		}
  		priorblocks += RELSEG_SIZE;
  	}
- #else
- 	/* For unsegmented files, it's a lot easier */
- 	if (FileTruncate(v->mdfd_vfd, (off_t) nblocks * BLCKSZ) < 0)
- 		ereport(ERROR,
- 				(errcode_for_file_access(),
- 			  errmsg("could not truncate relation %u/%u/%u to %u blocks: %m",
- 					 reln->smgr_rnode.spcNode,
- 					 reln->smgr_rnode.dbNode,
- 					 reln->smgr_rnode.relNode,
- 					 nblocks)));
- 	if (!isTemp)
- 		register_dirty_segment(reln, v);
- #endif
  }
  
  /*
--- 825,830 ----
***************
*** 901,907 ****
  
  	v = mdopen(reln, EXTENSION_FAIL);
  
- #ifdef USE_SEGMENTED_FILES
  	while (v != NULL)
  	{
  		if (FileSync(v->mdfd_vfd) < 0)
--- 847,852 ----
***************
*** 914,928 ****
  					   reln->smgr_rnode.relNode)));
  		v = v->mdfd_chain;
  	}
- #else
- 	if (FileSync(v->mdfd_vfd) < 0)
- 		ereport(ERROR,
- 				(errcode_for_file_access(),
- 				 errmsg("could not fsync relation %u/%u/%u: %m",
- 						reln->smgr_rnode.spcNode,
- 						reln->smgr_rnode.dbNode,
- 						reln->smgr_rnode.relNode)));
- #endif
  }
  
  /*
--- 859,864 ----
***************
*** 1476,1483 ****
  	return (MdfdVec *) MemoryContextAlloc(MdCxt, sizeof(MdfdVec));
  }
  
- #ifdef USE_SEGMENTED_FILES
- 
  /*
   * Open the specified segment of the relation,
   * and make a MdfdVec object for it.  Returns NULL on failure.
--- 1412,1417 ----
***************
*** 1522,1528 ****
  	/* all done */
  	return v;
  }
- #endif   /* USE_SEGMENTED_FILES */
  
  /*
   *	_mdfd_getseg() -- Find the segment of the relation holding the
--- 1456,1461 ----
***************
*** 1538,1544 ****
  {
  	MdfdVec    *v = mdopen(reln, behavior);
  
- #ifdef USE_SEGMENTED_FILES
  	BlockNumber targetseg;
  	BlockNumber nextsegno;
  
--- 1471,1476 ----
***************
*** 1600,1607 ****
  		}
  		v = v->mdfd_chain;
  	}
- #endif
- 
  	return v;
  }
  
--- 1532,1537 ----
Index: src/include/pg_config_manual.h
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/include/pg_config_manual.h,v
retrieving revision 1.31
diff -c -r1.31 pg_config_manual.h
*** src/include/pg_config_manual.h	11 Apr 2008 22:54:23 -0000	1.31
--- src/include/pg_config_manual.h	21 Apr 2008 15:17:07 -0000
***************
*** 11,57 ****
   */
  
  /*
-  * Size of a disk block --- this also limits the size of a tuple.  You
-  * can set it bigger if you need bigger tuples (although TOAST should
-  * reduce the need to have large tuples, since fields can be spread
-  * across multiple tuples).
-  *
-  * BLCKSZ must be a power of 2.  The maximum possible value of BLCKSZ
-  * is currently 2^15 (32768).  This is determined by the 15-bit widths
-  * of the lp_off and lp_len fields in ItemIdData (see
-  * include/storage/itemid.h).
-  *
-  * Changing BLCKSZ requires an initdb.
-  */
- #define BLCKSZ	8192
- 
- /*
-  * RELSEG_SIZE is the maximum number of blocks allowed in one disk
-  * file when USE_SEGMENTED_FILES is defined.  Thus, the maximum size 
-  * of a single file is RELSEG_SIZE * BLCKSZ; relations bigger than that 
-  * are divided into multiple files.
-  *
-  * RELSEG_SIZE * BLCKSZ must be less than your OS' limit on file size.
-  * This is often 2 GB or 4GB in a 32-bit operating system, unless you
-  * have large file support enabled.  By default, we make the limit 1
-  * GB to avoid any possible integer-overflow problems within the OS.
-  * A limit smaller than necessary only means we divide a large
-  * relation into more chunks than necessary, so it seems best to err
-  * in the direction of a small limit.  (Besides, a power-of-2 value
-  * saves a few cycles in md.c.)
-  *
-  * When not using segmented files, RELSEG_SIZE is set to zero so that
-  * this behavior can be distinguished in pg_control.
-  *
-  * Changing RELSEG_SIZE requires an initdb.
-  */
- #ifdef USE_SEGMENTED_FILES
- #define RELSEG_SIZE (0x40000000 / BLCKSZ)
- #else
- #define RELSEG_SIZE 0
- #endif
- 
- /*
   * Size of a WAL file block.  This need have no particular relation to BLCKSZ.
   * XLOG_BLCKSZ must be a power of 2, and if your system supports O_DIRECT I/O,
   * XLOG_BLCKSZ must be a multiple of the alignment requirement for direct-I/O
--- 11,16 ----
-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches