Re: [HACKERS] memory usage of pg_upgrade

2014-02-12 Thread Bruce Momjian
On Mon, Feb  3, 2014 at 09:14:10PM -0500, Bruce Momjian wrote:
 On Mon, Sep  9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote:
   In the case of tablespaces, I should have thought you could keep a
   hash table of the names and just store an entry id in the table
   structure. But that's just my speculation without actually looking
   at the code, so don't take my word for it :-)
  
  Yes, please feel free to improve the code.  I improved pg_upgrade CPU
  usage for a lerge number of objects, but never thought to look at memory
  usage.  It would be a big win to just palloc/pfree the memory, rather
  than allocate tones of memory.  If you don't get to it, I will in a few
  weeks.
 
 Thanks you for pointing out this problem.  I have researched the cause
 and the major problem was that I was allocating the maximum path length
 in a struct rather than allocating just the length I needed, and was not
 reusing string pointers that I knew were not going to change.
 
 The updated attached patch significantly decreases memory consumption:
 
   tables  orig  patch % decrease
   
   127,168 kB  27,168 kB0
   1k   46,136 kB  27,920 kB   39
   2k   65,224 kB  28,796 kB   56
   4k  103,276 kB  30,472 kB   70
   8k  179,512 kB  33,900 kB   81
   16k 331,860 kB  40,788 kB   88
   32k 636,544 kB  54,572 kB   91
   64k   1,245,920 kB  81,876 kB   93
 
 As you can see, a database with 64k tables shows a 93% decrease in
 memory use.  I plan to apply this for PG 9.4.

Patch applied.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] memory usage of pg_upgrade

2014-02-03 Thread Bruce Momjian
On Mon, Sep  9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote:
  In the case of tablespaces, I should have thought you could keep a
  hash table of the names and just store an entry id in the table
  structure. But that's just my speculation without actually looking
  at the code, so don't take my word for it :-)
 
 Yes, please feel free to improve the code.  I improved pg_upgrade CPU
 usage for a lerge number of objects, but never thought to look at memory
 usage.  It would be a big win to just palloc/pfree the memory, rather
 than allocate tones of memory.  If you don't get to it, I will in a few
 weeks.

Thanks you for pointing out this problem.  I have researched the cause
and the major problem was that I was allocating the maximum path length
in a struct rather than allocating just the length I needed, and was not
reusing string pointers that I knew were not going to change.

The updated attached patch significantly decreases memory consumption:

tables  orig  patch % decrease

127,168 kB  27,168 kB0
1k   46,136 kB  27,920 kB   39
2k   65,224 kB  28,796 kB   56
4k  103,276 kB  30,472 kB   70
8k  179,512 kB  33,900 kB   81
16k 331,860 kB  40,788 kB   88
32k 636,544 kB  54,572 kB   91
64k   1,245,920 kB  81,876 kB   93

As you can see, a database with 64k tables shows a 93% decrease in
memory use.  I plan to apply this for PG 9.4.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index f04707f..acf8660
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** create_script_for_old_cluster_deletion(c
*** 707,726 
  			fprintf(script, \n);
  			/* remove PG_VERSION? */
  			if (GET_MAJOR_VERSION(old_cluster.major_version) = 804)
! fprintf(script, RM_CMD  %s%s%cPG_VERSION\n,
  		fix_path_separator(os_info.old_tablespaces[tblnum]),
- 		fix_path_separator(old_cluster.tablespace_suffix),
  		PATH_SEPARATOR);
  
  			for (dbnum = 0; dbnum  old_cluster.dbarr.ndbs; dbnum++)
! 			{
! fprintf(script, RMDIR_CMD  %s%s%c%d\n,
  		fix_path_separator(os_info.old_tablespaces[tblnum]),
- 		fix_path_separator(old_cluster.tablespace_suffix),
  		PATH_SEPARATOR, old_cluster.dbarr.dbs[dbnum].db_oid);
- 			}
  		}
  		else
  
  			/*
  			 * Simply delete the tablespace directory, which might be .old
--- 707,724 
  			fprintf(script, \n);
  			/* remove PG_VERSION? */
  			if (GET_MAJOR_VERSION(old_cluster.major_version) = 804)
! fprintf(script, RM_CMD  %s%cPG_VERSION\n,
  		fix_path_separator(os_info.old_tablespaces[tblnum]),
  		PATH_SEPARATOR);
  
  			for (dbnum = 0; dbnum  old_cluster.dbarr.ndbs; dbnum++)
! fprintf(script, RMDIR_CMD  %s%c%d\n,
  		fix_path_separator(os_info.old_tablespaces[tblnum]),
  		PATH_SEPARATOR, old_cluster.dbarr.dbs[dbnum].db_oid);
  		}
  		else
+ 		{
+ 			char	*suffix_path = pg_strdup(old_cluster.tablespace_suffix);
  
  			/*
  			 * Simply delete the tablespace directory, which might be .old
*** create_script_for_old_cluster_deletion(c
*** 728,734 
  			 */
  			fprintf(script, RMDIR_CMD  %s%s\n,
  	fix_path_separator(os_info.old_tablespaces[tblnum]),
! 	fix_path_separator(old_cluster.tablespace_suffix));
  	}
  
  	fclose(script);
--- 726,734 
  			 */
  			fprintf(script, RMDIR_CMD  %s%s\n,
  	fix_path_separator(os_info.old_tablespaces[tblnum]),
! 	fix_path_separator(suffix_path));
! 			pfree(suffix_path);
! 		}
  	}
  
  	fclose(script);
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
new file mode 100644
index 3d0dd1a..fd083de
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
*** create_rel_filename_map(const char *old_
*** 108,127 
  		 * relation belongs to the default tablespace, hence relfiles should
  		 * exist in the data directories.
  		 */
! 		strlcpy(map-old_tablespace, old_data, sizeof(map-old_tablespace));
! 		strlcpy(map-new_tablespace, new_data, sizeof(map-new_tablespace));
! 		strlcpy(map-old_tablespace_suffix, /base, sizeof(map-old_tablespace_suffix));
! 		strlcpy(map-new_tablespace_suffix, /base, sizeof(map-new_tablespace_suffix));
  	}
  	else
  	{
  		/* relation belongs to a tablespace, so use the tablespace location */
! 		strlcpy(map-old_tablespace, old_rel-tablespace, sizeof(map-old_tablespace));
! 		strlcpy(map-new_tablespace, new_rel-tablespace, sizeof(map-new_tablespace));
! 		strlcpy(map-old_tablespace_suffix, old_cluster.tablespace_suffix,
! sizeof(map-old_tablespace_suffix));
! 		strlcpy(map-new_tablespace_suffix, new_cluster.tablespace_suffix,
! sizeof(map-new_tablespace_suffix));
 

Re: [HACKERS] memory usage of pg_upgrade

2013-09-09 Thread Andrew Dunstan


On 09/09/2013 06:20 PM, Jeff Janes wrote:

pg_upgrade reserves 5 times MAXPGPATH, or 5120 characters, for the
tablespace name of every object (table, toast table, index) in the
database being upgraded.  This adds up pretty quickly when there is a
very large number of objects.  It could be changed to char* to a
separately allocated name that takes only as much space it needs.  But
maybe it would be better to point into os_info.old_tablespaces or
something like that, as surely there are not going to be one
independent file space per object.


typedef struct
{
  ...
 chartablespace[MAXPGPATH];
} RelInfo;

The struct FileNameMap has 4 more .

Since there seems to be some interest in improving the scalability of
pg_upgrade, this is one of the things to consider fixing.  What is the
best way to do it?



Send in a patch :-)

We recently ripped out some uses of statically sized strings in the 
parallel code and replaced them with pointers to palloc'ed strings. So 
there is good precedent for this. See 
https://github.com/postgres/postgres/commit/910d3a458c15c1b4cc518ba480be2f712f42f179


In the case of tablespaces, I should have thought you could keep a hash 
table of the names and just store an entry id in the table structure. 
But that's just my speculation without actually looking at the code, so 
don't take my word for it :-)


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] memory usage of pg_upgrade

2013-09-09 Thread Bruce Momjian
On Mon, Sep  9, 2013 at 06:39:39PM -0400, Andrew Dunstan wrote:
 
 On 09/09/2013 06:20 PM, Jeff Janes wrote:
 pg_upgrade reserves 5 times MAXPGPATH, or 5120 characters, for the
 tablespace name of every object (table, toast table, index) in the
 database being upgraded.  This adds up pretty quickly when there is a
 very large number of objects.  It could be changed to char* to a
 separately allocated name that takes only as much space it needs.  But
 maybe it would be better to point into os_info.old_tablespaces or
 something like that, as surely there are not going to be one
 independent file space per object.
 
 
 typedef struct
 {
   ...
  chartablespace[MAXPGPATH];
 } RelInfo;
 
 The struct FileNameMap has 4 more .
 
 Since there seems to be some interest in improving the scalability of
 pg_upgrade, this is one of the things to consider fixing.  What is the
 best way to do it?
 
 
 Send in a patch :-)
 
 We recently ripped out some uses of statically sized strings in the
 parallel code and replaced them with pointers to palloc'ed strings.
 So there is good precedent for this. See 
 https://github.com/postgres/postgres/commit/910d3a458c15c1b4cc518ba480be2f712f42f179
 
 In the case of tablespaces, I should have thought you could keep a
 hash table of the names and just store an entry id in the table
 structure. But that's just my speculation without actually looking
 at the code, so don't take my word for it :-)

Yes, please feel free to improve the code.  I improved pg_upgrade CPU
usage for a lerge number of objects, but never thought to look at memory
usage.  It would be a big win to just palloc/pfree the memory, rather
than allocate tones of memory.  If you don't get to it, I will in a few
weeks.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers