Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Hi, On 2013-06-28 23:03:22 -0400, Bruce Momjian wrote: Did we decide against specifying huge pages in Postgres? I don't think so. We need somebody to make some last modifications to the patch though and Christian doesn't seem to have the time atm. I think the bigger memory (size of the per process page table) and #cpus (number of pg backends = number of page tables allocated) the more imporant it gets to implement this. We already can spend gigabytes of memory on those on big systems. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Did we decide against specifying huge pages in Postgres? --- On Tue, Oct 30, 2012 at 09:16:07PM +0100, Christian Kruse wrote: Hey, ok, I think I implemented all of the changes you requested. All but the ia64 dependent, I have to do more research for this one. Greetings, CK diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b4fcbaf..66ed10f 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1049,6 +1049,37 @@ include 'filename' /listitem /varlistentry + varlistentry id=guc-huge-tlb-pages xreflabel=huge_tlb_pages + termvarnamehuge_tlb_pages/varname (typeenum/type)/term + indexterm + primaryvarnamehuge_tlb_pages/ configuration parameter/primary + /indexterm + listitem + para +Enables/disables the use of huge tlb pages. Valid values are +literalon/literal, literaloff/literal and literaltry/literal. +The default value is literaltry/literal. + /para + + para +With varnamehuge_tlb_pages/varname set to literalon/literal +symbolmmap()/symbol will be called with symbolMAP_HUGETLB/symbol. +If the call fails the server will fail fatally. + /para + + para +With varnamehuge_tlb_pages/varname set to literaloff/literal we +will not use symbolMAP_HUGETLB/symbol at all. + /para + + para +With varnamehuge_tlb_pages/varname set to literaltry/literal +we will try to use symbolMAP_HUGETLB/symbol and fall back to +symbolmmap()/symbol without symbolMAP_HUGETLB/symbol. + /para + /listitem + /varlistentry + varlistentry id=guc-temp-buffers xreflabel=temp_buffers termvarnametemp_buffers/varname (typeinteger/type)/term indexterm diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index df06312..f9de239 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -27,10 +27,14 @@ #ifdef HAVE_SYS_SHM_H #include sys/shm.h #endif +#ifdef MAP_HUGETLB +#include dirent.h +#endif #include miscadmin.h #include storage/ipc.h #include storage/pg_shmem.h +#include utils/guc.h typedef key_t IpcMemoryKey; /* shared memory key passed to shmget(2) */ @@ -61,6 +65,19 @@ typedef int IpcMemoryId; /* shared memory ID returned by shmget(2) */ #define MAP_FAILED ((void *) -1) #endif +#ifdef MAP_HUGETLB +# ifdef __ia64__ +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL) +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED) +# else +#define PG_HUGETLB_BASE_ADDR (void *)(0x0UL) +#define PG_MAP_HUGETLB MAP_HUGETLB +# endif +#else +# define PG_MAP_HUGETLB 0 +#endif + + unsigned long UsedShmemSegID = 0; void*UsedShmemSegAddr = NULL; @@ -73,7 +90,6 @@ static void IpcMemoryDelete(int status, Datum shmId); static PGShmemHeader *PGSharedMemoryAttach(IpcMemoryKey key, IpcMemoryId *shmid); - /* * InternalIpcMemoryCreate(memKey, size) * @@ -342,6 +358,155 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2) } +#ifdef MAP_HUGETLB +#define HUGE_PAGE_INFO_DIR /sys/kernel/mm/hugepages + +/* + * static long InternalGetFreeHugepagesCount(const char *name) + * + * Attempt to read the number of available hugepages from + * /sys/kernel/mm/hugepages/hugepages-size/free_hugepages + * Will fail (return -1) if file could not be opened, 0 if no pages are available + * and 0 if there are free pages + * + */ +static long +InternalGetFreeHugepagesCount(const char *name) +{ + int fd; + char buff[1024]; + size_t len; + long result; + char *ptr; + + len = snprintf(buff, 1024, %s/%s/free_hugepages, HUGE_PAGE_INFO_DIR, name); + if (len == 1024) /* I don't think that this will happen ever */ + { + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING, + (errmsg(Filename %s/%s/free_hugepages is too long, HUGE_PAGE_INFO_DIR, name), + errcontext(while checking hugepage size))); + return -1; + } + + fd = open(buff, O_RDONLY); + if (fd = 0) + { + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING, + (errmsg(Could not open file %s: %s, buff, strerror(errno)), + errcontext(while checking hugepage size))); + return -1; + } + + len = read(fd, buff, 1024); + if (len = 0) + { + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING, + (errmsg(Error reading from file %s: %s, buff, strerror(errno)), + errcontext(while checking
Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
On 2012-10-30 21:16:07 +0100, Christian Kruse wrote: +static long +InternalGetHugepageSize() +{ + DIR *dir = opendir(HUGE_PAGE_INFO_DIR); + long smallest_size = -1, size; + char *ptr; ... + while((ent = readdir(dir)) != NULL) + { This should be (Allocate|Read)Dir btw. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Hi CK, On 2012-10-30 21:16:07 +0100, Christian Kruse wrote: index b4fcbaf..66ed10f 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml I think a short introduction or at least a reference on how to configure hugepages would be a good thing. varlistentry id=guc-temp-buffers xreflabel=temp_buffers termvarnametemp_buffers/varname (typeinteger/type)/term indexterm diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index df06312..f9de239 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -27,10 +27,14 @@ #ifdef HAVE_SYS_SHM_H #include sys/shm.h #endif +#ifdef MAP_HUGETLB +#include dirent.h +#endif I think a central #define for the MAP_HUGETLB capability would be a good idea, akin to HAVE_SYS_SHM_H. E.g. this: --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -22,6 +22,7 @@ #include limits.h #include unistd.h #include sys/stat.h +#include sys/mman.h #ifdef HAVE_SYSLOG #include syslog.h #endif is unlikely to fly on windows. +/* + * static long InternalGetHugepageSize() + * + * Attempt to get a valid hugepage size from /sys/kernel/mm/hugepages/ by + * reading directory contents + * Will fail (return -1) if the directory could not be opened or no valid + * page sizes are available. Will return the biggest hugepage size on + * success. + * + */ The biggest remark is out of date. +static long +InternalGetHugepageSize() +{ ... + if ((smallest_size == -1 || size smallest_size) + InternalGetFreeHugepagesCount(ent-d_name) 0) + { + smallest_size = size; + } ... + + if (smallest_size == -1) + { + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING, + (errmsg(Could not find a valid hugepage size), + errhint(This error usually means that either CONFIG_HUGETLB_PAGE + is not in kernel or that your architecture does not + support hugepages or you did not configure hugepages))); + } I think differentiating the error message between no hugepages found and InternalGetFreeHugepagesCount(ent-d_name) always beeing zero would be a good idea. Failing this way if InternalGetFreeHugepagesCount(ent-d_name) 0 seems fine. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Oh, one more thing... On 2012-10-30 21:16:07 +0100, Christian Kruse wrote: ok, I think I implemented all of the changes you requested. All but the ia64 dependent, I have to do more research for this one. I vote for simply not caring about ia64. This is: +#ifdef MAP_HUGETLB +# ifdef __ia64__ +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL) +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED) +# else +#define PG_HUGETLB_BASE_ADDR (void *)(0x0UL) +#define PG_MAP_HUGETLB MAP_HUGETLB +# endif +#else +# define PG_MAP_HUGETLB 0 +#endif too much underdocumented crazyness for a very minor platform. Should somebody with the approprate harware want to submit an additional patch, fine Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Hi, On Monday 29 October 2012 16:33:25 Tom Lane wrote: Christian Kruse cjk+postg...@defunct.ch writes: I created a patch which implements MAP_HUGETLB for sysv shared memory segments (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I added error handling, huge page size detection and a GUC variable. My recollection is we'd decided not to pursue that because of fears that it could make performance horribly worse on some platforms. This is why I created a GUC variable huge_tlb_pages = on|off|try to enable/disable this feature. A performance improvement up to 10% seems worth the trouble of setting another config value, IMO. Greetings, CK -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Hey, On 30/10/12 20:33, Andres Freund wrote: +#ifdef MAP_HUGETLB +# ifdef __ia64__ +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL) +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED) +# else Not your fault, but that looks rather strange to me. The level of documentation around the requirement of using MAP_FIXED is subpar... Yeah, looks strange to me, too. This is why I asked if this is really all we have to do. I dislike automatically using the biggest size here. There are platforms with 16GB hugepages... I think we should rather always use the smallest size and leave it to the kernel to choose whatever pagesize he wants to use. Good point. I will change this to the smallest available page size. Thats going to log two warnings if the current system doesn't have hugepage support compiled in and thus no /sys/kernel/mm/hugepages directory. I think you should 1. only test this if TRY or ON is configured, 2. make the messages in InternalGetHugepageSize DEBUG1 at least for the TRY case. Ok, point taken. […] + HUGE_TLB_TRY, huge_tlb_options, + NULL, NULL, NULL + }, you use HUGE_TLB_TRY with ifdef here. Ah, right. Setting it to HUGE_TLB_OFF when no MAP_HUGETLB is available. Greetings, CK pgprK0vhwogK9.pgp Description: PGP signature
Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Hi, On 29/10/12 21:14, Peter Geoghegan wrote: I have a few initial observations on this. Thanks for your feedback. * I think you should be making the new GUC PGC_INTERNAL on platforms where MAP_HUGETLB is not defined or available. See also, effective_io_concurrency. This gives sane error handling. Ok, sounds good for me. * Why aren't you failing when HUGE_TLB_ON and the mmap fails? You do the same thing as HUGE_TLB_TRY, contrary to your documentation: +if (huge_tlb_pages == HUGE_TLB_ON || huge_tlb_pages == HUGE_TLB_TRY) No, what I did was mmap()ing the memory with MAP_HUGETLB and falling back to mmap() w/o MAP_HUGETLB when mmap() failed. But there was another bug, I didn't mmap() at all when huge_tlb_pages == off. * I think you should add MAP_HUGETLB to PG_MMAP_FLAGS directly, rather than XOR'ing after the fact. We already build the flags that comprise PG_MMAP_FLAGS using another set of intermediary flags, based on platform considerations, so what you're doing here is just inconsistent. This would mean that I have to disable the bit when huge_tlb_pages == off. I think it is better to enable it if wanted and to just leave the flags as they were when this feature has been turned off, do you disagree? Don't wrap the mmap() code in ifdefs, and instead rely on the GUC being available on all platforms, with setting the GUC causing an error on unsupported platforms, in the style of effective_io_concurrency, as established in commit 3b07182e613a0e63ff662e3dc7f820f010923ec7 . Build a PG_MAP_HUGETLB intermediary flag on all platforms. Ok, this sounds good. It will remove complexity from the code. * Apparently you're supposed to do this for the benefit of Itanic [1]: /* Only ia64 requires this */ #ifdef __ia64__ #define ADDR (void *)(0x8000UL) #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED) #else #define ADDR (void *)(0x0UL) #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) #endif Hm… is this enough? Or do we have to do more work for ia64? * You aren't following our style guide with respect to error messages [2]. Thanks, I wasn't aware of this. I attached a new version of the patch. Greetings, CK pgp3AmMtkv8SY.pgp Description: PGP signature
Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Hi, On 29/10/12 16:33, Tom Lane wrote: I created a patch which implements MAP_HUGETLB for sysv shared memory segments (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I added error handling, huge page size detection and a GUC variable. My recollection is we'd decided not to pursue that because of fears that it could make performance horribly worse on some platforms. This is why I created a GUC for turning this feature off. Greetings, CK pgpADXeFXTb5Z.pgp Description: PGP signature
Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Hey, Oh man, first I didn't sent the email to the list and now I forgot the attachment. I should really get some sleep, sorry for any inconveniences :( Greetings, CK diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b4fcbaf..66ed10f 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1049,6 +1049,37 @@ include 'filename' /listitem /varlistentry + varlistentry id=guc-huge-tlb-pages xreflabel=huge_tlb_pages + termvarnamehuge_tlb_pages/varname (typeenum/type)/term + indexterm + primaryvarnamehuge_tlb_pages/ configuration parameter/primary + /indexterm + listitem + para +Enables/disables the use of huge tlb pages. Valid values are +literalon/literal, literaloff/literal and literaltry/literal. +The default value is literaltry/literal. + /para + + para +With varnamehuge_tlb_pages/varname set to literalon/literal +symbolmmap()/symbol will be called with symbolMAP_HUGETLB/symbol. +If the call fails the server will fail fatally. + /para + + para +With varnamehuge_tlb_pages/varname set to literaloff/literal we +will not use symbolMAP_HUGETLB/symbol at all. + /para + + para +With varnamehuge_tlb_pages/varname set to literaltry/literal +we will try to use symbolMAP_HUGETLB/symbol and fall back to +symbolmmap()/symbol without symbolMAP_HUGETLB/symbol. + /para + /listitem + /varlistentry + varlistentry id=guc-temp-buffers xreflabel=temp_buffers termvarnametemp_buffers/varname (typeinteger/type)/term indexterm diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index df06312..73cfdef 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -27,10 +27,14 @@ #ifdef HAVE_SYS_SHM_H #include sys/shm.h #endif +#ifdef MAP_HUGETLB +#include dirent.h +#endif #include miscadmin.h #include storage/ipc.h #include storage/pg_shmem.h +#include utils/guc.h typedef key_t IpcMemoryKey;/* shared memory key passed to shmget(2) */ @@ -61,6 +65,19 @@ typedef int IpcMemoryId; /* shared memory ID returned by shmget(2) */ #define MAP_FAILED ((void *) -1) #endif +#ifdef MAP_HUGETLB +# ifdef __ia64__ +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL) +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED) +# else +#define PG_HUGETLB_BASE_ADDR (void *)(0x0UL) +#define PG_MAP_HUGETLB MAP_HUGETLB +# endif +#else +# define PG_MAP_HUGETLB 0 +#endif + + unsigned long UsedShmemSegID = 0; void *UsedShmemSegAddr = NULL; @@ -342,6 +359,154 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2) } +#ifdef MAP_HUGETLB +#define HUGE_PAGE_INFO_DIR /sys/kernel/mm/hugepages + +/* + * static long InternalGetFreeHugepagesCount(const char *name) + * + * Attempt to read the number of available hugepages from + * /sys/kernel/mm/hugepages/hugepages-size/free_hugepages + * Will fail (return -1) if file could not be opened, 0 if no pages are available + * and 0 if there are free pages + * + */ +static long +InternalGetFreeHugepagesCount(const char *name) +{ + int fd; + char buff[1024]; + size_t len; + long result; + char *ptr; + + len = snprintf(buff, 1024, %s/%s/free_hugepages, HUGE_PAGE_INFO_DIR, name); + if (len == 1024) /* I don't think that this will happen ever */ + { + ereport(WARNING, + (errmsg(Filename %s/%s/free_hugepages is too long, HUGE_PAGE_INFO_DIR, name), +errcontext(while checking hugepage size))); + return -1; + } + + fd = open(buff, O_RDONLY); + if (fd = 0) + { + ereport(WARNING, + (errmsg(Could not open file %s: %s, buff, strerror(errno)), +errcontext(while checking hugepage size))); + return -1; + } + + len = read(fd, buff, 1024); + if (len = 0) + { + ereport(WARNING, + (errmsg(Error reading from file %s: %s, buff, strerror(errno)), +errcontext(while checking hugepage size))); + close(fd); + return -1; + } + + /* +* If the content of free_hugepages is longer than or equal to 1024 bytes +* the rest is irrelevant; we simply want to know if there are any +* hugepages left +*/ + if (len == 1024) + { + buff[1023] = 0; + } + else + { + buff[len] = 0; + } + + close(fd); + + result = strtol(buff, ptr, 10); + + if (ptr == NULL) + { + ereport(WARNING, + (errmsg(Could not convert contents of file %s/%s/free_hugepages
Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
On Tuesday, October 30, 2012 08:20:33 PM Christian Kruse wrote: Hey, Oh man, first I didn't sent the email to the list and now I forgot the attachment. I should really get some sleep, sorry for any inconveniences :( +#ifdef MAP_HUGETLB +# ifdef __ia64__ +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL) +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED) +# else Not your fault, but that looks rather strange to me. The level of documentation around the requirement of using MAP_FIXED is subpar... +long +InternalGetHugepageSize() +{ +... + if (biggest_size size InternalGetFreeHugepagesCount(ent- d_name) 0) + { + biggest_size = size; + } I dislike automatically using the biggest size here. There are platforms with 16GB hugepages... I think we should rather always use the smallest size and leave it to the kernel to choose whatever pagesize he wants to use. +#ifdef MAP_HUGETLB + longpagesize = InternalGetHugepageSize(); + + if (pagesize = 0) + pagesize = sysconf(_SC_PAGE_SIZE); +#else longpagesize = sysconf(_SC_PAGE_SIZE); +#endif Thats going to log two warnings if the current system doesn't have hugepage support compiled in and thus no /sys/kernel/mm/hugepages directory. I think you should 1. only test this if TRY or ON is configured, 2. make the messages in InternalGetHugepageSize DEBUG1 at least for the TRY case. + { + {huge_tlb_pages, +#ifdef MAP_HUGETLB + PGC_SUSET, +#else + PGC_INTERNAL, +#endif + RESOURCES_MEM, + gettext_noop(Enable/disable the use of the hugepages feature), + NULL + }, + huge_tlb_pages, + HUGE_TLB_TRY, huge_tlb_options, + NULL, NULL, NULL + }, you use HUGE_TLB_TRY with ifdef here. Looking forward to having this... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Hey, On 30/10/12 20:33, Andres Freund wrote: +#ifdef MAP_HUGETLB +# ifdef __ia64__ +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL) +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED) +# else Not your fault, but that looks rather strange to me. The level of documentation around the requirement of using MAP_FIXED is subpar... Yeah, looks strange to me, too. This is why I asked if this is really all we have to do. I dislike automatically using the biggest size here. There are platforms with 16GB hugepages... I think we should rather always use the smallest size and leave it to the kernel to choose whatever pagesize he wants to use. Good point. I will change this to the smallest available page size. Thats going to log two warnings if the current system doesn't have hugepage support compiled in and thus no /sys/kernel/mm/hugepages directory. I think you should 1. only test this if TRY or ON is configured, 2. make the messages in InternalGetHugepageSize DEBUG1 at least for the TRY case. Ok, point taken. […] + HUGE_TLB_TRY, huge_tlb_options, + NULL, NULL, NULL + }, you use HUGE_TLB_TRY with ifdef here. Ah, right. Setting it to HUGE_TLB_OFF when no MAP_HUGETLB is available. Greetings, CK pgp6PpeR5Kpxp.pgp Description: PGP signature
Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Hey, ok, I think I implemented all of the changes you requested. All but the ia64 dependent, I have to do more research for this one. Greetings, CK diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b4fcbaf..66ed10f 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1049,6 +1049,37 @@ include 'filename' /listitem /varlistentry + varlistentry id=guc-huge-tlb-pages xreflabel=huge_tlb_pages + termvarnamehuge_tlb_pages/varname (typeenum/type)/term + indexterm + primaryvarnamehuge_tlb_pages/ configuration parameter/primary + /indexterm + listitem + para +Enables/disables the use of huge tlb pages. Valid values are +literalon/literal, literaloff/literal and literaltry/literal. +The default value is literaltry/literal. + /para + + para +With varnamehuge_tlb_pages/varname set to literalon/literal +symbolmmap()/symbol will be called with symbolMAP_HUGETLB/symbol. +If the call fails the server will fail fatally. + /para + + para +With varnamehuge_tlb_pages/varname set to literaloff/literal we +will not use symbolMAP_HUGETLB/symbol at all. + /para + + para +With varnamehuge_tlb_pages/varname set to literaltry/literal +we will try to use symbolMAP_HUGETLB/symbol and fall back to +symbolmmap()/symbol without symbolMAP_HUGETLB/symbol. + /para + /listitem + /varlistentry + varlistentry id=guc-temp-buffers xreflabel=temp_buffers termvarnametemp_buffers/varname (typeinteger/type)/term indexterm diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index df06312..f9de239 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -27,10 +27,14 @@ #ifdef HAVE_SYS_SHM_H #include sys/shm.h #endif +#ifdef MAP_HUGETLB +#include dirent.h +#endif #include miscadmin.h #include storage/ipc.h #include storage/pg_shmem.h +#include utils/guc.h typedef key_t IpcMemoryKey;/* shared memory key passed to shmget(2) */ @@ -61,6 +65,19 @@ typedef int IpcMemoryId; /* shared memory ID returned by shmget(2) */ #define MAP_FAILED ((void *) -1) #endif +#ifdef MAP_HUGETLB +# ifdef __ia64__ +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL) +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED) +# else +#define PG_HUGETLB_BASE_ADDR (void *)(0x0UL) +#define PG_MAP_HUGETLB MAP_HUGETLB +# endif +#else +# define PG_MAP_HUGETLB 0 +#endif + + unsigned long UsedShmemSegID = 0; void *UsedShmemSegAddr = NULL; @@ -73,7 +90,6 @@ static void IpcMemoryDelete(int status, Datum shmId); static PGShmemHeader *PGSharedMemoryAttach(IpcMemoryKey key, IpcMemoryId *shmid); - /* * InternalIpcMemoryCreate(memKey, size) * @@ -342,6 +358,155 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2) } +#ifdef MAP_HUGETLB +#define HUGE_PAGE_INFO_DIR /sys/kernel/mm/hugepages + +/* + * static long InternalGetFreeHugepagesCount(const char *name) + * + * Attempt to read the number of available hugepages from + * /sys/kernel/mm/hugepages/hugepages-size/free_hugepages + * Will fail (return -1) if file could not be opened, 0 if no pages are available + * and 0 if there are free pages + * + */ +static long +InternalGetFreeHugepagesCount(const char *name) +{ + int fd; + char buff[1024]; + size_t len; + long result; + char *ptr; + + len = snprintf(buff, 1024, %s/%s/free_hugepages, HUGE_PAGE_INFO_DIR, name); + if (len == 1024) /* I don't think that this will happen ever */ + { + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING, + (errmsg(Filename %s/%s/free_hugepages is too long, HUGE_PAGE_INFO_DIR, name), +errcontext(while checking hugepage size))); + return -1; + } + + fd = open(buff, O_RDONLY); + if (fd = 0) + { + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING, + (errmsg(Could not open file %s: %s, buff, strerror(errno)), +errcontext(while checking hugepage size))); + return -1; + } + + len = read(fd, buff, 1024); + if (len = 0) + { + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING, + (errmsg(Error reading from file %s: %s, buff, strerror(errno)), +errcontext(while checking hugepage size))); + close(fd); + return -1; + } + + /* +* If the content of free_hugepages is longer than or equal to 1024 bytes +* the rest is irrelevant; we simply want to know if there are any +* hugepages left +*/ +
Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Hey, On Tuesday 30 October 2012 20:33:18 Andres Freund wrote: +#ifdef MAP_HUGETLB +# ifdef __ia64__ +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL) +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED) +# else Not your fault, but that looks rather strange to me. The level of documentation around the requirement of using MAP_FIXED is subpar... Ok, after further reading I think with MAP_FIXED one has to „generate“ SHM segment addresses hisself. So 0x8000UL would just be one possible location for a hugepage. Since this feature is not used in PG for now, the code should work. Could someone with access to a ia64 architecture verify it? Greetings, CK -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Christian Kruse cjk+postg...@defunct.ch writes: I created a patch which implements MAP_HUGETLB for sysv shared memory segments (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I added error handling, huge page size detection and a GUC variable. My recollection is we'd decided not to pursue that because of fears that it could make performance horribly worse on some platforms. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
On Monday, October 29, 2012 09:33:25 PM Tom Lane wrote: Christian Kruse cjk+postg...@defunct.ch writes: I created a patch which implements MAP_HUGETLB for sysv shared memory segments (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I added error handling, huge page size detection and a GUC variable. My recollection is we'd decided not to pursue that because of fears that it could make performance horribly worse on some platforms. Hm. I don't so. Maybe youre remembering our discussion about mlock? MAP_HUGETLB is linux specific anyway, there are other OSes doing this, but I don't think a standard exists. http://archives.postgresql.org/message- id/201206292152.40311.andres%402ndquadrant.com and http://archives.postgresql.org/message- id/CA%2BTgmoZGX0Pi0rw5sDH0Uz%3D03WkQ%3DmnoAW3TXVEfvUpyW%2BfMjw%40mail.gmail.com seem to be the most recent discussions arround this. The parts I like most about having usable hugepages arround is usable 'ps' output and way smaller pagetables... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
On 29 October 2012 20:14, Christian Kruse cjk+postg...@defunct.ch wrote: created a patch which implements MAP_HUGETLB for sysv shared memory segments (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I added error handling, huge page size detection and a GUC variable. I have a few initial observations on this. * I think you should be making the new GUC PGC_INTERNAL on platforms where MAP_HUGETLB is not defined or available. See also, effective_io_concurrency. This gives sane error handling. * Why aren't you failing when HUGE_TLB_ON and the mmap fails? You do the same thing as HUGE_TLB_TRY, contrary to your documentation: + if (huge_tlb_pages == HUGE_TLB_ON || huge_tlb_pages == HUGE_TLB_TRY) * I think you should add MAP_HUGETLB to PG_MMAP_FLAGS directly, rather than XOR'ing after the fact. We already build the flags that comprise PG_MMAP_FLAGS using another set of intermediary flags, based on platform considerations, so what you're doing here is just inconsistent. Don't wrap the mmap() code in ifdefs, and instead rely on the GUC being available on all platforms, with setting the GUC causing an error on unsupported platforms, in the style of effective_io_concurrency, as established in commit 3b07182e613a0e63ff662e3dc7f820f010923ec7 . Build a PG_MAP_HUGETLB intermediary flag on all platforms. * Apparently you're supposed to do this for the benefit of Itanic [1]: /* Only ia64 requires this */ #ifdef __ia64__ #define ADDR (void *)(0x8000UL) #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED) #else #define ADDR (void *)(0x0UL) #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) #endif * You aren't following our style guide with respect to error messages [2]. [1] https://www.kernel.org/doc/Documentation/vm/map_hugetlb.c [2] http://www.postgresql.org/docs/devel/static/error-style-guide.html -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers