Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2013-06-29 Thread Andres Freund
Hi,

On 2013-06-28 23:03:22 -0400, Bruce Momjian wrote:
 Did we decide against specifying huge pages in Postgres?

I don't think so. We need somebody to make some last modifications to
the patch though and Christian doesn't seem to have the time atm.

I think the bigger memory (size of the per process page table) and #cpus
(number of pg backends = number of page tables allocated) the more
imporant it gets to implement this. We already can spend gigabytes of
memory on those on big systems.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2013-06-28 Thread Bruce Momjian

Did we decide against specifying huge pages in Postgres?

---

On Tue, Oct 30, 2012 at 09:16:07PM +0100, Christian Kruse wrote:
 Hey,
 
 ok, I think I implemented all of the changes you requested. All but
 the ia64 dependent, I have to do more research for this one.
 
 
 Greetings,
  CK

 diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
 index b4fcbaf..66ed10f 100644
 --- a/doc/src/sgml/config.sgml
 +++ b/doc/src/sgml/config.sgml
 @@ -1049,6 +1049,37 @@ include 'filename'
/listitem
   /varlistentry
  
 + varlistentry id=guc-huge-tlb-pages xreflabel=huge_tlb_pages
 +  termvarnamehuge_tlb_pages/varname (typeenum/type)/term
 +  indexterm
 +   primaryvarnamehuge_tlb_pages/ configuration parameter/primary
 +  /indexterm
 +  listitem
 +   para
 +Enables/disables the use of huge tlb pages. Valid values are
 +literalon/literal, literaloff/literal and 
 literaltry/literal.
 +The default value is literaltry/literal.
 +   /para
 +
 +   para
 +With varnamehuge_tlb_pages/varname set to literalon/literal
 +symbolmmap()/symbol will be called with 
 symbolMAP_HUGETLB/symbol.
 +If the call fails the server will fail fatally.
 +   /para
 +
 +   para
 +With varnamehuge_tlb_pages/varname set to literaloff/literal 
 we
 +will not use symbolMAP_HUGETLB/symbol at all.
 +   /para
 +
 +   para
 +With varnamehuge_tlb_pages/varname set to literaltry/literal
 +we will try to use symbolMAP_HUGETLB/symbol and fall back to
 +symbolmmap()/symbol without symbolMAP_HUGETLB/symbol.
 +   /para
 +  /listitem
 + /varlistentry
 +
   varlistentry id=guc-temp-buffers xreflabel=temp_buffers
termvarnametemp_buffers/varname (typeinteger/type)/term
indexterm
 diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
 index df06312..f9de239 100644
 --- a/src/backend/port/sysv_shmem.c
 +++ b/src/backend/port/sysv_shmem.c
 @@ -27,10 +27,14 @@
  #ifdef HAVE_SYS_SHM_H
  #include sys/shm.h
  #endif
 +#ifdef MAP_HUGETLB
 +#include dirent.h
 +#endif
  
  #include miscadmin.h
  #include storage/ipc.h
  #include storage/pg_shmem.h
 +#include utils/guc.h
  
  
  typedef key_t IpcMemoryKey;  /* shared memory key passed to 
 shmget(2) */
 @@ -61,6 +65,19 @@ typedef int IpcMemoryId;   /* shared memory ID 
 returned by shmget(2) */
  #define MAP_FAILED ((void *) -1)
  #endif
  
 +#ifdef MAP_HUGETLB
 +#  ifdef __ia64__
 +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL)
 +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
 +#  else
 +#define PG_HUGETLB_BASE_ADDR (void *)(0x0UL)
 +#define PG_MAP_HUGETLB MAP_HUGETLB
 +#  endif
 +#else
 +#  define PG_MAP_HUGETLB 0
 +#endif
 +
 +
  
  unsigned long UsedShmemSegID = 0;
  void*UsedShmemSegAddr = NULL;
 @@ -73,7 +90,6 @@ static void IpcMemoryDelete(int status, Datum shmId);
  static PGShmemHeader *PGSharedMemoryAttach(IpcMemoryKey key,
IpcMemoryId *shmid);
  
 -
  /*
   *   InternalIpcMemoryCreate(memKey, size)
   *
 @@ -342,6 +358,155 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long 
 id2)
  }
  
  
 +#ifdef MAP_HUGETLB
 +#define HUGE_PAGE_INFO_DIR  /sys/kernel/mm/hugepages
 +
 +/*
 + *   static long InternalGetFreeHugepagesCount(const char *name)
 + *
 + * Attempt to read the number of available hugepages from
 + * /sys/kernel/mm/hugepages/hugepages-size/free_hugepages
 + * Will fail (return -1) if file could not be opened, 0 if no pages are 
 available
 + * and  0 if there are free pages
 + *
 + */
 +static long
 +InternalGetFreeHugepagesCount(const char *name)
 +{
 + int fd;
 + char buff[1024];
 + size_t len;
 + long result;
 + char *ptr;
 +
 + len = snprintf(buff, 1024, %s/%s/free_hugepages, HUGE_PAGE_INFO_DIR, 
 name);
 + if (len == 1024) /* I don't think that this will happen ever */
 + {
 + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
 + (errmsg(Filename %s/%s/free_hugepages is too 
 long, HUGE_PAGE_INFO_DIR, name),
 +  errcontext(while checking hugepage size)));
 + return -1;
 + }
 +
 + fd = open(buff, O_RDONLY);
 + if (fd = 0)
 + {
 + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
 + (errmsg(Could not open file %s: %s, buff, 
 strerror(errno)),
 +  errcontext(while checking hugepage size)));
 + return -1;
 + }
 +
 + len = read(fd, buff, 1024);
 + if (len = 0)
 + {
 + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
 + (errmsg(Error reading from file %s: %s, buff, 
 strerror(errno)),
 +  errcontext(while checking 

Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-12-21 Thread Andres Freund
On 2012-10-30 21:16:07 +0100, Christian Kruse wrote:
 +static long
 +InternalGetHugepageSize()
 +{
 + DIR *dir = opendir(HUGE_PAGE_INFO_DIR);
 + long smallest_size = -1, size;
 + char *ptr;
 ...
 + while((ent = readdir(dir)) != NULL)
 + {

This should be (Allocate|Read)Dir btw.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-11-13 Thread Andres Freund
Hi CK,

On 2012-10-30 21:16:07 +0100, Christian Kruse wrote:
 index b4fcbaf..66ed10f 100644
 --- a/doc/src/sgml/config.sgml
 +++ b/doc/src/sgml/config.sgml

I think a short introduction or at least a reference on how to configure
hugepages would be a good thing.

   varlistentry id=guc-temp-buffers xreflabel=temp_buffers
termvarnametemp_buffers/varname (typeinteger/type)/term
indexterm
 diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
 index df06312..f9de239 100644
 --- a/src/backend/port/sysv_shmem.c
 +++ b/src/backend/port/sysv_shmem.c
 @@ -27,10 +27,14 @@
  #ifdef HAVE_SYS_SHM_H
  #include sys/shm.h
  #endif
 +#ifdef MAP_HUGETLB
 +#include dirent.h
 +#endif

I think a central #define for the MAP_HUGETLB capability would be a good
idea, akin to HAVE_SYS_SHM_H.

E.g. this:
 --- a/src/backend/utils/misc/guc.c
 +++ b/src/backend/utils/misc/guc.c
 @@ -22,6 +22,7 @@
  #include limits.h
  #include unistd.h
  #include sys/stat.h
 +#include sys/mman.h
  #ifdef HAVE_SYSLOG
  #include syslog.h
  #endif

is unlikely to fly on windows.


 +/*
 + *   static long InternalGetHugepageSize()
 + *
 + * Attempt to get a valid hugepage size from /sys/kernel/mm/hugepages/ by
 + * reading directory contents
 + * Will fail (return -1) if the directory could not be opened or no valid
 + * page sizes are available. Will return the biggest hugepage size on
 + * success.
 + *
 + */

The biggest remark is out of date.


 +static long
 +InternalGetHugepageSize()
 +{
 ...
 + if ((smallest_size == -1 || size  smallest_size)
 +  InternalGetFreeHugepagesCount(ent-d_name)  
 0)
 + {
 + smallest_size = size;
 + }
 ...
 +
 + if (smallest_size == -1)
 + {
 + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
 + (errmsg(Could not find a valid hugepage size),
 +  errhint(This error usually means that either 
 CONFIG_HUGETLB_PAGE 
 +  is not in kernel or that your 
 architecture does not 
 +  support hugepages or you did 
 not configure hugepages)));
 + }

I think differentiating the error message between no hugepages found and
InternalGetFreeHugepagesCount(ent-d_name) always beeing zero would be a
good idea. Failing this way if
InternalGetFreeHugepagesCount(ent-d_name)  0 seems fine.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-11-13 Thread Andres Freund
Oh, one more thing...

On 2012-10-30 21:16:07 +0100, Christian Kruse wrote:
 ok, I think I implemented all of the changes you requested. All but
 the ia64 dependent, I have to do more research for this one.

I vote for simply not caring about ia64.

This is:

 +#ifdef MAP_HUGETLB
 +#  ifdef __ia64__
 +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL)
 +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
 +#  else
 +#define PG_HUGETLB_BASE_ADDR (void *)(0x0UL)
 +#define PG_MAP_HUGETLB MAP_HUGETLB
 +#  endif
 +#else
 +#  define PG_MAP_HUGETLB 0
 +#endif

too much underdocumented crazyness for a very minor platform. Should
somebody with the approprate harware want to submit an additional patch,
fine


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-11-01 Thread Christian Kruse
Hi,

On Monday 29 October 2012 16:33:25 Tom Lane wrote:
 Christian Kruse cjk+postg...@defunct.ch writes:
  I created a patch which implements MAP_HUGETLB for sysv shared memory
  segments (PGSharedMemoryCreate). It is based on tests of Tom Lane and
  Andres Freund, I added error handling, huge page size detection and a GUC
  variable.
 My recollection is we'd decided not to pursue that because of fears that
 it could make performance horribly worse on some platforms.

This is why I created a GUC variable huge_tlb_pages = on|off|try to 
enable/disable this feature.

A performance improvement up to 10% seems worth the trouble of setting another 
config value, IMO.

Greetings,
 CK



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-31 Thread Christian Kruse
Hey,

On 30/10/12 20:33, Andres Freund wrote:
 +#ifdef MAP_HUGETLB
 +#  ifdef __ia64__
 +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL)
 +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
 +#  else

 Not your fault, but that looks rather strange to me. The level of 
 documentation
 around the requirement of using MAP_FIXED is subpar...

Yeah, looks strange to me, too. This is why I asked if this is really
all we have to do.

 I dislike automatically using the biggest size here. There are platforms with
 16GB hugepages... I think we should rather always use the smallest size and
 leave it to the kernel to choose whatever pagesize he wants to use.

Good point. I will change this to the smallest available page size.

 Thats going to log two warnings if the current system doesn't have hugepage
 support compiled in and thus no /sys/kernel/mm/hugepages directory.
 I think you should 1. only test this if TRY or ON is configured, 2. make the
 messages in InternalGetHugepageSize DEBUG1 at least for the TRY case.

Ok, point taken.

 […]
 + HUGE_TLB_TRY, huge_tlb_options,
 + NULL, NULL, NULL
 + },

 you use HUGE_TLB_TRY with ifdef here.

Ah, right. Setting it to HUGE_TLB_OFF when no MAP_HUGETLB is available.

Greetings,
 CK


pgprK0vhwogK9.pgp
Description: PGP signature


Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-30 Thread Christian Kruse
Hi,

On 29/10/12 21:14, Peter Geoghegan wrote:
 I have a few initial observations on this.

Thanks for your feedback.


 * I think you should be making the new GUC PGC_INTERNAL on platforms
 where MAP_HUGETLB is not defined or available. See also,
 effective_io_concurrency. This gives sane error handling.

Ok, sounds good for me.

 * Why aren't you failing when HUGE_TLB_ON and the mmap fails? You do
 the same thing as HUGE_TLB_TRY, contrary to your documentation:

  +if (huge_tlb_pages == HUGE_TLB_ON || huge_tlb_pages == 
 HUGE_TLB_TRY)

No, what I did was mmap()ing the memory with MAP_HUGETLB and falling
back to mmap() w/o MAP_HUGETLB when mmap() failed. But there was
another bug, I didn't mmap() at all when huge_tlb_pages == off.

 * I think you should add MAP_HUGETLB to PG_MMAP_FLAGS directly, rather
 than XOR'ing after the fact. We already build the flags that comprise
 PG_MMAP_FLAGS using another set of intermediary flags, based on
 platform considerations, so what you're doing here is just
 inconsistent.

This would mean that I have to disable the bit when huge_tlb_pages ==
off. I think it is better to enable it if wanted and to just leave the
flags as they were when this feature has been turned off, do you
disagree?

 Don't wrap the mmap() code in ifdefs, and instead rely
 on the GUC being available on all platforms, with setting the GUC
 causing an error on unsupported platforms, in the style of
 effective_io_concurrency, as established in commit
 3b07182e613a0e63ff662e3dc7f820f010923ec7 . Build a PG_MAP_HUGETLB
 intermediary flag on all platforms.

Ok, this sounds good. It will remove complexity from the code.

 * Apparently you're supposed to do this for the benefit of Itanic [1]:

 /* Only ia64 requires this */
 #ifdef __ia64__
 #define ADDR (void *)(0x8000UL)
 #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED)
 #else
 #define ADDR (void *)(0x0UL)
 #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
 #endif

Hm… is this enough? Or do we have to do more work for ia64?

 * You aren't following our style guide with respect to error messages [2].

Thanks, I wasn't aware of this. I attached a new version of the patch.

Greetings,
 CK


pgp3AmMtkv8SY.pgp
Description: PGP signature


Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-30 Thread Christian Kruse
Hi,

On 29/10/12 16:33, Tom Lane wrote:
  I created a patch which implements MAP_HUGETLB for sysv shared memory 
  segments
  (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, 
  I
  added error handling, huge page size detection and a GUC variable.

 My recollection is we'd decided not to pursue that because of fears that
 it could make performance horribly worse on some platforms.

This is why I created a GUC for turning this feature off.

Greetings,
 CK


pgpADXeFXTb5Z.pgp
Description: PGP signature


Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-30 Thread Christian Kruse
Hey,

Oh man, first I didn't sent the email to the list and now I forgot the
attachment. I should really get some sleep, sorry for any
inconveniences :(

Greetings,
 CK
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b4fcbaf..66ed10f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1049,6 +1049,37 @@ include 'filename'
   /listitem
  /varlistentry
 
+ varlistentry id=guc-huge-tlb-pages xreflabel=huge_tlb_pages
+  termvarnamehuge_tlb_pages/varname (typeenum/type)/term
+  indexterm
+   primaryvarnamehuge_tlb_pages/ configuration parameter/primary
+  /indexterm
+  listitem
+   para
+Enables/disables the use of huge tlb pages. Valid values are
+literalon/literal, literaloff/literal and 
literaltry/literal.
+The default value is literaltry/literal.
+   /para
+
+   para
+With varnamehuge_tlb_pages/varname set to literalon/literal
+symbolmmap()/symbol will be called with 
symbolMAP_HUGETLB/symbol.
+If the call fails the server will fail fatally.
+   /para
+
+   para
+With varnamehuge_tlb_pages/varname set to literaloff/literal we
+will not use symbolMAP_HUGETLB/symbol at all.
+   /para
+
+   para
+With varnamehuge_tlb_pages/varname set to literaltry/literal
+we will try to use symbolMAP_HUGETLB/symbol and fall back to
+symbolmmap()/symbol without symbolMAP_HUGETLB/symbol.
+   /para
+  /listitem
+ /varlistentry
+
  varlistentry id=guc-temp-buffers xreflabel=temp_buffers
   termvarnametemp_buffers/varname (typeinteger/type)/term
   indexterm
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index df06312..73cfdef 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -27,10 +27,14 @@
 #ifdef HAVE_SYS_SHM_H
 #include sys/shm.h
 #endif
+#ifdef MAP_HUGETLB
+#include dirent.h
+#endif
 
 #include miscadmin.h
 #include storage/ipc.h
 #include storage/pg_shmem.h
+#include utils/guc.h
 
 
 typedef key_t IpcMemoryKey;/* shared memory key passed to 
shmget(2) */
@@ -61,6 +65,19 @@ typedef int IpcMemoryId; /* shared memory ID 
returned by shmget(2) */
 #define MAP_FAILED ((void *) -1)
 #endif
 
+#ifdef MAP_HUGETLB
+#  ifdef __ia64__
+#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL)
+#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
+#  else
+#define PG_HUGETLB_BASE_ADDR (void *)(0x0UL)
+#define PG_MAP_HUGETLB MAP_HUGETLB
+#  endif
+#else
+#  define PG_MAP_HUGETLB 0
+#endif
+
+
 
 unsigned long UsedShmemSegID = 0;
 void  *UsedShmemSegAddr = NULL;
@@ -342,6 +359,154 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long 
id2)
 }
 
 
+#ifdef MAP_HUGETLB
+#define HUGE_PAGE_INFO_DIR  /sys/kernel/mm/hugepages
+
+/*
+ * static long InternalGetFreeHugepagesCount(const char *name)
+ *
+ * Attempt to read the number of available hugepages from
+ * /sys/kernel/mm/hugepages/hugepages-size/free_hugepages
+ * Will fail (return -1) if file could not be opened, 0 if no pages are 
available
+ * and  0 if there are free pages
+ *
+ */
+static long
+InternalGetFreeHugepagesCount(const char *name)
+{
+   int fd;
+   char buff[1024];
+   size_t len;
+   long result;
+   char *ptr;
+
+   len = snprintf(buff, 1024, %s/%s/free_hugepages, HUGE_PAGE_INFO_DIR, 
name);
+   if (len == 1024) /* I don't think that this will happen ever */
+   {
+   ereport(WARNING,
+   (errmsg(Filename %s/%s/free_hugepages is too 
long, HUGE_PAGE_INFO_DIR, name),
+errcontext(while checking hugepage size)));
+   return -1;
+   }
+
+   fd = open(buff, O_RDONLY);
+   if (fd = 0)
+   {
+   ereport(WARNING,
+   (errmsg(Could not open file %s: %s, buff, 
strerror(errno)),
+errcontext(while checking hugepage size)));
+   return -1;
+   }
+
+   len = read(fd, buff, 1024);
+   if (len = 0)
+   {
+   ereport(WARNING,
+   (errmsg(Error reading from file %s: %s, buff, 
strerror(errno)),
+errcontext(while checking hugepage size)));
+   close(fd);
+   return -1;
+   }
+
+   /*
+* If the content of free_hugepages is longer than or equal to 1024 
bytes
+* the rest is irrelevant; we simply want to know if there are any
+* hugepages left
+*/
+   if (len == 1024)
+   {
+   buff[1023] = 0;
+   }
+   else
+   {
+   buff[len] = 0;
+   }
+
+   close(fd);
+
+   result = strtol(buff, ptr, 10);
+
+   if (ptr == NULL)
+   {
+   ereport(WARNING,
+   (errmsg(Could not convert contents of file 
%s/%s/free_hugepages 

Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-30 Thread Andres Freund
On Tuesday, October 30, 2012 08:20:33 PM Christian Kruse wrote:
 Hey,
 
 Oh man, first I didn't sent the email to the list and now I forgot the
 attachment. I should really get some sleep, sorry for any
 inconveniences :(

+#ifdef MAP_HUGETLB
+#  ifdef __ia64__
+#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL)
+#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
+#  else

Not your fault, but that looks rather strange to me. The level of documentation 
around the requirement of using MAP_FIXED is subpar...

+long
+InternalGetHugepageSize()
+{
+...
+   if (biggest_size  size  
InternalGetFreeHugepagesCount(ent-
d_name)  0)
+   {
+   biggest_size = size;
+   }

I dislike automatically using the biggest size here. There are platforms with 
16GB hugepages... I think we should rather always use the smallest size and 
leave it to the kernel to choose whatever pagesize he wants to use.

+#ifdef MAP_HUGETLB
+   longpagesize = InternalGetHugepageSize();
+
+   if (pagesize = 0)
+   pagesize = sysconf(_SC_PAGE_SIZE);
+#else
longpagesize = sysconf(_SC_PAGE_SIZE);
+#endif

Thats going to log two warnings if the current system doesn't have hugepage 
support compiled in and thus no /sys/kernel/mm/hugepages directory.
I think you should 1. only test this if TRY or ON is configured, 2. make the 
messages in InternalGetHugepageSize DEBUG1 at least for the TRY case.
 
+   {
+   {huge_tlb_pages,
+#ifdef MAP_HUGETLB
+   PGC_SUSET,
+#else
+   PGC_INTERNAL,
+#endif
+   RESOURCES_MEM,
+   gettext_noop(Enable/disable the use of the hugepages 
feature),
+   NULL
+   },
+   huge_tlb_pages,
+   HUGE_TLB_TRY, huge_tlb_options,
+   NULL, NULL, NULL
+   },

you use HUGE_TLB_TRY with ifdef here.


Looking forward to having this...

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-30 Thread Christian Kruse
Hey,

On 30/10/12 20:33, Andres Freund wrote:
 +#ifdef MAP_HUGETLB
 +#  ifdef __ia64__
 +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL)
 +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
 +#  else

 Not your fault, but that looks rather strange to me. The level of 
 documentation
 around the requirement of using MAP_FIXED is subpar...

Yeah, looks strange to me, too. This is why I asked if this is really
all we have to do.

 I dislike automatically using the biggest size here. There are platforms with
 16GB hugepages... I think we should rather always use the smallest size and
 leave it to the kernel to choose whatever pagesize he wants to use.

Good point. I will change this to the smallest available page size.

 Thats going to log two warnings if the current system doesn't have hugepage
 support compiled in and thus no /sys/kernel/mm/hugepages directory.
 I think you should 1. only test this if TRY or ON is configured, 2. make the
 messages in InternalGetHugepageSize DEBUG1 at least for the TRY case.

Ok, point taken.

 […]
 + HUGE_TLB_TRY, huge_tlb_options,
 + NULL, NULL, NULL
 + },

 you use HUGE_TLB_TRY with ifdef here.

Ah, right. Setting it to HUGE_TLB_OFF when no MAP_HUGETLB is available.

Greetings,
 CK


pgp6PpeR5Kpxp.pgp
Description: PGP signature


Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-30 Thread Christian Kruse
Hey,

ok, I think I implemented all of the changes you requested. All but
the ia64 dependent, I have to do more research for this one.


Greetings,
 CK
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b4fcbaf..66ed10f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1049,6 +1049,37 @@ include 'filename'
   /listitem
  /varlistentry
 
+ varlistentry id=guc-huge-tlb-pages xreflabel=huge_tlb_pages
+  termvarnamehuge_tlb_pages/varname (typeenum/type)/term
+  indexterm
+   primaryvarnamehuge_tlb_pages/ configuration parameter/primary
+  /indexterm
+  listitem
+   para
+Enables/disables the use of huge tlb pages. Valid values are
+literalon/literal, literaloff/literal and 
literaltry/literal.
+The default value is literaltry/literal.
+   /para
+
+   para
+With varnamehuge_tlb_pages/varname set to literalon/literal
+symbolmmap()/symbol will be called with 
symbolMAP_HUGETLB/symbol.
+If the call fails the server will fail fatally.
+   /para
+
+   para
+With varnamehuge_tlb_pages/varname set to literaloff/literal we
+will not use symbolMAP_HUGETLB/symbol at all.
+   /para
+
+   para
+With varnamehuge_tlb_pages/varname set to literaltry/literal
+we will try to use symbolMAP_HUGETLB/symbol and fall back to
+symbolmmap()/symbol without symbolMAP_HUGETLB/symbol.
+   /para
+  /listitem
+ /varlistentry
+
  varlistentry id=guc-temp-buffers xreflabel=temp_buffers
   termvarnametemp_buffers/varname (typeinteger/type)/term
   indexterm
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index df06312..f9de239 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -27,10 +27,14 @@
 #ifdef HAVE_SYS_SHM_H
 #include sys/shm.h
 #endif
+#ifdef MAP_HUGETLB
+#include dirent.h
+#endif
 
 #include miscadmin.h
 #include storage/ipc.h
 #include storage/pg_shmem.h
+#include utils/guc.h
 
 
 typedef key_t IpcMemoryKey;/* shared memory key passed to 
shmget(2) */
@@ -61,6 +65,19 @@ typedef int IpcMemoryId; /* shared memory ID 
returned by shmget(2) */
 #define MAP_FAILED ((void *) -1)
 #endif
 
+#ifdef MAP_HUGETLB
+#  ifdef __ia64__
+#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL)
+#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
+#  else
+#define PG_HUGETLB_BASE_ADDR (void *)(0x0UL)
+#define PG_MAP_HUGETLB MAP_HUGETLB
+#  endif
+#else
+#  define PG_MAP_HUGETLB 0
+#endif
+
+
 
 unsigned long UsedShmemSegID = 0;
 void  *UsedShmemSegAddr = NULL;
@@ -73,7 +90,6 @@ static void IpcMemoryDelete(int status, Datum shmId);
 static PGShmemHeader *PGSharedMemoryAttach(IpcMemoryKey key,
 IpcMemoryId *shmid);
 
-
 /*
  * InternalIpcMemoryCreate(memKey, size)
  *
@@ -342,6 +358,155 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long 
id2)
 }
 
 
+#ifdef MAP_HUGETLB
+#define HUGE_PAGE_INFO_DIR  /sys/kernel/mm/hugepages
+
+/*
+ * static long InternalGetFreeHugepagesCount(const char *name)
+ *
+ * Attempt to read the number of available hugepages from
+ * /sys/kernel/mm/hugepages/hugepages-size/free_hugepages
+ * Will fail (return -1) if file could not be opened, 0 if no pages are 
available
+ * and  0 if there are free pages
+ *
+ */
+static long
+InternalGetFreeHugepagesCount(const char *name)
+{
+   int fd;
+   char buff[1024];
+   size_t len;
+   long result;
+   char *ptr;
+
+   len = snprintf(buff, 1024, %s/%s/free_hugepages, HUGE_PAGE_INFO_DIR, 
name);
+   if (len == 1024) /* I don't think that this will happen ever */
+   {
+   ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
+   (errmsg(Filename %s/%s/free_hugepages is too 
long, HUGE_PAGE_INFO_DIR, name),
+errcontext(while checking hugepage size)));
+   return -1;
+   }
+
+   fd = open(buff, O_RDONLY);
+   if (fd = 0)
+   {
+   ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
+   (errmsg(Could not open file %s: %s, buff, 
strerror(errno)),
+errcontext(while checking hugepage size)));
+   return -1;
+   }
+
+   len = read(fd, buff, 1024);
+   if (len = 0)
+   {
+   ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
+   (errmsg(Error reading from file %s: %s, buff, 
strerror(errno)),
+errcontext(while checking hugepage size)));
+   close(fd);
+   return -1;
+   }
+
+   /*
+* If the content of free_hugepages is longer than or equal to 1024 
bytes
+* the rest is irrelevant; we simply want to know if there are any
+* hugepages left
+*/
+   

Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-30 Thread Christian Kruse
Hey,

On Tuesday 30 October 2012 20:33:18 Andres Freund wrote:
 +#ifdef MAP_HUGETLB
 +#  ifdef __ia64__
 +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL)
 +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
 +#  else
 
 Not your fault, but that looks rather strange to me. The level of
 documentation around the requirement of using MAP_FIXED is subpar...

Ok, after further reading I think with MAP_FIXED one has to „generate“ SHM 
segment addresses hisself. So 0x8000UL would just be one possible 
location for a hugepage.

Since this feature is not used in PG for now, the code should work. Could 
someone with access to a ia64 architecture verify it?

Greetings,
 CK



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-29 Thread Tom Lane
Christian Kruse cjk+postg...@defunct.ch writes:
 I created a patch which implements MAP_HUGETLB for sysv shared memory 
 segments 
 (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I 
 added error handling, huge page size detection and a GUC variable.

My recollection is we'd decided not to pursue that because of fears that
it could make performance horribly worse on some platforms.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-29 Thread Andres Freund
On Monday, October 29, 2012 09:33:25 PM Tom Lane wrote:
 Christian Kruse cjk+postg...@defunct.ch writes:
  I created a patch which implements MAP_HUGETLB for sysv shared memory
  segments (PGSharedMemoryCreate). It is based on tests of Tom Lane and
  Andres Freund, I added error handling, huge page size detection and a
  GUC variable.
 
 My recollection is we'd decided not to pursue that because of fears that
 it could make performance horribly worse on some platforms.

Hm. I don't so. Maybe youre remembering our discussion about mlock?

MAP_HUGETLB is linux specific anyway, there are other OSes doing this, but I 
don't think a standard exists.

http://archives.postgresql.org/message-
id/201206292152.40311.andres%402ndquadrant.com and 
http://archives.postgresql.org/message-
id/CA%2BTgmoZGX0Pi0rw5sDH0Uz%3D03WkQ%3DmnoAW3TXVEfvUpyW%2BfMjw%40mail.gmail.com
seem to be the most recent discussions arround this.


The parts I like most about having usable hugepages arround is usable 'ps' 
output and way smaller pagetables...

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-29 Thread Peter Geoghegan
On 29 October 2012 20:14, Christian Kruse cjk+postg...@defunct.ch wrote:
 created a patch which implements MAP_HUGETLB for sysv shared memory segments
 (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I
 added error handling, huge page size detection and a GUC variable.

I have a few initial observations on this.

* I think you should be making the new GUC PGC_INTERNAL on platforms
where MAP_HUGETLB is not defined or available. See also,
effective_io_concurrency. This gives sane error handling.

* Why aren't you failing when HUGE_TLB_ON and the mmap fails? You do
the same thing as HUGE_TLB_TRY, contrary to your documentation:

 +  if (huge_tlb_pages == HUGE_TLB_ON || huge_tlb_pages == 
HUGE_TLB_TRY)

* I think you should add MAP_HUGETLB to PG_MMAP_FLAGS directly, rather
than XOR'ing after the fact. We already build the flags that comprise
PG_MMAP_FLAGS using another set of intermediary flags, based on
platform considerations, so what you're doing here is just
inconsistent. Don't wrap the mmap() code in ifdefs, and instead rely
on the GUC being available on all platforms, with setting the GUC
causing an error on unsupported platforms, in the style of
effective_io_concurrency, as established in commit
3b07182e613a0e63ff662e3dc7f820f010923ec7 . Build a PG_MAP_HUGETLB
intermediary flag on all platforms.

* Apparently you're supposed to do this for the benefit of Itanic [1]:

/* Only ia64 requires this */
#ifdef __ia64__
#define ADDR (void *)(0x8000UL)
#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED)
#else
#define ADDR (void *)(0x0UL)
#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
#endif

* You aren't following our style guide with respect to error messages [2].

[1] https://www.kernel.org/doc/Documentation/vm/map_hugetlb.c

[2] http://www.postgresql.org/docs/devel/static/error-style-guide.html
-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers