Hello All,

I was looking through the source and thought it would be worth seeking opinions on this proposal.

From what I have understood so far, the core shared memory handling is done in pgsql/src/backend/port/sysv_shmem.c. It is selected and linked in by configure according to the runtime environment.

So I would need to write another source file which exports the same APIs as the one above (i.e. all non-static functions in that file) but uses mmap; that alone would be enough to use anonymous mmap instead of SysV shared memory.
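To make the idea concrete, here is a rough sketch (not actual PostgreSQL code; the function name is made up) of how such a file could obtain the region with anonymous mmap instead of shmget()/shmat(). It relies on backends inheriting the mapping across fork() from the postmaster:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the shmget()/shmat() path in sysv_shmem.c. */
static void *
create_anon_shmem(size_t size)
{
    void *addr;

    /* MAP_SHARED | MAP_ANONYMOUS: shared with children of fork(), no
     * backing file, no SysV key.  Some platforms spell it MAP_ANON. */
    addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
    {
        perror("mmap");
        exit(1);
    }
    return addr;
}

The real replacement file would of course also have to fill in the shared memory header and handle detach/exit, but that logic is independent of how the region is obtained.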

It might seem unnecessary to provide mmap-based shared memory, but this is just the first step I was thinking of.

In pgsql/src/backend/storage/ipc/shmem.c, all the shared memory allocations are done. I was thinking of gathering the global variables in that file into a structure. The global variables would still be in place so that existing code would not break, but the structure would hold database-specific buffering information. Let's call that structure the database context; a rough sketch follows.
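As an illustration only (the field names below are placeholders, not the real globals in shmem.c), the structure could look something like:

#include <stddef.h>

typedef struct DatabaseContext
{
    unsigned int databaseId;        /* OID of the database this context serves */
    void        *shmemBase;         /* base address of its mmap'ed region */
    size_t       shmemSize;         /* size of that region */
    void        *bufferBlocks;      /* this database's shared buffer pool */
    void        *bufferDescriptors; /* per-buffer bookkeeping for that pool */
    /* ... whatever other per-database buffering state shmem.c keeps ... */
} DatabaseContext;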

That way we can assign a different mmap'ed (anonymous, of course) region to each database. In the backend, we could just switch database contexts, i.e. assign the global variables from the database context and let the backend write to the appropriate shared memory region. Every database would need at least two shared memory regions: one for operating on its own buffers and another, system-wide one where it could write to shared catalogs etc. A backend can close the shared memory regions belonging to other databases on startup.
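Switching would then amount to repointing the existing globals, roughly like this (again with made-up names standing in for the real globals):

void *BufferBlocks;         /* stand-ins for the existing globals in shmem.c */
void *BufferDescriptors;

static void
switch_database_context(DatabaseContext *ctx)
{
    /* Existing code keeps reading the globals; only their targets change. */
    BufferBlocks      = ctx->bufferBlocks;
    BufferDescriptors = ctx->bufferDescriptors;
}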

Of course, buffer management alone would not cover database contexts altogether. WAL would need to be lumped in as well (not necessarily, though: if all WAL buffering goes through the system shared region, everything will still work). I don't know whether clog and data file handling are affected by this. If WAL goes into the database context, we could probably provide per-database WAL, which could go well with tablespaces too.

In the case of per-database WAL, operations done on a shared catalog from a backend would need to flush both the system WAL and the database WAL to make such a transaction commit durable. Otherwise, flushing only the database WAL would do.
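In pseudo-C (all names invented, and the flush itself left empty), the commit-time rule would be:

typedef unsigned long long XLogPos;     /* stand-in for an XLOG position */

static void
flush_wal(int wal_id, XLogPos upto)
{
    /* would fsync the given WAL up to 'upto'; left empty in this sketch */
}

static void
flush_for_commit(int system_wal, int database_wal,
                 XLogPos sys_upto, XLogPos db_upto,
                 int touched_shared_catalog)
{
    if (touched_shared_catalog)
        flush_wal(system_wal, sys_upto);    /* shared catalog changes live here */
    flush_wal(database_wal, db_upto);       /* the backend's own database WAL */
}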

This way we could provide a background writer process per database and a separate buffer pool per database, minimising the impact of cross-database load significantly. E.g. a vacuum full on one database would not hurt another database through buffer cache pollution. (I/O can still saturate, though.) This way we can push the hardware to its limit, which might not be possible right now in some cases.

I was originally looking for the reason a large number of buffers degrades performance, and the source code browsing spiralled into this thought. So far I haven't figured out any reason why a large number of buffers should degrade performance. Still looking for it.

Comments?

Shridhar

