Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...

Kang Yuzhe Wed, 29 Mar 2017 00:18:34 -0700

Thank you all for pointing me to useful docs on PG kernel stuff as well as
for being sympathetic to me and to the newbie question, which appears to be
valid and interesting but has yet to be addressed by PG experts.


Last but not least, *Craig Ringer*, you just nailed it!! You also made me
feel and think that my question is worth asking.

Regards,
Zeray

On Wed, Mar 29, 2017 at 6:36 AM, Craig Ringer <[email protected]> wrote:

> On 29 March 2017 at 10:53, Amit Langote <[email protected]>
> wrote:
> > Hi,
> >
> > On 2017/03/28 15:40, Kang Yuzhe wrote:
> >> Thanks Tsunakawa for such an informative reply.
> >>
> >> Almost all of the docs related to the internals of PG cover
> >> introductory concepts only.
> >> There is an even more useful PG internals site entitled "The Internals of
> >> PostgreSQL" at http://www.interdb.jp/pg/, a translation of the Japanese PG
> >> Internals.
> >>
> >> The query processing framework that is described in the manual as you
> >> mentioned is informative and introductory in nature.
> >> In theory, the query processing framework described in the manual is
> >> understandable.
> >>
> >> Unfortunately, it is another story to understand how the query processing
> >> framework in the PG codebase really works.
> >> It has become a difficult task for me to walk through the PG source code,
> >> for example how SELECT/INSERT/TRUNCATE really work in the different
> >> modules under "src/..".
> >>
> >> I wish there were Hands-On with PostgreSQL Internals like
> >> https://bkmjournal.wordpress.com/2017/01/22/hands-on-with-postgresql-internals/
> >> for more complex PG features.
> >>
> >> For example, the SQL-standard MERGE is not yet supported by PG. I wish
> >> there were Hands-On with PostgreSQL Internals for MERGE/UPSERT: how it is
> >> implemented in the parser/executor/storage etc. modules, with a detailed
> >> explanation of each piece of code, debugging, and other important concepts
> >> related to systems programming.
> >
> > I am not sure if I can show you that one place where you could learn all
> > of that, but many people who started with PostgreSQL development at some
> > point started by exploring the source code itself (either for learning or
> > to write a feature patch), articles on PostgreSQL wiki, and many related
> > presentations accessible using the Internet. I liked the following among
> > many others:
>
> Personally I have to agree that the learning curve is very steep. Some
> of the docs and presentations help, but there's a LOT to understand.
>
> When you're getting started you're lost in a world of language you
> don't know, and trying to understand one piece often gets you lost in
> other pieces. In no particular order:
>
> * Memory contexts and palloc
> * Managing transactions and how that interacts with memory contexts
> and the default memory context
> * Snapshots, snapshot push/pop, etc
> * LWLocks, memory barriers, spinlocks, latches
> * Heavyweight locks (and the different APIs to them)
> * GUCs, their scopes, the rules around their callbacks, etc
> * dynahash
> * catalogs and oids and access methods
> * The heap AM, e.g. heap_open
> * relcache, catcache, syscache
> * genam and the systable_ calls and their limitations with indexes
> * The SPI
> * When to use each of the above 4!
> * Heap tuples and minimal tuples
> * VARLENA
> * GETSTRUCT, when you can/can't use it, other attribute fetching methods
> * TOAST and detoasting datums.
> * forming and deforming tuples
> * LSNs, WAL/xlog generation and redo. Timelines. (ARGH, timelines).
> * cache invalidations, when they can happen, and how to do anything
> safely around them.
> * TIDs, cmin and cmax, xmin and xmax
> * postmaster, vacuum, bgwriter, checkpointer, startup process,
> walsender, walreceiver, all our auxiliary procs and what they do
> * relmapper, relfilenodes vs relation oids, filenode extents
> * ondisk structure, page headers, pages
> * shmem management, buffers and buffer pins
> * bgworkers
> * PG_TRY() and PG_CATCH() and their limitations
> * elog and ereport and errcontexts, exception unwinding/longjmp and
> how it interacts with memory contexts, lwlocks, etc
> * The nest of macros around datum manipulation and functions, PL
> handlers. How to find the macros for the data types you want to work
> with.
> * Everything to do with the C API for arrays (is horrible)
> * The details of the parse/rewrite/plan phases with rewrite calling
> back into parse, paths, the mess with inheritance_planner, reading and
> understanding plantrees
> * The permissions and grants model and how to interact with it
> * PGPROC, PGXACT, other main shmem structures
> * Resource owners (which I still don't fully "get")
> * Checkpoints, pg_control and ShmemVariableCache, crash recovery
> * How globals are used in Pg and how they interact with fork()ing from
> postmaster
> * SSI (haven't gone there yet myself)
> * ....
>
> Personally I recall finding the magic of resource owner and memory
> context changing under me when I started/stopped xacts in a bgworker,
> along with the need to manage snapshots and SPI state to be distinctly
> confusing.
>
> There are various READMEs, blog posts, presentation slides/videos, etc
> that explain bits and pieces. But not much exists to tie it together
> into a comprehensible whole with simple, minimal explanations for each
> part so someone who's new to it all can begin to get a handle on it,
> find resources to learn more about subsystems they need to care about,
> etc.
>
> Lots of it boils down to "read the code". But so much code! You don't
> know if what you're reading is really relevant or if it's even
> correct, or if it makes assumptions that differ from your situation.
> There are lots of coding rules that aren't necessarily obvious unless
> you read the right place, e.g. that you don't need to and shouldn't
> LWLockRelease() before elog(ERROR). That SPI doesn't manage snapshots
> or xacts for you (but will often silently work anyway!). etc.
>
> I've long intended to start a blog series on postgresql innards
> concepts, partly with the intent of turning it into such an overview.
> I find that people are better at shouting you down when you're wrong
> than they are at writing new material or reviewing proposed docs, so
> it's often a good way to fact-check things ;) .  Plus it's a good way
> to learn. Time is always short though.
>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
