Josh Berkus <josh@agliodbs.com> writes: > The biggest single area where I see PostgreSQL external sort sucking is > on index creation on large tables. For example, for free version of > TPCH, it takes only 1.5 hours to load a 60GB Lineitem table on OSDL's > hardware, but over 3 hours to create each index on that table. This > means that over all our load into TPCH takes 4 times as long to create > the indexes as it did to bulk load the data. > ... > Following an index creation, we see that 95% of the time required is the > external sort, which averages 2mb/s. This is with seperate drives for > the WAL, the pg_tmp, the table and the index. I've confirmed that > increasing work_mem beyond a small minimum (around 128mb) had no benefit > on the overall index creation speed.
These numbers don't seem to add up. You have not provided any details about the index key datatypes or sizes, but I'll take a guess that the raw data for each index is somewhere around 10GB. The theory says that the runs created during the first pass should on average be about twice work_mem, so at 128mb work_mem there should be around 40 runs to be merged, which would take probably three passes with six-way merging. Raising work_mem to a gig should result in about five runs, needing only one pass, which is really going to be as good as it gets. If you could not see any difference then I see little hope for the idea that reducing the number of merge passes will help. Umm ... you were raising maintenance_work_mem, I trust, not work_mem? We really need to get some hard data about what's going on here. The sort code doesn't report any internal statistics at the moment, but it would not be hard to whack together a patch that reports useful info in the form of NOTICE messages or some such. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings