Re: Some observations on very slow pg_restore operations

MichaelDBA Thu, 03 Oct 2019 14:49:35 -0700

Hi Ogden,

You didn't mention any details about your postgresql.conf settings. Why don't you set them optimally for your loads and try again to see if there is any difference? Make sure you do a DB restart, since some of these parameters require it.


======================================
parameter               recommendation
----------------        --------------
shared_buffers          Reduce this value to about 25% of total memory.
temp_buffers            Decrease this value to 8MB, since we are not using
                        temporary tables or doing intermediate sorts.
work_mem                Reduce significantly (1MB), since we are not doing
                        memory sorts or hashes per SQL.
maintenance_work_mem    Increase significantly for DDL bulk loading and
                        restore operations.
fsync                   off (so that time is not spent waiting for data to
                        be written to disk). Note: you may not be able to
                        recover your database after a crash when set to off.
checkpoint_segments     Increase this significantly for DML bulk loading and
                        restore operations (PostgreSQL 9.4 and earlier only;
                        the parameter was removed in 9.5 in favor of
                        max_wal_size).
max_wal_size            Increase significantly, like you would
                        checkpoint_segments.
min_wal_size            Increase significantly, like you would
                        checkpoint_segments.
checkpoint_timeout      Increase to at least 30min.
archive_mode            off
autovacuum              off
synchronous_commit      off
wal_level               minimal
max_wal_senders         0
full_page_writes        off during DML bulk loading and restore operations
wal_buffers             16MB during DML bulk loading and restore operations
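
As a rough illustration of the list above, the following Python sketch renders the settings as a postgresql.conf fragment. The concrete sizes (8GB, 2GB, 64GB, 16GB) are assumptions picked for a 32 GB machine, not values taken from this thread, and the names BULK_LOAD_SETTINGS and render_conf are hypothetical. checkpoint_segments is left out because it does not exist in PostgreSQL 9.5 and later.

```python
# Sketch: render bulk-load settings as a postgresql.conf fragment.
# All sizes marked "assumed" are illustrative guesses for a 32 GB server.
BULK_LOAD_SETTINGS = {
    "shared_buffers": "8GB",          # about 25% of 32 GB (assumed)
    "temp_buffers": "8MB",
    "work_mem": "1MB",
    "maintenance_work_mem": "2GB",    # assumed "significant" value
    "fsync": "off",                   # unsafe: a crash may leave the cluster unrecoverable
    "max_wal_size": "64GB",           # assumed
    "min_wal_size": "16GB",           # assumed
    "checkpoint_timeout": "30min",
    "archive_mode": "off",
    "autovacuum": "off",
    "synchronous_commit": "off",
    "wal_level": "minimal",
    "max_wal_senders": "0",           # wal_level = minimal needs this to be 0
    "full_page_writes": "off",
    "wal_buffers": "16MB",
}


def render_conf(settings):
    """Return the settings as 'name = value' lines, one per parameter."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    print(render_conf(BULK_LOAD_SETTINGS))
```

The output could be appended to postgresql.conf for the duration of the restore and removed afterwards; fsync, full_page_writes, and autovacuum in particular should go back to their normal values once the load is done.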


Regards,
Michael Vitale


Ogden Brash wrote on 10/3/2019 4:30 PM:

I recently performed a pg_dump (data-only) of a relatively large database where we store intermediate results of calculations. It is approximately 3 TB on disk and has about 20 billion rows.
We do the dump/restore about once a month and as the dataset has grown, the restores have gotten very slow. So, this time I decided to do it a different way and have some observations that puzzle me.
Background:
The data is extremely simple. The rows consist only of numbers and are all fixed length. There are no foreign keys, constraints, null values, or default values. There are no strings or arrays. There are 66 tables and the number of rows in each table forms a Gaussian distribution, so there are 3 tables which have about 3 billion rows each and the rest of the tables have significantly fewer rows.
I used the directory format when doing the pg_dump. The compressed data of the dump is 550 GB.
I am using: (PostgreSQL) 11.5 (Ubuntu 11.5-1.pgdg18.04+1)
The machine that I attempted to do a pg_restore to is a dedicated server just for one instance of postgresql. It has 32 GB of memory and is running Ubuntu 18.04 (headless). It is physical hardware, not virtualized. Nothing else runs on the machine and the postgresql.conf settings have been tuned (to the best of my postgresql abilities, which are suspect). While the operating system is installed on an SSD, there is one extra-large, fast HDD that is dedicated to the postgresql server. It has been in use for this particular purpose for a while and has not had performance issues (just with pg_restore).
Autovacuum is off and all indexes have been deleted before the restore is started. There is nothing in the db except for the empty data tables.
Restoring over the net:
In the past we have always restored in a way where the dumped data is read over a gigabit connection while being restored to the local drive. But the last time we did it, it took 2 days and I was looking for something faster. So, I decided to copy the dumped directory to the local drive and restore from the dump locally. I knew that because the machine only had one drive that would fit the data, there would be some I/O contention, but I hoped that it might not be as bad as reading over the network.
The pg_restore went unbearably slowly... after many hours it had written less than 20 GB to the database, so I started tracking it with iostat to see what was going on. The following is iostat output every 60 seconds. I tracked it for several hours and this is representative of what was happening consistently.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.39    0.00    0.40   43.10    0.00   56.11

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
loop0             0.00         0.00         0.00          0          0
loop1             0.00         0.00         0.00          0          0
loop2             0.00         0.00         0.00          0          0
sda             263.33       132.87      2990.93       7972     179456
sdb               0.17         0.00         0.73          0         44

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.34    0.00    0.41   44.43    0.00   54.82

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
loop0             0.00         0.00         0.00          0          0
loop1             0.00         0.00         0.00          0          0
loop2             0.00         0.00         0.00          0          0
sda             262.95       140.47      2983.00       8428     178980
sdb               0.08         0.00         0.40          0         24
While I was tracking this I started experimenting with the I/O scheduler to see if it had a noticeable impact. I had been using cfq (the Ubuntu 18.04 default). Changing to deadline did not make a noticeable difference. Changing to noop made things much slower. I went back to cfq. I also experimented with turning fsync off; that did speed things up a bit but not enough for me to leave it off.
What puzzled me is that the OS was spending such a large percentage of time in iowait, yet there was so little I/O going on.
So, I decided to go back to restoring over the net. While the slow pg_restore was still going on, and while I was still tracking iostat, I copied the 550 GB dump to an NFS drive. The copy happened pretty much at full speed (the limit being the gigabit ethernet) and, interestingly, it did not slow down the kB_wrtn and kB_wrtn/s numbers in iostat (which was the postgresql server continuing with the restore). To me that seemed to indicate that it was not really a disk I/O limitation.
Restoring over the net:
After copying the dump files to an NFS drive, I stopped the restore, truncated the tables and started exactly the same command, but this time taking its input from the NFS drive. I did not reboot the machine or restart the postgresql server. I tracked iostat every 60 seconds and this is what it looks like:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.87    0.00    1.62   39.89    0.00   49.61

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
loop0             0.00         0.00         0.00          0          0
loop1             0.00         0.00         0.00          0          0
loop2             0.00         0.00         0.00          0          0
sda             252.77       527.87     37837.47      31672    2270248
sdb               0.22         0.00         1.00          0         60

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.57    0.00    2.21   35.26    0.00   53.97

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
loop0             0.00         0.00         0.00          0          0
loop1             0.00         0.00         0.00          0          0
loop2             0.00         0.00         0.00          0          0
sda             236.10       465.27     54312.00      27916    3258720
sdb               0.08         0.00         0.40          0         24
Notice that the database is writing approximately 15 times as fast (and I have verified that by tracking the size of the postgresql data directory over time) while the number of I/O transactions per second has actually dropped a little bit. It has now been running about 24 hours and has maintained that speed.
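The "15 times" figure can be checked directly from the sda rows of the iostat samples above. The sketch below is only arithmetic on those four samples; the function names are made up for illustration. It also divides the total throughput by tps to estimate how much data each device transfer carried.

```python
# Sketch: compare the two restores using the sda rows of the iostat samples.
# Each tuple is (tps, kB_read/s, kB_wrtn/s), copied from the output above.
LOCAL = [(263.33, 132.87, 2990.93), (262.95, 140.47, 2983.00)]
NFS = [(252.77, 527.87, 37837.47), (236.10, 465.27, 54312.00)]


def mean_write_rate(samples):
    """Average kB written per second across the samples."""
    return sum(s[2] for s in samples) / len(samples)


def mean_kb_per_transfer(samples):
    """Average kB moved (read + written) per I/O request issued to the device."""
    total_kb = sum(s[1] + s[2] for s in samples)
    total_tps = sum(s[0] for s in samples)
    return total_kb / total_tps


if __name__ == "__main__":
    speedup = mean_write_rate(NFS) / mean_write_rate(LOCAL)
    print(f"write speedup: {speedup:.1f}x")
    print(f"local: {mean_kb_per_transfer(LOCAL):.1f} kB per transfer")
    print(f"nfs:   {mean_kb_per_transfer(NFS):.1f} kB per transfer")
```

With these samples the device handles about the same number of transfers per second in both runs, but each transfer carries roughly 12 kB in the local restore and roughly 190 kB in the NFS restore. Four samples are a small basis, so this is an indication rather than a measurement.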
My interpretation
At first sight this seems to me as being symptomatic of the pg_restore process doing a huge number of very small input operations when reading from the dump. If the proportion of input to output operations is the same now as it was when trying to restore from the local drive, that implies that the vast majority of I/O operations were inputs and not outputs.
However, I am not sure that even that would cause such a slowdown because the compressed data files in the directory format dump correspond to the tables and so there are 3 very large files that it starts with. So all of these stats were gathered in the first 24 hours of the restore when it was just restoring the first 3 tables (I have verbose on, so I know). Because those files are gzipped, we know that they are being read sequentially and because the machine has lots of memory we know that the OS has allocated a lot of space to disk buffers and so even if postgresql was doing lots of small reads, bouncing around between the 3 files, it would not hit the disk that often.
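One way to test the sequential-read assumption with PostgreSQL out of the picture would be to time how fast a single data file from the dump directory decompresses off the local drive. The sketch below does only that; the function name is hypothetical and the file path has to be supplied on the command line. The OS page cache will make a second run over the same file look faster than the first.

```python
import gzip
import sys
import time


def time_gzip_read(path, chunk_size=1024 * 1024):
    """Decompress a gzip file sequentially; return (bytes_out, seconds)."""
    total = 0
    start = time.perf_counter()
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # empty read means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python time_gzip_read.py <path to one .gz data file of the dump>
    n, secs = time_gzip_read(sys.argv[1])
    print(f"{n / 1e6:.1f} MB decompressed in {secs:.2f} s")
```

If the decompressed rate measured this way is far above the roughly 3 MB/s seen during the local restore, reading the dump file by itself is unlikely to be the bottleneck, and contention between reads and writes on the same drive becomes the more plausible suspect.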
Now that the restore is happening 15 times faster when reading from an NFS drive, I looked at the nfsiostat output for a while and it does not show any indication of any untoward behavior:
192.168.17.146:/volume1/Backups mounted on /nas/Backups:

           ops/s       rpc bklog
          27.000           0.000

read:              ops/s            kB/s           kB/op         retrans    avg RTT (ms)    avg exe (ms)
                  27.000        3464.332         128.309        0 (0.0%)          13.500          13.511
write:             ops/s            kB/s           kB/op         retrans    avg RTT (ms)    avg exe (ms)
                   0.000           0.000           0.000        0 (0.0%)           0.000           0.000

192.168.17.146:/volume1/Backups mounted on /nas/Backups:

           ops/s       rpc bklog
          24.000           0.000

read:              ops/s            kB/s           kB/op         retrans    avg RTT (ms)    avg exe (ms)
                  24.000        3079.406         128.309        0 (0.0%)          28.492          28.504
write:             ops/s            kB/s           kB/op         retrans    avg RTT (ms)    avg exe (ms)
                   0.000           0.000           0.000        0 (0.0%)           0.000           0.000
The number of operations per second (if those correspond to reads from postgresql, which I do not know for a fact) does not seem high at all.
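The nfsiostat read figures can be cross-checked with a little arithmetic: ops/s times kB/op should reproduce kB/s, and dividing the 550 GB compressed dump by that read rate gives a rough lower bound on the total restore time. The sketch below assumes binary units for both GB and kB and a read rate that stays constant for the whole restore, neither of which is confirmed here; the names are hypothetical.

```python
# Sketch: sanity-check the nfsiostat read samples shown above.
# Each tuple is (ops/s, kB/s, kB/op), copied from the output.
READ_SAMPLES = [(27.000, 3464.332, 128.309), (24.000, 3079.406, 128.309)]

DUMP_SIZE_GB = 550          # compressed dump size stated earlier in this message
KB_PER_GB = 1024 * 1024     # assumption: "GB" and "kB" are both binary units


def implied_kb_per_s(ops_per_s, kb_per_op):
    """Read throughput implied by the operation rate and operation size."""
    return ops_per_s * kb_per_op


def hours_to_read(dump_gb, kb_per_s):
    """Hours needed to read the whole dump at a constant rate."""
    return dump_gb * KB_PER_GB / kb_per_s / 3600


if __name__ == "__main__":
    for ops, kbs, kb_op in READ_SAMPLES:
        print(f"implied {implied_kb_per_s(ops, kb_op):.1f} kB/s, reported {kbs:.1f} kB/s")
        print(f"  full dump at this rate: {hours_to_read(DUMP_SIZE_GB, kbs):.0f} h")
```

Under those assumptions the two samples work out to somewhere between 45 and 55 hours for the full dump, which is in the same range as the 2 days the previous network restore took.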
I actually do not have a great theory for what is going on but it might be more obvious to someone who knows the postgresql implementation well. I would love to hear any thoughts that would be helpful on how to get my restores even faster.
