Re: [GENERAL] Recovery Assistance

Brian Mills Sat, 28 Jan 2017 12:17:43 -0800

Hi,

No, it hasn't changed since the first time I looked at it.


root@atlassian:/home/tbadmin# ps ax | grep post
 1364 ?        Ss     0:00 /usr/lib/postfix/master
 5198 pts/3    S      0:00 su postgres
 5221 pts/3    S      0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
 5222 ?        Ss     0:10 postgres: startup process   recovering
0000000100000005000000A3
11161 pts/4    S+     0:00 grep --color=auto post

The /var/log/postgres logs haven't got anything new in them.
Also the screen session hasn't got anything new in them, except it adds a
line every time I attempt to connect to it using psql. The last attempt
below was my connection attempt for this email, a bit over 8 hours after my
last connection attempt.

Screen session:
2017-01-28 12:38:35 AEDT FATAL:  the database system is starting up
2017-01-28 23:00:01 AEDT FATAL:  the database system is starting up
2017-01-28 23:00:01 AEDT FATAL:  the database system is starting up
2017-01-29 07:14:00 AEDT FATAL:  the database system is starting up

I also still can't connect.
postgres@atlassian:/home/tbadmin$ psql
psql: FATAL:  the database system is starting up


*Brian Mills*
CTO


*Mob: *0410660003
*Melbourne* 03 9012 3460 or 03 8376 6327 *|*  *Sydney* 02 8064 3600 *|*
*Brisbane* 07 3173 1570
Level 1 *|*  600 Chapel Street *|* South Yarra*|*  VIC *|*  3141 *|*
Australia

<https://www.facebook.com/TryBooking/>  <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>

On 28 January 2017 at 16:08, Adrian Klaver <[email protected]>
wrote:

> On 01/27/2017 05:40 PM, Brian Mills wrote:
>
>> First of all, Thank you for your time to assist me learning. I really
>> appreciate it.
>>
>> root# ps ax | grep post
>>  1364 ?        Ss     0:00 /usr/lib/postfix/master
>>  5198 pts/3    S      0:00 su postgres
>>  5221 pts/3    S      0:00 /usr/lib/postgresql/9.3/bin/postgres -D
>> /etc/postgresql/9.3/main
>>  5222 ?        Ss     0:10 postgres: startup process   recovering
>> 0000000100000005000000A3
>>  7930 pts/4    S+     0:00 grep --color=auto post
>>
>
> So if you check back does the recovering XXXXXXXX part change?
>
> If so Postgres is walking through the WAL files as it should.
>
>
>> Its a single machine postgres database server, so I'm assuming there is
>> no cluster log. If there is one, where would I look for it?
>> The only log in /var/log/postgres is postgresql-9.3-main.log
>>
>
> That would be it. A single Postgres instance has multiple databases in it,
> by default it starts with template0, template1 and postgres databases. Then
> add whatever databases you created and you have a cluster of databases.
>
> which shows (tail):
>>
>> 2017-01-27 20:27:01 AEDT LOG:  database system was shut down at
>> 2017-01-27 20:26:29 AEDT
>> 2017-01-27 20:27:01 AEDT LOG:  MultiXact member wraparound protections
>> are now enabled
>> 2017-01-27 20:27:01 AEDT LOG:  autovacuum launcher started
>> 2017-01-27 20:27:01 AEDT LOG:  database system is ready to accept
>> connections
>> 2017-01-27 20:27:02 AEDT LOG:  incomplete startup packet
>> 2017-01-27 20:28:54 AEDT ERROR:  unexpected chunk size 104 (expected
>> 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
>> 2017-01-27 20:28:54 AEDT STATEMENT:  COPY public.bodycontent
>> (bodycontentid, body, contentid, bodytypeid) TO stdout;
>> 2017-01-27 20:30:13 AEDT LOG:  received fast shutdown request
>> 2017-01-27 20:30:13 AEDT LOG:  aborting any active transactions
>> 2017-01-27 20:30:13 AEDT LOG:  autovacuum launcher shutting down
>> 2017-01-27 20:30:13 AEDT LOG:  shutting down
>> 2017-01-27 20:30:13 AEDT LOG:  database system is shut down
>>
>
> That would be your Attempt 1 log.
>
>>
>> I ran the screen utility so I could leave the db started using
>> the pg_ctl command. The later logs in that session have not progressed,
>> its last entry is still
>> 2017-01-27 23:00:01 AEDT FATAL:  the database system is starting up
>> Which is still later datetime than the /var/log/postgres... log.
>>
>
> So it is just logging to stdout and not to the log file.
>
>
>> Connection attempt shows the same log
>> postgres@atlassian:/home/myuser$ psql
>> psql: FATAL:  the database system is starting up
>>
>> Nothing in the syslog that seems connected.
>>
>> *Brian Mills*
>> CTO
>>
>>
>> *Mob: *0410660003 <tel:0410660003>
>> *Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
>> <tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
>> <tel:02%208064%203600> *|*  *Brisbane* 07 3173 1570
>> <tel:07%203173%201570>
>> Level 1 *|*  600 Chapel Street *|* South
>> Yarra*|*  VIC *|*  3141 *|*  Australia
>>
>> <https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <
>> https://www.linkedin.com/company/trybooking-com>
>>
>> On 28 January 2017 at 12:05, Adrian Klaver <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>     On 01/27/2017 01:31 PM, Brian Mills wrote:
>>
>>         Hi,
>>
>>         I have a Atlassian Confluence Wiki that depends on postgres, but I
>>         haven't much experience with postgres other than for this purpose.
>>
>>         A few days ago, the hard disk filled, so all services stopped
>>         working.
>>         When the admin realised this he increased the disk size (its in
>>         a cloud,
>>         so that was easy to do) however I think the way it shutdown left
>>         Postgres in an inconsistent state for some reason.
>>         Now when I start it up I get this message in the log over and
>>         over again:
>>         "FATAL:  the database system is starting up"
>>
>>         I have a backup, which I have successfully recovered, but it is
>>         24 hours
>>         old, the next backup was the cause of the disk filling. So I'm
>> using
>>         this as exercise in learning a bit more about postgres.
>>
>>         I did some research and found a number of options. I took a file
>>         level
>>         backup with the service not running then tried 2 things. I
>>         haven't found
>>         much else on what to do though.
>>
>>         *Attempt 1 - Reset Log *
>>
>>         It sounded like this shouldn't be my first option (it wasn't)
>>         but it did
>>         sound like what I needed to do.
>>         I ran this command
>>         ./pg_resetxlog /var/lib/postgresql/9.3/main -f
>>         It worked a treat, the database did startup ok.
>>         However when I tried to dump the DB:
>>         root@atlassian:/home/myuser# sudo -u postgres pg_dump
>> confluencedb >
>>         $now-confluencedb.sql
>>         pg_dump: Dumping the contents of table "bodycontent" failed:
>>         PQgetResult() failed.
>>         pg_dump: Error message from server: ERROR:  unexpected chunk
>>         size 104
>>         (expected 1996) in chunk 3 of 22 for toast value 48862 in
>>         pg_toast_20028
>>         pg_dump: The command was: COPY public.bodycontent
>>         (bodycontentid, body,
>>         contentid, bodytypeid) TO stdout;
>>         The dump failed, so I assume this did leave my database in an
>>         inconsistent state.
>>
>>
>>         *Attempt 2 -  startup manually and let it try recovery*
>>
>>         I restored my file level backup and started again.
>>         This time I tried to startup manually on the command line to see
>> the
>>         output (I'd done it as a service startup a number of times to
>>         nearly the
>>         same effect too)
>>
>>         postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
>>         /etc/postgresql/9.3/main start
>>         server starting
>>         postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27
>>         20:36:08 AEDT
>>         LOG:  database system was interrupted while in recovery at
>>         2017-01-27
>>         20:13:26 AEDT
>>         2017-01-27 20:36:08 AEDT HINT:  This probably means that some
>>         data is
>>         corrupted and you will have to use the last backup for recovery.
>>         2017-01-27 20:36:08 AEDT LOG:  database system was not properly
>> shut
>>         down; automatic recovery in progress
>>         2017-01-27 20:36:08 AEDT LOG:  redo starts at 5/528B4558
>>         2017-01-27 20:36:18 AEDT LOG:  redo done at 5/A3FFF9E8
>>         2017-01-27 20:36:18 AEDT LOG:  last completed transaction was at
>> log
>>         time 2017-01-24 02:08:00.023064+11
>>         2017-01-27 23:00:01 AEDT FATAL:  the database system is starting
>> up
>>         2017-01-27 23:00:01 AEDT FATAL:  the database system is starting
>> up
>>
>>
>>     What does ps ax | grep post show?
>>
>>     Is the cluster set up to log to a file, if so what does it show?
>>
>>     Does the system log show anything relevant?
>>
>>
>>         I've left it that way for 12 hours, and its still not allowing
>>         connections.
>>         I assume its doing some kind of consistency check?
>>
>>
>>     What does it say when you attempt a connection?
>>
>>
>>
>>         Does anyone have any suggestions on what I should be doing to
>>         try and
>>         restore this database?
>>
>>         - The amount of change is minimal in the DB after 6pm it should be
>>         basically no change overnight.
>>         - The log above seems to suggest it has completed a redo ok, is
>>         that right?
>>         - The last completed transaction time 2017-01-24
>>         02:08:00.023064+11 the
>>         log mentions would be fine to use, so loosing even a few hours
>>         before
>>         that would be more than adequate.
>>
>>         I'm just not clear what the database is doing now, or how I
>>         should be
>>         trying to recover it.
>>
>>         Any help anyone can suggest would be great! I've given myself this
>>         weekend to attempt to recover more than the last backup, after
>>         that I
>>         need to restore the service for my team to use and suck up the
>>         lost last
>>         day of updates.
>>
>>         Thanks,
>>         Brian
>>
>>
>>
>>     --
>>     Adrian Klaver
>>     [email protected] <mailto:[email protected]>
>>
>>
>>
>
> --
> Adrian Klaver
> [email protected]
>

Re: [GENERAL] Recovery Assistance

Reply via email to