On 5 March 2014 18:22, Israel Brewster <isr...@eraalaska.net> wrote: > I have a Postgresql 9.2.4 database containing real-time tracking data for > our aircraft for the past week (theoretically). It is populated by two > different processes: one that runs every few minutes, retrieving data from > a number of sources and storing it in the DB, and one that has an "always > on" connection to the DB streaming data into the database in realtime > (often several records per second). To keep the database size manageable I > have a cron job that runs every evening to delete all records that are more > than a week old, after archiving a subset of them in permanent storage. > > This morning my boss e-mailed me, complaining that only a couple of > aircraft were showing up in the list (SELECT distinct(tail) FROM data being > the command that populates the list). Upon looking at the data I saw that > it only went back to 4am this morning, rather than the week I was > expecting. My first thought was "Oh, I must have a typo in my cleanup > routine, such that it is deleting all records rather than only those a week > old, and it's just that no one has noticed until now". So I looked at that, > but changing the delete to a select appeared to produce the proper results, > in that no records were selected: > > DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 > days'; > > Then I noticed something even more odd. My database has an id column, > which is defined as a SERIAL. As we all know, a serial is a monotonically > increasing number that is not affected by deletes. However, the oldest > record in my database, from 4am this morning, had an id of 1. Even though I > KNOW there was data in the system yesterday. Even if my DELETE command was > wrong and deleted ALL records, that shouldn't have reset the SERIAL column > to 1! I also know that I have not been in the database mucking around with > the sequence value - to be completely honest, I don't even know the exact > command to reset it - I'd have to google it if I wanted to. > > Also odd is that my cleanup script runs at 1am. I have records of there > being new data in the database up to 3:51am, but the oldest record > currently in the DB is from 4:45am (as specified by the default of now() on > the column). So I know records were added after my delete command ran, but > before this reset occurred. > > So my question is, aside from someone going in and mucking about in the > wee hours of the morning, what could possibly cause this behavior? What > sort of event could cause all data to be deleted from the table, and the > sequence to be reset? Especially while there is an active connection? > Thanks for any ideas, however wild or off the wall :-) >
That is odd. Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions. So if all the data went missing, and the sequence reset, the only thing I can think of is: Someone ran: TRUNCATE data RESTART IDENTITY; or someone restored the table structure from a backup that deleted the original table. Do you log DDL? Was the table partitioned? You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue. -- Thom