Thank you for the feedback. We had encountered this error message when allocating a record buf of 344 bytes. The message "record length 344 at %X/%X too long" along with the comment /* We treat this as a "bogus data" condition */ masked the OOM condition, implying an error in log record size calculation logic.
The actual allocation that failed was for 5 * Max(BLCKSZ, XLOG_BLCKSZ), a much larger allocation. Shoaib On Sun, Jul 10, 2016 at 10:58 PM, Craig Ringer <cr...@2ndquadrant.com> wrote: > > > On 11 July 2016 at 12:04, Michael Paquier <michael.paqu...@gmail.com> > wrote: > >> On Sat, Jul 9, 2016 at 2:58 AM, Shoaib Lari <sl...@pivotal.io> wrote: >> > Besides making the error message more informative, we had to modify >> > allocate_recordbuf() to return the actual number of bytes that were >> being >> > allocated. >> >> - report_invalid_record(state, "record length %u at %X/%X too long", >> - total_len, >> - (uint32) (RecPtr >> 32), (uint32) RecPtr); >> + report_invalid_record(state, >> + "cannot allocate %u bytes for record >> length %u at %X/%X", >> + newSizeOut, total_len, (uint32) (RecPtr >> >> 32), >> + (uint32) RecPtr); >> >> It does not look like a good idea to me to complicate the interface of >> allocate_recordbuf just to make more verbose one error message, >> meaning that it basically a complain related to the fact that >> palloc_extended(MCXT_ALLOC_NO_OOM) does not mention to the user the >> size that it has tried to allocate before returning NULL. Do you have >> a use case that would prove to be useful if this extra piece of >> information is provided? Because it seems to me that we don't really >> care if we know how much memory it has failed to allocate, we only >> want to know that it *has* failed and take actions only based on that. >> > > I actually find details of how much memory we tried to allocate to be > really useful in other places that do emit it. Sometimes it's been an > important clue when trying to figure out what's going on on a remote system > with no or limited access. Even if it just tells me "we were probably > incrementally allocating lots of small values since this failure is small" > vs "this allocation is huge, look for something unusually large" or "this > allocation is impossibly huge / some suspicious value look for memory > corruption". > > I tend to agree with your suggestion about a better approach but do think > emitting details on allocation failures is useful. At least when people > turn VM overcommit off ... > > -- > Craig Ringer http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >