Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-28 Thread Teodor Sigaev
The more I think about it, the more I think gin is just an innocent bystander, for which I just happen to have a particularly demanding test. I think something about snapshots and wrap-around may be broken. After 10 hours of running I've got 1587 XX000 2016-04-28 05:57:09.964 MSK:ERROR:

2016-04-28 Thread Noah Misch
On Tue, Apr 26, 2016 at 08:22:03PM +0300, Teodor Sigaev wrote: > >>Check my reasoning: In version 4 I added remembering of the tail of the pending > >>list in the blknoFinish variable. And when we read the page which was the tail at > >>cleanup start, we set the cleanupFinish variable, and after cleaning that >

2016-04-26 Thread Teodor Sigaev
Check my reasoning: In version 4 I added remembering of the tail of the pending list in the blknoFinish variable. When we read the page which was the tail at the start of cleanup, we set the cleanupFinish variable, and after cleaning that page we stop further cleanup. Any insert caused during cleanup will be
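
A minimal Python sketch of the stopping rule described above, under a simplified model: the pending list is a queue of unique page numbers, and a callback run after each page stands in for concurrent inserters appending new pages. The roles of blknoFinish and cleanupFinish come from the message; `cleanup_pending_list` and `on_page` are hypothetical names, and this is not the actual GIN code.

```python
from collections import deque
from typing import Callable, List


def cleanup_pending_list(pending: deque, on_page: Callable[[int], None]) -> List[int]:
    """Drain pending-list pages, stopping after the page that was the tail
    when this cleanup started (the role blknoFinish plays in the message)."""
    if not pending:
        return []
    blkno_finish = pending[-1]  # remember the tail at cleanup start
    cleanup_finish = False      # set once we reach that remembered tail
    processed = []
    while pending and not cleanup_finish:
        blkno = pending.popleft()
        if blkno == blkno_finish:
            cleanup_finish = True
        processed.append(blkno)
        on_page(blkno)  # simulated concurrent inserts may append pages here
    return processed
```

Because the tail is captured once, pages appended while the cleaner runs are left for a later cleanup, so a steady stream of inserts cannot keep a single cleaner busy indefinitely.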

2016-04-23 Thread Noah Misch
On Fri, Apr 22, 2016 at 02:03:01PM -0700, Jeff Janes wrote: > On Thu, Apr 21, 2016 at 11:00 PM, Noah Misch wrote: > > Could you describe the test case in sufficient detail for Teodor to > > reproduce > > your results? > > [detailed description and attachments] Thanks. > The

2016-04-22 Thread Jeff Janes
On Thu, Apr 21, 2016 at 11:00 PM, Noah Misch wrote: > On Mon, Apr 18, 2016 at 05:48:17PM +0300, Teodor Sigaev wrote: >> >>Added, see attached patch (based on v3.1) >> > >> >With this applied, I am getting a couple errors I have not seen before >> >after extensive crash recovery

2016-04-22 Thread Robert Haas
On Fri, Apr 22, 2016 at 2:20 PM, Jeff Janes wrote: >> Check my reasoning: In version 4 I added remembering of the tail of the pending >> list in the blknoFinish variable. And when we read the page which was the tail at >> cleanup start, we set the cleanupFinish variable, and after cleaning

2016-04-22 Thread Jeff Janes
On Mon, Apr 18, 2016 at 7:48 AM, Teodor Sigaev wrote: >>> Added, see attached patch (based on v3.1) >> >> >> With this applied, I am getting a couple errors I have not seen before >> after extensive crash recovery testing: >> ERROR: attempted to delete invisible tuple >> ERROR:

2016-04-22 Thread Peter Geoghegan
On Thu, Nov 5, 2015 at 2:44 PM, Jeff Janes wrote: > The bug theoretically exists in 9.5, but it wasn't until 9.6 (commit > e95680832854cf300e64c) that free pages were recycled aggressively > enough that it actually becomes likely to be hit. In other words: The bug could be

2016-04-22 Thread Noah Misch
On Mon, Apr 18, 2016 at 05:48:17PM +0300, Teodor Sigaev wrote: > >>Added, see attached patch (based on v3.1) > > > >With this applied, I am getting a couple errors I have not seen before > >after extensive crash recovery testing: > >ERROR: attempted to delete invisible tuple > >ERROR: unexpected

2016-04-18 Thread Teodor Sigaev
Added, see attached patch (based on v3.1) With this applied, I am getting a couple errors I have not seen before after extensive crash recovery testing: ERROR: attempted to delete invisible tuple ERROR: unexpected chunk number 1 (expected 2) for toast value 100338365 in pg_toast_16425 Huh,

2016-04-17 Thread Jeff Janes
On Tue, Apr 12, 2016 at 9:53 AM, Teodor Sigaev wrote: > > With the pending cleanup patch, the backend will try to get a lock on the metapage with > ConditionalLockPage. Will it interrupt the autovacuum worker? Correct, ConditionalLockPage should not interrupt the autovacuum worker. >> >>
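
A rough sketch of the difference between waiting for a concurrent cleaner and moving on, using a Python `threading.Lock` as a hypothetical stand-in for the metapage lock. The non-blocking `acquire(blocking=False)` plays the role ConditionalLockPage plays in the message; the function name and the choice of which callers pass `force=True` are assumptions for illustration.

```python
import threading

# Hypothetical stand-in for the lock on the GIN metapage.
metapage_lock = threading.Lock()


def try_cleanup(force: bool, do_cleanup) -> bool:
    """Return True if this caller ran the cleanup itself.

    force=True models a caller that waits for any concurrent cleaner
    (e.g. vacuum); force=False models an ordinary inserter that gives up
    immediately if someone else already holds the lock."""
    if force:
        metapage_lock.acquire()  # wait for the concurrent cleaner to finish
    elif not metapage_lock.acquire(blocking=False):
        return False             # another cleaner is active: move on
    try:
        do_cleanup()
        return True
    finally:
        metapage_lock.release()
```

The conditional branch never blocks, so an inserter that loses the race cannot hold up (or be held up by) whoever is already cleaning.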

2016-04-15 Thread Teodor Sigaev
Alvaro's recommendation, to let the cleaner off the hook once it passes the page which was the tail page at the time it started, would prevent any process from getting pinned down indefinitely, but would Added, see attached patch (based on v3.1). If there are no objections, I will apply it at

2016-04-12 Thread Teodor Sigaev
There are only 3 fundamental options I see: the cleaner can wait, "help", or move on. "Helping" is what it does now and is dangerous. Moving on gives the above-discussed unthrottling problem. Waiting has two problems. The act of waiting will cause autovacuums to be canceled, unless ugly hacks

2016-04-12 Thread Teodor Sigaev
This restricts the memory used by ordinary backends when doing the cleanup to work_mem. Shouldn't we let them use maintenance_work_mem? Only one backend can be doing this cleanup of a given index at any given time, so we don't need to worry about many concurrent allocations of
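
A minimal sketch of a memory-budget rule consistent with the message, assuming that vacuum-driven cleanup uses maintenance_work_mem (or autovacuum_work_mem inside an autovacuum worker, where that setting's value of -1 means "use maintenance_work_mem"), while an ordinary inserting backend is capped at work_mem. The function name and its arguments are hypothetical; all values are in kB.

```python
def cleanup_memory_kb(from_vacuum: bool, in_autovacuum_worker: bool,
                      work_mem: int, maintenance_work_mem: int,
                      autovacuum_work_mem: int = -1) -> int:
    """Pick a memory budget (kB) for one pending-list cleanup pass."""
    if not from_vacuum:
        # Ordinary inserting backend: capped at work_mem. The question in
        # the message is whether this branch should use maintenance_work_mem.
        return work_mem
    if in_autovacuum_worker and autovacuum_work_mem != -1:
        return autovacuum_work_mem
    # autovacuum_work_mem = -1 means "fall back to maintenance_work_mem".
    return maintenance_work_mem
```

The argument for the larger budget is the one made above: since only one backend can clean a given index at a time, raising the inserter's limit does not multiply across concurrent cleaners of that index.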

2016-04-12 Thread Noah Misch
On Thu, Apr 07, 2016 at 05:53:54PM -0700, Jeff Janes wrote: > On Thu, Apr 7, 2016 at 4:33 PM, Tom Lane wrote: > > Jeff Janes writes: > >> To summarize the behavior change: > > > >> In the released code, an inserting backend that violates the pending > >>

2016-04-07 Thread Jeff Janes
On Thu, Apr 7, 2016 at 4:33 PM, Tom Lane wrote: > Jeff Janes writes: >> To summarize the behavior change: > >> In the released code, an inserting backend that violates the pending >> list limit will try to clean the list, even if it is already being >>

2016-04-07 Thread Tom Lane
Jeff Janes writes: > To summarize the behavior change: > In the released code, an inserting backend that violates the pending > list limit will try to clean the list, even if it is already being > cleaned. It won't accomplish anything useful, but will go through the >

2016-04-07 Thread Alvaro Herrera
Jeff Janes wrote: > The proposed change removes that throttle, so that inserters will > immediately see there is already a cleaner and just go back about > their business. Due to that, unthrottled backends could add to the > pending list faster than the cleaner can clean it, leading to >
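
A toy rate model of the throttling argument, with made-up numbers: several inserters each add pages every tick, and one cleaner removes a fixed number of pages per tick. With `throttle=True`, an inserter that finds the list over its limit spends that tick in redundant cleanup and adds nothing (the released behavior as described above); with `throttle=False` it moves on and keeps inserting (the proposed behavior). The function, parameters, and rates are all hypothetical.

```python
def simulate(ticks: int, inserters: int, insert_rate: int,
             clean_rate: int, limit: int, throttle: bool) -> list:
    """Return the pending-list backlog (in pages) after each tick."""
    backlog = 0
    history = []
    for _ in range(ticks):
        if throttle and backlog > limit:
            added = 0  # inserters are busy "helping" and insert nothing this tick
        else:
            added = inserters * insert_rate  # inserters move on and keep inserting
        # The single cleaner removes up to clean_rate pages per tick.
        backlog = max(0, backlog + added - clean_rate)
        history.append(backlog)
    return history


throttled = simulate(10, inserters=4, insert_rate=1, clean_rate=2, limit=5, throttle=True)
unthrottled = simulate(10, inserters=4, insert_rate=1, clean_rate=2, limit=5, throttle=False)
print(throttled)
print(unthrottled)
```

With 4 inserters adding 1 page each against a cleaner that removes 2 pages per tick, the throttled backlog hovers around the limit (never above 6), while the unthrottled backlog grows by 2 pages every tick.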

2016-04-07 Thread Jeff Janes
On Wed, Apr 6, 2016 at 9:52 AM, Teodor Sigaev wrote: > I'm inclined to push v3.1 as one of the two winners by size/performance and, > unlike the pending lock patch, it doesn't change the internal logic of the lock > machinery. This restricts the memory used by ordinary backends when

2016-04-06 Thread Teodor Sigaev
I've tested the v2, v3 and v3.1 of the patch, to see if there are any differences. The v2 no longer applies, so I tested it on ee943004. The following table shows the total duration of the data load, and also sizes of the two GIN indexes. duration (sec) subject body

2016-04-05 Thread Tomas Vondra
Hi, On 04/04/2016 02:25 PM, Tomas Vondra wrote: On 04/04/2016 02:06 PM, Teodor Sigaev wrote: The above-described topic is currently a PostgreSQL 9.6 open item. Teodor, since you committed the patch believed to have created it, you own this open item. If that responsibility lies elsewhere,

2016-04-04 Thread Tomas Vondra
On 04/04/2016 02:06 PM, Teodor Sigaev wrote: The above-described topic is currently a PostgreSQL 9.6 open item. Teodor, since you committed the patch believed to have created it, you own this open item. If that responsibility lies elsewhere, please let us know whose responsibility it is to fix

2016-04-04 Thread Teodor Sigaev
The above-described topic is currently a PostgreSQL 9.6 open item. Teodor, since you committed the patch believed to have created it, you own this open item. If that responsibility lies elsewhere, please let us know whose responsibility it is to fix this. Since new open items may be discovered

2016-04-04 Thread Noah Misch
On Thu, Feb 25, 2016 at 11:19:20AM -0800, Jeff Janes wrote: > On Wed, Feb 24, 2016 at 8:51 AM, Teodor Sigaev wrote: > > Thank you for remembering this problem, at least for me. > > > >>> Well, turns out there's a quite significant difference, actually. The > >>> index sizes I

2016-02-25 Thread Jeff Janes
On Wed, Feb 24, 2016 at 8:51 AM, Teodor Sigaev wrote: > Thank you for remembering this problem, at least for me. > >>> Well, turns out there's a quite significant difference, actually. The >>> index sizes I get (quite stable after multiple runs): >>> >>> 9.5 : 2428 MB >>>

2016-02-25 Thread Tomas Vondra
Hi, On 02/25/2016 05:32 PM, Teodor Sigaev wrote: Well, turns out there's a quite significant difference, actually. The index sizes I get (quite stable after multiple runs):

9.5                 : 2428 MB
9.6 + alone cleanup :  730 MB
9.6 + pending lock  :  488 MB

Attached is a modified alone_cleanup

2016-02-25 Thread Teodor Sigaev
Well, turns out there's a quite significant difference, actually. The index sizes I get (quite stable after multiple runs):

9.5                 : 2428 MB
9.6 + alone cleanup :  730 MB
9.6 + pending lock  :  488 MB

Attached is a modified alone_cleanup patch which doesn't break the cleanup process as it does

2016-02-24 Thread Teodor Sigaev
Thank you for remembering this problem, at least for me. Well, turns out there's a quite significant difference, actually. The index sizes I get (quite stable after multiple runs):

9.5                 : 2428 MB
9.6 + alone cleanup :  730 MB
9.6 + pending lock  :  488 MB

Interesting, I don't see why
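
For scale, the index sizes reported in this thread (9.5: 2428 MB, 9.6 + alone cleanup: 730 MB, 9.6 + pending lock: 488 MB) can be expressed relative to 9.5. This is only arithmetic on the quoted figures, not a new measurement; the helper name is hypothetical.

```python
def relative_sizes(sizes_mb: dict, baseline: str) -> dict:
    """Return each size as a fraction of the baseline, rounded to 3 places."""
    base = sizes_mb[baseline]
    return {name: round(size / base, 3) for name, size in sizes_mb.items()}


# Index sizes (MB) as reported in the thread.
SIZES_MB = {
    "9.5": 2428,
    "9.6 + alone cleanup": 730,
    "9.6 + pending lock": 488,
}

for name, frac in relative_sizes(SIZES_MB, "9.5").items():
    print(f"{name:20s} {frac:.1%}")
```

The pending lock variant comes out at roughly 20% of the 9.5 size and the alone cleanup variant at roughly 30%.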

2016-02-24 Thread Tomas Vondra
On 02/24/2016 06:56 AM, Robert Haas wrote: On Wed, Feb 24, 2016 at 9:17 AM, Tomas Vondra wrote: ... Are we going to do anything about this? While the bug is present in 9.5 (and possibly other versions), fixing it before 9.6 gets out seems important because

2016-02-23 Thread Robert Haas
On Wed, Feb 24, 2016 at 9:17 AM, Tomas Vondra wrote: >> Well, turns out there's a quite significant difference, actually. The >> index sizes I get (quite stable after multiple runs): >> >> 9.5 : 2428 MB >> 9.6 + alone cleanup : 730 MB >> 9.6 + pending

2016-02-23 Thread Tomas Vondra
Hi, On 01/05/2016 10:38 AM, Tomas Vondra wrote: Hi, ... There shouldn't be a difference between the two approaches (although I guess there could be if one left a larger pending list than the other, as pending lists are very space inefficient), but since you included 9.5 in your test I

2016-01-05 Thread Tomas Vondra
Hi, On 12/23/2015 09:33 PM, Jeff Janes wrote: On Mon, Dec 21, 2015 at 11:51 AM, Tomas Vondra wrote: On 12/21/2015 07:41 PM, Jeff Janes wrote: On Sat, Dec 19, 2015 at 3:19 PM, Tomas Vondra wrote: ... So both patches seem to

2015-12-23 Thread Jeff Janes
On Mon, Dec 21, 2015 at 11:51 AM, Tomas Vondra wrote: > > > On 12/21/2015 07:41 PM, Jeff Janes wrote: >> >> On Sat, Dec 19, 2015 at 3:19 PM, Tomas Vondra >> wrote: > > > ... > >>> So both patches seem to do the trick, but (2) is faster.

2015-12-21 Thread Tomas Vondra
On 12/21/2015 07:41 PM, Jeff Janes wrote: On Sat, Dec 19, 2015 at 3:19 PM, Tomas Vondra wrote: ... So both patches seem to do the trick, but (2) is faster. Not sure if this is expected. (BTW all the results are without asserts enabled). Do you know what the

2015-12-21 Thread Jeff Janes
On Sat, Dec 19, 2015 at 3:19 PM, Tomas Vondra wrote: > Hi, > > On 11/06/2015 02:09 AM, Tomas Vondra wrote: >> >> Hi, >> >> On 11/06/2015 01:05 AM, Jeff Janes wrote: >>> >>> On Thu, Nov 5, 2015 at 3:50 PM, Tomas Vondra >>> wrote: >> >>

2015-12-19 Thread Tomas Vondra
Hi, On 11/06/2015 02:09 AM, Tomas Vondra wrote: Hi, On 11/06/2015 01:05 AM, Jeff Janes wrote: On Thu, Nov 5, 2015 at 3:50 PM, Tomas Vondra wrote: ... I can do that - I see there are three patches in the two threads: 1) gin_pending_lwlock.patch (Jeff

2015-11-05 Thread Jeff Janes
On Thu, Nov 5, 2015 at 2:18 PM, Tomas Vondra wrote: > Hi, > > while repeating some full-text benchmarks on master, I've discovered > that there's a data corruption bug somewhere. What happens is that while > loading data into a table with GIN indexes (using multiple

2015-11-05 Thread Tomas Vondra
On 11/05/2015 11:44 PM, Jeff Janes wrote: > This looks like it is probably the same bug discussed here: http://www.postgresql.org/message-id/CAMkU=1xalflhuuohfp5v33rzedlvb5aknnujceum9knbkrb...@mail.gmail.com And here: http://www.postgresql.org/message-id/56041b26.2040...@sigaev.ru The bug

2015-11-05 Thread Jeff Janes
On Thu, Nov 5, 2015 at 3:50 PM, Tomas Vondra wrote: > > > On 11/05/2015 11:44 PM, Jeff Janes wrote: >> >> >> This looks like it is probably the same bug discussed here: >> >> >>

2015-11-05 Thread Tomas Vondra
Hi, On 11/06/2015 01:05 AM, Jeff Janes wrote: On Thu, Nov 5, 2015 at 3:50 PM, Tomas Vondra wrote: ... I can do that - I see there are three patches in the two threads: 1) gin_pending_lwlock.patch (Jeff Janes) 2) gin_pending_pagelock.patch (Jeff Janes)