Re: [HACKERS] pgbench stuck with 100% cpu usage

2017-09-28 Thread Pavan Deolasee
On Fri, Sep 29, 2017 at 1:03 AM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:

>
>>> The commit that introduced this code is 12788ae49e1933f463bc. So I am
>>> copying Heikki.
>>>
>>
>> AFAICR the commit was mostly a heavy restructuring of previous
>> unmaintainable spaghetti code. I'm not sure the problem was not there
>> before under one form or another.
>>
>> I agree that it should error out & stop the client in this case at least.
>>
>
> Here is a probable "fix", which does what the comment said should be done.
>
>
Looks good to me.


> I could not trigger an infinite loop with various kill -9 and other quick
> stops. Could you try it on your side?
>
>
Ok, I will try. But TBH I did not try to reproduce it either, and I am not
sure if I can. I discovered the problem when my laptop's battery started
draining much more quickly. Having seen the problem, it seems very obvious
though.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pgbench stuck with 100% cpu usage

2017-09-28 Thread Pavan Deolasee
On Fri, Sep 29, 2017 at 12:22 AM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:

>
> While running some tests, I encountered a situation where pgbench gets
>> stuck in an infinite loop, consuming 100% cpu. The setup was:
>>
>> - Start postgres server from the master branch
>> - Initialise pgbench
>> - Run pgbench -c 10 -T 100
>> - Stop postgres with -m immediate
>>
>
> That is a strange test to run, but it would be better if the behavior was
> not that one.


Well, I think it's a very legitimate test, not for testing performance but
for testing crash recovery, and I use it very often. This particular test
was run to catch another bug, which will be reported separately.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] pgbench stuck with 100% cpu usage

2017-09-28 Thread Pavan Deolasee
Hello,

While running some tests, I encountered a situation where pgbench gets
stuck in an infinite loop, consuming 100% cpu. The setup was:

- Start postgres server from the master branch
- Initialise pgbench
- Run pgbench -c 10 -T 100
- Stop postgres with -m immediate

Now it seems that pgbench gets stuck and its state machine does not
advance. Attaching it to a debugger, I saw that one of the clients remains
stuck in this loop forever.

    if (command->type == SQL_COMMAND)
    {
        if (!sendCommand(st, command))
        {
            /*
             * Failed. Stay in CSTATE_START_COMMAND state, to
             * retry. ??? What the point or retrying? Should
             * rather abort?
             */
            return;
        }
        else
            st->state = CSTATE_WAIT_RESULT;
    }

sendCommand() returns false because the underlying connection is bad
and PQsendQuery returns 0. Reading the comment, it seems that the author
thought about this situation but decided to retry instead of aborting. I'm
not sure what the rationale for that decision was; maybe to deal with
transient failures?
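
For illustration, here is a minimal sketch of what aborting instead of
retrying could look like, assuming pgbench's existing CSTATE_ABORTED state
and its commandFailed() helper (the actual fix posted in this thread may
differ):

    if (command->type == SQL_COMMAND)
    {
        if (!sendCommand(st, command))
        {
            /*
             * The connection is most likely dead (PQsendQuery()
             * returned 0), so staying in CSTATE_START_COMMAND would
             * just busy-loop. Report the failure and abort the client.
             */
            commandFailed(st, "SQL command send failed");
            st->state = CSTATE_ABORTED;
            return;
        }
        else
            st->state = CSTATE_WAIT_RESULT;
    }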

The commit that introduced this code is 12788ae49e1933f463bc. So I am
copying Heikki.

Thanks,
Pavan



-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Parallel worker error

2017-09-03 Thread Pavan Deolasee
On Wed, Aug 30, 2017 at 8:49 PM, Robert Haas <robertmh...@gmail.com> wrote:

>
>
> But since that's an established design fl^H^Hprinciple, maybe that
> means we should go with the approach of teaching SerializeGUCState()
> to ignore role altogether and instead have ParallelWorkerMain call
> SetCurrentRoleId using information passed via the FixedParallelState
> (not sure of the precise details here).
>
>
Seems like a reasonable approach to me.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Parallel worker error

2017-08-30 Thread Pavan Deolasee
On Wed, Aug 30, 2017 at 5:19 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

>
>
> I am able to reproduce this without involving session authorization
> guc as well.  One needs to drop the newly created role from another
> session, then also we can see the same error.
>
>
Yeah, that's how I first created the case. But concurrent dropping of role
(which is surprisingly allowed even when there are active connections with
the role active) opens another set of errors. Hence I tried to reproduce
this in a single session. While it might be interesting to fix the
concurrent role drop problem someday, the single session problem looks more
pressing.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Parallel worker error

2017-08-30 Thread Pavan Deolasee
Hello,

While investigating an issue in Postgres-XL 10, I came across this rather
surprising behaviour in PG master. See this test case:

create role testuser1;
set role to testuser1;
show role; -- shows testuser1

-- reset back to default role
reset session authorization ;
show role; -- shows none

set force_parallel_mode TO 1;
select count(*) from pg_class ; -- ok

-- now drop the role and try parallel query again
drop role  testuser1 ;
select count(*) from pg_class ; -- fails

The last statement in this test fails with an error:
ERROR:  role "testuser1" does not exist
CONTEXT:  parallel worker

Looks like the leader process, when it serialises the GUC state, saves the
"role" value, which is still set to testuser1 (and which is now dropped).
Later, when the parallel worker process tries to restore the GUC state, it
fails on the catalog lookup.

Comments in show_role() mention a kludge: apparently SET SESSION
AUTHORIZATION cannot call set_config_option and change the current value of
"role". That probably explains why show_role() displays the correct
information while the parallel worker fails to handle the case.

It's quite possible that I don't understand the differences between "role"
and "session authorization", but it still looks like a bug to me. Maybe
SerializeGUCState() should check whether SetRoleIsActive is true and only
then save the role information?
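
A sketch of that idea (illustrative only, not a committed fix): since the
SetRoleIsActive flag is static in miscinit.c, GetCurrentRoleId(), which
returns InvalidOid unless SET ROLE is active, could stand in for it:

    #include "postgres.h"
    #include "miscadmin.h"

    /*
     * Hypothetical filter for SerializeGUCState(): skip "role" unless a
     * SET ROLE is actually in effect, so a parallel worker never tries
     * to look up a role the leader is no longer using (and which may
     * have been dropped concurrently).
     */
    static bool
    guc_worth_serializing(const char *name)
    {
        if (strcmp(name, "role") == 0 && !OidIsValid(GetCurrentRoleId()))
            return false;
        return true;
    }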

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-07-27 Thread Pavan Deolasee
On Fri, Jul 28, 2017 at 5:57 AM, Peter Geoghegan <p...@bowt.ie> wrote:

> Pavan Deolasee <pavan.deola...@gmail.com> wrote:
> > I'll be happy if someone wants to continue hacking the patch further and
> > get it in a committable shape. I can stay actively involved. But TBH the
> > amount of time I can invest is far less compared to what I could during
> > the last cycle.
>
> That's disappointing.
>
>
Yes, it is even more so for me. But I was hard-pressed to choose between
Postgres-XL 10 and WARM. Given the ever-increasing interest in XL and my
ability to control the outcome there, I thought it made sense to focus on
XL for now.


> I personally find it very difficult to assess something like this.


One good thing is that the patch is ready and fully functional, so those
who are keen can run real performance tests and see the actual impact of
the patch.


> The
> problem is that even if you can demonstrate that the patch is strictly
> better than what we have today, the risk of reaching a local maxima
> exists.  Do we really want to double-down on HOT?
>

Well, HOT has served us well for over a decade now, so I won't hesitate to
place my bets on WARM.


>
> If I'm not mistaken, the goal of WARM is, roughly speaking, to make
> updates that would not be HOT-safe today do a "partial HOT update".  My
> concern with that idea is that it doesn't do much for the worst case.
>

I see your point. But I would like to think of it this way: does the
technology significantly help many common use cases that are currently not
addressed by HOT? It probably won't help all workloads; that's a given.
Also, we don't have any credible alternative, while this patch has
progressed quite a lot. Maybe Robert will soon present the pluggable
storage/UNDO patch, and that will cover everything and more of what is
currently covered by HOT/WARM; that would probably make many other things
redundant.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-07-27 Thread Pavan Deolasee
On Wed, Jul 26, 2017 at 6:26 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Apr 18, 2017 at 4:25 AM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> > I'll include the fix in the next set of patches.
>
> I haven't see a new set of patches.  Are you intending to continue
> working on this?
>
>
Looks like I'll be short on bandwidth to pursue this further, given other
work commitments including the upcoming Postgres-XL 10 release. While I
haven't worked on the patch since April, I think it was in pretty good
shape where I left it. But it's going to be incredibly difficult to
estimate the amount of further effort required, especially for testing and
validating all the use cases and finding optimisations to fix regressions
in all those cases. Also, many fundamental concerns about the patch
touching the core of the database engine can only be addressed if some
senior hackers, like you, take serious interest in it.

I'll be happy if someone wants to continue hacking the patch further and
get it in a committable shape. I can stay actively involved. But TBH the
amount of time I can invest is far less compared to what I could during
the last cycle.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-05-05 Thread Pavan Deolasee
On Fri, May 5, 2017 at 10:56 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Wed, May 3, 2017 at 12:25 AM, Peter Eisentraut
>
>
> >>> Can we prevent HOT pruning during logical decoding?
> >>
> >> It does not sound much difficult to do, couldn't you just make it a
> >> no-op with am_walsender?
> >
> > That's my hope.
>
> The only code path doing HOT-pruning and generating WAL is
> heap_page_prune(). Do you think that we need to worry about FPWs as
> well?
>

IMO the check should go inside heap_page_prune_opt(). Do we need to worry
about wal_log_hints or checksums producing WAL because of hint bit updates?
While I haven't read the whole thread, I am assuming that if HOT pruning
can happen, hint bits can surely get set too.
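
A rough sketch of such a guard, assuming am_walsender (from walsender.h)
is the right condition to test; whether it is belongs to this discussion:

    #include "postgres.h"
    #include "access/heapam.h"
    #include "replication/walsender.h"

    void
    heap_page_prune_opt(Relation relation, Buffer buffer)
    {
        /*
         * Sketch: a walsender doing logical decoding must not generate
         * pruning WAL, or the shutdown checkpoint can PANIC. Bail out
         * before any pruning work happens.
         */
        if (am_walsender)
            return;

        /* ... existing opportunistic pruning logic ... */
    }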

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Assertion failure in REL9_5_STABLE

2017-04-28 Thread Pavan Deolasee
On Wed, Apr 19, 2017 at 11:19 AM, Andrew Gierth <and...@tao11.riddles.org.uk
> wrote:

> >>>>> "Pavan" == Pavan Deolasee <pavan.deola...@gmail.com> writes:
>
>  Pavan> I am attaching a patch that throws a similar ERROR during
>  Pavan> planning even for 9.5. AFAICS in presence of grouping sets, we
>  Pavan> always decide to use sort-based implementation for grouping, but
>  Pavan> do not check if the columns support ordering or not.
>
> Hmmm. This is obviously my fault somewhere... the patch looks good, I'll
> deal with it shortly.
>

JFR this got fixed via 7be3678a8cfb55dcfca90fa586485f835ab912a5

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Assertion failure in REL9_5_STABLE

2017-04-18 Thread Pavan Deolasee
On Wed, Apr 19, 2017 at 11:19 AM, Andrew Gierth <and...@tao11.riddles.org.uk
> wrote:

> >>>>> "Pavan" == Pavan Deolasee <pavan.deola...@gmail.com> writes:
>
>  Pavan> I am attaching a patch that throws a similar ERROR during
>  Pavan> planning even for 9.5. AFAICS in presence of grouping sets, we
>  Pavan> always decide to use sort-based implementation for grouping, but
>  Pavan> do not check if the columns support ordering or not.
>
> Hmmm. This is obviously my fault somewhere... the patch looks good, I'll
> deal with it shortly.
>
>
Thanks. It might be a good idea to include that test case from the master.
Please let me know if you would like me to send a new patch which includes
that.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Assertion failure in REL9_5_STABLE

2017-04-18 Thread Pavan Deolasee
Hi All,

I stumbled upon this test case from the master that triggers an assertion
failure (see stack trace at the end) on REL9_5_STABLE.

create temp table gstest4(id integer, v integer,
  unhashable_col bit(4), unsortable_col xid);
insert into gstest4
values (1,1,b'0000','1'), (2,2,b'0001','1'),
   (3,4,b'0010','2'), (4,8,b'0011','2'),
   (5,16,b'0000','2'), (6,32,b'0001','2'),
   (7,64,b'0010','1'), (8,128,b'0011','1');

-- mixed hashable/sortable cases
select unhashable_col, unsortable_col,
   grouping(unhashable_col, unsortable_col),
   count(*), sum(v)
  from gstest4 group by grouping sets ((unhashable_col),(unsortable_col))
 order by 3, 5;

I tested this with REL9_6_STABLE and it throws a more useful error
message, presumably because the code was refactored quite heavily for the
upper-planner changes.

ERROR:  could not implement GROUP BY
DETAIL:  Some of the datatypes only support hashing, while others only
support sorting.

I am attaching a patch that throws a similar ERROR during planning even
for 9.5. AFAICS, in the presence of grouping sets, we always decide to use
a sort-based implementation for grouping, but do not check whether the
columns support ordering.

I did not test it for other older branches because grouping sets were
introduced in 9.5.
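
The shape of the check would be roughly as follows (an illustrative
sketch, not the attached patch itself; placement within 9.5's
grouping_planner() is my assumption):

    /*
     * With grouping sets, 9.5 always chooses sorted grouping, so every
     * grouping column must have an ordering operator. Erroring out here
     * mirrors the 9.6 message instead of asserting later in
     * tuplesort_begin_heap().
     */
    ListCell   *lc;

    foreach(lc, parse->groupClause)
    {
        SortGroupClause *gc = (SortGroupClause *) lfirst(lc);

        if (!OidIsValid(gc->sortop))
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("could not implement GROUP BY"),
                     errdetail("Some of the datatypes only support hashing, "
                               "while others only support sorting.")));
    }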

Thanks,
Pavan


(lldb) bt
* thread #1: tid = 0x, 0x7fff9c1dfdda
libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
  * frame #0: 0x7fff9c1dfdda libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff9c2cb787 libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x7fff9c145420 libsystem_c.dylib`abort + 129
frame #3: 0x00010f61a3f0
postgres`ExceptionalCondition(conditionName="!(sortOperators[i] != 0)",
errorType="BadArgument", fileName="tuplesort.c", lineNumber=657) + 128 at
assert.c:54
frame #4: 0x00010f65a07d
postgres`tuplesort_begin_heap(tupDesc=0x7fb1528194d8, nkeys=1,
attNums=0x7fb15280e9e0, sortOperators=0x7fb15280ea00,
sortCollations=0x7fb15280ea20, nullsFirstFlags="", workMem=4096,
randomAccess='\0') + 509 at tuplesort.c:657
frame #5: 0x00010f2e871d
postgres`initialize_phase(aggstate=0x7fb152817828, newphase=0) + 477 at
nodeAgg.c:456
frame #6: 0x00010f2e73f0
postgres`ExecInitAgg(node=0x7fb1528062e8, estate=0x7fb152817440,
eflags=16) + 2656 at nodeAgg.c:2223
frame #7: 0x00010f2d18e1
postgres`ExecInitNode(node=0x7fb1528062e8, estate=0x7fb152817440,
eflags=16) + 865 at execProcnode.c:296
frame #8: 0x00010f301231
postgres`ExecInitSort(node=0x7fb152806400, estate=0x7fb152817440,
eflags=16) + 225 at nodeSort.c:202
frame #9: 0x00010f2d18a9
postgres`ExecInitNode(node=0x7fb152806400, estate=0x7fb152817440,
eflags=16) + 809 at execProcnode.c:286
frame #10: 0x00010f2ccf20
postgres`InitPlan(queryDesc=0x7fb152803c40, eflags=16) + 1520 at
execMain.c:957
frame #11: 0x00010f2cc81f
postgres`standard_ExecutorStart(queryDesc=0x7fb152803c40, eflags=16) +
591 at execMain.c:237
frame #12: 0x00010f2cc5be
postgres`ExecutorStart(queryDesc=0x7fb152803c40, eflags=0) + 62 at
execMain.c:139
frame #13: 0x00010f48b112
postgres`PortalStart(portal=0x7fb151836c40, params=0x,
eflags=0, snapshot=0x) + 722 at pquery.c:533
frame #14: 0x00010f4871c1
postgres`exec_simple_query(query_string="select unhashable_col,
unsortable_col,\n   grouping(unhashable_col, unsortable_col),\n
count(*), sum(v)\n  from gstest4 group by grouping sets
((unhashable_col),(unsortable_col))\n order by 3, 5;") + 945 at
postgres.c:1065
frame #15: 0x00010f486525 postgres`PostgresMain(argc=1,
argv=0x7fb151801f00, dbname="postgres", username="pavan") + 2245 at
postgres.c:4051
frame #16: 0x00010f3ef392
postgres`BackendRun(port=0x7fb151700540) + 674 at postmaster.c:4254
frame #17: 0x00010f3ee64d
postgres`BackendStartup(port=0x7fb151700540) + 365 at postmaster.c:3928
frame #18: 0x00010f3ed705 postgres`ServerLoop + 597 at
postmaster.c:1698
frame #19: 0x00010f3eb066 postgres`PostmasterMain(argc=3,
argv=0x7fb151403740) + 5414 at postmaster.c:1306
frame #20: 0x00010f32bddf postgres`main(argc=3,
argv=0x7fb151403740) + 751 at main.c:228
frame #21: 0x7fff9c0b1255 libdyld.dylib`start + 1
    frame #22: 0x7fff9c0b1255 libdyld.dylib`start + 1


-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pg95_assertion_fix.patch
Description: Binary data



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-18 Thread Pavan Deolasee
On Fri, Apr 14, 2017 at 9:21 PM, Jaime Casanova <jaime.casanova@2ndquadrant.
com> wrote:

>
>
> Hi Pavan,
>
> I run a test on current warm patchset, i used pgbench with a scale of
> 20 and a fillfactor of 90 and then start the pgbench run with 6
> clients in parallel i also run sqlsmith on it.
>
> And i got a core dump after sometime of those things running.
>
> The assertion that fails is:
>
> """
> LOG:  statement: UPDATE pgbench_tellers SET tbalance = tbalance + 3519
> WHERE tid = 34;
> TRAP: FailedAssertion("!(((bool) (((const void*)(&tup->t_ctid) !=
> ((void *)0)) && (((&tup->t_ctid)->ip_posid & ((((uint16) 1) << 13) -
> 1)) != 0)))", File: "../../../../src/include/access/htup_details.h",
> Line: 659)
> """
>

Hi Jaime,

Thanks for doing the tests and reporting the problem. Per our chat, the
assertion failure occurs only after crash recovery. I traced it down to
the point where we were failing to set the root line pointer correctly
during crash recovery. In fact, we were setting it, but only after the
local changes were copied to the on-disk image, so the change never made
it to storage.

Can you please test with the attached patch and confirm it works? I was
able to reproduce the exact same assertion on my end and the patch seems to
fix it. But an additional check won't harm.

I'll include the fix in the next set of patches.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


warm_crash_recovery_fix.patch
Description: Binary data



Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

2017-04-14 Thread Pavan Deolasee
On Sat, Apr 15, 2017 at 12:53 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:

>
> In the first statement executed after crash recovery, I sometimes get this
> error:
>
> PANIC:  XX000: could not access status of transaction 207580505
> DETAIL:  Could not read from file "pg_commit_ts/1EF0" at offset 131072:
> Success.
> LOCATION:  SlruReportIOError, slru.c:918
> STATEMENT:  create temporary table aldjf (x serial)
>
> Other examples:
>
> PANIC:  XX000: could not access status of transaction 3483853232
> DETAIL:  Could not read from file "pg_commit_ts/20742" at offset 237568:
> Success.
> LOCATION:  SlruReportIOError, slru.c:918
> STATEMENT:  create temporary table aldjf (x serial)
>
> PANIC:  XX000: could not access status of transaction 802552883
> DETAIL:  Could not read from file "pg_commit_ts/779E" at offset 114688:
> Success.
> LOCATION:  SlruReportIOError, slru.c:918
> STATEMENT:  create temporary table aldjf (x serial)
>
> Based on the errno, I'm assuming the read was successful but returned the
> wrong number of bytes (which was zero in the case I saw after changing the
> code to log short reads).
>
> It then goes through recovery again and the problem does not immediately
> re-occur if you attempt to connect again.  I don't know why the file size
> would have changed between attempts.
>
>
> The problem bisects to the commit:
>
> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
> Author: Simon Riggs <si...@2ndquadrant.com>
> Date:   Tue Apr 4 15:56:56 2017 -0400
>
> Speedup 2PC recovery by skipping two phase state files in normal path
>
>
> It is not obvious to me how that is relevant.  My test doesn't use
> prepared transactions (and leaves the guc at zero), and this commit doesn't
> touch the slru.c.
>
> I'm attaching the test harness.  There is a patch which injects the
> crash-faults and also allows xid fast-forward, a perl script that runs
> until crash and assesses the consistency of the post-crash results, and a
> shell script which sets up the database and then calls the perl script in a
> loop.  On 8 CPU machine, it takes about an hour for the PANIC to occur.
>
> The attached script bails out once it sees the PANIC (count.pl line 158)
> if it didn't do that then it will proceed to connect again and retry, and
> works fine.  No apparent permanent data corruption.
>
> Any clues on how to investigate this further?
>

Since all those offsets fall on a page boundary, my guess is that we're
somehow failing to handle a new page correctly.

Looking at the patch itself, my feeling is that the following code
in src/backend/access/transam/twophase.c might be causing the problem.

    /* update nextXid if needed */
    if (TransactionIdFollowsOrEquals(maxsubxid,
                                     ShmemVariableCache->nextXid))
    {
        LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
        ShmemVariableCache->nextXid = maxsubxid;
        TransactionIdAdvance(ShmemVariableCache->nextXid);
        LWLockRelease(XidGenLock);
    }

The function PrescanPreparedTransactions() gets called at the start of
redo recovery, and this specific block gets exercised irrespective of
whether there are any prepared transactions or not. What I find
particularly wrong here is that we initialise maxsubxid to the current
value of ShmemVariableCache->nextXid when the function enters, but this
block then increments ShmemVariableCache->nextXid again even when there
are no prepared transactions in the system.

I wonder if we should do as in the attached patch.
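
Roughly, the idea would be to remember whether any subtransaction XID was
actually seen before touching nextXid (an illustrative sketch; the
attached patch may differ in detail):

    TransactionId maxsubxid = ShmemVariableCache->nextXid;
    bool          found_subxid = false;

    /* ... for each subxid found in a two-phase state file ... */
    if (TransactionIdFollowsOrEquals(subxid, maxsubxid))
    {
        maxsubxid = subxid;
        found_subxid = true;
    }

    /* update nextXid only if a real subxid requires it */
    if (found_subxid &&
        TransactionIdFollowsOrEquals(maxsubxid, ShmemVariableCache->nextXid))
    {
        LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
        ShmemVariableCache->nextXid = maxsubxid;
        TransactionIdAdvance(ShmemVariableCache->nextXid);
        LWLockRelease(XidGenLock);
    }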

It's not entirely clear to me why only CommitTS fails and not the other
SLRUs. One possible reason could be that CommitTS is started before this
function gets called, whereas CLOG and SUBTRANS get started after the
function returns.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


fix_commit_ts_crash.patch
Description: Binary data



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-12 Thread Pavan Deolasee
On Wed, Apr 12, 2017 at 10:42 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Apr 11, 2017 at 1:20 PM, Pavan Deolasee
>


> > 5. Added code to set a CLEAR pointer to a WARM pointer when we know
> > that the entire chain is WARM. This should address the workload Dilip
> > ran and found regression with (I don't think he got a chance to confirm
> > that).
>
> Which is clearly a thing that should happen before commit, and really,
> you ought to be leading the effort to confirm that, not him.  It's
> good for him to verify that your fix worked, but you should test it
> first.
>

Not sure why you think I did not do the tests. I did, and reported that it
helps reduce the regression. See the last para here:
https://www.postgresql.org/message-id/CABOikdOTstHK2y0rDk%2BY3Wx9HRe%2BbZtj3zuYGU%3DVngneiHo5KQ%40mail.gmail.com

I understand it might have got lost in the conversation, and I possibly
did a poor job of explaining it. From my perspective, I did not want to
say that everything is hunky-dory based on my own tests because 1. I
probably do not have access to the same kind of machine Dilip has, and 2.
it's better to get it confirmed by the person who initially reported it.
Again, I fully respect that he would be busy with other things and I do
not expect him or anyone else to test/review my patch on priority. The
only point I am trying to make is that I did my own tests and made sure
that it helps.

(Having said that, I am not sure if changing the pointer state from CLEAR
to WARM is indeed a good change. Having thought more about it and after
looking at the page-split code, I now think this might just confuse the
WARM cleanup code and make the algorithm that much harder to prove.)


> > 6. Enhanced stats collector to collect information about candidate WARM
> > chains and added mechanism to control WARM cleanup at the heap as well as
> > index level, based on configurable parameters. This gives user better
> > control over the additional work that is required for WARM cleanup.
>
> I haven't seen previous discussion of this; therefore I doubt whether
> we have agreement on these parameters.
>

Sure. I will bring these up in a more structured manner for everyone to
comment.


>
> > 7. Added table level option to disable WARM if nothing else works.
>
> -1 from me.
>

Ok. It's kind of a last resort for me too. But at some point we might want
to make that call if we find an important use case that regresses because
of WARM and we see no way to fix it, or at least not without a whole lot
of complexity.


>
>
> > I may have missed something, but there is no intention to ignore known
> > regressions/reviews. Of course, I don't think that every regression
> > will be solvable; for example, if you run a CPU-bound workload set up
> > in a way that repeatedly exercises the area where WARM does additional
> > work without providing any benefit, maybe you can still find a
> > regression. I am willing to fix them as long as they are fixable and we
> > are comfortable with the additional code complexity. IMHO certain
> > trade-offs are good, but I understand that not everybody will agree
> > with my views and that's ok.
>
> The point here is that we can't make intelligent decisions about
> whether to commit this feature unless we know which situations get
> better and which get worse and by how much.


Sure.


>   I don't accept as a
> general principle the idea that CPU-bound workloads don't matter.
> Obviously, I/O-bound workloads matter too, but we can't throw
> CPU-bound workloads under the bus.


Yeah, definitely not suggesting that.


>   Now, avoiding index bloat does
> also save CPU, so it is easy to imagine that WARM could come out ahead
> even if each update consumes slightly more CPU when actually updating,
> so we might not actually regress.  If we do, I guess I'd want to know
> why.


Well, the kinds of tests we did to look for regressions were worst-case
scenarios. For example, in the test where we found 10-15% regression, we
used a wide index (so the recheck cost is high), WARM-updated all rows,
disabled autovacuum (so no chain conversion), and then repeatedly selected
the rows from the index, thus incurring the recheck overhead and, in fact,
measuring only that.

When I measured WARM on tables with a small scale factor, so that
everything fits in memory, I found a modest 20% improvement in tps. So
you're right, WARM might also help in-memory workloads. But that will show
up only if we measure both UPDATEs and SELECTs. If we measure only
SELECTs, and that too in a state where we are paying the full price of
having done a WARM update, obviously we will only see the regression, if
any. Not saying we should ignore that. We should in fact measure all
possible loads and try to fix as many as we can, especially if they
resemble a real-world workload.

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-12 Thread Pavan Deolasee
On Thu, Apr 13, 2017 at 2:04 AM, Peter Geoghegan <p...@bowt.ie> wrote:

> On Wed, Apr 12, 2017 at 10:12 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
> >> I may have missed something, but there is no intention to ignore known
> >> regressions/reviews. Of course, I don't think that every regression
> >> will be solvable; for example, if you run a CPU-bound workload set up
> >> in a way that repeatedly exercises the area where WARM does additional
> >> work without providing any benefit, maybe you can still find a
> >> regression. I am willing to fix them as long as they are fixable and
> >> we are comfortable with the additional code complexity. IMHO certain
> >> trade-offs are good, but I understand that not everybody will agree
> >> with my views and that's ok.
> >
> > The point here is that we can't make intelligent decisions about
> > whether to commit this feature unless we know which situations get
> > better and which get worse and by how much.  I don't accept as a
> > general principle the idea that CPU-bound workloads don't matter.
> > Obviously, I/O-bound workloads matter too, but we can't throw
> > CPU-bound workloads under the bus.  Now, avoiding index bloat does
> > also save CPU, so it is easy to imagine that WARM could come out ahead
> > even if each update consumes slightly more CPU when actually updating,
> > so we might not actually regress.  If we do, I guess I'd want to know
> > why.
>
> I myself wonder if this CPU overhead is at all related to LP_DEAD
> recycling during page splits.


With respect to the tests that I, Dilip, and others did for WARM, I think
we were kind of exercising the worst-case scenario. In one case, we
created a table with 40% fill factor, created an index on a large text
column, WARM-updated all rows in the table, turned off autovacuum so that
chain conversion does not take place, and then repeatedly ran a select
query on those rows using the index which did not receive the WARM
insert.

IOW, we were only measuring the overhead of doing a recheck by
constructing an index tuple from the heap tuple and then comparing it
against the existing index tuple. And we did find regression, which is not
entirely surprising because that code path obviously does extra work when
it needs to do a recheck. And we were only measuring that overhead without
taking into account the benefits of WARM to the system in general. I think
the counter-argument is that such a workload may exist somewhere and would
be regressed.

> I have my suspicions that the recycling
> has some relationship to locality, which leads me to want to
> investigate how Claudio Freire's patch to consistently treat heap TID
> as part of the B-Tree sort order could help, both in general, and for
> WARM.
>

It could be, especially if we redesign recheck solely based on the index
pointer state and the heap tuple state. That could be more performant for
selects and could also be more robust, but it will require index inserts
to get hold of the old index pointer (based on the root TID), compare it
against the new index tuple, and either skip the insert (if everything
matches) or set a PREWARM flag on the old pointer and insert the new tuple
with a POSTWARM flag.

Searching for the old index pointer will be a non-starter for non-unique
indexes unless they are also sorted by TID, something that Claudio's patch
does. What I am not sure about is whether that patch on its own can stand
the performance implications, because it increases the index tuple width
(and probably the index maintenance cost too).

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-11 Thread Pavan Deolasee
On Wed, Apr 12, 2017 at 9:23 AM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Tue, Apr 11, 2017 at 10:50 PM, Pavan Deolasee
>
> >
> > And I fixed them as quickly as humanly possible.
> >
>
> Yes, you have responded to them quickly, but I didn't get a chance to
> re-verify all of those.  However, I think the main point Robert wants
> to say is that somebody needs to dig the complete patch to see if
> there is any kind of problems with it.
>

There are no two ways about that. I don't even claim that more problems
won't be found during an in-depth review. I was only responding to his
view that I did not do much to address the regressions reported during the
review/tests.


>
> > 5. Added code to set a CLEAR pointer to a WARM pointer when we know
> > that the entire chain is WARM. This should address the workload Dilip
> > ran and found regression with (I don't think he got a chance to confirm
> > that).
> >
>
> Have you by any chance tried to reproduce it at your end?


I did reproduce it and verified that the new technique helps the case [1]
(see the last para). I did not go to extra lengths to check whether there
are more cases which can still cause regression, such as when recheck is
applied only once to each tuple (so the new technique does not yield any
benefit), and whether that still causes regression and by how much.
However, I ran a pure pgbench workload (only HOT updates) with a smallish
scale factor, so that everything fits in memory, and did not find any
regression.

Having said that, it's my view (others need not agree with it) that we
need to distinguish between CPU and IO load, since WARM is designed to
address IO problems and not so much CPU problems. We also need to see
things in totality, and probably measure both updates and selects, if we
are going to WARM-update all tuples once and read them once. That doesn't
mean we shouldn't perform more tests, and I am more than willing to fix
any regression we find in even a remotely real-world use case.

Thanks,
Pavan

[1]
https://www.postgresql.org/message-id/CABOikdOTstHK2y0rDk%2BY3Wx9HRe%2BbZtj3zuYGU%3DVngneiHo5KQ%40mail.gmail.com

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-11 Thread Pavan Deolasee
On Tue, Apr 11, 2017 at 7:10 PM, Robert Haas <robertmh...@gmail.com> wrote:

>
>
> Yes, and as Andres says, you don't help with those, and then you're
> upset when your own patch doesn't get attention.


I am not upset; I was obviously a bit disappointed, which I think is a
very natural emotion after spending weeks on it. I am not blaming any one
individual (excluding myself), nor the community at large, for the
outcome. And I've moved on. I know everyone is busy getting the release
ready and I see no point discussing this endlessly. We have enough on our
plates for the next few weeks.


>
>   Amit and others who have started to
> dig into this patch a little bit found real problems pretty quickly
> when they started digging.


And I fixed them as quickly as humanly possible.


>   Those problems should be addressed, and
> review should continue (from whatever source) until no more problems
> can be found.


Absolutely.


>  The patch may
> or may not have any data-corrupting bugs, but regressions have been
> found and not addressed.


I don't know why you say that regressions are not addressed. Here are a few
things I did to address the regressions/reviews/concerns, apart from fixing
all the bugs discovered, but please let me know if there are things I've
not addressed.

1. Improved the interesting-attrs patch that Alvaro wrote to address the
regression discovered in fetching more heap attributes. The patch that got
committed in fact improved certain synthetic workloads over the then
master.
2. Based on Petr and your feedback, disabled WARM on toasted attributes to
reduce overhead of fetching/decompressing the attributes.
3. Added code to avoid doing second index scan when the index does not
contain any WARM pointers. This should address the situation Amit brought
up where only one of the indexes receive WARM inserts.
4. Added code to kill wrong index pointers to do online cleanup.
5. Added code to set a CLEAR pointer to a WARM pointer when we know that
the entire chain is WARM. This should address the workload Dilip ran and
found regression with (I don't think he got a chance to confirm that).
6. Enhanced stats collector to collect information about candidate WARM
chains and added mechanism to control WARM cleanup at the heap as well as
index level, based on configurable parameters. This gives user better
control over the additional work that is required for WARM cleanup.
7. Added table level option to disable WARM if nothing else works.
8. Added mechanism to disable WARM when more than 50% indexes are being
updated. I ran some benchmarks with different percentage of indexes getting
updated and thought this is a good threshold.

I may have missed something, but there is no intention to ignore known
regressions/reviews. Of course, I don't think that every regression will
be solvable; for example, if you run a CPU-bound workload set up in a way
that repeatedly exercises the area where WARM does additional work without
providing any benefit, maybe you can still find a regression. I am willing
to fix them as long as they are fixable and we are comfortable with the
additional code complexity. IMHO certain trade-offs are good, but I
understand that not everybody will agree with my views and that's ok.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-08 Thread Pavan Deolasee
On Wed, Apr 5, 2017 at 11:57 PM, Andres Freund <and...@anarazel.de> wrote:

> On 2017-04-05 09:36:47 -0400, Robert Haas wrote:
> > By the way, the "Converting WARM chains back to HOT chains" section of
> > README.WARM seems to be out of date.  Any chance you could update that
> > to reflect the current state and thinking of the patch?
>
> I propose we move this patch to the next CF.  That shouldn't prevent you
> working on it, although focusing on review of patches that still might
> make it wouldn't hurt either.
>
>
Thank you all for the reviews, feedback, tests, and criticism. And
apologies for pushing it till the last minute even though it was clear to
me quite some time back that the patch was not going to make it. But if
I'd given up, it would never have received whatever little attention it
got. The only thing that disappoints me is that the patch was held back on
no strong technical grounds - at least none that were clear to me. There
were concerns about on-disk changes etc., but most of the on-disk changes
have been known for 7 months now. It reminds me of HOT's development, when
it did not receive adequate feedback for many months, probably for very
similar reasons - a complex patch that changes the on-disk format and is
risky, even though the performance gains were quite substantial. I was
much more hopeful this time because we have many more experts now than we
did then, but we probably also have correspondingly more complex patches
to review/commit.

I understand that we would like this patch to go in very early in the
development cycle. So, as Alvaro mentioned elsewhere, we will continue to
work on it so that we can get it in as soon as the v11 tree opens. We
shall soon submit a revised version, with a list of the critical things,
so that we can discuss them here and get some useful feedback. I hope
everyone understands that a feature of this kind won't happen without
on-disk format changes. So, to be able to address any concerns, we will
need specific feedback and workable suggestions, if any.

Finally, my apologies for not spending enough time reviewing other
patches. I know it's critical, and I'll try to improve on that.
Congratulations to all whose work got accepted, and many thanks to all
reviewers/committers/CF managers. I know how difficult and thankless that
work is.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-06 Thread Pavan Deolasee
On Thu, Apr 6, 2017 at 12:20 AM, Peter Geoghegan <p...@bowt.ie> wrote:

> On Wed, Apr 5, 2017 at 11:27 AM, Andres Freund <and...@anarazel.de> wrote:
> > I propose we move this patch to the next CF.
>
> I agree. I think it's too late to be working out fine details around
> TOAST like this. This is a patch that touches the storage format in a
> fairly fundamental way.
>
> The idea of turning WARM on or off reminds me a little bit of the way
> it was at one time suggested that HOT not be used against catalog
> tables, a position that Tom pushed against.


I agree. I am grateful that Tom put his foot down and helped me find
answers to all the hard problems, including catalog tables and CREATE
INDEX CONCURRENTLY. So I was very clear in my mind from the very beginning
that WARM must support all these things too. Obviously it still doesn't
support everything, like other index methods and expression indexes, but
IMHO that's a much smaller problem. Also, making sure that WARM works on
system tables helped me find corner bugs which would otherwise have
slipped through regular regression testing.



> I'm not saying that it's
> necessarily a bad idea, but we should exhaust alternatives, and have a
> clear rationale for it.
>

One reason why it's probably a good idea is that we know WARM will not be
effective for all use cases and might actually cause a performance
regression for some of them. Even worse, as Robert fears, it might cause
data-loss issues. TBH I haven't yet seen any concrete example where it
breaks so badly that it causes data loss, but that may be because the
patch still hasn't received enough eyeballs or outside tests. Having a
table-level option would allow us to improve things incrementally instead
of making the initial patch so large that reviewing it is a complete
nightmare. Maybe it's already a nightmare.

It's not as if HOT did not cause regressions for some specific use cases.
But I think the general benefit was so strong that we never invested time
in finding and tuning for those specific cases, thus avoiding some
additional code complexity. WARM's benefits are probably not the same as
HOT's, or our standards may have changed, or we now have the resources to
do much more elaborate tests, which were missing 10 years back. But now
that we are aware of some regressions, the choice is between spending a
considerable amount of time trying to handle every case versus doing it
incrementally and starting to deliver to the majority of users, while
keeping the patch at a manageable size.

Even if we were to provide a table-level option, my preference would be to
keep it ON by default.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-05 Thread Pavan Deolasee
On Thu, Apr 6, 2017 at 1:06 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Wed, Apr 5, 2017 at 2:32 PM, Pavan Deolasee <pavan.deola...@gmail.com>
> wrote:
>
> > The other way is to pass old tuple values along with the new tuple
> > values to amwarminsert, build index tuples and then do a comparison.
> > For duplicate index tuples, skip WARM inserts.
>
> This is more what I was thinking.  But maybe one of the other ideas
> you wrote here is better; not sure.
>
>
Ok. I think I suggested this as one of the ideas upthread, to support hash
indexes for example. This might be a good safety net, but AFAIC what we
have today should work, since we pretty much construct index tuples in a
consistent way before doing a comparison.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-04 Thread Pavan Deolasee
On Wed, Apr 5, 2017 at 8:42 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Apr 4, 2017 at 10:21 PM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> > On Thu, Mar 30, 2017 at 7:55 PM, Robert Haas <robertmh...@gmail.com>
> wrote:
> >>  but
> >> try to access the TOAST table would be fatal; that probably would have
> >> deadlock hazards among other problems.
> >
> > Hmm. I think you're right. We could make a copy of the heap tuple, drop
> > the lock and then access TOAST to handle that. Would that work?
>
> Yeah, but it might suck.  :-)


Well, better than causing a deadlock ;-)

Let's see if we want to go down the path of blocking WARM when tuples have
toasted attributes. I submitted a patch yesterday, but having slept on it,
I think I made mistakes there. It might not be enough to look at the
caller-supplied new tuple, because that may not have any toasted values,
while the final tuple that gets written to the heap may be toasted. We
could look at the new tuple's attributes to see whether any indexed
attributes are toasted, but that might suck as well. Or we can simply
block WARM if the old or the new tuple has external attributes, i.e.
HeapTupleHasExternal() returns true. That could be overly restrictive,
because we would block WARM on such updates irrespective of whether an
indexed attribute or just some other attribute is toasted. Maybe that's
not a problem.
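
As a minimal sketch of that last, bluntest option (variable names are
illustrative, in the style of heap_update(); use_warm_update is
hypothetical):

    /*
     * Sketch: refuse WARM whenever either tuple version carries external
     * TOAST pointers. Deliberately blunt -- it triggers even when only a
     * non-indexed attribute is toasted.
     */
    if (HeapTupleHasExternal(&oldtup) || HeapTupleHasExternal(newtup))
        use_warm_update = false;   /* fall back to a regular update */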

We will also need to handle the case where some older tuple in the chain
has a toasted value and that tuple is presented to recheck because of a
subsequent WARM update whose tuples did not have any toasted values (and
which was hence allowed). I think we can handle that case fairly easily,
but it's not done in the code yet.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-04 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 7:55 PM, Robert Haas <robertmh...@gmail.com> wrote:

>  but
> try to access the TOAST table would be fatal; that probably would have
> deadlock hazards among other problems.


Hmm. I think you're right. We could make a copy of the heap tuple, drop the
lock and then access TOAST to handle that. Would that work?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-03 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 11:17 PM, Bruce Momjian <br...@momjian.us> wrote:

> On Tue, Mar 21, 2017 at 04:04:58PM -0400, Bruce Momjian wrote:
> > On Tue, Mar 21, 2017 at 04:56:16PM -0300, Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > > > On Tue, Mar 21, 2017 at 04:43:58PM -0300, Alvaro Herrera wrote:
> > > > > Bruce Momjian wrote:
> > > > >
> > > > > > I don't think it makes sense to try and save bits and add
> > > > > > complexity when we have no idea if we will ever use them,
> > > > >
> > > > > If we find ourselves in dire need of additional bits, there is a
> > > > > known mechanism to get back 2 bits from old-style VACUUM FULL.  I
> > > > > assume that the reason nobody has bothered to write the code for
> > > > > that is that there's no *that* much interest.
> > > >
> > > > We have no way of tracking if users still have pages that used the
> > > > bits via pg_upgrade before they were removed.
> > >
> > > Yes, that's exactly the code that needs to be written.
> >
> > Yes, but once it is written it will take years before those bits can be
> > used on most installations.
>
> Actually, the 2 bits from old-style VACUUM FULL could be reused if one
> of the WARM bits would be set when it is checked.  The WARM bits will
> all be zero on pre-9.0.  The check would have to be checking the
> old-style VACUUM FULL bit and checking that a WARM bit is set.
>
>
We're already doing that in the submitted patch.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-03 Thread Pavan Deolasee
On Fri, Mar 31, 2017 at 12:31 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:

> On Thu, Mar 30, 2017 at 5:27 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> > I am not sure if we can consider it as completely synthetic because we
> > might see some similar cases for json datatypes.  Can we once try to
> > see the impact when the same test runs from multiple clients?  For
> > your information, I am also trying to set up some tests along with one
> > of my colleagues and we will report the results once the tests are
> > complete.
> We have done some testing and below are the test details and results.
>
> Test:
> I have derived this test from the above test given by Pavan [1], except
> for the below differences.
>
> - I have reduced the fill factor to 40 to ensure that there is scope in
> the page to store multiple WARM chains.
> - WARM updated all the tuples.
> - Executed a large select to force a lot of tuple rechecks within a
> single query.
> - Smaller tuple size (the aid field is around ~100 bytes), just to ensure
> tuples have sufficient space on a page to get WARM updated.
>
> Results:
> ---
> * I can see more than 15% regression in this case. This regression is
> repeatable.
> * If I increase the fill factor to 90 then the regression reduces to 7%;
> maybe only fewer tuples are getting WARM updated and others are not,
> because there is no space left on the page after a few WARM updates.
>

Thanks for doing the tests. The tests show us that if the table gets
filled up with WARM chains, and they are not cleaned up, and the table is
subjected to a read-only workload, we will see regression. Obviously, the
test is completely CPU bound, something WARM is not meant to address. I am
not yet certain that recheck is causing the problem: yesterday I ran the
test where I was seeing regression with recheck completely turned off, and
still saw regression. So there is something else going on with this kind
of workload. Will check.

Having said that, I think there are some other ways to fix some of the
common problems with repeated rechecks. One thing we can do is rely on the
index pointer flags to decide whether a recheck is necessary. For example,
a WARM pointer to a WARM tuple does not require a recheck. Similarly, a
CLEAR pointer to a CLEAR tuple does not require a recheck. A WARM pointer
to a CLEAR tuple can be discarded immediately, because the only situation
where it can occur is in the case of an aborted WARM update. The only
troublesome situation is a CLEAR pointer to a WARM tuple; that entirely
depends on whether the index had received a WARM insert or not. What we
can do, though, is that if recheck succeeds for the first time and the
chain has only WARM tuples, we set the WARM bit on the index pointer. We
can use the same hint mechanism as used for marking index pointers dead,
to minimise the overhead.

Obviously this will only handle the case where the same tuple is rechecked
often. But if a tuple is rechecked only once, then maybe other overheads
will kick in, reducing the relative regression significantly.
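
The decision table described above could be captured in something like
this (purely illustrative; the flag and function names are not from the
patch):

    /* Hypothetical two-state flag carried by index pointers and tuples. */
    typedef enum WarmState
    {
        STATE_CLEAR,
        STATE_WARM
    } WarmState;

    /*
     * Sketch of the recheck decision: only a CLEAR pointer reaching a
     * WARM tuple needs a recheck; a WARM pointer to a CLEAR tuple can
     * only be the leftover of an aborted WARM update and is discarded.
     */
    static inline bool
    warm_needs_recheck(WarmState ptr_state, WarmState tup_state,
                       bool *discard)
    {
        *discard = (ptr_state == STATE_WARM && tup_state == STATE_CLEAR);
        return (ptr_state == STATE_CLEAR && tup_state == STATE_WARM);
    }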

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Pavan Deolasee
On Fri, Mar 31, 2017 at 11:16 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Mar 31, 2017 at 10:24 AM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> > Having worked on it for some time now, I can say that WARM uses pretty
> > much the same infrastructure that HOT uses for cleaning up/pruning
> > tuples from the heap. So the risk of having a bug which can eat your
> > data from the heap is very low. Sure, it might mess up indexes, return
> > duplicate keys, or not return a row when it should have. Not saying
> > those are not bad bugs, but they are probably much less severe than
> > someone removing live rows from the heap.
>
> Yes, that's true.  If there's nothing wrong with the way pruning
> works, then any other problem can be fixed by reindexing, I suppose.
>
>
Yeah, I think so.


> I'm not generally a huge fan of on-off switches for things like this,
> but I know Simon likes them.  I think the question is how much they
> really insulate us from bugs.  For the hash index patch, for example,
> the only way to really get insulation from bugs added in this release
> would be to ship both the old and the new code in separate index AMs
> (hash, hash2).  The code has been restructured so much in the process
> of doing all of this that any other form of on-off switch would be
> pretty hit-or-miss whether it actually provided any protection.
>
> Now, I am less sure about this case, but my guess is that you can't
> really have this be something that can be flipped on and off for a
> table.  Once a table has any WARM updates in it, the code that knows
> how to cope with that has to be enabled, and it'll work as well or
> poorly as it does.


That's correct. Once enabled, we will need to handle the case of two index
pointers pointing to the same root. The only way to get rid of that is
probably to do a complete rewrite/reindex, I suppose. But I was mostly
talking about an immutable flag at table creation time, as rightly
guessed.


>   Now, I understand you to be suggesting a flag at
> table-creation time that would, maybe, be immutable after that, but
> even then - are we going to run completely unmodified 9.6 code for
> tables where that's not enabled, and only go through any of the WARM
> logic when it is enabled?  Doesn't sound likely.  The commits already
> made from this patch series certainly affect everybody, and I can't
> see us adding switches that bypass
> ce96ce60ca2293f75f36c3661e4657a3c79ffd61 for example.


I don't think I am going to claim that either. But probably only 5% of the
new code would then be involved, which is a lot less and much more
manageable. Having said that, I think if we do this at all, we should only
do it based on our experiences in the beta cycle, as a last resort. Based
on my own experience during HOT development, long-running pgbench tests
with several concurrent clients, subjected to multiple AV cycles and
periodic consistency checks, usually bring up issues related to heap
corruption. So my confidence level is relatively high for that part of the
code. That's not to suggest that there can't be any bugs.

Obviously there are then other things, such as regressions for some
workloads or the additional work required by vacuum, etc. I think we
should address them, and I'm fairly certain we can. It may not happen
immediately, but if we provide the right knobs, maybe those who are
affected can fall back to the old behaviour, or not use the new code at
all, while we improve things for them. Some of these things I could have
already implemented, but without a clear understanding of whether the
feature will get in or not, it's hard to keep putting infinite effort into
the patch. All non-committers go through that dilemma all the time, I'm
sure.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Pavan Deolasee
On Fri, Mar 31, 2017 at 6:47 PM, Robert Haas <robertmh...@gmail.com> wrote:
>
>
> 2. WARM is a non-optional feature which touches the on-disk format.
> There is nothing more dangerous than that.  If hash indexes have bugs,
> people can avoid those bugs by not using them; there are good reasons
> to suppose that hash indexes have very few existing users.  The
> expression evaluation changes, IMHO, are much more dangerous because
> everyone will be exposed to them, but they will not likely corrupt
> your data because they don't touch the on-disk format.  WARM is even a
> little more dangerous than that; everyone is exposed to those bugs,
> and in the worst case they could eat your data.
>
>
Having worked on it for some time now, I can say that WARM uses pretty
much the same infrastructure that HOT uses for cleaning up/pruning tuples
from the heap. So the risk of having a bug which can eat your data from
the heap is very low. Sure, it might mess up indexes, return duplicate
keys, or not return a row when it should have. Not saying those are not
bad bugs, but they are probably much less severe than someone removing
live rows from the heap.

And we can make it a table-level property, keep it off by default, turn it
off on system tables in this release, and change the defaults only when we
get more confidence, assuming people use it by explicitly turning it on.
Now, maybe that's not the right approach, and keeping it off by default
will mean it receives much less testing than we would like. So we could
keep it on during the beta cycle and then take a call. I went to great
lengths to make it work on system tables because, during HOT development,
Tom told me that it had better work for everything or it doesn't work at
all. WARM does work for system tables and has no known bugs, but if we
don't want to risk system tables, we might want to turn it off there (just
prior to release, maybe).

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 7:27 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Mar 30, 2017 at 6:37 AM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> > I think we can fix that by comparing compressed values.  I know you had
> > raised concerns, but Robert confirmed that (IIUC) it's not a problem
> today.
>
> I'm not sure that's an entirely fair interpretation of what I said.
> My point was that, while it may not be broken today, it might not be a
> good idea to rely for correctness on it always being true.
>
>
I take that point. We have a choice of fixing it today or whenever we add
support for multiple compression techniques. We don't even know what that
will look like and whether we will be able to look at compressed data and
tell whether two values were compressed in exactly the same way.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 5:27 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

>
>
> How have you verified that?  Have you checked that in
> heap_prepare_insert it has called toast_insert_or_update() and then
> returned a tuple different from what the input tup is?  Basically, I
> am easily able to see it and even the reason why the heap and index
> tuples will be different.  Let me try to explain,
> toast_insert_or_update returns a new tuple which contains compressed
> data and this tuple is inserted in heap where as slot still refers to
> original tuple (uncompressed one) which is passed to heap_insert.
> Now, ExecInsertIndexTuples and the calls under it like FormIndexDatum
> will refer to the tuple in slot which is uncompressed and form the
> values[] using uncompressed value.


Ah, yes. You're right. Not sure why I saw things differently. That doesn't
change anything though, because during recheck we'll get the compressed
value and not do anything with it. In the index we already have the
compressed value, so we can compare them. Even if we decide to decompress
everything and do the comparison, that should be possible. So I don't see a
problem as far as correctness goes.


>
> So IIUC, in above test during initialization you have one WARM update
> and then during actual test all are HOT updates, won't in such a case
> the WARM chain will be converted to HOT by vacuum and then all updates
> from thereon will be HOT and probably no rechecks?
>

There is no autovacuum; just 1 tuple out of 100 is being HOT updated.
Confirmed by looking at pg_stat_user_tables. I also made sure that the
tuple doesn't get non-HOT updated in between, which would break the WARM
chain.


>
>
> >
> > I then also repeated the tests, but this time using compressible values.
> The
> > regression in this case is much higher, may be 15% or more.
> >
>
> Sounds on higher side.
>
>
Yes, definitely. If we can't reduce that, we might want to provide a
table-level option to explicitly turn WARM off on such tables.
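
For illustration, something along these lines (hypothetical syntax; no such
reloption exists in the patch as posted):

-- purely illustrative: per-table switch to disable WARM updates
ALTER TABLE pgbench_accounts SET (enable_warm = off);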


> IIUC, by the time you are comparing tuple attrs to check for modified
> columns, you don't have the compressed values for new tuple.
>
>
I think it depends. If the value is not being modified, then we will get
both values as compressed. At least I confirmed with your example and
running an update which only changes c1. Don't know if that holds for all
cases.


> >  I know you had
> > raised concerns, but Robert confirmed that (IIUC) it's not a problem
> today.
> >
>
> Yeah, but I am not sure if we can take Robert's statement as some sort
> of endorsement for what the patch does.
>
>
Sure.


> > We will figure out how to deal with it if we ever add support for
> different
> > compression algorithms or compression levels. And I also think this is
> kinda
> > synthetic use case and the fact that there is not much regression with
> > indexes as large as 2K bytes seems quite comforting to me.
> >
>
> I am not sure if we can consider it as completely synthetic because we
> might see some similar cases for json datatypes.  Can we once try to
> see the impact when the same test runs from multiple clients?


Ok. It might become hard to control HOT behaviour though, or we will need
to do a mix of WARM/HOT updates. I'll see if this is something easily
doable by setting a high fillfactor etc.


>   For
> your information, I am also trying to setup some tests along with one
> of my colleague and we will report the results once the tests are
> complete.
>
>
That'll be extremely helpful, especially if it's something close to a
real-world scenario. Thanks for doing that.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Pavan Deolasee
On Wed, Mar 29, 2017 at 4:42 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Wed, Mar 29, 2017 at 1:10 PM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> >
> > On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila <amit.kapil...@gmail.com>
> > wrote:
> >>
> >> On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila <amit.kapil...@gmail.com>
> >> wrote:
> >
> > Then during recheck, we pass already compressed values to
> > index_form_tuple(). But my point is, the following code will ensure that
> we
> > don't compress it again. My reading is that the first check for
> > !VARATT_IS_EXTENDED will return false if the value is already compressed.
> >
>
> You are right.  I was confused with previous check of VARATT_IS_EXTERNAL.
>
>
Ok, thanks.


> >
> > TBH I couldn't find why the original index insertion code will always
> supply
> > uncompressed values.
> >
>
> Just try by inserting large value of text column ('aa.bbb')
> upto 2.5K.  Then have a breakpoint in heap_prepare_insert and
> index_form_tuple, and debug both the functions, you can find out that
> even though we compress during insertion in heap, the index will
> compress the original value again.
>
>
Ok, tried that. AFAICS index_form_tuple gets compressed values.


>
> Yeah probably you are right, but I am not sure if it is good idea to
> compare compressed values.
>
>
Again, I don't see a problem there.


> I think with this new changes in btrecheck, it would appear to be much
> costlier as compare to what you have few versions back.  I am afraid
> that it can impact performance for cases where there are few WARM
> updates in chain and many HOT updates as it will run recheck for all
> such updates.


My feeling is that the recheck could be costly for very fat indexes, but
not doing WARM could be costly too for such indexes. We can possibly
construct a worst case where
1. set up a table with a fat index.
2. do a WARM update to a tuple
3. then do several HOT updates to the same tuple
4. query the row via the fat index.


Initialisation:

-- Adjust parameters to force index scans
-- enable_seqscan to false
-- seq_page_cost = 1

DROP TABLE IF EXISTS pgbench_accounts;

CREATE TABLE pgbench_accounts (
aid text,
bid bigint,
abalance bigint,
filler1 text DEFAULT md5(random()::text),
filler2 text DEFAULT md5(random()::text),
filler3 text DEFAULT md5(random()::text),
filler4 text DEFAULT md5(random()::text),
filler5 text DEFAULT md5(random()::text),
filler6 text DEFAULT md5(random()::text),
filler7 text DEFAULT md5(random()::text),
filler8 text DEFAULT md5(random()::text),
filler9 text DEFAULT md5(random()::text),
filler10 text DEFAULT md5(random()::text),
filler11 text DEFAULT md5(random()::text),
filler12 text DEFAULT md5(random()::text)
) WITH (fillfactor=90);
\set end 0
\set start (:end + 1)
\set end (:start + (:scale * 100))

INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
<2300 chars string>, (random()::bigint) % :scale, 0;

CREATE UNIQUE INDEX pgb_a_aid ON pgbench_accounts(aid);
CREATE INDEX pgb_a_filler1 ON pgbench_accounts(filler1);
CREATE INDEX pgb_a_filler2 ON pgbench_accounts(filler2);
CREATE INDEX pgb_a_filler3 ON pgbench_accounts(filler3);
CREATE INDEX pgb_a_filler4 ON pgbench_accounts(filler4);

-- Force a WARM update on one row
UPDATE pgbench_accounts SET filler1 = 'X' WHERE aid = '100' ||
repeat('abcdefghij', 2);

Test:
-- Fetch the row using the fat index. Since the row contains a
BEGIN;
SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
 <2300 chars string> ORDER BY aid;
UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
 <2300 chars string>;
END;

I did four 5-minute runs each with master and WARM, and there is probably a
2-3% regression.

(Results of 5-minute tests; txns is the total over 5 minutes, idx_scan is
the number of scans on the fat index)
master:
txns  idx_scan
414117 828233
411109 822217
411848 823695
408424 816847

WARM:
txns   idx_scan
404139 808277
398880 797759
399949 799897
397927 795853

==

I then also repeated the tests, but this time using compressible values.
The regression in this case is much higher, maybe 15% or more.

INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
repeat('abcdefghij', 2), (random()::bigint) % :scale, 0;

-- Fetch the row using the fat index. Since the row contains a
BEGIN;
SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
repeat('abcdefghij', 2) ORDER BY aid;
UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
repeat('abcdefghij', 2);
END;

(Results of 5-minute tests; txns is the total over 5 minutes, idx_scan is
the number of scans on the fat index)
master:
txns   idx_scan
56976 113953
56822 113645
56915 113831

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-29 Thread Pavan Deolasee
On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> > On Tue, Mar 28, 2017 at 10:35 PM, Pavan Deolasee
> > <pavan.deola...@gmail.com> wrote:
> >>
> >> On Tue, Mar 28, 2017 at 7:04 PM, Amit Kapila <amit.kapil...@gmail.com>
> >> wrote:
> >>>
> >>>   For such an heap insert, we will pass
> >>> the actual value of column to index_form_tuple during index insert.
> >>> However during recheck when we fetch the value of c2 from heap tuple
> >>> and pass it index tuple, the value is already in compressed form and
> >>> index_form_tuple might again try to compress it because the size will
> >>> still be greater than TOAST_INDEX_TARGET and if it does so, it might
> >>> make recheck fail.
> >>>
> >>
> >> Would it? I thought  "if
> >> (!VARATT_IS_EXTENDED(DatumGetPointer(untoasted_values[i]))" check
> should
> >> prevent that. But I could be reading those macros wrong. They are
> probably
> >> heavily uncommented and it's not clear what each of those VARATT_*
> macro do.
> >>
> >
> > That won't handle the case where it is simply compressed.  You need
> > check like VARATT_IS_COMPRESSED to take care of compressed heap
> > tuples, but then also it won't work because heap_tuple_fetch_attr()
> > doesn't handle compressed tuples.  You need to use
> > heap_tuple_untoast_attr() to handle the compressed case.  Also, we
> > probably need to handle other type of var attrs.  Now, If we want to
> > do all of that index_form_tuple()  might not be the right place, we
> > probably want to handle it in caller or provide an alternate API.
> >
>
> Another related, index_form_tuple() has a check for VARATT_IS_EXTERNAL
> not VARATT_IS_EXTENDED, so may be that is cause of confusion for you,
> but as I mentioned even if you change the check heap_tuple_fetch_attr
> won't suffice the need.
>
>
I am confused :-(

Assuming big-endian machine:

VARATT_IS_4B_U - !toasted && !compressed
VARATT_IS_4B_C - compressed, stored inline (not externally toasted)
VARATT_IS_4B   - !toasted (may or may not be compressed)
VARATT_IS_1B_E - externally toasted

#define VARATT_IS_EXTERNAL(PTR) VARATT_IS_1B_E(PTR)
#define VARATT_IS_EXTENDED(PTR) (!VARATT_IS_4B_U(PTR))

So VARATT_IS_EXTENDED means that the value is (toasted || compressed). If
we are looking at a value taken straight from the heap, the flag implies
in-heap compression. If we are looking at a value we just detoasted, it
means the value was stored compressed in the TOAST table.

index_form_tuple() first checks if the value is externally toasted and
fetches the untoasted value if so. After that it checks
!VARATT_IS_EXTENDED, i.e. whether the value is (!toasted && !compressed),
and only then tries to apply compression. It can't be a toasted value
because if it was, we just untoasted it. But it can be compressed, either
in the heap or in the toast, in which case we don't try to compress it
again. That makes sense, because if the value is already compressed there
is no point applying compression again.

Now what you're suggesting (it seems) is that when in-heap compression is
used and ExecInsertIndexTuples calls FormIndexDatum to create the index
tuple values, it always passes uncompressed heap values. So when the index
tuple is originally inserted, index_form_tuple() will try to compress the
value and see if it fits in the index.

Then during recheck, we pass already compressed values to
index_form_tuple(). But my point is, the following code will ensure that we
don't compress it again. My reading is that the first check for
!VARATT_IS_EXTENDED will return false if the value is already compressed.

/*
 * If value is above size target, and is of a compressible datatype,
 * try to compress it in-line.
 */
if (!VARATT_IS_EXTENDED(DatumGetPointer(untoasted_values[i])) &&
    VARSIZE(DatumGetPointer(untoasted_values[i])) > TOAST_INDEX_TARGET &&
    (att->attstorage == 'x' || att->attstorage == 'm'))
{
    Datum   cvalue = toast_compress_datum(untoasted_values[i]);

    if (DatumGetPointer(cvalue) != NULL)
    {
        /* successful compression */
        if (untoasted_free[i])
            pfree(DatumGetPointer(untoasted_values[i]));
        untoasted_values[i] = cvalue;
        untoasted_free[i] = true;
    }
}

TBH I couldn't find why the original index insertion code will always
supply uncompressed values. But even if it does, and even if the recheck
gets the value in compressed form, I don't see how we would double-compress
it.

As far as,

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 8:34 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, Mar 27, 2017 at 10:25 PM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> > On Tue, Mar 28, 2017 at 1:59 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
> >> On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
> >> <pavan.deola...@gmail.com> wrote:
> >> > It's quite hard to say that until we see many more benchmarks. As
> author
> >> > of
> >> > the patch, I might have got repetitive with my benchmarks. But I've
> seen
> >> > over 50% improvement in TPS even without chain conversion (6 indexes
> on
> >> > a 12
> >> > column table test).
> >>
> >> This seems quite mystifying.  What can account for such a large
> >> performance difference in such a pessimal scenario?  It seems to me
> >> that without chain conversion, WARM can only apply to each row once
> >> and therefore no sustained performance improvement should be possible
> >> -- unless rows are regularly being moved to new blocks, in which case
> >> those updates would "reset" the ability to again perform an update.
> >> However, one would hope that most updates get done within a single
> >> block, so that the row-moves-to-new-block case wouldn't happen very
> >> often.
> >
> > I think you're confusing between update chains that stay within a block
> vs
> > HOT/WARM chains. Even when the entire update chain stays within a block,
> it
> > can be made up of multiple HOT/WARM chains and each of these chains offer
> > ability to do one WARM update. So even without chain conversion, every
> > alternate update will be a WARM update. So the gains are perpetual.
>
> You're right, I had overlooked that.  But then I'm confused: how does
> the chain conversion stuff help as much as it does?  You said that you
> got a 50% improvement from WARM, because we got to skip half the index
> updates.  But then you said with chain conversion you got an
> improvement of more like 100%.  However, I would think that on this
> workload, chain conversion shouldn't save much.  If you're sweeping
> through the database constantly performing updates, the updates ought
> to be a lot more frequent than the vacuums.
>
> No?


These tests were done on a very large table of 80M rows. The table itself
was wide with 15 columns and a few indexes. So in a 8hr test, master could
do only 55M updates where as WARM did 105M updates. There were 4 autovacuum
cycles in both these runs. So while there were many updates, I am sure
autovacuum must have helped to increase the percentage of WARM updates
(from ~50% after steady state to ~67% after steady state). Also I said more
than 50%, but it was probably close to 65%.

Unfortunately these tests were done on different hardware, with different
settings and even slightly different scale factors. So they may not be
exactly comparable. But there is no doubt chain conversion will help to
some extent. I'll repeat the benchmark with chain conversion turned off and
report the exact difference.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 7:04 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

>
>
>   For such an heap insert, we will pass
> the actual value of column to index_form_tuple during index insert.
> However during recheck when we fetch the value of c2 from heap tuple
> and pass it index tuple, the value is already in compressed form and
> index_form_tuple might again try to compress it because the size will
> still be greater than TOAST_INDEX_TARGET and if it does so, it might
> make recheck fail.
>
>
Would it? I thought the "if
(!VARATT_IS_EXTENDED(DatumGetPointer(untoasted_values[i]))" check should
prevent that. But I could be reading those macros wrong. They are largely
uncommented and it's not clear what each of those VARATT_* macros does.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 4:07 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

>
> Noted few cosmetic issues in 0005_warm_updates_v21:
>
> 1.
> pruneheap.c(939): warning C4098: 'heap_get_root_tuples' : 'void'
> function returning a value
>

Thanks. Will fix.


>
> 2.
> + *  HCWC_WARM_UPDATED_TUPLE - a tuple with HEAP_WARM_UPDATED is found
> somewhere
> + *in the chain. Note that when a tuple is WARM
> + *updated, both old and new versions are marked
> + *with this flag/
> + *
> + *  HCWC_WARM_TUPLE  - a tuple with HEAP_WARM_TUPLE is found somewhere in
> + *  the chain.
> + *
> + *  HCWC_CLEAR_TUPLE - a tuple without HEAP_WARM_TUPLE is found somewhere
> in
> + *   the chain.
>
> Description of all three flags is same.
>
>
Well the description is different (and correct), but given that it confused
you, I think I should rewrite those comments. Will do.


> 3.
> + *  HCWC_WARM_UPDATED_TUPLE - a tuple with HEAP_WARM_UPDATED is found
> somewhere
> + *in the chain. Note that when a tuple is WARM
> + *updated, both old and new versions are marked
> + *    with this flag/
>
> Spurious '/' at end of line.
>
>
Thanks. Will fix.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 4:05 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

>
>
> As asked previously, can you explain me on what basis are you
> considering it robust?  The comments on top of datumIsEqual() clearly
> indicates the danger of using it for toasted values (Also, it will
> probably not give the answer you want if either datum has been
> "toasted".).


Hmm. I don't see why the new code in recheck is unsafe. The index values
themselves can't be toasted (IIUC), but they can be compressed.
index_form_tuple() already untoasts any toasted heap attributes and
compresses them if needed. So once we pass heap values via
index_form_tuple() we should have exactly the same index values as they
were inserted with. Or am I missing something obvious here?


  If you think that because we are using it during
> heap_update to find modified columns, then I think that is not right
> comparison, because there we are comparing compressed value (of old
> tuple) with uncompressed value (of new tuple) which should always give
> result as false.
>
>
Hmm, this seems like a problem. While HOT could tolerate occasional false
results (i.e. reporting a heap column as modified even though it is not),
WARM assumes that if the heap has reported different values, then they had
better be different and should result in different index values, because
that's how recheck later works. Since index expressions are not supported,
I wonder if toasted heap values are the only remaining problem in this
area. So heap_tuple_attr_equals() should first detoast the heap values and
then do the comparison. I already have a test case that fails for this
reason, so let me try this approach.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 1:32 AM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Is the WARM tap test suite supposed to work when applied without all the
> other patches?  I just tried applied that one and running "make check -C
> src/test/modules", and it seems to hang after giving "ok 5" for
> t/002_warm_stress.pl.  (I had to add a Makefile too, attached.)
>
>
Yeah, sorry. Looks like I forgot to git add the Makefile.

BTW just tested on Ubuntu, and it works fine on that too. FWIW I'm using
perl v5.22.1 and IPC::Run 0.94 (assuming I got the versions correctly).


$ make -C src/test/modules/warm/ prove-check
make: Entering directory '/home/ubuntu/postgresql/src/test/modules/warm'
rm -rf /home/ubuntu/postgresql/src/test/modules/warm/tmp_check/log
cd . && TESTDIR='/home/ubuntu/postgresql/src/test/modules/warm'
PATH="/home/ubuntu/postgresql/tmp_install/home/ubuntu/pg-master-install/bin:$PATH"
LD_LIBRARY_PATH="/home/ubuntu/postgresql/tmp_install/home/ubuntu/pg-master-install/lib"
PGPORT='65432'
PG_REGRESS='/home/ubuntu/postgresql/src/test/modules/warm/../../../../src/test/regress/pg_regress'
prove -I ../../../../src/test/perl/ -I . --verbose t/*.pl
t/001_recovery.pl .
1..2
ok 1 - balanace matches after recovery
ok 2 - sum(delta) matches after recovery
ok
t/002_warm_stress.pl ..
1..10
ok 1 - dummy test passed
ok 2 - Fine match
ok 3 - psql exited normally
ok 4 - psql exited normally
ok 5 - psql exited normally
ok 6 - psql exited normally
ok 7 - psql exited normally
ok 8 - psql exited normally
ok 9 - psql exited normally
ok 10 - Fine match
ok
All tests successful.
Files=2, Tests=12, 22 wallclock secs ( 0.03 usr  0.00 sys +  7.94 cusr
 2.41 csys = 10.38 CPU)
Result: PASS
make: Leaving directory '/home/ubuntu/postgresql/src/test/modules/warm'

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Mon, Mar 27, 2017 at 4:45 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Sat, Mar 25, 2017 at 1:24 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> > On Fri, Mar 24, 2017 at 11:49 PM, Pavan Deolasee
> > <pavan.deola...@gmail.com> wrote:
> >>
> >> On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila <amit.kapil...@gmail.com>
> >> wrote:
> >>>
> >
> >> While looking at this problem, it occurred to me that the assumptions
> made
> >> for hash indexes are also wrong :-( Hash index has the same problem as
> >> expression indexes have. A change in heap value may not necessarily
> cause a
> >> change in the hash key. If we don't detect that, we will end up having
> two
> >> hash identical hash keys with the same TID pointer. This will cause the
> >> duplicate key scans problem since hashrecheck will return true for both
> the
> >> hash entries.
>
> Isn't it possible to detect duplicate keys in hashrecheck if we
> compare both hashkey and tid stored in index tuple with the
> corresponding values from heap tuple?
>
>
Hmm.. I thought that won't work. For example, say we have a tuple (X, Y, Z)
in the heap with a btree index on X and a hash index on Y. Suppose that is
updated to (X, Y', Z) and we do a WARM update and insert a new entry in
the hash index. Now if Y and Y' both generate the same hashkey, we will
have two identical-looking <hashkey, TID> tuples in the hash index, leading
to duplicate key scans.

I think one way to solve this is to pass both old and new heap values to
amwarminsert and expect each AM to detect duplicates and avoid creating a
WARM pointer if the index keys are exactly the same (we can do that since
there already exists another index tuple with the same keys pointing to the
same root TID).
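
In code terms, the check could be as simple as this (purely illustrative
sketch; amwarminsert and the old_values/new_values arguments are the
proposed interface, not something that exists today, and a single-column
hash index is assumed):

/*
 * Inside hash's amwarminsert: if the old and new values hash to the
 * same key, an index tuple with that key already points to the same
 * root TID, so skip the insert to avoid duplicate key scans.
 */
uint32  oldkey = _hash_datum2hashkey(rel, old_values[0]);
uint32  newkey = _hash_datum2hashkey(rel, new_values[0]);

if (oldkey == newkey)
    return false;       /* existing entry already covers this root TID */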

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 7:49 AM, Bruce Momjian <br...@momjian.us> wrote:

> On Mon, Mar 27, 2017 at 04:29:56PM -0400, Robert Haas wrote:
> > On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
> > <pavan.deola...@gmail.com> wrote:
> > > It's quite hard to say that until we see many more benchmarks. As
> author of
> > > the patch, I might have got repetitive with my benchmarks. But I've
> seen
> > > over 50% improvement in TPS even without chain conversion (6 indexes
> on a 12
> > > column table test).
> >
> > This seems quite mystifying.  What can account for such a large
> > performance difference in such a pessimal scenario?  It seems to me
> > that without chain conversion, WARM can only apply to each row once
> > and therefore no sustained performance improvement should be possible
> > -- unless rows are regularly being moved to new blocks, in which case
> > those updates would "reset" the ability to again perform an update.
> > However, one would hope that most updates get done within a single
> > block, so that the row-moves-to-new-block case wouldn't happen very
> > often.
> >
> > I'm perplexed.
>
> Yes, I asked the same question in this email:
>
> https://www.postgresql.org/message-id/2017032119.
> ge16...@momjian.us
>
>
And I've answered it so many times by now :-)

Just to add to what I said in another email, note that HOT/WARM chains are
created when a new root line pointer is created in the heap (a line pointer
that has an index entry pointing to it). And a new root line pointer is
created when a non-HOT/non-WARM update is performed. So as soon as you do a
non-HOT/non-WARM update, the next update can again be a WARM update, even
when everything fits in a single block.

That's why, for a workload which doesn't do HOT updates and where not all
index keys are updated, you'll find every alternate update to a row to be a
WARM update, even when there is no chain conversion. That alone can save a
lot of index bloat and reduce IO on the indexes and WAL.
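
To illustrate with a rough sketch (hypothetical timeline for a single row
whose update chain stays within one block):

update 1: non-HOT/non-WARM -> new root line pointer, entries in all indexes
update 2: WARM             -> same root, entry only in the changed index
update 3: non-HOT/non-WARM -> new root line pointer, entries in all indexes
update 4: WARM             -> same root, entry only in the changed index

Each non-HOT/non-WARM update starts a fresh chain, which is what re-arms
the ability to do the next WARM update.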

Let me know if it's still not clear and I can draw more detailed diagrams
to explain it.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 1:59 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> > It's quite hard to say that until we see many more benchmarks. As author
> of
> > the patch, I might have got repetitive with my benchmarks. But I've seen
> > over 50% improvement in TPS even without chain conversion (6 indexes on
> a 12
> > column table test).
>
> This seems quite mystifying.  What can account for such a large
> performance difference in such a pessimal scenario?  It seems to me
> that without chain conversion, WARM can only apply to each row once
> and therefore no sustained performance improvement should be possible
> -- unless rows are regularly being moved to new blocks, in which case
> those updates would "reset" the ability to again perform an update.
> However, one would hope that most updates get done within a single
> block, so that the row-moves-to-new-block case wouldn't happen very
> often.
>
>
I think you're confusing update chains that stay within a block with
HOT/WARM chains. Even when the entire update chain stays within a block, it
can be made up of multiple HOT/WARM chains, and each of these chains offers
the ability to do one WARM update. So even without chain conversion, every
alternate update will be a WARM update. So the gains are perpetual.

For example, if I take a simplistic example of a table with just one tuple
and four indexes and where every update updates just one of the indexes.
Assuming no WARM chain conversion this is what would happen for every
update:

1. WARM update, new entry in just one index
2. Regular update, new entries in all indexes
3. WARM update, new entry in just one index
4. Regular update, new entries in all indexes

At the end of N updates (assuming all fit in the same block), one index
will have N entries and the rest will have N/2 entries.

Compare that against master:
1. Regular update, new entries in all indexes
2. Regular update, new entries in all indexes
3. Regular update, new entries in all indexes
4. Regular update, new entries in all indexes


At the end of N updates (assuming all fit in the same block), all indexes
will have N entries. So with WARM we reduce bloat in 3 indexes. And WARM
works almost perpetually even without chain conversion. If you look at the
graph I shared earlier (attached again), without WARM chain conversion the
rate of WARM updates settles down to 50%, which is not surprising given
what I explained above.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 1:32 AM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Is the WARM tap test suite supposed to work when applied without all the
> other patches?  I just tried applied that one and running "make check -C
> src/test/modules", and it seems to hang after giving "ok 5" for
> t/002_warm_stress.pl.  (I had to add a Makefile too, attached.)
>
>
These tests should run without WARM. I wonder though if IPC::Run's
start/pump/finish facility is fully portable. Andrew, in an off-list
conversation, reminded me that there are no (or maybe one) tests currently
using that in Postgres. I've run these tests on OSX, and will try on some
linux platform too.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Stale comments in vacuumlazy.c

2017-03-26 Thread Pavan Deolasee
I happened to notice a stale comment at the very beginning of vacuumlazy.c.
ISTM we forgot to fix that when we introduced the FSM. With the FSM, vacuum
no longer needs to track per-page free space info. I propose the attached
fix.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


vacuumlazy_comment_fixes.patch
Description: Binary data



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-25 Thread Pavan Deolasee
On Sat, 25 Mar 2017 at 11:03 PM, Peter Geoghegan <p...@bowt.ie> wrote:

> On Sat, Mar 25, 2017 at 12:54 AM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> > I am not sure how do you want to binary compare two datums, if you are
> > thinking datumIsEqual(), that won't work.  I think you need to use
> > datatype specific compare function something like what we do in
> > _bt_compare().
>
> How will that interact with types like numeric, that have display
> scale or similar?
>

I wonder why Amit thinks that datumIsEqual won't work once we convert the
heap values to an index tuple and then fetch using index_get_attr. After
all, that's how the current index tuple was constructed when it was
inserted. In fact, we must not rely on _bt_compare because that might
return a "false positive" even for two different heap binary values (I
think). To decide whether to do a WARM update or not in heap_update, we
only rely on binary comparison. Could it happen that for two different
binary heap values, we still compute the same index attribute, even when
expression indexes are not supported?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-24 Thread Pavan Deolasee
On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

>
>
> I was worried for the case if the index is created non-default
> collation, will the datumIsEqual() suffice the need.  Now again
> thinking about it, I think it will because in the index tuple we are
> storing the value as in heap tuple.  However today it occurred to me
> how will this work for toasted index values (index value >
> TOAST_INDEX_TARGET).  It is mentioned on top of datumIsEqual() that it
> probably won't work for toasted values.  Have you considered that
> point?
>
>
No, I haven't, and thanks for bringing that up. Now that I think more about
it and look at the code, I think the naive way of just comparing index
attribute values against heap values is probably wrong. The example of
TOAST_INDEX_TARGET is one such case, but I wonder if there are other
varlena attributes that we might store differently in heap and index; e.g.
index_form_tuple() -> heap_fill_tuple() seems to do some churning for
varlena attributes. It's not clear to me if index_get_attr will return
values which are binary comparable to heap values. I wonder if calling
index_form_tuple on the heap values, fetching attributes via index_get_attr
on both index tuples and then doing a binary compare is a more robust idea.
Or maybe that's just duplicating effort.
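
A minimal sketch of that idea (assuming a per-attribute loop inside the
recheck routine; untested):

/*
 * Push the heap values through index_form_tuple so that both sides have
 * gone through the same toast/compression treatment, then binary-compare
 * the attributes fetched back via index_getattr.
 */
IndexTuple  heaptup = index_form_tuple(tupdesc, values, isnull);
int         i;

for (i = 1; i <= tupdesc->natts; i++)
{
    Form_pg_attribute att = tupdesc->attrs[i - 1];
    bool        anull,
                bnull;
    Datum       a = index_getattr(indexTuple, i, tupdesc, &anull);
    Datum       b = index_getattr(heaptup, i, tupdesc, &bnull);

    if (anull != bnull ||
        (!anull && !datumIsEqual(a, b, att->attbyval, att->attlen)))
        return false;   /* recheck failed */
}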

While looking at this problem, it occurred to me that the assumptions made
for hash indexes are also wrong :-( Hash indexes have the same problem as
expression indexes. A change in heap value may not necessarily cause a
change in the hash key. If we don't detect that, we will end up having two
identical hash keys with the same TID pointer. This will cause the
duplicate key scans problem, since hashrecheck will return true for both
hash entries. That's a bummer as far as supporting WARM for hash indexes is
concerned, unless we find a way to avoid duplicate index entries.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-24 Thread Pavan Deolasee
On Fri, Mar 24, 2017 at 4:04 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Fri, Mar 24, 2017 at 12:25 AM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> >
> >
> > On Thu, Mar 23, 2017 at 7:53 PM, Amit Kapila <amit.kapil...@gmail.com>
> >
> > The general sense I've got
> > here is that we're ok to push some work in background if it helps the
> > real-time queries, and I kinda agree with that.
> >
>
> I don't think we can define this work as "some" work, it can be a lot
> of work depending on the number of indexes.  Also, I think for some
> cases it will generate maintenance work without generating benefit.
> For example, when there is one index on a table and there are updates
> for that index column.
>
>
That's a fair point. I think we can address this though. At the end of the
first index scan we would know how many WARM pointers the index has and
whether it's worth doing a second scan. For the case you mentioned, we will
do a second scan just on that one index, skip it on all other indexes, and
still achieve the same result, as sketched below. On the other hand, if one
index receives many updates and other indexes are rarely updated, then we
might leave behind a few WARM chains and won't be able to do index-only
scans on those pages. But given the premise that the other indexes receive
rare updates, that may not be a problem. Note: the code is not currently
written that way, but it should be a fairly small change.
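
In pseudo-C, the per-index decision might look like this (a sketch under
the assumption that the first pass counts WARM pointers per index; none of
these names exist in the posted patch):

/*
 * Hypothetical: after the first index scan, do the second
 * (chain-converting) scan only on indexes that actually contain WARM
 * pointers worth clearing.
 */
for (i = 0; i < nindexes; i++)
{
    if (indstats[i].num_warm_pointers == 0)
        continue;           /* nothing to convert in this index */
    lazy_warm_clear_index(Irel[i], &indstats[i]);   /* second scan */
}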

The other thing that we didn't talk about is that vacuum will need to track
dead tuples and WARM candidate chains separately, which increases memory
overhead. So for very large tables, and for the same amount of
maintenance_work_mem, one round of vacuum will be able to clean fewer
pages. We could work out a more compact representation, but that's not done
currently.


>
> > But this is clearly not
> > PG 10 material.
> >
>
> I don't see much discussion about this aspect of the patch, so not
> sure if it is acceptable to increase the cost of vacuum.  Now, I don't
> know if your idea of GUC's make it such that the additional cost will
> occur seldom and this additional pass has a minimal impact which will
> make it acceptable.


Yeah, I agree. I'm trying to schedule some more benchmarks, but any help is
appreciated.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 7:53 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Thu, Mar 23, 2017 at 3:44 PM, Pavan Deolasee
>
> >
> > Yes, this is a very fair point. The way I proposed to address this
> upthread
> > is by introducing a set of threshold/scale GUCs specific to WARM. So
> users
> > can control when to invoke WARM cleanup. Only if the WARM cleanup is
> > required, we do 2 index scans. Otherwise vacuum will work the way it
> works
> > today without any additional overhead.
> >
>
> I am not sure on what basis user can set such parameters, it will be
> quite difficult to tune those parameters.  I think the point is
> whatever threshold we keep, once that is crossed, it will perform two
> scans for all the indexes.


Well, that applies even to the vacuum parameters, no? The general sense
I've got here is that we're OK to push some work into the background if it
helps real-time queries, and I kinda agree with that. If WARM improves
things in a significant manner even with this additional maintenance work,
it's still worth doing.
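
To make that concrete, I am imagining knobs shaped like the existing
autovacuum ones, e.g. (names purely illustrative; nothing like this exists
in the patch yet):

# postgresql.conf sketch: invoke WARM cleanup during vacuum only once
# warm-updated tuples exceed threshold + scale_factor * reltuples
warm_cleanup_threshold = 50
warm_cleanup_scale_factor = 0.2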

Having said that, I see many ways we can improve on this later. For
example, we can track information about tuples which may have received WARM
updates somewhere else (I think it will need to be a per-index bitmap or
so) and use that to do WARM chain conversion in a single index pass. But
this is clearly not PG 10 material.


>   IIUC, this conversion of WARM chains is
> required so that future updates can be WARM or is there any other
> reason?  I see this as a big penalty for future updates.
>

It's also necessary for index-only-scans. But I don't see this as a big
penalty for future updates, because if there are indeed significant WARM
updates then not preparing for future updates will result in
write-amplification, something we are trying to solve here and something
which seems to be showing good gains.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 11:44 PM, Mithun Cy <mithun...@enterprisedb.com>
wrote:

> Hi Pavan,
> On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> > Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> > RAM, attached SSD) and results are shown below. But I think it is
> important
> > to get independent validation from your side too, just to ensure I am not
> > making any mistake in measurement. I've attached naively put together
> > scripts which I used to run the benchmark. If you find them useful,
> please
> > adjust the paths and run on your machine.
>
> I did a similar test. Your v19 looks fine to me; it does not
> cause any regression. On the other hand, I also ran tests reducing
> table fillfactor to 80, where I can see a small regression of 2-3% on
> average when updating col2; on updating col9 I again do not see any
> regression.
>
>
Thanks Mithun for repeating the tests and confirming that the v19 patch
looks ok.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 4:08 PM, Pavan Deolasee <pavan.deola...@gmail.com>
wrote:

>
>
> On Thu, Mar 23, 2017 at 3:02 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
>
>>
>>
>> That sounds like you are dodging the actual problem.  I mean you can
>> put that same PageIsFull() check in master code as well and then you
>> will most probably again see the same regression.
>
>
> Well I don't see it that way. There was a specific concern about a
> specific workload that WARM might regress. I think this change addresses
> that. Sure if you pick that one piece, put it in master first and then
> compare against rest of the WARM code, you will see a regression.
>

BTW the PageIsFull() check may not help as much in master as it does with
WARM. In master we anyway bail out early after a couple of column checks.
It may help to reduce the 10% drop that we see while updating the last
index column, but if we compare master and WARM with the patch applied, the
regression should be quite nominal.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 3:02 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

>
>
> That sounds like you are dodging the actual problem.  I mean you can
> put that same PageIsFull() check in master code as well and then you
> will most probably again see the same regression.


Well, I don't see it that way. There was a specific concern about a
specific workload that WARM might regress. I think this change addresses
that. Sure, if you pick that one piece, put it in master first and then
compare against the rest of the WARM code, you will see a regression. But I
thought what we were worried about is WARM causing a regression for some
existing user, who might see her workload running 10% slower, which this
change mitigates.


> Also, I think if we
> test at fillfactor 80 or 75 (which is not unrealistic considering an
> update-intensive workload), then we might again see regression.


Yeah, we might, but it will be less than before, maybe 2% instead of 10%.
And by doing this we are further narrowing an already narrow test case. I
think we need to see things in totality and weigh the cost-benefit
trade-offs. There are numbers for very common workloads where WARM may
provide 20%, 30% or even more than 100% improvements.

Andres and Alvaro already have other ideas to address this problem even
further. And as I said, we can pass in index-specific information and make
that routine bail out even earlier. We need to accept that WARM will need
to do more attr checks than master, especially when there is more than one
index on the table, and sometimes those checks will go to waste. I am OK if
we want to provide a table-specific knob to disable WARM, but not sure if
others would like that idea.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
Thanks Amit. v19 addresses some of the comments below.

On Thu, Mar 23, 2017 at 10:28 AM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Wed, Mar 22, 2017 at 4:06 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> > On Tue, Mar 21, 2017 at 6:47 PM, Pavan Deolasee
> > <pavan.deola...@gmail.com> wrote:
> >>
> >>>
> >>
> >> Please find attached rebased patches.
> >>
> >
> > Few comments on 0005_warm_updates_v18.patch:
> >
>
> Few more comments on 0005_warm_updates_v18.patch:
> 1.
> @@ -234,6 +241,25 @@ index_beginscan(Relation heapRelation,
>   scan->heapRelation = heapRelation;
>   scan->xs_snapshot = snapshot;
>
> + /*
> + * If the index supports recheck, make sure that index tuple is saved
> + * during index scans. Also build and cache IndexInfo which is used by
> + * amrecheck routine.
> + *
> + * XXX Ideally, we should look at all indexes on the table and check if
> + * WARM is at all supported on the base table. If WARM is not supported
> + * then we don't need to do any recheck. RelationGetIndexAttrBitmap() does
> + * do that and sets rd_supportswarm after looking at all indexes. But we
> + * don't know if the function was called earlier in the session when we're
> + * here. We can't call it now because there exists a risk of causing
> + * deadlock.
> + */
> + if (indexRelation->rd_amroutine->amrecheck)
> + {
> +     scan->xs_want_itup = true;
> +     scan->indexInfo = BuildIndexInfo(indexRelation);
> + }
> +
>
> Don't we need to do this rechecking during parallel scans?  Also what
> about bitmap heap scans?
>
>
Yes, we need to handle parallel scans. Bitmap scans are not a problem
because they can never return the same TID twice. I fixed this by moving
the setup inside index_beginscan_internal.
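
In other words, roughly this shape (sketch; the actual patch hunk may
differ):

/*
 * In index_beginscan_internal (sketch): set up recheck support in one
 * place so that both plain and parallel index scans pick it up.
 */
if (indexRelation->rd_amroutine->amrecheck)
{
    scan->xs_want_itup = true;
    scan->indexInfo = BuildIndexInfo(indexRelation);
}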


> 2.
> +++ b/src/backend/access/nbtree/nbtinsert.c
> -
>  typedef struct
>
> Above change is not require.
>
>
Sure. Fixed.


> 3.
> +void _bt_clear_items(Page page, OffsetNumber *clearitemnos, uint16 nclearitems)
> +void _hash_clear_items(Page page, OffsetNumber *clearitemnos,
> +   uint16 nclearitems)
>
> Both the above functions look exactly same, isn't it better to have a
> single function like page_clear_items?  If you want separation for
> different index types, then we can have one common function which can
> be called from different index types.
>
>
Yes, makes sense. Moved that to bufpage.c. The reason I originally had
index-specific versions is that I started by putting the WARM flag in the
IndexTuple header. But since hash indexes do not have that bit free, I
moved everything to the TID bit-flag. I still left index-specific wrappers,
but they just call PageIndexClearWarmTuples.


> 4.
> - if (callback(htup, callback_state))
> + flags = ItemPointerGetFlags(>t_tid);
> + is_warm = ((flags & BTREE_INDEX_WARM_POINTER) != 0);
> +
> + if (is_warm)
> + stats->num_warm_pointers++;
> + else
> + stats->num_clear_pointers++;
> +
> + result = callback(htup, is_warm, callback_state);
> + if (result == IBDCR_DELETE)
> + {
> + if (is_warm)
> + stats->warm_pointers_removed++;
> + else
> + stats->clear_pointers_removed++;
>
> The patch looks to be inconsistent in collecting stats for btree and
> hash.  I don't see above stats are getting updated in hash index code.
>
>
Fixed. The hashbucketcleanup signature is just getting a bit too long.
Maybe we should move some of these counters into a structure and pass that
around. Not done here though.
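
For instance (illustrative only, not part of the current patch):

/*
 * Hypothetical container for the WARM/clear pointer counters, so that
 * hashbucketcleanup and friends can take one argument instead of several.
 */
typedef struct IndexWarmCleanupStats
{
    double      num_warm_pointers;
    double      num_clear_pointers;
    double      warm_pointers_removed;
    double      clear_pointers_removed;
} IndexWarmCleanupStats;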


> 5.
> +btrecheck(Relation indexRel, IndexInfo *indexInfo, IndexTuple indexTuple,
> + Relation heapRel, HeapTuple heapTuple)
> {
> ..
> + if (!datumIsEqual(values[i - 1], indxvalue, att->attbyval,
> + att->attlen))
> ..
> }
>
> Will this work if the index is using non-default collation?
>
>
Not sure I understand that. Can you please elaborate?


> 6.
> +++ b/src/backend/access/nbtree/nbtxlog.c
> @@ -390,83 +390,9 @@ btree_xlog_vacuum(XLogReaderState *record)
> -#ifdef UNUSED
>   xl_btree_vacuum *xlrec = (xl_btree_vacuum *) XLogRecGetData(record);
>
>   /*
> - * This section of code is thought to be no longer needed, after analysis
> - * of the calling paths. It is retained to allow the code to be reinstated
> - * if a flaw is revealed in that thinking.
> - *
> ..
>
> Why does this patch need to remove the above code under #ifdef UNUSED
>
>
Yeah, it isn't strictly necessary. But that dead code was getting in the
way and hence I decided to strip it out. I can put it back if it's an
issue, or remove it as a separate commit first.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-22 Thread Pavan Deolasee
On Wed, Mar 22, 2017 at 4:53 PM, Mithun Cy <mithun...@enterprisedb.com>
wrote:

> On Wed, Mar 22, 2017 at 3:44 PM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> >
> > This looks quite weird to me. Obviously these numbers are completely
> > non-comparable. Even the time for VACUUM FULL goes up with every run.
> >
> > May be we can blame it on AWS instance completely, but the pattern in
> your
> > tests looks very similar where the number slowly and steadily keeps going
> > up. If you do complete retest but run v18/v19 first and then run master,
> may
> > be we'll see a complete opposite picture?
> >
>
> For those tests I ran them in the order <Master, patch18,
> patch18, Master>; both times the numbers were the same.


Hmm, interesting.


> One thing I did
> differently was deleting the data directory between tests and creating
> the database from scratch. Unfortunately the machine I tested this on is
> not available. I will test the same with v19 once I get the machine and
> report back to you.


Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
RAM, attached SSD) and results are shown below. But I think it is important
to get independent validation from your side too, just to ensure I am not
making any mistake in measurement. I've attached naively put together
scripts which I used to run the benchmark. If you find them useful, please
adjust the paths and run on your machine.

I reverted back to UNLOGGED table because with WAL the results looked very
weird (as posted earlier) even when I was taking a CHECKPOINT before each
set and had set max_wal_size and checkpoint_timeout high enough to avoid
any checkpoint during the run. Anyways, that's a matter of separate
investigation and not related to this patch.

I did two kinds of tests.
a) update last column of the index
b) update second column of the index

v19 does considerably better than even master for the last column update
case and is pretty much in line for the second column update test. The
reason is very clear: v19 determines early in the cycle that the buffer is
already full and there is very little chance of doing a HOT update on the
page. In that case, it does not check any columns for modification. Master,
on the other hand, will scan through all 9 columns (for the last column
update case) and incur the same kind of overhead doing wasteful work.

The first, second and fourth columns show response time in ms, and the
third and fifth columns show the percentage difference over master. (I
hope the table looks fine; I'm trying a text-table generator tool :-).
Apologies if it looks messed up.)



+------------+------------+-----------+------------+-----------+
|                    Second column update                      |
+------------+------------+-----------+------------+-----------+
| Master     | v18        | vs master | v19        | vs master |
+------------+------------+-----------+------------+-----------+
| 96657.681  | 108122.868 | 11.86%    | 96873.642  | 0.22%     |
| 98546.35   | 110021.27  | 11.64%    | 97569.187  | -0.99%    |
| 99297.231  | 110550.054 | 11.33%    | 100241.261 | 0.95%     |
| 97196.556  | 110235.808 | 13.42%    | 97216.207  | 0.02%     |
| 99072.432  | 110477.327 | 11.51%    | 97950.687  | -1.13%    |
| 96730.056  | 109217.939 | 12.91%    | 96929.617  | 0.21%     |
+------------+------------+-----------+------------+-----------+


+------------+------------+-----------+------------+-----------+
|                     Last column update                       |
+------------+------------+-----------+------------+-----------+
| Master     | v18        | vs master | v19        | vs master |
+------------+------------+-----------+------------+-----------+
| 112545.537 | 116563.608 | 3.57%     | 103067.276 | -8.42%    |
| 110169.224 | 115753.991 | 5.07%     | 104411.44  | -5.23%    |
| 112280.874 | 116437.11  | 3.70%     | 104868.98  | -6.60%    |
| 113171.556 | 116813.262 | 3.22%     | 103907.012 | -8.19%    |
| 110721.881 | 117442.709 | 6.07%     | 104124.131 | -5.96%    |
| 112138.601 | 115834.549 | 3.30%     | 104858.624 | -6.49%    |
+------------+------------+-----------+------------+-----------+


Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


interest_attrs_tests.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-22 Thread Pavan Deolasee
On Wed, Mar 22, 2017 at 3:51 AM, Mithun Cy <mithun...@enterprisedb.com>
wrote:

>
> CREATE INDEX testindx ON testtab (col1, col2, col3, col4, col5, col6,
> col7, col8, col9);
> Performance measurement tests: Ran 12 times to eliminate run-to-run
> latencies.
> ==
> VACUUM FULL;
> BEGIN;
> UPDATE testtab SET col2 = md5(random()::text);
> ROLLBACK;
>
> The recorded response time shows a much higher increase, from 10% to
> 25%, after the patch.
>
>
After doing some tests on my side, I now think that there is something else
going on, unrelated to the patch. I ran the same benchmark on an AWS
i2.xlarge machine with 32GB RAM, shared_buffers set to 16GB, max_wal_size
set to 256GB, checkpoint_timeout set to 60min and autovacuum off.

I compared master and v19, each time doing 6 runs of the test. The database
was restarted whenever changing binaries, tables were dropped/recreated and
a checkpoint taken after each restart (but not between two runs, which I
believe is what you did too... but correct me if that's a wrong assumption).

Instead of col2, I am updating col9, but that's probably not very relevant.

VACUUM FULL;
BEGIN;
UPDATE testtab SET col9 = md5(random()::text);
ROLLBACK;


First set of 6 runs with master:
163629.8
181183.8
194788.1
194606.1
194589.9
196002.6

(database restart, table drop/create, checkpoint)
First set of 6 runs with v19:
190566.55
228274.489
238110.202
239304.681
258748.189
284882.4

(database restart, table drop/create, checkpoint)
Second set of 6 runs with master:
232267.5
298259.6
312315.1
341817.3
360729.2
385210.7

This looks quite weird to me. Obviously these numbers are completely
non-comparable. Even the time for VACUUM FULL goes up with every run.

Maybe we can blame it entirely on the AWS instance, but the pattern in your
tests looks very similar, where the number slowly and steadily keeps going
up. If you do a complete retest but run v18/v19 first and then run master,
maybe we'll see a completely opposite picture?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-22 Thread Pavan Deolasee
On Wed, Mar 22, 2017 at 8:43 AM, Pavan Deolasee <pavan.deola...@gmail.com>
wrote:

>
>
> BTW may I request another test with the attached patch? In this patch, we
> check if the PageIsFull() even before deciding which attributes to check
> for modification. If the page is already full, there is hardly any chance
> of doing a HOT update (there could be a corner case where the new tuple is
> smaller than the tuple used in the previous UPDATE and we have just enough
> space to do a HOT update this time, but I think that's too narrow).
>
>
I would also request you to do a slightly different test where instead of
updating the second column, we update the last column of the index, i.e.
col9. I would really appreciate it if you could share results with both
master and the v19 patch.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Pavan Deolasee
On Wed, Mar 22, 2017 at 3:51 AM, Mithun Cy <mithun...@enterprisedb.com>
wrote:

> On Tue, Mar 21, 2017 at 8:10 PM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > If the WAL writing hides the loss, then I agree that's not a big
> > concern.  But if the loss is still visible even when WAL is written,
> > then I'm not so sure.
>
> The tests table schema was taken from earlier tests what Pavan has posted
> [1], hence it is UNLOGGED all I tried to stress the tests. Instead of
> updating 1 row at a time through pgbench (For which I and Pavan both did
> not see any regression), I tried to update all the rows in the single
> statement.
>

Sorry, I did not mean to suggest that you set it up wrongly; I was just
trying to point out that the test case itself may not be very practical.
But given your recent numbers, the regression is clearly non-trivial and
something we must address.


> I have changed the settings as recommended and did a quick test as above
> in our machine by removing UNLOGGED world in create table statement.
>
> Patch Tested : Only 0001_interesting_attrs_v18.patch in [2]
>
> Response time recorded shows there is a much higher increase in response
> time from 10% to 25% after the patch.
>
>
Thanks for repeating the tests; they are very useful. It might make sense to
reverse the order, or to do 6 tests each and alternate between patched and
unpatched master, just to rule out any other anomaly.

BTW, may I request another test with the attached patch? In this patch, we
check PageIsFull() even before deciding which attributes to check for
modification. If the page is already full, there is hardly any chance of
doing a HOT update (there could be a corner case where the new tuple is
smaller than the tuple used in the previous UPDATE and we have just enough
space to do a HOT update this time, but I think that's too narrow).
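
In outline, the check would sit in heap_update() like this (a minimal
sketch, not the actual patch hunk; the variable names are simplified):

/*
 * Sketch: if the page is already marked full, a HOT (or WARM) update on
 * this page is essentially impossible, so don't bother computing the set
 * of modified indexed attributes at all.
 */
if (!PageIsFull(BufferGetPage(buffer)))
{
    /* Expensive part: compare old/new values of the interesting columns */
    modified_attrs = HeapDetermineModifiedColumns(relation,
                                                  interesting_attrs,
                                                  &oldtup, newtup);
    use_hot_update = !bms_overlap(modified_attrs, hot_attrs);
}
else
{
    /* Page is full: a cold update is inevitable, skip the comparison */
    use_hot_update = false;
}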

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001_interesting_attrs_v19.patch


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Pavan Deolasee
On Tue, Mar 21, 2017 at 10:34 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Mar 21, 2017 at 12:49 PM, Bruce Momjian <br...@momjian.us> wrote:
> > On Tue, Mar 21, 2017 at 09:25:49AM -0400, Robert Haas wrote:
> >> On Tue, Mar 21, 2017 at 8:41 AM, Pavan Deolasee
> >> > TBH I see many artificial scenarios here. It will be very useful if
> he can
> >> > rerun the query with some of these restrictions lifted. I'm all for
> >> > addressing whatever we can, but I am not sure if this test
> demonstrates a
> >> > real world usage.
> >>
> >> That's a very fair point, but if these patches - or some of them - are
> >> going to get committed then these things need to get discussed.  Let's
> >> not just have nothing-nothing-nothing giant unagreed code drop.
> >
> > First, let me say I love this feature for PG 10, along with
> > multi-variate statistics.
> >
> > However, not to be a bummer on this, but the persistent question I have
> > is whether we are locking ourselves into a feature that can only do
> > _one_ index-change per WARM chain before a lazy vacuum is required.  Are
> > we ever going to figure out how to do more changes per WARM chain in the
> > future, and is our use of so many bits for this feature going to
> > restrict our ability to do that in the future.
> >
> > I know we have talked about it, but not recently, and if everyone else
> > is fine with it, I am too, but I have to ask these questions.
>
> I think that's a good question.  I previously expressed similar
> concerns.  On the one hand, it's hard to ignore the fact that, in the
> cases where this wins, it already buys us a lot of performance
> improvement.  On the other hand, as you say (and as I said), it eats
> up a lot of bits, and that limits what we can do in the future.


I think we can save a few bits, at some additional cost and/or complexity.
It all depends on what matters more to us. For example, we can choose not
to use the HEAP_LATEST_TUPLE bit and instead always find the root tuple the
hard way. Since only WARM would ever need that information, maybe that's OK,
as WARM's other advantages would justify the cost. Or we could cache the
information computed during an earlier heap_prune_page call and use that
(just guessing that we can make it work, no concrete idea at this moment).
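
For reference, "the hard way" means walking every chain on the page until we
find the one that reaches our tuple, roughly like the sketch below (redirect
line pointers and the xmin/xmax chain-validity checks are omitted for
brevity):

/*
 * Given a heap page and the offset of a tuple, find the root line pointer
 * of the tuple's HOT chain by walking every chain on the page.
 */
static OffsetNumber
find_root_offset(Page page, OffsetNumber target)
{
    OffsetNumber offnum,
                 maxoff = PageGetMaxOffsetNumber(page);

    for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum++)
    {
        ItemId          lp = PageGetItemId(page, offnum);
        HeapTupleHeader htup;
        OffsetNumber    next;

        if (!ItemIdIsNormal(lp))
            continue;

        htup = (HeapTupleHeader) PageGetItem(page, lp);

        /* heap-only tuples are never chain roots */
        if (HeapTupleHeaderIsHeapOnly(htup))
            continue;

        /* walk the chain rooted at this offset */
        next = offnum;
        while (next != target && HeapTupleHeaderIsHotUpdated(htup))
        {
            next = ItemPointerGetOffsetNumber(&htup->t_ctid);
            lp = PageGetItemId(page, next);
            if (!ItemIdIsNormal(lp))
                break;
            htup = (HeapTupleHeader) PageGetItem(page, lp);
        }
        if (next == target)
            return offnum;      /* found the root */
    }
    return InvalidOffsetNumber;
}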

We can also save the HEAP_WARM_UPDATED flag, since it is required only for
the abort-handling case. We can find a way to push that information down to
the old tuple if the UPDATE aborts and we detect the broken chain. Again,
not fully thought through, but doable. Of course, we will have to carefully
evaluate all code paths and make sure that we never lose that information.

If the consumption of bits becomes a deal breaker, then I would first trade
the HEAP_LATEST_TUPLE bit and then HEAP_WARM_UPDATED, purely from a
correctness perspective.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Pavan Deolasee
On Tue, Mar 21, 2017 at 10:47 PM, Bruce Momjian <br...@momjian.us> wrote:

> On Tue, Mar 21, 2017 at 01:04:14PM -0400, Robert Haas wrote:
> > > I know we have talked about it, but not recently, and if everyone else
> > > is fine with it, I am too, but I have to ask these questions.
> >
> > I think that's a good question.  I previously expressed similar
> > concerns.  On the one hand, it's hard to ignore the fact that, in the
> > cases where this wins, it already buys us a lot of performance
> > improvement.  On the other hand, as you say (and as I said), it eats
> > up a lot of bits, and that limits what we can do in the future.  On
> > the one hand, there is a saying that a bird in the hand is worth two
> > in the bush.  On the other hand, there is also a saying that one
> > should not paint oneself into the corner.
> >
> > I'm not sure we've had any really substantive discussion of these
> > issues.  Pavan's response to my previous comments was basically "well,
> > I think it's worth it", which is entirely reasonable, because he
> > presumably wouldn't have written the patch that way if he thought it
> > sucked.  But it might not be the only opinion.
>
> Early in the discussion we talked about allowing multiple changes per
> WARM chain if they all changed the same index and were in the same
> direction so there were no duplicates, but it was complicated.  There
> was also discussion about checking the index during INSERT/UPDATE to see
> if there was a duplicate.  However, those ideas never led to further
> discussion.
>

Well, once I started thinking about how to do vacuum etc., I realised that
any mechanism which allows unlimited (or even a handful of) updates per
chain is going to be very complex and error prone. But if someone has ideas
for doing that, I am open. I must say though, it will make an already
complex problem even more complex.


>
> I know the current patch yields good results, but only on a narrow test
> case,


Hmm. I am kind of surprised you say that, because I never thought we were
targeting a narrow test case here. But maybe I'm wrong.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Pavan Deolasee
On Tue, Mar 21, 2017 at 6:55 PM, Robert Haas <robertmh...@gmail.com> wrote:

>
> I think that very wide columns and highly indexed tables are not
> particularly unrealistic, nor do I think updating all the rows is
> particularly unrealistic.


OK. But would those who update 10M rows in a single transaction really
notice a 5-10% variation? I think it probably makes sense to run those
updates in smaller transactions and see if the regression is still visible
(otherwise tweaking synchronous_commit is moot anyway).


> Sure, it's not everything, but it's
> something.  Now, I would agree that all of that PLUS unlogged tables
> with fsync=off is not too realistic.  What kind of regression would we
> observe if we eliminated those last two variables?
>

Hard to say. I didn't find any regression on the machines available to me,
even with the original test case that I used, which was a pretty bad case to
start with (sure, Mithun tweaked it further to create an even worse
scenario). Maybe with the kind of machines he has access to, it might show
up even with those changes.


>
>
> I think that whether the code ends up getting contorted is an
> important consideration here.  For example, if the first of the things
> you mention can be done without making the code ugly, then I think
> that would be worth doing; it's likely to help fairly often in
> real-world cases.  The problem with making the code contorted and
> ugly, as you say that the second idea would require, is that it can
> easily mask bugs.


Agree. That's probably one reason why Alvaro wrote the patch to start with.
I'll give the first of those two options a try.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Pavan Deolasee
On Tue, Mar 21, 2017 at 5:34 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Mar 21, 2017 at 6:56 AM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> >> Hmm, that test case isn't all that synthetic.  It's just a single
> >> column bulk update, which isn't anything all that crazy, and 5-10%
> >> isn't nothing.
> >>
> >> I'm kinda surprised it made that much difference, though.
> >>
> >
> > I think it is because heap_getattr() is not that cheap.  We have
> > noticed the similar problem during development of scan key push down
> > work [1].
>
> Yeah.  So what's the deal with this?  Is somebody working on figuring
> out a different approach that would reduce this overhead?  Are we
> going to defer WARM to v11?  Or is the intent to just ignore the 5-10%
> slowdown on a single column update and commit everything anyway?


I think I should clarify something. The test case does a single-column
update, but it also has columns which are very wide, and it has an index on
many columns (and it updates a column early in the list). In addition, in
the test Mithun updated all 10 million rows of the table in a single
transaction, used an UNLOGGED table, and fsync was turned off.

TBH I see many artificial scenarios here. It will be very useful if he can
rerun the query with some of these restrictions lifted. I'm all for
addressing whatever we can, but I am not sure this test demonstrates
real-world usage.

Having said that, maybe we can do a few things to reduce the overhead.

- Check if the page has enough free space to perform a HOT/WARM update. If
not, don't look up all the index keys.
- Pass bitmaps separately for each index and bail out early if we conclude
that neither HOT nor WARM is possible. In this case, since there is just one
index, as soon as we check the second column we know neither HOT nor WARM is
possible, and we will return early. It might complicate the API a lot, but I
can give it a shot if that's what is needed to make progress.

Any other ideas?

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-20 Thread Pavan Deolasee
On Tue, Mar 14, 2017 at 7:17 AM, Alvaro Herrera 
wrote:

> > @@ -234,6 +236,21 @@ index_beginscan(Relation heapRelation,
> >   scan->heapRelation = heapRelation;
> >   scan->xs_snapshot = snapshot;
> >
> > + /*
> > +  * If the index supports recheck, make sure that index tuple is
> saved
> > +  * during index scans.
> > +  *
> > +  * XXX Ideally, we should look at all indexes on the table and
> check if
> > +  * WARM is at all supported on the base table. If WARM is not
> supported
> > +  * then we don't need to do any recheck.
> RelationGetIndexAttrBitmap() does
> > +  * do that and sets rd_supportswarm after looking at all indexes.
> But we
> > +  * don't know if the function was called earlier in the session
> when we're
> > +  * here. We can't call it now because there exists a risk of
> causing
> > +  * deadlock.
> > +  */
> > + if (indexRelation->rd_amroutine->amrecheck)
> > + scan->xs_want_itup = true;
> > +
> >   return scan;
> >  }
>
> I didn't like this comment very much.  But it's not necessary: you have
> already given relcache responsibility for setting rd_supportswarm.  The
> only problem seems to be that you set it in RelationGetIndexAttrBitmap
> instead of RelationGetIndexList, but it's not clear to me why.  I think
> if the latter function is in charge, then we can trust the flag more
> than the current situation.


I looked at this today. AFAICS we don't have access to rd_amroutine in
RelationGetIndexList, since we don't actually call index_open() in that
function. Would it be safe to do that? I'll give it a shot, but I thought
I'd ask here first.

Thanks,
Pavan


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-20 Thread Pavan Deolasee
On Wed, Mar 15, 2017 at 12:46 AM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Pavan Deolasee wrote:
> > On Tue, Mar 14, 2017 at 7:17 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > wrote:
>
> > > I have already commented about the executor involvement in btrecheck();
> > > that doesn't seem good.  I previously suggested to pass the EState down
> > > from caller, but that's not a great idea either since you still need to
> > > do the actual FormIndexDatum.  I now think that a workable option would
> > > be to compute the values/isnulls arrays so that btrecheck gets them
> > > already computed.
> >
> > I agree with your complaint about modularity violation. What I am unclear
> > is how passing values/isnulls array will fix that. The way code is
> > structured currently, recheck routines are called by index_fetch_heap().
> So
> > if we try to compute values/isnulls in that function, we'll still need
> > access EState, which AFAIU will lead to similar violation. Or am I
> > mis-reading your idea?
>
> You're right, it's still a problem.



BTW, I realised that we don't really need those executor bits in the recheck
routines. We don't support WARM when attributes in index expressions are
modified, so we really don't need to do any comparison for those attributes.
I've written a separate form of FormIndexDatum() which only returns the
basic index attributes; comparing them should be enough. I will share a
rebased and updated patch soon.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-20 Thread Pavan Deolasee
On Mon, Mar 20, 2017 at 8:11 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Sun, Mar 19, 2017 at 3:05 AM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> > On Thu, Mar 16, 2017 at 12:53 PM, Robert Haas <robertmh...@gmail.com>
> wrote:
>
> >>
> >> /me scratches head.
> >>
> >> Aren't pre-warm and post-warm just (better) names for blue and red?
> >>
> >
> > Yeah, sounds better.
>
> My point here wasn't really about renaming, although I do think
> renaming is something that should get done.  My point was that you
> were saying we need to mark index pointers as common, pre-warm, and
> post-warm.  But you're pretty much already doing that, I think.  I
> guess you don't have "common", but you do have "pre-warm" and
> "post-warm".
>
>
Ah, I misread that. Strictly speaking, we already have common (blue) and
post-warm (red), and I just finished renaming them to CLEAR (of the WARM
bit) and WARM. Maybe it's still not the best name, but I think it looks
better than before.

But the larger point is that we don't have an easy way to know whether an
index pointer which was inserted with the original heap tuple (i.e. pre-WARM
update) should only return pre-WARM tuples or should also return post-WARM
tuples. Right now we make that decision by looking at the index keys and
discarding any pointer whose index key does not match the one computed from
the heap keys. If we need to change that, then at every WARM update we will
have to go back to the original pointer and change its state to pre-warm.
That looks more invasive and requires additional index management.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-19 Thread Pavan Deolasee
On Thu, Mar 16, 2017 at 12:53 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Wed, Mar 15, 2017 at 3:44 PM, Pavan Deolasee
> <pavan.deola...@gmail.com> wrote:
> > I couldn't find a better way without a lot of complex infrastructure.
> Even
> > though we now have ability to mark index pointers and we know that a
> given
> > pointer either points to the pre-WARM chain or post-WARM chain, this does
> > not solve the case when an index does not receive a new entry. In that
> case,
> > both pre-WARM and post-WARM tuples are reachable via the same old index
> > pointer. The only way we could deal with this is to mark index pointers
> as
> > "common", "pre-warm" and "post-warm". But that would require us to update
> > the old pointer's state from "common" to "pre-warm" for the index whose
> keys
> > are being updated. May be it's doable, but might be more complex than the
> > current approach.
>
> /me scratches head.
>
> Aren't pre-warm and post-warm just (better) names for blue and red?
>
>
Yeah, sounds better. Just to make it clear, the current design sets the
following information:

HEAP_WARM_TUPLE - When a row gets WARM updated, both the old and new
versions of the row are marked with the HEAP_WARM_TUPLE flag. This allows us
to remember that a certain row was WARM-updated, even if the update later
aborts and we clean up the new version and truncate the chain. All
subsequent tuple versions will carry this flag until a non-HOT update
happens, which breaks the HOT chain.

HEAP_WARM_RED - After the first WARM update, the new version of the tuple is
marked with this flag. This flag is also carried forward to all future
HOT-updated tuples. So the only tuple that has HEAP_WARM_TUPLE but not
HEAP_WARM_RED is the old version before the WARM update. Also, all tuples
marked with the HEAP_WARM_RED flag satisfy the HOT property (i.e. all index
key columns share the same value), and all tuples NOT marked with
HEAP_WARM_RED likewise satisfy the HOT property among themselves. I've so
far called them the Red and Blue chains respectively.
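
To make the chain states concrete, here is a sketch of the two flags as
predicates (the bit values are just the two free t_infomask2 bits and are my
illustration, not necessarily the patch's actual assignments):

/* Hypothetical bit assignments: the two free bits in t_infomask2 */
#define HEAP_WARM_TUPLE     0x0800  /* chain was WARM updated at some point */
#define HEAP_WARM_RED       0x1000  /* tuple is in the post-WARM (red) part */

/* A tuple is in the pre-WARM ("blue") part of a WARM chain iff it is
 * flagged as WARM but not red. */
static inline bool
HeapTupleIsPreWarm(HeapTupleHeader tup)
{
    return (tup->t_infomask2 & HEAP_WARM_TUPLE) != 0 &&
           (tup->t_infomask2 & HEAP_WARM_RED) == 0;
}

/* All red tuples also carry HEAP_WARM_TUPLE, so testing the red bit alone
 * is enough for the post-WARM part. */
static inline bool
HeapTupleIsPostWarm(HeapTupleHeader tup)
{
    return (tup->t_infomask2 & HEAP_WARM_RED) != 0;
}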

In addition, in the current patch, the new index pointers resulting from
WARM updates are marked BTREE_INDEX_RED_POINTER/HASH_INDEX_RED_POINTER.

I think per your suggestion we can rename HEAP_WARM_RED to HEAP_WARM_TUPLE,
similarly rename the index pointers to BTREE/HASH_INDEX_WARM_POINTER, and
replace the current HEAP_WARM_TUPLE with something like
HEAP_WARM_UPDATED_TUPLE to signify that this or some previous version of
this chain was once WARM-updated.

Does that sound ok? I can change the patch accordingly.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-15 Thread Pavan Deolasee
On Tue, Mar 14, 2017 at 7:16 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Pavan Deolasee wrote:
> > On Tue, Mar 14, 2017 at 7:17 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > wrote:
>
> > > I have already commented about the executor involvement in btrecheck();
> > > that doesn't seem good.  I previously suggested to pass the EState down
> > > from caller, but that's not a great idea either since you still need to
> > > do the actual FormIndexDatum.  I now think that a workable option would
> > > be to compute the values/isnulls arrays so that btrecheck gets them
> > > already computed.
> >
> > I agree with your complaint about modularity violation. What I am unclear
> > is how passing values/isnulls array will fix that. The way code is
> > structured currently, recheck routines are called by index_fetch_heap().
> So
> > if we try to compute values/isnulls in that function, we'll still need
> > access EState, which AFAIU will lead to similar violation. Or am I
> > mis-reading your idea?
>
> You're right, it's still a problem.  (Honestly, I think the whole idea
> of trying to compute a fake index tuple starting from a just-read heap
> tuple is a problem in itself;


Why do you think so?


> I just wonder if there's a way to do the
> recheck that doesn't involve such a thing.)
>

I couldn't find a better way without a lot of complex infrastructure. Even
though we now have the ability to mark index pointers, and we know that a
given pointer points either to the pre-WARM chain or to the post-WARM chain,
this does not solve the case where an index does not receive a new entry. In
that case, both pre-WARM and post-WARM tuples are reachable via the same old
index pointer. The only way we could deal with this is to mark index
pointers as "common", "pre-warm" and "post-warm". But that would require us
to update the old pointer's state from "common" to "pre-warm" for the index
whose keys are being updated. Maybe it's doable, but it might be more
complex than the current approach.


>
> > I wonder if we should instead invent something similar to IndexRecheck(),
> > but instead of running ExecQual(), this new routine will compare the
> index
> > values by the given HeapTuple against given IndexTuple. ISTM that for
> this
> > to work we'll need to modify all callers of index_getnext() and teach
> them
> > to invoke the AM specific recheck method if xs_tuple_recheck flag is set
> to
> > true by index_getnext().
>
> Yeah, grumble, that idea does sound intrusive, but perhaps it's
> workable.  What about bitmap indexscans?  AFAICS we already have a
> recheck there natively, so we only need to mark the page as lossy, which
> we're already doing anyway.
>

Yeah, bitmap index scans should be OK. We need the recheck logic only to
avoid returning duplicates, and since a TID can only occur once in the
bitmap, there is no risk of duplicate results.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-14 Thread Pavan Deolasee
On Tue, Mar 14, 2017 at 7:19 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Pavan Deolasee wrote:
>
> > BTW I wanted to share some more numbers from a recent performance test. I
> > thought it's important because the latest patch has fully functional
> chain
> > conversion code as well as all WAL-logging related pieces are in place
> > too. I ran these tests on a box borrowed from Tomas (thanks!).  This has
> > 64GB RAM and 350GB SSD with 1GB on-board RAM. I used the same test setup
> > that I used for the first test results reported on this thread i.e. a
> > modified pgbench_accounts table with additional columns and additional
> > indexes (one index on abalance so that every UPDATE is a potential WARM
> > update).
> >
> > In a test where table + indexes exceeds RAM, running for 8hrs and
> > auto-vacuum parameters set such that we get 2-3 autovacuums on the table
> > during the test, we see WARM delivering more than 100% TPS as compared to
> > master. In this graph, I've plotted a moving average of TPS and the
> spikes
> > that we see coincides with the checkpoints (checkpoint_timeout is set to
> > 20mins and max_wal_size large enough to avoid any xlog-based
> checkpoints).
> > The spikes are more prominent on WARM but I guess that's purely because
> it
> > delivers much higher TPS. I haven't shown here but I see WARM updates
> close
> > to 65-70% of the total updates. Also there is significant reduction in
> WAL
> > generated per txn.
>
> Impressive results.  Labels on axes would improve readability of the chart
> :-)
>
>
Sorry about that. I was desperately searching for the Undo button after
hitting "send" for the very same reason :-) It looks like it's been a few
years since I last used gnuplot.

Just to make it clear, the X-axis is the duration of the test in seconds and
the Y-axis is a 450s moving average of TPS. BTW, 450 is no magic figure: I
collected stats every 15s and took a moving average of the last 30 samples.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-14 Thread Pavan Deolasee
On Tue, Mar 14, 2017 at 7:17 AM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> > @@ -234,6 +236,21 @@ index_beginscan(Relation heapRelation,
> >   scan->heapRelation = heapRelation;
> >   scan->xs_snapshot = snapshot;
> >
> > + /*
> > +  * If the index supports recheck, make sure that index tuple is
> saved
> > +  * during index scans.
> > +  *
> > +  * XXX Ideally, we should look at all indexes on the table and
> check if
> > +  * WARM is at all supported on the base table. If WARM is not
> supported
> > +  * then we don't need to do any recheck.
> RelationGetIndexAttrBitmap() does
> > +  * do that and sets rd_supportswarm after looking at all indexes.
> But we
> > +  * don't know if the function was called earlier in the session
> when we're
> > +  * here. We can't call it now because there exists a risk of
> causing
> > +  * deadlock.
> > +  */
> > + if (indexRelation->rd_amroutine->amrecheck)
> > + scan->xs_want_itup = true;
> > +
> >   return scan;
> >  }
>
> I didn't like this comment very much.  But it's not necessary: you have
> already given relcache responsibility for setting rd_supportswarm.  The
> only problem seems to be that you set it in RelationGetIndexAttrBitmap
> instead of RelationGetIndexList, but it's not clear to me why.


Hmm. I think you're right. Will fix that way and test.


>
> I noticed that nbtinsert.c and nbtree.c have a bunch of new includes
> that they don't actually need.  Let's remove those.  nbtutils.c does
> need them because of btrecheck().


Right. It's probably a leftover from the way I wrote the first version.
Will fix.

> Speaking of which:
>
> I have already commented about the executor involvement in btrecheck();
> that doesn't seem good.  I previously suggested to pass the EState down
> from caller, but that's not a great idea either since you still need to
> do the actual FormIndexDatum.  I now think that a workable option would
> be to compute the values/isnulls arrays so that btrecheck gets them
> already computed.


I agree with your complaint about the modularity violation. What I am
unclear about is how passing the values/isnulls arrays will fix that. The
way the code is structured currently, the recheck routines are called by
index_fetch_heap(). So if we try to compute values/isnulls in that function,
we'll still need access to the EState, which AFAIU will lead to a similar
violation. Or am I misreading your idea?

I wonder if we should instead invent something similar to IndexRecheck(),
but instead of running ExecQual(), this new routine would compare the index
values computed from the given HeapTuple against the given IndexTuple. ISTM
that for this to work we'll need to modify all callers of index_getnext()
and teach them to invoke the AM-specific recheck method if the
xs_tuple_recheck flag is set to true by index_getnext().
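
For illustration, the core of such an AM-specific recheck could look like
the sketch below (hypothetical name and signature; it assumes values/isnull
were already computed from the heap tuple, and it uses plain binary
equality, which glosses over detoasting subtleties):

/*
 * Return true if the index tuple still describes the heap tuple whose
 * index attribute values/isnull are passed in (one entry per index column).
 */
static bool
recheck_index_keys(Relation indexRel, IndexTuple itup,
                   Datum *values, bool *isnull)
{
    TupleDesc   itupdesc = RelationGetDescr(indexRel);
    int         natts = RelationGetNumberOfAttributes(indexRel);
    int         attno;

    for (attno = 1; attno <= natts; attno++)
    {
        Form_pg_attribute att = TupleDescAttr(itupdesc, attno - 1);
        bool        itup_isnull;
        Datum       itup_value;

        itup_value = index_getattr(itup, attno, itupdesc, &itup_isnull);

        if (itup_isnull != isnull[attno - 1])
            return false;       /* NULL-ness differs: stale pointer */

        if (!itup_isnull &&
            !datumIsEqual(itup_value, values[attno - 1],
                          att->attbyval, att->attlen))
            return false;       /* key value differs: skip this pointer */
    }

    return true;                /* index tuple matches the heap tuple */
}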

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

2017-03-07 Thread Pavan Deolasee
On Wed, Mar 8, 2017 at 8:08 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Mar 7, 2017 at 9:12 PM, Andres Freund <and...@anarazel.de> wrote:
>
> >
> > I wonder however, if careful snapshot managment couldn't solve this as
> > well.  I have *not* thought a lot about this, but afaics we can easily
> > prevent all-visible from being set in cases it'd bother us by having an
> > "appropriate" xmin / registered snapshot.
>
> Yeah, but that's a tax on the whole system.
>
>
Maybe we can do that only when a patched CIC (or any other operation which
may not like concurrent VM bit setting) is in progress. We could use what
Stephen suggested upthread to discover that state. But right now it's hard
to reason about this, because there is nothing on either side, so we don't
know what gets impacted by aggressively setting the VM bit, and how.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

2017-03-07 Thread Pavan Deolasee
On Wed, Mar 8, 2017 at 7:33 AM, Robert Haas  wrote:

> On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost  wrote:
> > Right, that's what I thought he was getting at and my general thinking
> > was that we would need a way to discover if a CIC is ongoing on the
> > relation and therefore heap_page_prune(), or anything else, would know
> > that it can't twiddle the bits in the VM due to the ongoing CIC.
> > Perhaps a lock isn't the right answer there, but it would have to be
> > some kind of cross-process communication that operates at a relation
> > level..
>
> Well, I guess that's one option.  I lean toward the position already
> taken by Andres and Peter, namely, that it's probably not a great idea
> to pursue this optimization.


Fair point. I'm not going to "persist" with the idea for too long. It seemed
like a good, low-risk feature to me which can benefit certain use cases
quite reasonably. It's not uncommon to create indexes (or reindex existing
indexes to remove index bloat) on extremely large tables, and avoiding a
second heap scan can hugely benefit such cases. Holding up the patch for
something for which we don't even have a proposal yet seemed a bit strange
at first, but I see the point.

Anyway, for a recently vacuumed table of twice the size of RAM, on a machine
with SSDs, the patched CIC version runs about 25% faster. That's probably
the best-case scenario.

I also realised that I'd attached a slightly older version of the patch, so
the new version is attached. But I'll remove the patch from the CF if I
don't see any more votes in its favour.

Thanks,
Pavan


cic_skip_all_visible_v4.patch


Re: [HACKERS] Cleanup: avoid direct use of ip_posid/ip_blkid

2017-03-05 Thread Pavan Deolasee
On Thu, Mar 2, 2017 at 9:55 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2/22/17 08:38, Pavan Deolasee wrote:
> > One reason why these macros are not always used is because they
> > typically do assert-validation to ensure ip_posid has a valid value.
> > There are a few places in code, especially in GIN and also when we are
> > dealing with user-supplied TIDs when we might get a TID with invalid
> > ip_posid. I've handled that by defining and using separate macros which
> > skip the validation. This doesn't seem any worse than what we are
> > already doing.
>
> I wonder why we allow that.  Shouldn't the tid type reject input that
> has ip_posid == 0?
>

Yes, I think it seems sensible to disallow InvalidOffsetNumber (or anything
greater than MaxOffsetNumber) in user-supplied values. But there are places
in GIN, and with INSERT ON CONFLICT, where we seem to use special values in
ip_posid to mean different things. So we might still need some way to accept
invalid values there.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Skip all-visible pages during second HeapScan of CIC

2017-02-28 Thread Pavan Deolasee
Hello All,

During the second heap scan of CREATE INDEX CONCURRENTLY, we're only
interested in the tuples which were inserted after the first scan started.
All such tuples can only exist in pages which have their VM bit unset. So I
propose the attached patch, which consults the VM during the second scan and
skips all-visible pages. We do the same trick of skipping pages only if a
certain threshold of pages can be skipped, to ensure the OS's read-ahead is
not disturbed.

The patch obviously shows a significant reduction in the time for building
an index concurrently on very large tables which are not being updated
frequently and which were vacuumed recently (so that the VM bits are set). I
can post performance numbers if there is interest. For tables that are being
updated heavily, the threshold-based skipping was indeed useful; without it
we saw a slight regression.

Since VM bits are only set during VACUUM, which conflicts with CIC on the
relation lock, I don't see any risk of incorrectly skipping pages that the
second scan should have scanned.
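
In outline, the second-scan loop with the threshold trick looks something
like this (a simplified sketch, not the patch itself; SKIP_PAGES_THRESHOLD
mirrors the constant vacuumlazy.c uses for the same read-ahead reason):

#define SKIP_PAGES_THRESHOLD 32     /* same rationale as in vacuumlazy.c */

BlockNumber blkno = 0;
BlockNumber next_unskippable = 0;
Buffer      vmbuffer = InvalidBuffer;

while (blkno < nblocks)
{
    if (blkno == next_unskippable)
    {
        /* measure the run of all-visible pages starting here */
        while (next_unskippable < nblocks &&
               (visibilitymap_get_status(heapRel, next_unskippable,
                                         &vmbuffer) &
                VISIBILITYMAP_ALL_VISIBLE))
            next_unskippable++;

        /*
         * Skip the run only if it is long enough that giving up the OS
         * read-ahead is worth it; tuples inserted after the first scan
         * cannot live on all-visible pages anyway.
         */
        if (next_unskippable - blkno >= SKIP_PAGES_THRESHOLD)
        {
            blkno = next_unskippable;
            continue;
        }
    }

    /* ... scan page 'blkno' for tuples missed by the first scan ... */
    blkno++;
}

if (BufferIsValid(vmbuffer))
    ReleaseBuffer(vmbuffer);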

Comments?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


cic_skip_all_visible_v3.patch


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-24 Thread Pavan Deolasee
On Fri, Feb 24, 2017 at 9:47 PM, Robert Haas <robertmh...@gmail.com> wrote:

>
> And there are no bugs, right?  :-)


Yeah, yeah, absolutely nothing. Just like any other feature committed to
Postgres so far ;-)

I need to polish the chain conversion patch a bit and also add the missing
support for redo, hash indexes etc. Support for hash indexes will need
overloading of the ip_posid bits in the index tuple (since there are no free
bits left in hash tuples). I plan to work on that next and submit a fully
functional patch, hopefully before the commitfest starts.

(I have mentioned the idea of overloading ip_posid bits a few times now and
haven't heard any objection so far. Well, that could either mean that nobody
has read those emails seriously or that there is general acceptance of the
idea... I am assuming the latter :-))

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-24 Thread Pavan Deolasee
On Fri, Feb 24, 2017 at 3:42 PM, Robert Haas <robertmh...@gmail.com> wrote:

>
>
> Wow, OK.  In my view, that makes the chain conversion code pretty much
> essential, because if you had WARM without chain conversion then the
> visibility map gets more or less irrevocably less effective over time,
> which sounds terrible.


Yes. I decided to complete the chain conversion patch when I realised that
index-only scans would otherwise become completely useless if a large
percentage of rows are updated just once. So I agree: it's not an optional
patch and should go in with the main WARM patch.


> But it sounds to me like even with the chain
> conversion, it might take multiple vacuum passes before all visibility
> map bits are set, which isn't such a great property (thus e.g.
> fdf9e21196a6f58c6021c967dc5776a16190f295).
>
>
The chain conversion algorithm first converts the chains during vacuum and
then checks if the page can be set all-visible, so I'm not sure why it would
take multiple vacuums before a page is set all-visible. The commit you quote
was written to ensure that we make another attempt to set the page
all-visible after all dead tuples are removed from the page. Similarly, we
will convert all WARM chains to HOT chains and then check the page for
all-visibility.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-24 Thread Pavan Deolasee
On Fri, Feb 24, 2017 at 3:23 PM, Robert Haas <robertmh...@gmail.com> wrote:

>
> I don't immediately see how this will work with index-only scans.  If
> the tuple is HOT updated several times, HOT-pruned back to a single
> version, and then the page is all-visible, the index entries are
> guaranteed to agree with the remaining tuple, so it's fine to believe
> the data in the index tuple.  But with WARM, that would no longer be
> true, unless you have some trick for that...
>
>
Well, the trick is to not allow index-only scans on such pages, by not
marking them all-visible. That's why, when a tuple is WARM updated, we carry
that information into the subsequent versions even when the later updates
are HOT updates. The chain conversion algorithm will handle this by clearing
those bits and thus allowing index-only scans again.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-24 Thread Pavan Deolasee
On Fri, Feb 24, 2017 at 12:28 AM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Bruce Momjian wrote:
> > On Thu, Feb 23, 2017 at 03:45:24PM -0300, Alvaro Herrera wrote:
>
> > > > and potentially trim the first HOT chain as those tuples become
> > > > invisible.
> > >
> > > That can already happen even without WARM, no?
> >
> > Uh, the point is that with WARM those four early tuples can be removed
> > via a prune, rather than requiring a VACUUM. Without WARM, the fourth
> > tuple can't be removed until the index is cleared by VACUUM.
>
> I *think* that the WARM-updated one cannot be pruned either, because
> it's pointed to by at least one index (otherwise it'd have been a HOT
> update).  The ones prior to that can be removed either way.
>
>
No, even the WARM-updated tuple can be pruned, and if there are further HOT
updates, those can be pruned too. All indexes, and even multiple pointers
from the same index, always point to the root of the WARM chain, and that
line pointer does not go away unless the entire chain becomes dead. The only
material difference between HOT and WARM is that, since there are two index
pointers from the same index to the same root line pointer, we must do a
recheck. But HOT-pruning and all such things remain the same.

Let's take an example. Say we have a table (a int, b int, c text) and two
indexes on the first two columns.

                HOT                 WARM                HOT
(1, 100, 'foo') --> (1, 100, 'bar') --> (1, 200, 'bar') --> (1, 200, 'foo')

The first update will be a HOT update, the second update will be a WARM
update, and the third update will again be a HOT update. The first and third
updates do not create any new index entries, though the second update will
create a new index entry in the second index. Any further WARM updates to
this chain are not allowed, but further HOT updates are OK.

If all but the last version become DEAD, HOT-prune will remove all of them
and turn the first line pointer into a REDIRECT line pointer. At this point,
the first index has one index pointer and the second index has two index
pointers, but all point to the same root line pointer, which has now become
a REDIRECT line pointer.

  Redirect
o --------> (1, 200, 'foo')

> I think the part you want (be able to prune the WARM updated tuple) is
> part of what Pavan calls "turning the WARM chain into a HOT chain", so
> not part of the initial patch.  Pavan can explain this part better, and
> also set me straight in case I'm wrong in the above :-)
>
>
Umm... it's a bit different. Without chain conversion, we still don't allow
further WARM updates to the above chain, because that might create a third
index pointer and our recheck logic can't cope with duplicate scans. HOT
updates are allowed though.

The latest patch that I proposed will handle this case and convert such
chains into regular HOT-pruned chains. To do that, we must remove the
duplicate (and now wrong) index pointer to the chain. Once we do that and
change the state on the heap tuple, we can once again do a WARM update to
this tuple. Note that in this example the chain has just one tuple, which
will typically be the case, but the algorithm can deal with the case where
there are multiple tuples with matching index keys.

Hope this helps.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-24 Thread Pavan Deolasee
On Fri, Feb 24, 2017 at 2:13 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Feb 23, 2017 at 9:21 PM, Bruce Momjian <br...@momjian.us> wrote:
> >
>
> > I have what might be a supid question.  As I remember, WARM only allows
> > a single index-column change in the chain.  Why are you seeing such a
> > large performance improvement?  I would have thought it would be that
> > high if we allowed an unlimited number of index changes in the chain.
>
> I'm not sure how the test case is set up.  If the table has multiple
> indexes, each on a different column, and only one of the indexes is
> updated, then you figure to win because now the other indexes need
> less maintenance (and get less bloated).  If you have only a single
> index, then I don't see how WARM can be any better than HOT, but maybe
> I just don't understand the situation.
>
>
That's correct. If you have just one index and the UPDATE modifies the
indexed column, the UPDATE won't be a WARM update and the patch gives you no
benefit. OTOH, if the UPDATE doesn't modify any indexed columns, then it
will be a HOT update and again the patch gives you no benefit. It might be
worthwhile to see if the patch causes any regression in these scenarios,
though I think it will be minimal or zero.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-24 Thread Pavan Deolasee
On Thu, Feb 23, 2017 at 11:53 PM, Bruce Momjian <br...@momjian.us> wrote:

> On Thu, Feb 23, 2017 at 03:03:39PM -0300, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> >
> > > As I remember, WARM only allows
> > > a single index-column change in the chain.  Why are you seeing such a
> > > large performance improvement?  I would have thought it would be that
> > > high if we allowed an unlimited number of index changes in the chain.
> >
> > The second update in a chain creates another non-warm-updated tuple, so
> > the third update can be a warm update again, and so on.
>
> Right, before this patch they would be two independent HOT chains.  It
> still seems like an unexpectedly-high performance win.  Are two
> independent HOT chains that much more expensive than joining them via
> WARM?


In these tests there are zero HOT updates, since every update modifies some
index column. With WARM, we could cut regular updates in half, even when we
allow only one WARM update per chain (the chain really has a single tuple
for this discussion). IOW, approximately half the updates insert a new index
entry in *every* index and half insert a new index entry *only* in the
affected index. That by itself does a good bit for performance.

So to answer your question: yes, joining two HOT chains via WARM is much
cheaper, because it results in creating new index entries just for the
affected indexes.
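
To put rough numbers on it (an illustration of mine, not figures from the
benchmark): with five indexes, and every update modifying one indexed
column, a regular update inserts five new index entries while a WARM update
inserts just one. If half the updates are WARM, the average falls from 5 to
(5 + 1) / 2 = 3 index entries per update, a 40% cut in index maintenance,
before even counting the reduced index bloat and WAL volume.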

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-24 Thread Pavan Deolasee
On Thu, Feb 23, 2017 at 11:30 PM, Bruce Momjian <br...@momjian.us> wrote:

> On Wed, Feb  1, 2017 at 10:46:45AM +0530, Pavan Deolasee wrote:
> > > contains a WARM tuple. Alternate ideas/suggestions and review of
> the
> > design
> > > are welcome!
> >
> > t_infomask2 contains one last unused bit,
> >
> >
> > Umm, WARM is using 2 unused bits from t_infomask2. You mean there is
> another
> > free bit after that too?
>
> We are obviously going to use several heap or item pointer bits for
> WARM, and once we do that it is going to be hard to undo that.  Pavan,
> are you saying you could do more with WARM if you had more bits?  Are we
> sure we have given you all the bits we can?  Do we want to commit to a
> lesser feature because the bits are not available?
>
>
The btree implementation is as complete as I would like (there are a few
TODOs, but no show-stoppers), at least for the first release. There is a
free bit in the btree index tuple header that I could use for chain
conversion. In the heap tuples, I can reuse HEAP_MOVED_OFF, because that bit
will only be set along with the HEAP_WARM_TUPLE bit, and since no upgraded
cluster can have the HEAP_WARM_TUPLE bit set, I think we are safe.

WARM currently also supports hash indexes, but there is no free bit left in
the hash index tuple header. I think I can work around that by using a bit
from ip_posid (not yet implemented/tested, but it seems doable).

IMHO if we can do that, i.e. support btree and hash indexes to start with,
we should be good to go for the first release. We can try to support other
popular index AMs in a subsequent release.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Cleanup: avoid direct use of ip_posid/ip_blkid

2017-02-22 Thread Pavan Deolasee
Hello All,

I would like to propose the attached patch, which removes all direct
references to the ip_posid and ip_blkid members of the ItemPointerData
struct and instead uses the ItemPointerGetOffsetNumber and
ItemPointerGetBlockNumber macros respectively to access these members.

My motivation for doing this is to try to free up some bits from the
ip_posid field, given that its valid values fit in at most 13 bits. But even
without that, it seems like a good cleanup to me.

One reason why these macros are not always used is that they typically do
assert validation to ensure ip_posid has a valid value. There are a few
places in the code, especially in GIN and when we are dealing with
user-supplied TIDs, where we might get a TID with an invalid ip_posid. I've
handled that by defining and using separate macros which skip the
validation. This doesn't seem any worse than what we are already doing.
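
For example, a typical call site changes like this (the NoCheck variant at
the end is one of the new validation-free macros; treat the exact names as
illustrative):

/* Before: direct member access scattered around the code */
OffsetNumber off = tid->ip_posid;
BlockNumber  blk = BlockIdGetBlockNumber(&tid->ip_blkid);

/* After: always go through the accessor macros */
OffsetNumber off = ItemPointerGetOffsetNumber(tid);
BlockNumber  blk = ItemPointerGetBlockNumber(tid);

/*
 * For the GIN / user-supplied-TID cases where ip_posid may legitimately be
 * invalid, the patch adds validation-free variants along these lines:
 */
#define ItemPointerGetOffsetNumberNoCheck(pointer) \
    ((pointer)->ip_posid)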

make check-world with --enable-tap-tests passes.

Comments/suggestions?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_ip_posid_blkid_ref_v3.patch


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-21 Thread Pavan Deolasee
Hi Tom,

On Wed, Feb 1, 2017 at 3:51 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

>
> (I'm a little more concerned by Alvaro's apparent position that WARM
> is a done deal; I didn't think so.


Are there any specific aspects of the design that you're not comfortable
with? I'm sure there are some rough edges in the implementation, which I
hope will get handled during the further review process. But if there are
obvious things I'm overlooking, please let me know.

Probably the same question goes to Andres/Robert, who have flagged concerns.
On my side, I've run some very long tests with data validation and haven't
found any new issues with the most recent patches.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-19 Thread Pavan Deolasee
On Sun, Feb 19, 2017 at 3:43 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Feb 17, 2017 at 11:15 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> > Ah, nah, scratch that.  If any post-index-build pruning had occurred on a
> > page, the evidence would be gone --- the non-matching older tuples would
> > be removed and what remained would look consistent.  But it wouldn't
> > match the index.  You might be able to find problems if you were willing
> > to do the expensive check on *every* HOT chain, but that seems none too
> > practical.
>
> If the goal is just to detect tuples that aren't indexed and should
> be, what about scanning each index and building a TIDBitmap of all of
> the index entries, and then scanning the heap for non-HOT tuples and
> probing the TIDBitmap for each one?  If you find a heap entry for
> which you didn't find an index entry, go and recheck the index and see
> if one got added concurrently.  If not, whine.
>
>
This particular case of corruption results in a heap tuple getting indexed
under a wrong key (or, to be precise, under its old value). So the only way
to detect the corruption is to look at each index key and check whether it
matches the corresponding heap tuple. We could write some kind of self-join
that uses a sequential scan and an index-only scan (David Rowley had
suggested something of that sort internally), but we can't guarantee an
index-only scan on a table which is being concurrently updated. So I'm not
sure even that would catch every possible case.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [COMMITTERS] pgsql: Release note updates.

2017-02-07 Thread Pavan Deolasee
On Tue, Feb 7, 2017 at 9:29 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Alvaro Herrera wrote:
>
> >
> > Hmm.  Now that I think about it, it is probably possible to have a
> > transaction started before CIC that inserted a bunch of rows, and then
> > runs the UPDATE during the CIC race window.  Maybe there's a reason the
> > bug wouldn't hit in that case but I don't see it, and I'm not able to
> > test it right now to verify.
>
> Pavan adds that it's possible to have a transaction do INSERT while CIC
> is running, then some other transaction executes the UPDATE.
>
>
Just to elaborate on that: once a backend ends up with stale cached bitmaps,
AFAICS only a second relcache flush can correct that, and this may not come
for a long time. In fact, since CIC is already holding a lock on the table,
I think the second relcache flush will only happen at the end of phase 2.
This can take a long time, especially for very large tables. In the
meanwhile, the compromised backend can run many transactions with the stale
information. Those transactions can see not only the existing rows, but also
new rows inserted by other new, now committed-good transactions.

It's all theory and I haven't had time to try this out.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [COMMITTERS] pgsql: Release note updates.

2017-02-07 Thread Pavan Deolasee
On Tue, Feb 7, 2017 at 5:23 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Tom Lane wrote:
> > Release note updates.
> >
> > Add item for last-minute CREATE INDEX CONCURRENTLY fix.
>
> Hi,
>
> Sorry for not noticing earlier, but there is a bug in the notes:
>
> +  If CREATE INDEX CONCURRENTLY was used to build an index
> +  that depends on a column not previously indexed, then rows inserted
> +  or updated by transactions that ran concurrently with
> +  the CREATE INDEX command could have received incorrect
> +  index entries.
>
> CIC bug does not affect inserted rows, only updated rows, since the
> bogus bitmap is only used to compute whether to omit index tuples for
> HOT considerations.
>

That's correct.


>
> Also, the bollixed rows do not receive incorrect index entries -- they
> just do not receive any index entry at all.
>
>
I think it's somewhat both. While it's true that the updated rows may not
get the new index entries they deserve, they remain reachable from the older
index entries. So while a query such as "SELECT * FROM tab WHERE key =
newval" may not return any result, "SELECT * FROM tab WHERE key = oldval"
may actually return the updated (and wrong) tuple.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-06 Thread Pavan Deolasee
On Mon, Feb 6, 2017 at 11:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> After some discussion among the release team, we've concluded that the
> best thing to do is to push Pavan's/my patch into today's releases.
> This does not close the matter by any means: we should continue to
> study whether there are related bugs or whether there's a more principled
> way of fixing this bug.  But that patch clearly makes things better,
> and we shouldn't let worries about whether there are more bugs stop
> us from providing some kind of fix to users.
>
> I've made the push, and barring negative reports from the buildfarm,
> it will be in today's releases.
>

Thank you for taking care of it. The buildfarm is looking green so far.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-05 Thread Pavan Deolasee
On Mon, Feb 6, 2017 at 8:01 AM, Pavan Deolasee <pavan.deola...@gmail.com>
wrote:

>
>>
> I like this approach. I'll run the patch on a build with
> CACHE_CLOBBER_ALWAYS, but I'm pretty sure it will be ok.
>

While it looks certain that the fix will miss this point release, FWIW I ran
this patch with CLOBBER_CACHE_ALWAYS and it seems to be working as expected.
It's hard to run all the tests, but a small subset of regression and
test_decoding seems OK.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-05 Thread Pavan Deolasee
On Mon, Feb 6, 2017 at 9:41 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:

>
>
> Hmm.  Consider that the first time relcahe invalidation occurs while
> computing id_attrs, so now the retry logic will compute the correct
> set of attrs (considering two indexes, if we take the example given by
> Alvaro above.).


I don't quite get that. Since rd_idattr must already be cached at that
point, and we don't expect to process a relcache flush between successive
calls to RelationGetIndexAttrBitmap(), we should return a consistent copy of
rd_idattr. But maybe I am missing something.

Thanks,
Pavan


-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-05 Thread Pavan Deolasee
On Mon, Feb 6, 2017 at 5:44 AM, Andres Freund <and...@anarazel.de> wrote:

> On 2017-02-05 12:51:09 -0500, Tom Lane wrote:
> > Michael Paquier <michael.paqu...@gmail.com> writes:
> > > On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
> > >> I agree with Pavan - a release with known important bug is not good
> idea.
> >
> > > This bug has been around for some time, so I would recommend taking
> > > the time necessary to make the best fix possible, even if it means
> > > waiting for the next round of minor releases.
> >
> > I think the way to think about this sort of thing is, if we had found
> > this bug when a release wasn't imminent, would we consider it bad enough
> > to justify an unscheduled release cycle?  I have to think the answer for
> > this one is "probably not".
>
> +1.  I don't think we serve our users by putting out a nontrivial bugfix
> hastily. Nor do I think we serve them in this instance by delaying the
> release to cover this fix; there's enough other fixes in the release
> imo.
>
>
I'm a bit surprised by this position. What do we tell our users now that we
know this bug exists? They can't fully trust CIC, and they don't have any
mechanism to confirm whether a newly built index is corruption-free or not.
I'm not suggesting that we should hastily refactor the code, but at the very
least some sort of band-aid fix which helps the situation, yet has minimal
side effects, is warranted. I believe the proposed retry patch does exactly
that.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-05 Thread Pavan Deolasee
On Mon, Feb 6, 2017 at 8:15 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Mon, Feb 6, 2017 at 8:01 AM, Pavan Deolasee <pavan.deola...@gmail.com>
> wrote:
> >
> >
> > On Mon, Feb 6, 2017 at 1:44 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >>
> >>
> >>
> >> > 2. In the second patch, we tried to recompute attribute lists if a
> >> > relcache
> >> > flush happens in between and index list is invalidated. We've seen
> >> > problems
> >> > with that, especially it getting into an infinite loop with
> >> > CLOBBER_CACHE_ALWAYS. Not clear why it happens, but apparently
> relcache
> >> > flushes keep happening.
> >>
> >> Well, yeah: the point of CLOBBER_CACHE_ALWAYS is to make a relcache
> flush
> >> happen whenever it possibly could.
> >
> >
> > Ah, ok. That explains why the retry logic as originally proposed was
> causing
> > infinite loop.
> >
> >>
> >> The way to deal with that without
> >> looping is to test whether the index set *actually* changed, not whether
> >> it just might have changed.
> >
> >
> > I like this approach. I'll run the patch on a build with
> > CLOBBER_CACHE_ALWAYS, but I'm pretty sure it will be ok. I also like the
> > fact that retry logic is now limited to only when index set changes,
> which
> > fits in the situation we're dealing with.
> >
>
> Don't you think it also has the same problem as mentioned by me in
> Alvaro's patch and you also agreed on it?


No, I don't think so. There can't be any more relcache flushes occurring
between the first RelationGetIndexAttrBitmap() call and the subsequent
RelationGetIndexAttrBitmap() calls, so we should return a consistent copy.
The problem with the earlier proposed approach was that we were not caching,
yet returning stale information. That implied the next call would again
recompute a possibly different value. The retry logic is trying to close a
small yet important race condition where index set invalidation happens but
we cache stale information.


> Do you see any need to fix
> it locally in
> RelationGetIndexAttrBitmap(), why can't we clear at transaction end?
>
>
We're trying to fix it in RelationGetIndexAttrBitmap() because that's where
the race exists. I'm not saying there aren't other and better ways of
handling it, but I don't know yet (I haven't studied Andres's proposal).

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-05 Thread Pavan Deolasee
On Mon, Feb 6, 2017 at 5:41 AM, Peter Geoghegan <p...@bowt.ie> wrote:

> On Sun, Feb 5, 2017 at 4:09 PM, Robert Haas <robertmh...@gmail.com> wrote:
> > I don't think this kind of black-and-white thinking is very helpful.
> > Obviously, data corruption is bad.  However, this bug has (from what
> > one can tell from this thread) been with us for over a decade; it must
> > necessarily be either low-probability or low-severity, or somebody
> > would've found it and fixed it before now.  Indeed, the discovery of
> > this bug was driven by new feature development, not a user report.  It
> > seems pretty clear that if we try to patch this and get it wrong, the
> > effects of our mistake could easily be a lot more serious than the
> > original bug.
>
> +1. The fact that it wasn't driven by a user report convinces me that
> this is the way to go.
>
>
I'm not sure that just because the bug wasn't reported by a user, it is
any less critical. As Tomas pointed out down-thread, the nature of the bug is
such that the users may not discover it very easily, but that doesn't mean
it couldn't be affecting them all the time. We can now correlate many past
reports of index corruption to this bug, but we don't have evidence to
prove that. Lack of any good tool or built-in checks probably makes it even
harder.

The reason I discovered this with WARM is that it has built-in recheck
logic, which was discarding some tuples returned by the index scan. It took
me lots of code review and inspection to finally conclude that this must be
an existing bug. Even then, for lack of any detection utility, I could not
detect this easily on master. That doesn't mean I wasn't able to reproduce
it, but detection was a problem.

If you look at the reproduction setup, one in every 10, if not 5, CREATE
INDEX CONCURRENTLY ends up creating a corrupt index. That's a good 10%
probability. And this is on a low end laptop, with just 5-10 concurrent
clients running. Probability of hitting the problem will be much higher on
a bigger machine, with many users (on a decent AWS machine, I would find
every third CIC to be corrupt). Moreover, the setup is not doing anything
extraordinary: just concurrent updates which flip between HOT and non-HOT,
and a CIC.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-05 Thread Pavan Deolasee
On Mon, Feb 6, 2017 at 1:44 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

>
>
> > 2. In the second patch, we tried to recompute attribute lists if a
> relcache
> > flush happens in between and index list is invalidated. We've seen
> problems
> > with that, especially it getting into an infinite loop with
> > CACHE_CLOBBER_ALWAYS. Not clear why it happens, but apparently relcache
> > flushes keep happening.
>
> Well, yeah: the point of CLOBBER_CACHE_ALWAYS is to make a relcache flush
> happen whenever it possibly could.


Ah, ok. That explains why the retry logic as originally proposed was
causing infinite loop.


> The way to deal with that without
> looping is to test whether the index set *actually* changed, not whether
> it just might have changed.
>

I like this approach. I'll run the patch on a build with
CLOBBER_CACHE_ALWAYS, but I'm pretty sure it will be ok. I also like the
fact that the retry logic is now limited to only when the index set changes,
which fits the situation we're dealing with.
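
To make the idea concrete, here is a minimal sketch of that retry shape
inside RelationGetIndexAttrBitmap() (an illustration of the approach only,
not the actual patch; compute_attr_bitmap() is a hypothetical helper
standing in for the existing loop that opens each index and collects its
columns):

Bitmapset *
RelationGetIndexAttrBitmap_sketch(Relation relation)
{
	Bitmapset  *indexattrs;
	List	   *indexoidlist;
	List	   *newindexoidlist;

restart:
	indexoidlist = RelationGetIndexList(relation);

	/* Hypothetical helper: opens each index and collects its columns. */
	indexattrs = compute_attr_bitmap(relation, indexoidlist);

	/*
	 * A relcache flush inside index_open() may have changed the set of
	 * indexes.  Retry only if the set really did change, so this loop
	 * terminates even under CLOBBER_CACHE_ALWAYS.
	 */
	newindexoidlist = RelationGetIndexList(relation);
	if (!equal(indexoidlist, newindexoidlist))
	{
		list_free(indexoidlist);
		list_free(newindexoidlist);
		bms_free(indexattrs);
		goto restart;
	}
	list_free(indexoidlist);
	list_free(newindexoidlist);

	/* Only now is it safe to cache the result in relation->rd_indexattr. */
	return indexattrs;
}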

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-04 Thread Pavan Deolasee
On Sat, Feb 4, 2017 at 11:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

>
> Based on Pavan's comments, I think trying to force this into next week's
> releases would be extremely unwise.  If the bug went undetected this long,
> it can wait for a fix for another three months.


Yes, I think the bug has existed all along and went unnoticed. One reason could be
that the race happens only when the new index turns HOT updates into
non-HOT updates. Another reason could be that we don't have checks in place
to catch these kinds of corruption. Having said that, since we have
discovered the bug, at least many 2ndQuadrant customers have expressed
worry and want to know if the fix will be available in 9.6.2 and other
minor releases.  Since the bug can lead to data corruption, the worry is
justified. Until we fix the bug, there will be a constant worry about using
CIC.

If we can have some kind of band-aid fix to plug in the hole, that might be
enough as well. I tested my first patch (which will need some polishing)
and that works well AFAIK. I was worried about prepared queries and all,
but that seems ok too. RelationGetIndexList() always gets called within
ExecInitModifyTable. The fix seems quite unlikely to cause any other side
effects.

Another possible band-aid is to throw another relcache invalidation in CIC.
Like adding a dummy index_set_state_flags() within yet another transaction.
Seems ugly, but should fix the problem for now and not cause any impact on
relcache mechanism whatsoever.
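
For illustration, that second band-aid amounts to something like the
following in DefineIndex(), after the first transaction of CIC commits (a
sketch only; heaprelid is assumed to be the table's OID, and the
surrounding transaction machinery is simplified):

	/* Start a short transaction whose only job is the invalidation. */
	StartTransactionCommand();

	/*
	 * Queue one more relcache invalidation on the heap, so that any
	 * backend which rebuilt rd_indexattr against the old index list is
	 * forced to recompute it.  The message is sent at commit.
	 */
	CacheInvalidateRelcacheByRelid(heaprelid);

	CommitTransactionCommand();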

That seems better than
> risking new breakage when it's barely 48 hours to the release wrap
> deadline.  We do not have time to recover from any mistakes.
>

I'm not sure what the project policies are, but can we consider delaying
the release by a week for issues like these? Or do you think it will be
hard to come up with a proper fix for the issue and it will need some
serious refactoring?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-04 Thread Pavan Deolasee
On Sat, Feb 4, 2017 at 12:10 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

>
>
> If we do above, then I think primary key attrs won't be returned
> because for those we are using relation copy rather than an original
> working copy of attrs. See code below:
>
> switch (attrKind)
> {
> ..
> case INDEX_ATTR_BITMAP_PRIMARY_KEY:
> return bms_copy(relation->rd_pkattr);
>
>
I don't think that would be a problem since, if relation->rd_indexattr is
NULL, primary key attrs will be recomputed and returned.


>
> Apart from above, I think after the proposed patch, it will be quite
> possible that in heap_update(), hot_attrs, key_attrs and id_attrs are
> computed using different index lists (consider relcache flush happens
> after computation of hot_attrs or key_attrs) which was previously not
> possible.  I think in the later code we assume that all the three
> should be computed using the same indexlist.  For ex. see the below
> code:
>

This seems like a real problem to me. I don't know what the consequences
are, but definitely having various attribute lists to have different view
of the set of indexes doesn't seem right.

The problem we are trying to fix is very clear by now. Relcache flush
clears rd_indexvalid and rd_indexattr together, but because of the race,
rd_indexattr is recomputed with the old information and gets cached again,
while rd_indexvalid (and rd_indexlist) remains unset. That leads to the
index corruption in CIC and may cause other issues too that we're not aware
right now. We want cached attribute lists to become invalid whenever index
list is invalidated and that's not happening right now.

So we have tried three approaches so far.

1. Simply reset rd_indexattr whenever we detect that rd_indexvalid has been
reset (see the sketch after this list). This was the first patch. It's a
very simple patch, has worked for me with CLOBBER_CACHE_ALWAYS, and even
fixes the CIC bug. The only downside seems to be that we're invalidating
cached attribute lists outside the normal relcache invalidation path. It
also works on the premise that RelationGetIndexList() will be called every
time before RelationGetIndexAttrBitmap() is called; otherwise we could
still end up using stale cached copies. I wonder if cached plans can be a
problem here.

2. In the second patch, we tried to recompute attribute lists if a relcache
flush happened in between and the index list was invalidated. We've seen
problems with that, especially it getting into an infinite loop with
CLOBBER_CACHE_ALWAYS. Not clear why it happens, but apparently relcache
flushes keep happening.

3. We tried a third approach where we don't cache attribute lists if we
know that the index set has changed, and hope that the next caller gets the
correct info. But Amit rightly identified problems with that approach too.

So we either need to find a fourth approach or stay with the first patch,
unless we see a problem there too (cached plans, anything else). Or maybe
fix CREATE INDEX CONCURRENTLY so that at least the first phase, which adds
the index entry to the catalog, happens with a strong lock. This will lead
to momentary blocking of write queries, but protect us from the race. I'm
assuming this is the only code that can cause the problem, and I haven't
checked other potential hazards.

Ideas/suggestions?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-03 Thread Pavan Deolasee
On Thu, Feb 2, 2017 at 10:14 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

>
>
> I'm going to study the bug a bit more, and put in some patch before the
> upcoming minor tag on Monday.
>
>
Looking at the history and some past discussions, it seems Tomas reported a
somewhat similar problem and Andres proposed a patch here
https://www.postgresql.org/message-id/20140514155204.ge23...@awork2.anarazel.de
which got committed via b23b0f5588d2e2. Not exactly the same issue, but
related to relcache flush happening in index_open().

I wonder if we should just add a recheck logic
in RelationGetIndexAttrBitmap() such that if rd_indexvalid is seen as 0 at
the end, we go back and start again from RelationGetIndexList(). Or is
there any code path that relies on finding the old information?

The following comment at the top of RelationGetIndexAttrBitmap() also seems
like a problem to me. CIC works with a lower-strength lock and hence it can
change the set of indexes even though the caller is holding
RowExclusiveLock, no? The comment seems to suggest otherwise.

4748  * Caller had better hold at least RowExclusiveLock on the target relation
4749  * to ensure that it has a stable set of indexes.  This also makes it safe
4750  * (deadlock-free) for us to take locks on the relation's indexes.

I've attached a revised patch based on these thoughts. It looks a bit more
invasive than the earlier one-liner, but looks better to me.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


invalidate_indexattr_v2.patch
Description: Binary data



Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-02 Thread Pavan Deolasee
On Thu, Feb 2, 2017 at 6:14 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:

>
>   /*
> + * If the index list was invalidated, we better also invalidate the index
> + * attribute list (which should automatically invalidate other attributes
> + * such as primary key and replica identity)
> + */
>
> + relation->rd_indexattr = NULL;
> +
>
> I think setting directly to NULL will leak the memory.
>

Good catch, thanks. Will fix.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-02 Thread Pavan Deolasee
On Mon, Jan 30, 2017 at 7:20 PM, Pavan Deolasee <pavan.deola...@gmail.com>
wrote:

>
> Based on my investigation so far and the evidence that I've collected,
> what probably happens is that after a relcache invalidation arrives at the
> new transaction, it recomputes the rd_indexattr but based on the old,
> cached rd_indexlist. So the rd_indexattr does not include the new columns.
> Later rd_indexlist is updated, but the rd_indexattr remains what it was.
>
>
I was kinda puzzled that this report did not get any attention, though it's
clearly a data corruption issue. Anyway, now I know why this is happening
and can reproduce it very easily via debugger and two sessions.

The offending code is RelationGetIndexAttrBitmap(). Here is the sequence of
events:

Taking the same table as in the report:

CREATE TABLE cic_tab_accounts (
aid bigint UNIQUE,
abalance bigint,
aid1 bigint
);

1. Session S1 calls DROP INDEX pgb_a_aid1 and completes
2. Session S2 is in the process of rebuilding its rd_indexattr bitmap
(because of the previous relcache invalidation caused by DROP INDEX). To be
precise, assume that the session has finished rebuilding its index list at
relcache.c line 4793. Since DROP INDEX has just happened, this list contains
only the one remaining index:
4793 indexoidlist = RelationGetIndexList(relation);

3. S1 then starts CREATE INDEX CONCURRENTLY pgb_a_aid1 ON
cic_tab_accounts(aid1)
4. S1 creates catalog entry for the new index, commits phase 1 transaction
and builds MVCC snapshot for the second phase and also finishes the initial
index build
5. S2 now proceeds. Now notice that S2 had computed the index list based on
the old view i.e. just one index.

The following comments in relcache.c are now significant:

4799 /*
4800  * Copy the rd_pkindex and rd_replidindex value computed by
4801  * RelationGetIndexList before proceeding.  This is needed because a
4802  * relcache flush could occur inside index_open below, resetting the
4803  * fields managed by RelationGetIndexList. (The values we're computing
4804  * will still be valid, assuming that caller has a sufficient lock on
4805  * the relation.)
4806  */

That's precisely what goes wrong.

6. When index_open is called, relcache invalidation generated by the first
transaction commit in CIC gets accepted and processed. As part of this
invalidation, rd_indexvalid is set to 0, which invalidates rd_indexlist too
7. But S2 is currently using the index list obtained at step 2 above (which
has only the old index). It goes ahead and computes rd_indexattr based on
just the old index.
8. While relcache invalidation processing at step 6 cleared rd_indexattr
(along with rd_indexvalid), it is set again at
4906 relation->rd_indexattr = bms_copy(indexattrs);

But what gets set at step 8 is based on the old view. There is no way
rd_indexattr will be invalidated until S2 receives another relcache
invalidation. This will happen when the CIC proceeds. But until then, every
update done by S2 will generate broken HOT chains, leading to data
corruption.

I can reproduce this entire scenario using gdb sessions. This also explains
why the patch I sent earlier helps to solve the problem.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-01 Thread Pavan Deolasee
On Thu, Feb 2, 2017 at 3:49 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> I think what we ought to do about this is invent additional API
> >> functions, say
> >>
> >> Oid CatalogTupleInsertWithInfo(Relation heapRel, HeapTuple tup,
> >> CatalogIndexState indstate);
> >> void CatalogTupleUpdateWithInfo(Relation heapRel, ItemPointer otid,
> >> HeapTuple tup, CatalogIndexState indstate);
> >>
> >> and use these in place of simple_heap_foo plus CatalogIndexInsert
> >> in the places where this optimization had been applied.
>
> > This looks reasonable enough to me.
>
> Done.
>
>
Thanks for taking care of this. A shame that I missed it, because I'd
specifically noted the special-casing for large objects etc. But it looks
like, while changing 180+ call sites, I forgot my notes.

Thanks again,
Pavan


-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Pavan Deolasee
On Tue, Jan 31, 2017 at 7:21 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

>
> > > +#define HeapTupleHeaderGetNextTid(tup, next_ctid) \
> > > +do { \
> > > + AssertMacro(!((tup)->t_infomask2 & HEAP_LATEST_TUPLE)); \
> > > + ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \
> > > +} while (0)
> >
> > > Actually, I think this macro could just return the TID so that it can
> be
> > > used as struct assignment, just like ItemPointerCopy does internally --
> > > callers can do
> > >ctid = HeapTupleHeaderGetNextTid(tup);
> >
> > While I agree with your proposal, I wonder why we have ItemPointerCopy()
> in
> > the first place because we freely copy TIDs as struct assignment. Is
> there
> > a reason for that? And if there is, does it impact this specific case?
>
> I dunno.  This macro is present in our very first commit d31084e9d1118b.
> Maybe it's an artifact from the Lisp to C conversion.  Even then, we had
> some cases of iptrs being copied by struct assignment, so it's not like
> it didn't work.  Perhaps somebody envisioned that the internal details
> could change, but that hasn't happened in two decades so why should we
> worry about it now?  If somebody needs it later, it can be changed then.
>
>
May I suggest in that case that we apply the attached patch, which removes
all references to ItemPointerCopy as well as its definition? This will
avoid confusion in the future too. No issues noticed in regression tests.
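
The point is that ItemPointerData is a plain struct, so direct assignment
already does everything the macro does. For instance (illustration only):

	ItemPointerData a;
	ItemPointerData b;

	ItemPointerSet(&a, 42, 7);	/* block 42, offset 7 */
	b = a;				/* same effect as ItemPointerCopy(&a, &b) */
	Assert(ItemPointerEquals(&a, &b));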


> > There is one issue that bothers me. The current implementation lacks
> > ability to convert WARM chains into HOT chains. The README.WARM has some
> > proposal to do that. But it requires additional free bit in tuple header
> > (which we don't have) and of course, it needs to be vetted and
> implemented.
> > If the heap ends up with many WARM tuples, then index-only-scans will
> > become ineffective because index-only-scan can not skip a heap page, if
> it
> > contains a WARM tuple. Alternate ideas/suggestions and review of the
> design
> > are welcome!
>
> t_infomask2 contains one last unused bit,


Umm, WARM is using 2 unused bits from t_infomask2. You mean there is
another free bit after that too?


> and we could reuse vacuum
> full's bits (HEAP_MOVED_OUT, HEAP_MOVED_IN), but that will need some
> thinking ahead.  Maybe now's the time to start versioning relations so
> that we can ensure clusters upgraded to pg10 do not contain any of those
> bits in any tuple headers.
>

Yeah, IIRC old VACUUM FULL was removed in 9.0, which is a good six years
old. Obviously, there is still a chance that a pre-9.0 binary-upgraded
cluster exists and upgrades to 10. So we still need to do something about
those bits if we reuse them. I'm surprised to see that we don't have any
mechanism in place to clear those bits. So maybe we should add something to
do that.

I had some other ideas (and a patch too) to reuse bits from t_ctid.ip_posid,
given that offset numbers can be represented in just 13 bits, even with the
maximum block size. I can look at that if it comes to finding more bits.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_itempointercopy.patch
Description: Binary data



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Pavan Deolasee
On Tue, Jan 31, 2017 at 7:37 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Pavan Deolasee wrote:
>
> >
> > Sounds good. Should I submit that as a separate patch on current master?
>
> Yes, please.
>
>
Attached.

Two new APIs added.

- CatalogInsertHeapAndIndex which does a simple_heap_insert followed by
catalog updates
- CatalogUpdateHeapAndIndex which does a simple_heap_update followed by
catalog updates

There are only a handful callers remain for simple_heap_insert/update after
this patch. They are typically working with already opened indexes and
hence I left them unchanged.

make check-world passes with the patch.
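
In essence, the two helpers are thin wrappers like the following (a sketch
of the idea; the names and exact signatures are from this proposal, not
committed code):

Oid
CatalogInsertHeapAndIndex(Relation heapRel, HeapTuple tup)
{
	Oid		oid = simple_heap_insert(heapRel, tup);

	CatalogUpdateIndexes(heapRel, tup);
	return oid;
}

void
CatalogUpdateHeapAndIndex(Relation heapRel, ItemPointer otid, HeapTuple tup)
{
	simple_heap_update(heapRel, otid, tup);
	CatalogUpdateIndexes(heapRel, tup);
}

Call sites then collapse to a single call, which is what makes it hard to
forget the index update.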

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


catalog_update.patch
Description: Binary data



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Pavan Deolasee
On Tue, Jan 31, 2017 at 7:21 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Pavan Deolasee wrote:
> > On Thu, Jan 26, 2017 at 2:38 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > wrote:
>
> > > The simple_heap_update + CatalogUpdateIndexes pattern is getting
> > > obnoxious.  How about creating something like catalog_heap_update which
> > > does both things at once, and stop bothering each callsite with the
> WARM
> > > stuff?
> >
> > What I realised that there are really 2 patterns:
> > 1. simple_heap_insert, CatalogUpdateIndexes
> > 2. simple_heap_update, CatalogUpdateIndexes
> >
> > There are only couple of places where we already have indexes open or
> have
> > more than one tuple to update, so we call CatalogIndexInsert directly.
> What
> > I ended up doing in the attached patch is add two new APIs which combines
> > the two steps of each of these patterns. It seems much cleaner to me and
> > also less buggy for future users. I hope I am not missing a reason not to
> > do combine these steps.
>
> CatalogUpdateIndexes was just added as a convenience function on top of
> a very common pattern.  If we now have a reason to create a second one
> because there are now two very common patterns, it seems reasonable to
> have two functions.  I think I would commit the refactoring to create
> these functions ahead of the larger WARM patch, since I think it'd be
> bulky and largely mechanical.  (I'm going from this description; didn't
> read your actual code.)
>

Sounds good. Should I submit that as a separate patch on current master?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-30 Thread Pavan Deolasee
On Thu, Jan 26, 2017 at 2:38 AM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Looking at your 0002 patch now.  It no longer applies, but the conflicts
> are trivial to fix.  Please rebase and resubmit.
>
>
Thanks.


>
>  Maybe I will spend some time trying to
> convert it to Perl using PostgresNode.
>
>
Agree. I put together a test harness to hammer the WARM code as much as we
can. This harness has already discovered some bugs, especially around the
index creation part. It also discovered one outstanding bug in master, so
it's been useful. But I agree it should be rewritten in Perl.


>
> I think having the "recheck" index methods create an ExecutorState looks
> out of place.  How difficult is it to pass the estate from the calling
> code?
>

I couldn't find an easy way given the place where recheck is required. Can
you suggest something?


>
> IMO heap_get_root_tuple_one should be called just heap_get_root_tuple().
> That function and its plural sibling heap_get_root_tuples() should
> indicate in their own comments what the expectations are regarding the
> root_offsets output argument, rather than deferring to the comments in
> the "internal" function, since they differ on that point; for the rest
> of the invariants I think it makes sense to say "Also see the comment
> for heap_get_root_tuples_internal".  I wonder if heap_get_root_tuple
> should just return the ctid instead of assigning the value to a
> passed-in pointer, i.e.
> OffsetNumber
> heap_get_root_tuple(Page page, OffsetNumber target_offnum)
> {
>     OffsetNumber    off;
>
>     heap_get_root_tuples_internal(page, target_offnum, &off);
>     return off;
> }
>
>
Yes, all of that makes sense. Will fix.


>
> The simple_heap_update + CatalogUpdateIndexes pattern is getting
> obnoxious.  How about creating something like catalog_heap_update which
> does both things at once, and stop bothering each callsite with the WARM
> stuff?  In fact, given that CatalogUpdateIndexes is used in other
> places, maybe we should leave its API alone and create another function,
> so that we don't have to change the many places that only do
> simple_heap_insert.  (Places like OperatorCreate which do either insert
> or update could just move the index update call into each branch.)
>
>
What I ended up doing is I added two new APIs.
- CatalogUpdateHeapAndIndex
- CatalogInsertHeapAndIndex

I could replace almost all occurrences of simple_heap_update +
CatalogUpdateIndexes with the first API and simple_heap_insert +
CatalogUpdateIndexes with the second API. This looks like a good
improvement to me anyway, since there are about 180 places where these
functions are called in almost the same pattern. Maybe it will also avoid
a bug when someone forgets to update the indexes after inserting into or
updating the heap.


> .
> I wonder if heap_hot_search_buffer() and heap_hot_search() should return
> a tri-valued enum instead of boolean; that idea looks reasonable in
> theory but callers have to do more work afterwards, so maybe not.
>
>
Ok. I'll try to rearrange it a bit. Maybe we should just have one API after
all? There are only very few callers of these APIs.

Thanks,
Pavan


-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-30 Thread Pavan Deolasee
On Wed, Jan 25, 2017 at 10:06 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Reading 0001_track_root_lp_v9.patch again:
>
>
Thanks for the review.


> > +/*
> > + * We use the same HEAP_LATEST_TUPLE flag to check if the tuple's
> t_ctid field
> > + * contains the root line pointer. We can't use the same
> > + * HeapTupleHeaderIsHeapLatest macro because that also checks for
> TID-equality
> > + * to decide whether a tuple is at the end of the chain
> > + */
> > +#define HeapTupleHeaderHasRootOffset(tup) \
> > +( \
> > + ((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0 \
> > +)
> >
> > +#define HeapTupleHeaderGetRootOffset(tup) \
> > +( \
> > + AssertMacro(((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0), \
> > + ItemPointerGetOffsetNumber(&(tup)->t_ctid) \
> > +)
>
> Interesting stuff; it took me a bit to see why these macros are this
> way.  I propose the following wording which I think is clearer:
>
>   Return whether the tuple has a cached root offset.  We don't use
>   HeapTupleHeaderIsHeapLatest because that one also considers the slow
>   case of scanning the whole block.
>

Umm, not scanning the whole block, but HeapTupleHeaderIsHeapLatest compares
t_ctid with the passed-in TID and returns true if those match. To know
whether the root lp is cached, we rely only on the HEAP_LATEST_TUPLE flag.
Though if the flag is set, it implies the latest tuple too.
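
In other words, the macro is roughly this (a hypothetical reconstruction
from the description above, not the exact patch text):

#define HeapTupleHeaderIsHeapLatest(tup, tid) \
( \
	(((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0) || \
	ItemPointerEquals(&(tup)->t_ctid, (tid)) \
)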


>
> Please flag the macros that have multiple evaluation hazards -- there
> are a few of them.
>

Can you please tell me an example? I must be missing something.


>
>
> > +/*
> > + * Get TID of next tuple in the update chain. Caller should have
> checked that
> > + * we are not already at the end of the chain because in that case
> t_ctid may
> > + * actually store the root line pointer of the HOT chain whose member
> this
> > + * tuple is.
> > + */
> > +#define HeapTupleHeaderGetNextTid(tup, next_ctid) \
> > +do { \
> > + AssertMacro(!((tup)->t_infomask2 & HEAP_LATEST_TUPLE)); \
> > + ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \
> > +} while (0)
>
> Actually, I think this macro could just return the TID so that it can be
> used as struct assignment, just like ItemPointerCopy does internally --
> callers can do
> ctid = HeapTupleHeaderGetNextTid(tup);
>
>
Yes, makes sense. Will fix.
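
Something like this, then (a sketch of the revised macro following your
suggestion):

#define HeapTupleHeaderGetNextTid(tup) \
( \
	AssertMacro(!((tup)->t_infomask2 & HEAP_LATEST_TUPLE)), \
	(tup)->t_ctid \
)

so that callers can simply write ctid = HeapTupleHeaderGetNextTid(tup);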


>
>
> The API of RelationPutHeapTuple appears a bit contorted, where
> root_offnum is both input and output.  I think it's cleaner to have the
> argument be the input, and have the output offset be the return value --
> please check whether that simplifies things; for example I think this:
>
> > +     root_offnum = InvalidOffsetNumber;
> > +     RelationPutHeapTuple(relation, buffer, heaptup, false,
> > +                          &root_offnum);
>
> becomes
>
> root_offnum = RelationPutHeapTuple(relation, buffer, heaptup, false,
>                                    InvalidOffsetNumber);
>
>
Make sense. Will fix.


>
>
> Many comments lack finishing periods in complete sentences, which looks
> odd.  Please fix.
>

Sorry, not sure where I picked that style from. I see that the existing
code has both styles, though I will add finishing periods because I like it
that way too.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-01-30 Thread Pavan Deolasee
[...] relcache invalidation correctly, and a third (PID 73091) which got it
wrong. Specifically, look at line #77 where the snapshot is obtained for the
second phase:

2017-01-29 10:44:31.238 UTC [73119] [xid=0]LOG:  CIC (pgb_a_aid1) building
first phase with snapshot (4670891:4670893)
At line 108, you'll see a new transaction with XID 4670908 still using the
old attribute list.

2017-01-29 10:44:31.889 UTC [73091] [xid=4670908]LOG:  heap_update
(cic_tab_accounts), hot_attrs ((b 9))

The other session, which I included for comparison, sees the new index and
recomputes its rd_indexattr in time, and hence its update works as expected.


Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


cic_bug.tar.gz
Description: GNU Zip compressed data


relcache_issue_2.log
Description: Binary data


invalidate_indexattr.patch
Description: Binary data



Re: [HACKERS] Failure in commit_ts tap tests

2017-01-21 Thread Pavan Deolasee
On Sat, Jan 21, 2017 at 9:39 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Pavan Deolasee <pavan.deola...@gmail.com> writes:
> > On Sat, Jan 21, 2017 at 9:09 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> Hm, but what of the "null" value?  Also, I get
> >>
> >> $ perl -e 'use warnings; use Test::More; ok("2017-01-01" != "null",
> "ok");'
> >> Argument "null" isn't numeric in numeric ne (!=) at -e line 1.
> >> Argument "2017-01-01" isn't numeric in numeric ne (!=) at -e line 1.
> >> ok 1 - ok
>
> > It declares the test as "passed", right?
>
> Oh!  So it does.  That is one darn weird behavior of the != operator.
>
>
Indeed! See this:

# first numeric matches, doesn't check beyond that
$ perl -e 'if ("2017-23" != "2017-24") {print "Not equal\n"} else {print
"Equal\n"}'
Equal

# first numeric doesn't match, operators works ok
$ perl -e 'if ("2017-23" != "2018-24") {print "Not equal\n"} else {print
"Equal\n"}'
Not equal

# comparison of numeric with non-numeric, works ok
$ perl -e 'if ("2017-23" != "Foo") {print "Not equal\n"} else {print
"Equal\n"}'
Not equal

# numeric on RHS, works ok
$ perl -e 'if ("Foo" != "2018-24") {print "Not equal\n"} else {print
"Equal\n"}'
Not equal

These tests show that the operator returns the correct result if it finds a
numeric value at the start of the string, either on the LHS or the RHS.
Also, it compares only the numeric prefixes, up to the first non-numeric
character.

# no numeric on either side
$ perl -e 'if ("Fri 2017-23" != "Fri 2017-23") {print "Not equal\n"} else
{print "Equal\n"}'
Equal

*# no numeric on either side, arbitrary strings declared as equal*
$ perl -e 'if ("Fri 2017-23" != "Foo") {print "Not equal\n"} else {print
"Equal\n"}'
Equal

These two tests show why we saw no failure earlier. If neither the LHS nor
the RHS string starts with a numeric value, both sides coerce to 0 and the
operator declares the arguments equal, irrespective of their values. I
tested the same with the == operator and it exhibits the same behaviour.
Weird, and I wonder how it's not a source of constant bugs in Perl code (I
don't use Perl a lot, so maybe those who do are used to turning warnings on,
or know this already). The string comparison operators behave as expected;
see the example below.

>
>
> There's still the point that we're not actually exercising this script
> in the buildfarm ...
>

Yes indeed.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

