Re: [HACKERS] Row Level Security Bug ?

2017-11-12 Thread Joe Conway
On 11/12/2017 10:17 AM, Andrea Adami wrote:
> if I do:
> 
> SET ROLE 'manage...@scuola-1.it '

[SELECT from table]

> I see only one row (as expected)
> 
> but when I do:

[SELECT from VIEWs]

> I always see all the rows
> 
> this way I lose all the row-level security I defined
> 
> Is this a bug, or is it by design?
> If it is by design, why?
> Is there a way to write views that respect row-level security?
> From my point of view, row-level security that doesn't work with views
> makes no sense.

See:
https://www.postgresql.org/docs/10/static/sql-createview.html
In particular: "Access to tables referenced in the view is determined by
permissions of the view owner."

And:
https://www.postgresql.org/docs/10/static/ddl-rowsecurity.html
"Superusers and roles with the BYPASSRLS attribute always bypass the row
security system when accessing a table. Table owners normally bypass row
security as well, though a table owner can choose to be subject to row
security with ALTER TABLE ... FORCE ROW LEVEL SECURITY."
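
In other words, the views run with the permissions of their owner, and your
views are owned by a role that bypasses RLS.  A sketch of one workaround,
reusing the table and view names from the script quoted below (the role name
rls_view_owner is hypothetical):

  -- give the views to a role that is neither a superuser nor BYPASSRLS,
  -- so RLS on the underlying tables applies when they are queried
  CREATE ROLE rls_view_owner NOLOGIN NOBYPASSRLS;
  GRANT SELECT ON schools, usenames_schools TO rls_view_owner;
  ALTER VIEW _rls_test OWNER TO rls_view_owner;
  -- (likewise for the other three test views)

  -- alternatively, if the views stay owned by a non-superuser table owner:
  ALTER TABLE schools FORCE ROW LEVEL SECURITY;
  ALTER TABLE usenames_schools FORCE ROW LEVEL SECURITY;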

HTH,

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





[HACKERS] Row Level Security Bug ?

2017-11-12 Thread Andrea Adami
Hello,
I have a database with a couple of tables
(the script to recreate them is enclosed; please have a look before proceeding).
I enabled row-level security and everything seems to work fine.

if I do this (connected as a superuser, as postgres usually is):

select school, description, example
from schools

I can see all the rows

if I do:

SET ROLE 'manage...@scuola-1.it'

select school, description, example
from schools

I see only one row (as expected)

but when I do:

select *
from _rls_test

select *
FROM _rls_test_security_barrier

select *
from _rls_test_with_check_local

select *
from _rls_test_with_check_local_cascade

I always see all the rows

this way I lose all the row-level security I defined

Is this a bug, or is it by design?
If it is by design, why?
Is there a way to write views that respect row-level security?
From my point of view, row-level security that doesn't work with views
makes no sense.

Thanks to all who spend time answering me.

here:
https://github.com/scuola247/postgresql
you can have a look at the complete database

Andrea Adami

===
===
===

CREATE DATABASE test
  WITH OWNER = postgres
   ENCODING = 'UTF8'
   TABLESPACE = pg_default
   CONNECTION LIMIT = -1;


CREATE SEQUENCE public.pk_seq
  INCREMENT 1
  MINVALUE 1
  MAXVALUE 9223372036854775807
  START 736220
  CACHE 1;


CREATE TABLE public.schools
(
  school bigint NOT NULL DEFAULT nextval('pk_seq'::regclass), -- Uniquely identifies the table row
  description character varying(160) NOT NULL, -- Description of the school
  processing_code character varying(160) NOT NULL, -- A code that identifies the school in the government information system
  mnemonic character varying(30) NOT NULL, -- Short description to be used as a code
  example boolean NOT NULL DEFAULT false, -- Indicates that the row was inserted as example data
  behavior bigint, -- Indicates the subject used for behavior
  CONSTRAINT schools_pk PRIMARY KEY (school),
  CONSTRAINT schools_uq_description UNIQUE (description),
  CONSTRAINT schools_uq_mnemonic UNIQUE (mnemonic),
  CONSTRAINT schools_uq_processing_code UNIQUE (processing_code, example)
);

-- Index: public.schools_fk_behavior

CREATE INDEX schools_fk_behavior
  ON public.schools
  USING btree
  (behavior);


CREATE TABLE public.usenames_schools
(
  usename_school bigint NOT NULL DEFAULT nextval('pk_seq'::regclass), -- Unique identification code for the row
  usename name NOT NULL, -- The session's usename
  school bigint NOT NULL, -- School enabled for the usename
  CONSTRAINT usenames_schools_pk PRIMARY KEY (usename_school),
  CONSTRAINT usenames_schools_fk_school FOREIGN KEY (school)
  REFERENCES public.schools (school) MATCH SIMPLE
  ON UPDATE CASCADE ON DELETE CASCADE,
  CONSTRAINT usenames_schools_uq_usename_school UNIQUE (usename, school) -- For every usename a school can be enabled only once
);

-- Index: public.usenames_schools_fx_school

CREATE INDEX usenames_schools_fx_school
  ON public.usenames_schools
  USING btree
  (school);

  CREATE OR REPLACE VIEW public._rls_test AS
 SELECT schools.school,
schools.description,
schools.example
   FROM schools;


CREATE OR REPLACE VIEW public._rls_test_security_barrier WITH
(security_barrier=true) AS
 SELECT schools.school,
schools.description,
schools.example
   FROM schools;

CREATE OR REPLACE VIEW public._rls_test_with_check_local WITH
(check_option=local) AS
 SELECT schools.school,
schools.description,
schools.example
   FROM schools;

CREATE OR REPLACE VIEW public._rls_test_with_check_local_cascade WITH
(check_option=cascaded) AS
 SELECT schools.school,
schools.description,
schools.example
   FROM schools;

-- now some data
-- now some data
-- now some data

INSERT INTO public.schools(school,description,processing_code,mnemonic,example)
VALUES ('289610','Istituto comprensivo "Voyager"','ZZIC1Z','IC VOYAGER','t');
INSERT INTO public.schools(school,description,processing_code,mnemonic,example)
VALUES ('20','Istituto Tecnico Tecnologico "Leonardo da Vinci"','ZZITTZ','ITT DAVINCI','t');
INSERT INTO public.schools(school,description,processing_code,mnemonic,example)
VALUES ('10','Istituto comprensivo ''Andromeda''','ZZIC8Z','IC ANDROMEDA','t');


INSERT INTO public.usenames_schools(usename_school,usename,school) VALUES
('7266330','manage...@scuola-1.it','10');


-- THEN ENABLE ROW LEVEL SECURITY
-- THEN ENABLE ROW LEVEL SECURITY
-- THEN ENABLE ROW LEVEL SECURITY


  ALTER TABLE usenames_schools ENABLE ROW LEVEL SECURITY;

 ALTER TABLE schools ENABLE ROW LEVEL SECURITY;

CREATE POLICY usenames_schools_pl_usename ON usenames_schools TO public
 USING (usename = current_user)
  WITH CHECK (usename = current_user);

CREATE POLICY schools_pl_school ON schools 

[HACKERS] Proposed fix for bug #14826 (the invalid empty array business)

2017-09-23 Thread Tom Lane
I poked around among the callers of construct_[md_]array() and found at
least two more besides ts_lexize() that could build zero-element arrays;
one of very ancient standing is in the pg_prepared_statements view.
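
For instance, the empty parameter_types array for a parameterless prepared
statement appears to come through that path (a quick way to see it from SQL):

  PREPARE p AS SELECT 1;
  SELECT name, parameter_types FROM pg_prepared_statements;
  --  name | parameter_types
  -- ------+-----------------
  --  p    | {}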

So I think we've got a clear hazard here that justifies changing the
behavior of the low-level array functions, rather than expecting every
call site to be on its guard about empty arrays.  Accordingly, I propose
the attached patch which makes construct_md_array responsible for getting
this right.

In principle we could now remove code from the places that do take care to
call construct_empty_array instead.  But I found only one place where it
really seemed worth removing anything, in ExecEvalArrayExpr.

I'm a little bit scared about back-patching this, as it seems at least
possible that some external code could be broken by the change in what
construct_[md_]array could hand back.  Given that some of these bugs
have been in place for ~ten years and nobody noticed till now, seems
like it might be good enough to fix them in HEAD only.

Comments?

regards, tom lane

diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index bd8a15d..09abd46 100644
*** a/src/backend/executor/execExprInterp.c
--- b/src/backend/executor/execExprInterp.c
*** ExecEvalArrayExpr(ExprState *state, Expr
*** 2131,2144 
  		Datum	   *dvalues = op->d.arrayexpr.elemvalues;
  		bool	   *dnulls = op->d.arrayexpr.elemnulls;
  
- 		/* Shouldn't happen here, but if length is 0, return empty array */
- 		if (nelems == 0)
- 		{
- 			*op->resvalue =
- PointerGetDatum(construct_empty_array(element_type));
- 			return;
- 		}
- 
  		/* setup for 1-D array of the given length */
  		ndims = 1;
  		dims[0] = nelems;
--- 2131,2136 
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index e4101c9..d1f2fe7 100644
*** a/src/backend/utils/adt/arrayfuncs.c
--- b/src/backend/utils/adt/arrayfuncs.c
*** array_map(FunctionCallInfo fcinfo, Oid r
*** 3297,3302 
--- 3297,3303 
   *
   * A palloc'd 1-D array object is constructed and returned.  Note that
   * elem values will be copied into the object even if pass-by-ref type.
+  * Also note the result will be 0-D not 1-D if nelems = 0.
   *
   * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info
   * from the system catalogs, given the elmtype.  However, the caller is
*** construct_array(Datum *elems, int nelems
*** 3331,3336 
--- 3332,3338 
   *
   * A palloc'd ndims-D array object is constructed and returned.  Note that
   * elem values will be copied into the object even if pass-by-ref type.
+  * Also note the result will be 0-D not ndims-D if any dims[i] = 0.
   *
   * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info
   * from the system catalogs, given the elmtype.  However, the caller is
*** construct_md_array(Datum *elems,
*** 3362,3373 
   errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
  		ndims, MAXDIM)));
  
- 	/* fast track for empty array */
- 	if (ndims == 0)
- 		return construct_empty_array(elmtype);
- 
  	nelems = ArrayGetNItems(ndims, dims);
  
  	/* compute required space */
  	nbytes = 0;
  	hasnulls = false;
--- 3364,3375 
   errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
  		ndims, MAXDIM)));
  
  	nelems = ArrayGetNItems(ndims, dims);
  
+ 	/* if ndims <= 0 or any dims[i] == 0, return empty array */
+ 	if (nelems <= 0)
+ 		return construct_empty_array(elmtype);
+ 
  	/* compute required space */
  	nbytes = 0;
  	hasnulls = false;



[HACKERS] Re: [BUGS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog

2017-06-10 Thread Alvaro Herrera
Noah Misch wrote:

> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
> for your status update.  Please reacquaint yourself with the policy on open
> item ownership[1] and then reply immediately.  If I do not hear from you by
> 2017-06-11 07:00 UTC, I will transfer this item to release management team
> ownership without further notice.

I volunteer to own this item.  I'll report on Wednesday 14th at the latest.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Potential hot-standby bug around xacts committed but in xl_running_xacts

2017-05-02 Thread Simon Riggs
On 2 May 2017 at 18:06, Andres Freund  wrote:

>> What I suggest is that with logical decoding in mind we do this
>> 1. Inject a new record XLOG_SNAPSHOT_START at the start of
>> LogStandbySnapshot(). We start logical decoding from there.
>> 2. Record any transactions that end
>> 3. Now the full XLOG_RUNNING_XACTS record arrives. We apply all xacts
>> that are seen as running, minus any ended between 1 and 3
>
>> This avoids the problems for the race but without holding locks while
>> we log XLOG_RUNNING_XACTS, something that was considered painful for
>> Hot Standby.
>
> I don't think that really solves it, because other transactions could
> just be stuck just after the XLogInsert() forever.  And it'd have the
> issue of having to backpatch a new record.  I'm working on an
> alternative approach, let's hope that that works out.

Backpatchable approach.

1. Record CurrentInsertPosition()
2. LogStandbySnapshot()
3. Insert custom logical wal message containing currentinsertposition
and LSN of (2)

When we decode
1. Read standby snapshot
2. Scan forwards until we see message written by step (3) above, which
is identifiable because it contains LSN of snapshot.
3. Read initial LSN from message then re-scan from LSN until
xl_running_xacts message collecting any commits/aborts and removing
them from snapshot.

No new WAL messages, no locking problem, race condition handled.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Potential hot-standby bug around xacts committed but in xl_running_xacts

2017-05-02 Thread Craig Ringer
On 2 May 2017 at 13:12, Simon Riggs  wrote:

> What I suggest is that with logical decoding in mind we do this
> 1. Inject a new record XLOG_SNAPSHOT_START at the start of
> LogStandbySnapshot(). We start logical decoding from there.
> 2. Record any transactions that end
> 3. Now the full XLOG_RUNNING_XACTS record arrives. We apply all xacts
> that are seen as running, minus any ended between 1 and 3
>
> This avoids the problems for the race but without holding locks while
> we log XLOG_RUNNING_XACTS, something that was considered painful for
> Hot Standby.

Sounds like a sensible solution to me. It avoids the need for a rather
undesirable interlock between xlog and shmem commit.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Potential hot-standby bug around xacts committed but in xl_running_xacts

2017-05-01 Thread Simon Riggs
On 1 May 2017 at 22:38, Andres Freund  wrote:

> The thread below 
> http://archives.postgresql.org/message-id/f37e975c-908f-858e-707f-058d3b1eb214%402ndquadrant.com
> describes an issue in logical decoding that arises because
> xl_running_xacts' contents aren't necessarily coherent with the contents
> of the WAL, because RecordTransactionCommit() / RecordTransactionAbort()
> don't have any interlock against the procarray.  That means
> xl_running_xacts can contain transactions assumed to be running, that
> already have their commit/abort records WAL logged.

That is known/by design.

> I think that's not just problematic in logical decoding, but also
> Hot-Standby.  Consider the following:
>
> ProcArrayApplyRecoveryInfo() gets an xl_running_xacts record that's not
> suboverflowed, and thus will change to STANDBY_SNAPSHOT_READY.

Yes

> In that
> case it'll populate the KnownAssignedXids machinery using
> KnownAssignedXidsAdd().

No, because of this section of code in ProcArrayApplyRecoveryInfo()

/*
 * The running-xacts snapshot can contain xids that were still visible
 * in the procarray when the snapshot was taken, but were already
 * WAL-logged as completed. They're not running anymore, so ignore
 * them.
 */
 if (TransactionIdDidCommit(xid) || TransactionIdDidAbort(xid))
 continue;

> Am I missing something that protects against the above scenario?

It does seem so, yes. The locking issues in Hot Standby are complex,
but they do seem to be correct, thanks for reviewing them.

This is documented in multiple locations, including what I thought was
your own comment in LogStandbySnapshot().

What I suggest is that with logical decoding in mind we do this
1. Inject a new record XLOG_SNAPSHOT_START at the start of
LogStandbySnapshot(). We start logical decoding from there.
2. Record any transactions that end
3. Now the full XLOG_RUNNING_XACTS record arrives. We apply all xacts
that are seen as running, minus any ended between 1 and 3

This avoids the problems for the race but without holding locks while
we log XLOG_RUNNING_XACTS, something that was considered painful for
Hot Standby.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Potential hot-standby bug around xacts committed but in xl_running_xacts

2017-05-01 Thread Andres Freund
Hi,

The thread below 
http://archives.postgresql.org/message-id/f37e975c-908f-858e-707f-058d3b1eb214%402ndquadrant.com
describes an issue in logical decoding that arises because
xl_running_xacts' contents aren't necessarily coherent with the contents
of the WAL, because RecordTransactionCommit() / RecordTransactionAbort()
don't have any interlock against the procarray.  That means
xl_running_xacts can contain transactions assumed to be running, that
already have their commit/abort records WAL logged.

I think that's not just problematic in logical decoding, but also
Hot-Standby.  Consider the following:

ProcArrayApplyRecoveryInfo() gets an xl_running_xacts record that's not
suboverflowed, and thus will change to STANDBY_SNAPSHOT_READY.  In that
case it'll populate the KnownAssignedXids machinery using
KnownAssignedXidsAdd().

Once STANDBY_SNAPSHOT_READY, CheckRecoveryConsistency() will signal
postmaster to allow connections.

For HS, a snapshot will be built by GetSnapshotData() using
KnownAssignedXidsGetAndSetXmin().  That in turn uses the transactions
currently known to be running, to populate the snapshot.

Now, if transactions have committed before (in the "earlier LSN" sense)
the xl_running_xacts record, ExpireTreeKnownAssignedTransactionIds() in
xact_redo_commit() will already have run.  Which means we'll assume
already committed transactions are still running.  In other words, the
snapshot is corrupted.

Luckily this'll self-correct over time, fixed by
ExpireOldKnownAssignedTransactionIds().


Am I missing something that protects against the above scenario?


Greetings,

Andres Freund




Re: [HACKERS] Latch reset ordering bug in condition_variable.c

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 6:01 AM, Thomas Munro wrote:
> ConditionVariablePrepareToSleep() has a race that can leave you
> hanging, introduced by me in the v4 patch.  The problem is that that
> function resets our latch after adding our process to the wakeup list.
> With the right timing, the following sequence can happen:
>
> 1.  ConditionVariablePrepareToSleep() adds us to the wakeup list.
> 2.  Some other process calls ConditionVariableSignal().  It removes us
> from the wakeup list and sets our latch.
> 3.  ConditionVariablePrepareToSleep() resets our latch.
> 4.  We enter (or continue) our predicate loop.  Our exit condition
> happens not to be true yet, so we call ConditionVariableSleep().
> 5.  ConditionVariableSleep() never returns because WaitEventSet()
> blocks.  Our latch is not set, yet we are no longer in the wakeup list
> so ConditionVariableSignal() will never set it.
>
> We should reset the latch first.  Then there is no way to reach
> ConditionVariableSleep() with neither a set latch nor an entry in the
> wakeup queue.
>
> See attached.  Thoughts?

Oops.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Latch reset ordering bug in condition_variable.c

2017-02-09 Thread Thomas Munro
Hi,

ConditionVariablePrepareToSleep() has a race that can leave you
hanging, introduced by me in the v4 patch.  The problem is that that
function resets our latch after adding our process to the wakeup list.
With the right timing, the following sequence can happen:

1.  ConditionVariablePrepareToSleep() adds us to the wakeup list.
2.  Some other process calls ConditionVariableSignal().  It removes us
from the wakeup list and sets our latch.
3.  ConditionVariablePrepareToSleep() resets our latch.
4.  We enter (or continue) our predicate loop.  Our exit condition
happens not to be true yet, so we call ConditionVariableSleep().
5.  ConditionVariableSleep() never returns because WaitEventSet()
blocks.  Our latch is not set, yet we are no longer in the wakeup list
so ConditionVariableSignal() will never set it.

We should reset the latch first.  Then there is no way to reach
ConditionVariableSleep() with neither a set latch nor an entry in the
wakeup queue.

See attached.  Thoughts?
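
For illustration only, the corrected ordering looks roughly like this (a
sketch, not the attached patch; error handling and the sleep-target
bookkeeping are omitted):

    void
    ConditionVariablePrepareToSleep(ConditionVariable *cv)
    {
        /*
         * Reset our latch BEFORE publishing ourselves on the wakeup list.
         * A ConditionVariableSignal() arriving after the push then sets the
         * latch, so ConditionVariableSleep() can never block with neither a
         * set latch nor a wakeup-list entry.
         */
        ResetLatch(MyLatch);

        SpinLockAcquire(&cv->mutex);
        proclist_push_tail(&cv->wakeup, MyProc->pgprocno, cvWaitLink);
        SpinLockRelease(&cv->mutex);
    }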

-- 
Thomas Munro
http://www.enterprisedb.com


fix-condition-variable-race.patch
Description: Binary data



Re: [HACKERS] _hash_addovflpage has a bug

2017-01-10 Thread Robert Haas
On Mon, Jan 9, 2017 at 10:46 PM, Amit Kapila  wrote:
> Yeah, we can write code that way, but then it is better to rely just
> on retain_pin variable in the function and add an Assert for bucket
> page whenever we are retaining pin.  How about doing something like
> attached patch?

Committed.

>>  Not sure exactly how that
>> works out in terms of locking.
>
> We have to change the locking order as mentioned above by me.  This
> change is already present in that patch, so maybe we add the check as
> suggested by you along with that patch.  Now, another thing we could
> do is to extract those changes from WAL patch, but I am not sure if it
> is worth the effort.

I'm not sure at this point, either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] _hash_addovflpage has a bug

2017-01-09 Thread Amit Kapila
On Mon, Jan 9, 2017 at 11:59 PM, Robert Haas  wrote:
> On Fri, Jan 6, 2017 at 11:32 PM, Amit Kapila  wrote:
>> On Sat, Jan 7, 2017 at 2:33 AM, Robert Haas  wrote:
>>> It looks to to me like the recent hash index changes have left
>>> _hash_addovflpage slightly broken.  I think that if that function
>>> reaches the point where it calls _hash_getbuf() to fetch the next page
>>> in the bucket chain, we also need to clear retain_pin.  Otherwise,
>>> we'll erroneously think that we're supposed to retain a pin even when
>>> the current page is an overflow page rather than the primary bucket
>>> page, which is wrong.
>>>
>>
>> How?  I think we are ensuring that we retain pin only if it is a
>> bucket page.  The check is as below:
>> if ((pageopaque->hasho_flag & LH_BUCKET_PAGE) && retain_pin)
>
> Oh, right.  So I guess there's no bug, but I still think that's pretty
> ugly.  How about:
>
> if (retain_pin)
> LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> else
> _hash_relbuf(rel, buf);
> retain_pin = false;
>
> instead?
>

Yeah, we can write code that way, but then it is better to rely just
on retain_pin variable in the function and add an Assert for bucket
page whenever we are retaining pin.  How about doing something like
attached patch?
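
Roughly, the shape being discussed would be (a sketch, not the attached
patch):

    if (retain_pin)
    {
        /* a pin is only ever retained for the primary bucket page */
        Assert(pageopaque->hasho_flag & LH_BUCKET_PAGE);
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }
    else
        _hash_relbuf(rel, buf);
    retain_pin = false;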

>>> A larger question, I suppose, is why this function wants to be certain
>>> of adding a new page even if someone else has already done so.  It
>>> seems like it might be smarter for it to just return the newly added
>>> page to the caller and let the caller try to use it.  _hash_doinsert
>>> has an Assert() that the tuple fits on the returned page, but that
>>> could be deleted.  As it stands, if two backends try to insert a tuple
>>> into the same full page at the same time, both of them will add an
>>> overflow page and we'll end up with 2 overflow pages each containing 1
>>> tuple.
>>
>> I think it is because in the current algorithm we first get an
>> overflow page and then add it to the end.  Now we can change it such
>> that first, we acquire the lock on the tail page, check if we still
>> need an overflow page, if so, then proceed, else return the already
>> added overflow page.
>
> For the WAL patch, this is all going to need to be atomic anyway,
> right?  Like, you have to allocate the overflow page and add it to the
> bucket chain as a single logged operation?

Yes.

>  Not sure exactly how that
> works out in terms of locking.
>

We have to change the locking order as mentioned above by me.  This
change is already present in that patch, so maybe we add the check as
suggested by you along with that patch.  Now, another thing we could
do is to extract those changes from WAL patch, but I am not sure if it
is worth the effort.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


improve_code_hash_add_ovfl_page_v1.patch
Description: Binary data



Re: [HACKERS] _hash_addovflpage has a bug

2017-01-09 Thread Robert Haas
On Fri, Jan 6, 2017 at 11:32 PM, Amit Kapila  wrote:
> On Sat, Jan 7, 2017 at 2:33 AM, Robert Haas  wrote:
>> It looks to me like the recent hash index changes have left
>> _hash_addovflpage slightly broken.  I think that if that function
>> reaches the point where it calls _hash_getbuf() to fetch the next page
>> in the bucket chain, we also need to clear retain_pin.  Otherwise,
>> we'll erroneously think that we're supposed to retain a pin even when
>> the current page is an overflow page rather than the primary bucket
>> page, which is wrong.
>>
>
> How?  I think we are ensuring that we retain pin only if it is a
> bucket page.  The check is as below:
> if ((pageopaque->hasho_flag & LH_BUCKET_PAGE) && retain_pin)

Oh, right.  So I guess there's no bug, but I still think that's pretty
ugly.  How about:

if (retain_pin)
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
else
_hash_relbuf(rel, buf);
retain_pin = false;

instead?

>> A larger question, I suppose, is why this function wants to be certain
>> of adding a new page even if someone else has already done so.  It
>> seems like it might be smarter for it to just return the newly added
>> page to the caller and let the caller try to use it.  _hash_doinsert
>> has an Assert() that the tuple fits on the returned page, but that
>> could be deleted.  As it stands, if two backends try to insert a tuple
>> into the same full page at the same time, both of them will add an
>> overflow page and we'll end up with 2 overflow pages each containing 1
>> tuple.
>
> I think it is because in the current algorithm we first get an
> overflow page and then add it to the end.  Now we can change it such
> that first, we acquire the lock on the tail page, check if we still
> need an overflow page, if so, then proceed, else return the already
> added overflow page.

For the WAL patch, this is all going to need to be atomic anyway,
right?  Like, you have to allocate the overflow page and add it to the
bucket chain as a single logged operation?  Not sure exactly how that
works out in terms of locking.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] _hash_addovflpage has a bug

2017-01-06 Thread Amit Kapila
On Sat, Jan 7, 2017 at 2:33 AM, Robert Haas  wrote:
> It looks to me like the recent hash index changes have left
> _hash_addovflpage slightly broken.  I think that if that function
> reaches the point where it calls _hash_getbuf() to fetch the next page
> in the bucket chain, we also need to clear retain_pin.  Otherwise,
> we'll erroneously think that we're supposed to retain a pin even when
> the current page is an overflow page rather than the primary bucket
> page, which is wrong.
>

How?  I think we are ensuring that we retain pin only if it is a
bucket page.  The check is as below:
if ((pageopaque->hasho_flag & LH_BUCKET_PAGE) && retain_pin)

> A larger question, I suppose, is why this function wants to be certain
> of adding a new page even if someone else has already done so.  It
> seems like it might be smarter for it to just return the newly added
> page to the caller and let the caller try to use it.  _hash_doinsert
> has an Assert() that the tuple fits on the returned page, but that
> could be deleted.  As it stands, if two backends try to insert a tuple
> into the same full page at the same time, both of them will add an
> overflow page and we'll end up with 2 overflow pages each containing 1
> tuple.
>

I think it is because in the current algorithm we first get an
overflow page and then add it to the end.  Now we can change it such
that first, we acquire the lock on the tail page, check if we still
need an overflow page, if so, then proceed, else return the already
added overflow page.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] _hash_addovflpage has a bug

2017-01-06 Thread Robert Haas
It looks to me like the recent hash index changes have left
_hash_addovflpage slightly broken.  I think that if that function
reaches the point where it calls _hash_getbuf() to fetch the next page
in the bucket chain, we also need to clear retain_pin.  Otherwise,
we'll erroneously think that we're supposed to retain a pin even when
the current page is an overflow page rather than the primary bucket
page, which is wrong.

A larger question, I suppose, is why this function wants to be certain
of adding a new page even if someone else has already done so.  It
seems like it might be smarter for it to just return the newly added
page to the caller and let the caller try to use it.  _hash_doinsert
has an Assert() that the tuple fits on the returned page, but that
could be deleted.  As it stands, if two backends try to insert a tuple
into the same full page at the same time, both of them will add an
overflow page and we'll end up with 2 overflow pages each containing 1
tuple.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Vitaly Burovoy
On 1/5/17, Tom Lane  wrote:
> Vitaly Burovoy  writes:
>> On 1/5/17, Tom Lane  wrote:
>>> My point is that ideally, any value that can physically fit into struct
>>> Interval ought to be considered valid.  The fact that interval_out can't
>>> cope is a bug in interval_out, which ideally we would fix without
>>> artificially restricting the range of the datatype.
>
>> Am I correct that we are still limited by ECPG which is limited by the
>> system "tm" struct?
>
> I'm not really that concerned about whether ECPG can deal with enormous
> intervals.  If it bothers you, and you want to go fix it, more power to
> you --- but I think fixing the backend is much higher priority.

I really do not think that is possible, since it uses the system struct.
My point is that ECPG is a part of Postgres; we can't fix the server side and
leave the rest.

>> Also users who use a binary protocol can also use the "tm" struct and
>> can not expect overflow.
>
> If they store an enormous interval value, it's really up to them not to
> choke on it when they read it back.  Not our problem.

Those who store and those who read it back can be different groups of people.
For example, the second group may be library writers.
My point is that the interval is big enough and limiting it can protect
people from errors, because in the end:
* either they have an error in their data, and that change will not
break their result (which is wrong anyway because of wrong source data);
* or they still get an overflow and have to find a solution, with
that patch or without it;
* or they get a result in that 16% range between hours fitting into
"int" and seconds fitting into PG_INT64, and get a silently corrupted result
because of ECPG or a library.

A solution for the third case can be the same as for the second one,
because these groups can be the same (just with different data).

>> The docs say[1] the highest value of the interval type is 178_000_000
>> years which is
>
> ... irrelevant really.  That's talking about the largest possible value of
> the "months" field, viz (2^31-1)/12.  Perhaps we ought to document the
> other field limits, but right now there's nothing there about how large
> the hours field can get.

No, that was a bad explanation of my point: right now nothing is documented
about the "seconds" part of the "interval" type, and we can simply declare a
limit. Users don't mind the reason for a limitation; they will just plan a
workaround if their data do not fit within the limits, whatever they are.

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Tom Lane
Vitaly Burovoy  writes:
> On 1/5/17, Tom Lane  wrote:
>> My point is that ideally, any value that can physically fit into struct
>> Interval ought to be considered valid.  The fact that interval_out can't
>> cope is a bug in interval_out, which ideally we would fix without
>> artificially restricting the range of the datatype.

> Am I correct that we are still limited by ECPG which is limited by the
> system "tm" struct?

I'm not really that concerned about whether ECPG can deal with enormous
intervals.  If it bothers you, and you want to go fix it, more power to
you --- but I think fixing the backend is much higher priority.

> Also users who use a binary protocol can also use the "tm" struct and
> can not expect overflow.

If they store an enormous interval value, it's really up to them not to
choke on it when they read it back.  Not our problem.

> The docs say[1] the highest value of the interval type is 178_000_000
> years which is

... irrelevant really.  That's talking about the largest possible value of
the "months" field, viz (2^31-1)/12.  Perhaps we ought to document the
other field limits, but right now there's nothing there about how large
the hours field can get.

regards, tom lane




Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Vitaly Burovoy
On 1/5/17, Tom Lane  wrote:
> Vitaly Burovoy  writes:
>> On 1/5/17, Tom Lane  wrote:
>>> We could think about replacing interval2tm's output format with some
>>> other struct that uses a TimeOffset for hours and so cannot overflow.
>>> I'm not sure though how far the effects would propagate; it might be
>>> more work than we want to put into this.
>
>> If values with overflow are already in a database, what do you expect
>> a new output function should fix?
>
> My point is that ideally, any value that can physically fit into struct
> Interval ought to be considered valid.  The fact that interval_out can't
> cope is a bug in interval_out, which ideally we would fix without
> artificially restricting the range of the datatype.
>
> Now, the problem with that of course is that it's not only interval_out
> but multiple other places.  But your proposed patch also requires touching
> nearly everything interval-related, so I'm not sure it has any advantage
> that way over the less restrictive answer.

OK, I try to limit values from
9223372036854775807 usec (INT64_MAX, about 292471.2 years)
to
2147483647 hours (INT32_MAX, about 245146.5 years),

i.e. cut about 47325 years or ~16%, but only for the interval's part
measured in seconds.

Am I correct that we are still limited by ECPG which is limited by the
system "tm" struct?
Also users who use a binary protocol can also use the "tm" struct and
can not expect overflow.
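
For reference, the pinch point is that struct tm (and pg_tm) keeps hours in a
plain int, while the Interval struct itself stores int64 microseconds; a
standalone check (not backend code):

    #include <limits.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct tm t = {0};
        /* tm_hour is a plain int, so anything beyond INT_MAX hours cannot
         * be represented once an interval is converted to a tm */
        printf("sizeof t.tm_hour = %zu bytes, INT_MAX hours ~ %.1f years\n",
               sizeof t.tm_hour, INT_MAX / 24.0 / 365.0);
        return 0;
    }

That prints roughly 245146.5 years, which matches the limited range above.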

The docs say[1] the highest value of the interval type is 178_000_000
years which is
significantly bigger than both values (the current one and the limited one)
measured in seconds.
I don't think applying that limitation is a big deal.


I tried to find the functions that should be touched, and there are not so many of them:

-- internal functions:
interval_check_overflow(Interval *i)
tm2interval(struct pg_tm * tm, fsec_t fsec, Interval *span)
interval2tm(Interval span, struct pg_tm * tm, fsec_t *fsec)
intervaltypmodleastfield(int32 typmod)

-- output functions:
interval_out(PG_FUNCTION_ARGS)  -- skip output
interval_send(PG_FUNCTION_ARGS) -- skip output

-- main input/computing functions:
interval_in(PG_FUNCTION_ARGS)   -- forgot to cover
interval_recv(PG_FUNCTION_ARGS) -- covered
make_interval(PG_FUNCTION_ARGS) -- covered

AdjustIntervalForTypmod(Interval *interval, int32 typmod)  -- can lead
to overflow

interval_justify_interval(PG_FUNCTION_ARGS)  -- can lead to overflow
interval_justify_hours(PG_FUNCTION_ARGS)  -- can lead to overflow
interval_justify_days(PG_FUNCTION_ARGS)  -- can lead to overflow
interval_um(PG_FUNCTION_ARGS)  -- covered
interval_pl(PG_FUNCTION_ARGS)  -- covered
interval_mi(PG_FUNCTION_ARGS)  -- covered
interval_mul(PG_FUNCTION_ARGS) -- covered

-- can not lead to overflow
interval_transform(PG_FUNCTION_ARGS)
interval_div(PG_FUNCTION_ARGS)
interval_trunc(PG_FUNCTION_ARGS)
interval_part(PG_FUNCTION_ARGS)

-- based on other functions
interval_scale(PG_FUNCTION_ARGS) -- based on AdjustIntervalForTypmod
interval_accum(PG_FUNCTION_ARGS) -- based on interval_pl
interval_accum_inv(PG_FUNCTION_ARGS) -- based on interval_mi
interval_avg(PG_FUNCTION_ARGS)   -- based on interval_pl
interval_combine(PG_FUNCTION_ARGS)   -- based on interval_pl
mul_d_interval(PG_FUNCTION_ARGS) -- based on interval_mul

-- checking, not changing:
interval_finite(PG_FUNCTION_ARGS)
interval_smaller(PG_FUNCTION_ARGS)
interval_larger(PG_FUNCTION_ARGS)
interval_cmp_value(const Interval *interval)
interval_cmp_internal(Interval *interval1, Interval *interval2)
interval_eq(PG_FUNCTION_ARGS)
interval_ne(PG_FUNCTION_ARGS)
interval_lt(PG_FUNCTION_ARGS)
interval_gt(PG_FUNCTION_ARGS)
interval_le(PG_FUNCTION_ARGS)
interval_ge(PG_FUNCTION_ARGS)
interval_cmp(PG_FUNCTION_ARGS)
interval_hash(PG_FUNCTION_ARGS)

-- not applicable:
intervaltypmodin(PG_FUNCTION_ARGS)
intervaltypmodout(PG_FUNCTION_ARGS)
timestamp_pl_interval(PG_FUNCTION_ARGS)
timestamp_mi_interval(PG_FUNCTION_ARGS)
timestamptz_pl_interval(PG_FUNCTION_ARGS)
timestamptz_mi_interval(PG_FUNCTION_ARGS)


=
In fact, only six of them (not "touching nearly everything
interval-related") should be covered:

* interval_in
* interval_out (yes, the SAMESIGN there is useless)
* AdjustIntervalForTypmod
* interval_justify_interval
* interval_justify_hours
* interval_justify_days


[1]https://www.postgresql.org/docs/current/static/datatype-datetime.html
-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Tom Lane
Vitaly Burovoy  writes:
> On 1/5/17, Tom Lane  wrote:
>> We could think about replacing interval2tm's output format with some
>> other struct that uses a TimeOffset for hours and so cannot overflow.
>> I'm not sure though how far the effects would propagate; it might be
>> more work than we want to put into this.

> If values with overflow are already in a database, what do you expect
> a new output function to fix?

My point is that ideally, any value that can physically fit into struct
Interval ought to be considered valid.  The fact that interval_out can't
cope is a bug in interval_out, which ideally we would fix without
artificially restricting the range of the datatype.

Now, the problem with that of course is that it's not only interval_out
but multiple other places.  But your proposed patch also requires touching
nearly everything interval-related, so I'm not sure it has any advantage
that way over the less restrictive answer.

regards, tom lane




Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Vitaly Burovoy
On 1/5/17, Tom Lane  wrote:
> Vitaly Burovoy  writes:
>>> I've written a patch which fixes that bug (in attachment).
>>> Should it be registered in the CF?
>
>> Oops. Forgot to attach the patch. Fixed.
>
> I suspect that many of these SAMESIGN() tests you've added are not
> actually adequate/useful.  That's only sufficient when the output could be
> at most a factor of 2 out-of-range.  If it could overflow past the sign
> bit then you need to work harder.

I disagree. These SAMESIGN() tests cover additions, where the result cannot
be more than a factor of two out of range.
Multiplications are covered just before.

> (By the same token, the existing SAMESIGN test in interval2tm is
> wrong.)
>
> Possibly we should consider alternatives before plowing ahead in this
> direction, since adding guards to interval_in and interval computations
> doesn't help with oversize values that are already stored in a database.

We can do nothing about values that are already stored in a database, but we
can prevent them from appearing later.

> We could think about replacing interval2tm's output format with some
> other struct that uses a TimeOffset for hours and so cannot overflow.
> I'm not sure though how far the effects would propagate; it might be
> more work than we want to put into this.

If values with overflow are already in a database, what do you expect
a new output function to fix?

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Tom Lane
Vitaly Burovoy  writes:
>> I've written a patch which fixes that bug (in attachment).
>> Should it be registered in the CF?

> Oops. Forgot to attach the patch. Fixed.

I suspect that many of these SAMESIGN() tests you've added are not
actually adequate/useful.  That's only sufficient when the output could be
at most a factor of 2 out-of-range.  If it could overflow past the sign
bit then you need to work harder.
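
A standalone illustration of that point (plain C, not backend code; the wraps
are done in unsigned arithmetic, assuming a two's-complement conversion back
to int64, where the backend itself relies on -fwrapv):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    #define SAMESIGN(a,b) (((a) < 0) == ((b) < 0))

    int main(void)
    {
        /* addition: a single wrap always flips the sign, so SAMESIGN sees it */
        int64_t a = INT64_MAX, b = 1;
        int64_t sum = (int64_t) ((uint64_t) a + (uint64_t) b);
        printf("addition caught: %d\n", !SAMESIGN(a, sum));      /* 1 */

        /* multiplication: e.g. hours * USECS_PER_HOUR can wrap past the sign
         * bit and come back positive, so a same-sign test is not enough */
        int64_t hours = INT64_C(5124095577);
        int64_t usecs = (int64_t) ((uint64_t) hours * 3600000000u);
        printf("usecs = %" PRId64 ", caught: %d\n",
               usecs, !SAMESIGN(hours, usecs));                  /* 3490448384, 0 */
        return 0;
    }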

(By the same token, the existing SAMESIGN test in interval2tm is
wrong.)

Possibly we should consider alternatives before plowing ahead in this
direction, since adding guards to interval_in and interval computations
doesn't help with oversize values that are already stored in a database.
We could think about replacing interval2tm's output format with some
other struct that uses a TimeOffset for hours and so cannot overflow.
I'm not sure though how far the effects would propagate; it might be
more work than we want to put into this.

regards, tom lane




[HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Vitaly Burovoy
On 1/5/17, Vitaly Burovoy  wrote:
> On 1/4/17, Pantelis Theodosiou  wrote:
>> On Wed, Jan 4, 2017 at 3:03 PM,  wrote:
>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:  14486
>>> Logged by:  Per Modin
>>> Email address:  web+postgre...@modin.io
>>> PostgreSQL version: 9.6.1
>>> Operating system:   Linux 303a92173594 4.8.15-moby #1 SMP Sat Dec 17 0
>>> Description:
>>>
>>> Found this bug in 9.4.8, tried it in docker towards psql 9.6.1 and it's
>>> in
>>> there as well. A minimum working example would be as follows:
>>>
>>> ```
>>> postgres=# CREATE TABLE tbl AS SELECT 9223372036854 * interval '1
>>> second'
>>> col; TABLE tbl;
>>> SELECT 1
>>> ERROR:  interval out of range
>>> ```
>>>
>>> ```
>>> postgres=# SELECT count(*) FROM tbl;
>>>  count
>>> ---
>>>  1
>>> (1 row)
>>> ```
>>>
>>> It seems that inserting and retrieving data have different constraints. As
>>> you can see from query 2, the data still gets inserted.
>>>
>>> ```
>>> postgres=# select version();
>>>  version
>>> ------------------------------------------------------------------------------
>>>  PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
>>> (1 row)
>>> ```
>>>
>>> Best regards,
>>> Per Modin
>>>
>>>
>> And these work without error:
>>
>> postgres=# select col - 9223372036854 * interval '1 second' from tbl ;
>>  ?column?
>> --
>>  00:00:00
>> (1 row)
>>
>> postgres=# select col from xx where col < interval '10 year' ;
>>  col
>> -
>> (0 rows)
>>
>
> Yes, it is a bug, but it is not a constraint; it is just different
> internal checks.
> Moreover, even if a function does not raise an error, the output can be
> wrong (note the duplicated '-' sign):
> postgres=# SELECT '-2147483647:59:59'::interval - '1s'::interval;
>   ?column?
> --------------------
>  --2147483648:00:00
> (1 row)
>
> I've written a patch which fixes that bug (attached).
> Should it be registered in the CF?

Oops. Forgot to attach the patch. Fixed.

-- 
Best regards,
Vitaly Burovoy


0001-Add-check-for-overflow-to-interval-functions.patch
Description: Binary data



[HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Vitaly Burovoy
On 1/4/17, Pantelis Theodosiou  wrote:
> On Wed, Jan 4, 2017 at 3:03 PM,  wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference:  14486
>> Logged by:  Per Modin
>> Email address:  web+postgre...@modin.io
>> PostgreSQL version: 9.6.1
>> Operating system:   Linux 303a92173594 4.8.15-moby #1 SMP Sat Dec 17 0
>> Description:
>>
>> Found this bug in 9.4.8, tried it in docker towards psql 9.6.1 and it's
>> in
>> there as well. A minimum working example would be as follows:
>>
>> ```
>> postgres=# CREATE TABLE tbl AS SELECT 9223372036854 * interval '1 second'
>> col; TABLE tbl;
>> SELECT 1
>> ERROR:  interval out of range
>> ```
>>
>> ```
>> postgres=# SELECT count(*) FROM tbl;
>>  count
>> ---
>>  1
>> (1 row)
>> ```
>>
>> It seems that inserting and retrieving data have different constraints. As
>> you can see from query 2, the data still gets inserted.
>>
>> ```
>> postgres=# select version();
>>  version
>> ------------------------------------------------------------------------------
>>  PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
>> (1 row)
>> ```
>>
>> Best regards,
>> Per Modin
>>
>>
> And these work without error:
>
> postgres=# select col - 9223372036854 * interval '1 second' from tbl ;
>  ?column?
> --
>  00:00:00
> (1 row)
>
> postgres=# select col from xx where col < interval '10 year' ;
>  col
> -
> (0 rows)
>

Yes, it is a bug, but it is not a constraint; it is just different
internal checks.
Moreover, even if a function does not raise an error, the output can be
wrong (note the duplicated '-' sign):
postgres=# SELECT '-2147483647:59:59'::interval - '1s'::interval;
  ?column?
--------------------
 --2147483648:00:00
(1 row)
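
For what it's worth, here is a standalone illustration of how a doubled sign
can arise once the hours component reaches INT_MIN (not necessarily the exact
backend code path, just the arithmetic involved):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int hours = INT_MIN;                              /* -2147483648 */
        int abs_hours = (int) (0u - (unsigned) hours);    /* "abs" wraps back to INT_MIN */
        printf("-%d:00:00\n", abs_hours);                 /* --2147483648:00:00 */
        return 0;
    }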

I've written a patch which fixes that bug (attached).
Should it be registered in the CF?
-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-28 Thread Teodor Sigaev

The more I think about it, the more I think gin is just an innocent
bystander, for which I just happen to have a particularly demanding
test.  I think something about snapshots and wrap-around may be
broken.


After 10 hours of running I've got
1587  XX000 2016-04-28 05:57:09.964 MSK:ERROR:  unexpected chunk number 0 (expected 1) for toast value 10192986 in pg_toast_16424
1587  XX000 2016-04-28 05:57:09.964 MSK:CONTEXT:  automatic analyze of table "teodor.public.foo"
1587  0 2016-04-28 05:57:09.974 MSK:LOG:  JJ transaction ID wrap limit is 2147484514, limited by database with OID 16384



I've pushed v5 of gin-alone-cleanup patch, many thanks to all participants.
Will work on backpatch

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-28 Thread Noah Misch
On Tue, Apr 26, 2016 at 08:22:03PM +0300, Teodor Sigaev wrote:
> >>Check my reasoning: In version 4 I added a remembering of tail of pending
> >>list into blknoFinish variable. And when we read page which was a tail on
> >>cleanup start then we set the cleanupFinish variable and after cleaning that
> >>page we will stop further cleanup. Any insert caused during cleanup will be
> >>placed after blknoFinish (corner case: in that page), so, vacuum should not
> >>miss tuples marked as deleted.
> >
> >Yes, I agree with the correctness of v4.  But I do wonder if we should
> >use that early stopping for vacuum and gin_clean_pending_list, rather
> Interesting, I've missed this possible option
> 
> >than just using it for user backends.  While I think correctness
> >allows it to stop early, since these routines are explicitly about
> >cleaning things up it seems like they should volunteer to clean the
> >whole thing.
> 
> I believe that autovacuum should not require guaranteed full clean up, only
> vacuum and gin_clean_pending_list() should do that. In all other cases it
> should stop early to prevent possible infinite cleanup. Patch attached.

Teodor, this item remains open after twenty-three days of you owning it.  At
this pace, 9.6beta1 will have an unreliable GIN implementation.  What things
must happen so you can close this item, and when will you do those things?




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-26 Thread Teodor Sigaev

Check my reasoning: In version 4 I added a remembering of tail of pending
list into blknoFinish variable. And when we read page which was a tail on
cleanup start then we set the cleanupFinish variable and after cleaning that
page we will stop further cleanup. Any insert caused during cleanup will be
placed after blknoFinish (corner case: in that page), so, vacuum should not
miss tuples marked as deleted.


Yes, I agree with the correctness of v4.  But I do wonder if we should
use that early stopping for vacuum and gin_clean_pending_list, rather

Interesting, I've missed this possible option


than just using it for user backends.  While I think correctness
allows it to stop early, since these routines are explicitly about
cleaning things up it seems like they should volunteer to clean the
whole thing.


I believe that autovacuum should not require guaranteed full clean up, only 
vacuum and gin_clean_pending_list() should do that. In all other cases it should 
stop early to prevent possible infinite cleanup. Patch attached.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


gin_alone_cleanup-5.patch
Description: binary/octet-stream



Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-23 Thread Noah Misch
On Fri, Apr 22, 2016 at 02:03:01PM -0700, Jeff Janes wrote:
> On Thu, Apr 21, 2016 at 11:00 PM, Noah Misch  wrote:
> > Could you describe the test case in sufficient detail for Teodor to 
> > reproduce
> > your results?
> 
> [detailed description and attachments]

Thanks.

> The more I think about it, the more I think gin is just an innocent
> bystander, for which I just happen to have a particularly demanding
> test.  I think something about snapshots and wrap-around may be
> broken.

Based on your review, I think the next step on $subject is for Teodor to
either commit gin_alone_cleanup-4.patch or to tell us what he'd like to see or
do before committing that patch.

While the failures during your crash recovery testing may not even be 9.6
regressions, they suggest the presence of serious bugs.  At a minimum, the
community should confirm they are not 9.6 regressions before releasing 9.6.  I
have added an open item to that effect.




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-22 Thread Jeff Janes
On Thu, Apr 21, 2016 at 11:00 PM, Noah Misch  wrote:
> On Mon, Apr 18, 2016 at 05:48:17PM +0300, Teodor Sigaev wrote:
>> >>Added, see attached patch (based on v3.1)
>> >
>> >With this applied, I am getting a couple errors I have not seen before
>> >after extensive crash recovery testing:
>> >ERROR:  attempted to delete invisible tuple
>> >ERROR:  unexpected chunk number 1 (expected 2) for toast value
>> >100338365 in pg_toast_16425
>> Huh, seems, it's not related to GIN at all... Indexes don't play with toast
>> machinery. The single place where this error can occur is a heap_delete() -
>> deleting already deleted tuple.
>
> Like you, I would not expect gin_alone_cleanup-4.patch to cause such an error.
> I get the impression Jeff has a test case that he had run in many iterations
> against the unpatched baseline.  I also get the impression that a similar or
> smaller number of its iterations against gin_alone_cleanup-4.patch triggered
> these two errors (once apiece, or multiple times?).  Jeff, is that right?

Because the unpatched baseline suffers from the bug which was the
original topic of this thread, I haven't been able to test against the
original baseline.  It would fail from that other bug before it ran
long enough to hit these ones.  Both errors start within a few minutes
of each other, but do not appear to be related other than that.  Once
they start happening, they occur repeatedly.


> Could you describe the test case in sufficient detail for Teodor to reproduce
> your results?

I spawn a swarm of processes to update a counter in a randomly chosen
row, selected via the gin index.  They do this as fast as possible
until the server intentionally crashes.  When it recovers, they
compare notes and see if the results are consistent.

But in this case, the problem is not inconsistent results, but rather
errors during the updating stage.

The patch introduces a mechanism to crash the server, a mechanism to
fast-forward the XID, and some additional logging that I sometimes
find useful (which I haven't been using in this case, but don't want
to rip it out).

The perl script implements the core of the test harness.

The shell script sets up the server (using hard coded paths for the
data and the binaries, so will need to be changed), and then calls the
perl script in a loop.

Running on an 8 core system, I've never seen it have a problem in less
than 9 hours of run time.

This produces copious amounts of logging to stdout.  This is how I
look through the log files to find this particular problem:

sh do.sh >& do.err &

tail -f do.err | fgrep ERROR

The more I think about it, the more I think gin is just an innocent
bystander, for which I just happen to have a particularly demanding
test.  I think something about snapshots and wrap-around may be
broken.

Cheers,

Jeff


count.pl
Description: Binary data


crash_REL9_6.patch
Description: Binary data


do.sh
Description: Bourne shell script



Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-22 Thread Robert Haas
On Fri, Apr 22, 2016 at 2:20 PM, Jeff Janes  wrote:
>> Check my reasoning: In version 4 I added a remembering of tail of pending
>> list into blknoFinish variable. And when we read page which was a tail on
>> cleanup start then we set the cleanupFinish variable and after cleaning that
>> page we will stop further cleanup. Any insert caused during cleanup will be
>> placed after blknoFinish (corner case: in that page), so, vacuum should not
>> miss tuples marked as deleted.
>
> Yes, I agree with the correctness of v4.  But I do wonder if we should
> use that early stopping for vacuum and gin_clean_pending_list, rather
> than just using it for user backends.  While I think correctness
> allows it to stop early, since these routines are explicitly about
> cleaning things up it seems like they should volunteer to clean the
> whole thing.

Not to hold anyone's feet to the fire, but we are running out of time
before beta1, and this seems to be one of the more serious outstanding
issues.  I'd rather not release beta1 with known data corruption bugs,
but that means either fixing whatever is broken here or reverting the
commit that caused or revealed the problem pretty soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-22 Thread Jeff Janes
On Mon, Apr 18, 2016 at 7:48 AM, Teodor Sigaev  wrote:
>>> Added, see attached patch (based on v3.1)
>>
>>
>> With this applied, I am getting a couple errors I have not seen before
>> after extensive crash recovery testing:
>> ERROR:  attempted to delete invisible tuple
>> ERROR:  unexpected chunk number 1 (expected 2) for toast value
>> 100338365 in pg_toast_16425
>
> Huh, it seems it's not related to GIN at all... Indexes don't play with the
> toast machinery. The single place where this error can occur is heap_delete() -
> deleting an already deleted tuple.

Those are two independent errors.  The delete invisible tuple error
doesn't occur on toast tables.

The actual statement triggering the error is an update statement.
Since it is showing up in the delete path, I assume it must be an
update where the new tuple goes to a different page.  But, if the
soon-to-be-old tuple is not visible, why is the update trying to
update it in the first place?  It seems like the different parts of
the code disagree on what is visible.

update foo set count=count+1,text_array=$1 where text_array @> $2

I agree it might not have anything to do with gin indexes, but I
didn't see it in testing anything else.  It might be a wrap-around
problem which for some reason only the gin test is efficient at
evoking. What I've done now is apply your v4 patch directly to
e95680832854cf300e64c1 and I am trying to see if it also has the
problem.  If that is clean, then it is probably an independently
introduced bug which is just getting exercised by the gin index stress
test.  If that is the case I'll try to git bisect forward, but that
could take weeks given the runtimes involved.  If that is dirty, then
maybe the FSM vacuuming patch introduced/uncovered more than one bug,
and should be reverted.


>> I've restarted the test harness with intentional crashes turned off,
>> to see if the problems are related to crash recovery or are more
>> generic than that.

I do not see the problem when there is no crash-recovery cycling involved.

I also do not see the problem when compiled under --enable-cassert,
but that could just be because compiling that way makes it too slow to
get in sufficient testing to hit the bug before I gave up.

>>
>> I've never seen these particular problems before, so don't have much
>> insight into what might be going on or how to debug it.
>
> Check my reasoning: in version 4 I added remembering of the tail of the
> pending list in the blknoFinish variable. When we read the page that was the
> tail at cleanup start, we set the cleanupFinish variable, and after cleaning
> that page we will stop further cleanup. Any insert that happens during cleanup
> will be placed after blknoFinish (corner case: in that page), so vacuum should
> not miss tuples marked as deleted.

Yes, I agree with the correctness of v4.  But I do wonder if we should
use that early stopping for vacuum and gin_clean_pending_list, rather
than just using it for user backends.  While I think correctness
allows it to stop early, since these routines are explicitly about
cleaning things up it seems like they should volunteer to clean the
whole thing.
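
(As a concrete reference, the two explicit cleanup entry points being discussed
can be invoked roughly like this -- the index and table names are hypothetical:)

    -- Explicitly flush the pending list of a GIN index (new function in 9.6):
    SELECT gin_clean_pending_list('foo_gin_idx');

    -- VACUUM on the table also cleans the pending list as part of its work:
    VACUUM foo;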

Cheers,

Jeff




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-22 Thread Peter Geoghegan
On Thu, Nov 5, 2015 at 2:44 PM, Jeff Janes  wrote:
> The bug theoretically exists in 9.5, but it wasn't until 9.6 (commit
> e95680832854cf300e64c) that free pages were recycled aggressively
> enough that it actually becomes likely to be hit.

In other words: The bug could be in 9.5, but that hasn't been
conclusively demonstrated. Fair?

I'm not an expert on GIN at all; I know far more about B-Tree. But,
commit e95680832854cf300e64c seems a bit odd to me. I don't see any
argument for why it's okay that the recycling of pages can happen
immediately for the pending list, rather than requiring it to happen
at some time later with a safe interlock (something like B-Tree's use of
RecentGlobalXmin). The GIN README discusses a few such issues, but it
wasn't updated by the commit I mentioned, which I would have expected.

OTOH, after all of 10 minutes I can't see what's special about
ginvacuumcleanup() that makes its long established
RecordFreeIndexPage() call fundamentally safer, which if true would be
a surprisingly obvious defect to go unfixed for all these years. This
is what you yourself said about it, I think. I need to look at it
again with fresh eyes, but offhand having no safe interlock for the
well established RecordFreeIndexPage() call for GIN seems like an
implausibly obvious bug.

-- 
Peter Geoghegan




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-22 Thread Noah Misch
On Mon, Apr 18, 2016 at 05:48:17PM +0300, Teodor Sigaev wrote:
> >>Added, see attached patch (based on v3.1)
> >
> >With this applied, I am getting a couple errors I have not seen before
> >after extensive crash recovery testing:
> >ERROR:  attempted to delete invisible tuple
> >ERROR:  unexpected chunk number 1 (expected 2) for toast value
> >100338365 in pg_toast_16425
> Huh, it seems it's not related to GIN at all... Indexes don't play with the
> toast machinery. The single place where this error can occur is heap_delete() -
> deleting an already deleted tuple.

Like you, I would not expect gin_alone_cleanup-4.patch to cause such an error.
I get the impression Jeff has a test case that he had run in many iterations
against the unpatched baseline.  I also get the impression that a similar or
smaller number of its iterations against gin_alone_cleanup-4.patch triggered
these two errors (once apiece, or multiple times?).  Jeff, is that right?  If
so, until we determine the cause, we should assume the cause arrived in
gin_alone_cleanup-4.patch.  An error in pointer arithmetic or locking might
corrupt an unrelated buffer, leading to this symptom.

> >I've restarted the test harness with intentional crashes turned off,
> >to see if the problems are related to crash recovery or are more
> >generic than that.
> >
> >I've never seen these particular problems before, so don't have much
> >insight into what might be going on or how to debug it.

Could you describe the test case in sufficient detail for Teodor to reproduce
your results?

> Check my reasoning: in version 4 I added remembering of the tail of the
> pending list in the blknoFinish variable. When we read the page that was the
> tail at cleanup start, we set the cleanupFinish variable, and after cleaning
> that page we will stop further cleanup. Any insert that happens during cleanup
> will be placed after blknoFinish (corner case: in that page), so vacuum should
> not miss tuples marked as deleted.

Would any hacker volunteer to review Teodor's reasoning here?

Thanks,
nm




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-18 Thread Teodor Sigaev

Added, see attached patch (based on v3.1)


With this applied, I am getting a couple errors I have not seen before
after extensive crash recovery testing:
ERROR:  attempted to delete invisible tuple
ERROR:  unexpected chunk number 1 (expected 2) for toast value
100338365 in pg_toast_16425
Huh, it seems it's not related to GIN at all... Indexes don't play with the
toast machinery. The single place where this error can occur is heap_delete() -
deleting an already deleted tuple.




I've restarted the test harness with intentional crashes turned off,
to see if the problems are related to crash recovery or are more
generic than that.

I've never seen these particular problems before, so don't have much
insight into what might be going on or how to debug it.
Check my reasoning: in version 4 I added remembering of the tail of the pending
list in the blknoFinish variable. When we read the page that was the tail at
cleanup start, we set the cleanupFinish variable, and after cleaning that page
we will stop further cleanup. Any insert that happens during cleanup will be
placed after blknoFinish (corner case: in that page), so vacuum should not miss
tuples marked as deleted.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-17 Thread Jeff Janes
On Tue, Apr 12, 2016 at 9:53 AM, Teodor Sigaev  wrote:
>
> With the pending cleanup patch a backend will try to get a lock on the metapage
> with ConditionalLockPage. Will it interrupt the autovacuum worker?


Correct, ConditionalLockPage should not interrupt the autovacuum worker.

>>
>> Alvaro's recommendation, to let the cleaner off the hook once it
>> passes the page which was the tail page at the time it started, would
>> prevent any process from getting pinned down indefinitely, but would
>> not prevent the size of the list from increasing without bound.  I
>> think that would probably be good enough, because the current
>> throttling behavior is purely accidental and doesn't *guarantee* a
>> limit on the size of the pending list.
>
> Added, see attached patch (based on v3.1)

With this applied, I am getting a couple errors I have not seen before
after extensive crash recovery testing:

ERROR:  attempted to delete invisible tuple

ERROR:  unexpected chunk number 1 (expected 2) for toast value
100338365 in pg_toast_16425

I've restarted the test harness with intentional crashes turned off,
to see if the problems are related to crash recovery or are more
generic than that.

I've never seen these particular problems before, so don't have much
insight into what might be going on or how to debug it.

Cheers,

Jeff




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-15 Thread Teodor Sigaev

Alvaro's recommendation, to let the cleaner off the hook once it
passes the page which was the tail page at the time it started, would
prevent any process from getting pinned down indefinitely, but would

Added, see attached patch (based on v3.1)
If there are no objections I will apply it on Monday and backpatch to all
supported versions.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-12 Thread Teodor Sigaev

There are only 3 fundamental options I see: the cleaner can wait,
"help", or move on.

"Helping" is what it does now and is dangerous.

Moving on gives the above-discussed unthrottling problem.

Waiting has two problems.  The act of waiting will cause autovacuums
to be canceled, unless ugly hacks are deployed to prevent that.   If
we deploy those ugly hacks, then we have the problem that a user
backend will end up waiting on an autovacuum to finish the cleaning,
and the autovacuum is taking its sweet time due to
autovacuum_vacuum_cost_delay.  (The "helping" model avoids this
problem because the user backend can just catch up with and pass the
io-throttled autovac process)
With the pending cleanup patch a backend will try to get a lock on the metapage
with ConditionalLockPage. Will it interrupt the autovacuum worker?





For completeness' sake, a fourth option would be to move on, but only
after inserting the tuple directly into the main index structure
(rather than the pending list) like would be done with fastupdate off,
once the pending list is already oversized.  This is my favorite, but
there is no chance of it going into 9.6, much less being backpatched.

Agree, will reimplement for 9.7



Alvaro's recommendation, to let the cleaner off the hook once it
passes the page which was the tail page at the time it started, would
prevent any process from getting pinned down indefinitely, but would
not prevent the size of the list from increasing without bound.  I
think that would probably be good enough, because the current
throttling behavior is purely accidental and doesn't *guarantee* a
limit on the size of the pending list.

Added, see attached patch (based on v3.1)
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


gin_alone_cleanup-4.patch
Description: binary/octet-stream



Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-12 Thread Teodor Sigaev

This restricts the memory used by ordinary backends when doing the
cleanup to be work_mem. Shouldn't we let them use
maintenance_work_mem? Only one backend can be doing this clean up of a
given index at any given time, so we don't need to worry about many
concurrent allocations of maintenance_work_mem.  This seems very
similar in spirit to index creation, where a backend is allowed to use
maintenance_work_mem.
Because there could be several indexes in one pg instance, and each cleaner
could eat maintenance_work_mem.





Also, do we plan on backpatching this?  While there are no known

Yes


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-12 Thread Noah Misch
On Thu, Apr 07, 2016 at 05:53:54PM -0700, Jeff Janes wrote:
> On Thu, Apr 7, 2016 at 4:33 PM, Tom Lane  wrote:
> > Jeff Janes  writes:
> >> To summarize the behavior change:
> >
> >> In the released code, an inserting backend that violates the pending
> >> list limit will try to clean the list, even if it is already being
> >> cleaned.  It won't accomplish anything useful, but will go through the
> >> motions until eventually it runs into a page the lead cleaner has
> >> deleted, at which point it realizes there is another cleaner and it
> >> stops.  This acts as a natural throttle to how fast insertions can
> >> take place into an over-sized pending list.
> >
> > Right.
> >
> >> The proposed change removes that throttle, so that inserters will
> >> immediately see there is already a cleaner and just go back about
> >> their business.  Due to that, unthrottled backends could add to the
> >> pending list faster than the cleaner can clean it, leading to
> >> unbounded growth in the pending list and could cause a user backend to
>> become apparently unresponsive to the user, indefinitely.  That is
> >> scary to backpatch.
> >
> > It's scary to put into HEAD, either.  What if we simply don't take
> > that specific behavioral change?  It doesn't seem like this is an
> > essential part of fixing the bug as you described it.  (Though I've
> > not read the patch, so maybe I'm just missing the connection.)
> 
> There are only 3 fundamental options I see: the cleaner can wait,
> "help", or move on.
> 
> "Helping" is what it does now and is dangerous.
> 
> Moving on gives the above-discussed unthrottling problem.
> 
> Waiting has two problems.  The act of waiting will cause autovacuums
> to be canceled, unless ugly hacks are deployed to prevent that.   If
> we deploy those ugly hacks, then we have the problem that a user
> backend will end up waiting on an autovacuum to finish the cleaning,
> and the autovacuum is taking its sweet time due to
> autovacuum_vacuum_cost_delay.

Teodor, this thread has been quiet for four days, and the deadline to fix this
open item expired 23 hours ago.  Do you have a new plan for fixing it?




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-07 Thread Jeff Janes
On Thu, Apr 7, 2016 at 4:33 PM, Tom Lane  wrote:
> Jeff Janes  writes:
>> To summarize the behavior change:
>
>> In the released code, an inserting backend that violates the pending
>> list limit will try to clean the list, even if it is already being
>> cleaned.  It won't accomplish anything useful, but will go through the
>> motions until eventually it runs into a page the lead cleaner has
>> deleted, at which point it realizes there is another cleaner and it
>> stops.  This acts as a natural throttle to how fast insertions can
>> take place into an over-sized pending list.
>
> Right.
>
>> The proposed change removes that throttle, so that inserters will
>> immediately see there is already a cleaner and just go back about
>> their business.  Due to that, unthrottled backends could add to the
>> pending list faster than the cleaner can clean it, leading to
>> unbounded growth in the pending list and could cause a user backend to
> become apparently unresponsive to the user, indefinitely.  That is
>> scary to backpatch.
>
> It's scary to put into HEAD, either.  What if we simply don't take
> that specific behavioral change?  It doesn't seem like this is an
> essential part of fixing the bug as you described it.  (Though I've
> not read the patch, so maybe I'm just missing the connection.)

There are only 3 fundamental options I see: the cleaner can wait,
"help", or move on.

"Helping" is what it does now and is dangerous.

Moving on gives the above-discussed unthrottling problem.

Waiting has two problems.  The act of waiting will cause autovacuums
to be canceled, unless ugly hacks are deployed to prevent that.   If
we deploy those ugly hacks, then we have the problem that a user
backend will end up waiting on an autovacuum to finish the cleaning,
and the autovacuum is taking its sweet time due to
autovacuum_vacuum_cost_delay.  (The "helping" model avoids this
problem because the user backend can just catch up with and pass the
io-throttled autovac process)

For completeness' sake, a fourth option would be to move on, but only
after inserting the tuple directly into the main index structure
(rather than the pending list) like would be done with fastupdate off,
once the pending list is already oversized.  This is my favorite, but
there is no chance of it going into 9.6, much less being backpatched.
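
(For reference, the knobs that already exist for this behavior, shown on a
hypothetical index: fastupdate directs new entries straight into the main index
structure, and the per-index gin_pending_list_limit storage parameter, available
from 9.5 on, sets the list size at which cleanup is triggered.)

    ALTER INDEX foo_gin_idx SET (fastupdate = off);
    ALTER INDEX foo_gin_idx SET (gin_pending_list_limit = 4096);  -- in kB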


Alvaro's recommendation, to let the cleaner off the hook once it
passes the page which was the tail page at the time it started, would
prevent any process from getting pinned down indefinitely, but would
not prevent the size of the list from increasing without bound.  I
think that would probably be good enough, because the current
throttling behavior is purely accidental and doesn't *guarantee* a
limit on the size of the pending list.

Cheers,

Jeff




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-07 Thread Tom Lane
Jeff Janes  writes:
> To summarize the behavior change:

> In the released code, an inserting backend that violates the pending
> list limit will try to clean the list, even if it is already being
> cleaned.  It won't accomplish anything useful, but will go through the
> motions until eventually it runs into a page the lead cleaner has
> deleted, at which point it realizes there is another cleaner and it
> stops.  This acts as a natural throttle to how fast insertions can
> take place into an over-sized pending list.

Right.

> The proposed change removes that throttle, so that inserters will
> immediately see there is already a cleaner and just go back about
> their business.  Due to that, unthrottled backends could add to the
> pending list faster than the cleaner can clean it, leading to
> unbounded growth in the pending list and could cause a user backend to
> become apparently unresponsive to the user, indefinitely.  That is
> scary to backpatch.

It's scary to put into HEAD, either.  What if we simply don't take
that specific behavioral change?  It doesn't seem like this is an
essential part of fixing the bug as you described it.  (Though I've
not read the patch, so maybe I'm just missing the connection.)

regards, tom lane




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-07 Thread Alvaro Herrera
Jeff Janes wrote:

> The proposed change removes that throttle, so that inserters will
> immediately see there is already a cleaner and just go back about
> their business.  Due to that, unthrottled backends could add to the
> pending list faster than the cleaner can clean it, leading to
> unbounded growth in the pending list and could cause a user backend to
> become apparently unresponsive to the user, indefinitely.  That is
> scary to backpatch.

Can we install a protection against that?  For instance, the cleaner
only cleans up the entries that were present when it started, and if
other items are inserted later into the pending list, they are left for
a future cleaner.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-07 Thread Jeff Janes
On Wed, Apr 6, 2016 at 9:52 AM, Teodor Sigaev  wrote:

> I'm inclined to push v3.1 as one of the two winners by size/performance and,
> unlike the pending lock patch, it doesn't change the internal logic of the
> lock machinery.

This restricts the memory used by ordinary backends when doing the
cleanup to be work_mem. Shouldn't we let them use
maintenance_work_mem? Only one backend can be doing this clean up of a
given index at any given time, so we don't need to worry about many
concurrent allocations of maintenance_work_mem.  This seems very
similar in spirit to index creation, where a backend is allowed to use
maintenance_work_mem.
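
(For context, a sketch of the two settings in question; the values here are only
illustrative:)

    SET work_mem = '4MB';                -- what ordinary backends' cleanup is limited to here
    SET maintenance_work_mem = '256MB';  -- what index creation and VACUUM are allowed to use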

Also, do we plan on backpatching this?  While there are no known
instances of anyone hitting this bug in any released code, still it is
a bug.  There are plenty of reports of corrupted indexes which never
get properly diagnosed, and for all we know some of them might be due
to this.  On the other hand, it is a substantial behavior change,
which is scary to backpatch.  (But all other proposed solutions also
have behavior changes of one kind or another.)

To summarize the bugs in the already-released code:

If a vacuum's dead list includes a tuple which is still in the pending
list, and the vacuum skips the pending list cleanup because it detects
someone else is already working on it, then the vacuum could remove
that tuple from the table, even though there is still a reference to
it in the index.

A page could be deleted from the pending list, put on the free space
map, then reused.  When the linked list of pending pages is followed
onto this reused page, it will fail to recognize it as being deleted
and recycled, and bad things will happen.  This is quite likely to
happen in 9.6 because of the new aggressive recycling of pages.  But
I've seen no argument as to why it is impossible to hit this bug in
released code; it seems like it would be possible to hit it, but just
on a very rare basis.

To summarize the behavior change:

In the released code, an inserting backend that violates the pending
list limit will try to clean the list, even if it is already being
cleaned.  It won't accomplish anything useful, but will go through the
motions until eventually it runs into a page the lead cleaner has
deleted, at which point it realizes there is another cleaner and it
stops.  This acts as a natural throttle to how fast insertions can
take place into an over-sized pending list.

The proposed change removes that throttle, so that inserters will
immediately see there is already a cleaner and just go back about
their business.  Due to that, unthrottled backends could add to the
pending list faster than the cleaner can clean it, leading to
unbounded growth in the pending list and could cause a user backend to
become apparently unresponsive to the user, indefinitely.  That is
scary to backpatch.

Cheers,

Jeff




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-06 Thread Teodor Sigaev

I've tested the v2, v3 and v3.1 of the patch, to see if there are any
differences. The v2 no longer applies, so I tested it on ee943004. The following
table shows the total duration of the data load, and also sizes of the two GIN
indexes.

           duration (sec)  subject    body
   ---------------------------------------
   v2                1290    23 MB  684 MB
   v3                1360    24 MB  487 MB
   v3.1              1360    24 MB  488 MB

Thank you very much.

Hmm. v3 is just a rebased version of v2; v3.1 doesn't have the unlock/lock cycle
during cleanup, just to become similar to Jeff's pending lock patch. In theory,
v2 and v3 should be very close, and v3.1 should be close to pending_lock.


I'm inclined to push v3.1 as one of the two winners by size/performance and,
unlike the pending lock patch, it doesn't change the internal logic of the lock
machinery.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-05 Thread Tomas Vondra

Hi,

On 04/04/2016 02:25 PM, Tomas Vondra wrote:

On 04/04/2016 02:06 PM, Teodor Sigaev wrote:

The above-described topic is currently a PostgreSQL 9.6 open item.
Teodor,
since you committed the patch believed to have created it, you own
this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be
discovered at
any time and I want to plan to have them all fixed well in advance of
the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days
of this
message.  Thanks.


I'm waiting on Tomas's testing. Suppose it isn't possible now? Then I will
do it myself, and after that I will publish the results.


Ah, damn. This completely slipped from my TODO list. I'll rerun the
tests either today or tomorrow, and report the results here.


So, I've done some testing on the patch, and in short it seems fine to 
me. It fixes the bug (no data corruption observed), and the performance 
seems fine too.


I've tested the v2, v3 and v3.1 of the patch, to see if there are any 
differences. The v2 no longer applies, so I tested it on ee943004. The 
following table shows the total duration of the data load, and also 
sizes of the two GIN indexes.


          duration (sec)  subject    body
  ---------------------------------------
  v2               1290    23 MB  684 MB
  v3               1360    24 MB  487 MB
  v3.1             1360    24 MB  488 MB

So, looks good to me.
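
(For anyone reproducing this, the sizes above can be read out with something like
the following; the index names are hypothetical:)

    SELECT indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS size
      FROM pg_stat_user_indexes
     WHERE indexrelname IN ('messages_subject_gin', 'messages_body_gin');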

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-04 Thread Tomas Vondra

On 04/04/2016 02:06 PM, Teodor Sigaev wrote:

The above-described topic is currently a PostgreSQL 9.6 open item.
Teodor,
since you committed the patch believed to have created it, you own
this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be
discovered at
any time and I want to plan to have them all fixed well in advance of
the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days
of this
message.  Thanks.


I'm waiting on Tomas's testing. Suppose it isn't possible now? Then I will
do it myself, and after that I will publish the results.


Ah, damn. This completely slipped from my TODO list. I'll rerun the 
tests either today or tomorrow, and report the results here.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-04 Thread Teodor Sigaev

The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.


I'm waiting on Tomas's testing. Suppose it isn't possible now? Then I will do it
myself, and after that I will publish the results.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-04 Thread Noah Misch
On Thu, Feb 25, 2016 at 11:19:20AM -0800, Jeff Janes wrote:
> On Wed, Feb 24, 2016 at 8:51 AM, Teodor Sigaev  wrote:
> > Thank you for remembering this problem, at least for me.
> >
> >>> Well, turns out there's a quite significant difference, actually. The
> >>> index sizes I get (quite stable after multiple runs):
> >>>
> >>> 9.5 : 2428 MB
> >>> 9.6 + alone cleanup : 730 MB
> >>> 9.6 + pending lock : 488 MB
> >
> > Interesting, I don't see why alone_cleanup and pending_lock are so different.
> > I'd like to understand that, but does somebody have a good theory?
> 
> Under my patch, anyone who wanted to do a clean up and detected
> someone else was doing one would wait for the concurrent one to end.
> (This is more consistent with the existing behavior, I just made it so
> they don't do any damage while they wait.)
> 
> Under your patch, if a backend wants to do a clean up and detects
> someone else is already doing one, it would just skip the clean up and
> proceed on with whatever it was doing.  This allows one process
> (hopefully a vacuum, but maybe a user backend) to get pinned down
> indefinitely, as other processes keep putting stuff onto the end of
> the pending_list with no throttle.
> 
> Since the freespace recycling only takes place once the list is
> completely cleaned, allowing some processes to add to the end while
> one poor process is trying to clean can lead to less effective
> recycling.
> 
> That is my theory, anyway.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-02-25 Thread Jeff Janes
On Wed, Feb 24, 2016 at 8:51 AM, Teodor Sigaev  wrote:
> Thank you for remembering this problem, at least for me.
>
>>> Well, turns out there's a quite significant difference, actually. The
>>> index sizes I get (quite stable after multiple runs):
>>>
>>> 9.5 : 2428 MB
>>> 9.6 + alone cleanup : 730 MB
>>> 9.6 + pending lock : 488 MB
>
> Interesting, I don't see why alone_cleanup and pending_lock are so different.
> I'd like to understand that, but does somebody have a good theory?

Under my patch, anyone who wanted to do a clean up and detected
someone else was doing one would wait for the concurrent one to end.
(This is more consistent with the existing behavior, I just made it so
they don't do any damage while they wait.)

Under your patch, if a backend wants to do a clean up and detects
someone else is already doing one, it would just skip the clean up and
proceed on with whatever it was doing.  This allows one process
(hopefully a vacuum, but maybe a user backend) to get pinned down
indefinitely, as other processes keep putting stuff onto the end of
the pending_list with no throttle.

Since the freespace recycling only takes place once the list is
completely cleaned, allowing some processes to add to the end while
one poor process is trying to clean can lead to less effective
recycling.

That is my theory, anyway.

Cheers,

Jeff




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-02-25 Thread Tomas Vondra

Hi,

On 02/25/2016 05:32 PM, Teodor Sigaev wrote:

Well, turns out there's a quite significant difference, actually. The
index sizes I get (quite stable after multiple runs):

9.5 : 2428 MB
9.6 + alone cleanup : 730 MB
9.6 + pending lock : 488 MB


Attached is a modified alone_cleanup patch which doesn't break the cleanup
process the way the pending_lock patch does, but it doesn't touch the lock
management.

Tomas, if you can, please repeat the test with this patch. If not, I will try
to do it, but later.


I'll do that once the system I used for that becomes available - right now
it's running other benchmarks.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-02-25 Thread Teodor Sigaev

Well, turns out there's a quite significant difference, actually. The
index sizes I get (quite stable after multiple runs):

9.5 : 2428 MB
9.6 + alone cleanup : 730 MB
9.6 + pending lock : 488 MB


Attached is a modified alone_cleanup patch which doesn't break the cleanup
process the way the pending_lock patch does, but it doesn't touch the lock
management.


Tomas, if you can, please repeat the test with this patch. If not, I will try to
do it, but later.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


gin_alone_cleanup-3.1.patch
Description: binary/octet-stream



Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-02-24 Thread Teodor Sigaev

Thank you for remembering this problem, at least for me.


Well, turns out there's a quite significant difference, actually. The
index sizes I get (quite stable after multiple runs):

9.5 : 2428 MB
9.6 + alone cleanup : 730 MB
9.6 + pending lock : 488 MB
Interesting, I don't see why alone_cleanup and pending_lock are so different. I'd
like to understand that, but does somebody have a good theory? The single point
of concern in the pending_lock patch is a suspicious exception in ProcSleep; this
exception may cause problems in the future.




So that's quite a significant difference, I guess. The load duration for
each version look like this:

9.5 : 1415 seconds
9.6 + alone cleanup : 1310 seconds
9.6 + pending lock  : 1380 seconds

I'd say I'm happy with sacrificing ~5% of time in exchange for ~35%
reduction of index size.
I think the alone_cleanup patch is faster because a regular insert can break off
its cleanup process if autovacuum is waiting to begin work on cleanup. So, the
insert process can return earlier from the pending-list cleanup.


In the attachment is just a rebased v2 alone_cleanup patch.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


gin_alone_cleanup-3.patch
Description: binary/octet-stream



Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-02-24 Thread Tomas Vondra

On 02/24/2016 06:56 AM, Robert Haas wrote:

On Wed, Feb 24, 2016 at 9:17 AM, Tomas Vondra
 wrote:

...

Are we going to do anything about this? While the bug is present in 9.5 (and
possibly other versions), fixing it before 9.6 gets out seems important
because reproducing it there is rather trivial (while I've been unable to
reproduce it on 9.5).


I'm not going to promise to commit anything here, because GIN is not
usually my area, but could you provide a link to the email that
contains the patch you think should be committed?


Sure. There are actually two candidate patches in two separate threads;
I'm not sure which one is better. Based on the testing both seem to fix
the issue, and the "pending lock" patch produces much smaller indexes (at
least in my benchmark):


[1] http://www.postgresql.org/message-id/56041b26.2040...@sigaev.ru
[2] 
http://www.postgresql.org/message-id/CAMkU=1w7uu1gz8n0bxmykrlgth-uph+gphfhmneryzpcv7f...@mail.gmail.com


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-02-23 Thread Robert Haas
On Wed, Feb 24, 2016 at 9:17 AM, Tomas Vondra
 wrote:
>> Well, turns out there's a quite significant difference, actually. The
>> index sizes I get (quite stable after multiple runs):
>>
>> 9.5 : 2428 MB
>> 9.6 + alone cleanup : 730 MB
>> 9.6 + pending lock : 488 MB
>>
>> So that's quite a significant difference, I guess. The load duration for
>> each version look like this:
>>
>> 9.5 : 1415 seconds
>> 9.6 + alone cleanup : 1310 seconds
>> 9.6 + pending lock  : 1380 seconds
>>
>> I'd say I'm happy with sacrificing ~5% of time in exchange for ~35%
>> reduction of index size.
>>
>> The size of the index on 9.5 after VACUUM FULL (so pretty much the
>> smallest index possible) is 440MB, which suggests the "pending lock"
>> patch does a quite good job.
>>
>> The gin_metapage_info at the end of one of the runs (pretty much all the
>> runs look exactly the same) looks like this:
>>
>>                      pending lock   alone cleanup       9.5
>>   ---------------------------------------------------------
>>   pending_head                  2               2    310460
>>   pending_tail                338             345    310806
>>   tail_free_size              812             812       812
>>   n_pending_pages             330             339       347
>>   n_pending_tuples           1003            1037      1059
>>   n_total_pages                 2               2         2
>>   n_entry_pages                 1               1         1
>>   n_data_pages                  0               0         0
>>   n_entries                     0               0         0
>>   version                       2               2         2
>>
>> So almost no difference, except for the pending_* attributes, and even
>> in that case the values are only different for 9.5 branch. Not sure what
>> conclusion to draw from this - maybe it's necessary to collect the
>> function input while the load is running (but that'd be tricky to
>> process, I guess).
>
> Are we going to do anything about this? While the bug is present in 9.5 (and
> possibly other versions), fixing it before 9.6 gets out seems important
> because reproducing it there is rather trivial (while I've been unable to
> reproduce it on 9.5).

I'm not going to promise to commit anything here, because GIN is not
usually my area, but could you provide a link to the email that
contains the patch you think should be committed?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-02-23 Thread Tomas Vondra

Hi,

On 01/05/2016 10:38 AM, Tomas Vondra wrote:

Hi,


...


There shouldn't be a difference between the two approaches (although I
guess there could be if one left a larger pending list than the other,
as pending lists are very space inefficient), but since you included
9.5 in your test I thought it would be interesting to see how either
patched version under 9.6 compared to 9.5.


Well, turns out there's a quite significant difference, actually. The
index sizes I get (quite stable after multiple runs):

9.5 : 2428 MB
9.6 + alone cleanup : 730 MB
9.6 + pending lock : 488 MB

So that's quite a significant difference, I guess. The load duration for
each version look like this:

9.5 : 1415 seconds
9.6 + alone cleanup : 1310 seconds
9.6 + pending lock  : 1380 seconds

I'd say I'm happy with sacrificing ~5% of time in exchange for ~35%
reduction of index size.

The size of the index on 9.5 after VACUUM FULL (so pretty much the
smallest index possible) is 440MB, which suggests the "pending lock"
patch does a quite good job.

The gin_metapage_info at the end of one of the runs (pretty much all the
runs look exactly the same) looks like this:

                      pending lock   alone cleanup       9.5

  pending_head                  2               2    310460
  pending_tail                338             345    310806
  tail_free_size              812             812       812
  n_pending_pages             330             339       347
  n_pending_tuples           1003            1037      1059
  n_total_pages                 2               2         2
  n_entry_pages                 1               1         1
  n_data_pages                  0               0         0
  n_entries                     0               0         0
  version                       2               2         2

So almost no difference, except for the pending_* attributes, and even
in that case the values are only different for 9.5 branch. Not sure what
conclusion to draw from this - maybe it's necessary to collect the
function input while the load is running (but that'd be tricky to
process, I guess).


Are we going to do anything about this? While the bug is present in 9.5 
(and possibly other versions), fixing it before 9.6 gets out seems 
important because reproducing it there is rather trivial (while I've 
been unable to reproduce it on 9.5).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-01-05 Thread Tomas Vondra

Hi,

On 12/23/2015 09:33 PM, Jeff Janes wrote:

On Mon, Dec 21, 2015 at 11:51 AM, Tomas Vondra
 wrote:



On 12/21/2015 07:41 PM, Jeff Janes wrote:


On Sat, Dec 19, 2015 at 3:19 PM, Tomas Vondra
 wrote:



...


So both patches seem to do the trick, but (2) is faster. Not sure
if this is expected. (BTW all the results are without asserts
enabled).



Do you know what the size of the pending list was at the end of each
test?

I think last one may be faster because it left a large mess behind
that someone needs to clean up later.



No. How do I measure it?


pageinspect's gin_metapage_info, or pgstattuple's pgstatginindex


Hmmm, so this turns out not to be very useful, because at the end the data I
get from gin_metapage_info is almost exactly the same for both patches
(more details below).






Also, do you have the final size of the indexes in each case?



No, I haven't realized the patches do affect that, so I haven't measured it.


There shouldn't be a difference between the two approaches (although I
guess there could be if one left a larger pending list than the other,
as pending lists are very space inefficient), but since you included
9.5 in your test I thought it would be interesting to see how either
patched version under 9.6 compared to 9.5.


Well, turns out there's a quite significant difference, actually. The 
index sizes I get (quite stable after multiple runs):


   9.5 : 2428 MB
   9.6 + alone cleanup : 730 MB
   9.6 + pending lock : 488 MB

So that's quite a significant difference, I guess. The load duration for 
each version look like this:


   9.5 : 1415 seconds
   9.6 + alone cleanup : 1310 seconds
   9.6 + pending lock  : 1380 seconds

I'd say I'm happy with sacrificing ~5% of time in exchange for ~35% 
reduction of index size.


The size of the index on 9.5 after VACUUM FULL (so pretty much the 
smallest index possible) is 440MB, which suggests the "pending lock" 
patch does a quite good job.


The gin_metapage_info at the end of one of the runs (pretty much all the 
runs look exactly the same) looks like this:


                     pending lock   alone cleanup       9.5

 pending_head                  2               2    310460
 pending_tail                338             345    310806
 tail_free_size              812             812       812
 n_pending_pages             330             339       347
 n_pending_tuples           1003            1037      1059
 n_total_pages                 2               2         2
 n_entry_pages                 1               1         1
 n_data_pages                  0               0         0
 n_entries                     0               0         0
 version                       2               2         2

So almost no difference, except for the pending_* attributes, and even 
in that case the values are only different for 9.5 branch. Not sure what 
conclusion to draw from this - maybe it's necessary to collect the 
function input while the load is running (but that'd be tricky to 
process, I guess).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2015-12-23 Thread Jeff Janes
On Mon, Dec 21, 2015 at 11:51 AM, Tomas Vondra
 wrote:
>
>
> On 12/21/2015 07:41 PM, Jeff Janes wrote:
>>
>> On Sat, Dec 19, 2015 at 3:19 PM, Tomas Vondra
>>  wrote:
>
>
> ...
>
>>> So both patches seem to do the trick, but (2) is faster. Not sure
>>> if this is expected. (BTW all the results are without asserts
>>> enabled).
>>
>>
>> Do you know what the size of the pending list was at the end of each
>> test?
>>
>> I think last one may be faster because it left a large mess behind
>> that someone needs to clean up later.
>
>
> No. How do I measure it?

pageinspect's gin_metapage_info, or pgstattuple's pgstatginindex
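
For example, something along these lines (the index name is hypothetical, and
both extensions have to be installed):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT * FROM pgstatginindex('foo_gin_idx');   -- version, pending_pages, pending_tuples

    CREATE EXTENSION IF NOT EXISTS pageinspect;
    SELECT * FROM gin_metapage_info(get_raw_page('foo_gin_idx', 0));  -- the metapage is block 0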


>
>>
>> Also, do you have the final size of the indexes in each case?
>
>
> No, I haven't realized the patches do affect that, so I haven't measured it.

There shouldn't be a difference between the two approaches (although I
guess there could be if one left a larger pending list than the other,
as pending lists are very space inefficient), but since you included
9.5 in your test I thought it would be interesting to see how either
patched version under 9.6 compared to 9.5.

Cheers,

Jeff




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2015-12-21 Thread Tomas Vondra



On 12/21/2015 07:41 PM, Jeff Janes wrote:

On Sat, Dec 19, 2015 at 3:19 PM, Tomas Vondra
 wrote:


...


So both patches seem to do the trick, but (2) is faster. Not sure
if this is expected. (BTW all the results are without asserts
enabled).


Do you know what the size of the pending list was at the end of each
test?

I think last one may be faster because it left a large mess behind
that someone needs to clean up later.


No. How do I measure it?



Also, do you have the final size of the indexes in each case?


No, I haven't realized the patches do affect that, so I haven't measured it.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2015-12-21 Thread Jeff Janes
On Sat, Dec 19, 2015 at 3:19 PM, Tomas Vondra
 wrote:
> Hi,
>
> On 11/06/2015 02:09 AM, Tomas Vondra wrote:
>>
>> Hi,
>>
>> On 11/06/2015 01:05 AM, Jeff Janes wrote:
>>>
>>> On Thu, Nov 5, 2015 at 3:50 PM, Tomas Vondra
>>>  wrote:
>>
>> ...


 I can do that - I see there are three patches in the two threads:

1) gin_pending_lwlock.patch (Jeff Janes)
2) gin_pending_pagelock.patch (Jeff Janes)
3) gin_alone_cleanup-2.patch (Teodor Sigaev)

 Should I test all of them? Or is (1) obsoleted by (2) for example?
>>>
>>>
>>> 1 is obsolete.  Either 2 or 3 should fix the bug, provided this is the
>>> bug you are seeing.  They have different performance side effects, but
>>> as far as fixing the bug they should be equivalent.
>>
>>
>> OK, I'll do testing with those two patches then, and I'll also note the
>> performance difference (the data load was very stable). Of course, it's
>> just one particular workload.
>>
>> I'll post an update after the weekend.
>
>
> I've finally managed to test the two patches. Sorry for the delay.
>
> I've repeated the workload on 9.5, 9.6 and 9.6 with (1) or (2), looking for
> lockups or data corruption. I've also measured duration of the script, to
> see what's the impact on performance. The configuration (shared_buffers,
> work_mem ...) was exactly the same in all cases.
>
> 9.5 : runtime ~1380 seconds
> 9.6 : runtime ~1380 seconds (but lockups and data corruption)
> 9.6+(1) : runtime ~1380 seconds
> 9.6+(2) : runtime ~1290 seconds
>
> So both patches seem to do the trick, but (2) is faster. Not sure if this is
> expected. (BTW all the results are without asserts enabled).

Do you know what the size of the pending list was at the end of each test?

I think last one may be faster because it left a large mess behind
that someone needs to clean up later.

Also, do you have the final size of the indexes in each case?

Cheers,

Jeff




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2015-12-19 Thread Tomas Vondra

Hi,

On 11/06/2015 02:09 AM, Tomas Vondra wrote:

Hi,

On 11/06/2015 01:05 AM, Jeff Janes wrote:

On Thu, Nov 5, 2015 at 3:50 PM, Tomas Vondra
 wrote:

...


I can do that - I see there are three patches in the two threads:

   1) gin_pending_lwlock.patch (Jeff Janes)
   2) gin_pending_pagelock.patch (Jeff Janes)
   3) gin_alone_cleanup-2.patch (Teodor Sigaev)

Should I test all of them? Or is (1) obsoleted by (2) for example?


1 is obsolete.  Either 2 or 3 should fix the bug, provided this is the
bug you are seeing.  They have different performance side effects, but
as far as fixing the bug they should be equivalent.


OK, I'll do testing with those two patches then, and I'll also note the
performance difference (the data load was very stable). Of course, it's
just one particular workload.

I'll post an update after the weekend.


I've finally managed to test the two patches. Sorry for the delay.

I've repeated the workload on 9.5, 9.6 and 9.6 with (1) or (2), looking 
for lockups or data corruption. I've also measured duration of the 
script, to see what's the impact on performance. The configuration 
(shared_buffers, work_mem ...) was exactly the same in all cases.


9.5 : runtime ~1380 seconds
9.6 : runtime ~1380 seconds (but lockups and data corruption)
9.6+(1) : runtime ~1380 seconds
9.6+(2) : runtime ~1290 seconds

So both patches seem to do the trick, but (2) is faster. Not sure if 
this is expected. (BTW all the results are without asserts enabled).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] psql tab completion bug for ALL IN TABLESPACE

2015-12-14 Thread Andres Freund
On 2015-12-14 12:18:27 +0100, Andres Freund wrote:
> On 2015-12-12 21:04:31 +0900, Michael Paquier wrote:
> > Hi all,
> > 
> > I just bumped into the following thing while looking again at Thomas'
> > patch for psql tab completion:
> > --- a/src/bin/psql/tab-complete.c
> > +++ b/src/bin/psql/tab-complete.c
> > @@ -1040,7 +1040,7 @@ psql_completion(const char *text, int start, int end)
> >  pg_strcasecmp(prev5_wd, "IN") == 0 &&
> >  pg_strcasecmp(prev4_wd, "TABLESPACE") == 0 &&
> >  pg_strcasecmp(prev2_wd, "OWNED") == 0 &&
> > -pg_strcasecmp(prev4_wd, "BY") == 0)
> > +pg_strcasecmp(prev_wd, "BY") == 0)
> > {
> > COMPLETE_WITH_QUERY(Query_for_list_of_roles);
> > This should be backpatched, attached is the needed patch.
> 
> Hm, this seems to need slightly more expansive surgery.
> 
> Trivially the comments for ALL IN TABLESPACE seem broken/badly copy pasted:
>  /* ALTER TABLE,INDEX,MATERIALIZED VIEW xxx ALL IN TABLESPACE xxx */
> the first xxx doesn't make sense.
> 
> Secondly the OWNED BY completion then breaks the SET TABLESPACE
> completion. That's maybe not an outright bug, but seems odd nonetheless.
> 
> Fujii, Stephen?

I'm off to do something else for a while, but attached is where I
stopped.
From 0b7fe71b3c27abf97c1a2fdefdd10c1e71d3eba7 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 14 Dec 2015 12:06:59 +0100
Subject: [PATCH] Fix tab completion for ALTER ... TABLESPACE ... OWNED BY.

This was introduced broken, in b2de2a.

Author: Michael Paquier
Discussion: CAB7nPqSHDdSwsJqX0d2XzjqOHr==HdWiubCi4L=zs7yftun...@mail.gmail.com
Backpatch: 9.4, like the commit introducing the bug
---
 src/bin/psql/tab-complete.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 8c48881..85f843e 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1025,7 +1025,7 @@ psql_completion(const char *text, int start, int end)
 
 		COMPLETE_WITH_LIST(list_ALTER);
 	}
-	/* ALTER TABLE,INDEX,MATERIALIZED VIEW xxx ALL IN TABLESPACE xxx */
+	/* ALTER TABLE,INDEX,MATERIALIZED VIEW ALL IN TABLESPACE xxx */
 	else if (pg_strcasecmp(prev4_wd, "ALL") == 0 &&
 			 pg_strcasecmp(prev3_wd, "IN") == 0 &&
 			 pg_strcasecmp(prev2_wd, "TABLESPACE") == 0)
@@ -1035,15 +1035,24 @@ psql_completion(const char *text, int start, int end)
 
 		COMPLETE_WITH_LIST(list_ALTERALLINTSPC);
 	}
-	/* ALTER TABLE,INDEX,MATERIALIZED VIEW xxx ALL IN TABLESPACE xxx OWNED BY */
+	/* ALTER TABLE,INDEX,MATERIALIZED VIEW ALL IN TABLESPACE xxx OWNED BY */
 	else if (pg_strcasecmp(prev6_wd, "ALL") == 0 &&
 			 pg_strcasecmp(prev5_wd, "IN") == 0 &&
 			 pg_strcasecmp(prev4_wd, "TABLESPACE") == 0 &&
 			 pg_strcasecmp(prev2_wd, "OWNED") == 0 &&
-			 pg_strcasecmp(prev4_wd, "BY") == 0)
+			 pg_strcasecmp(prev_wd, "BY") == 0)
 	{
 		COMPLETE_WITH_QUERY(Query_for_list_of_roles);
 	}
+	/* ALTER TABLE,INDEX,MATERIALIZED VIEW ALL IN TABLESPACE xxx OWNED BY xxx */
+	else if (pg_strcasecmp(prev7_wd, "ALL") == 0 &&
+			 pg_strcasecmp(prev6_wd, "IN") == 0 &&
+			 pg_strcasecmp(prev5_wd, "TABLESPACE") == 0 &&
+			 pg_strcasecmp(prev3_wd, "OWNED") == 0 &&
+			 pg_strcasecmp(prev2_wd, "BY") == 0)
+	{
+		COMPLETE_WITH_CONST("SET TABLESPACE");
+	}
 	/* ALTER AGGREGATE,FUNCTION  */
 	else if (pg_strcasecmp(prev3_wd, "ALTER") == 0 &&
 			 (pg_strcasecmp(prev2_wd, "AGGREGATE") == 0 ||
-- 
2.5.0.400.gff86faf.dirty


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql tab completion bug for ALL IN TABLESPACE

2015-12-14 Thread Michael Paquier
On Mon, Dec 14, 2015 at 8:18 PM, Andres Freund  wrote:
> On 2015-12-12 21:04:31 +0900, Michael Paquier wrote:
>> Hi all,
>>
>> I just bumped into the following thing while looking again at Thomas'
>> patch for psql tab completion:
>> --- a/src/bin/psql/tab-complete.c
>> +++ b/src/bin/psql/tab-complete.c
>> @@ -1040,7 +1040,7 @@ psql_completion(const char *text, int start, int end)
>>  pg_strcasecmp(prev5_wd, "IN") == 0 &&
>>  pg_strcasecmp(prev4_wd, "TABLESPACE") == 0 &&
>>  pg_strcasecmp(prev2_wd, "OWNED") == 0 &&
>> -pg_strcasecmp(prev4_wd, "BY") == 0)
>> +pg_strcasecmp(prev_wd, "BY") == 0)
>> {
>> COMPLETE_WITH_QUERY(Query_for_list_of_roles);
>> This should be backpatched, attached is the needed patch.
>
> Hm, this seems to need slightly more expansive surgery.
>
> Trivially the comments for ALL IN TABLESPACE seem broken/badly copy pasted:
>  /* ALTER TABLE,INDEX,MATERIALIZED VIEW xxx ALL IN TABLESPACE xxx */
> the first xxx doesn't make sense.

Indeed.

> Secondly the OWNED BY completion then breaks the SET TABLESPACE
> completion. That's maybe not an outright bug, but seems odd nonetheless.

You mean that this code should do the completion of SET TABLESPACE
after "OWNED BY xxx" has been typed? Are you sure it's worth going
this far?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql tab completion bug for ALL IN TABLESPACE

2015-12-14 Thread Andres Freund
On 2015-12-14 20:58:21 +0900, Michael Paquier wrote:
> On Mon, Dec 14, 2015 at 8:49 PM, Andres Freund  wrote:
> > On 2015-12-14 20:44:20 +0900, Michael Paquier wrote:
> >> + /*
> >> +  * ALTER TABLE,INDEX,MATERIALIZED VIEW ALL IN TABLESPACE xxx OWNED 
> >> BY xxx
> >> +  * SET TABLESPACE.
> >> +  */
> >> + else if (pg_strcasecmp(prev9_wd, "ALL") == 0 &&
> >> +  pg_strcasecmp(prev8_wd, "IN") == 0 &&
> >> +  pg_strcasecmp(prev7_wd, "TABLESPACE") == 0 &&
> >> +  pg_strcasecmp(prev5_wd, "OWNED") == 0 &&
> >> +  pg_strcasecmp(prev4_wd, "BY") == 0 &&
> >> +  pg_strcasecmp(prev2_wd, "SET") == 0 &&
> >> +  pg_strcasecmp(prev_wd, "TABLESPACE") == 0)
> >> + {
> >> + COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
> >> + }
> >
> > Isn't that already handled by the normal SET TABLESPACE case?
> 
> No, there is no SET TABLESPACE case; there is a TABLE SET TABLESPACE case,
> though. Just removing the TABLE seems to be fine.

ALTER TABLE ALL IN TABLESPACE pg_default OWNED BY andres SET TABLESPACE 
works, because of

/*
 * Finally, we look through the list of "things", such as TABLE, INDEX and
 * check if that was the previous word. If so, execute the query to get a
 * list of them.
 */
else
{
	int			i;

	for (i = 0; words_after_create[i].name; i++)
	{
		if (pg_strcasecmp(prev_wd, words_after_create[i].name) == 0)
		{
			if (words_after_create[i].query)
				COMPLETE_WITH_QUERY(words_after_create[i].query);
			else if (words_after_create[i].squery)
				COMPLETE_WITH_SCHEMA_QUERY(*words_after_create[i].squery,
										   NULL);
			break;
		}
	}
}



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql tab completion bug for ALL IN TABLESPACE

2015-12-14 Thread Michael Paquier
On Mon, Dec 14, 2015 at 9:08 PM, Andres Freund  wrote:
> ALTER TABLE ALL IN TABLESPACE pg_default OWNED BY andres SET TABLESPACE 
> works, because of

Missed that.
-   /* If we have TABLE  SET TABLESPACE provide a list of tablespaces */
-   else if (pg_strcasecmp(prev4_wd, "TABLE") == 0 &&
-pg_strcasecmp(prev2_wd, "SET") == 0 &&
-pg_strcasecmp(prev_wd, "TABLESPACE") == 0)
-   COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
So you could get rid of that as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql tab completion bug for ALL IN TABLESPACE

2015-12-14 Thread Michael Paquier
On Mon, Dec 14, 2015 at 8:49 PM, Andres Freund  wrote:
> On 2015-12-14 20:44:20 +0900, Michael Paquier wrote:
>> + /*
>> +  * ALTER TABLE,INDEX,MATERIALIZED VIEW ALL IN TABLESPACE xxx OWNED BY 
>> xxx
>> +  * SET TABLESPACE.
>> +  */
>> + else if (pg_strcasecmp(prev9_wd, "ALL") == 0 &&
>> +  pg_strcasecmp(prev8_wd, "IN") == 0 &&
>> +  pg_strcasecmp(prev7_wd, "TABLESPACE") == 0 &&
>> +  pg_strcasecmp(prev5_wd, "OWNED") == 0 &&
>> +  pg_strcasecmp(prev4_wd, "BY") == 0 &&
>> +  pg_strcasecmp(prev2_wd, "SET") == 0 &&
>> +  pg_strcasecmp(prev_wd, "TABLESPACE") == 0)
>> + {
>> + COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
>> + }
>
> Isn't that already handled by the normal SET TABLESPACE case?

No, there is no SET TABLESPACE case; there is a TABLE SET TABLESPACE case,
though. Just removing the TABLE seems to be fine.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql tab completion bug for ALL IN TABLESPACE

2015-12-14 Thread Michael Paquier
On Mon, Dec 14, 2015 at 8:36 PM, Andres Freund  wrote:
> On 2015-12-14 12:18:27 +0100, Andres Freund wrote:
>> On 2015-12-12 21:04:31 +0900, Michael Paquier wrote:
>> > Hi all,
>> >
>> > I just bumped into the following thing while looking again at Thomas'
>> > patch for psql tab completion:
>> > --- a/src/bin/psql/tab-complete.c
>> > +++ b/src/bin/psql/tab-complete.c
>> > @@ -1040,7 +1040,7 @@ psql_completion(const char *text, int start, int end)
>> >  pg_strcasecmp(prev5_wd, "IN") == 0 &&
>> >  pg_strcasecmp(prev4_wd, "TABLESPACE") == 0 &&
>> >  pg_strcasecmp(prev2_wd, "OWNED") == 0 &&
>> > -pg_strcasecmp(prev4_wd, "BY") == 0)
>> > +pg_strcasecmp(prev_wd, "BY") == 0)
>> > {
>> > COMPLETE_WITH_QUERY(Query_for_list_of_roles);
>> > This should be backpatched, attached is the needed patch.
>>
>> Hm, this seems to need slightly more expansive surgery.
>>
>> Trivially the comments for ALL IN TABLESPACE seem broken/badly copy pasted:
>>  /* ALTER TABLE,INDEX,MATERIALIZED VIEW xxx ALL IN TABLESPACE xxx */
>> the first xxx doesn't make sense.
>>
>> Secondly the OWNED BY completion then breaks the SET TABLESPACE
>> completion. That's maybe not an outright bug, but seems odd nonetheless.
>>
>> Fujii, Stephen?
>
> I'm off to do something else for a while, but attached is where I
> stopped at.

Just a bit more and you can get the whole set...
-- 
Michael


psql-tab-alltbspace.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql tab completion bug for ALL IN TABLESPACE

2015-12-14 Thread Andres Freund
On 2015-12-12 21:04:31 +0900, Michael Paquier wrote:
> Hi all,
> 
> I just bumped into the following thing while looking again at Thomas'
> patch for psql tab completion:
> --- a/src/bin/psql/tab-complete.c
> +++ b/src/bin/psql/tab-complete.c
> @@ -1040,7 +1040,7 @@ psql_completion(const char *text, int start, int end)
>  pg_strcasecmp(prev5_wd, "IN") == 0 &&
>  pg_strcasecmp(prev4_wd, "TABLESPACE") == 0 &&
>  pg_strcasecmp(prev2_wd, "OWNED") == 0 &&
> -pg_strcasecmp(prev4_wd, "BY") == 0)
> +pg_strcasecmp(prev_wd, "BY") == 0)
> {
> COMPLETE_WITH_QUERY(Query_for_list_of_roles);
> This should be backpatched, attached is the needed patch.

Hm, this seems to need slightly more expansive surgery.

Trivially the comments for ALL IN TABLESPACE seem broken/badly copy pasted:
 /* ALTER TABLE,INDEX,MATERIALIZED VIEW xxx ALL IN TABLESPACE xxx */
the first xxx doesn't make sense.

Secondly the OWNED BY completion then breaks the SET TABLESPACE
completion. That's maybe not an outright bug, but seems odd nonetheless.

Fujii, Stephen?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql tab completion bug for ALL IN TABLESPACE

2015-12-14 Thread Andres Freund
On 2015-12-14 20:44:20 +0900, Michael Paquier wrote:
> + /*
> +  * ALTER TABLE,INDEX,MATERIALIZED VIEW ALL IN TABLESPACE xxx OWNED BY 
> xxx
> +  * SET TABLESPACE.
> +  */
> + else if (pg_strcasecmp(prev9_wd, "ALL") == 0 &&
> +  pg_strcasecmp(prev8_wd, "IN") == 0 &&
> +  pg_strcasecmp(prev7_wd, "TABLESPACE") == 0 &&
> +  pg_strcasecmp(prev5_wd, "OWNED") == 0 &&
> +  pg_strcasecmp(prev4_wd, "BY") == 0 &&
> +  pg_strcasecmp(prev2_wd, "SET") == 0 &&
> +  pg_strcasecmp(prev_wd, "TABLESPACE") == 0)
> + {
> + COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
> + }

Isn't that already handled by the normal SET TABLESPACE case?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql tab completion bug for ALL IN TABLESPACE

2015-12-12 Thread Michael Paquier
Hi all,

I just bumped into the following thing while looking again at Thomas'
patch for psql tab completion:
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1040,7 +1040,7 @@ psql_completion(const char *text, int start, int end)
 pg_strcasecmp(prev5_wd, "IN") == 0 &&
 pg_strcasecmp(prev4_wd, "TABLESPACE") == 0 &&
 pg_strcasecmp(prev2_wd, "OWNED") == 0 &&
-pg_strcasecmp(prev4_wd, "BY") == 0)
+pg_strcasecmp(prev_wd, "BY") == 0)
{
COMPLETE_WITH_QUERY(Query_for_list_of_roles);
This should be backpatched, attached is the needed patch.

Regards,
-- 
Michael


20151212_psql_alltblspc.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-11-07 Thread Peter Eisentraut
On 11/6/15 11:34 AM, Robert Haas wrote:
> On Fri, Nov 6, 2015 at 12:52 AM, Michael Paquier
>  wrote:
>>> I guess I'm wondering whether there's really enough of this to need
>>> its own category.
>>
>> We have a category "Code comments" as well. Let's give it a shot so I
>> am adding it. We could always remove it later if necessary.
> 
> Ugh, OK, whatever.  That sounds like we have too many categories.

Btw., the reason I added the Bug Fixes category a while back was that I
felt that bug fixes have a slightly different process to them:

1. One might feel that bug fixes should be dealt with before new features.

2. There often isn't much "do we want this" discussion needed for bugs,
only verification of the code.

I think that was kind of successful.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-11-06 Thread Robert Haas
On Fri, Nov 6, 2015 at 12:52 AM, Michael Paquier
 wrote:
>> I guess I'm wondering whether there's really enough of this to need
>> its own category.
>
> We have a category "Code comments" as well. Let's give it a shot so I
> am adding it. We could always remove it later if necessary.

Ugh, OK, whatever.  That sounds like we have too many categories.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2015-11-05 Thread Jeff Janes
On Thu, Nov 5, 2015 at 2:18 PM, Tomas Vondra
 wrote:
> Hi,
>
> while repeating some full-text benchmarks on master, I've discovered
> that there's a data corruption bug somewhere. What happens is that while
> loading data into a table with GIN indexes (using multiple parallel
> connections), I sometimes get this:
>
> TRAP: FailedAssertion("!(((PageHeader) (page))->pd_special >=
> (__builtin_offsetof (PageHeaderData, pd_linp)))", File: "ginfast.c",
> Line: 537)
> LOG:  server process (PID 22982) was terminated by signal 6: Aborted
> DETAIL:  Failed process was running: autovacuum: ANALYZE messages
>
> The details of the assert are always exactly the same - it's always
> autovacuum and it trips on exactly the same check. And the backtrace
> always looks like this (full backtrace attached):
>
> #0  0x7f133b635045 in raise () from /lib64/libc.so.6
> #1  0x7f133b6364ea in abort () from /lib64/libc.so.6
> #2  0x007dc007 in ExceptionalCondition
> (conditionName=conditionName@entry=0x81a088 "!(((PageHeader)
> (page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))",
>errorType=errorType@entry=0x81998b "FailedAssertion",
> fileName=fileName@entry=0x83480a "ginfast.c",
> lineNumber=lineNumber@entry=537) at assert.c:54
> #3  0x004894aa in shiftList (stats=0x0, fill_fsm=1 '\001',
> newHead=26357, metabuffer=130744, index=0x7f133c0f7518) at ginfast.c:537
> #4  ginInsertCleanup (ginstate=ginstate@entry=0x7ffd98ac9160,
> vac_delay=vac_delay@entry=1 '\001', fill_fsm=fill_fsm@entry=1 '\001',
> stats=stats@entry=0x0) at ginfast.c:908
> #5  0x004874f7 in ginvacuumcleanup (fcinfo=) at
> ginvacuum.c:662
> ...

This looks like it is probably the same bug discussed here:

http://www.postgresql.org/message-id/CAMkU=1xalflhuuohfp5v33rzedlvb5aknnujceum9knbkrb...@mail.gmail.com

And here:

http://www.postgresql.org/message-id/56041b26.2040...@sigaev.ru

The bug theoretically exists in 9.5, but it wasn't until 9.6 (commit
e95680832854cf300e64c) that free pages were recycled aggressively
enough that it actually becomes likely to be hit.

There are some proposed patches in those threads, but discussion on
them seems to have stalled out.  Can you try one and see if it fixes
the problems you are seeing?

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2015-11-05 Thread Tomas Vondra



On 11/05/2015 11:44 PM, Jeff Janes wrote:
>

This looks like it is probably the same bug discussed here:

http://www.postgresql.org/message-id/CAMkU=1xalflhuuohfp5v33rzedlvb5aknnujceum9knbkrb...@mail.gmail.com

And here:

http://www.postgresql.org/message-id/56041b26.2040...@sigaev.ru

The bug theoretically exists in 9.5, but it wasn't until 9.6 (commit
e95680832854cf300e64c) that free pages were recycled aggressively
enough that it actually becomes likely to be hit.


I have only quickly skimmed the discussions, but my impression was that 
it's mostly about removing stuff that shouldn't be removed and such. But 
maybe there are race conditions that cause data corruption. I don't 
really want to dive too deeply into this; I've already spent too much 
time trying to reproduce it.




There are some proposed patches in those threads, but discussion on
them seems to have stalled out. Can you try one and see if it fixes
the problems you are seeing?


I can do that - I see there are three patches in the two threads:

  1) gin_pending_lwlock.patch (Jeff Janes)
  2) gin_pending_pagelock.patch (Jeff Janes)
  3) gin_alone_cleanup-2.patch (Teodor Sigaev)

Should I test all of them? Or is (1) obsoleted by (2) for example?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2015-11-05 Thread Jeff Janes
On Thu, Nov 5, 2015 at 3:50 PM, Tomas Vondra
 wrote:
>
>
> On 11/05/2015 11:44 PM, Jeff Janes wrote:
>>
>>
>> This looks like it is probably the same bug discussed here:
>>
>>
>> http://www.postgresql.org/message-id/CAMkU=1xalflhuuohfp5v33rzedlvb5aknnujceum9knbkrb...@mail.gmail.com
>>
>> And here:
>>
>> http://www.postgresql.org/message-id/56041b26.2040...@sigaev.ru
>>
>> The bug theoretically exists in 9.5, but it wasn't until 9.6 (commit
>> e95680832854cf300e64c) that free pages were recycled aggressively
>> enough that it actually becomes likely to be hit.
>
>
> I have only quickly skimmed the discussions, but my impression was that it's
> mostly about removing stuff that shouldn't be removed and such. But maybe
> there are race conditions that cause data corruption. I don't really want to
> dive too deeply into this, I've already spent too much time trying to
> reproduce it.
>
>>
>> There are some proposed patches in those threads, but discussion on
>> them seems to have stalled out. Can you try one and see if it fixes
>> the problems you are seeing?
>
>
> I can do that - I see there are three patches in the two threads:
>
>   1) gin_pending_lwlock.patch (Jeff Janes)
>   2) gin_pending_pagelock.patch (Jeff Janes)
>   3) gin_alone_cleanup-2.patch (Teodor Sigaev)
>
> Should I test all of them? Or is (1) obsoleted by (2) for example?

1 is obsolete.  Either 2 or 3 should fix the bug, provided this is the
bug you are seeing.  They have different performance side effects, but
as far as fixing the bug they should be equivalent.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GIN data corruption bug(s) in 9.6devel

2015-11-05 Thread Tomas Vondra

Hi,

while repeating some full-text benchmarks on master, I've discovered
that there's a data corruption bug somewhere. What happens is that while
loading data into a table with GIN indexes (using multiple parallel
connections), I sometimes get this:

TRAP: FailedAssertion("!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", File: "ginfast.c",
Line: 537)
LOG:  server process (PID 22982) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: autovacuum: ANALYZE messages

The details of the assert are always exactly the same - it's always
autovacuum and it trips on exactly the same check. And the backtrace
always looks like this (full backtrace attached):

#0  0x7f133b635045 in raise () from /lib64/libc.so.6
#1  0x7f133b6364ea in abort () from /lib64/libc.so.6
#2  0x007dc007 in ExceptionalCondition
(conditionName=conditionName@entry=0x81a088 "!(((PageHeader)
(page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))",
   errorType=errorType@entry=0x81998b "FailedAssertion",
fileName=fileName@entry=0x83480a "ginfast.c",
lineNumber=lineNumber@entry=537) at assert.c:54
#3  0x004894aa in shiftList (stats=0x0, fill_fsm=1 '\001',
newHead=26357, metabuffer=130744, index=0x7f133c0f7518) at ginfast.c:537
#4  ginInsertCleanup (ginstate=ginstate@entry=0x7ffd98ac9160,
vac_delay=vac_delay@entry=1 '\001', fill_fsm=fill_fsm@entry=1 '\001',
stats=stats@entry=0x0) at ginfast.c:908
#5  0x004874f7 in ginvacuumcleanup (fcinfo=) at
ginvacuum.c:662
...

It's not perfectly deterministic - sometimes I had to repeat the whole load
multiple times (up to 5x, and each load takes ~30 minutes).

I'm pretty sure this is not an external issue, because I've reproduced it
on a different machine (different CPU / kernel / libraries / compiler).
It's however interesting that on the other machine I've also observed a
different kind of lockups, where the sessions get stuck on semop() in
gininsert (again, full backtrace attached):

#0  0x003f3d4eaf97 in semop () from /lib64/libc.so.6
#1  0x0067a41f in PGSemaphoreLock (sema=0x7f93290405d8) at
pg_sema.c:387
#2  0x006df613 in LWLockAcquire (lock=0x7f92a4dce900,
mode=LW_EXCLUSIVE) at lwlock.c:1049
#3  0x004878c6 in ginHeapTupleFastInsert
(ginstate=0x7ffd969c88f0, collector=0x7ffd969caeb0) at ginfast.c:250
#4  0x0047423a in gininsert (fcinfo=) at
gininsert.c:531
...

I'm not sure whether this is a different manifestation of the same issue
or another bug. The systems are not exactly the same - one has a single
socket (i5) while the other one has 2 (Xeon), the compilers and kernels
are different and so on.

I've also seen cases when the load seemingly completed OK, but trying to
dump the table to disk using COPY resulted in

   ERROR:  compressed data is corrupted

which I find rather strange as there was no previous error, and also
COPY should only dump table data (while the asserts were in GIN code
handling index pages, unless I'm mistaken). Seems like a case of
insufficient locking where two backends scribble on the same page
somehow, and then autovacuum hits the assert. Or maybe not, not sure.

I've been unable to reproduce the issue on REL9_5_STABLE (despite
running the load ~20x on each machine), so that seems safe, and the
issue was introduced by some of the newer commits.

I've already spent too much CPU time on this, so perhaps someone with
better knowledge of the GIN code can take care of this. To reproduce it
you may use the same code I did - it's available here:

   https://bitbucket.org/tvondra/archie

it's a PoC of a database of the pgsql mailing lists with fulltext search.
It's a bit messy, but it's rather simple:

   1) clone the repo

  $ git clone https://bitbucket.org/tvondra/archie.git

   2) create a directory for downloading the mbox files

  $ mkdir archie-data

   3) download the mbox files (~4.5GB of data) using the download
  script (make sure archie/bin is on PATH)

  $ cd archie-data
  $ export PATH=../archie/bin:$PATH
  $ ../archie/download

   4) use "run" scipt (attached) to run the load n-times on a given
  commit

  $ run.sh master 10

  NOTICE: The run script is the messiest one, you'll have to
  edit it to fix paths etc.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Core was generated by `postgres: autovacuum worker process   archie 
 '.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f133b635045 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x7f133b635045 in raise () from /lib64/libc.so.6
#1  0x7f133b6364ea in abort () from /lib64/libc.so.6
#2  0x007dc007 in ExceptionalCondition 
(conditionName=conditionName@entry=0x81a088 "!(((PageHeader) 
(page))->pd_special >= (__builtin_offsetof (PageHeaderData, 

Re: [HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-11-05 Thread Peter Geoghegan
On Thu, Oct 29, 2015 at 1:29 PM, Peter Geoghegan  wrote:
>> "Refactoring" seems rather a narrow definition of what might show up
>> in such a category, btw.  Maybe "Code Beautification" would be a
>> suitable title?  I'm bikeshedding though.
>
> I think that there is value in limiting the number of topics. But I
> hardly put much weight on this. Any of the above are fine.

Can someone follow up and push this to the CF app? "Refactoring" seems
to be the consensus.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-11-05 Thread Robert Haas
On Thu, Nov 5, 2015 at 4:53 PM, Peter Geoghegan  wrote:
> On Thu, Oct 29, 2015 at 1:29 PM, Peter Geoghegan  wrote:
>>> "Refactoring" seems rather a narrow definition of what might show up
>>> in such a category, btw.  Maybe "Code Beautification" would be a
>>> suitable title?  I'm bikeshedding though.
>>
>> I think that there is value in limiting the number of topics. But I
>> hardly put much weight on this. Any of the above are fine.
>
> Can someone follow up and push this to the CF app? "Refactoring" seems
> to be the consensus.

I guess I'm wondering whether there's really enough of this to need
its own category.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-11-05 Thread Michael Paquier
On Fri, Nov 6, 2015 at 1:47 PM, Robert Haas  wrote:
> On Thu, Nov 5, 2015 at 4:53 PM, Peter Geoghegan  wrote:
>> On Thu, Oct 29, 2015 at 1:29 PM, Peter Geoghegan  wrote:
 "Refactoring" seems rather a narrow definition of what might show up
 in such a category, btw.  Maybe "Code Beautification" would be a
 suitable title?  I'm bikeshedding though.
>>>
>>> I think that there is value in limiting the number of topics. But I
>>> hardly put much weight on this. Any of the above are fine.
>>
>> Can someone follow up and push this to the CF app? "Refactoring" seems
>> to be the consensus.
>
> I guess I'm wondering whether there's really enough of this to need
> its own category.

We have a category "Code comments" as well. Let's give it a shot so I
am adding it. We could always remove it later if necessary.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2015-11-05 Thread Tomas Vondra

Hi,

On 11/06/2015 01:05 AM, Jeff Janes wrote:

On Thu, Nov 5, 2015 at 3:50 PM, Tomas Vondra
 wrote:

...


I can do that - I see there are three patches in the two threads:

   1) gin_pending_lwlock.patch (Jeff Janes)
   2) gin_pending_pagelock.patch (Jeff Janes)
   3) gin_alone_cleanup-2.patch (Teodor Sigaev)

Should I test all of them? Or is (1) obsoleted by (2) for example?


1 is obsolete.  Either 2 or 3 should fix the bug, provided this is the
bug you are seeing.  They have different performance side effects, but
as far as fixing the bug they should be equivalent.


OK, I'll do testing with those two patches then, and I'll also note the 
performance difference (the data load was very stable). Of course, it's 
just one particular workload.


I'll post an update after the weekend.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-10-29 Thread Peter Geoghegan
I think that within the CF app, we should either rename the patch
topic "Bug Fixes" to "Bug Fixes/Refactoring", or introduce a new
"Refactoring" topic. I prefer the first approach.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-10-29 Thread Michael Paquier
On Thu, Oct 29, 2015 at 8:33 PM, Peter Geoghegan  wrote:
> I think that within the CF app, we should either rename the patch
> topic "Bug Fixes" to "Bug Fixes/Refactoring", or introduce a new
> "Refactoring" topic. I prefer the first approach.

I would vote for the second approach, with a separate category for refactoring.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-10-29 Thread Mike Blackwell

On Thu, Oct 29, 2015 at 2:45 PM, Michael Paquier 
 wrote:

> On Thu, Oct 29, 2015 at 8:33 PM, Peter Geoghegan  wrote:
> > I think that within the CF app, we should either rename the patch
> > topic "Bug Fixes" to "Bug Fixes/Refactoring", or introduce a new
> > "Refactoring" topic. I prefer the first approach.
>
> I would vote for the second approach, with a separate category for
> refactoring.
>
So if a bug fix involved some refactoring, which would you put it under?
Or would you expect separate CF entries for the refactoring and the fix?

__
*Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley*
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com





Re: [HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-10-29 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Oct 29, 2015 at 8:33 PM, Peter Geoghegan  wrote:
>> I think that within the CF app, we should either rename the patch
>> topic "Bug Fixes" to "Bug Fixes/Refactoring", or introduce a new
>> "Refactoring" topic. I prefer the first approach.

> I would vote for the second approach, with a separate category for 
> refactoring.

Ditto.  Bug fixes are not at all like refactoring --- in particular, we'd
usually not consider refactoring as fit material for back-patching.

"Refactoring" seems rather a narrow definition of what might show up
in such a category, btw.  Maybe "Code Beautification" would be a
suitable title?  I'm bikeshedding though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-10-29 Thread Peter Geoghegan
On Thu, Oct 29, 2015 at 1:10 PM, Tom Lane  wrote:
> Ditto.  Bug fixes are not at all like refactoring --- in particular, we'd
> usually not consider refactoring as fit material for back-patching.
>
> "Refactoring" seems rather a narrow definition of what might show up
> in such a category, btw.  Maybe "Code Beautification" would be a
> suitable title?  I'm bikeshedding though.

I think that there is value in limiting the number of topics. But I
hardly put much weight on this. Any of the above are fine.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-10-29 Thread Josh Berkus
On 10/29/2015 01:10 PM, Tom Lane wrote:
> Michael Paquier  writes:
>> On Thu, Oct 29, 2015 at 8:33 PM, Peter Geoghegan  wrote:
>>> I think that within the CF app, we should either rename the patch
>>> topic "Bug Fixes" to "Bug Fixes/Refactoring", or introduce a new
>>> "Refactoring" topic. I prefer the first approach.
> 
>> I would vote for the second approach, with a separate category for 
>> refactoring.
> 
> Ditto.  Bug fixes are not at all like refactoring --- in particular, we'd
> usually not consider refactoring as fit material for back-patching.
> 
> "Refactoring" seems rather a narrow definition of what might show up
> in such a category, btw.  Maybe "Code Beautification" would be a
> suitable title?  I'm bikeshedding though.

"Miscellaneous"?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential GIN vacuum bug

2015-10-04 Thread Jeff Janes
On Thu, Sep 3, 2015 at 10:42 AM, Jeff Janes  wrote:

> On Mon, Aug 31, 2015 at 12:10 AM, Jeff Janes  wrote:
>
>> On Sun, Aug 30, 2015 at 3:57 PM, Tom Lane  wrote:
>>
>>> Jeff Janes  writes:
>>> > On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane  wrote:
>>>
>> > But we would still have to deal with the
>>> > fact that unconditional acquire attempt by the backends will cause a
>>> vacuum
>>> > to cancel itself, which is undesirable.
>>>
>>> Good point.
>>>
>>> > If we define a new namespace for
>>> > this lock (like the relation extension lock has its own namespace) then
>>> > perhaps the cancellation code could be made to not cancel on that
>>> > condition.  But that too seems like a lot of work to backpatch.
>>>
>>> We could possibly teach the autocancel logic to distinguish this lock
>>> type
>>> from others without using a new namespace.  That seems a bit klugy, but
>>> maybe better than adding a new namespace.  (For example, there are
>>> probably only a couple of modes in which we take page-level locks at
>>> present.  Choosing a currently unused, but self-exclusive, mode for
>>> taking
>>> such a lock might serve.)
>>>
>>
>> Like the attached?  (The conditional variant for user backends was
>> unceremoniously yanked out.)
>>
>
> A problem here is that now we have the user backends waiting on vacuum to
> do the clean up, but vacuum is using throttled IO and so taking its sweet
> time at it.  Under the old code, the user backend could pass up the vacuum
> while it was sleeping.
>
> Maybe we could have the vacuum detect when someone is waiting on it, and
> temporarily suspend throttling and just run at full speed.  I don't believe
> there is any precedent for that, but I do think there are other places
> where such a technique could be useful.  That is kind of a scary change to
> backpatch.
>
> I am running out of ideas.
>


Teodor published a patch in another thread:
http://www.postgresql.org/message-id/56041b26.2040...@sigaev.ru but I
thought it would be best to discuss it here.

It is similar to my most recent patch.

He removes the parts of the code that anticipate concurrent cleanup and
replaces them with asserts, which I was too lazy to do until we have a
final design.

He uses a different lock mode (ExclusiveLock, rather than
ShareUpdateExclusiveLock) when heavy-locking the metapage.  It doesn't make
a difference, as long as it is self-exclusive and no one else uses it in a
way that causes false sharing (which is currently the case--the only other
user of PageLocks is the hash index code).

He always does conditional locks in regular backends.  That means he
doesn't have to hack the lmgr to prevent vacuum from canceling itself, the
way I did.  It also means there is not the "priority inversion" I mention
above, where a user backend blocks on vacuum, but vacuum is intentionally
throttling itself.

On the other hand, using conditional locks for normal backends does mean
that the size of the pending list can increase without limit, as there is
nothing to throttle the user backends from adding tuples faster than they
are cleaned up. Perhaps worse, it can pin down a vacuum worker without
limit, as it keeps finding more pages have been added by the time it
finishes the prior set of pages.  I actually do see this in my (not very
realistic) testing.
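
To make the locking pattern concrete, here is a rough sketch (just an
illustration, not either of the actual patches): LockPage and
ConditionalLockPage are the existing heavyweight page-lock primitives,
am_vacuum is simply a stand-in for however the caller tells vacuum apart
from a user backend, and the real cleanup work is elided.

#include "postgres.h"

#include "access/gin_private.h"
#include "storage/lmgr.h"

/*
 * Rough sketch only: serialize pending-list cleanup through a heavyweight
 * page lock on the GIN metapage.  Vacuum waits for the lock; a user backend
 * only tries it conditionally, so it never forces an autovacuum to cancel
 * itself and simply keeps appending if a cleanup is already in progress.
 */
static void
gin_cleanup_sketch(Relation index, bool am_vacuum)
{
	if (am_vacuum)
		LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
	else if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
		return;					/* someone else is already cleaning */

	/* ... do the actual pending-list cleanup here ... */

	UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
}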

I think that for correctness, vacuum only needs to clean the part of the
pending list which existed as of the time vacuum started.  So as long as it
gets that far, it can just be done even if more pages have since been
added.  I'm not sure of the best way to implement that; I guess you remember
the blkno of the tail page from when you started, and set a flag once you
have truncated away a page with that same blkno.  That would solve the issue
of pinning down a vacuum worker for an unlimited amount of time, but would
not solve the unlimited growth of the pending list.

Cheers,

Jeff


Re: [HACKERS] Potential GIN vacuum bug

2015-10-02 Thread Robert Haas
On Thu, Oct 1, 2015 at 4:44 PM, Jeff Janes  wrote:
> Is the locking around indexes summarized someplace?

Not to my knowledge.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential GIN vacuum bug

2015-10-01 Thread Jeff Janes
On Tue, Sep 1, 2015 at 8:05 AM, Robert Haas  wrote:

> On Sun, Aug 30, 2015 at 6:57 PM, Tom Lane  wrote:
> >> But we would still have to deal with the
> >> fact that unconditional acquire attempt by the backends will cause a
> vacuum
> >> to cancel itself, which is undesirable.
> >
> > Good point.
> >
> >> If we define a new namespace for
> >> this lock (like the relation extension lock has its own namespace) then
> >> perhaps the cancellation code could be made to not cancel on that
> >> condition.  But that too seems like a lot of work to backpatch.
> >
> > We could possibly teach the autocancel logic to distinguish this lock
> type
> > from others without using a new namespace.  That seems a bit klugy, but
> > maybe better than adding a new namespace.  (For example, there are
> > probably only a couple of modes in which we take page-level locks at
> > present.  Choosing a currently unused, but self-exclusive, mode for
> taking
> > such a lock might serve.)
>
> That seems like a pretty big kludge to me; it will work until it doesn't.
>
> Adding a new lock type (similar to "relation extension") would address
> a lot of my concern with the heavyweight lock approach.  It strikes me
> that trying to grab a lock on the index in what's basically a pretty
> low-level operation here could have a variety of unintended
> consequences.  The autovacuum thing is one; but consider also the risk
> of new deadlock scenarios resulting from a lock upgrade.  Those
> deadlocks would likely be, to use Peter Geoghegan's term,
> unprincipled.  The locking regime around indexes is already pretty
> complicated, and I'm skeptical about the idea that we can complicate
> it further without any blowback.
>

Is the locking around indexes summarized someplace?  About the best thing I
could come up with was to do a "git grep LockRelat" and then look for lines
where the first argument had a name that seemed likely to refer to an index.

Cheers,

Jeff


Re: [HACKERS] Arguable RLS security bug, EvalPlanQual() paranoia

2015-09-29 Thread Stephen Frost
* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
> On Mon, Aug 3, 2015 at 6:21 PM, Peter Geoghegan  wrote:
> > On Mon, Aug 3, 2015 at 3:07 PM, Stephen Frost  wrote:
> >> Thoughts?  Trying to keep it straight-forward and provide a simple
> >> solution for users to be able to address the issue, if they're worried
> >> about it.  Perhaps this, plus an additional paragraph which goes into
> >> more detail about exactly what's going on?
> >
> > I'm still thinking about it, but I think you have the right idea here.
> 
> I have attached a patch for review that I believe addresses the
> documentation side of this issue.
> 
> Thoughts or comments?

I'm not convinced this is the right place, but at a minimum it should be
referenced from the RLS documentation.  Further, it should be noted that
users who have direct SQL access can control what the isolation level
is for their transaction.

Also, isn't it possible to avoid this by locking the records?  If the
locking fails or blocks then you know another user has those records
locked and you don't update or you wait until you hold the lock.
Assuming that works (I don't immediately see why it wouldn't..), we
should provide an example.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Arguable RLS security bug, EvalPlanQual() paranoia

2015-09-29 Thread Adam Brightwell
On Mon, Aug 3, 2015 at 6:21 PM, Peter Geoghegan  wrote:
> On Mon, Aug 3, 2015 at 3:07 PM, Stephen Frost  wrote:
>> Thoughts?  Trying to keep it straight-forward and provide a simple
>> solution for users to be able to address the issue, if they're worried
>> about it.  Perhaps this, plus an additional paragraph which goes into
>> more detail about exactly what's going on?
>
> I'm still thinking about it, but I think you have the right idea here.

I have attached a patch for review that I believe addresses the
documentation side of this issue.

Thoughts or comments?

Thanks,
Adam

--
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


transaction-isolation-rls-docs.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Arguable RLS security bug, EvalPlanQual() paranoia

2015-09-29 Thread Adam Brightwell
> I'm not convinced this is the right place, but at a minimum it should be
> referenced from the RLS documentation.  Further, it should be noted that
> users who have direct SQL access can control what the isolation level
> is for their transaction.

I agree that it should be referenced by the RLS docs.  However, I'm
not sure I can think of a better place for it.  My reason for
choosing this location was that the behavior seems specific to the
READ COMMITTED isolation level and is an accepted MVCC anomaly, not
necessarily specific only to RLS and SBV.
referencing it in the other locations would be desired.  Of course,
I'm willing to accept that I may be making the wrong assumptions.

> Also, isn't it possible to avoid this by locking the records?  If the
> locking fails or blocks then you know another user has those records
> locked and you don't update or you wait until you hold the lock.
> Assuming that works (I don't immediately see why it wouldn't..), we
> should provide an example.

I haven't found that to work, at least not with the specific case
presented by Peter.  Based on the following (output from Peter's
isolation test), I would understand that the 'malicious' UPDATE is
waiting for the previous update to be committed before it continues,
even without the FOR UPDATE lock on the rows.

step no_trust_mallory: update users set group_id = 1 where user_name =
'mallory';
step update_hc: update information set info = 'secret' where group_id = 2;
step updatem: update information set info = info where group_id = 2
returning 'mallory update: ' m, *; 
step commit_hc: commit;
step updatem: <... completed>

As well, due to this, as described by the READ COMMITTED documentation:

"it is possible for an updating command to see an inconsistent
snapshot: it can see the effects of concurrent updating commands on
the same rows it is trying to update"

I'm not convinced that this is something that FOR UPDATE can address
for this specific case.  If inconsistencies in the 'snapshot' can be
expected and are accepted at this isolation level, then I'm not sure
how we can reasonably expect locking the rows to have any effect.
Though, again, I'm willing to accept that I am not fully understanding
this behavior and that I am completely wrong.

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is this a bug?

2015-09-05 Thread Bruce Momjian
On Fri, Sep  4, 2015 at 09:40:10AM +0900, Michael Paquier wrote:
> Robert Haas wrote:
> > On Wed, Aug 26, 2015 at 3:31 PM, Alvaro Herrera
> >  wrote:
> > > Fabrízio de Royes Mello wrote:
> > >
> > >> Why this patch was reverted one day after applied [1]? I didn't see
> any
> > >> discussion around it.
> > >
> > > Contributors whose patches are getting committed should really
> subscribe
> > > to pgsql-committers.
> >
> > I would have thought discussion of committed patches should be moved
> > to -hackers.
> 
> I agree, but it happens anyway quite frequently.  Since recently, I make
> a point of changing the CC from -committers to -hackers, but due to
> force of habit I often forget.
> 
>  
> Noted. I usually don't do that.

I am thinking we should all agree on whether we should redirect commit
comments to hackers or not, so we are consistent.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is this a bug?

2015-09-03 Thread Robert Haas
On Wed, Aug 26, 2015 at 3:31 PM, Alvaro Herrera
 wrote:
> Fabrízio de Royes Mello wrote:
>
>> Why this patch was reverted one day after applied [1]? I didn't see any
>> discussion around it.
>
> Contributors whose patches are getting committed should really subscribe
> to pgsql-committers.

I would have thought discussion of committed patches should be moved
to -hackers.  The description for the -committers list says:

"Notification of git commits are sent to this list. Do not post here!"

So, it's understandable that people would not expect other traffic there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is this a bug?

2015-09-03 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Aug 26, 2015 at 3:31 PM, Alvaro Herrera
>  wrote:
> > Fabrízio de Royes Mello wrote:
> >
> >> Why this patch was reverted one day after applied [1]? I didn't see any
> >> discussion around it.
> >
> > Contributors whose patches are getting committed should really subscribe
> > to pgsql-committers.
> 
> I would have thought discussion of committed patches should be moved
> to -hackers.

I agree, but it happens anyway quite frequently.  Since recently, I make
a point of changing the CC from -committers to -hackers, but due to
force of habit I often forget.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential GIN vacuum bug

2015-09-03 Thread Jeff Janes
On Mon, Aug 31, 2015 at 12:10 AM, Jeff Janes  wrote:

> On Sun, Aug 30, 2015 at 3:57 PM, Tom Lane  wrote:
>
>> Jeff Janes  writes:
>> > On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane  wrote:
>>
> > But we would still have to deal with the
>> > fact that unconditional acquire attempt by the backends will cause a
>> vacuum
>> > to cancel itself, which is undesirable.
>>
>> Good point.
>>
>> > If we define a new namespace for
>> > this lock (like the relation extension lock has its own namespace) then
>> > perhaps the cancellation code could be made to not cancel on that
>> > condition.  But that too seems like a lot of work to backpatch.
>>
>> We could possibly teach the autocancel logic to distinguish this lock type
>> from others without using a new namespace.  That seems a bit klugy, but
>> maybe better than adding a new namespace.  (For example, there are
>> probably only a couple of modes in which we take page-level locks at
>> present.  Choosing a currently unused, but self-exclusive, mode for taking
>> such a lock might serve.)
>>
>
> Like the attached?  (The conditional variant for user backends was
> unceremoniously yanked out.)
>

A problem here is that now we have the user backends waiting on vacuum to
do the clean up, but vacuum is using throttled IO and so taking its sweet
time at it.  Under the old code, the user backend could pass up the vacuum
while it was sleeping.

Maybe we could have the vacuum detect when someone is waiting on it, and
temporarily suspend throttling and just run at full speed.  I don't believe
there is any precedent for that, but I do think there are other places
where such a technique could be useful.  That is kind of a scary change to
backpatch.
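
A rough sketch of that idea, sidestepping the hard part (how to detect the
waiters cheaply): have_waiters is just a placeholder for whatever that
detection would be, while VacuumCostActive and VacuumCostBalance are the
knobs the cost-based delay machinery actually consults.

#include "postgres.h"

#include "miscadmin.h"

/*
 * Sketch only: if someone is queued up behind this cleanup, stop the
 * cost-based throttling for the remainder of the pass.  With
 * VacuumCostActive off, vacuum_delay_point() returns without sleeping.
 */
static void
maybe_stop_throttling(bool have_waiters)
{
	if (have_waiters && VacuumCostActive)
	{
		VacuumCostActive = false;
		VacuumCostBalance = 0;
	}
}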

I am running out of ideas.

Cheers,

Jeff


Re: [HACKERS] Is this a bug?

2015-09-03 Thread Michael Paquier
On Fri, Sep 4, 2015 at 3:52 AM, Alvaro Herrera 
wrote:

> Robert Haas wrote:
> > On Wed, Aug 26, 2015 at 3:31 PM, Alvaro Herrera
> >  wrote:
> > > Fabrízio de Royes Mello wrote:
> > >
> > >> Why this patch was reverted one day after applied [1]? I didn't see
> any
> > >> discussion around it.
> > >
> > > Contributors whose patches are getting committed should really
> subscribe
> > > to pgsql-committers.
> >
> > I would have thought discussion of committed patches should be moved
> > to -hackers.
>
> I agree, but it happens anyway quite frequently.  Since recently, I make
> a point of changing the CC from -committers to -hackers, but due to
> force of habit I often forget.
>

Noted. I usually don't do that.
-- 
Michael


Re: [HACKERS] Potential GIN vacuum bug

2015-09-01 Thread Robert Haas
On Sun, Aug 30, 2015 at 6:57 PM, Tom Lane  wrote:
>> But we would still have to deal with the
>> fact that unconditional acquire attempt by the backends will cause a vacuum
>> to cancel itself, which is undesirable.
>
> Good point.
>
>> If we define a new namespace for
>> this lock (like the relation extension lock has its own namespace) then
>> perhaps the cancellation code could be made to not cancel on that
>> condition.  But that too seems like a lot of work to backpatch.
>
> We could possibly teach the autocancel logic to distinguish this lock type
> from others without using a new namespace.  That seems a bit klugy, but
> maybe better than adding a new namespace.  (For example, there are
> probably only a couple of modes in which we take page-level locks at
> present.  Choosing a currently unused, but self-exclusive, mode for taking
> such a lock might serve.)

That seems like a pretty big kludge to me; it will work until it doesn't.

Adding a new lock type (similar to "relation extension") would address
a lot of my concern with the heavyweight lock approach.  It strikes me
that trying to grab a lock on the index in what's basically a pretty
low-level operation here could have a variety of unintended
consequences.  The autovacuum thing is one; but consider also the risk
of new deadlock scenarios resulting from a lock upgrade.  Those
deadlocks would likely be, to use Peter Geoghegan's term,
unprincipled.  The locking regime around indexes is already pretty
complicated, and I'm skeptical about the idea that we can complicate
it further without any blowback.

If we use a new lock type, it's a lot easier to reason about the
interactions, I think.  We know all of the things that will take that
lock type.  And we can be reasonably confident that future code
changes won't introduce any new ones, or that the current ones will be
considered before making changes.

It's not great to have to back-patch such a change, but in practice
the blast radius should be pretty limited.  People who are looking at
pg_locks might start to see a new lock type show up that they're not
expecting, but a lot of people either aren't looking at that data at
all, or are looking at it but not doing anything programmatic with it
and therefore won't really be impacted by something new showing up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential GIN vacuum bug

2015-08-31 Thread Jeff Janes
On Sun, Aug 30, 2015 at 3:57 PM, Tom Lane  wrote:

> Jeff Janes  writes:
> > On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane  wrote:
> >> Your earlier point about how the current design throttles insertions to
> >> keep the pending list from growing without bound seems like a bigger
> deal
> >> to worry about.  I think we'd like to have some substitute for that.
> >> ...
>
> > If the goal is to not change existing behavior (like for back patching)
> the
> > margin should be 1, always wait.
>
> The current behavior is buggy, both as to performance and correctness,
> so I'm not sure how come "don't change the behavior" should be a
> requirement.
>

I'm not confident the new behavior, with regards to performance, is an
absolute win.  We usually don't backpatch performance changes unless they
have no or very little trade-off.  The only margin I can confidently say
that for is the margin of 1.0.


>
> > But we would still have to deal with the
> > fact that unconditional acquire attempt by the backends will cause a
> vacuum
> > to cancel itself, which is undesirable.
>
> Good point.
>
> > If we define a new namespace for
> > this lock (like the relation extension lock has its own namespace) then
> > perhaps the cancellation code could be made to not cancel on that
> > condition.  But that too seems like a lot of work to backpatch.
>
> We could possibly teach the autocancel logic to distinguish this lock type
> from others without using a new namespace.  That seems a bit klugy, but
> maybe better than adding a new namespace.  (For example, there are
> probably only a couple of modes in which we take page-level locks at
> present.  Choosing a currently unused, but self-exclusive, mode for taking
> such a lock might serve.)
>

Like the attached?  (The conditional variant for user backends was
unceremoniously yanked out.)


>
> > Would we bother to back-patch a theoretical bug which there is no
> evidence
> > is triggering in the field?
>
> What's theoretical about it?  You seemed pretty sure that the issue in
>
> http://www.postgresql.org/message-id/flat/CA+bfosGVGVQhMAa=0-mue6coo7dbsgayxb-xsnr5vm-s39h...@mail.gmail.com
> was exactly this.
>

I was adamant that the broken concurrency was not helping him
performance-wise, and also introduces correctness bugs.  But I don't think
the unfortunate performance is a bug, just an issue highly related to one
that is a bug.  I don't think a margin of 2, or even 20, would help him.
It would just build a bigger time bomb with a longer fuse.  What he needs
is to turn fastupdate off, or get SSDs, or get some other improvements that
aren't going to be backpatched.  If we don't know what setting to use to
fix one person's performance problem, I'd rather set it to something that
we at least know won't cause other people to have problems they didn't
have before.


>
> > If we want to improve the current behavior rather than fix a bug, then I
> > think that if the list is greater than threshold*2 and the cleaning lock
> is
> > unavailable, what it should do is proceed to insert the tuple's keys into
> > the index itself, as if fastupdate = off.  That would require some major
> > surgery to the existing code, as by the time it invokes the clean up, it
> is
> > too late to not insert into the pending list.
>
> Meh.  That's introducing a whole new behavioral regime, which quite aside
> from correctness bugs might introduce new performance bugs of its own.
> It certainly doesn't sound like a better back-patch candidate than the
> other thing.
>

Right, the start of the paragraph was meant to transition from backpatch to
forward looking.

Continuing the forward looking part:

I've given up on fastupdate for 9.4, and turned it off for my indexes.  As
implemented it seems like a rather niche solution.  So it is kind of
unfortunate that it is on by default, and that there is no way to turn it
off except for each individual index separately.  Hopefully we can make it
less niche.  Or maybe the niche is what I (and apparently Mr. Kehlet)
are trying to do.

There seems to be two ways for fastupdate to help:

1) Let someone else do it, in the background.

That is pretty much useless from my perspective, because there is no way to
get someone else to actually do it as often as it needs to be done.  I can
create an extension to expose the cleanup call to SQL, and then set up a
cron job (or a bgworker) to run that very frequently, or poll frequently to
decide whether it should be run.  That works, kind of, if this type of thing
is important enough for you to go through all that (it is not important
enough to me, for production use, at this point; I'd rather just set
fastupdate to off, but at least it is available if I need it).  Except that
this is still a serial job, and there is no way around that without turning
fastupdate off.  You can have a RAID of 30 disks, and the cleanup process is
still going to have 1 IO outstanding at a time.  With

Re: [HACKERS] Potential GIN vacuum bug

2015-08-30 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
> Attached is a patch to deal with this without the heavyweight locks.
> I realized that using the clean up lock on the meta page, rather than the
> pending head page, would be easier to implement as a pin was already held
> on the meta page throughout, and the pending head page can change during
> the cleanup if we need multiple passes.
> ...
> I still prefer the heavy-weight approach.  The buffer clean up lock for
> vacuuming seems fragile to start with, and abusing it for other purposes
> doesn't improve on that.

FWIW, I would go with the heavyweight lock approach as well.  The extra
cycles needed for a heavyweight lock don't seem significant in this
context, and you have far more control over which other operations
conflict or don't conflict with the lock.  Taking a buffer cleanup lock on
the metapage sounds really scary from that viewpoint; it's basically going
to conflict with everything else, even if the other operations only take
it for short intervals, and you have no freedom to adjust that.
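
As an aside, one practical upside of the heavyweight approach is that the
lock is visible through the ordinary lock manager, so its behavior can at
least be observed.  A hedged example, assuming the lock were taken as a page
lock on the index (only one of the options being discussed here):

-- page-level heavyweight locks currently held or awaited
SELECT locktype, relation::regclass, page, mode, granted, pid
FROM pg_locks
WHERE locktype = 'page';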

Your earlier point about how the current design throttles insertions to
keep the pending list from growing without bound seems like a bigger deal
to worry about.  I think we'd like to have some substitute for that.
Perhaps we could make the logic in insertion be something like

if (pending-list-size > threshold)
{
    if (conditional-lock-acquire(...))
    {
        do-pending-list-cleanup;
        lock-release;
    }
    else if (pending-list-size > threshold * 2)
    {
        unconditional-lock-acquire(...);
        if (pending-list-size > threshold)
            do-pending-list-cleanup;
        lock-release;
    }
}

so that once the pending list got too big, incoming insertions would wait
for it to be cleared.  Whether to use a 2x safety margin or something else
could be a subject for debate, of course.
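
For concreteness, the "threshold" here is work_mem in the releases being
discussed; 9.5 adds a dedicated setting, so on newer versions it could be
pinned down roughly like this (index name is hypothetical, and the storage
parameter is in kB):

SET gin_pending_list_limit = '4MB';                          -- session-wide threshold
ALTER INDEX my_gin_idx SET (gin_pending_list_limit = 1024);  -- per-index override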

regards, tom lane




Re: [HACKERS] Potential GIN vacuum bug

2015-08-30 Thread Jeff Janes
On Sat, Aug 22, 2015 at 11:25 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Tue, Aug 18, 2015 at 8:59 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Aug 17, 2015 at 5:41 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  User backends attempt to take the lock conditionally, because otherwise
  they would cause an autovacuum already holding the lock to cancel itself,
  which seems quite bad.

  Note that this is a substantial behavior change, in that with this code
  the user backends which find the list already being cleaned will just add
  to the end of the pending list and go about their business.  So if things
  are added to the list faster than they can be cleaned up, the size of the
  pending list can increase without bound.

  Under the existing code each concurrent user backend will try to clean the
  pending list at the same time.  The work doesn't parallelize, so doing this
  just burns CPU (and possibly consumes up to maintenance_work_mem for *each*
  backend), but it does serve to throttle the insertion rate and so keep the
  list from growing without bound.

  This is just a proof-of-concept patch, because I don't know if this
  approach is the right approach.

 I'm not sure if this is the right approach, but I'm a little wary of
 involving the heavyweight lock manager in this.  If pending list
 cleanups are frequent, this could involve a lot of additional lock
 manager traffic, which could be bad for performance.



 Even if they are
 infrequent, it seems like it would be more natural to handle this
 without some regime of locks and pins and buffer cleanup locks on the
 buffers that are storing the pending list, rather than a heavyweight
 lock on the whole relation.  But I am just waving my hands wildly
 here.


 I also thought of a buffer clean up lock on the pending list head buffer
 to represent the right to do a clean up.  But with the proviso that once
 you have obtained the clean up lock, you can then drop the exclusive buffer
 content lock and continue to hold the conceptual lock just by maintaining
 the pin.  I think that this would be semantically correct, but backends
 doing a cleanup would have to get the lock conditionally, and I think you
 would have too many chances for false failures where it bails out when the
 other party simply holds a pin.  I guess I could implement it and see how
 it fares in my test case.


Attached is a patch to deal with this without the heavyweight locks.

I realized that using the clean up lock on the meta page, rather than the
pending head page, would be easier to implement as a pin was already held
on the meta page throughout, and the pending head page can change during
the cleanup if we need multiple passes.

Also, I think the cleanup lock on the metapage should actually be easier to acquire.
All queries need to visit the pending head, and they hold it long enough to
check all the keys (possibly hundreds, checked with arbitrarily slow SQL
functions) on that page.  The metapage is only checked for a few variables
which are C types.
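
(For what it's worth, those few metapage fields can be eyeballed from SQL
with pageinspect, whose GIN support arrived in a release newer than the ones
discussed here; the index name is hypothetical:)

CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT * FROM gin_metapage_info(get_raw_page('my_gin_idx', 0));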

I thought of checking for metadata->head == InvalidBlockNumber with a
share lock before getting the clean-up lock and then again after, but decided
against it, as the user backends already check that immediately before
calling this function, and wouldn't call it if there was no pending list
as of that check.

I exchange the exclusive content lock given to us by LockBufferForCleanup
for a share content lock, so as not to hold the exclusive lock over the I/O
possibly needed to read the pending head page into buffers.  I don't know
that this is actually a win.

This fixed the same problem I was seeing that was fixed by the previous
heavy-weight lock patch.

I still prefer the heavy-weight approach.  The buffer clean up lock for
vacuuming seems fragile to start with, and abusing it for other purposes
doesn't improve on that.

Whichever approach is taken, more work is needed on the comments.  And the
code that currently checks for concurrent cleanup and bails out needs to be
changed to throw errors or something instead.  But I don't want to make too
many changes until I know which approach to take, and whether it will be
back-patched.

Cheers,

Jeff


gin_pending_lwlock.patch
Description: Binary data



Re: [HACKERS] Potential GIN vacuum bug

2015-08-30 Thread Jeff Janes
On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Jeff Janes jeff.ja...@gmail.com writes:

 Your earlier point about how the current design throttles insertions to
 keep the pending list from growing without bound seems like a bigger deal
 to worry about.  I think we'd like to have some substitute for that.
 Perhaps we could make the logic in insertion be something like

 if (pending-list-size > threshold)
 {
     if (conditional-lock-acquire(...))
     {
         do-pending-list-cleanup;
         lock-release;
     }
     else if (pending-list-size > threshold * 2)
     {
         unconditional-lock-acquire(...);
         if (pending-list-size > threshold)
             do-pending-list-cleanup;
         lock-release;
     }
 }

 so that once the pending list got too big, incoming insertions would wait
 for it to be cleared.  Whether to use a 2x safety margin or something else
 could be a subject for debate, of course.


If the goal is to not change existing behavior (e.g. for back-patching), the
margin should be 1: always wait.  But we would still have to deal with the
fact that an unconditional acquire attempt by the backends will cause a
vacuum to cancel itself, which is undesirable.  If we define a new namespace
for this lock (like the relation extension lock has its own namespace), then
perhaps the cancellation code could be made to not cancel on that condition.
But that too seems like a lot of work to back-patch.

Would we bother to back-patch a fix for a theoretical bug which there is no
evidence is triggering in the field?  Of course, if people are getting bitten
by this, they probably wouldn't know.  You search for malevolent unicorns,
get no hits, and just assume there really are none, without scouring the
table and realizing it is an index problem.  Or if you do realize it is an
index problem, you would probably never trace it back to the cause.  There
are quite a few reports of mysterious index corruption which never get
resolved.
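
To make that concrete, a hedged way to hunt for such a unicorn is to run the
same predicate once through the index and once through a forced heap scan
and compare the counts (table, column, and query below are purely
hypothetical):

SELECT count(*) FROM docs WHERE tsv @@ to_tsquery('unicorn');   -- via the GIN index

BEGIN;
SET LOCAL enable_bitmapscan = off;   -- GIN is normally used via bitmap scans
SET LOCAL enable_indexscan = off;
SELECT count(*) FROM docs WHERE tsv @@ to_tsquery('unicorn');   -- forced heap scan
ROLLBACK;

-- Differing counts point at missing (or extra) index entries, not at the data.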

If we want to improve the current behavior rather than fix a bug, then I
think that if the list is greater than threshold*2 and the cleaning lock is
unavailable, what it should do is proceed to insert the tuple's keys into
the index itself, as if fastupdate = off.  That would require some major
surgery to the existing code, as by the time it invokes the clean up, it is
too late to not insert into the pending list.


Cheers,

Jeff

