Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-09 Thread Heikki Linnakangas

On 11/09/2014 08:06 AM, Michael Paquier wrote:

On Sat, Nov 8, 2014 at 5:40 PM, David Rowley dgrowle...@gmail.com wrote:

Please find attached another small fix. This time it's just a small typo in
the README, plus updates to some now-outdated docs.

Speaking about the feature... The index operators are still named with
minmax, wouldn't it be better to switch to brin?


All the built-in opclasses still implement the min-max policy - they 
store the min and max values. BRIN supports other kinds of opclasses, 
like storing a containing box for points, but no such opclasses have 
been implemented yet.
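(For illustration, a minimal sketch of how the committed minmax opclasses are
used; the table is hypothetical, and the opclass name follows the
int4_minmax_ops naming visible in the tree:)

    CREATE TABLE t (i integer);
    -- the minmax opclass is the default for BRIN on integer,
    -- so naming it explicitly is optional
    CREATE INDEX t_brin_idx ON t USING brin (i int4_minmax_ops);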


Speaking of which, Alvaro, any chance we could still get such an opclass 
included in 9.5? It would be nice to have one, just to be sure that 
nothing minmax-specific has crept into the BRIN code.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for detailed description of errors caused by trigger violations

2014-11-09 Thread Andreas Joseph Krogh
On Saturday, 8 November 2014 at 23:39:50, Tom Lane t...@sss.pgh.pa.us wrote:
Andreas Joseph Krogh andr...@visena.com writes:
  Hi. When working with Oracle it is possible to catch constraint violations
  caused by triggers using JDBC, but it seems this isn't possible using PG, see
  this thread:
  https://github.com/impossibl/pgjdbc-ng/issues/111#issuecomment-62276464

 I'm not exactly following the point.  The complaint seems to be that

     RAISE EXCEPTION 'ID must be less than 10';

 doesn't send anything except the given primary message and a generic
 SQLSTATE.  Well, duh: it's not supposed to.  There are a bunch of options
 you can supply in RAISE to populate additional fields of the error report.
 For example, you could add

     USING SCHEMA = TG_TABLE_SCHEMA, TABLE = TG_TABLE_NAME

 if you wanted the report to name the table the trigger is attached to.

 So it seems to me this is lazy plpgsql programming, not a missing feature.
 It would only be a missing feature if you think plpgsql should try to
 auto-populate these fields; but I'd be against that because it would
 require too many assumptions about exactly what the function might be
 complaining about.

 regards, tom lane

This is fantastic, thanks Tom! It was indeed sloppy plpgsql programming. I
didn't know about these extra arguments to RAISE that make it possible to
fine-tune the error report from triggers. Very nice, and thanks again for
making me look in the right place.

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com
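(For illustration, a minimal sketch of a trigger function using the RAISE
options Tom describes; the table column, constraint name, and threshold are
hypothetical. The populated fields surface to drivers as the schema_name,
table_name, column_name, and constraint_name error fields:)

    CREATE OR REPLACE FUNCTION check_id() RETURNS trigger AS $$
    BEGIN
        IF NEW.id >= 10 THEN
            RAISE EXCEPTION 'ID must be less than 10'
                USING ERRCODE = 'check_violation',
                      SCHEMA = TG_TABLE_SCHEMA,
                      TABLE = TG_TABLE_NAME,
                      COLUMN = 'id',                       -- hypothetical column
                      CONSTRAINT = 'id_less_than_10';      -- hypothetical name
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;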

Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-09 Thread Fujii Masao
On Sat, Nov 8, 2014 at 4:56 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 I just pushed this, after some more minor tweaks.

Nice!

 Thanks, and please do continue testing!

I got the following PANIC error in the standby server when I set up
the replication servers and ran make installcheck. Note that I was
repeating the manual CHECKPOINT every second while installcheck
was running. Without the checkpoints, I could not reproduce the
problem. I'm not sure if CHECKPOINT really triggers this problem, though.
Anyway BRIN seems to have a problem around its WAL replay.

2014-11-09 22:19:42 JST sby1 WARNING:  page 547 of relation
base/16384/30878 does not exist
2014-11-09 22:19:42 JST sby1 CONTEXT:  xlog redo BRIN/UPDATE: rel
1663/16384/30878 heapBlk 6 revmapBlk 1 pagesPerRange 1 old TID (3,2)
TID (547,2)
2014-11-09 22:19:42 JST sby1 PANIC:  WAL contains references to invalid pages
2014-11-09 22:19:42 JST sby1 CONTEXT:  xlog redo BRIN/UPDATE: rel
1663/16384/30878 heapBlk 6 revmapBlk 1 pagesPerRange 1 old TID (3,2)
TID (547,2)
2014-11-09 22:19:47 JST sby1 LOG:  startup process (PID 15230) was
terminated by signal 6: Abort trap
2014-11-09 22:19:47 JST sby1 LOG:  terminating any other active server processes
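(For reference, a hedged sketch of one way to repeat the checkpoint as
described above; \watch re-executes the query buffer and has been available
in psql since 9.3:)

    -- in a psql session on the primary, while "make installcheck" runs:
    CHECKPOINT;
    \watch 1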

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-11-09 Thread Fujii Masao
On Sun, Nov 9, 2014 at 6:41 AM, Rahila Syed rahilasye...@gmail.com wrote:
 Hello,

The patch did not apply cleanly to master. Could you update the
 patch?
 Please find attached updated and rebased patch to compress FPW. Review
 comments given above have been implemented.

Thanks for updating the patch! Will review it.

BTW, I got the following compiler warnings.

xlogreader.c:755: warning: assignment from incompatible pointer type
autovacuum.c:1412: warning: implicit declaration of function
'CompressBackupBlocksPagesAlloc'
xlogreader.c:755: warning: assignment from incompatible pointer type

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Order of views in stats docs

2014-11-09 Thread Magnus Hagander
On Thu, Nov 6, 2014 at 3:01 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 11/6/14 6:16 AM, Magnus Hagander wrote:
 Another thought I had in that case is maybe we need to break out the
 pg_stat_activity and pg_stat_replication views into their own table.
 They are really the only two views that are different in a lot of
 ways. Maybe call the splits "session statistics views" and "object
 statistics views", instead of just "standard statistics views"? The
 difference being that they show snapshots of *right now*, whereas all
 the other views show accumulated statistics over time.

 Yeah, splitting this up a bit would help, too.

Here's an initial run of this. It still feels wrong that we have the
dynamic views under the statistics collector. Perhaps long term we
should look at actually splitting them out to a completely different
sect1?

I only fixed the obvious parts in this - the split-out, and the moving
of pg_stat_database_conflicts. But it's a first step at least.

Objections?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
*************** postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
*** 147,155 ****
    </para>
  
    <para>
!    <productname>PostgreSQL</productname> also supports reporting of the exact
!    command currently being executed by other server processes.  This
!    facility is independent of the collector process.
    </para>
  
   <sect2 id="monitoring-stats-setup">
--- 147,157 ----
    </para>
  
    <para>
!    <productname>PostgreSQL</productname> also supports reporting dynamic
!    information about exactly what is going on in the system right now, such as
!    the exact command currently being executed by other server processes, and
!    which other connections exist in the system.  This facility is independent
!    of the collector process.
    </para>
  
   <sect2 id="monitoring-stats-setup">
*************** postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
*** 211,228 ****
   </sect2>
  
   <sect2 id="monitoring-stats-views">
!   <title>Viewing Collected Statistics</title>
  
    <para>
     Several predefined views, listed in <xref
!    linkend="monitoring-stats-views-table">, are available to show the results
     of statistics collection.  Alternatively, one can
     build custom views using the underlying statistics functions, as discussed
     in <xref linkend="monitoring-stats-functions">.
    </para>
  
    <para>
!    When using the statistics to monitor current activity, it is important
     to realize that the information does not update instantaneously.
     Each individual server process transmits new statistical counts to
     the collector just before going idle; so a query or transaction still in
--- 213,233 ----
   </sect2>
  
   <sect2 id="monitoring-stats-views">
!   <title>Viewing Statistics</title>
  
    <para>
     Several predefined views, listed in <xref
!    linkend="monitoring-stats-dynamic-views-table">, are available to show
!    the current state of the system. There are also several other
!    views, listed in <xref
!    linkend="monitoring-stats-views-table">, available to show the results
     of statistics collection.  Alternatively, one can
     build custom views using the underlying statistics functions, as discussed
     in <xref linkend="monitoring-stats-functions">.
    </para>
  
    <para>
!    When using the statistics to monitor collected data, it is important
     to realize that the information does not update instantaneously.
     Each individual server process transmits new statistical counts to
     the collector just before going idle; so a query or transaction still in
*************** postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
*** 263,270 ****
     stated above; instead they update continuously throughout the transaction.
    </para>
  
!   <table id="monitoring-stats-views-table">
!    <title>Standard Statistics Views</title>
  
     <tgroup cols="2">
      <thead>
--- 268,275 ----
     stated above; instead they update continuously throughout the transaction.
    </para>
  
!   <table id="monitoring-stats-dynamic-views-table">
!    <title>Dynamic Statistics Views</title>
  
     <tgroup cols="2">
      <thead>
*************** postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
*** 288,293 ****
--- 293,322 ----
    </row>
  
    <row>
+    <entry><structname>pg_stat_replication</structname><indexterm><primary>pg_stat_replication</primary></indexterm></entry>
+    <entry>One row per WAL sender process, showing statistics about
+     replication to that sender's connected standby server.
+     See <xref linkend="pg-stat-replication-view"> for details.
+    </entry>
+   </row>
+ 
+  </tbody>
+ </tgroup>
+ </table>
+ 
+ <table id="monitoring-stats-views-table">
+  <title>Collected Statistics Views</title>
+ 
+  <tgroup cols="2">
+   <thead>
+    <row>
+     <entry>View Name</entry>
+     <entry>Description</entry>
+    </row>
+ 
Re: [HACKERS] tracking commit timestamps

2014-11-09 Thread Steve Singer

On 11/07/2014 07:07 PM, Petr Jelinek wrote:


The list of what is useful might be long, but we can't have everything 
there, as there are space constraints. The LSN is another 8 bytes, and I 
still want to have some bytes for storing the origin (or whatever you 
want to call it), as that's the one I personally have the biggest 
use-case for.
So this would be ~24 bytes per txid already; hmm, I wonder if we can 
pull some tricks to lower that a bit.




The reason why Jim and I are asking for the LSN and not just the 
timestamp is that I want to be able to order the transactions. Jim 
pointed out earlier in the thread that ordering on timestamp alone allows 
for multiple transactions with the same timestamp.


Maybe we don't need the entire LSN to solve that.  If you already have 
the commit timestamp, maybe you only need another byte or two of 
granularity to order transactions that fall within the same microsecond.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-09 Thread Greg Stark
On Sun, Nov 9, 2014 at 9:18 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Speaking of which, Alvaro, any chance we could still get such an opclass
 included in 9.5? It would be nice to have one, just to be sure that
 nothing minmax-specific has crept into the BRIN code.

I'm trying to do a bloom filter BRIN index. I'm already a bit puzzled
by a few things but I've just started so maybe it'll become clear.

From what I've seen so far, it feels more likely that the opposite is
true: there's some boilerplate that I'm doing that feels like it could
be pushed down into the general BRIN code, since it'll be the same for
every access method.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-09 Thread Greg Stark
On Sun, Nov 9, 2014 at 5:06 PM, Greg Stark st...@mit.edu wrote:
 I'm trying to do a bloom filter BRIN index. I'm already a bit puzzled
 by a few things but I've just started so maybe it'll become clear.


So, some quick comments from pretty early on -- partly because I'm
afraid that once I get past them I'll forget what it was I was confused
by...


1) The manual describes the extensibility API, including the BrinOpcInfo
struct -- but it doesn't define the BrinDesc struct that every API
method takes. It's not clear what exactly that argument is for or how
to make use of it.

2) The mention about additional opclass operators and numbering them
from 11 up is fine -- but there's no explanation of how to decide which
operators need to be explicitly added like that. Specifically, I gather
from reading minmax that = is handled internally by BRIN, and you only
need to add operators other than =? Is that right?

3) It's not entirely clear in the docs when each method will be
invoked. Specifically, it's not clear whether opcInfo is invoked once
when the index is defined or every time the definition is loaded to be
used. I gather it's the latter? Perhaps there needs to be a method
that's invoked specifically when the index is defined? I'm wondering
where I'm going to hook in the logic to determine the size and number
of hash functions to use for the bloom filter, which needs to be
decided once when the index is created and then stay static for the
index in the future.

4) It doesn't look like BRIN handles cross-type operators at all. For
example this query with btree indexes can use the index just fine
because it looks up the operator based on both the left and right
operands:

::***# explain select * from data where i = 1::smallint;
┌──────────────────────────────────────────────────────────────────────┐
│                              QUERY PLAN                              │
├──────────────────────────────────────────────────────────────────────┤
│ Index Scan using btree_i on data  (cost=0.42..8.44 rows=1 width=14)  │
│   Index Cond: (i = 1::smallint)                                      │
└──────────────────────────────────────────────────────────────────────┘
(2 rows)

But minmax opclasses don't contain the cross-type operators, and in
fact, looking at the code, I don't think minmax would be able to cope
(minmax_get_procinfo doesn't even get passed the type in the qual,
only the type of the column).

::***# explain select * from data2 where i = 1::smallint;
┌───────────────────────────────────────────────────────────┐
│                        QUERY PLAN                         │
├───────────────────────────────────────────────────────────┤
│ Seq Scan on data2  (cost=0.00..18179.00 rows=1 width=14)  │
│   Filter: (i = 1::smallint)                               │
└───────────────────────────────────────────────────────────┘
(2 rows)

Time: 0.544 ms

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-09 Thread Heikki Linnakangas

On 11/08/2014 01:57 AM, Petr Jelinek wrote:

My main problem is actually not with having tuple per-seqAM, but more
with the fact that Heikki does not want to have last_value as compulsory
column/parameter. How is the new AM then supposed to know where to pick
up and if it even can pick up?


Call nextval(), and use the value it returns as the starting point for 
the new AM.
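(A hedged sketch of that handoff; the sequence and AM names are hypothetical,
and the ALTER SEQUENCE ... USING syntax is the WIP syntax from this patch
series, not anything committed:)

    SELECT nextval('myseq');                 -- suppose this returns 42
    ALTER SEQUENCE myseq USING other_seqam;  -- hypothetical WIP syntax
    SELECT nextval('myseq');                 -- the new AM picks up above 42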


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-11-09 Thread Andres Freund
On 2014-11-07 11:08:57 -0500, Robert Haas wrote:
 On Wed, Nov 5, 2014 at 6:19 PM, Andres Freund and...@2ndquadrant.com wrote:
  * In benchmarks it becomes apparent that the dynamic element width makes
    some macros like CHashTableGetNode() and
    CHashTableGetGarbageByBucket() quite expensive. At least gcc on x86
    can't figure out how to build an efficient expression for the
    target. There are two ways to address this:
    a) Actually morph chash into something that will be included into the
       user sites. Since I think there'll only be a limited number of those,
       that's probably acceptable.
 
 Have you benchmarked this?  How much does it help?

I've done quick benchmarks, and it helped in the 2-3% range in some
tests, and was a wash in others. What I did wasn't actually including it
in buf_table.c, but hardcoding the size in chash. That's obviously not
the real solution.

b) Simplify the access rules. I quite seriously doubt that the
  interleaving of garbage and freelists is actually necessary/helpful.
 
 That wasn't added for nothing.  I did a bunch of performance testing
 on this vs. dynahash (I think the test code is included in the patch)
 and the region of memory containing the freelists practically caught
 fire.  Spreading them out like this greatly improved the performance
 under heavy concurrency, but even with those changes the freelists
 were extremely hot.  Now, since this is the buffer mapping table
 rather than a tight loop around hash table manipulation, the
 difference may not be enough to measure.  But on a microbenchmark it
 *definitely* was.

I think it'd be much less visible for the buffer manager, but it might
still be visible...

  * There are several cases where it's noticeable that chash creates more
    cacheline misses - that's imo a good part of why the single-threaded
    performance regresses. There are good reasons for the current design
    without an inline bucket, namely that it makes the concurrency handling
    easier. But I think that can be countered by still storing a pointer -
    just to an element inside the bucket. Afaics that'd allow this to be
    introduced quite easily?
 
 It's not obvious to me how that would work.  If you just throw those
 elements on the garbage lists with everything else, it will soon be
 the case that one bucket header ends up using the cell stored in some
 other bucket, which isn't going to be awesome.  So you need some kind
 of more complex scheme to preserve locality.

Treating the element inside the bucket as a kind of one element freelist
seems quite workable to me. Test whether it's marked deleted, scan the
hazard pointer list to see whether it can be used. If yes, grand. If
not, go to the current code. The fact that the element is cacheline
local will make the test for its deletedness almost free. Yes, the
hazard pointer scan surely ain't free - but at least for cases like
bufmgr where reads are never less frequent than writes and very often
*much* more frequent I'm pretty sure it'd be a noticeable win.

I'm afraid that I can't see us going forward unless we address the
noticeable single threaded penalties. Do you see that differently?
 
 I'm not sure.  We're talking about something on the order of half a
 percent on my tests, and lots of changes cause things to bounce around
 by that much.  I'm not sure it makes sense to get too worked up about
 that if the alternative is to add a big pile of new complexity.

I saw things in the range of 3-4% on my laptop.

  * There's some whitespace damage. (space followed by tab, followed by
actual contents)
 
 That's the least of our problems at this point.

Sure, but still worth cleaning up ;)

  * I still think it's a good idea to move the full memory barriers into
the cleanup/writing process by doing write memory barriers when
acquiring a hazard pointer and RMW atomic operations (e.g. atomic add)
in the process testing for the hazard pointer.
 
 Can you cite any existing precedent in other system software?  Does
 Linux do anything like that, for example?

No, I can't right now. I think it follows from the cache coherency
rules, but I can understand suspicion there.

  * Shouldn't we relax the CPU in a couple places like CHashAllocate(),
CHashBucketScan()'s loops?
 
 You mean like, have it sleep the way SpinLockAcquire() does?

Not actually like in s_lock(), but rather like the SPIN_DELAY() used in
the spinlock code for retries.

 Yeah, possibly, but that adds non-trivial code complexity which may
 not be worth it if - as is hoped for - there's no real contention
 there.

I think just adding a pg_spin_delay() before retrying should be trivial.

  * I don't understand right now (but then, I'm in a hotel bar) why it's
    safe for CHashAddToGarbage() to clobber the hash value.
    CHashBucketScan() relies on the chain being ordered by hashcode, no?
    Another backend might just have gotten past the IsMarked() bit and be
    looking at the hashcode?
 
 I think 

Re: [HACKERS] row_to_json bug with index only scans: empty keys!

2014-11-09 Thread Tom Lane
I wrote:
 We could reduce the risks involved by narrowing the cases in which
 ExecEvalWholeRowVar will replace field names it got from the input.
 I'd be inclined to propose:

 1. If Var is of a named composite type, use *exactly* the field names
 associated with that type.  (This avoids the need to possibly produce
 RECORD outputs from a named-type Var, thus removing the Assert-weakening
 issue.)

 2. If Var is of type RECORD, replace only empty field names with aliases
 from the RTE.  (This might sound inconsistent --- could you end up with
 some names coming from point A and some from point B? --- but in practice
 I think it would always be all-or-nothing, because the issue is whether
 or not the planner bothered to attach column names to a lower-level
 targetlist.)
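(For illustration, a hedged example of the aliasing behavior point 2 is
about; the output shown is what one would expect with the patch applied:)

    SELECT row_to_json(t) FROM (VALUES (1, 'one')) AS t(x, y);
    --     row_to_json
    -- -------------------
    --  {"x":1,"y":"one"}
    -- (1 row)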

Attached are patches meant for HEAD and 9.2-9.4 respectively.  The HEAD
patch extends the prior patch to fix the RowExpr case I mentioned
yesterday.  The back-branch patch works as suggested above.  I added a
bunch of regression tests that document behavior in this area.  The two
patches include the same set of tests but have different expected output
for the cases we are intentionally not fixing in back branches.  If you
try the back-branch regression tests on an unpatched backend, you can
verify that the only cases that change behavior are ones where current
sources put out empty field names.

The test cases use row_to_json() so they could not be used directly before
9.2.  I tried the same cases using hstore(record) in 9.1.  While 9.1 does
some pretty darn dubious things, it does not produce empty field names in
any of these cases, so I think we probably should not consider
back-patching further than 9.2.

regards, tom lane

diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 7cfa63f..88af735 100644
*** a/src/backend/executor/execQual.c
--- b/src/backend/executor/execQual.c
***************
*** 50,55 ****
--- 50,56 ----
  #include "nodes/nodeFuncs.h"
  #include "optimizer/planner.h"
  #include "parser/parse_coerce.h"
+ #include "parser/parsetree.h"
  #include "pgstat.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
*************** ExecEvalWholeRowVar(WholeRowVarExprState
*** 712,717 ****
--- 713,720 ----
  {
  	Var		   *variable = (Var *) wrvstate->xprstate.expr;
  	TupleTableSlot *slot;
+ 	TupleDesc	output_tupdesc;
+ 	MemoryContext oldcontext;
  	bool		needslow = false;
  
  	if (isDone)
*************** ExecEvalWholeRowVar(WholeRowVarExprState
*** 787,794 ****
  			/* If so, build the junkfilter in the query memory context */
  			if (junk_filter_needed)
  			{
- 				MemoryContext oldcontext;
- 
  				oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
  				wrvstate->wrv_junkFilter =
  					ExecInitJunkFilter(subplan->plan->targetlist,
--- 790,795 ----
*************** ExecEvalWholeRowVar(WholeRowVarExprState
*** 860,869 ****
  				needslow = true;	/* need runtime check for null */
  		}
  
  		ReleaseTupleDesc(var_tupdesc);
  	}
  
! 	/* Skip the checking on future executions of node */
  	if (needslow)
  		wrvstate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalWholeRowSlow;
  	else
--- 861,920 ----
  				needslow = true;	/* need runtime check for null */
  		}
  
+ 		/*
+ 		 * Use the variable's declared rowtype as the descriptor for the
+ 		 * output values, modulo possibly assigning new column names below. In
+ 		 * particular, we *must* absorb any attisdropped markings.
+ 		 */
+ 		oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+ 		output_tupdesc = CreateTupleDescCopy(var_tupdesc);
+ 		MemoryContextSwitchTo(oldcontext);
+ 
  		ReleaseTupleDesc(var_tupdesc);
  	}
+ 	else
+ 	{
+ 		/*
+ 		 * In the RECORD case, we use the input slot's rowtype as the
+ 		 * descriptor for the output values, modulo possibly assigning new
+ 		 * column names below.
+ 		 */
+ 		oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+ 		output_tupdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+ 		MemoryContextSwitchTo(oldcontext);
+ 	}
  
! 	/*
! 	 * Construct a tuple descriptor for the composite values we'll produce,
! 	 * and make sure its record type is "blessed".  The main reason to do this
! 	 * is to be sure that operations such as row_to_json() will see the
! 	 * desired column names when they look up the descriptor from the type
! 	 * information embedded in the composite values.
! 	 *
! 	 * We already got the correct physical datatype info above, but now we
! 	 * should try to find the source RTE and adopt its column aliases, in case
! 	 * they are different from the original rowtype's names.  For example, in
! 	 * "SELECT foo(t) FROM tab t(x,y)", the first two columns in the composite
! 	 * output should be named "x" and "y" regardless of tab's column names.
! 	 *
! 	 * If we can't locate the RTE, assume the column names we've got are OK.
! 	 * (As of this writing, the only cases where we can't locate the RTE are
! 	 * in execution of trigger WHEN 

[HACKERS] Column/type dependency recording inconsistencies

2014-11-09 Thread Petr Jelinek

Hi,

While working on an extension upgrade script I got hit by what I believe 
is a bug in the way we record dependencies between columns and types.


CREATE TABLE records a dependency between the relation and the type, not 
between the column and the type, but ALTER TABLE ADD COLUMN and ALTER TABLE 
ALTER COLUMN TYPE record dependencies between the relation's column and the 
type, and not between the relation and the type.


Now this might seem harmless (and apparently is, most of the time), but 
there is one case where this breaks things - extensions.
If one creates a type in an extension and then uses that type for some 
column in a CREATE TABLE statement, everything is fine. But if one then 
uses that type in either ALTER TABLE ADD COLUMN or ALTER TABLE ALTER 
COLUMN TYPE, the extension becomes undroppable without CASCADE, which 
is clearly wrong.
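(A minimal sketch of the kind of extension script that reproduces this; the
attached deptestext extension presumably looks something like the following,
with names matching the error output below:)

    -- deptestext--1.0.sql (sketch)
    CREATE TYPE droppabletype1 AS (a integer, b text);
    CREATE TYPE undroppabletype AS (a integer, b text);
    CREATE TABLE testtable (droppablecol1 droppabletype1);  -- rel/type dep: fine
    ALTER TABLE testtable ADD COLUMN undroppablecol1 undroppabletype;  -- column/type dep
    ALTER TABLE testtable ADD COLUMN undroppablecol2 undroppabletype;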


I attached a sample extension that demonstrates this problem. Output of my 
psql console when creating/dropping this extension:

postgres=# create extension deptestext ;
CREATE EXTENSION
postgres=# drop extension deptestext;
ERROR:  cannot drop extension deptestext because other objects depend on it
DETAIL:  table testtable column undroppablecol2 depends on type undroppabletype
table testtable column undroppablecol1 depends on type undroppabletype
HINT:  Use DROP ... CASCADE to drop the dependent objects too.


I see two possible fixes for this:
 - either record a relation/type dependency for ALTER TABLE ADD COLUMN 
and ALTER TABLE ALTER COLUMN TYPE statements, instead of the column/type one,
 - or record a dependency between the column and the extension for ALTER 
TABLE ADD COLUMN and ALTER TABLE ALTER COLUMN TYPE statements.


Thoughts?

Oh, and btw, looking at the code I think there is the same issue with 
column/collation dependencies.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


deptestext--1.0.sql
Description: application/sql
default_version = '1.0'

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column/type dependency recording inconsistencies

2014-11-09 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
 CREATE TABLE records a dependency between the relation and the type, not
 between the column and the type, but ALTER TABLE ADD COLUMN and ALTER TABLE
 ALTER COLUMN TYPE record dependencies between the relation's column and the
 type, and not between the relation and the type

Really?  I get

regression=# CREATE TYPE droppabletype1 AS (a integer, b text);
CREATE TYPE
regression=# CREATE TABLE testtable (droppablecol1 droppabletype1, 
undroppablecol1 int);
CREATE TABLE
regression=# CREATE TYPE droppabletype2 AS (a integer, b text);
CREATE TYPE
regression=# alter table testtable add column f3 droppabletype2;
ALTER TABLE
regression=# select pg_describe_object(classid,objid,objsubid) as obj, 
pg_describe_object(refclassid,refobjid,refobjsubid) as ref, deptype from 
pg_depend where refobjid = 'droppabletype1'::regtype;
                  obj                  |         ref         | deptype 
---------------------------------------+---------------------+---------
 composite type droppabletype1         | type droppabletype1 | i
 type droppabletype1[]                 | type droppabletype1 | i
 table testtable column droppablecol1  | type droppabletype1 | n
(3 rows)

regression=# select pg_describe_object(classid,objid,objsubid) as obj, 
pg_describe_object(refclassid,refobjid,refobjsubid) as ref, deptype from 
pg_depend where refobjid = 'droppabletype2'::regtype;
              obj               |         ref         | deptype 
--------------------------------+---------------------+---------
 composite type droppabletype2  | type droppabletype2 | i
 type droppabletype2[]          | type droppabletype2 | i
 table testtable column f3      | type droppabletype2 | n
(3 rows)

The dependencies look just the same to me ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column/type dependency recording inconsistencies

2014-11-09 Thread Petr Jelinek

On 09/11/14 22:23, Tom Lane wrote:

regression=# select pg_describe_object(classid,objid,objsubid) as obj, 
pg_describe_object(refclassid,refobjid,refobjsubid) as ref, deptype from 
pg_depend where refobjid = 'droppabletype1'::regtype;
                  obj                  |         ref         | deptype
---------------------------------------+---------------------+---------
 composite type droppabletype1         | type droppabletype1 | i
 type droppabletype1[]                 | type droppabletype1 | i
 table testtable column droppablecol1  | type droppabletype1 | n
(3 rows)

regression=# select pg_describe_object(classid,objid,objsubid) as obj, 
pg_describe_object(refclassid,refobjid,refobjsubid) as ref, deptype from 
pg_depend where refobjid = 'droppabletype2'::regtype;
              obj               |         ref         | deptype
--------------------------------+---------------------+---------
 composite type droppabletype2  | type droppabletype2 | i
 type droppabletype2[]          | type droppabletype2 | i
 table testtable column f3      | type droppabletype2 | n
(3 rows)

The dependencies look just the same to me ...



Hmm, actually you are correct; I somehow missed it when I looked manually 
(I didn't use pg_describe_object).


But the problem with the extension persists. I will try to dig more to 
find the real cause.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format and API changes (9.5)

2014-11-09 Thread Andres Freund
On 2014-11-06 17:32:33 +0200, Heikki Linnakangas wrote:
 Replying to some of your comments below. The rest were trivial issues that
 I'll just fix.
 
 On 10/30/2014 09:19 PM, Andres Freund wrote:
 * Is it really a good idea to separate XLogRegisterBufData() from
   XLogRegisterBuffer() the way you've done it? If we ever actually get
   a record with a large number of blocks touched, this essentially is
   O(touched_buffers*data_entries).
 
 Are you worried that the linear search in XLogRegisterBufData(), to find the
 right registered_buffer struct, might be expensive? If that ever becomes a
 problem, a simple fix would be to start the linear search from the end, and
 make sure that when you touch a large number of blocks, you do all the
 XLogRegisterBufData() calls right after the corresponding
 XLogRegisterBuffer() call.

Yes, that was what I was (mildly) worried about. Since you specified a
high limit of buffers I'm sure someone will come up with a use case for
it ;)

 I've also though about having XLogRegisterBuffer() return the pointer to the
 struct, and passing it as argument to XLogRegisterBufData. So the pattern in
 WAL generating code would be like:
 
 registered_buffer *buf0;
 
 buf0 = XLogRegisterBuffer(0, REGBUF_STANDARD);
 XLogRegisterBufData(buf0, data, length);
 
 registered_buffer would be opaque to the callers. That would have potential
 to turn XLogRegisterBufData into a macro or inline function, to eliminate
 the function call overhead. I played with that a little bit, but the
 difference in performance was so small that it didn't seem worth it. But
 passing the registered_buffer pointer, like above, might not be a bad thing
 anyway.

Yes, that was roughly what I was thinking of as well. It's not all that
pretty, but it generally does seem like a good idea to me anyway.

 * There's lots of functions like XLogRecHasBlockRef() that dig through
the wal record. A common pattern is something like:
 if (XLogRecHasBlockRef(record, 1))
 	XLogRecGetBlockTag(record, 1, NULL, NULL, &oldblk);
 else
 	oldblk = newblk;
 
I think doing that repeatedly is quite a bad idea. We should parse the
record once and then use it in a sensible format. Not do it in pieces,
over and over again. It's not like we ignore backup blocks - so doing
this lazily doesn't make sense to me.
Especially as ValidXLogRecord() *already* has parsed the whole damn
thing once.
 
 Hmm. Adding some kind of a parsed XLogRecord representation would need a
 fair amount of new infrastructure.

True.

 The vast majority of WAL records contain one,
 maybe two, block references, so it's not that expensive to find the right
 one, even if you do it several times.

I'm not convinced. It's not an infrequent thing these days to hear
people being bottlenecked by replay. And grovelling repeatedly through
larger records isn't *that* cheap.

 I ran a quick performance test on WAL replay performance yesterday. I ran
 pgbench for 100 transactions with WAL archiving enabled, and measured
 the time it took to replay the generated WAL. I did that with and without
 the patch, and I didn't see any big difference in replay times. I also ran
 perf on the startup process, and the profiles looked identical. I'll do
 more comprehensive testing later, with different index types, but I'm
 convinced that there is no performance issue in replay that we'd need to
 worry about.

Interesting. What checkpoint_segments/timeout and what scale did you
use? Since that heavily influences the average size of the records,
it's quite relevant...

 If it mattered, a simple optimization to the above pattern would be to have
 XLogRecGetBlockTag return true/false, indicating if the block reference
 existed at all. So you'd do:
 
 if (!XLogRecGetBlockTag(record, 1, NULL, NULL, &oldblk))
 	oldblk = newblk;

That sounds like a good idea anyway?

 On the other hand, decomposing the WAL record into parts, and passing the
 decomposed representation to the redo routines would allow us to pack the
 WAL record format more tightly, as accessing the different parts of the
 on-disk format wouldn't then need to be particularly fast. For example, I've
 been thinking that it would be nice to get rid of the alignment padding in
 XLogRecord, and between the per-buffer data portions. We could copy the data
 to aligned addresses as part of the decomposition or parsing of the WAL
 record, so that the redo routines could still assume aligned access.

Right. I think it'd generally give us a bit more flexibility.

 * I wonder if it wouldn't be worthwile, for the benefit of the FPI
compression patch, to keep the bkp block data after *all* the
headers. That'd make it easier to just compress the data.
 
 Maybe. If we do that, I'd also be inclined to move all the bkp block headers
 to the beginning of the WAL record, just after the XLogInsert struct.

 Somehow it feels weird to have a bunch of header structs sandwiched between
 the rmgr-data and per-buffer 

Re: [HACKERS] Column/type dependency recording inconsistencies

2014-11-09 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
 But the problem with the extension persists, I will try to dig more to 
 find what is the real cause.

Hm ... I reproduced the problem here.  I can't see anything that looks
wrong about the pg_depend entries:

                   obj                   |                          ref                          | deptype 
-----------------------------------------+--------------------------------------------------------+---------
 extension deptestext                    | schema public                                          | n
 composite type droppabletype1           | type droppabletype1                                    | i
 type droppabletype1[]                   | type droppabletype1                                    | i
 type droppabletype1                     | schema public                                          | n
 type droppabletype1                     | extension deptestext                                   | e
 table testtable                         | schema public                                          | n
 table testtable                         | extension deptestext                                   | e
 table testtable column droppablecol2    | type droppabletype1                                    | n
 table testtable column droppablecol1    | type droppabletype1                                    | n
 table testtable column undroppablecol1  | type undroppabletype                                   | n
 table testtable column undroppablecol2  | type undroppabletype                                   | n
 type testtable[]                        | type testtable                                         | i
 type testtable                          | table testtable                                        | i
 composite type undroppabletype          | type undroppabletype                                   | i
 type undroppabletype[]                  | type undroppabletype                                   | i
 type undroppabletype                    | schema public                                          | n
 type undroppabletype                    | extension deptestext                                   | e
 toast table pg_toast.pg_toast_162813    | table testtable                                        | i
 type pg_toast.pg_toast_162813           | toast table pg_toast.pg_toast_162813                  | i
 index pg_toast.pg_toast_162813_index    | toast table pg_toast.pg_toast_162813 column chunk_id  | a
 index pg_toast.pg_toast_162813_index    | toast table pg_toast.pg_toast_162813 column chunk_seq | a
(21 rows)

but sure enough:

d1=# drop extension deptestext;
ERROR:  cannot drop extension deptestext because other objects depend on it
DETAIL:  table testtable column undroppablecol2 depends on type undroppabletype
table testtable column undroppablecol1 depends on type undroppabletype
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

I suspect this is actually a bug in the dependency search logic in DROP.
Don't have the energy to chase it right now though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column/type dependency recording inconsistencies

2014-11-09 Thread Petr Jelinek

On 09/11/14 23:04, Tom Lane wrote:

but sure enough:

d1=# drop extension deptestext;
ERROR:  cannot drop extension deptestext because other objects depend on it
DETAIL:  table testtable column undroppablecol2 depends on type undroppabletype
table testtable column undroppablecol1 depends on type undroppabletype
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

I suspect this is actually a bug in the dependency search logic in DROP.
Don't have the energy to chase it right now though.



Yes, that's where I started searching also, and yes, it is. The actual 
situation is that the columns of a table that is about to be dropped are 
normally not added to the object list that is used for dependency 
checks (if the table is going to be dropped, who cares what its columns 
depend on anyway).

But this logic depends on the columns that belong to the table being 
processed after the table itself has already been processed. However, as 
the new type was added after the table was added, the type is processed 
before the table; because of the dependency between the type and the 
columns, the columns are also processed before the table, and so they 
are added to the object list for further dependency checking.


See the use and source of object_address_present_add_flags in dependency.c.

I am not really sure how to fix this, other than changing it to a 
two-step process: essentially, go through the resulting object list 
again and remove the columns whose table is already there. But that's 
not exactly a pretty solution.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql tab completion: \c [ dbname [ username ] ]

2014-11-09 Thread Ian Barwick
Hi

Attached is a mighty trivial patch to extend psql tab completion
for \c / \connect to generate a list of role names, as the lack thereof
was annoying me recently and I can't see any downside to doing
this.
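(For illustration, the intended completion behavior; the database and role
names are hypothetical:)

    postgres=# \c mydb <TAB><TAB>
    alice     bob       postgres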

The patch is a whole two lines, so I haven't submitted it to the next
commitfest, but will happily do so if appropriate.

Regards


Ian Barwick

-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
new file mode 100644
index 886188c..56dc688
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*************** psql_completion(const char *text, int st
*** 3704,3709 ****
--- 3704,3711 ----
  	}
  	else if (strcmp(prev_wd, "\\connect") == 0 || strcmp(prev_wd, "\\c") == 0)
  		COMPLETE_WITH_QUERY(Query_for_list_of_databases);
+ 	else if (strcmp(prev2_wd, "\\connect") == 0 || strcmp(prev2_wd, "\\c") == 0)
+ 		COMPLETE_WITH_QUERY(Query_for_list_of_roles);
  
  	else if (strncmp(prev_wd, "\\da", strlen("\\da")) == 0)
  		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_aggregates, NULL);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-09 Thread Greg Stark
On Sun, Nov 9, 2014 at 5:57 PM, Greg Stark st...@mit.edu wrote:
 2) The mention about additional opclass operators and numbering them
 from 11 up is fine -- but there's no explanation of how to decide which
 operators need to be explicitly added like that. Specifically, I gather
 from reading minmax that = is handled internally by BRIN, and you only
 need to add operators other than =? Is that right?

I see I totally misunderstood the use of the opclass procedure
functions. I think I understand now, but just to be sure -- if I can
only handle BTEqualStrategyNumber keys, then is it adequate to just
define the opclass containing only the equality operator?
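(A hedged sketch of what such an opclass definition might look like; the
function names and signatures are illustrative, the support-function numbers
follow the BRIN extensibility docs (1 opcInfo, 2 addValue, 3 consistent,
4 union), and the operator strategy number follows the minmax convention
rather than anything bloom-specific:)

    CREATE OPERATOR CLASS int4_bloom_ops
        FOR TYPE integer USING brin AS
            OPERATOR 3 = ,                       -- equality only
            FUNCTION 1 bloom_opcinfo(internal),  -- hypothetical functions
            FUNCTION 2 bloom_add_value(internal, internal, internal, internal),
            FUNCTION 3 bloom_consistent(internal, internal, internal),
            FUNCTION 4 bloom_union(internal, internal, internal);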

Somehow I got confused between the amprocs that minmax uses to
implement the consistency function and the amops that the BRIN index
supports.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-11-09 Thread Petr Jelinek

On 09/11/14 20:47, Heikki Linnakangas wrote:

On 11/08/2014 01:57 AM, Petr Jelinek wrote:

My main problem is actually not with having a tuple per seqAM, but more
with the fact that Heikki does not want to have last_value as a compulsory
column/parameter. How is the new AM then supposed to know where to pick
up, and whether it even can pick up?


Call nextval(), and use the value it returns as the starting point for
the new AM.



Hah, so simple, works for me.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2014-11-09 Thread Alvaro Herrera
Robert Haas wrote:

 I think the key question here is the time for which the data needs to
 be retained.  2^32 of anything is a lot, but why keep around that
 number of records rather than more (after all, we have epochs to
 distinguish one use of a given txid from another) or fewer?

The problem is not how much data we retain; it's about how much data we
can address.  We must be able to address the data for a transaction with
xid=2^32-1, even if you only retain the 1000 most recent transactions.
In fact, we already only retain data back to RecentXmin, if I recall
correctly.  All slru.c users work that way.

Back when pg_multixact/members had the 5-char issue, I came up with a
patch that had each slru.c user declare the maximum number of chars in
its filenames.  I didn't push further with that because there was an
issue with it; I don't remember what it was offhand (but I don't think I
posted it).  But this is only needed so that the filenames are all of
equal width, which is mostly cosmetic; the rest of the module works fine
with 4- or 5-char filenames, and can be trivially expanded to support 6
or more.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-09 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 Speaking of which, Alvaro, any chance we could still get such an opclass
 included in 9.5? It would be nice to have one, just to be sure that
 nothing minmax-specific has crept into the BRIN code.

Emre Hasegeli contributed a patch for range types.  I am hoping he will
post a rebased version that we can consider including.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-09 Thread Amit Langote
On Sun, Nov 9, 2014 at 10:30 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Nov 8, 2014 at 4:56 AM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:

 I just pushed this, after some more minor tweaks.

 Nice!

 Thanks, and please do continue testing!

 I got the following PANIC error in the standby server when I set up
 the replication servers and ran make installcheck. Note that I was
 repeating the manual CHECKPOINT every second while installcheck
 was running. Without the checkpoints, I could not reproduce the
 problem. I'm not sure if CHECKPOINT really triggers this problem, though.
 Anyway BRIN seems to have a problem around its WAL replay.

 2014-11-09 22:19:42 JST sby1 WARNING:  page 547 of relation
 base/16384/30878 does not exist
 2014-11-09 22:19:42 JST sby1 CONTEXT:  xlog redo BRIN/UPDATE: rel
 1663/16384/30878 heapBlk 6 revmapBlk 1 pagesPerRange 1 old TID (3,2)
 TID (547,2)
 2014-11-09 22:19:42 JST sby1 PANIC:  WAL contains references to invalid pages
 2014-11-09 22:19:42 JST sby1 CONTEXT:  xlog redo BRIN/UPDATE: rel
 1663/16384/30878 heapBlk 6 revmapBlk 1 pagesPerRange 1 old TID (3,2)
 TID (547,2)
 2014-11-09 22:19:47 JST sby1 LOG:  startup process (PID 15230) was
 terminated by signal 6: Abort trap
 2014-11-09 22:19:47 JST sby1 LOG:  terminating any other active server 
 processes


I could reproduce this using the same steps. It's the same page 547
here too, if that's of any help.

Thanks,
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-11-09 Thread Peter Geoghegan
On Sat, Oct 11, 2014 at 6:34 PM, Peter Geoghegan p...@heroku.com wrote:
 Attached patch, when applied, accelerates all tuplesort cases using
 abbreviated keys, building on previous work here, as well as the patch
 posted to that other thread.

I attach an updated patch set, rebased on top of the master branch's
tip. All relevant tuplesort cases (B-Tree, MinimalTuple, CLUSTER) are
now directly covered by the patch set, since there is now general
sortsupport support for those cases in the master branch -- no need to
apply some other patch from some other thread.

For the convenience of reviewers, this new revision includes a new
approach to making my improvements cumulative: a second commit adds
tuple count estimation. This hint, passed along to the text opclass's
convert routine, is taken from the optimizer's own estimate, or the
relcache's reltuples, depending on the tuplesort case being
accelerated. As in previous revisions, the idea is to give the opclass
a sense of proportion about how far along it is, to be weighed in
deciding whether or not to abort abbreviation. One potentially
controversial aspect of that is how the text opclass abbreviation cost
model/abort-early logic weighs simply having many tuples - past a
certain point, it *always* proceeds with abbreviation, no matter what
the cardinality of abbreviated keys so far is. For that reason in
particular, it seemed to make sense to split these parts out into a
second commit.
I hope that we can finish up all 9.5 work on accelerated sorting soon.
-- 
Peter Geoghegan
From 78d163220b774218170123208c238e0fe2c07eb6 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan p...@heroku.com
Date: Sun, 9 Nov 2014 14:38:44 -0800
Subject: [PATCH 2/2] Estimate total number of rows to be sorted

Sortsupport opclasses now accept a row hint, indicating the estimated
number of rows to be sorted.  This gives opclasses a sense of proportion
about how far along the copying of tuples is when considering aborting
abbreviation.

Estimates come from various sources.  The text opclass now always avoids
aborting abbreviation if the total number of rows to be sorted is high
enough, regardless of anything else.
---
 src/backend/access/nbtree/nbtree.c |  5 ++-
 src/backend/access/nbtree/nbtsort.c| 14 +-
 src/backend/commands/cluster.c |  4 +-
 src/backend/executor/nodeAgg.c |  5 ++-
 src/backend/executor/nodeSort.c|  1 +
 src/backend/utils/adt/orderedsetaggs.c |  2 +-
 src/backend/utils/adt/varlena.c| 80 --
 src/backend/utils/sort/tuplesort.c | 14 --
 src/include/access/nbtree.h|  2 +-
 src/include/utils/sortsupport.h|  7 ++-
 src/include/utils/tuplesort.h  |  6 +--
 11 files changed, 121 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index d881525..d26c60b 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -109,14 +109,15 @@ btbuild(PG_FUNCTION_ARGS)
 		elog(ERROR, "index \"%s\" already contains data",
 			 RelationGetRelationName(index));
 
-	buildstate.spool = _bt_spoolinit(heap, index, indexInfo->ii_Unique, false);
+	buildstate.spool = _bt_spoolinit(heap, index, indexInfo->ii_Unique,
+									 indexInfo->ii_Predicate != NIL, false);
 
 	/*
 	 * If building a unique index, put dead tuples in a second spool to keep
 	 * them out of the uniqueness check.
 	 */
 	if (indexInfo->ii_Unique)
-		buildstate.spool2 = _bt_spoolinit(heap, index, false, true);
+		buildstate.spool2 = _bt_spoolinit(heap, index, false, true, true);
 
 	/* do the heap scan */
 	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index c60240f..dd57935 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -73,6 +73,7 @@
 #include "storage/smgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/rel.h"
+#include "utils/selfuncs.h"
 #include "utils/tuplesort.h"
 
 
@@ -148,10 +149,13 @@ static void _bt_load(BTWriteState *wstate,
  * create and initialize a spool structure
  */
 BTSpool *
-_bt_spoolinit(Relation heap, Relation index, bool isunique, bool isdead)
+_bt_spoolinit(Relation heap, Relation index, bool isunique, bool ispartial,
+			  bool isdead)
 {
 	BTSpool    *btspool = (BTSpool *) palloc0(sizeof(BTSpool));
 	int			btKbytes;
+	double		estRows;
+	float4		relTuples;
 
 	btspool->heap = heap;
 	btspool->index = index;
@@ -164,10 +168,16 @@ _bt_spoolinit(Relation heap, Relation index, bool isunique, bool isdead)
 	 * unique index actually requires two BTSpool objects.  We expect that the
 	 * second one (for dead tuples) won't get very full, so we give it only
 	 * work_mem.
+	 *
+	 * Certain cases will always have a relTuples of 0, such as reindexing as
+	 * part of a CLUSTER operation, or when reindexing toast tables.  This is
+	 * interpreted as no estimate available.
 	 */
 	

[HACKERS] Compiler warning in master branch

2014-11-09 Thread Peter Geoghegan
I see this when I build at -O2 from master's tip:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -Werror -I../../../../src/include
-D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o
brin_revmap.o brin_revmap.c -MMD -MP -MF .deps/brin_revmap.Po
brin_revmap.c: In function ‘brinRevmapExtend’:
brin_revmap.c:113:14: error: variable ‘mapBlk’ set but not used
[-Werror=unused-but-set-variable]
  BlockNumber mapBlk;
  ^

I'm using gcc 4.8.2 here.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_multixact not getting truncated

2014-11-09 Thread Josh Berkus
On 11/08/2014 01:46 PM, Andres Freund wrote:

 I think it'd be a good idea to tune them more automatedly in the
 future. But I think the current situation, where you can vastly increase
 autovacuum_freeze_max_age while keeping autovacuum_multixact_freeze_max_age
 lower, is *much* more useful in practice than when they always were the
 same.

Can you explain that?  Because I'm not picturing the situation where
that would make sense.

 I'm still not convinced that it makes sense to have a separate multixact
 threshold at all **since the same amount of vacuuming needs to be done
 regardless of whether we're truncating xids or mxids**.
 
 That's just plain wrong. The growth rate of one can be nearly
 independent of the other. It can e.g. be very sensible to have a huge
 xid freeze limit, but a much smaller multixact limit.

Yah, so who cares?  Either way you're in for a full table scan, and if
you're doing the full table scan anyway, you might as well freeze the xids.

 Certainly when I play with tuning this for customers, I'm going to lower
 vacuum_freeze_table_age as well.
 
 I'm these days suggesting that people should add manual vacuuming for
 older relations during off-peak hours on busy databases. There are too
 many sites whose service degrades noticeably during a full-table vacuum.

Me too: https://github.com/pgexperts/flexible-freeze
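(For illustration, the kind of off-peak manual job meant here; a hedged
sketch, with the table selection policy left to the DBA:)

    -- find the tables most in need of freezing, oldest first
    SELECT relname, age(relfrozenxid) AS xid_age
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY age(relfrozenxid) DESC
    LIMIT 10;

    -- then, during a low-traffic window (table name hypothetical):
    VACUUM (FREEZE) some_old_table;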

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-11-09 Thread Peter Geoghegan
On Fri, Nov 7, 2014 at 12:57 PM, Robert Haas robertmh...@gmail.com wrote:
 Based on this review from a month ago, I'm going to mark this Waiting
 on Author.  If nobody updates the patch in a few days, I'll mark it
 Returned with Feedback.  Thanks.

Attached revision fixes the compiler warning that Heikki complained
about. I maintain SQL-callable stub functions from within contrib,
rather than follow Michael's approach. In other words, very little has
changed from my revision from July last [1].

Reminder: I maintain a slight preference for only offering one
suggestion per relation RTE, which is what this revision does (so no
change there). If a committer who picks this up wants me to alter
that, I don't mind doing so; since only Michael spoke up on this, I've
kept things my way.

This is not a completion mechanism; it is supposed to work on
*complete* column references with slight misspellings (e.g. incorrect
use of plurals, or column references with an omitted underscore
character). Weighing Tom's concerns about suggestions that are of low
quality in an absolute sense is what makes me conclude that this is the
thing to do.
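(For illustration, the intended behavior; the table and column names are
hypothetical, and the hint wording is approximate:)

    postgres=# SELECT user_name FROM users;
    ERROR:  column "user_name" does not exist
    LINE 1: SELECT user_name FROM users;
                   ^
    HINT:  Perhaps you meant to reference the column "users.username".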

[1] 
http://www.postgresql.org/message-id/CAM3SWZTzQO=oy4jmfb-65iefie8ihukderk-0oljetm8dsr...@mail.gmail.com
-- 
Peter Geoghegan
From 830bf9f668972ba6b531df5d4fcbd73db3472434 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan p...@heroku.com
Date: Sat, 30 Nov 2013 23:15:00 -0800
Subject: [PATCH] Levenshtein distance column HINT

Add a new HINT -- a guess as to what column the user might have intended
to reference, to be shown in various contexts where an
ERRCODE_UNDEFINED_COLUMN error is raised.  The user will see this HINT
when he or she fat-fingers a column reference in his or her ad-hoc SQL
query, or incorrectly pluralizes or fails to pluralize a column
reference.

The HINT suggests a column in the range table with the lowest
Levenshtein distance, or the tied-for-best pair of matching columns in
the event of there being exactly two equally likely candidates (iff each
candidate column comes from a separate RTE).  Limiting the cases where
multiple equally likely suggestions are all offered at once is a measure
against suggestions that are of low quality in an absolute sense.

A further, final measure is taken against suggestions that are of low
absolute quality:  If the distance exceeds a normalized distance
threshold, no suggestion is given.

The contrib Levenshtein distance implementation is moved from /contrib
to core.  However, the SQL-callable functions may only be used with the
fuzzystrmatch extension installed, just as before -- the fuzzystrmatch
definitions become mere forwarding stubs.
---
 contrib/fuzzystrmatch/Makefile|   3 -
 contrib/fuzzystrmatch/fuzzystrmatch.c |  81 --
 contrib/fuzzystrmatch/levenshtein.c   | 403 --
 src/backend/parser/parse_expr.c   |   9 +-
 src/backend/parser/parse_func.c   |   2 +-
 src/backend/parser/parse_relation.c   | 319 ---
 src/backend/utils/adt/Makefile|   2 +
 src/backend/utils/adt/levenshtein.c   | 393 +
 src/backend/utils/adt/varlena.c   |  25 ++
 src/include/parser/parse_relation.h   |   3 +-
 src/include/utils/builtins.h  |   5 +
 src/test/regress/expected/alter_table.out |   8 +
 src/test/regress/expected/join.out|  39 +++
 src/test/regress/expected/plpgsql.out |   1 +
 src/test/regress/expected/rowtypes.out|   1 +
 src/test/regress/expected/rules.out   |   1 +
 src/test/regress/expected/without_oid.out |   1 +
 src/test/regress/sql/join.sql |  24 ++
 18 files changed, 849 insertions(+), 471 deletions(-)
 delete mode 100644 contrib/fuzzystrmatch/levenshtein.c
 create mode 100644 src/backend/utils/adt/levenshtein.c

diff --git a/contrib/fuzzystrmatch/Makefile b/contrib/fuzzystrmatch/Makefile
index 024265d..0327d95 100644
--- a/contrib/fuzzystrmatch/Makefile
+++ b/contrib/fuzzystrmatch/Makefile
@@ -17,6 +17,3 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
-
-# levenshtein.c is #included by fuzzystrmatch.c
-fuzzystrmatch.o: fuzzystrmatch.c levenshtein.c
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index 7a53d8a..62e650f 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -154,23 +154,6 @@ getcode(char c)
 /* These prevent GH from becoming F */
 #define NOGHTOF(c)	(getcode(c) & 16)	/* BDH */
 
-/* Faster than memcmp(), for this use case. */
-static inline bool
-rest_of_char_same(const char *s1, const char *s2, int len)
-{
-	while (len > 0)
-	{
-		len--;
-		if (s1[len] != s2[len])
-			return false;
-	}
-	return true;
-}
-
-#include "levenshtein.c"
-#define LEVENSHTEIN_LESS_EQUAL
-#include "levenshtein.c"
-
 PG_FUNCTION_INFO_V1(levenshtein_with_costs);
 Datum
 levenshtein_with_costs(PG_FUNCTION_ARGS)
@@ 

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-11-09 Thread Michael Paquier
On Mon, Nov 10, 2014 at 1:48 PM, Peter Geoghegan p...@heroku.com wrote:

 On Fri, Nov 7, 2014 at 12:57 PM, Robert Haas robertmh...@gmail.com
 wrote:
  Based on this review from a month ago, I'm going to mark this Waiting
  on Author.  If nobody updates the patch in a few days, I'll mark it
  Returned with Feedback.  Thanks.

 Attached revision fixes the compiler warning that Heikki complained
 about. I maintain SQL-callable stub functions from within contrib,
 rather than follow Michael's approach. In other words, very little has
 changed from my revision from July last [1].

FWIW, I still find this bit of code that this patch adds in varlena.c ugly:
+#include "levenshtein.c"
+#define LEVENSHTEIN_LESS_EQUAL
+#include "levenshtein.c"
+#undef LEVENSHTEIN_LESS_EQUAL
-- 
Michael


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-11-09 Thread Peter Geoghegan
On Sun, Nov 9, 2014 at 8:56 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 FWIW, I still find this bit of code that this patch adds in varlena.c ugly:

 +#include "levenshtein.c"
 +#define LEVENSHTEIN_LESS_EQUAL
 +#include "levenshtein.c"
 +#undef LEVENSHTEIN_LESS_EQUAL

Okay, but this is the coding that currently appears within contrib's
fuzzystrmatch.c, more or less unchanged. The #undef
LEVENSHTEIN_LESS_EQUAL line that I added ought to be unnecessary.
I'll give the final word on that to whoever picks this up.
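
For anyone unfamiliar with the idiom: the file picks its function name
with a macro, so including it twice generates both variants from one
body. Schematically (a sketch of the pattern, not the actual file
contents):

/* dist_impl.c -- meant to be #included, possibly twice */
#ifdef LEVENSHTEIN_LESS_EQUAL
#define FUNC_NAME levenshtein_less_equal_internal
#else
#define FUNC_NAME levenshtein_internal
#endif

static int
FUNC_NAME(const char *s, const char *t)
{
	int		d = 0;

	/*
	 * ... shared distance computation; the LESS_EQUAL build adds
	 * early-exit logic guarded by #ifdef LEVENSHTEIN_LESS_EQUAL ...
	 */
	return d;
}

#undef FUNC_NAME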

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_multixact not getting truncated

2014-11-09 Thread Josh Berkus
On 11/09/2014 08:00 PM, Josh Berkus wrote:
On 11/08/2014 01:46 PM, Andres Freund wrote:
 These days I'm suggesting that people add manual vacuuming for older
  relations during off-peak hours on busy databases. There are too many
  sites whose service degrades noticeably during a full table vacuum.
 Me too: https://github.com/pgexperts/flexible-freeze

It turns out that not even a program of preventative scheduled vacuuming
helps.  This is because the template0 database anchors the minmxid and
prevents it from being advanced until autovacuum gets around to that
database, at whatever the minmxid threshold is.
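
You can see which database is anchoring things with a quick catalog
query (datminmxid is the per-database floor, available since 9.3; the
cluster-wide horizon cannot advance past the smallest value here):

SELECT datname, datfrozenxid, datminmxid
FROM pg_database;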

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2014-11-09 Thread Anssi Kääriäinen
On Sun, 2014-11-09 at 11:57 -0500, Steve Singer wrote:

 The reason Jim and I are asking for the LSN and not just the 
 timestamp is that I want to be able to order the transactions. Jim 
 pointed out earlier in the thread that ordering on timestamp alone 
 allows for multiple transactions with the same timestamp.
 
 Maybe we don't need the entire LSN to solve that.  If you already have 
 the commit timestamp maybe you only need another byte or two of 
 granularity to order transactions that are within the same microsecond.

There is no guarantee that a commit with a later LSN has a later
timestamp. There are cases where the clock could move significantly
backwards.

A robust solution to storing transaction commit information (including
commit order) in a way that can be referenced from other tables, can be
loaded to another cluster, and survives crashes would be a great
feature. But this feature doesn't have those properties.

 - Anssi




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-11-09 Thread Etsuro Fujita

(2014/11/06 23:38), Fujii Masao wrote:

On Tue, Nov 4, 2014 at 12:04 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:

IIUC, I think that min = 0 disables fast update, so ISTM that it'd be
appropriate to set min to some positive value.  And ISTM that the idea of
using the min value of work_mem is not so bad.


OK. I changed the min value to 64kB.


*** 356,361 **** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable
class="parameter">name</replaceable>
--- 356,372 ----
   </listitem>
  </varlistentry>
  </variablelist>
+<variablelist>
+<varlistentry>
+ <term><literal>PENDING_LIST_CLEANUP_SIZE</literal></term>

The above is still in uppercase.


Fixed.

Attached is the updated version of the patch. Thanks for the review!


Thanks for updating the patch!

The patch looks good to me except for the following point:

*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***************
*** 25,30 ****
--- 25,32 ----
  #include "utils/memutils.h"
  #include "utils/rel.h"

+ /* GUC parameter */
+ int   pending_list_cleanup_size = 0;

I think we need to initialize the GUC to boot_val, 4096 in this case.

I asked for others' opinions about the config_group of the GUC, but 
none were offered, so I agree with Fujii-san's idea of assigning the 
GUC to CLIENT_CONN_STATEMENT.
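
For the archives, the guc.c entry would then look roughly like this (a
sketch assembled from the discussion so far -- boot_val 4096, min 64kB,
CLIENT_CONN_STATEMENT -- not necessarily the exact final patch):

	{
		{"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT,
			gettext_noop("Sets the maximum size of the pending list for GIN index."),
			NULL,
			GUC_UNIT_KB
		},
		&pending_list_cleanup_size,
		4096, 64, MAX_KILOBYTES,	/* boot_val, min (64kB), max */
		NULL, NULL, NULL
	},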


Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compiler warning in master branch

2014-11-09 Thread David Rowley
On Mon, Nov 10, 2014 at 4:31 PM, Peter Geoghegan p...@heroku.com wrote:

 I see this when I build at -O2 from master's tip:

 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels
 -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
 -fwrapv -fexcess-precision=standard -Werror -I../../../../src/include
 -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o
 brin_revmap.o brin_revmap.c -MMD -MP -MF .deps/brin_revmap.Po
 brin_revmap.c: In function ‘brinRevmapExtend’:
 brin_revmap.c:113:14: error: variable ‘mapBlk’ set but not used
 [-Werror=unused-but-set-variable]
   BlockNumber mapBlk;
   ^

 I'm using gcc 4.8.2 here.


It would appear just to need the attached.

Regards

David Rowley
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 7f55ded..272c74e 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -110,7 +110,7 @@ brinRevmapTerminate(BrinRevmap *revmap)
 void
 brinRevmapExtend(BrinRevmap *revmap, BlockNumber heapBlk)
 {
-   BlockNumber mapBlk;
+   BlockNumber mapBlk PG_USED_FOR_ASSERTS_ONLY;
 
mapBlk = revmap_extend_and_get_blkno(revmap, heapBlk);
 

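For anyone following along, PG_USED_FOR_ASSERTS_ONLY is the project's
standard marker for variables that are only referenced inside Assert();
its effect is roughly the following (approximate rendition of the c.h
definition, not quoted verbatim):

/* Suppress "set but not used" on compilers that support the attribute. */
#ifdef __GNUC__
#define PG_USED_FOR_ASSERTS_ONLY __attribute__((unused))
#else
#define PG_USED_FOR_ASSERTS_ONLY
#endif
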
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-11-09 Thread Amit Kapila
On Sat, Nov 8, 2014 at 4:16 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai kai...@ak.jp.nec.com
wrote:
  FYI, patch v12 part 2 no longer applies cleanly.
 
  Thanks. I rebased the patch set according to the latest master branch.
  The attached v13 can be applied to the master.

 I've committed parts 1 and 2 of this, without the documentation, and
 with some additional cleanup.

A few observations/questions related to this commit:

1.
@@ -5546,6 +5568,29 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 colinfo = deparse_columns_fetch(var->varno, dpns);
 attnum = var->varattno;
  }
+ else if (IS_SPECIAL_VARNO(var->varno) &&
+ IsA(dpns->planstate, CustomScanState) &&
+ (expr = GetSpecialCustomVar((CustomScanState *) dpns->planstate,
+ var, child_ps)) != NULL)
+ {
+ deparse_namespace save_dpns;
+
+ if (child_ps)
+ push_child_plan(dpns, child_ps, &save_dpns);
+ /*
+ * Force parentheses because our caller probably assumed a Var is a
+ * simple expression.
+ */
+ if (!IsA(expr, Var))
+ appendStringInfoChar(buf, '(');
+ get_rule_expr((Node *) expr, context, true);
+ if (!IsA(expr, Var))
+ appendStringInfoChar(buf, ')');
+
+ if (child_ps)
+ pop_child_plan(dpns, &save_dpns);
+ return NULL;
+ }

a. It seems an Assert for netlevelsup is missing in this loop.
b. The below comment in function get_variable can be improved
w.r.t. the handling for CustomScanState.  The comment indicates
that if varno is OUTER_VAR, INNER_VAR, or INDEX_VAR, it handles
all of them similarly, which seems to have changed slightly with
CustomScanState.

/*
 * Try to find the relevant RTE in this rtable.  In a plan tree, it's
 * likely that varno is OUTER_VAR or INNER_VAR, in which case we must dig
 * down into the subplans, or INDEX_VAR, which is resolved similarly. Also
 * find the aliases previously assigned for this RTE.
 */

2.
+void
+register_custom_path_provider(CustomPathMethods *cpp_methods)
{
..
}

Shouldn't there be an unregister function corresponding to the above
register function?
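
To sketch what I mean (purely hypothetical -- no such function exists in
the committed code, and the provider-list name below is an assumption):

/* Hypothetical counterpart to register_custom_path_provider(). */
void
unregister_custom_path_provider(CustomPathMethods *cpp_methods)
{
	/* assumes registered providers are kept in a simple List */
	custom_path_providers = list_delete_ptr(custom_path_providers,
											cpp_methods);
}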

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com