Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)

2015-12-12 Thread Tom Lane
Mark Dilger  writes:
>> On Dec 12, 2015, at 3:42 PM, Tom Lane  wrote:
>> ... In general, though, I'd rather not try to
>> teach InteractiveBackend() such a large amount about SQL syntax.

> I use CREATE RULE within startup files in the fork that I maintain.  I have
> lots of them, totaling perhaps 50k lines of rule code.  I don't think any of
> that code would have a problem with the double-newline separation you
> propose, which seems a more elegant solution to me.

Yeah?  Just for proof-of-concept, could you run your startup files with
the postgres.c patch as proposed, and see whether you get any failures?

> Admittedly, the double-newline
> separation would need to be documented at the top of each sql file, otherwise
> it would be quite surprising to those unfamiliar with it.

Agreed, that wouldn't be a bad thing.

I thought of a positive argument against doing the "fully right" thing,
namely implementing the exactly-right command boundary rules.  Suppose
you mess up in information_schema.sql or another large input file by
introducing an extra left parenthesis in some query.  If
InteractiveBackend() were cognizant of the paren-matching rule, it would
slurp everything up to end-of-file and then produce a syntax error
message quoting all that text; not much better than what happens today.
With a command break rule like semi-newline-newline, there's a limited
horizon on how much text gets swallowed before you get the error message.

Note that this is different from the situation with a fully interactive
input processor like psql: if you're typing the same thing in psql,
you'll realize as soon as it doesn't execute the command when-expected
that something is wrong.  You won't type another thousand lines of input
before looking closely at what you typed already.

I'm still not quite sold on semi-newline-newline as being the best
possible command boundary rule here; but I do think that "fully correct"
boundary rules are less attractive than they might sound.

regards, tom lane




Re: [HACKERS] Disabling an index temporarily

2015-12-12 Thread Oleg Bartunov
On Sun, Dec 13, 2015 at 1:16 AM, Jaime Casanova <jaime.casan...@2ndquadrant.com> wrote:

> indexrelid = 'indexname'::regclass;


This works, but it might bloat the system catalogs.


Re: [HACKERS] [patch] Proposal for \rotate in psql

2015-12-12 Thread Pavel Stehule
2015-12-10 19:29 GMT+01:00 Pavel Stehule :

>
>
>
>> postgres=# \crosstabview 4 +month label
>>
>
> Maybe using an optional integer order column instead of a label is better;
> then you can do the sort on the client side,
>
> so the syntax can be "\crosstabview VCol [+/-]HCol [[+-]HOrderCol]"
>

Here is the patch; supported syntax: \crosstabview VCol [+/-]HCol [HOrderCol]

The order column should contain a numeric value. Values are sorted on the
client side.
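
A hypothetical session under this syntax (table and column names invented
for illustration):

SELECT customer,
       to_char(closed, 'Mon') AS month,
       extract(month from closed) AS m,
       sum(amount) AS total
  FROM invoices
 GROUP BY 1, 2, 3;
\crosstabview 1 2 3

Column 1 (customer) becomes the vertical header, column 2 (month) the
horizontal header, and the numeric column 3 (m) is used as the
client-side sort key for the month headers.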

Regards

Pavel


>
> Regards
>
> Pavel
>
>
>
>>
>> [ sample \crosstabview output: an 18-row grid with customer as the
>> vertical header, Czech month abbreviations (led .. lis) as the
>> horizontal header, and monthly totals projected into the cells ]
>> (18 rows)
>>
>>
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>>
>>>
 Best regards,
 --
 Daniel Vérité
 PostgreSQL-powered mailer: http://www.manitou-mail.org
 Twitter: @DanielVerite

>>>
>>>
>>
>
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
new file mode 100644
index 47e9da2..a45f4b8
*** a/doc/src/sgml/ref/psql-ref.sgml
--- b/doc/src/sgml/ref/psql-ref.sgml
*** lo_import 152801
*** 2461,2466 
--- 2461,2555 
  

  
+   
+ \crosstabview [ colV  [-|+]colH ] 
+ 
+ 
+ Execute the current query buffer (like \g) and show the results
+ inside a crosstab grid. The output column colV
+ becomes a vertical header and the output column
+ colH becomes a horizontal header.
+ The results for the other output columns are projected inside the grid.
+ 
+ 
+ 
+ colV
+ and colH can indicate a
+ column position (starting at 1), or a column name. Normal case folding
+ and quoting rules apply on column names. By default,
+ colV is column 1
+ and colH is column 2.
+ A query having only one output column cannot be viewed in crosstab, and
+ colH must differ from
+ colV.
+ 
+ 
+ 
+ The vertical header, displayed as the leftmost column,
+ contains the set of all distinct values found in
+ column colV, in the order
+ of their first appearance in the query results.
+ 
+ 
+ The horizontal header, displayed as the first row,
+ contains the set of all distinct non-null values found in
+ column colH.  They come
+ by default in their order of appearance in the query results, or in
+ ascending order if 

Re: [HACKERS] PATCH: add pg_current_xlog_flush_location function

2015-12-12 Thread Amit Kapila
On Sun, Dec 13, 2015 at 12:07 AM, Tomas Vondra  wrote:

> Hi,
>
> attached is a patch adding a function pg_current_xlog_flush_location(),
> which proved quite useful when investigating the ext4 data loss bug. It's
> mostly what was already sent to that thread, except for docs that were
> missing in the initial version.
>
>
 /*
+ * Get latest WAL flush pointer
+ */
+XLogRecPtr
+GetXLogFlushRecPtr(void)
+{
+	SpinLockAcquire(&XLogCtl->info_lck);
+	LogwrtResult = XLogCtl->LogwrtResult;
+	SpinLockRelease(&XLogCtl->info_lck);
+
+	return LogwrtResult.Flush;
+}
+


Is there a reason why you can't use the existing function
GetFlushRecPtr() in xlog.c?




With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

2015-12-12 Thread Jeff Janes
On Sat, Dec 12, 2015 at 9:48 AM, Corey Huinker  wrote:
>
>
> What, if any, other load should be placed on the underlying table during the
> test?
>
> I ask because CIC statements that run in seconds on our staging machine can
> take many hours on our production machine, when most of the access is just
> reads, though those reads may have been part of a larger transaction that
> did updates elsewhere.

That sounds like the CIC is just blocked waiting for long-lived
transactions to go away.  There isn't much that changes to the sorting
algorithm can do about that.

Cheers,

Jeff




Re: [HACKERS] PATCH: track last known XLOG segment in control file

2015-12-12 Thread Amit Kapila
On Sun, Dec 13, 2015 at 4:24 AM, Tomas Vondra  wrote:

>
>
> On 12/12/2015 11:39 PM, Andres Freund wrote:
>
>> On 2015-12-12 23:28:33 +0100, Tomas Vondra wrote:
>>
>>> On 12/12/2015 11:20 PM, Andres Freund wrote:
>>>
 On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote:

> this is the second improvement proposed in the thread [1] about ext4
> data
> loss issue. It adds another field to control file, tracking the last
> known
> WAL segment. This does not eliminate the data loss, just the silent
> part of
> it when the last segment gets lost (due to forgetting the rename,
> deleting
> it by mistake or whatever). The patch makes sure the cluster refuses to
> start if that happens.
>

 Uh, that's fairly expensive. In many cases it'll significantly
 increase the number of fsyncs.

>>>
>>> It should do exactly 1 additional fsync per WAL segment. Or do you think
>>> otherwise?
>>>
>>
>> Which is nearly doubling the number of fsyncs, for a good number of
>> workloads. And it does so to a separate file, i.e. it's not like
>> these writes and the flushes can be combined. In workloads where
>> pg_xlog is on a separate partition it'll add the only source of
>> fsyncs besides checkpoint to the main data directory.
>>
>
>
I also think so.


> I doubt it will make any difference in practice, at least on reasonable
> hardware (which you should have, if fsync performance matters to you).
>
> But some performance testing will be necessary; I don't expect this to go
> in without that. It'd be helpful if you could describe the workload.
>

I think to start with you can try a pgbench read-write workload
where the data fits in shared_buffers.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [patch] Proposal for \rotate in psql

2015-12-12 Thread Pavel Stehule
2015-12-13 8:14 GMT+01:00 Pavel Stehule :

>
>
> 2015-12-10 19:29 GMT+01:00 Pavel Stehule :
>
>>
>>
>>
>>> postgres=# \crosstabview 4 +month label
>>>
>>
>> Maybe using an optional integer order column instead of a label is better;
>> then you can do the sort on the client side,
>>
>> so the syntax can be "\crosstabview VCol [+/-]HCol [[+-]HOrderCol]"
>>
>
> Here is the patch; supported syntax: \crosstabview VCol [+/-]HCol [HOrderCol]
>
> The order column should contain a numeric value. Values are sorted on the
> client side.
>

Fixed the error messages.



>
> Regards
>
> Pavel
>
>
>>
>> Regards
>>
>> Pavel
>>
>>
>>
>>>
>>> [ sample \crosstabview output elided; identical to the grid shown
>>> upthread ]
>>> (18 rows)
>>>
>>>
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>



> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


>>>
>>
>
[ attached patch: psql-ref.sgml documentation hunk identical to the one
shown upthread ]

Re: [HACKERS] Parallel Aggregate

2015-12-12 Thread Haribabu Kommi
On Sat, Dec 12, 2015 at 8:42 AM, David Rowley
 wrote:
> On 12 December 2015 at 04:00, Robert Haas  wrote:
>>
>> I'd like to commit David Rowley's patch from the other thread first,
>> and then deal with this one afterwards.  The only thing I feel
>> strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
>> for clarity.
>
>
> I have addressed that in my local copy. I'm now just working on adding some
> test code which uses the new infrastructure. Perhaps I'll just experiment
> with the parallel aggregate stuff instead now.
>

Here I have attached a patch with the following changes; I feel it is better
to include them as part of the combine aggregate patch.

1. Added missing outfuncs.c changes for the newly added variables in the
Aggref structure.
2. Kept the aggregate function in the final aggregate stage to perform the
final aggregation on the tuples received from all workers.

The patch still needs a fix for the explain plan output issue.

postgres=# explain analyze verbose select count(*), sum(f1) from tbl
where f1 % 100 = 0 group by f3;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Finalize HashAggregate  (cost=1853.75..1853.76 rows=1 width=12) (actual time=92.428..92.429 rows=1 loops=1)
   Output: pg_catalog.count(*), pg_catalog.sum((sum(f1))), f3
   Group Key: tbl.f3
   ->  Gather  (cost=0.00..1850.00 rows=500 width=12) (actual time=92.408..92.416 rows=3 loops=1)
         Output: f3, (count(*)), (sum(f1))


Regards,
Hari Babu
Fujitsu Australia


set_ref_final_agg.patch
Description: Binary data



Re: [HACKERS] Disabling an index temporarily

2015-12-12 Thread Tatsuo Ishii
> Tatsuo Ishii wrote:
>>> Wouldn't something like:
>>>
>>> ALTER INDEX foo SET DISABLED;
>>>
>>> See more in line with our grammar?
>>
>> But this will affect other sessions, no?
> 
> Not if it is used in a transaction that ends with a ROLLBACK,
> but then you might as well use DROP INDEX, except
> that DROP INDEX takes an access exclusive lock.

I thought about this. The problem with the transaction-rollback technique
is that I would not be able to test with an application that runs
multiple transactions.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Error with index on unlogged table

2015-12-12 Thread Michael Paquier
On Fri, Dec 11, 2015 at 4:54 PM, Michael Paquier
 wrote:
> On Fri, Dec 11, 2015 at 4:27 AM, Andres Freund  wrote:
>> On 2015-12-10 18:36:32 +0100, Andres Freund wrote:
>>> On 2015-12-10 12:19:12 -0500, Robert Haas wrote:
>>> > > The real problem there imo isn't that the copy_relation_data() doesn't
>>> > > deal with 0 block tables, but that ATExecSetTableSpace() doesn't have an
>>> > > unlogged table specific codepath like heap_create_with_catalog() has.
>>> >
>>> > It looks to me like somewhere we need to do log_smgrcreate(...,
>>> > INIT_FORKNUM) in the unlogged table case.
>>>
>>> Yes.
>>>
>>> > RelationCreateStorage()
>>> > skips this for the main forknum of an unlogged table, which seems OK,
>>> > but there's nothing that even attempts it for the init fork, which
>>> > does not seem OK.
>>>
>>> We unfortunately can't trivially delegate that work to
>>> RelationCreateStorage(). E.g. heap_create() documents that only the main
>>> fork is created :(
>>>
>>> > I guess that logic should happen in
>>> > ATExecSetTableSpace just after smgrcreate(dstrel, forkNum, false).
>>>
>>> Looks like it's the easiest place.
>>
>>> > > A second problem is that the smgrimmedsync() in copy_relation_data()
>>> > > isn't called for the init fork of unlogged relations, even if it needs
>>> > > to.
>>
>> Here's a patch doing that. It's not yet fully polished, but I wanted to
>> get it out, because I noticed one thing:
>>
>> In ATExecSetTableSpace(), for !main forks, we currently call
>> smgrcreate(), but not log_smgrcreate(). Even for PERSISTENT
>> relations. That seems a bit odd to me. It currently seems to be without
>> further consequence because, if there's actual data in the fork, we'll
>> just create the relation in _mdfd_getseg(); or we can cope with the
>> relation not being there.  But to me that feels wrong.
>>
>> It seems better to do the log_smgrcreate() for RELPERSISTENCE_PERMANENT,
>> not just INIT_FORKNUM. What do you guys think?
>
> This fixes the problem in my environment.
>
> +   if (rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT ||
> +   (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
> +forkNum == INIT_FORKNUM))
> +   log_smgrcreate(&newrnode, forkNum);
> There should be a XLogIsNeeded() check as well. Removing the check on
> RELPERSISTENCE_UNLOGGED is fine as well... Not mandatory though :)
>
> +* The init fork for an unlogged relation in many respects has to be
> +* treated the same as a normal relation; changes need to be WAL logged
> +* and it needs to be synced to disk.
> +*/
> +   copying_initfork = relpersistence == RELPERSISTENCE_UNLOGGED &&
> +   forkNum == INIT_FORKNUM;
> Here as well just a check on INIT_FORKNUM would be fine.

Should we consider this bug a 9.5 blocker? I feel uneasy about releasing
a new major version while knowing about bugs in it, and this one is not
cool, so I have added it to the list of open items. We know the problem
and there is a patch, so this is definitely solvable.
-- 
Michael




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2015-12-12 Thread Amit Kapila
On Wed, Dec 9, 2015 at 1:02 AM, Robert Haas  wrote:
>
> On Thu, Dec 3, 2015 at 1:48 AM, Amit Kapila 
wrote:
> > I think the way to address this is to not add a backend to the group
> > list if it is not intended to update the same page as the group leader.
> > For transactions to be on different pages, they have to be 32768
> > transaction IDs apart, and I don't see much possibility of that
> > happening for concurrent transactions that are going to be grouped.
>
> That might work.
>

Okay, the attached patch group_update_clog_v3.patch implements the above.

> >> My idea for how this could possibly work is that you could have a list
> >> of waiting backends for each SLRU buffer page.
> >
> > Won't this mean that first we need to ensure that page exists in one of
> > the buffers and once we have page in SLRU buffer, we can form the
> > list and ensure that before eviction, the list must be processed?
> > If my understanding is right, then for this to work we need to probably
> > acquire CLogControlLock in Shared mode in addition to acquiring it
> > in Exclusive mode for updating the status on page and performing
> > pending updates for other backends.
>
> Hmm, that wouldn't be good.  You're right: this is a problem with my
> idea.  We can try what you suggested above and see how that works.  We
> could also have two or more slots for groups - if a backend doesn't
> get the lock, it joins the existing group for the same page, or else
> creates a new group if any slot is unused.
>

I have implemented this idea as well in the attached patch
group_slots_update_clog_v3.patch.

>  I think it might be
> advantageous to have at least two groups because otherwise things
> might slow down when some transactions are rolling over to a new page
> while others are still in flight for the previous page.  Perhaps we
> should try it both ways and benchmark.
>

Sure, I can do the benchmarks with both patches, but before that it would
be helpful if you could check whether group_slots_update_clog_v3.patch is
in line with what you have in mind.

Note - I have used the attached patch transactionid_burner_v1.patch
(extracted from Jeff's patch upthread) to verify transactions that fall
onto different page boundaries.


group_update_clog_v3.patch
Description: Binary data


group_slots_update_clog_v3.patch
Description: Binary data


transactionid_burner_v1.patch
Description: Binary data



Re: [HACKERS] Logical replication and multimaster

2015-12-12 Thread Craig Ringer
On 12 December 2015 at 02:24, Robert Haas  wrote:

> On Fri, Dec 11, 2015 at 5:16 AM, Andres Freund  wrote:
> > On 2015-12-11 18:12:55 +0800, Craig Ringer wrote:
> >> On 10 December 2015 at 03:19, Robert Haas 
> wrote:
> >> > On Sun, Dec 6, 2015 at 10:24 PM, Craig Ringer 
> >> > wrote:
> >> > > * A way to securely make a libpq connection from a bgworker without
> >> > messing
> >> > > with passwords etc. Generate one-time cookies, sometihng like that.
> >> >
> >> > Why would you have the bgworker connect to the database via TCP
> >> > instead of just doing whatever it wants to do directly?
> >
> >> pg_dump and pg_restore, mainly, for copying the initial database state.
> >
> > Well, you don't want to necessarily directly connect from the bgworker,
> > but from processes started from a bgworker. I guess that's where a good
> > bit of the Robert's confusion originated.
>
> That's part of it, yeah.  I'm a little scared of this design.  I mean,
> I understand now why Craig wants to do this (thanks for explaining,
> Craig!), but it seems like it's going to have a lot of the same
> reliability problems that pg_upgrade does.


Yes, and more.

Especially when dealing with multiple upstream servers, etc.

It's not very nice. I would very much prefer to have a better way to
achieve the initial data sync, but at present I don't think there is any
better approach that's even remotely practical.

> I'm not saying there's a better way to get the functionality


Yup. That's the problem.


> but it's pretty obvious that depending on tools other than the server
> itself, and in particular pg_dump, vastly increases the failure surface
> area.


It's not too bad to find pg_dump, though we ended up not being able to
reuse find_other_exec for various reasons I'll have to try to dig out of
the cruftier corners of my memory.  It has a fairly sane interface too.

Things get hairy when you want to do things like "give me all the
upstream's non-table objects, then give me [this set of table
definitions]"... then you go and sync the data from an exported snapshot
using COPY, then finish up by restoring the constraints for the set of
tables you dumped.

Being able to access pg_dump and pg_restore's dependency resolution logic,
object dumping routines, etc from regular SQL and from the SPI would be
wonderful.

I believe the main complaints about doing that when it was discussed in the
past were issues with downgrading. A pg_get_tabledef(...) in 9.6 might emit
keywords etc that a 9.2 server wouldn't understand, and the way we
currently solve this is to require that you run 9.2's pg_dump against the
9.6 server to get a 9.2-compatible dump. So if we had pg_get_blahdef
functions we'd still need external versions, so why bother having them?

The alternative is to have all the get_blahdef functions accept a param for
server version compatibility, which would work but burden future servers
with knowledge about older versions' features and corresponding code cruft
for some extended period of time.

So it's gone nowhere to date.

For that matter it's not clear that pg_get_blahdef functions would be the
right solution, but I can't see directly poking around in the catalogs and
basically re-implementing pg_dump being OK either. So what else could we do?
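
For context, a couple of deparsing functions in this family do already
exist in core, e.g.:

SELECT pg_get_viewdef('pg_stat_activity'::regclass, true);
SELECT pg_get_indexdef('pg_class_oid_index'::regclass);

What's missing is a table-level equivalent plus the dependency-ordering
logic around it, which today lives only in pg_dump and pg_restore.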


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-12 Thread Michael Paquier
On Sat, Dec 12, 2015 at 11:37 AM, Noah Misch  wrote:
> On Fri, Dec 11, 2015 at 09:34:34PM +0900, Michael Paquier wrote:
>> On Fri, Dec 11, 2015 at 8:48 PM, Alvaro Herrera
>>  wrote:
>> > Michael Paquier wrote:
>> >> On Fri, Dec 11, 2015 at 5:35 AM, Alvaro Herrera
>> >>  wrote:
>> >> I guess that to complete your idea we could allow PostgresNode to get
>> >> a custom name for its log file through an optional parameter like
>> >> logfile => 'myname' or similar. And if nothing is defined, process
>> >> falls back to applname. So this would give the following:
>> >> ${testname}_${logfile}.log
>> >
>> > Sure. I don't think we should use the name only for the log file, though,
>> > but also for things like the "## " informative messages we print here
>> > and there.  That would make the log file simpler to follow.  Also, I'm
>> > not sure about having it be optional.  (TBH I'm not sure about applname
>> > either; why do we keep that one?)
>>
>> OK, so let's do this: the node name is a mandatory argument of
>> get_new_node, which is passed to "new PostgresNode" like the port and
>> the host, and it is then used in the log file name as well as in the
>> information messages you are mentioning. That's a patch simple enough.
>> Are you fine with this approach?
>
> Sounds reasonable so far.

OK, done so.

>> Regarding the application name, I still think it is useful to have it
>> though. pg_rewind should actually use it, and the other patch adding
>> the recovery routines will use it.
>
> Using the application_name connection parameter is fine, but I can't think of
> a reason to set it to "node_".$node->port instead of $node->name.  And I can't
> think of a use for the $node->applname field once you have $node->name.  What
> use case would benefit?

I have removed the applname stuff, and updated the log messages to use
the node name for clarity.

The patch to address those points is attached.
Regards,
-- 
Michael


20151212_tap_node_name.patch
Description: binary/octet-stream



Re: [HACKERS] Disabling an index temporarily

2015-12-12 Thread Albe Laurenz
Tatsuo Ishii wrote:
>> Wouldn't something like:
>>
>> ALTER INDEX foo SET DISABLED;
>>
>> See more in line with our grammar?
>
> But this will affect other sessions, no?

Not if it is used in a transaction that ends with a ROLLBACK,
but then you might as well use DROP INDEX, except
that DROP INDEX takes an access exclusive lock.
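
A minimal sketch of that technique, assuming an index foo on a table t:

BEGIN;
DROP INDEX foo;            -- takes an ACCESS EXCLUSIVE lock on t
EXPLAIN SELECT * FROM t WHERE c = 42;  -- planned as if foo were gone
ROLLBACK;                  -- foo is back, untouched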

Yours,
Laurenz Albe



Re: [HACKERS] Making tab-complete.c easier to maintain

2015-12-12 Thread Michael Paquier
On Sat, Dec 12, 2015 at 12:00 AM, Tom Lane  wrote:
> Greg Stark  writes:
>> So I don't think it makes sense to hold up improvements today hoping
>> for something like this.
>
> Yeah, it's certainly a wishlist item rather than something that should
> block shorter-term improvements.

OK, so it seems that the next step is to move on with Thomas' stuff.
-- 
Michael




Re: [HACKERS] Error with index on unlogged table

2015-12-12 Thread Andres Freund
On 2015-12-11 16:54:45 +0900, Michael Paquier wrote:
> +   if (rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT ||
> +   (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
> +forkNum == INIT_FORKNUM))
> +   log_smgrcreate(&newrnode, forkNum);
> There should be a XLogIsNeeded() check as well.

RelationCreateStorage(), which creates the main fork, has no such
check. Seems better to stay symmetric, even if it's not strictly
necessary.

> Removing the check on RELPERSISTENCE_UNLOGGED is fine as well... Not
> mandatory though :)

I've gone back and forth on this; I can't really convince myself either
way. We might end up reusing init forks for 'global temporary tables' or
somesuch, so being a bit stricter seems slightly better. Anyway, it's of
no actual consequence for now.

Andres




Re: [HACKERS] Using quicksort for every external sort run

2015-12-12 Thread Jeff Janes
On Sun, Dec 6, 2015 at 4:25 PM, Peter Geoghegan  wrote:

> Maybe we should consider trying to get patch 0002 (the memory
> pool/merge patch) committed first, something Greg Stark suggested
> privately. That might actually be an easier way of integrating this
> work, since it changes nothing about the algorithm we use for merging
> (it only improves memory locality), and so is really an independent
> piece of work (albeit one that makes a huge overall difference due to
> the other patches increasing the time spent merging in absolute terms,
> and especially as a proportion of the total).

I have a question about the terminology used in this patch.  What is a
tuple proper?  What is it in contradistinction to?  I would think that
a tuple which is located in its own palloc'ed space is the "proper"
one, leaving a tuple allocated in the bulk memory pool to be
called...something else.  I don't know what the
non-judgmental-sounding antonym of postpositive "proper" is.

Also, if I am reading this correctly, when we refill a pool from a
logical tape we still transform each tuple as it is read from the disk
format to the memory format.  This inflates the size quite a bit, at
least for single-datum tuples.  If we instead just read the disk
format directly into the pool, and converted them into the in-memory
format when each tuple came due for the merge heap, would that destroy
the locality of reference you are seeking to gain?

Cheers,

Jeff




Re: [HACKERS] Disabling an index temporarily

2015-12-12 Thread Julien Rouhaud
On 12/12/2015 11:42, Albe Laurenz wrote:
> Tatsuo Ishii wrote:
>>> Wouldn't something like:
>>>
>>> ALTER INDEX foo SET DISABLED;
>>>
>>> See more in line with our grammar?
>>
>> But this will affect other sessions, no?
> 
> Not if it is used in a transaction that ends with a ROLLBACK,
> but then you might as well use DROP INDEX, except
> that DROP INDEX takes an access exclusive lock.
> 
> Yours,
> Laurenz Albe
> 

Oleg and Teodor announced some time ago an extension for this exact use
case, see
http://www.postgresql.org/message-id/pine.lnx.4.64.0910062354510.6...@sn.sai.msu.ru

This also has the advantage of not needing an exclusive lock on the index.


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] Error with index on unlogged table

2015-12-12 Thread Andres Freund
On 2015-12-12 20:49:52 +0900, Michael Paquier wrote:
> Should we consider this bug a 9.5 blocker?

I don't think so. But either way, I'm right now working on getting it
fixed anyway.




[HACKERS] psql tab completion bug for ALL IN TABLESPACE

2015-12-12 Thread Michael Paquier
Hi all,

I just bumped into the following thing while looking again at Thomas'
patch for psql tab completion:
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1040,7 +1040,7 @@ psql_completion(const char *text, int start, int end)
 pg_strcasecmp(prev5_wd, "IN") == 0 &&
 pg_strcasecmp(prev4_wd, "TABLESPACE") == 0 &&
 pg_strcasecmp(prev2_wd, "OWNED") == 0 &&
-pg_strcasecmp(prev4_wd, "BY") == 0)
+pg_strcasecmp(prev_wd, "BY") == 0)
{
COMPLETE_WITH_QUERY(Query_for_list_of_roles);
This should be backpatched; attached is the needed patch.

Regards,
-- 
Michael


20151212_psql_alltblspc.patch
Description: binary/octet-stream



Re: [HACKERS] Making tab-complete.c easier to maintain

2015-12-12 Thread Michael Paquier
On Wed, Dec 9, 2015 at 8:17 PM, Thomas Munro wrote:
> Thanks for looking at this Michael.  It's probably not much fun to
> review!  Here is a new version with that bit removed.  More responses
> inline below.

Could this patch be rebased? There are now conflicts with 8b469bd7 and
it does not apply cleanly.
-- 
Michael




Re: [HACKERS] Error with index on unlogged table

2015-12-12 Thread Andres Freund
On 2015-12-12 12:52:21 +0100, Andres Freund wrote:
> On 2015-12-12 20:49:52 +0900, Michael Paquier wrote:
> > Should we consider this bug a 9.5 blocker?
> 
> I don't think so. But either way, I'm right now working on getting it
> fixed anyway.

And done. Took a bit longer than predicted - I had to adjust my test
scripts to cope with < 9.4 not having tablespace mapping. Rather
annoying.


It'd be really convenient if tablespaces relative to the main data
directory were supported...




Re: [HACKERS] Error with index on unlogged table

2015-12-12 Thread Tom Lane
Michael Paquier  writes:
> Should we consider this bug a 9.5 blocker? I feel uneasy about releasing
> a new major version while knowing about bugs in it, and this one is not
> cool, so I have added it to the list of open items.
> We know the problem and there is a patch, so this is definitely
> solvable.

FWIW, general project policy has been that pre-existing bugs are not
release blockers as such.  However, if we have a fix in hand that we
would like to get some field testing on, it's certainly desirable to
push it out first in a beta or RC release, rather than having it first
hit the field in stable-branch updates.  So we might delay a beta/RC
to make sure such a fix is available for testing.  But that's not quite
the same thing as a release blocker.  To my mind, a release blocker
is something that would make it undesirable for users to upgrade to
the new release compared to the previous branch, so bugs the branches
have in common are never that.

regards, tom lane




Re: [HACKERS] Disabling an index temporarily

2015-12-12 Thread Tatsuo Ishii
> Oleg and Teodor announced some time ago an extension for this exact use
> case, see
> http://www.postgresql.org/message-id/pine.lnx.4.64.0910062354510.6...@sn.sai.msu.ru
> 
> This also has the advantage of not needing an exclusive lock on the index.

Thanks for the info. I will try them out.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Rework the way multixact truncations work

2015-12-12 Thread Andres Freund
Noah, Robert, All

On 2015-12-11 11:20:21 -0500, Robert Haas wrote:
> On Thu, Dec 10, 2015 at 9:32 AM, Robert Haas  wrote:
> >> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.
> >
> > So let's do that, then.
> 
> Who is going to take care of this?

Here's two patches:

1) The fix to SetOffsetVacuumLimit().

   I've tested this by introducing a probabilistic "return false;" to
   find_multixact_start(), and torturing postgres by burning through
   billions of multixactids of various sizes.  Behaves about as
   bad^Wgood as without the induced failures; before the patch there
   were moments of spurious warnings/errors when ->offsetStopLimit was
   out of whack.

2) A subset of the comment changes from Noah's repository. Some of the
   comment changes didn't make sense without the removal
   SlruScanDirCbFindEarliest(), a small number of other changes I couldn't
   fully agree with.

   Noah, are you ok with pushing that subset of your changes? Is
   "Slightly edited subset of a larger patchset by Noah Misch" an OK
   attribution?


Noah, on a first glance I think e50cca0ae ("Remove the
SlruScanDirCbFindEarliest() test from TruncateMultiXact().") is a good
idea. So I do encourage you to submit that as a separate patch.

Regards,

Andres
>From d26d41a96a7fc8dbe9827d55837875790b21ca1b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 12 Dec 2015 17:57:21 +0100
Subject: [PATCH 1/2] Fix bug in SetOffsetVacuumLimit() triggered by
 find_multixact_start() failure.

In case find_multixact_start() failed, SetOffsetVacuumLimit() installed 0
into MultiXactState->offsetStopLimit. Unlike oldestOffset, the
to-be-installed value was not restored in the error branch.

Luckily there are no known cases where find_multixact_start() will
return an error in 9.5 and above.

But if the bug is triggered, e.g. due to filesystem permission issues,
it'd be somewhat bad: GetNewMultiXactId() could continue allocating
mxids even if close to a wraparound, or it could erroneously stop
allocating mxids, even if no wraparound is looming.  Luckily the wrong
value would be corrected the next time SetOffsetVacuumLimit() is called,
or by a restart.

Reported-By: Noah Misch, although this is not his preferred fix
Backpatch: 9.5, where the bug was introduced as part of 4f627f
---
 src/backend/access/transam/multixact.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index b66a2b6..d2619bd 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2552,6 +2552,7 @@ SetOffsetVacuumLimit(void)
 	bool		oldestOffsetKnown = false;
 	bool		prevOldestOffsetKnown;
 	MultiXactOffset offsetStopLimit = 0;
+	MultiXactOffset prevOffsetStopLimit;
 
 	/*
 	 * NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2566,6 +2567,7 @@ SetOffsetVacuumLimit(void)
 	nextOffset = MultiXactState->nextOffset;
 	prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
 	prevOldestOffset = MultiXactState->oldestOffset;
+	prevOffsetStopLimit = MultiXactState->offsetStopLimit;
 	Assert(MultiXactState->finishedStartup);
 	LWLockRelease(MultiXactGenLock);
 
@@ -2633,11 +2635,13 @@ SetOffsetVacuumLimit(void)
 	{
 		/*
 		 * If we failed to get the oldest offset this time, but we have a
-		 * value from a previous pass through this function, use the old value
-		 * rather than automatically forcing it.
+		 * value from a previous pass through this function, use the old
+		 * values rather than automatically forcing an emergency autovacuum
+		 * cycle again.
 		 */
 		oldestOffset = prevOldestOffset;
 		oldestOffsetKnown = true;
+		offsetStopLimit = prevOffsetStopLimit;
 	}
 
 	/* Install the computed values */
-- 
2.6.0.rc3

>From addafc27e7b89187a3f55a2cbce136d58722c97b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 12 Dec 2015 17:53:41 +0100
Subject: [PATCH 2/2] Improve/Fix comments around multixact truncation.

My commits 4f627f8 ("Rework the way multixact truncations work.") and
aa29c1c ("Remove legacy multixact truncation support.") missed updating
a number of comments. Fix that. Additionally improve accuracy of a few
of the added comments.

Reported-By: Noah Misch
Author: Slightly edited subset of a larger patchset by Noah Misch
Backpatch: 9.5, which is the first branch to contain the above commits
---
 src/backend/access/heap/README.tuplock |  6 ++
 src/backend/access/transam/multixact.c | 36 +++---
 src/backend/access/transam/xlog.c  |  4 
 3 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 10b8d78..f845958 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -102,10 +102,8 @@ the MultiXacts in them are no longer of 

Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

2015-12-12 Thread Corey Huinker
On Fri, Dec 11, 2015 at 5:35 PM, Peter Geoghegan  wrote:

> On Fri, Dec 11, 2015 at 2:26 PM, Corey Huinker  wrote:
> > Sure, the machine we called "ninefivealpha", which incidentally failed
> > to find a single bug in alpha2 thru beta2, is currently idle, and
> > concurrent index creation times are a bugbear around these parts. Can
> > somebody, either in this thread or privately, outline what sort of a
> > test they'd like to see?
>
> Any kind of CREATE INDEX CONCURRENTLY test, before and after.
>
> I looked at a simple, random int4 column. That seems like a good case
> to focus on, since there isn't too much other overhead.  I think I
> performed my test on an unlogged table, to make sure other overhead
> was even further minimized.
>
> --
> Peter Geoghegan
>

What, if any, other load should be placed on the underlying table during
the test?

I ask because CIC statements that run in seconds on our staging machine can
take many hours on our production machine, when most of the access is just
reads, though those reads may have been part of a larger transaction that
did updates elsewhere.


Re: [HACKERS] Bootstrap DATA is a pita

2015-12-12 Thread Tom Lane
Mark Dilger  writes:
>> On Dec 11, 2015, at 2:54 PM, Caleb Welton  wrote:
>> Compare:
>> CREATE FUNCTION lo_export(oid, text) RETURNS integer LANGUAGE internal 
>> STRICT AS 'lo_export' WITH (OID=765);  
>> 
>> DATA(insert OID = 765 (  lo_export  PGNSP PGUID 12 1 0 0 0 f f f 
>> f t f v u 2 0 23 "26 25" _null_ _null_ _null_ _null_ _null_ lo_export _null_ 
>> _null_ _null_ ));

> I would like to hear more about this idea.  Are you proposing that we use
> something like the above CREATE FUNCTION format to express what is
> currently being expressed with DATA statements?

Yes, that sort of idea has been kicked around some already, see the
archives.

> That is an interesting idea, though I don't know what exactly
> that would look like.  If you want to forward this idea, I'd be eager to
> hear your thoughts.  If not, I'll try to make progress with my idea of
> tab delimited files and such (or really, Alvaro's idea of csv files that
> I only slightly corrupted).

Personally I would like to see both approaches explored.  Installing as
much as we can via SQL commands is attractive for a number of reasons;
but there is going to be an irreducible minimum amount of stuff that
has to be inserted by something close to the current bootstrapping
process.  (And I'm not convinced that that "minimum amount" is going
to be very small...)  So it's not impossible that we'd end up accepting
*both* types of patches, one to do more in the post-bootstrap SQL world
and one to make the bootstrap data notation less cumbersome.  In any
case it would be useful to push both approaches forward some more before
we make any decisions between them.

BTW, there's another thing I'd like to see improved in this area, which is
a problem already but will get a lot worse if we push more work into the
post-bootstrap phase of initdb.  That is that the post-bootstrap phase is
both inefficient and impossible to debug.  If you've ever had a failure
there, you'll have seen that the backend spits out an entire SQL script
and says there's an error in it somewhere; that's because it gets the
whole per-stage script as one submission.  (Try introducing a syntax error
somewhere in information_schema.sql, and you'll see what I mean.)
Breaking the stage scripts down further would help, but that is
unattractive because each one requires a fresh backend startup/shutdown,
including a full checkpoint.  I'd like to see things rejiggered so that
there's only one post-bootstrap standalone backend session that performs
all the steps, but initdb feeds it just one SQL command at a time so that
errors are better localized.  That should both speed up initdb noticeably
and make debugging easier.

regards, tom lane




[HACKERS] PATCH: add pg_current_xlog_flush_location function

2015-12-12 Thread Tomas Vondra

Hi,

attached is a patch adding a function pg_current_xlog_flush_location(), 
which proved quite useful when investigating the ext4 data loss bug. 
It's mostly what was already sent to that thread, except for docs that 
were missing in the initial version.
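
With the patch applied, usage should mirror the existing location
functions, e.g.:

SELECT pg_current_xlog_insert_location(),  -- logical end of WAL
       pg_current_xlog_location(),         -- written out of WAL buffers
       pg_current_xlog_flush_location();   -- known flushed to durable storage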


I'll put this into 2016-01 CF.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 60b9a09..0d67b82 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16755,6 +16755,9 @@ SELECT set_config('log_statement_stats', 'off', false);
 pg_create_restore_point


+pg_current_xlog_flush_location
+   
+   
 pg_current_xlog_insert_location


@@ -16811,6 +16814,13 @@ SELECT set_config('log_statement_stats', 'off', false);
   
   

+pg_current_xlog_flush_location()
+
+   pg_lsn
+   Get current transaction log flush location
+  
+  
+   
 pg_current_xlog_insert_location()
 
pg_lsn
@@ -16943,13 +16953,14 @@ postgres=# select pg_start_backup('label_goes_here');
 pg_current_xlog_location displays the current transaction log write
 location in the same format used by the above functions.  Similarly,
 pg_current_xlog_insert_location displays the current transaction log
-insertion point.  The insertion point is the logical end
-of the transaction log
-at any instant, while the write location is the end of what has actually
-been written out from the server's internal buffers.  The write location
-is the end of what can be examined from outside the server, and is usually
+insertion point and pg_current_xlog_flush_location displays the
+current transaction log flush point. The insertion point is the logical
+end of the transaction log at any instant, while the write location is the end of
+what has actually been written out from the server's internal buffers, and the
+flush location is the location guaranteed to be written to durable storage. The write
+location is the end of what can be examined from outside the server, and is usually
 what you want if you are interested in archiving partially-complete transaction log
-files.  The insertion point is made available primarily for server
+files.  The insertion and flush points are made available primarily for server
 debugging purposes.  These are both read-only operations and do not
 require superuser permissions.

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 71fc8ff..40a17e8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10632,6 +10632,19 @@ GetXLogWriteRecPtr(void)
 }
 
 /*
+ * Get latest WAL flush pointer
+ */
+XLogRecPtr
+GetXLogFlushRecPtr(void)
+{
+	SpinLockAcquire(&XLogCtl->info_lck);
+	LogwrtResult = XLogCtl->LogwrtResult;
+	SpinLockRelease(&XLogCtl->info_lck);
+
+	return LogwrtResult.Flush;
+}
+
+/*
  * Returns the redo pointer of the last checkpoint or restartpoint. This is
  * the oldest point in WAL that we still need, if we have to restart recovery.
  */
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 329bb8c..35c581d 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -216,6 +216,27 @@ pg_current_xlog_insert_location(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Report the current WAL flush location (same format as pg_start_backup etc)
+ *
+ * This function is mostly for debugging purposes.
+ */
+Datum
+pg_current_xlog_flush_location(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	current_recptr;
+
+	if (RecoveryInProgress())
+		ereport(ERROR,
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("recovery is in progress"),
+ errhint("WAL control functions cannot be executed during recovery.")));
+
+	current_recptr = GetXLogFlushRecPtr();
+
+	PG_RETURN_LSN(current_recptr);
+}
+
+/*
  * Report the last WAL receive location (same format as pg_start_backup etc)
  *
  * This is useful for determining how much of WAL is guaranteed to be received
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 790ca66..985291d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -235,6 +235,7 @@ extern void GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream);
 extern XLogRecPtr GetXLogReplayRecPtr(TimeLineID *replayTLI);
 extern XLogRecPtr GetXLogInsertRecPtr(void);
 extern XLogRecPtr GetXLogWriteRecPtr(void);
+extern XLogRecPtr GetXLogFlushRecPtr(void);
 extern bool RecoveryIsPaused(void);
 

Re: [HACKERS] Disabling an index temporarily

2015-12-12 Thread Jaime Casanova
On 11 December 2015 at 22:03, Joshua D. Drake  wrote:
> On 12/11/2015 06:25 PM, Tatsuo Ishii wrote:
>
>> What about inventing a new SET command, something like:
>>
>> SET disabled_index TO <index name>
>>
>> This adds <index name> to the "disabled index list". The disabled index
>> list lets the planner disregard the indexes in the list.
>>
>> SET enabled_index TO <index name>
>>
>> This removes <index name> from the disabled index list.
>>
>> SHOW disabled_index
>>
>> This shows the content of the disabled index list.
>
>
> Wouldn't something like:
>
> ALTER INDEX foo SET DISABLED;
>
> See more in line with our grammar?
>
> I assume the index is only disabled as far as the planner is concerned and
> all updates/inserts/deletes will still actually update the index
> appropriately?
>

BTW, you can do that today with

UPDATE pg_index SET indisvalid = false
 WHERE indexrelid = 'indexname'::regclass;
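
To undo it later (a sketch; while indisvalid is false the index should
still be maintained by writes, the planner just stops considering it):

UPDATE pg_index SET indisvalid = true
 WHERE indexrelid = 'indexname'::regclass;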

-- 
Jaime Casanova  www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación




Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

2015-12-12 Thread Peter Geoghegan
On Sat, Dec 12, 2015 at 9:48 AM, Corey Huinker  wrote:
> I ask because CIC statements that run in seconds on our staging machine can
> take many hours on our production machine, when most of the access is just
> reads, though those reads may have been part of a larger transaction that
> did updates elsewhere.

I don't think it's necessary to simulate any other load.
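
Something along these lines should do (a sketch; table name and sizes are
arbitrary):

CREATE UNLOGGED TABLE cic_test AS
    SELECT (random() * 2e9)::int4 AS x
    FROM generate_series(1, 10000000);
\timing on
CREATE INDEX CONCURRENTLY cic_test_x ON cic_test (x);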


-- 
Peter Geoghegan




Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-12 Thread Andreas Seltenreich
Greg Stark writes:

> There may be other errors that would be surprising for Tom or Robert. What
> I did with the string argument fuzzer was to print counts for each sqlstate
> for the errors and watch for errors that only occurred occasionally or
> didn't make sense to me.
>
> Also, do you have any timeouts?

I currently set statement_timeout to 1s to avoid wasting time letting
postgres crunch numbers.  Less than 0.5% of the queries run into this
timeout.

> Do you have any stats on how long these queries are taking to plan?
> What's the longest query to plan you've found?

No, I'm currently not logging timing aspects.  The schema I let the
instances log into is part of the repo[1].

> Do you have coverage data for the corpus?

I do have some older numbers for line coverage from before the recent
grammar extension:

| revision | overall | parser |
|----------+---------+--------|
| a4c1989  |    26.0 |   20.4 |

regards,
Andreas

Footnotes: 
[1]  https://github.com/anse1/sqlsmith/blob/master/log.sql




Re: [HACKERS] WIP: Rework access method interface

2015-12-12 Thread Petr Jelinek

On 2015-12-09 15:09, Alexander Korotkov wrote:

> Patch was rebased against current master.
> Any notes about the current version of the patch?
> It would be nice to commit it and continue work on other parts of AM
> extendability.



The rebase seems broken; there are things missing in this version of the
patch (for example, the validation functions).



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Handle policies during DROP OWNED BY

2015-12-12 Thread Peter Eisentraut
On 12/11/15 4:12 PM, Stephen Frost wrote:
> As with ACLs, the DROP OWNED BY caller must have permission to modify
> the policy or a WARNING is thrown and no change is made to the policy.

That warning doesn't tell the user anything about how to fix the
situation, or whether and why the situation is a problem in the first
place.




Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-12-12 Thread and...@anarazel.de
On 2015-11-15 16:24:24 -0500, Robert Haas wrote:
> I think what we should do about the buffer locks is polish up this
> patch and get it applied:
> 
> http://www.postgresql.org/message-id/20150907175909.gd5...@alap3.anarazel.de
> 
> I think it needs to be adapted to use predefined constants for the
> tranche IDs instead of hardcoded values, maybe based on the attached
> tranche-constants.patch.

Here are two patches doing that. The first is an adaptation of your
constants patch, using an enum and also converting xlog.c's locks. The
second is the separation into distinct tranches.

One thing to call out is that an "oversized" s_lock can now make
BufferDesc exceed 64 bytes; right now that's just the case when it's
larger than 4 bytes.  I'm not sure if that's cause for real concern,
given that it's only not-very-concurrent or ancient platforms where
that's the case.
http://archives.postgresql.org/message-id/20150915020625.GI9666%40alap3.anarazel.de
would alleviate that concern again, as it collapses flags, usage_count,
buf_hdr_lock and refcount into one 32-bit int...

Greetings,

Andres Freund
>From 5f10d977f285109df16bfef6b79456564251aa54 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 12 Dec 2015 18:41:59 +0100
Subject: [PATCH 1/2] Define enum of builtin LWLock tranches.

It's a bit cumbersome having to allocate tranche IDs with
LWLockNewTrancheId() because the returned value needs to be shared
between backends, which all need to call LWLockRegisterTranche(),
somehow.  For builtin tranches we can easily do better, and simply
pre-define tranche IDs for those.

This is motivated by an upcoming patch adding further builtin tranches.

Author: Robert Haas and Andres Freund
Discussion:
CA+TgmoYciHS4FuU2rYNt8bX0+7ZcNRBGeO-L20apcXTeo7=4...@mail.gmail.com
---
 src/backend/access/transam/xlog.c | 10 +++---
 src/backend/storage/lmgr/lwlock.c |  7 ---
 src/include/storage/lwlock.h  | 11 +++
 3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 71fc8ff..147fd53 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -512,7 +512,6 @@ typedef struct XLogCtlInsert
 	 */
 	WALInsertLockPadded *WALInsertLocks;
 	LWLockTranche WALInsertLockTranche;
-	int			WALInsertLockTrancheId;
 } XLogCtlInsert;
 
 /*
@@ -4653,7 +4652,7 @@ XLOGShmemInit(void)
 
 		/* Initialize local copy of WALInsertLocks and register the tranche */
 		WALInsertLocks = XLogCtl->Insert.WALInsertLocks;
-		LWLockRegisterTranche(XLogCtl->Insert.WALInsertLockTrancheId,
+		LWLockRegisterTranche(LWTRANCHE_WAL_INSERT,
 							  &XLogCtl->Insert.WALInsertLockTranche);
 		return;
 	}
@@ -4677,17 +4676,14 @@ XLOGShmemInit(void)
 		(WALInsertLockPadded *) allocptr;
 	allocptr += sizeof(WALInsertLockPadded) * NUM_XLOGINSERT_LOCKS;
 
-	XLogCtl->Insert.WALInsertLockTrancheId = LWLockNewTrancheId();
-
 	XLogCtl->Insert.WALInsertLockTranche.name = "WALInsertLocks";
 	XLogCtl->Insert.WALInsertLockTranche.array_base = WALInsertLocks;
 	XLogCtl->Insert.WALInsertLockTranche.array_stride = sizeof(WALInsertLockPadded);
 
-	LWLockRegisterTranche(XLogCtl->Insert.WALInsertLockTrancheId, &XLogCtl->Insert.WALInsertLockTranche);
+	LWLockRegisterTranche(LWTRANCHE_WAL_INSERT, &XLogCtl->Insert.WALInsertLockTranche);
 	for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
 	{
-		LWLockInitialize(&WALInsertLocks[i].l.lock,
-						 XLogCtl->Insert.WALInsertLockTrancheId);
+		LWLockInitialize(&WALInsertLocks[i].l.lock, LWTRANCHE_WAL_INSERT);
 		WALInsertLocks[i].l.insertingAt = InvalidXLogRecPtr;
 	}
 
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index b13ebc6..84691df 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -445,7 +445,7 @@ CreateLWLocks(void)
 
 		/* Initialize all LWLocks in main array */
 		for (id = 0, lock = MainLWLockArray; id < numLocks; id++, lock++)
-			LWLockInitialize(&lock->lock, 0);
+			LWLockInitialize(&lock->lock, LWTRANCHE_MAIN);
 
 		/*
 		 * Initialize the dynamic-allocation counters, which are stored just
@@ -457,7 +457,7 @@ CreateLWLocks(void)
 		LWLockCounter = (int *) ((char *) MainLWLockArray - 3 * sizeof(int));
 		LWLockCounter[0] = NUM_FIXED_LWLOCKS;
 		LWLockCounter[1] = numLocks;
-		LWLockCounter[2] = 1;	/* 0 is the main array */
+		LWLockCounter[2] = LWTRANCHE_FIRST_USER_DEFINED;
 	}
 
 	if (LWLockTrancheArray == NULL)
@@ -466,12 +466,13 @@ CreateLWLocks(void)
 		LWLockTrancheArray = (LWLockTranche **)
 			MemoryContextAlloc(TopMemoryContext,
 		  LWLockTranchesAllocated * sizeof(LWLockTranche *));
+		Assert(LWLockTranchesAllocated >= LWTRANCHE_FIRST_USER_DEFINED);
 	}
 
 	MainLWLockTranche.name = "main";
 	MainLWLockTranche.array_base = MainLWLockArray;
 	MainLWLockTranche.array_stride = sizeof(LWLockPadded);
-	LWLockRegisterTranche(0, &MainLWLockTranche);
+	LWLockRegisterTranche(LWTRANCHE_MAIN, &MainLWLockTranche);
 }
 
 /*
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 4653e09..9d22273 
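
[The lwlock.h hunk is cut off here in the archive.  For reference, a
sketch of what it plausibly contained, reconstructed only from the
identifiers used in the hunks above (LWTRANCHE_MAIN, LWTRANCHE_WAL_INSERT,
LWTRANCHE_FIRST_USER_DEFINED); the committed enum may differ:]

/* builtin tranche IDs; user-defined tranches are allocated after these */
typedef enum BuiltinTrancheIds
{
	LWTRANCHE_MAIN,
	LWTRANCHE_WAL_INSERT,
	LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;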

Re: [HACKERS] Using quicksort for every external sort run

2015-12-12 Thread Ants Aasma
On Sat, Dec 12, 2015 at 12:41 AM, Greg Stark  wrote:
> On Wed, Dec 9, 2015 at 2:44 AM, Peter Geoghegan  wrote:
>> On Tue, Dec 8, 2015 at 6:39 PM, Greg Stark  wrote:
>>
>> I guess you mean insertion sort. What's the theoretical justification
>> for the change?
>
> Well my thinking was that hard coding a series of comparisons would be
> faster than a loop doing a O(n^2) algorithm even for small constants.
> And sort networks are perfect for hard coded sorts because they do the
> same comparisons regardless of the results of previous comparisons so
> there are no branches. And even better the comparisons are as much as
> possible independent of each other -- sort networks are typically
> measured by the depth which assumes any comparisons between disjoint
> pairs can be done in parallel. Even if it's implemented in serial the
> processor is probably parallelizing some of the work.
>
> So I implemented a quick benchmark outside Postgres based on sorting
> actual SortTuples with datum1 defined to be random 64-bit integers (no
> nulls). Indeed the sort networks perform faster on average despite
> doing more comparisons. That makes me think the cpu is indeed doing
> some of the work in parallel.

The open-coded version you shared bloats the code by 37kB; I'm not
sure it is pulling its weight, especially given relatively heavy
comparators. A quick index creation test on int4's profiled with perf
shows about 3% of CPU being spent in the code being replaced. Any
improvement on that is going to be too small to easily quantify.

As the open coding doesn't help with eliminating control flow
dependencies, my idea is to encode the sort network comparison
order in an array and use that to drive a simple loop. The code size
would be pretty similar to insertion sort and the loop overhead should
mostly be hidden by the CPU OoO machinery. Probably won't help much,
but it would be interesting and simple enough to try out. Can you share
your code for the benchmark so I can try it out?
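
[For illustration, a minimal sketch of the table-driven approach Ants
describes, written against the SortTuple/ApplySortComparator definitions
from Greg's benchmark below; the pair table is the same 5-comparator
network the benchmark's case 4 uses.  A sketch only, not Ants' actual
implementation:]

typedef struct { unsigned char a, b; } CmpPair;

/* optimal 5-comparator sorting network for 4 elements */
static const CmpPair net4[] = {
	{0, 1}, {2, 3}, {1, 3}, {0, 2}, {1, 2}
};

static void
sort_network4(SortTuple *list)
{
	int			i;

	for (i = 0; i < (int) (sizeof(net4) / sizeof(net4[0])); i++)
	{
		SortTuple  *x = &list[net4[i].a];
		SortTuple  *y = &list[net4[i].b];

		/*
		 * The comparison order is fixed, so there is no data-dependent
		 * branch deciding which pair to compare next.
		 */
		if (ApplySortComparator(x->datum1, x->isnull1,
								y->datum1, y->isnull1, &ssup) > 0)
		{
			SortTuple	tmp = *x;	/* swap the out-of-order pair */

			*x = *y;
			*y = tmp;
		}
	}
}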

Regards,
Ants Aasma




Re: [HACKERS] Using quicksort for every external sort run

2015-12-12 Thread Greg Stark
On Sat, Dec 12, 2015 at 7:42 PM, Ants Aasma  wrote:
> As the open coding doesn't help with eliminating control flow
> dependencies, my idea is to encode the sort network comparison
> order in an array and use that to drive a simple loop. The code size
> would be pretty similar to insertion sort and the loop overhead should
> mostly be hidden by the CPU OoO machinery. Probably won't help much,
> but it would be interesting and simple enough to try out. Can you share
> your code for the benchmark so I can try it out?

I can. But the further results showing the number of comparisons is
higher than for insertion sort have dampened my enthusiasm for the
change. I'm assuming that even if it's faster for a simple integer
sort it'll be much slower for anything that requires calling out to
the datatype comparator. I also hadn't actually measured what
percentage of the sort was being spent in the insertion sort. I had
guessed it would be higher.

The test is attached. qsort_tuple.c is copied from tuplesort (with the
ifdef for NOPRESORT added, but you could skip that if you want.).
Compile with something like:

gcc -DNOPRESORT -O3 -DCOUNTS -Wall -Wno-unused-function simd-sort-test.c


-- 
greg
/* header names were lost in the archive; these cover what the benchmark uses */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#ifdef COUNTS
unsigned nswaps;
unsigned ncmps;
#define countswap (nswaps++)
#define countcmp  (ncmps++)
#else
#define countswap ((void)0)
#define countcmp  ((void)0)
#endif

typedef void *Tuplesortstate;
typedef long long int Datum;
typedef int  bool;

typedef struct
{
	void	   *tuple;			/* the tuple proper */
	Datum		datum1;			/* value of first key column */
	bool		isnull1;		/* is first key column NULL? */
	int			tupindex;		/* see notes above */
} SortTuple;

typedef SortTuple elem_t;

typedef int (*SortTupleComparator) (const SortTuple *a, const SortTuple *b,
	Tuplesortstate *state);

typedef struct SortSupportData *SortSupport;

static int comparator(Datum a, Datum b, SortSupport ssup) {
	if (b > a)
		return -1;
	else if (b < a)
		return 1;
	else return 0;
}

struct SortSupportData {
	bool ssup_nulls_first;
	bool ssup_reverse;
	int (*comparator)(Datum,Datum,SortSupport);
};

struct SortSupportData ssup = {0, 0, comparator};

static inline int
ApplySortComparator(Datum datum1, bool isNull1,
	Datum datum2, bool isNull2,
	SortSupport ssup)
{
	int			compare;

	countcmp;

	if (isNull1)
	{
		if (isNull2)
			compare = 0;		/* NULL "=" NULL */
		else if (ssup->ssup_nulls_first)
			compare = -1;		/* NULL "<" NOT_NULL */
		else
			compare = 1;		/* NULL ">" NOT_NULL */
	}
	else if (isNull2)
	{
		if (ssup->ssup_nulls_first)
			compare = 1;		/* NOT_NULL ">" NULL */
		else
			compare = -1;		/* NOT_NULL "<" NULL */
	}
	else
	{
		compare = (*ssup->comparator) (datum1, datum2, ssup);
		if (ssup->ssup_reverse)
			compare = -compare;
	}

	return compare;
}

#define CHECK_FOR_INTERRUPTS() do {} while (0)

#define Min(a,b) ((a)<(b)?(a):(b))
#define Max(a,b) ((a)<(b)?(b):(a))

#include "qsort_tuple.c"

#define mycmp(a,b)	\
	(ApplySortComparator(list[a].datum1, list[a].isnull1, \
		 list[b].datum1, list[b].isnull1, \
		 &ssup))

#define myswap(a,b)	\
	do {			\
		elem_t _tmp;\
		_tmp = list[a];\
		list[a] = list[b];			\
		list[b] = _tmp;\
		countswap;	\
	} while (0)

#define myswapif(a,b)\
	do {			\
		if (mycmp(a,b) > 0)			\
			myswap(a,b);\
	} while (0)


static int insertion_ok(unsigned N) {
	return N<1000;
}

static void insertion(elem_t *list, unsigned N) {
	if (N > 1000) {
		printf("insertion sort not feasible for large N\n");
		exit(1);
	}

	for (unsigned pm = 1; pm < N; pm++)
		for (unsigned pl = pm; pl && mycmp(pl-1, pl) > 0; pl--)
			myswap(pl, pl-1);
}

static int sort_networks_ok(unsigned N) {
	return N<=8;
}

static void sort_networks(elem_t *list, unsigned N) {
	if (N > 8) {
		printf("sort_networks only implemented for N in 0..8\n");
		exit(1);
	}
	switch(N) {
	case 0: 
	case 1:
		break;
		
	case 2:
		myswapif(0,1);
		break;
		
	case 3:
		myswapif(0,1); myswapif(1,2);
		myswapif(0,1);
		break;
		
	case 4:
		myswapif(0,1); myswapif(2,3);
		myswapif(1,3); myswapif(0,2);
		myswapif(1,2);
		break;
		
	case 5:
		myswapif(1,2); myswapif(3,4);
		myswapif(1,3); myswapif(0,2);
		myswapif(2,4); myswapif(0,3);
		myswapif(0,1); myswapif(2,3);
		myswapif(1,2);
		break;
		
	case 6:
		myswapif(0,1); myswapif(2,3); myswapif(4,5);
		myswapif(0,2); myswapif(3,5); myswapif(1,4);
		myswapif(0,1); myswapif(2,3); myswapif(4,5);
		myswapif(1,2); myswapif(3,4);
		myswapif(2,3);
		break;
		
	case 7:
		myswapif(1,2); myswapif(3,4); myswapif(5,6);
		myswapif(0,2); myswapif(4,6); myswapif(3,5);
		myswapif(2,6); myswapif(1,5); myswapif(0,4);
		myswapif(2,5); myswapif(0,3);
		myswapif(2,4); myswapif(1,3);
		myswapif(0,1); myswapif(2,3); myswapif(4,5);
		break;
		
	case 8:
		myswapif(0, 1); myswapif(2, 3); myswapif(4, 5); myswapif(6, 7);
		myswapif(0, 3); myswapif(1, 2); myswapif(4, 7); myswapif(5, 6);
		

Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-12-12 Thread Robert Haas
On Sat, Dec 12, 2015 at 1:17 PM, and...@anarazel.de  wrote:
> On 2015-11-15 16:24:24 -0500, Robert Haas wrote:
>> I think what we should do about the buffer locks is polish up this
>> patch and get it applied:
>>
>> http://www.postgresql.org/message-id/20150907175909.gd5...@alap3.anarazel.de
>>
>> I think it needs to be adapted to use predefined constants for the
>> tranche IDs instead of hardcoded values, maybe based on the attached
>> tranche-constants.patch.
>
> Here's two patches doing that. The first is an adaption of your
> constants patch, using an enum and also converting xlog.c's locks. The
> second is the separation into distinct tranches.

Personally, I prefer the #define approach to the enum, but I can live
with doing it this way.  Other than that, I think these patches look
good, although if it's OK with you I would like to make a pass over
the comments and the commit messages, which I think could benefit
from a bit of editing (but not much substantive change).

> One thing to call out is that an "oversized" s_lock can now make
> BufferDesc exceed 64 bytes; right now that's only the case when the
> spinlock is larger than 4 bytes.  I'm not sure if that's cause for real
> concern, given that this only happens on not-very-concurrent or ancient
> platforms.
> http://archives.postgresql.org/message-id/20150915020625.GI9666%40alap3.anarazel.de
> would alleviate that concern again, as it collapses flags, usage_count,
> buf_hdr_lock and refcount into one 32 bit int...

I don't think that would be worth worrying about even if we didn't
have a plan in mind that would make it go away again, and even less so
given that we do have such a plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2015-12-12 Thread Robert Haas
On Fri, Dec 11, 2015 at 4:42 PM, David Rowley
 wrote:
> On 12 December 2015 at 04:00, Robert Haas  wrote:
>> I'd like to commit David Rowley's patch from the other thread first,
>> and then deal with this one afterwards.  The only thing I feel
>> strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
>> for clarity.
>
>
> I have addressed that in my local copy. I'm now just working on adding some
> test code which uses the new infrastructure.

Excellent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Handle policies during DROP OWNED BY

2015-12-12 Thread Robert Haas
On Sat, Dec 12, 2015 at 4:35 PM, Stephen Frost  wrote:
> * Peter Eisentraut (pete...@gmx.net) wrote:
>> On 12/11/15 4:12 PM, Stephen Frost wrote:
>> > As with ACLs, the DROP OWNED BY caller must have permission to modify
>> > the policy or a WARNING is thrown and no change is made to the policy.
>>
>> That warning doesn't tell the user anything about how to fix the
>> situation or whether or why the situation is a problem and what to do
>> about it.
>
> I modeled it after the other warnings which are output by DROP OWNED BY
> when it's unable to perform the requested drop.  I'm not against trying
> to add something, but you tend to get a bunch of those messages at once
> which means having a hint would result in a bunch of repeated messages
> and I don't think that'd be very helpful.  Further, it's essentially a
> 'permission denied' type of error, which generally means that the
> individual who is running it can't do anything to fix it anyway.
>
> I'm not against looking to improve things here, but I don't think just
> trying to make a change here makes sense.  We could throw a warning+hint
> at the end of DROP OWNED, if anything wasn't able to be dropped, which
> provided more information, perhaps.  I'm not convinced that would really
> be very useful to the individual running the command and would need to,
> in essence, be "please get someone with higher privileges to run this,
> or get them to give you permission to run it".  I don't think we really
> want to go there (anyone else recall the "please see your network
> administrator" errors..?).
>
> If I'm misunderstanding your thoughts here, please let me know.

This appears to address one of the open items at
https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items -- if so,
please update that page.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)

2015-12-12 Thread Mark Dilger

> On Dec 12, 2015, at 3:42 PM, Tom Lane  wrote:
> 
> Joe Conway  writes:
>> On 12/12/2015 02:31 PM, Tom Lane wrote:
>>> I'm not particularly wedded to this rule.  In principle we could go so
>>> far as to import psql's code that parses commands and figures out which
>>> semicolons are command terminators --- but that is a pretty large chunk
>>> of code, and I think it'd really be overkill considering that initdb
>>> deals only with fixed input scripts.  But if anyone has another simple
>>> rule for breaking SQL into commands, we can certainly discuss
>>> alternatives.
> 
>> Possibly inadequate, but I wrote a get_one_query() function to grab one
>> statement at a time from a possibly multi-statement string and it isn't
>> all that many lines of code:
>>  https://github.com/jconway/pgsynck/blob/master/pgsynck.c
> 
> Hmm.  Doesn't look like that handles semicolons embedded in CREATE RULE;
> for that you'd have to track parenthesis nesting as well.  (It's arguable
> that we won't ever need that case during initdb, but I'd just as soon not
> wire in such an assumption.)  In general, though, I'd rather not try to
> teach InteractiveBackend() such a large amount about SQL syntax.

I use CREATE RULE within startup files in the fork that I maintain.  I have
lots of them, totaling perhaps 50k lines of rule code.  I don't think any of
that code would have a problem with the double-newline separation you propose,
which seems a more elegant solution to me.  Admittedly, the double-newline
separation would need to be documented at the top of each sql file, otherwise
it would be quite surprising to those unfamiliar with it.

You mentioned upthread that introducing a syntax error in one of these files
results in a not-so-helpful error message that dumps the contents of the
entire file.  I can confirm that happens, and is hardly useful.

mark





Re: [HACKERS] Rework the way multixact truncations work

2015-12-12 Thread Robert Haas
On Sat, Dec 12, 2015 at 12:02 PM, Andres Freund  wrote:
> Noah, Robert, All
>
> On 2015-12-11 11:20:21 -0500, Robert Haas wrote:
>> On Thu, Dec 10, 2015 at 9:32 AM, Robert Haas  wrote:
>> >> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.
>> >
>> > So let's do that, then.
>>
>> Who is going to take care of this?
>
> Here's two patches:
>
> 1) The fix to SetOffsetVacuumLimit().
>
>I've tested this by introducing a probabilistic "return false;" to
>find_multixact_start(), and torturing postgres by burning through
>billions of multixactids of various sizes.  Behaves about as
>bad^Wgood as without the induced failures; before the patch there
>were moments of spurious warnings/errors when ->offsetStopLimit was
>out of whack.

I find the commit message you wrote a little difficult to read, and
propose the following version instead, which reads better to me:

Previously, if find_multixact_start() failed, SetOffsetVacuumLimit()
would install 0 into MultiXactState->offsetStopLimit.  Luckily, there
are no known cases where find_multixact_start() will return an error
in 9.5 and above. But if it were to happen, for example due to
filesystem permission issues, it'd be somewhat bad:
GetNewMultiXactId() could continue allocating mxids even if close to a
wraparound, or it could erroneously stop allocating mxids, even if no
wraparound is looming.  The wrong value would be corrected the next
time SetOffsetVacuumLimit() is called, or by a restart.

(I have no comments on the substance of either patch and have reviewed
the first one to a negligible degree - it doesn't look obviously wrong
- and the second one not at all.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] [PATCH] PostGIS Doc Urls

2015-12-12 Thread Paul Ramsey
The attached patch changes web references to PostGIS to point to the actual 
community site (postgis.net) which is active, rather than the historical site 
(postgis.org).

P.


-- 
Paul Ramsey
http://cleverelephant.ca
http://postgis.net

postgis_doc.patch
Description: Binary data



Re: [HACKERS] Using quicksort for every external sort run

2015-12-12 Thread Jeff Janes
On Sat, Dec 12, 2015 at 2:28 PM, Peter Geoghegan  wrote:
> On Sat, Dec 12, 2015 at 12:10 AM, Jeff Janes  wrote:
>> I have a question about the terminology used in this patch.  What is a
>> tuple proper?  What is it in contradistinction to?  I would think that
>> a tuple which is located in its own palloc'ed space is the "proper"
>> one, leaving a tuple allocated in the bulk memory pool to be
>> called...something else.  I don't know what the
>> non-judgmental-sounding antonym of postpositive "proper" is.
>
> "Tuple proper" is a term that appears 5 times in tuplesort.c today. As
> it says at the top of that file:
>
> /*
>  * The objects we actually sort are SortTuple structs.  These contain
>  * a pointer to the tuple proper (might be a MinimalTuple or IndexTuple),
>  * which is a separate palloc chunk --- we assume it is just one chunk and
>  * can be freed by a simple pfree().  SortTuples also contain the tuple's
>  * first key column in Datum/nullflag format, and an index integer.

Those usages make sense to me, as they are locally self-contained and
it is clear what they are in contradistinction to.   But your usage is
spread throughout (even in function names, not just comments) and
seems to contradict the current usage as yours are not separately
palloced, as the "proper" ones described here are.  I think that
"proper" only works when the same comment also defines the
alternative, rather than as some file-global description.  Maybe
"pooltuple" rather than "tupleproper"


>
>> Also, if I am reading this correctly, when we refill a pool from a
>> logical tape we still transform each tuple as it is read from the disk
>> format to the memory format.  This inflates the size quite a bit, at
>> least for single-datum tuples.  If we instead just read the disk
>> format directly into the pool, and converted them into the in-memory
>> format when each tuple came due for the merge heap, would that destroy
>> the locality of reference you are seeking to gain?
>
> Are you talking about alignment?

Maybe alignment, but also the size of the SortTuple struct itself,
which is not present on tape but is present in memory if I understand
correctly.

When reading 128kb (32 blocks) worth of in-memory pool, it seems like
it only gets to read 16 to 18 blocks of tape to fill them up, in the
case of building an index on single column 32-byte random md5 digests.
I don't exactly know where all of that space goes, I'm taking an
experimentalist approach.

Cheers,

Jeff




Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-12 Thread David Fetter
On Sun, Dec 13, 2015 at 12:04:34AM +, Greg Stark wrote:
> On Sat, Dec 12, 2015 at 8:30 PM, Andreas Seltenreich  
> wrote:
> Did you publish the source already? I haven't been following all
> along, sorry if these are all answered questions.

Either he has, or the following is a truly astonishing coincidence:

https://github.com/anse1/sqlsmith

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)

2015-12-12 Thread Tom Lane
Andres Freund  writes:
> That's cool too. Besides processing the .bki files (and there, largely
> reg*_in), the many restarts are the most expensive parts of initdb.

BTW, in case anyone is doubting it, I did a little bit of "perf" tracing
and confirmed Andres' comment here: more than 50% of the runtime of the
bootstrap phase is eaten by the pg_proc seqscans performed by regprocin.
There's nothing else amounting to more than a few percent, so basically
nothing else in bootstrap is worth optimizing before we fix that.

regards, tom lane




Re: [HACKERS] [PATCH] PostGIS Doc Urls

2015-12-12 Thread Tom Lane
Paul Ramsey  writes:
> The attached patch changes web references to PostGIS to point to the actual 
> community site (postgis.net) which is active, rather than the historical site 
> (postgis.org).

Pushed, thanks.

regards, tom lane




Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-12 Thread Andreas Seltenreich
Greg Stark writes:

> On Sat, Dec 12, 2015 at 8:30 PM, Andreas Seltenreich  
> wrote:
> When you hit the timeout is this implemented in your fuzzer or using
> statement_timeout? If the former, can you add a statement_timeout of
> just short of the timeout in the fuzzer and find cases where the
> planner might not be calling CHECK_FOR_INTERRUPTS frequently enough?

It's the latter.  I don't think I can add a client-side timeout into
sqlsmith elegantly.  IMHO it's better to write another test tool that
just re-runs the queries that were logged with a timeout by sqlsmith and
investigates their timeout-behavior more closely.

>> I do have some older numbers for line coverage from before the recent 
>> grammar extension:
>
> If you have a corpus of queries in a simple format it would be pretty
> convenient to add them in a regression test and then run make coverage
> to get html reports.

Hmm, I thought I found a workflow that would yield sqlsmith's coverage
without integrating it into the regression tests.  This is what I did:

make install
initdb /tmp/gcov
pg_ctl -D /tmp/gcov start
make installcheck
pg_ctl -D /tmp/gcov stop
make coverage-clean
pg_ctl -D /tmp/gcov start
sqlsmith --target='dbname=regression' --max-queries=1
pg_ctl -D /tmp/gcov stop
make coverage-html

It seems to yield a pure sqlsmith-only coverage report, as a "make
coverage-html" before the "make coverage-clean" yields a report with
much higher score.  Maybe there are drawbacks to the workflow you are
suggesting?  I just re-did it with the current sqlsmith code, and it's
up by 25% compared to the latest tested revision:

| revision | overall | parser |
|----------+---------+--------|
| a4c1989  |    26.0 |   20.4 |
| ee099e6  |    33.8 |   25.8 |

I also put the report here, in case someone wants to look at certain
details, or make suggestions into what directions to best extend the
grammar to increase coverage.

http://ansel.ydns.eu/~andreas/coverage/
http://ansel.ydns.eu/~andreas/gcov.tar.xz

> Did you publish the source already? I haven't been following all
> along, sorry if these are all answered questions.

It's not had a proper release yet, but the code is available via github
in all its rapid-prototypesque glory:

https://github.com/anse1/sqlsmith

regards,
Andreas




Re: [HACKERS] [COMMITTERS] pgsql: Handle policies during DROP OWNED BY

2015-12-12 Thread Stephen Frost
* Peter Eisentraut (pete...@gmx.net) wrote:
> On 12/11/15 4:12 PM, Stephen Frost wrote:
> > As with ACLs, the DROP OWNED BY caller must have permission to modify
> > the policy or a WARNING is thrown and no change is made to the policy.
> 
> That warning doesn't tell the user anything about how to fix the
> situation or whether or why the situation is a problem and what to do
> about it.

I modeled it after the other warnings which are output by DROP OWNED BY
when it's unable to perform the requested drop.  I'm not against trying
to add something, but you tend to get a bunch of those messages at once
which means having a hint would result in a bunch of repeated messages
and I don't think that'd be very helpful.  Further, it's essentially a
'permission denied' type of error, which generally means that the
individual who is running it can't do anything to fix it anyway.

I'm not against looking to improve things here, but I don't think just
trying to make a change here makes sense.  We could throw a warning+hint
at the end of DROP OWNED, if anything wasn't able to be dropped, which
provided more information, perhaps.  I'm not convinced that would really
be very useful to the individual running the command and would need to,
in essence, be "please get someone with higher privileges to run this,
or get them to give you permission to run it".  I don't think we really
want to go there (anyone else recall the "please see your network
administrator" errors..?).

If I'm misunderstanding your thoughts here, please let me know.

Thanks!

Stephen




Re: [HACKERS] PATCH: track last known XLOG segment in control file

2015-12-12 Thread Andres Freund
Hi,

On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote:
> this is the second improvement proposed in the thread [1] about ext4 data
> loss issue. It adds another field to control file, tracking the last known
> WAL segment. This does not eliminate the data loss, just the silent part of
> it when the last segment gets lost (due to forgetting the rename, deleting
> it by mistake or whatever). The patch makes sure the cluster refuses to
> start if that happens.

Uh, that's fairly expensive. In many cases it'll significantly increase
the number of fsyncs. I've a bit of a hard time believing this'll be
worthwhile. Additionally this doesn't seem to take WAL replay into
account?

Andres




Re: [HACKERS] Using quicksort for every external sort run

2015-12-12 Thread Peter Geoghegan
On Sat, Dec 12, 2015 at 12:10 AM, Jeff Janes  wrote:
> I have a question about the terminology used in this patch.  What is a
> tuple proper?  What is it in contradistinction to?  I would think that
> a tuple which is located in its own palloc'ed space is the "proper"
> one, leaving a tuple allocated in the bulk memory pool to be
> called...something else.  I don't know what the
> non-judgmental-sounding antonym of postpositive "proper" is.

"Tuple proper" is a term that appears 5 times in tuplesort.c today. As
it says at the top of that file:

/*
 * The objects we actually sort are SortTuple structs.  These contain
 * a pointer to the tuple proper (might be a MinimalTuple or IndexTuple),
 * which is a separate palloc chunk --- we assume it is just one chunk and
 * can be freed by a simple pfree().  SortTuples also contain the tuple's
 * first key column in Datum/nullflag format, and an index integer.

> Also, if I am reading this correctly, when we refill a pool from a
> logical tape we still transform each tuple as it is read from the disk
> format to the memory format.  This inflates the size quite a bit, at
> least for single-datum tuples.  If we instead just read the disk
> format directly into the pool, and converted them into the in-memory
> format when each tuple came due for the merge heap, would that destroy
> the locality of reference you are seeking to gain?

Are you talking about alignment?

-- 
Peter Geoghegan




Re: [HACKERS] PATCH: track last known XLOG segment in control file

2015-12-12 Thread Tomas Vondra



On 12/12/2015 11:20 PM, Andres Freund wrote:

Hi,

On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote:

this is the second improvement proposed in the thread [1] about ext4 data
loss issue. It adds another field to control file, tracking the last known
WAL segment. This does not eliminate the data loss, just the silent part of
it when the last segment gets lost (due to forgetting the rename, deleting
it by mistake or whatever). The patch makes sure the cluster refuses to
start if that happens.


Uh, that's fairly expensive. In many cases it'll significantly
increase the number of fsyncs.


It should do exactly 1 additional fsync per WAL segment. Or do you think 
otherwise?


> I've a bit of a hard time believing this'll be worthwhile.

The trouble is protections like this only seem worthwhile after the 
fact, when something happens. I think it's reasonable protection against 
issues similar to the one I reported ~2 weeks ago. YMMV.


> Additionally this doesn't seem to take WAL replay into account?

I think the comparison in StartupXLOG needs to be less strict, to allow 
cases when we actually replay more WAL segments. Is that what you mean?
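
[Concretely, something like the following; a sketch adapting the check
from the posted patch, not tested:]

	/* refuse to start only if replay stopped short of the last known segment */
	if (lastLogSegNo < ControlFile->lastKnownSegment)
		ereport(FATAL,
				(errmsg("WAL ended before the last known segment was reached")));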


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)

2015-12-12 Thread Joe Conway
On 12/12/2015 02:31 PM, Tom Lane wrote:
> I'm not particularly wedded to this rule.  In principle we could go so
> far as to import psql's code that parses commands and figures out which
> semicolons are command terminators --- but that is a pretty large chunk
> of code, and I think it'd really be overkill considering that initdb
> deals only with fixed input scripts.  But if anyone has another simple
> rule for breaking SQL into commands, we can certainly discuss
> alternatives.

Possibly inadequate, but I wrote a get_one_query() function to grab one
statement at a time from a possibly multi-statement string and it isn't
all that many lines of code:

  https://github.com/jconway/pgsynck/blob/master/pgsynck.c

> Anyway, the attached patch tweaks postgres.c to follow that rule instead
> of slurp-to-EOF when -j is given.  I doubt that being non-backwards-
> compatible is a problem here; in fact, I'm tempted to rip out the -j
> switch altogether and just have standalone mode always parse input the
> same way.  Does anyone know of people using standalone mode other than
> for initdb?

sepgsql uses it for installation, but it does not appear to use -j.
I'm not sure why it is required, but at some point I'd like to dig into that.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

2015-12-12 Thread Peter Geoghegan
On Fri, Dec 11, 2015 at 9:13 AM, Robert Haas  wrote:
> Also, I'd be in favor of you updating the patch to reflect the
> comments from Tom and Simon on November 17th.

Attached revision:

* Has more worked out comments on encoding, per Simon's request.

* Uses Tom's preferred formulation for encoding TIDs (which involves
unsigned integer casts).

* Sets indexcursor = NULL when the tuplesort is empty, just to be
tidy, per Tom's request.

-- 
Peter Geoghegan
From dd15ae3d0a2bc29fbcefebe2ae08834d2e247070 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Sat, 14 Nov 2015 16:57:20 -0800
Subject: [PATCH] Speed up CREATE INDEX CONCURRENTLY's TID sort

Add a simple TID encoding scheme.  This is used for encoding and
decoding TIDs to and from int8/int64 values, for the benefit of an
expensive, extra tuplesort operation performed by CREATE INDEX
CONCURRENTLY.  The sort now uses the default int8 opclass to sort values
in the ad-hoc encoding, and so benefits from the availability of int8
SortSupport.

Despite the fact that item pointers take up only 6 bytes on disk, the
TID type is always pass-by-reference due to considerations about how
Datums are represented and accessed.  On the other hand, int8 is
usually, in practice, a pass-by-value type.

Testing shows this approach makes CREATE INDEX CONCURRENTLY
significantly faster on platforms where int8 is pass-by-value,
especially when the big saving in memory within tuplesort.c prevents the
sort from being performed externally.  The approach will help to some
degree on all platforms, though.
---
 src/backend/catalog/index.c | 71 ++---
 1 file changed, 67 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index e59b163..c10be3d 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -109,6 +109,8 @@ static void index_update_stats(Relation rel,
 static void IndexCheckExclusion(Relation heapRelation,
 	Relation indexRelation,
 	IndexInfo *indexInfo);
+static inline int64 itemptr_encode(ItemPointer itemptr);
+static inline void itemptr_decode(ItemPointer itemptr, int64 encoded);
 static bool validate_index_callback(ItemPointer itemptr, void *opaque);
 static void validate_index_heapscan(Relation heapRelation,
 		Relation indexRelation,
@@ -2832,7 +2834,13 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
 	ivinfo.strategy = NULL;
 
-	state.tuplesort = tuplesort_begin_datum(TIDOID, TIDLessOperator,
+	/*
+	 * Encode TIDs as int8 values for the sort, rather than directly sorting
+	 * item pointers.  This can be significantly faster, primarily because TID
+	 * is a pass-by-reference type on all platforms, whereas int8 is
+	 * pass-by-value on most platforms.
+	 */
+	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
 			InvalidOid, false,
 			maintenance_work_mem,
 			false);
@@ -2872,14 +2880,55 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 }
 
 /*
+ * itemptr_encode - Encode ItemPointer as int64/int8
+ *
+ * This representation must produce values encoded as int64 that sort in the
+ * same order as their corresponding original TID values would (using the
+ * default int8 opclass to produce a result equivalent to the default TID
+ * opclass).
+ *
+ * As noted in validate_index(), this can be significantly faster.
+ */
+static inline int64
+itemptr_encode(ItemPointer itemptr)
+{
+	BlockNumber		block = ItemPointerGetBlockNumber(itemptr);
+	OffsetNumber	offset = ItemPointerGetOffsetNumber(itemptr);
+	int64			encoded;
+
+	/*
+	 * Use the 16 least significant bits for the offset.  32 adjacent bits are
+	 * used for the block number.  Since remaining bits are unused, there
+	 * cannot be negative encoded values (We assume a two's complement
+	 * representation).
+	 */
+	encoded = ((uint64) block << 16) | (uint16) offset;
+
+	return encoded;
+}
+
+/*
+ * itemptr_decode - Decode int64/int8 representation back to ItemPointer
+ */
+static inline void
+itemptr_decode(ItemPointer itemptr, int64 encoded)
+{
+	BlockNumber		block = (BlockNumber) (encoded >> 16);
+	OffsetNumber	offset = (OffsetNumber) (encoded & 0xFFFF);
+
+	ItemPointerSet(itemptr, block, offset);
+}
+
+/*
  * validate_index_callback - bulkdelete callback to collect the index TIDs
  */
 static bool
 validate_index_callback(ItemPointer itemptr, void *opaque)
 {
 	v_i_state  *state = (v_i_state *) opaque;
+	int64		encoded = itemptr_encode(itemptr);
 
-	tuplesort_putdatum(state->tuplesort, PointerGetDatum(itemptr), false);
+	tuplesort_putdatum(state->tuplesort, Int64GetDatum(encoded), false);
 	state->itups += 1;
 	return false;/* never actually delete anything */
 }
@@ -2911,6 +2960,7 @@ validate_index_heapscan(Relation heapRelation,
 
 	/* state variables for the merge */
 	ItemPointer indexcursor = NULL;
+	
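
[The remainder of the patch is truncated in the archive.  For
illustration only, a quick sanity check of the encoding's ordering and
round-trip properties, using the itemptr_encode()/itemptr_decode()
functions above (a sketch, not part of the posted patch):]

static void
check_itemptr_encoding(void)
{
	ItemPointerData a,
				b,
				decoded;

	ItemPointerSet(&a, 41, 7);	/* block 41, offset 7 */
	ItemPointerSet(&b, 42, 1);	/* later block, smaller offset */

	/* block number dominates, matching the default TID ordering */
	Assert(itemptr_encode(&a) < itemptr_encode(&b));

	/* decoding must invert encoding exactly */
	itemptr_decode(&decoded, itemptr_encode(&a));
	Assert(ItemPointerEquals(&a, &decoded));
}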

Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)

2015-12-12 Thread Tom Lane
I wrote:
> BTW, there's another thing I'd like to see improved in this area, which is
> a problem already but will get a lot worse if we push more work into the
> post-bootstrap phase of initdb.  That is that the post-bootstrap phase is
> both inefficient and impossible to debug.  If you've ever had a failure
> there, you'll have seen that the backend spits out an entire SQL script
> and says there's an error in it somewhere; that's because it gets the
> whole per-stage script as one submission.  (Try introducing a syntax error
> somewhere in information_schema.sql, and you'll see what I mean.)
> Breaking the stage scripts down further would help, but that is
> unattractive because each one requires a fresh backend startup/shutdown,
> including a full checkpoint.  I'd like to see things rejiggered so that
> there's only one post-bootstrap standalone backend session that performs
> all the steps, but initdb feeds it just one SQL command at a time so that
> errors are better localized.  That should both speed up initdb noticeably
> and make debugging easier.

I thought this sounded like a nice lazy-Saturday project, so I started
poking at it, and attached is a WIP patch.  The core issue that has to
be dealt with is that standalone-backend mode currently has just two
rules for deciding when to stop collecting input and execute the command
buffer, and they both suck:

1. By default, execute after every newline.  (Actually, you can quote
a newline with a backslash, but we don't use that ability anywhere.)

2. With -j, slurp the entire input until EOF, and execute it as one
giant multicommand string.

We're doing #2 to handle information_schema.sql and the other large
SQL scripts that initdb runs, which is why the response to an error in
those scripts is so yucky.

After some experimentation, I came up with the idea of executing any
time that a semicolon followed by two newlines is seen.  This nicely
breaks up input like information_schema.sql.  There are probably some
residual places where more than one command is executed in a single
string, but we could fix that with some more newlines.  Obviously,
this rule is capable of being fooled if you have a newline followed by
a blank line in a comment or quoted literal --- but it turns out that
no such case exists anywhere in initdb's data.
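
[To make the rule concrete, a minimal sketch of the stop condition as a
bare scan for ";\n\n"; the posted patch implements this inside
postgres.c's input loop, likely with different details:]

#include <stdbool.h>
#include <stddef.h>

/*
 * Return true once the buffer ends a command under the proposed rule:
 * a semicolon immediately followed by two newlines.
 */
static bool
found_command_boundary(const char *buf, size_t len)
{
	size_t		i;

	for (i = 2; i < len; i++)
	{
		if (buf[i - 2] == ';' && buf[i - 1] == '\n' && buf[i] == '\n')
			return true;
	}
	return false;
}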

I'm not particularly wedded to this rule.  In principle we could go so
far as to import psql's code that parses commands and figures out which
semicolons are command terminators --- but that is a pretty large chunk
of code, and I think it'd really be overkill considering that initdb
deals only with fixed input scripts.  But if anyone has another simple
rule for breaking SQL into commands, we can certainly discuss
alternatives.

Anyway, the attached patch tweaks postgres.c to follow that rule instead
of slurp-to-EOF when -j is given.  I doubt that being non-backwards-
compatible is a problem here; in fact, I'm tempted to rip out the -j
switch altogether and just have standalone mode always parse input the
same way.  Does anyone know of people using standalone mode other than
for initdb?

The other part of the patch modifies initdb to do all its post-bootstrap
steps using a single standalone backend session.  I had to remove the
code that currently prints out progress markers for individual phases
of that processing, so that now you get output that looks like

creating directory /home/postgres/testversion/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /home/postgres/testversion/data/base/1 ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Since initdb is just printing to the backend in an open-loop fashion,
it doesn't really know whether each command succeeds; in fact it is
usually pretty far ahead of the backend, until it does pclose() which
waits for the subprocess to exit.  So we can't readily keep the old
progress markers.  I don't think we'll miss them though.  The whole
"post-bootstrap initialization" step only takes a second or so on my
main dev machine, so breaking down the progress more finely isn't that
useful anymore.

I had to change ;\n to ;\n\n in a few places in initdb's internal
scripts, to ensure that VACUUM commands were executed by themselves
(otherwise you get "VACUUM can't run in a transaction block" type
failures).  I wasn't very thorough about that though, pending a
decision on exactly what the new command-boundary rule will be.
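
[For illustration, what such a change looks like; a hypothetical snippet
in initdb's style, where PG_CMD_PUTS writes a string to the standalone
backend's input:]

	/*
	 * Before: both statements arrive in one buffer, and VACUUM fails
	 * with "VACUUM can't run in a transaction block".
	 */
	PG_CMD_PUTS("ANALYZE;\nVACUUM FULL;\n");

	/*
	 * After: the blank line makes the backend execute ANALYZE by itself
	 * before VACUUM is read.
	 */
	PG_CMD_PUTS("ANALYZE;\n\nVACUUM FULL;\n");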

The upshot of these changes is that initdb runs about 10% faster overall
(more in -N mode), which is a useful savings.  Also, the response to a
syntax error in information_schema.sql now looks like this:

creating template1 database in /home/postgres/testversion/data/base/1 ... ok
performing post-bootstrap initialization ... FATAL:  column 

Re: [HACKERS] Bootstrap DATA is a pita

2015-12-12 Thread Andres Freund
On 2015-12-12 13:28:28 -0500, Tom Lane wrote:
> BTW, there's another thing I'd like to see improved in this area, which is
> a problem already but will get a lot worse if we push more work into the
> post-bootstrap phase of initdb.  That is that the post-bootstrap phase is
> both inefficient and impossible to debug.  If you've ever had a failure
> there, you'll have seen that the backend spits out an entire SQL script
> and says there's an error in it somewhere; that's because it gets the
> whole per-stage script as one submission.

Seen that more than once :(

> Breaking the stage scripts down further would help, but that is
> unattractive because each one requires a fresh backend startup/shutdown,
> including a full checkpoint.  I'd like to see things rejiggered so that
> there's only one post-bootstrap standalone backend session that performs
> all the steps, but initdb feeds it just one SQL command at a time so that
> errors are better localized.  That should both speed up initdb noticeably
> and make debugging easier.

One way to do that would be to not use the single user mode for that
stage. Afair, at that point we could actually just start the cluster
"normally", with the socket pointing somewhere locally (ugh, some path
length issues afoot, maybe allow relative directories? And, uh,
windows), and use psql to do the splitting and everything for us.

Your approach probably has some significant performance benefits,
because it essentially does pipelining. So while aesthetically
attractive, I'm afraid my proposal would lead to worse performance. So
it's probably actually DOA :(

Andres




Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)

2015-12-12 Thread Andres Freund
On 2015-12-12 17:31:49 -0500, Tom Lane wrote:
> I thought this sounded like a nice lazy-Saturday project, so I started
> poking at it, and attached is a WIP patch.

Not bad, not bad at all.


> After some experimentation, I came up with the idea of executing any
> time that a semicolon followed by two newlines is seen.  ...
> but it turns out that no such case exists anywhere in initdb's data.

Not pretty, but hardly any worse than the current situation.


> I'm not particularly wedded to this rule.  In principle we could go so
> far as to import psql's code that parses commands and figures out which
> semicolons are command terminators --- but that is a pretty large chunk
> of code, and I think it'd really be overkill considering that initdb
> deals only with fixed input scripts.

Having that code somewhere abstracted wouldn't be bad though; extension
scripts have a somewhat similar problem.


> Anyway, the attached patch tweaks postgres.c to follow that rule instead
> of slurp-to-EOF when -j is given.  I doubt that being non-backwards-
> compatible is a problem here; in fact, I'm tempted to rip out the -j
> switch altogether and just have standalone mode always parse input the
> same way.

No objection here.


> Does anyone know of people using standalone mode other than
> for initdb?

Unfortunately yes. There's docker instances around that configure users
and everything using it.


> The other part of the patch modifies initdb to do all its post-bootstrap
> steps using a single standalone backend session.  I had to remove the
> code that currently prints out progress markers for individual phases
> of that processing, so that now you get output that looks like

That's cool too. Besides processing the .bki files (and there, largely
reg*_in), the many restarts are the most expensive parts of initdb.


Greetings,

Andres Freund




Re: [HACKERS] Error with index on unlogged table

2015-12-12 Thread Michael Paquier
On Sat, Dec 12, 2015 at 10:31 PM, Andres Freund  wrote:
> On 2015-12-12 12:52:21 +0100, Andres Freund wrote:
>> On 2015-12-12 20:49:52 +0900, Michael Paquier wrote:
>> > Should we consider this bug a 9.5 blocker?
>>
>> I don't think so. But either way, I'm right now working on getting it
>> fixed anyway.
>
> And done. Took a bit longer than predicted - I had to adjust my test
> scripts to cope with < 9.4 not having tablespace mapping. Rather
> annoying.

Thanks! Cool to see this thread finally addressed.

> It'd be really convenient if tablespaces relative to the main data
> directory were supported...

I got bitten at some point as well by the fact that tablespaces
created after a base backup is taken have their location map to the
same point for a master and a standby :)
-- 
Michael




Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)

2015-12-12 Thread Tom Lane
Andres Freund  writes:
> On 2015-12-12 17:31:49 -0500, Tom Lane wrote:
>> Does anyone know of people using standalone mode other than
>> for initdb?

> Unfortunately yes. There's docker instances around that configure users
> and everything using it.

Hm, that means that we *do* have to worry about backwards compatibility.
We might be able to get away with changing the behavior of -j mode anyway,
though, since this proposal mostly only changes when execution happens
and not what is valid input for -j mode.  (It would probably break some
apps if we took away the switch, since right now, you do not need a
semicolon to terminate commands in the regular standalone mode.)
Failing that, we could define a new switch, I guess.

> That's cool too. Besides processing the .bki files (and there, largely
> reg*_in), the many restarts are the most expensive parts of initdb.

Right.  The proposal we were discussing upthread would move all the reg*
lookups into creation of the .bki file, basically, which would improve
that part of things quite a bit.  (BTW, if we are concerned about initdb
speed, that might be a reason not to be too eager to shift processing
from bootstrap to non-bootstrap mode.  Other than the reg* issue,
bootstrap is certainly a far faster way to put rows into the catalogs
than individual SQL commands could be.)

regards, tom lane




[HACKERS] PATCH: track last known XLOG segment in control file

2015-12-12 Thread Tomas Vondra

Hi,

this is the second improvement proposed in the thread [1] about ext4 
data loss issue. It adds another field to control file, tracking the 
last known WAL segment. This does not eliminate the data loss, just the 
silent part of it when the last segment gets lost (due to forgetting the 
rename, deleting it by mistake or whatever). The patch makes sure the 
cluster refuses to start if that happens.


[1] http://www.postgresql.org/message-id/56583bdd.9060...@2ndquadrant.com

It's a fairly simple patch, but obviously it touches very complex part 
of the code. I'll add it to 2016-01 CF.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 71fc8ff..50f10a5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -,6 +,16 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			use_existent = true;
 			openLogFile = XLogFileInit(openLogSegNo, &use_existent, true);
 			openLogOff = 0;
+
+			/* update the last known segment in the control file */
+			LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+			if (ControlFile->lastKnownSegment != openLogSegNo)
+			{
+elog(WARNING, "updating segment number = %lu", openLogSegNo);
+ControlFile->lastKnownSegment = openLogSegNo;
+UpdateControlFile();
+			}
+			LWLockRelease(ControlFileLock);
 		}
 
 		/* Make sure we have the current logfile open */
@@ -5904,6 +5914,7 @@ StartupXLOG(void)
 	XLogPageReadPrivate private;
 	bool		fast_promoted = false;
 	struct stat st;
+	XLogSegNo	lastLogSegNo = 0;
 
 	/*
 	 * Read control file and check XLOG status looks valid.
@@ -6865,6 +6876,9 @@ StartupXLOG(void)
 /* Remember this record as the last-applied one */
 LastRec = ReadRecPtr;
 
+/* Also remember the segment number */
+XLByteToSeg(ReadRecPtr, lastLogSegNo);
+
 /* Allow read-only connections if we're consistent now */
 CheckRecoveryConsistency();
 
@@ -6942,6 +6956,18 @@ StartupXLOG(void)
 	RmgrTable[rmid].rm_cleanup();
 			}
 
+			/*
+			 * Check that we've actually seen all the XLOG segments, i.e. that
+			 * we've reached ControlFile->lastKnownSegment (this may fail for
+			 * example when someone deletes the last XLOG segment, or in case
+			 * of a filesystem issue).
+			 */
+			if (ControlFile->lastKnownSegment != lastLogSegNo)
+ereport(FATAL,
+		(errmsg("not reached the last known segment (expected %lX/%lX seen %lX/%lX)",
+(ControlFile->lastKnownSegment >> 8), (ControlFile->lastKnownSegment & 0xFF),
+(lastLogSegNo >> 8), (lastLogSegNo & 0xFF))));
+
 			ereport(LOG,
 	(errmsg("redo done at %X/%X",
 		 (uint32) (ReadRecPtr >> 32), (uint32) ReadRecPtr)));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 32e1d81..44dde42 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -293,6 +293,9 @@ main(int argc, char *argv[])
 		   (uint32) ControlFile.backupEndPoint);
 	printf(_("End-of-backup record required:%s\n"),
 		   ControlFile.backupEndRequired ? _("yes") : _("no"));
+	printf(_("Last known segment:   %lX/%X\n"),
+		   (uint64) (ControlFile.lastKnownSegment >> 8),
+		   (uint32) (ControlFile.lastKnownSegment & 0xFF));
 	printf(_("wal_level setting:%s\n"),
 		   wal_level_str(ControlFile.wal_level));
 	printf(_("wal_log_hints setting:%s\n"),
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index ad1eb4b..f0ba450 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -164,12 +164,18 @@ typedef struct ControlFileData
 	 * start up. If it's false, but backupStartPoint is set, a backup_label
 	 * file was found at startup but it may have been a leftover from a stray
 	 * pg_start_backup() call, not accompanied by pg_stop_backup().
+	 *
+	 * lastKnownSegment is the segment sequence number of the last known XLOG
+	 * segment. This is useful to check that the recovery actually processed
+	 * all segments allocated before the crash (serves as a protection against
+	 * accidentally deleted segments etc.)
 	 */
 	XLogRecPtr	minRecoveryPoint;
 	TimeLineID	minRecoveryPointTLI;
 	XLogRecPtr	backupStartPoint;
 	XLogRecPtr	backupEndPoint;
 	bool		backupEndRequired;
+	XLogSegNo	lastKnownSegment;
 
 	/*
 	 * Parameter settings that determine if the WAL can be used for archival



Re: [HACKERS] PATCH: track last known XLOG segment in control file

2015-12-12 Thread Andres Freund
On 2015-12-12 23:28:33 +0100, Tomas Vondra wrote:
> On 12/12/2015 11:20 PM, Andres Freund wrote:
> >On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote:
> >>this is the second improvement proposed in the thread [1] about ext4 data
> >>loss issue. It adds another field to control file, tracking the last known
> >>WAL segment. This does not eliminate the data loss, just the silent part of
> >>it when the last segment gets lost (due to forgetting the rename, deleting
> >>it by mistake or whatever). The patch makes sure the cluster refuses to
> >>start if that happens.
> >
> >Uh, that's fairly expensive. In many cases it'll significantly
> >increase the number of fsyncs.
> 
> It should do exactly 1 additional fsync per WAL segment. Or do you think
> otherwise?

Which is nearly doubling the number of fsyncs, for a good number of
workloads. And it does so to a separate file, i.e. it's not like these
writes and the flushes can be combined. In workloads where pg_xlog is on
a separate partition it'll add the only source of fsyncs besides
checkpoint to the main data directory.
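
(To put a rough number on that: with 16MB segments, a server writing
160MB of WAL per minute switches segments ten times a minute, i.e. the
patch would add ten pg_control fsyncs per minute that today don't
happen at all when pg_xlog sits on its own partition.)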

> > I've a bit of a hard time believing this'll be worthwhile.
> 
> The trouble is protections like this only seem worthwhile after the fact,
> when something happens. I think it's reasonable protection against issues
> similar to the one I reported ~2 weeks ago. YMMV.

Meh. That argument can be used to justify just about everything.

Obviously we should be more careful about fsyncing files, including the
directories. I do plan to come back to your recent patch.

> > Additionally this doesn't seem to take WAL replay into account?
> 
> I think the comparison in StartupXLOG needs to be less strict, to allow
> cases when we actually replay more WAL segments. Is that what you mean?

What I mean is that the value isn't updated during recovery, afaics. You
could argue that minRecoveryPoint is that, in a way.

Greetings,

Andres Freund




Re: [HACKERS] PATCH: track last known XLOG segment in control file

2015-12-12 Thread Tomas Vondra



On 12/12/2015 11:39 PM, Andres Freund wrote:
> On 2015-12-12 23:28:33 +0100, Tomas Vondra wrote:
>> On 12/12/2015 11:20 PM, Andres Freund wrote:
>>> On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote:
>>>> this is the second improvement proposed in the thread [1] about ext4 data
>>>> loss issue. It adds another field to control file, tracking the last known
>>>> WAL segment. This does not eliminate the data loss, just the silent part of
>>>> it when the last segment gets lost (due to forgetting the rename, deleting
>>>> it by mistake or whatever). The patch makes sure the cluster refuses to
>>>> start if that happens.
>>>
>>> Uh, that's fairly expensive. In many cases it'll significantly
>>> increase the number of fsyncs.
>>
>> It should do exactly 1 additional fsync per WAL segment. Or do you think
>> otherwise?
>
> Which is nearly doubling the number of fsyncs, for a good number of
> workloads. And it does so to a separate file, i.e. it's not like
> these writes and the flushes can be combined. In workloads where
> pg_xlog is on a separate partition it'll add the only source of
> fsyncs besides checkpoint to the main data directory.


I doubt it will make any difference in practice, at least on reasonable
hardware (which you should have, if fsync performance matters to you).

But some performance testing will be necessary; I don't expect this to
go in without that. It'd be helpful if you could describe the workloads
you have in mind.
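
For concreteness, the per-segment cost being debated boils down to one
UpdateControlFile() call, i.e. one write plus fsync of global/pg_control.
A schematic sketch of the segment-switch hook (hypothetical helper name,
not the literal patch code):

/*
 * Schematic sketch, assuming the usual xlog.c environment.
 * UpdateControlFile() writes and fsyncs global/pg_control, so this is
 * one extra fsync per 16MB of WAL, and it hits the main data directory
 * even when pg_xlog lives on a separate partition.
 */
static void
RememberLastSegment(XLogSegNo segno)
{
	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
	ControlFile->lastKnownSegment = segno;
	UpdateControlFile();
	LWLockRelease(ControlFileLock);
}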



>>> I've a bit of a hard time believing this'll be worthwhile.
>>
>> The trouble is protections like this only seem worthwhile after the fact,
>> when something happens. I think it's reasonable protection against issues
>> similar to the one I reported ~2 weeks ago. YMMV.
>
> Meh. That argument can be used to justify just about everything.
>
> Obviously we should be more careful about fsyncing files, including
> the directories. I do plan to come back to your recent patch.


My argument is that this is a reasonable protection against failures in
that area - whether the fault is ours (misunderstanding the durability
guarantees of a particular file system) or the file system developers'.

Maybe it's not, because the chance of running into exactly the same
issue in this part of the code is negligible.

>>> Additionally this doesn't seem to take WAL replay into account?
>>
>> I think the comparison in StartupXLOG needs to be less strict, to allow
>> cases when we actually replay more WAL segments. Is that what you mean?
>
> What I mean is that the value isn't updated during recovery, afaics.
> You could argue that minRecoveryPoint is that, in a way.


Oh, right. Will fix if we conclude that the general idea makes sense.
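
(The "less strict" comparison discussed above would presumably complain
only when recovery stops short of the recorded segment, along these
lines - an untested sketch, not part of the posted patch:)

	/* Sketch: tolerate replaying past the recorded segment. */
	if (lastLogSegNo < ControlFile->lastKnownSegment)
		ereport(FATAL,
				(errmsg("recovery stopped before the last known segment (expected %lX/%lX, reached %lX/%lX)",
						(ControlFile->lastKnownSegment >> 8),
						(ControlFile->lastKnownSegment & 0xFF),
						(lastLogSegNo >> 8),
						(lastLogSegNo & 0xFF))));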

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)

2015-12-12 Thread Tom Lane
Joe Conway  writes:
> On 12/12/2015 02:31 PM, Tom Lane wrote:
>> I'm not particularly wedded to this rule.  In principle we could go so
>> far as to import psql's code that parses commands and figures out which
>> semicolons are command terminators --- but that is a pretty large chunk
>> of code, and I think it'd really be overkill considering that initdb
>> deals only with fixed input scripts.  But if anyone has another simple
>> rule for breaking SQL into commands, we can certainly discuss
>> alternatives.

> Possibly inadequate, but I wrote a get_one_query() function to grab one
> statement at a time from a possibly multi-statement string and it isn't
> all that many lines of code:
>   https://github.com/jconway/pgsynck/blob/master/pgsynck.c

Hmm.  Doesn't look like that handles semicolons embedded in CREATE RULE;
for that you'd have to track parenthesis nesting as well.  (It's arguable
that we won't ever need that case during initdb, but I'd just as soon not
wire in such an assumption.)  In general, though, I'd rather not try to
teach InteractiveBackend() such a large amount about SQL syntax.

With a rule like "break at ;\n\n" it's possible to ensure that command
breaks occur only where wanted, though in corner cases you might have to
format your input oddly.  (For instance, if you needed that in a SQL
literal, you might resort to E';\n\n' or use the standard's rules about
concatenated string literals.)  If you get it wrong the consequences
aren't too disastrous: you'll get an unterminated-input syntax error,
or in the other direction multiple commands will get run together for
execution, which most of the time isn't a big issue.
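
For illustration, a break-at-";\n\n" scanner really can be just a few
lines - a standalone sketch (hypothetical helper, not the proposed
postgres.c patch):

#include <stdio.h>
#include <string.h>

/*
 * Return the start of the text following the next ";\n\n" terminator,
 * or NULL if none is found (i.e. the command is still incomplete).
 * The terminator is cut just after the semicolon, so buf becomes
 * exactly one command.
 */
static char *
next_command(char *buf)
{
	char	   *p = strstr(buf, ";\n\n");

	if (p == NULL)
		return NULL;
	p[1] = '\0';
	return p + 3;
}

int
main(void)
{
	char		script[] = "CREATE RULE r AS ON INSERT TO t\n"
		"DO INSTEAD (SELECT 1;\nSELECT 2);\n\nSELECT 3;\n\n";
	char	   *cur = script;
	char	   *next;

	while ((next = next_command(cur)) != NULL)
	{
		/* the ";\n" embedded inside the parentheses survives intact */
		printf("command: %s\n", cur);
		cur = next;
	}
	return 0;
}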

>> Does anyone know of people using standalone mode other than
>> for initdb?

> sepgsql uses it for installation, but it does not appear to use -j
> I'm not sure why it is required but at some point I'd like to dig into that.

It might be easier than starting a full postmaster and having to figure
out a secure place for the socket etc.  I'm prepared to back off the
proposal about changing the default behavior of standalone mode; that
leaves us with a choice between changing -j's behavior and inventing
a new switch.

regards, tom lane




Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-12 Thread Greg Stark
On Sat, Dec 12, 2015 at 8:30 PM, Andreas Seltenreich  wrote:
> I currently set statement_timeout to 1s to avoid wasting time letting
> postgres crunch numbers.  Less than 0.5% of the queries run into this
> timeout.


I wonder if any of these timeouts would be interesting to look at.
Some may just be very large queries that will take a few seconds to
plan, but others may be queries that are uncovering N^2 algorithms or
even, conceivably, loops that are not terminating.

When you hit the timeout, is this implemented in your fuzzer or using
statement_timeout? If the former, can you set a statement_timeout just
short of the fuzzer's timeout and find cases where the planner might
not be calling CHECK_FOR_INTERRUPTS frequently enough?
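
(Background for that question: CHECK_FOR_INTERRUPTS() is the macro that
long-running backend code must call periodically for statement_timeout
to take effect; a hot loop that omits it cannot be cancelled until it
finishes. A schematic fragment with made-up names, assuming the usual
postgres.h environment:)

	for (i = 0; i < n_join_orders; i++)
	{
		/* without this, statement_timeout can't interrupt the search */
		CHECK_FOR_INTERRUPTS();

		cost = estimate_join_order_cost(join_orders[i]);
		if (cost < best_cost)
		{
			best_cost = cost;
			best = join_orders[i];
		}
	}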

> > Do you have coverage data for the corpus?
>
> I do have some older numbers for line coverage from before the recent grammar 
> extension:

If you have a corpus of queries in a simple format, it would be pretty
convenient to add them to a regression test and then run make coverage
to get HTML reports.

Did you publish the source already? I haven't been following all
along - sorry if these questions have all been answered before.

-- 
greg

