Re: [HACKERS] A worst case for qsort

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 11:49 PM, Fabien COELHO  wrote:
> If so, adding some randomness in the decision process would suffice to
> counter the adversarial input argument you raised.


This is specifically addressed by the paper. Indeed, randomly choosing
a pivot is a common strategy. It won't fix the problem.

-- 
Peter Geoghegan




Re: [HACKERS] A worst case for qsort

2014-08-05 Thread Fabien COELHO


For example, if we had reason to be concerned about *adversarial* 
inputs, I think that there is a good chance that our qsort() actually 
would be problematic to the point of driving us to prefer some generally 
slower alternative.


That is an interesting point.

Indeed, a database often stores user-supplied data, which may happen to be 
sorted for presentation purposes in an interface. The same thing occurred 
with hashtable algorithms, and was/is a way to mount DoS attacks on web 
applications. I'm not sure whether the qsort version discussed here would 
apply to user-supplied data, though. If so, adding some randomness to the 
decision process would suffice to counter the adversarial-input argument 
you raised.


--
Fabien.




Re: [HACKERS] missing PG_RETURN_UINT16

2014-08-05 Thread Fujii Masao
On Wed, Aug 6, 2014 at 4:47 AM, Robert Haas  wrote:
> On Mon, Aug 4, 2014 at 11:35 AM, Manuel Kniep  wrote:
>> I’m missing the PG_RETURN_UINT16 macro in fmgr.h
>> Since we already have the PG_GETARG_UINT16 macro
>> I guess it makes sense to have it.
>>
>> here is the trivial patch for it.
>
> I see no reason not to add this.  Anyone else want to object?

+1 to add that.

What about backpatching this to 9.4? It's a very simple change, and there
seems to be no reason to make it wait for 9.5.

Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-05 Thread Fujii Masao
On Wed, Aug 6, 2014 at 2:34 PM,   wrote:
>> >> I have improved the patch by making the following changes:
>> >>
>> >> 1. Since stream_stop() was redundant, the stream_stop() call at WAL
>> >> file close was deleted.
>> >>
>> >> 2. Changed the flush-judging timing for the readability of the source
>> >> code: I moved the flush check from just after a message is received
>> >> to just before the feedback-message decision (the continue statement).
>> >
>> > Thanks for the updated version of the patch!
>> >
>> > While reviewing the patch, I found that HandleCopyStream() is still
>> > long, which decreases the readability of the source code.
>> > So I feel inclined to refactor the HandleCopyStream() more for better
>> > readability. What about the attached refactoring patch?
>>
>> Sorry, I forgot to attach the patch in the previous email. Attached now.
>
> Thank you for the refactoring patch.
> I did a review of the patch.
>
> -   break;  /* ignore the rest of 
> this XLogData packet */
>
> +   return true;/* ignore the rest of this 
> XLogData packet */
>
> The break statement at WAL file close has become a return of true,
> which behaves like a continue statement. Is that intended?

Sorry, I failed to see your point.

In the original code, when we reach the end of a WAL file and it's the
streaming stop point, we break out of the inner loop in the code block for
processing the XLogData packet, and then go back to the top of the outer
loop in HandleCopyStream. ISTM that the refactored code works the same
way. Anyway, could you elaborate on the problem?
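
To illustrate with a self-contained toy (names are illustrative, not the
actual pg_receivexlog code): returning true from the helper lands us in
exactly the same place as the old break, i.e., at the top of the outer
loop.

#include <stdbool.h>
#include <stdio.h>

/* stand-in for the refactored XLogData packet processor */
static bool
process_packet(int packet)
{
    int     pos;

    for (pos = 0; pos < 4; pos++)
    {
        if (packet == 1 && pos == 2)
            return true;    /* was: break -- ignore rest of this packet */
        printf("packet %d byte %d\n", packet, pos);
    }
    return true;
}

int
main(void)
{
    int     packet;

    for (packet = 0; packet < 3; packet++)      /* outer loop */
    {
        if (!process_packet(packet))
            return 1;       /* a real error would end streaming */
        /* feedback handling would happen here, then we loop again */
    }
    return 0;
}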

> The walreceiver splits this work between XLogWalRcvProcessMsg and
> XLogWalRcvWrite, but isn't that division necessary here as well?

Not necessary, but I have no objection to the idea.

Regards,

-- 
Fujii Masao




Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-05 Thread Fabien COELHO


Hello Alvaro,


I wonder if it would be necessary to offer the division operator
semantics corresponding to whatever additional modulo operator we choose
to offer.  That is, if we add emod, do we need "ediv" as well?


It would make sense; however, I do not need it, and I'm not sure of a use 
case where it would be needed, so I do not think that it is "necessary". 
If it turns out to be, it could quite easily be added then.


--
Fabien.




Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-05 Thread Fabien COELHO


Hello Robert,


The issue is that there are 3 definitions of modulo, two of which are fine
(Knuth floored division and Euclidean), and the last one much less useful.
Alas, C (%) & SQL (MOD) chose the bad definition :-( I really need one of
the other two. The attached patch adds all versions, with "%" and "mod"
consistent with their unfortunate C and SQL counterparts, and "fmod" and
"emod" the sane ones.


Three different modulo operators seems like a lot for a language that
doesn't even have a real expression syntax, but I'll yield to whatever
the consensus is on this one.


I agree that it is overkill.

In fact there is a link: if there were a real expression syntax, the sign 
of the remainder could be fixed afterwards, so the standard C/SQL version 
would do. Without one, the modulo operator itself must be the right one.


If only one modulo is added, I would rather have the Knuth version. 
However, I was afraid that someone would object if "%" did not return the 
same result as the C/PostgreSQL versions (even if I think that nearly 
nobody has a clue about what % returns when arguments are negative), hence 
the three modulo versions to counter this potential criticism.


But I would prefer just one version, with the Knuth (or Euclidean)
definition.
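
To put the three definitions side by side, a minimal C sketch (the
function names are illustrative, not the ones used by the patch):

#include <stdio.h>

/* C's native %: the remainder takes the sign of the dividend */
static int
trunc_mod(int a, int b)
{
    return a % b;
}

/* Knuth floored division: the remainder takes the sign of the divisor */
static int
floor_mod(int a, int b)
{
    int     r = a % b;

    return (r != 0 && (r < 0) != (b < 0)) ? r + b : r;
}

/* Euclidean: the remainder is always non-negative */
static int
euclid_mod(int a, int b)
{
    int     r = a % b;

    return (r < 0) ? r + (b < 0 ? -b : b) : r;
}

int
main(void)
{
    /* prints "-1 2 2": only % yields a negative remainder here */
    printf("%d %d %d\n", trunc_mod(-7, 3), floor_mod(-7, 3), euclid_mod(-7, 3));
    /* prints "1 -2 1": floored follows the divisor's sign */
    printf("%d %d %d\n", trunc_mod(7, -3), floor_mod(7, -3), euclid_mod(7, -3));
    return 0;
}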

--
Fabien.




Re: [HACKERS] postgresql.auto.conf and reload

2014-08-05 Thread Fujii Masao
On Tue, Aug 5, 2014 at 12:49 PM, Fujii Masao  wrote:
> On Mon, Aug 4, 2014 at 11:52 PM, Tom Lane  wrote:
>> Fujii Masao  writes:
>>> The patch chooses the last setting for every parameter and ignores the
>>> former settings. But I don't think that every parameter needs to be
>>> processed this way. That is, we can change the patch so that only
>>> PGC_POSTMASTER parameters are processed that way. The invalid settings
>>> in the parameters other than PGC_POSTMASTER can be checked by pg_ctl
>>> reload as they are now. Also this approach can reduce the number of
>>> comparisons needed to choose the last setting, i.e., the "n" in O(n^2)
>>> is the number of uncommented *PGC_POSTMASTER* parameters (not all
>>> parameters). Thoughts?
>>
>> I don't find that to be a particularly good idea.  In the first place,
>> it introduces extra complication and a surprising difference in the
>> behavior of different GUCs.  In the second place, I thought part of the
>> point of this patch was to suppress log messages complaining about
>> invalid values that then weren't actually used for anything.  That issue
>> exists just as much for non-POSTMASTER variables.  (IOW, "value cannot
>> be changed now" is not the only log message we're trying to suppress.)
>
> Yep, sounds reasonable. This makes me think that we could suppress such
> invalid-value messages for parameters that are not actually used, not
> only at conf file reload but also at *postmaster startup*. That's
> another story, though. Anyway, barring any objection, I will commit
> Amit's patch.

Applied the slightly-modified version!

Regards,

-- 
Fujii Masao




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Simon Riggs
On 6 August 2014 03:16, Bruce Momjian  wrote:
> On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
>> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs  wrote:
>> >
>> > On 5 August 2014 22:38, Claudio Freire  wrote:
>> > Thinking some more, it seems like this whole store-multiple-LSNs
>> > thing is too much. We can still do block-level incrementals just by
>> > using a single LSN as the reference point. We'd still need a complex
>> > file format and a complex file reconstruction program, so I think that
>> > is still "next release". We can call that INCREMENTAL BLOCK LEVEL.
>>
>> Yes, that's the approach taken by pg_rman for its block-level
>> incremental backup. Btw, I don't think that the CPU cost of scanning all
>> the relation files, added to the cost of rebuilding the backups, is worth
>> it on large instances. File-level backup would cover most of the
>
> Well, if you scan the WAL files from the previous backup, that will tell
> you which pages need an incremental backup.

That would require you to store that WAL, which is something we hope
to avoid. Plus if you did store it, you'd need to retrieve it from
long term storage, which is what we hope to avoid.

> I am thinking we need a wiki page to outline all these options.

There is a Wiki page.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-05 Thread furuyao
> >> I have improved the patch by making the following changes:
> >>
> >> 1. Since stream_stop() was redundant, the stream_stop() call at WAL
> >> file close was deleted.
> >>
> >> 2. Changed the flush-judging timing for the readability of the source
> >> code: I moved the flush check from just after a message is received
> >> to just before the feedback-message decision (the continue statement).
> >
> > Thanks for the updated version of the patch!
> >
> > While reviewing the patch, I found that HandleCopyStream() is still
> > long, which decreases the readability of the source code.
> > So I feel inclined to refactor the HandleCopyStream() more for better
> > readability. What about the attached refactoring patch?
> 
> Sorry, I forgot to attach the patch in the previous email. Attached now.

Thank you for the refactoring patch.
I did a review of the patch. 

-   break;  /* ignore the rest of 
this XLogData packet */

+   return true;/* ignore the rest of this 
XLogData packet */

The break statement at WAL file close has become a return of true,
which behaves like a continue statement. Is that intended?

The walreceiver splits this work between XLogWalRcvProcessMsg and
XLogWalRcvWrite, but isn't that division necessary here as well?

Regards,

--
Furuya Osamu




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Noah Misch
On Tue, Aug 05, 2014 at 09:32:59PM -0700, Peter Geoghegan wrote:
> On Tue, Aug 5, 2014 at 8:55 PM, Noah Misch  wrote:
> > Comparator cost affects external sorts more than it affects internal sorts.
> > When I profiled internal and external int4 sorting, btint4cmp() was 0.37% of
> > the internal sort profile and 10.26% of the external sort profile.
> 
> Did you attempt to characterize where wall time was being spent?

No.  I have the perf data files, if you'd like to see any particular report.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 8:55 PM, Noah Misch  wrote:
> Comparator cost affects external sorts more than it affects internal sorts.
> When I profiled internal and external int4 sorting, btint4cmp() was 0.37% of
> the internal sort profile and 10.26% of the external sort profile.

Did you attempt to characterize where wall time was being spent? With
a difference like that, it seems likely that it's largely down to the
fact that quicksort is cache oblivious, rather than, say, that there
were more comparisons required.

>> I must admit that this did surprise me, but then I don't grok tape
>> sort. What's particularly interesting here is that when work_mem is
>> cranked up to 512MB, which is a high setting, but still not high
>> enough to do an internal sort, the difference closes in a bit. Instead
>> of 41 runs, there are only 2. Patched now takes 16.3 seconds.
>> Meanwhile, master is somewhat improved, and consistently takes 65
>> seconds to complete the sort.
>
>> Does anyone recall hearing complaints around higher work_mem settings
>> regressing performance?
>
> Jeff Janes has mentioned it:
> http://www.postgresql.org/message-id/CAMkU=1zVD82voXw1vBG1kWcz5c2G=supgohpkm0thwmprk1...@mail.gmail.com

I knew that I'd heard that at least once. Apparently some other
database systems have external sorts that tend to be faster than
equivalent internal sorts. I'd guess that that is an artifact of their
having a substandard internal sort, though.

-- 
Peter Geoghegan




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Noah Misch
On Tue, Aug 05, 2014 at 07:32:35PM -0700, Peter Geoghegan wrote:
> select * from (select * from tags order by tag offset 1) ii;
> 
> Git master takes about 25 seconds to execute the query. Patched takes
> about 6.8 seconds. That seems very good, but this is not really new
> information.
> 
> However, with work_mem set low enough to get an external sort, the
> difference is more interesting. If I set work_mem to 10 MB, then the
> query takes about 10.7 seconds to execute with a suitably patched
> Postgres. Whereas on master, it consistently takes a full 69 seconds.
> That's the largest improvement I've seen so far, for any case.

Comparator cost affects external sorts more than it affects internal sorts.
When I profiled internal and external int4 sorting, btint4cmp() was 0.37% of
the internal sort profile and 10.26% of the external sort profile.

> I must admit that this did surprise me, but then I don't grok tape
> sort. What's particularly interesting here is that when work_mem is
> cranked up to 512MB, which is a high setting, but still not high
> enough to do an internal sort, the difference closes in a bit. Instead
> of 41 runs, there are only 2. Patched now takes 16.3 seconds.
> Meanwhile, master is somewhat improved, and consistently takes 65
> seconds to complete the sort.

> Does anyone recall hearing complaints around higher work_mem settings
> regressing performance?

Jeff Janes has mentioned it:
http://www.postgresql.org/message-id/CAMkU=1zVD82voXw1vBG1kWcz5c2G=supgohpkm0thwmprk1...@mail.gmail.com

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Fabrízio de Royes Mello
On Tue, Aug 5, 2014 at 10:55 PM, Tom Lane  wrote:

> Josh Berkus  writes:
> > BTW, while there's unlikely to be a good reason to put search_path in
> > pg.conf with appends, there are a LOT of reasons to want to be able to
> > append to it during a session.
>
> [shrug...]  You can do that today with current_setting()/set_config().
>
>
+1

With a very simple statement you can do that:

fabrizio=# SELECT current_setting('search_path');
 current_setting
-----------------
 "$user",public
(1 row)

fabrizio=# SELECT set_config('search_path',
current_setting('search_path')||', another_schema', false);
           set_config
---------------------------------
 "$user",public, another_schema
(1 row)


Or you can create a simple SQL function to concatenate your configs:

fabrizio=# CREATE FUNCTION concat_config(name TEXT, value TEXT, is_local
BOOLEAN)
fabrizio-# RETURNS text
fabrizio-# LANGUAGE sql
fabrizio-# AS $$ SELECT set_config(name, current_setting(name)||','||value,
is_local); $$;
CREATE FUNCTION
fabrizio=# SELECT concat_config('search_path', 'another_schema', false);
 concat_config
---
 "$user",public,another_schema
(1 row)

fabrizio=# SELECT concat_config('search_path', 'schema2', false);
 concat_config
---
 "$user",public,another_schema,schema2
(1 row)


Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] how to debug into InitPostgres() and InitCatalogCache()?

2014-08-05 Thread Alvaro Herrera
土卜皿 wrote:
> 2014-08-05 22:08 GMT+08:00 土卜皿 :
> 
> > hi, all
> >   I can already debug a general postgres backend using "ddd" and "select
> > pg_backend_pid();". Now I want to study the details of the system
> > catalog cache and system cache management, so I need to debug the
> > functions InitPostgres() and InitCatalogCache(), and I tried the
> > following steps:

>   For debugging initialization, including the system catalog cache, I
> added some code like:
> 
>  /* volatile so the compiler cannot optimize the wait loop away */
>  volatile bool forDebug = true;
> 
>  /* spin until a debugger attaches and sets forDebug = false */
>  while (forDebug)
>      ;
> 
> at the start of InitPostgres().

There's also the -W switch, which makes the backend wait precisely so that
you can attach a debugger.
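
For example (a sketch; the delay is given in seconds):

    postgres -W 30 -D $PGDATA

Alternatively, set the developer GUCs pre_auth_delay / post_auth_delay
directly, depending on whether you need the pause before or after client
authentication.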

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Minmax indexes

2014-08-05 Thread Alvaro Herrera
FWIW I think I haven't responded appropriately to the points raised by
Heikki.  Basically, as I see it there are three main items:

1. the revmap physical-to-logical mapping is too complex; let's use
something else.

We had revmap originally in a separate fork.  The current approach grew
out of the necessity of putting it in the main fork while ensuring that
fast access to individual pages is possible.  There are of course many
ways to skin this cat; Heikki's proposal is to have it always occupy the
first few physical pages, rather than require a logical-to-physical
mapping table.  To implement this he proposes to move other pages out of
the way as the index grows.  I don't really have much love for this
idea.  We can change how this is implemented later in the cycle, if we
find that a different approach is better than my proposal.  I don't want
to spend endless time meddling with this (and I definitely don't want to
have this delay the eventual commit of the patch.)


2. vacuuming is not optimal

Right now, to eliminate garbage index tuples we need to first scan
the revmap to figure out which tuples are unreferenced.  There is a
concern that if there's an excess of dead tuples, the index becomes
unvacuumable because palloc() fails due to request size.  This is
largely theoretical because in order for this to happen there need to be
several million dead index tuples.  As a minimal fix to alleviate this
problem without requiring a complete rework of vacuuming, we can cap
that palloc request to maintenance_work_mem and remove dead tuples in a
loop instead of trying to remove all of them in a single pass.
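
In outline, the capped removal loop would look something like this (a
sketch only, with illustrative names and a simplified stand-in for the
dead-tuple list):

#include <stdio.h>

#define MAINT_WORK_MEM_BYTES    (4 * 1024 * 1024)   /* stand-in for maintenance_work_mem */

typedef unsigned long DeadTupleRef;     /* stand-in for a dead tuple reference */

static void
remove_dead_tuples(long ndead)
{
    long    per_pass = MAINT_WORK_MEM_BYTES / sizeof(DeadTupleRef);
    long    done;

    for (done = 0; done < ndead; done += per_pass)
    {
        long    n = (ndead - done < per_pass) ? ndead - done : per_pass;

        /*
         * A real implementation would palloc(n * sizeof(DeadTupleRef))
         * here -- capped at the budget -- fill it by scanning the revmap,
         * and delete that batch, repeating until no garbage remains.
         */
        printf("pass removes %ld dead tuples\n", n);
    }
}

int
main(void)
{
    remove_dead_tuples(5000000L);   /* "several million" dead tuples */
    return 0;
}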

Another thing proposed was to store range numbers (or just heap page
numbers) within each index tuple.  I felt that this would add more bloat
unnecessarily.  However, there is some padding space in the index tuple
that we might be able to use to store range numbers.  I will think some
more about how we can use this to simplify vacuuming.


3. avoid MMTuple as it is just unnecessary extra complexity.

The main thing that MMTuple adds is not the fact that we save 2 bytes
by storing BlockNumber as is instead of within a TID field.  Instead,
it's that we can construct and deconstruct using our own design, which
means we can use however many Datum entries we want and however many
"null" flags.  In normal heap and index tuples, there are always the
same number of datum/nulls.  In minmax, the number of nulls is twice the
number of indexed columns; the number of datum values is determined by
how many datum values are stored per opclass ("sortable" opclasses
store 2 columns, but geometry would store only one).  If we were to use
regular IndexTuples, we would lose that ... and I have no idea how it
would work.  In other words, MMTuples look fine to me.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
I've looked at another (admittedly sympathetic) dataset that is
publicly available: the flickr "tags" dataset [1]. I end up with a
single table of tags; it's a large enough table, at 388 MB, but the
tuples are not very wide. There are 7,656,031 tags/tuples.

Predictably enough, this query is very fast when an internal sort is
used on a patched Postgres:

select * from (select * from tags order by tag offset 1) ii;

Git master takes about 25 seconds to execute the query. Patched takes
about 6.8 seconds. That seems very good, but this is not really new
information.

However, with work_mem set low enough to get an external sort, the
difference is more interesting. If I set work_mem to 10 MB, then the
query takes about 10.7 seconds to execute with a suitably patched
Postgres. Whereas on master, it consistently takes a full 69 seconds.
That's the largest improvement I've seen so far, for any case.

I must admit that this did surprise me, but then I don't grok tape
sort. What's particularly interesting here is that when work_mem is
cranked up to 512MB, which is a high setting, but still not high
enough to do an internal sort, the difference closes in a bit. Instead
of 41 runs, there are only 2. Patched now takes 16.3 seconds.
Meanwhile, master is somewhat improved, and consistently takes 65
seconds to complete the sort.

This probably has something to do with CPU cache effects. I believe
that all world class external sorting algorithms are cache sensitive.
I'm not sure what the outcome would have been had there not been a
huge amount of memory available for the OS cache to use, which there
was. I think there is probably something to learn about how to improve
tape sort here.

Does anyone recall hearing complaints around higher work_mem settings
regressing performance?

[1] http://www.isi.edu/integration/people/lerman/load.html?src=http://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html
(bottom of page)
-- 
Peter Geoghegan




Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Bruce Momjian
On Tue, Aug  5, 2014 at 07:22:13PM -0600, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Tue, Aug  5, 2014 at 12:52:51PM -0700, Josh Berkus wrote:
> >> However, other than shared_preload_libraries, I can't think of a GUC
> >> which would benefit from this.  Can you?
> 
> > search_path?
> 
> Given the security implications of search_path, assembling its value
> from independent parts sounds like a darn bad idea from here.

I can imagine someone wanting to prefix or suffix search_path for a user
via ALTER USER.
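
Today that means respecifying the whole list, e.g. (role and schema names
made up):

    ALTER USER report_user SET search_path = reporting, "$user", public;

An append/prepend operator would avoid having to know the existing value
first.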

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Bruce Momjian
On Tue, Aug  5, 2014 at 06:36:01PM -0700, Josh Berkus wrote:
> On 08/05/2014 06:25 PM, Alvaro Herrera wrote:
> 
> > I'm not sold on += as the syntax to use.  It just needs to be something
> > different from =. 
> 
> An alternative is to use "+" in the value string:
> 
> shared_preload_libraries = '+auto_explain'
> 
> ... not sure that's better.
> 
> The alternative is to add an "append" keyword to everything, which
> strikes me as yucky.

I think you need _prefix_ and _suffix_ operators.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Bruce Momjian
On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs  wrote:
> >
> > On 5 August 2014 22:38, Claudio Freire  wrote:
> > Thinking some more, there seems like this whole store-multiple-LSNs
> > thing is too much. We can still do block-level incrementals just by
> > using a single LSN as the reference point. We'd still need a complex
> > file format and a complex file reconstruction program, so I think that
> > is still "next release". We can call that INCREMENTAL BLOCK LEVEL.
> 
> Yes, that's the approach taken by pg_rman for its block-level
> incremental backup. Btw, I don't think that the CPU cost of scanning all
> the relation files, added to the cost of rebuilding the backups, is worth
> it on large instances. File-level backup would cover most of the

Well, if you scan the WAL files from the previous backup, that will tell
you which pages need an incremental backup.

I am thinking we need a wiki page to outline all these options.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Tom Lane
Josh Berkus  writes:
> BTW, while there's unlikely to be a good reason to put search_path in
> pg.conf with appends, there are a LOT of reasons to want to be able to
> append to it during a session.

[shrug...]  You can do that today with current_setting()/set_config().

regards, tom lane




Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Josh Berkus
On 08/05/2014 06:25 PM, Alvaro Herrera wrote:

> I'm not sold on += as the syntax to use.  It just needs to be something
> different from =. 

An alternative is to use "+" in the value string:

shared_preload_libraries = '+auto_explain'

... not sure that's better.

The alternative is to add an "append" keyword to everything, which
strikes me as yucky.

> But I am thinking that whatever it is, it would be
> required.  

Yes, absolutely.

> The idea behing GUC_LIST_ADDITIVE is that += is only supported for
> variables that have that flag set.  If you tried to use += with other
> variables, an error would be raised.

Yes.

BTW, while there's unlikely to be a good reason to put search_path in
pg.conf with appends, there are a LOT of reasons to want to be able to
append to it during a session.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Claudio Freire
On Tue, Aug 5, 2014 at 9:04 PM, Simon Riggs  wrote:
> On 5 August 2014 22:38, Claudio Freire  wrote:
>
>>> * When we take an incremental backup we need the WAL from the backup
>>> start LSN through to the backup stop LSN. We do not need the WAL
>>> between the last backup stop LSN and the new incremental start LSN.
>>> That is a huge amount of WAL in many cases and we'd like to avoid
>>> that, I would imagine. (So the space savings aren't just the delta
>>> from the main data files, we should also look at WAL savings).
>>
>> Yes, probably something along the lines of removing redundant FPW and
>> stuff like that.
>
> Not what I mean at all, sorry for the confusion.
>
> Each backup has a start LSN and a stop LSN. You need all the WAL
> between those two points (-X option)
>
> But if you have an incremental backup (b2), it depends upon an earlier
> backup (b1).
>
> You don't need the WAL between b1.stop_lsn and b2.start_lsn.
>
> In typical cases, start to stop will be a few hours or less, whereas
> we'd be doing backups at most daily. Which would mean we'd only need
> to store at most 10% of the WAL files because we don't need WAL
> between backups.

I was assuming you wouldn't store that WAL. You might not even have it.




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Claudio Freire
On Tue, Aug 5, 2014 at 9:17 PM, Michael Paquier
 wrote:
> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs  wrote:
>>
>> On 5 August 2014 22:38, Claudio Freire  wrote:
>> Thinking some more, it seems like this whole store-multiple-LSNs
>> thing is too much. We can still do block-level incrementals just by
>> using a single LSN as the reference point. We'd still need a complex
>> file format and a complex file reconstruction program, so I think that
>> is still "next release". We can call that INCREMENTAL BLOCK LEVEL.
>
> Yes, that's the approach taken by pg_rman for its block-level
> incremental backup. Btw, I don't think that the CPU cost of scanning all
> the relation files, added to the cost of rebuilding the backups, is worth
> it on large instances. File-level backup would cover most of the
> use cases that people face, and simplify the footprint on core code. With
> a single LSN as the reference position, of course, to determine if a file
> needs to be backed up: it does if it has at least one block that has
> been modified with an LSN newer than the reference point.


It's the finding of that block that begs optimizing IMO.




Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Alvaro Herrera
David G Johnston wrote:
> Alvaro Herrera-9 wrote

> > I think this merits a new GUC flag, say GUC_LIST_ADDITIVE.
> 
> > Would that allow, without any special syntax, multiple declarations of,
> > say, shared_preload_libraries to have their values appended, instead of
> > having the most (or least, I forget which) recent one overwrite (or be
> > ignored by) the existing value?
> 
> The idea of requiring "+=" instead of just "=" is not particularly
> appealing...

I'm not sold on += as the syntax to use.  It just needs to be something
different from =.  But I am thinking that whatever it is, it would be
required.  Otherwise, for example if you have search_path=foo in
postgresql.conf and search_path=bar in auto.conf via ALTER SYSTEM, you
would end up with search_path=foo,bar instead of just bar which I think
would be the expected outcome.  Of course, we would offer a new ALTER
SYSTEM option to indicate that you want to have auto.conf use += instead
of =.

The idea behind GUC_LIST_ADDITIVE is that += is only supported for
variables that have that flag set.  If you tried to use += with other
variables, an error would be raised.
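
In other words, something like this in postgresql.conf (hypothetical
syntax; none of this is implemented):

    shared_preload_libraries = 'pg_stat_statements'    # marked GUC_LIST_ADDITIVE
    shared_preload_libraries += 'auto_explain'         # ok: appends to the list
    DateStyle += 'ISO'                                 # error: flag not set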

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Tom Lane
Bruce Momjian  writes:
> On Tue, Aug  5, 2014 at 12:52:51PM -0700, Josh Berkus wrote:
>> However, other than shared_preload_libraries, I can't think of a GUC
>> which would benefit from this.  Can you?

> search_path?

Given the security implications of search_path, assembling its value
from independent parts sounds like a darn bad idea from here.

regards, tom lane




Re: [HACKERS] how to debug into InitPostgres() and InitCatalogCache()?

2014-08-05 Thread 土卜皿
2014-08-05 22:08 GMT+08:00 土卜皿 :

> hi, all
> I can already debug a general postgres backend using "ddd" and "select
> pg_backend_pid();". Now I want to study the details of the system
> catalog cache and system cache management, so I need to debug the
> functions InitPostgres() and InitCatalogCache(), and I tried the
> following steps:
>
> [refer: How to debug postgresql during initdb ]
> 
>
> (1) in first terminal:
>
> ddd initdb
>
> and set a breakpoint on the next executable line after “PG_CMD_OPEN” in
> function bootstrap_template1(),
>
> Then within gdb, use “r $myfolder/mydb”,
>
> and found the cursor stop at the breakpoint
>
> (2) in the second terminal:
>
> ps aux | grep "bin\/postgres"
>
> after getting the postgres server pid, start a new ddd without args, and
> within gdb, use "attach postgres.pid"
>
> and set a breakpoint at InitCatalogCache()
>
> (3) back in the first ddd window, hit the "cont" button. I hoped it would
> stop at the second ddd's breakpoint, but nothing happened.
>
> I must have made a mistake in some step, or my understanding is completely
> wrong. Please give me some advice, thanks a lot.
>


  I am sorry that this is a simple question.  I found that the above steps
make postgres reach boot_yyparse() in BootstrapModeMain(void), which
means postgres has already finished the initialization, including
InitCatalogCache(), and it was waiting for a query or other user command.

  For debugging initialization, including the system catalog cache, I added
some code like:

 /* volatile so the compiler cannot optimize the wait loop away */
 volatile bool forDebug = true;

 /* spin until a debugger attaches and sets forDebug = false */
 while (forDebug)
     ;

at the start of InitPostgres().



Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Michael Paquier
On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs  wrote:
>
> On 5 August 2014 22:38, Claudio Freire  wrote:
> Thinking some more, it seems like this whole store-multiple-LSNs
> thing is too much. We can still do block-level incrementals just by
> using a single LSN as the reference point. We'd still need a complex
> file format and a complex file reconstruction program, so I think that
> is still "next release". We can call that INCREMENTAL BLOCK LEVEL.

Yes, that's the approach taken by pg_rman for its block-level
incremental backup. Btw, I don't think that the CPU cost of scanning all
the relation files, added to the cost of rebuilding the backups, is worth
it on large instances. File-level backup would cover most of the
use cases that people face, and simplify the footprint on core code. With
a single LSN as the reference position, of course, to determine if a file
needs to be backed up: it does if it has at least one block that has
been modified with an LSN newer than the reference point.

Regards,
-- 
Michael




[HACKERS] A worst case for qsort

2014-08-05 Thread Peter Geoghegan
There was some informal discussion of worst case performance of my
text sort support patch at the most recent developer's meeting in
Ottawa. There is no need to repeat the details here, which aren't
terribly interesting - I made a point about putting worst case
performance in context. I also made an argument around quicksort,
which despite being a widely used sorting algorithm is known to go
quadratic in the worst case, a really bad outcome. I've heard stories
about GUIs that hit the worst case and froze when a user clicked on a
column header to sort its contents. This kind of thing occurred until
surprisingly recently, and was one reason why Postgres eventually
adopted its own qsort() rather than using the OS-supplied
implementation. A lot of the measures proposed by Sedgewick in
particular made these outcomes occur very infrequently in practice.

Our quicksort routine is almost a verbatim copy of the implementation
that appeared in NetBSD, which is itself almost a verbatim copy of the
implementation that appears in the 1993 paper by J. L. Bentley and M.
D. McIlroy, ‘Engineering a sort function’. McIlroy
(with some input from Bentley) subsequently described a way of making
many high quality implementations perform very badly, including his
own:

http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

I don't want to belabor the point here. I do not mean to imply that
everything I've proposed to date wrt sort support for text is
self-evidently reasonable, in the same way that our use of Bentley and
McIlroy's qsort() seems reasonable on balance. I expect some
interesting discussion there.

Anyway, the paper says:

"The adversarial method works for almost any polymorphic program
recognizable as quicksort. The subject quicksort may copy values at
will, or work with lists rather than arrays. It may even pick the
pivot at random. The quicksort will be vulnerable provided only that
it satisfies some mild assumptions that are met by every
implementation I have seen".
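
The heart of the adversary is tiny. Here is a condensed, runnable sketch
(adapted from the cited paper; the driver sorts indexes rather than
values so the comparator can solidify "gas" values on demand):

#include <stdio.h>
#include <stdlib.h>

#define N 10000

static int  val[N];         /* solidified values; gas == N */
static int  ncmp;           /* comparisons performed */
static int  nsolid;         /* number of solid values so far */
static int  candidate;      /* the likely pivot */
static const int gas = N;   /* sentinel: larger than any solid value */

static int
cmp(const void *px, const void *py)
{
    int     x = *(const int *) px;
    int     y = *(const int *) py;

    ncmp++;
    if (val[x] == gas && val[y] == gas)
    {
        /* freeze the probable pivot, so it compares low from here on */
        if (x == candidate)
            val[x] = nsolid++;
        else
            val[y] = nsolid++;
    }
    if (val[x] == gas)
        candidate = x;
    else if (val[y] == gas)
        candidate = y;
    return val[x] - val[y];
}

int
main(void)
{
    int     ptr[N];
    int     i;

    for (i = 0; i < N; i++)
    {
        ptr[i] = i;
        val[i] = gas;
    }
    qsort(ptr, N, sizeof(int), cmp);
    printf("n = %d, comparisons = %d\n", N, ncmp);
    return 0;
}

Against a vulnerable quicksort the comparison count comes out quadratic; a
libc qsort() that is really a mergesort underneath stays at n log n, since
the adversary targets quicksorts specifically.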

IMHO, while worst case performance is a very useful tool for analyzing
algorithms (particularly their worst case time complexity), a worst
case should be put in its practical context. For example, if we had
reason to be concerned about *adversarial* inputs, I think that there
is a good chance that our qsort() actually would be problematic to the
point of driving us to prefer some generally slower alternative.

-- 
Peter Geoghegan




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Simon Riggs
On 5 August 2014 22:38, Claudio Freire  wrote:

>> * When we take an incremental backup we need the WAL from the backup
>> start LSN through to the backup stop LSN. We do not need the WAL
>> between the last backup stop LSN and the new incremental start LSN.
>> That is a huge amount of WAL in many cases and we'd like to avoid
>> that, I would imagine. (So the space savings aren't just the delta
>> from the main data files, we should also look at WAL savings).
>
> Yes, probably something along the lines of removing redundant FPW and
> stuff like that.

Not what I mean at all, sorry for the confusion.

Each backup has a start LSN and a stop LSN. You need all the WAL
between those two points (-X option)

But if you have an incremental backup (b2), it depends upon an earlier
backup (b1).

You don't need the WAL between b1.stop_lsn and b2.start_lsn.

In typical cases, start to stop will be a few hours or less, whereas
we'd be doing backups at most daily. Which would mean we'd only need
to store at most 10% of the WAL files because we don't need WAL
between backups.

>> * For me, file based incremental is a useful and robust feature.
>> Block-level incremental is possible, but requires either significant
>> persistent metadata (1 MB per GB file) or access to the original
>> backup. One important objective here is to make sure we do NOT have to
>> re-read the last backup when taking the next backup; this helps us to
>> optimize the storage costs for backups. Plus, block-level recovery
>> requires us to have a program that correctly re-writes data into the
>> correct locations in a file, which seems likely to be a slow and bug
>> ridden process to me. Nice, safe, solid file-level incremental backup
>> first please. Fancy, bug prone, block-level stuff much later.
>
> Ok. You could do incremental first without any kind of optimization,

Yes, that is what makes sense to me. Fast, simple, robust and most of
the benefit.

We should call this INCREMENTAL FILE LEVEL.

> then file-level optimization by keeping a file-level LSN range, and
> then extend that to block-segment-level LSN ranges. That sounds like a
> plan to me.

Thinking some more, it seems like this whole store-multiple-LSNs
thing is too much. We can still do block-level incrementals just by
using a single LSN as the reference point. We'd still need a complex
file format and a complex file reconstruction program, so I think that
is still "next release". We can call that INCREMENTAL BLOCK LEVEL.

> But I don't see how you'd do the unoptimized one without reading the
> previous backup to compare deltas. Remember, checksums are deemed not
> trustworthy, not just by me, so that approach (which was the original
> proposition) doesn't work.

Every incremental backup refers to an earlier backup as a reference
point, which may then refer to an earlier one, in a chain.

Each backup has a single LSN associated with it, as stored in the
backup_label. (So we don't need the profile stage now, AFAICS)

To decide whether we need to re-copy a file, we read it until we find a
block with a later LSN. If we read the whole file without finding a later
LSN, then we don't need to re-copy it. That means we read each file twice,
which is slower, but the file is at most 1GB in size, which we can assume
will be mostly in memory for the second read.

As Marco says, that can be optimized using filesystem timestamps instead.
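
A self-contained sketch of that read-until-newer-LSN test (BLCKSZ and the
pd_lsn layout match Postgres's 8kB page header, but the function itself is
illustrative, not from any patch):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192

typedef uint64_t XLogRecPtr;

/* pd_lsn occupies the first 8 bytes of every page header */
static XLogRecPtr
page_lsn(const char *page)
{
    uint32_t    xlogid,
                xrecoff;

    memcpy(&xlogid, page, 4);
    memcpy(&xrecoff, page + 4, 4);
    return ((XLogRecPtr) xlogid << 32) | xrecoff;
}

static bool
file_needs_backup(FILE *fp, XLogRecPtr ref_lsn)
{
    char        page[BLCKSZ];

    while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
    {
        if (page_lsn(page) > ref_lsn)
            return true;    /* a newer block: must re-copy */
    }
    return false;           /* whole file predates the reference LSN */
}

int
main(int argc, char **argv)
{
    FILE       *fp;

    if (argc < 2 || (fp = fopen(argv[1], "rb")) == NULL)
        return 1;
    printf("%s\n", file_needs_backup(fp, 0) ? "re-copy" : "skip");
    fclose(fp);
    return 0;
}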

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Minmax indexes

2014-08-05 Thread Josh Berkus
On 08/05/2014 04:41 PM, Alvaro Herrera wrote:
> I have chosen to keep the name "minmax", even if the opclasses now let
> one implement completely different things on top of it such as geometry
> bounding boxes and bloom filters (aka bitmap indexes).  I don't see a
> need for a rename: essentially, in PR we can just say "we have these
> neat minmax indexes that other databases also have, but instead of just
> being used for integer data, they can also be used for geometry, GIS and
> bitmap indexes, so as always we're more powerful than everyone else when
> implementing new database features".

Plus we haven't come up with a better name ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread David G Johnston
Alvaro Herrera-9 wrote
> Bruce Momjian wrote:
>> On Tue, Aug  5, 2014 at 12:52:51PM -0700, Josh Berkus wrote:
>> > On 08/05/2014 11:12 AM, Jerry Sievers wrote:
>> > > shared_preload_libraries += auto_explain
>> > > 
>> > > Would do the trick.
>> > > 
>> > > I've  never heard this mentioned before so presume not many have
>> > > contemplated this.
>> > 
>> > It's been discussed.
>> > 
>> > However, other than shared_preload_libraries, I can't think of a GUC
>> > which would benefit from this.  Can you?
>> 
>> search_path?
> 
> Almost anything that's marked GUC_LIST_INPUT.  A quick grep shows some
> variables for which it doesn't make sense, such as DateStyle, and others
> for which it would, such as temp_tablespaces, synchronous_standby_names,
> listen_addresses, and the various preload_libraries params.  Not sure
> about log_destination.
> 
> I think this merits a new GUC flag, say GUC_LIST_ADDITIVE.

Would that allow, without any special syntax, multiple declarations of,
say, shared_preload_libraries to have their values appended, instead of
having the most (or least, I forget which) recent one overwrite (or be
ignored by) the existing value?

The idea of requiring "+=" instead of just "=" is not particularly
appealing...

David J.







Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-08-05 Thread Bruce Momjian
On Thu, Jul 10, 2014 at 06:38:26PM -0400, Bruce Momjian wrote:
> On Thu, Jul 10, 2014 at 06:17:14PM -0400, Bruce Momjian wrote:
> > Well, we are going to need to call internal C functions, often bypassing
> > their typical call sites and the assumptions about locking, etc.  Perhaps
> > this could be done from a plpgsql function.  We could add and drop a
> > dummy column to force TOAST table creation --- we would then only need a
> > way to detect if a table _needs_ a TOAST table, which was skipped in
> > binary upgrade mode previously.
> > 
> > That might be a minimalistic approach.
> 
> I have thought some more on this.  I thought I would need to open
> pg_class in C and do complex backend stuff, but I now realize I can do
> it from libpq, and just call ALTER TABLE and I think that always
> auto-checks if a TOAST table is needed.  All I have to do is query
> pg_class from libpq, then construct ALTER TABLE commands for each item,
> and it will optionally create the TOAST table if needed.  I just have to
> use a no-op ALTER TABLE command, like SET STATISTICS.

Attached is a completed patch which handles oid conflicts in pg_class
and pg_type for TOAST tables that were not needed in the old cluster but
are needed in the new one.  I was able to recreate a failure case and
this fixed it.

The patch needs to be backpatched because I am getting more-frequent bug
reports from users using pg_upgrade to leave the now-end-of-life'ed 8.4.
There is no good work-around for pg_upgrade failures without this
fix, but at least pg_upgrade throws an error.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/contrib/pg_upgrade/dump.c b/contrib/pg_upgrade/dump.c
new file mode 100644
index b112f3a..642dd5d
*** a/contrib/pg_upgrade/dump.c
--- b/contrib/pg_upgrade/dump.c
***
*** 12,17 
--- 12,19 
  #include "pg_upgrade.h"
  
  #include 
+ #include "catalog/binary_upgrade.h"
+ 
  
  void
  generate_old_dump(void)
*** generate_old_dump(void)
*** 67,69 
--- 69,139 
  	end_progress_output();
  	check_ok();
  }
+ 
+ 
+ /*
+  * It is possible for there to be a mismatch in the need for TOAST tables
+  * between the old and new servers, e.g. some pre-9.1 tables didn't need
+  * TOAST tables but will need them in 9.1+.  (There are also opposite cases,
+  * but these are handled by setting binary_upgrade_next_toast_pg_class_oid.)
+  *
+  * We can't allow the TOAST table to be created by pg_dump with a
+  * pg_dump-assigned oid because it might conflict with a later table that
+  * uses that oid, causing a "file exists" error for pg_class conflicts, and
+  * a "duplicate oid" error for pg_type conflicts.  (TOAST tables need pg_type
+  * entries.)
+  *
+  * Therefore, a backend in binary-upgrade mode will not create a TOAST
+  * table unless an OID is passed in via pg_upgrade_support functions.
+  * This function is called after the restore and uses ALTER TABLE to
+  * auto-create any needed TOAST tables which will not conflict with
+  * restored oids.
+  */
+ void
+ optionally_create_toast_tables(void)
+ {
+ 	int			dbnum;
+ 
+ 	prep_status("Creating newly-required TOAST tables");
+ 
+ 	for (dbnum = 0; dbnum < new_cluster.dbarr.ndbs; dbnum++)
+ 	{
+ 		PGresult   *res;
+ 		int			ntups;
+ 		int			rowno;
+ 		int			i_nspname,
+ 	i_relname;
+ 		DbInfo	   *active_db = &new_cluster.dbarr.dbs[dbnum];
+ 		PGconn	   *conn = connectToServer(&new_cluster, active_db->db_name);
+ 
+ 		res = executeQueryOrDie(conn,
+ "SELECT n.nspname, c.relname "
+ "FROM	pg_catalog.pg_class c, "
+ "		pg_catalog.pg_namespace n "
+ "WHERE	c.relnamespace = n.oid AND "
+ 			  "		n.nspname NOT IN ('pg_catalog', 'information_schema') AND "
+ 			  	"c.relkind IN ('r', 'm') AND "
+ "c.reltoastrelid = 0");
+ 
+ 		ntups = PQntuples(res);
+ 		i_nspname = PQfnumber(res, "nspname");
+ 		i_relname = PQfnumber(res, "relname");
+ 		for (rowno = 0; rowno < ntups; rowno++)
+ 		{
+ 			/* enable auto-oid-numbered TOAST creation if needed */
+ 			PQclear(executeQueryOrDie(conn, "SELECT binary_upgrade.set_next_toast_pg_class_oid('%d'::pg_catalog.oid);",
+ 	OPTIONALLY_CREATE_TOAST_OID));
+ 
+ 			/* dummy command that also triggers check for required TOAST table */
+ 			PQclear(executeQueryOrDie(conn, "ALTER TABLE %s.%s RESET (binary_upgrade_dummy_option);",
+ 	quote_identifier(PQgetvalue(res, rowno, i_nspname)),
+ 	quote_identifier(PQgetvalue(res, rowno, i_relname))));
+ 		}
+ 
+ 		PQclear(res);
+ 
+ 		PQfinish(conn);
+ 	}
+ 
+ 	check_ok();
+ }
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index 3e97b66..49ea873
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*** create_new_objects(void)
*** 363,368 
--- 363,370 
  	if (GET_MAJOR_VERSION(old_cluster.major_version) < 903)
  		set_froz

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 12:33 PM, Robert Haas  wrote:
> I'm also not sure it won't come up again.  There are certainly other
> text-like datatypes out there that might want to optimize sorts; e.g.
> citext.

Fair enough. Actually, come to think of it I find BpChar/character(n)
a far more likely candidate. I've personally never used the SQL
standard character(n) type, which is why I didn't think of it until
now. TPC-H does make use of character(n) though, and that might be a
good reason in and of itself to care about its sorting performance.

-- 
Peter Geoghegan




Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-05 Thread Jeff Janes
On Tue, Aug 5, 2014 at 9:35 AM, Robert Haas  wrote:

> On Sat, Aug 2, 2014 at 9:15 AM, Daniele Varrazzo
>  wrote:
> > I'd definitely replace /arg/argument/. Furthermore I'd avoid the form
> > "argument 1: something is wrong": the string is likely to end up in
> > sentences with other colons so I'd rather use "something is wrong at
> > argument 1".
> >
> > Is the patch attached better?
>
> Looks OK to me.  I thought someone else might comment, but since no
> one has, committed.
>
> (As a note for the future, you forgot to update the regression test
> outputs; I took care of that before committing.)
>

I think you missed one of the regression tests; see attached.

Thanks,

Jeff


regression.diffs
Description: Binary data



Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Tue, Aug  5, 2014 at 12:52:51PM -0700, Josh Berkus wrote:
> > On 08/05/2014 11:12 AM, Jerry Sievers wrote:
> > > shared_preload_libraries += auto_explain
> > > 
> > > Would do the trick.
> > > 
> > > I've  never heard this mentioned before so presume not many have
> > > contemplated this.
> > 
> > It's been discussed.
> > 
> > However, other than shared_preload_libraries, I can't think of a GUC
> > which would benefit from this.  Can you?
> 
> search_path?

Almost anything that's marked GUC_LIST_INPUT.  A quick grep shows some
variables for which it doesn't make sense, such as DateStyle, and others
for which it would, such as temp_tablespaces, synchronous_standby_names,
listen_addresses, and the various preload_libraries params.  Not sure
about log_destination.

I think this merits a new GUC flag, say GUC_LIST_ADDITIVE.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-05 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Aug 4, 2014 at 5:20 AM, Fabien COELHO  wrote:
> >> This patch is pretty trivial.
> > Another slightly less trivial but more useful version.
> >
> > The issue is that there are 3 definitions of modulo, two of which are fine
> > (Knuth floored division and Euclidean), and the last one much less useful.
> > Alas, C (%) & SQL (MOD) chose the bad definition :-( I really need one of
> > the other two. The attached patch adds all versions, with "%" and "mod"
> > consistent with their unfortunate C and SQL counterparts, and "fmod" and
> > "emod" the sane ones.
> 
> Three different modulo operators seems like a lot for a language that
> doesn't even have a real expression syntax, but I'll yield to whatever
> the consensus is on this one.

I wonder if it would be necessary to offer the division operator
semantics corresponding to whatever additional modulo operator we choose
to offer.  That is, if we add emod, do we need "ediv" as well?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Claudio Freire
On Tue, Aug 5, 2014 at 3:23 PM, Simon Riggs  wrote:
> On 4 August 2014 19:30, Claudio Freire  wrote:
>> On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
>>  wrote:
>>>I really like the proposal of working on a block level incremental
>>> backup feature and the idea of considering LSN. However, I'd suggest
>>> to see block level as a second step and a goal to keep in mind while
>>> working on the first step. I believe that file-level incremental
>>> backup will bring a lot of benefits to our community and users anyway.
>>
>> Thing is, I don't see how the LSN method is that much harder than an
>> on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.
>>
>> Keeping a last-updated-LSN for each segment (or group of blocks) is
>> just as easy as keeping a bitmap, and far more flexible and robust.
>>
>> The complexity and cost of safely keeping the map up-to-date is what's
>> in question here, but as was pointed before, there's no really safe
>> alternative. Nor modification times nor checksums (nor in-memory
>> bitmaps IMV) are really safe enough for backups, so you really want to
>> use something like the LSN. It's extra work, but opens up a world of
>> possibilities.
>
> OK, some comments on all of this.
>
> * Wikipedia thinks the style of backup envisaged should be called 
> "Incremental"
> https://en.wikipedia.org/wiki/Differential_backup
>
> * Base backups are worthless without WAL right up to the *last* LSN
> seen during the backup, which is why pg_stop_backup() returns an LSN.
> This is the LSN that is the effective viewpoint of the whole base
> backup. So if we wish to get all changes since the last backup, we
> must re-quote this LSN. (Or put another way - file level LSNs don't
> make sense - we just need one LSN for the whole backup).

File-level LSNs are an optimization. When you want to backup all files
modified since the last base or incremental backup (yes, you need the
previous backup label at least), you check the file-level LSN range.
That tells you which "changesets" touched that file, so you know
whether to process it or not.

Block-level LSNs (or, rather, block-segment-level) are just a
refinement of that.
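
As a rough sketch of that check (all names here are invented for
illustration, assuming a per-file LSN range is tracked somewhere):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* simplified stand-in for the real type */

typedef struct FileLsnRange
{
	XLogRecPtr	min_lsn;		/* oldest change recorded for this file */
	XLogRecPtr	max_lsn;		/* newest change recorded for this file */
} FileLsnRange;

/*
 * The file can be skipped only if nothing newer than the previous
 * backup's stop LSN has touched it.
 */
static bool
file_needs_backup(const FileLsnRange *range, XLogRecPtr prev_stop_lsn)
{
	return range->max_lsn > prev_stop_lsn;
}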

> * When we take an incremental backup we need the WAL from the backup
> start LSN through to the backup stop LSN. We do not need the WAL
> between the last backup stop LSN and the new incremental start LSN.
> That is a huge amount of WAL in many cases and we'd like to avoid
> that, I would imagine. (So the space savings aren't just the delta
> from the main data files, we should also look at WAL savings).

Yes, probably something along the lines of removing redundant FPW and
stuff like that.

> * For me, file based incremental is a useful and robust feature.
> Block-level incremental is possible, but requires either significant
> persistent metadata (1 MB per GB file) or access to the original
> backup. One important objective here is to make sure we do NOT have to
> re-read the last backup when taking the next backup; this helps us to
> optimize the storage costs for backups. Plus, block-level recovery
> requires us to have a program that correctly re-writes data into the
> correct locations in a file, which seems likely to be a slow and bug
> ridden process to me. Nice, safe, solid file-level incremental backup
> first please. Fancy, bug prone, block-level stuff much later.

Ok. You could do incremental first without any kind of optimization,
then file-level optimization by keeping a file-level LSN range, and
then extend that to block-segment-level LSN ranges. That sounds like a
plan to me.

But, I don't see how you'd do the one without optimization without
reading the previous backup for comparing deltas. Remember checksums
are deemed not trustworthy, not just by me, so that (which was the
original proposition) doesn't work.

> * If we don't want/have file checksums, then we don't need a profile
> file and using just the LSN seems fine. I don't think we should
> specify that manually - the correct LSN is written to the backup_label
> file in a base backup and we should read it back from there.

Agreed


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Bruce Momjian
On Tue, Aug  5, 2014 at 12:52:51PM -0700, Josh Berkus wrote:
> On 08/05/2014 11:12 AM, Jerry Sievers wrote:
> > shared_preload_libraries += auto_explain
> > 
> > Would do the trick.
> > 
> > I've never heard this mentioned before, so I presume not many have
> > contemplated this.
> 
> It's been discussed.
> 
> However, other than shared_preload_libraries, I can't think of a GUC
> which would benefit from this.  Can you?

search_path?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 12:33 PM, Robert Haas  wrote:
> Per your other email, here's the patch again; hopefully I've got the
> right stuff in the file this time.

Your patch looks fine to me. I recommend committing patch 1 with these
additions. I can't think of any reason to rehash the 2012 discussion.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Josh Berkus
On 08/05/2014 11:12 AM, Jerry Sievers wrote:
> shared_preload_libraries += auto_explain
> 
> Would do the trick.
> 
> I've never heard this mentioned before, so I presume not many have
> contemplated this.

It's been discussed.

However, other than shared_preload_libraries, I can't think of a GUC
which would benefit from this.  Can you?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] missing PG_RETURN_UINT16

2014-08-05 Thread Robert Haas
On Mon, Aug 4, 2014 at 11:35 AM, Manuel Kniep  wrote:
> I’m missing the PG_RETURN_UINT16 macro in fmgr.h
> Since we already have the PG_GETARG_UINT16 macro
> I guess it makes sense to have it.
>
> here is the trivial patch for it.

I see no reason not to add this.  Anyone else want to object?
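
For reference, by analogy with the existing PG_RETURN_INT16 in fmgr.h, the
macro presumably amounts to the following one-liner (a sketch; the actual
patch may differ in details):

#define PG_RETURN_UINT16(x)  return UInt16GetDatum(x)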

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL regression test suite

2014-08-05 Thread Robert Haas
On Mon, Aug 4, 2014 at 10:38 AM, Heikki Linnakangas
 wrote:
> Now that we use TAP for testing client tools, I think we can use that to
> test various SSL options too. I came up with the attached. Comments?
>
> It currently assumes that the client's and the server's hostnames are
> "postgres-client.test" and "postgres-server.test", respectively. That makes
> it a bit tricky to run on a single system. The README includes instructions;
> basically you need to set up an additional loopback device, and add entries
> to /etc/hosts for that.

That seems so onerous that I think few people will do it, and not
regularly, resulting in the tests breaking and nobody noticing.
Reconfiguring the loopback interface like that requires root
privilege, and won't survive a reboot, and doing it in the system
configuration will require hackery specific to the particular flavor
of Linux you're running, and probably other hackery on non-Linux
systems (never mind Windows).  Why can't you make it work over
127.0.0.1?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-05 Thread Robert Haas
On Mon, Aug 4, 2014 at 5:20 AM, Fabien COELHO  wrote:
>> This patch is pretty trivial.
> Another slightly less trivial but more useful version.
>
> The issue is that there are 3 definitions of modulo, two of which are fine
> (Knuth floored division and Euclidian), and the last one much less useful.
> Alas, C (%) & SQL (MOD) choose the bad definition:-( I really need any of
> the other two. The attached patch adds all versions, with "%" and "mod"
> consistent with their unfortunate C and SQL counterparts, and "fmod" and
> "emod" the sane ones.

Three different modulo operators seems like a lot for a language that
doesn't even have a real expression syntax, but I'll yield to whatever
the consensus is on this one.
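
For readers following along: the three definitions only differ when an
operand is negative. A standalone C sketch of the semantics (the names
tmod/fmod_i/emod_i are illustrative, not the patch's):

#include <stdio.h>
#include <stdint.h>

static int64_t tmod(int64_t a, int64_t b)	/* C's %: truncated division */
{
	return a % b;
}

static int64_t fmod_i(int64_t a, int64_t b)	/* Knuth: floored division */
{
	int64_t		r = a % b;

	if (r != 0 && (r < 0) != (b < 0))
		r += b;
	return r;
}

static int64_t emod_i(int64_t a, int64_t b)	/* Euclidean: r in [0, |b|) */
{
	int64_t		r = a % b;

	if (r < 0)
		r += (b < 0) ? -b : b;
	return r;
}

int main(void)
{
	/* -7 mod 3: truncated -1, floored 2, Euclidean 2 */
	printf("%lld %lld %lld\n", (long long) tmod(-7, 3),
		   (long long) fmod_i(-7, 3), (long long) emod_i(-7, 3));
	/* 7 mod -3: truncated 1, floored -2, Euclidean 1 */
	printf("%lld %lld %lld\n", (long long) tmod(7, -3),
		   (long long) fmod_i(7, -3), (long long) emod_i(7, -3));
	return 0;
}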

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Robert Haas
On Tue, Aug 5, 2014 at 3:15 PM, Peter Geoghegan  wrote:
> On Tue, Aug 5, 2014 at 12:03 PM, Robert Haas  wrote:
>> and if some opclass other than text wants to
>> provide a sortsupport shim that supplies a comparator only sometimes,
>> it will need its own copy of the logic.
>
> That's true, but my proposal to do things that way reflects the fact
> that text is a type oddly tied to the platform. I don't think it will
> come up again (note that in the 4 byte Datum case, we still use sort
> support to some degree on other platforms with patch 2 applied). It
> seemed logical to impose the obligation to deal with that on
> varlena.c.

Per your other email, here's the patch again; hopefully I've got the
right stuff in the file this time.

On this point, I'm all for confining knowledge of things to a
particular module to that module.  However, in this particular case, I
don't believe that there's anything specific to varlena.c in
bttext_inject_shim(); it's a cut down version of the combination of
functions that appear in other modules, and there's nothing to
encourage someone who modifies one of those functions to also update
varlena.c.  Indeed, people developing on non-Windows platforms won't
even be compiling that function, so it would be easy for most of us to
miss the need for an update.  So I argue that my approach is keeping
this knowledge more localized.

I'm also not sure it won't come up again.  There are certainly other
text-like datatypes out there that might want to optimize sorts; e.g.
citext.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index bc036a3..fdf2f4c 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -230,19 +230,19 @@ MJExamineQuals(List *mergeclauses,
  qual->opno);
 
 		/* And get the matching support or comparison function */
+		Assert(clause->ssup.comparator == NULL);
 		sortfunc = get_opfamily_proc(opfamily,
 	 op_lefttype,
 	 op_righttype,
 	 BTSORTSUPPORT_PROC);
 		if (OidIsValid(sortfunc))
 		{
-			/* The sort support function should provide a comparator */
+			/* The sort support function can provide a comparator */
 			OidFunctionCall1(sortfunc, PointerGetDatum(&clause->ssup));
-			Assert(clause->ssup.comparator != NULL);
 		}
-		else
+		if (clause->ssup.comparator == NULL)
 		{
-			/* opfamily doesn't provide sort support, get comparison func */
+			/* support not available, get comparison func */
 			sortfunc = get_opfamily_proc(opfamily,
 		 op_lefttype,
 		 op_righttype,
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 4b5ef99..552e498 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -246,62 +246,6 @@ get_ordering_op_properties(Oid opno,
 }
 
 /*
- * get_sort_function_for_ordering_op
- *		Get the OID of the datatype-specific btree sort support function,
- *		or if there is none, the btree comparison function,
- *		associated with an ordering operator (a "<" or ">" operator).
- *
- * *sortfunc receives the support or comparison function OID.
- * *issupport is set TRUE if it's a support func, FALSE if a comparison func.
- * *reverse is set FALSE if the operator is "<", TRUE if it's ">"
- * (indicating that comparison results must be negated before use).
- *
- * Returns TRUE if successful, FALSE if no btree function can be found.
- * (This indicates that the operator is not a valid ordering operator.)
- */
-bool
-get_sort_function_for_ordering_op(Oid opno, Oid *sortfunc,
-  bool *issupport, bool *reverse)
-{
-	Oid			opfamily;
-	Oid			opcintype;
-	int16		strategy;
-
-	/* Find the operator in pg_amop */
-	if (get_ordering_op_properties(opno,
-   &opfamily, &opcintype, &strategy))
-	{
-		/* Found a suitable opfamily, get matching support function */
-		*sortfunc = get_opfamily_proc(opfamily,
-	  opcintype,
-	  opcintype,
-	  BTSORTSUPPORT_PROC);
-		if (OidIsValid(*sortfunc))
-			*issupport = true;
-		else
-		{
-			/* opfamily doesn't provide sort support, get comparison func */
-			*sortfunc = get_opfamily_proc(opfamily,
-		  opcintype,
-		  opcintype,
-		  BTORDER_PROC);
-			if (!OidIsValid(*sortfunc)) /* should not happen */
-				elog(ERROR, "missing support function %d(%u,%u) in opfamily %u",
-					 BTORDER_PROC, opcintype, opcintype, opfamily);
-			*issupport = false;
-		}
-		*reverse = (strategy == BTGreaterStrategyNumber);
-		return true;
-	}
-
-	/* ensure outputs are set on failure */
-	*sortfunc = InvalidOid;
-	*issupport = false;
-	*reverse = false;
-	return false;
-}
-
-/*
  * get_equality_op_for_ordering_op
  *		Get the OID of the datatype-specific btree equality operator
  *		associated with an ordering operator (a "<" or ">" operator).
diff --git a/src/backend/utils/sort/sortsupport.c b/src/backend/utils/sort/sortsupport.c

Re: [HACKERS] Proposed changing the definition of decade for date_trunc and extract

2014-08-05 Thread Robert Haas
On Sun, Aug 3, 2014 at 3:29 PM, Gavin Flower
 wrote:
> So it would probably be a good idea to mention in the relevant
> documentation, that there was no Year Zero, and that 1 AD follows directly
> after 1 BC.

Well, maybe.  By and large, the PostgreSQL documentation should
confine itself to documenting facts about PostgreSQL rather than, say,
facts about date reckoning, except to the extent that PostgreSQL does
something different from the universal norms.  I mean, we could
document the rules for leap years, too, or how many days there are in
each month, or that many people who used the Julian calendar
considered the first day of the year to be March 25th, but really,
those are things that mostly deserve to be mentioned in a treatise on
historical calendaring rather than our documentation, unless there is
some PostgreSQL-specific treatment that requires expounding on them.

On the original patch, like a few others who have spoken, I think
changing the current behavior is a bad idea.  I certainly respect
Mike's desire to get the behavior right, and to set things up in the
way that makes sense to him, but if we start changing things like this
on the basis of a small number of complaints, we will piss off a lot
of users.  There are many people out there who have thousands upon
thousands of lines of PostgreSQL code that they can't easily audit and
change, and the standard for doing things that might break that code
in subtle ways is, and should be, high.  If what we were doing today
was outright wrong, of course I'd be in favor of changing it.  But the
current behavior isn't at all unreasonable; there are two possible
definitions and we picked one.  In that kind of situation, backward
compatibility concerns carry a great deal of weight.  Even if, in a
vacuum, more users would have chosen to do it as Mike proposes vs.
what we actually picked, we'll make enough people unhappy by changing
it now to make it a poor decision for the project.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 12:03 PM, Robert Haas  wrote:
> and if some opclass other than text wants to
> provide a sortsupport shim that supplies a comparator only sometimes,
> it will need its own copy of the logic.

That's true, but my proposal to do things that way reflects the fact
that text is a type oddly tied to the platform. I don't think it will
come up again (note that in the 4 byte Datum case, we still use sort
support to some degree on other platforms with patch 2 applied). It
seemed logical to impose the obligation to deal with that on
varlena.c.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 12:03 PM, Robert Haas  wrote:
> ...but I'm wondering what underlies that decision.   I would
> understand the decision to go that way if it simplified things
> elsewhere, but in fact it seems that's what underlies the addition of
> ssup_operator to SortSupportData, which in turn requires a number of
> changes elsewhere.

The changes aren't too invasive. There is exactly one place where it
isn't a trivial matter of storing the operator that was already
available:

+ /* Original operator must be provided */
+ clause->ssup.ssup_operator = get_opfamily_member(opfamily,
+                                                  op_lefttype,
+                                                  op_righttype,
+                                                  opstrategy);

> So I think it's better to just change the sortsupport contract so that
> filling in the comparator is optional.  Patch for that attached.
> Objections?

I'd have preferred to maintain the obligation for some sane
sortsupport state to be provided. It's not as if I feel too strongly
about it, though.

You attached "git diff --stat" output, and not an actual patch. Please re-send.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Robert Haas
On Sat, Aug 2, 2014 at 6:58 PM, Peter Geoghegan  wrote:
> On Sat, Aug 2, 2014 at 2:45 AM, Peter Geoghegan  wrote:
>> I'll post something over the weekend.
>
> Attached is a cumulative pair of patches generated using
> git-format-patch. I refer anyone who wants to know how the two parts
> fit together to the commit messages of each patch. In passing, I have
> added a reference to the MIT License as outlined by Noah.

OK, I have taken a look at patch 1.  You write:

+ * As a general principle, operator classes with a cataloged sort support
+ * function are expected to provide sane sort support state, including a
+ * function pointer comparator.  Rather than revising that principle, just
+ * setup a shim for the WIN32 UTF-8 and non-"C" collation special case here.

...but I'm wondering what underlies that decision.   I would
understand the decision to go that way if it simplified things
elsewhere, but in fact it seems that's what underlies the addition of
ssup_operator to SortSupportData, which in turn requires a number of
changes elsewhere.  The upshot of those changes is that it makes it
possible to write bttext_inject_shim, but AFAICS that's just
recapitulating what get_sort_function_for_ordering_op and
PrepareSortSupportFromOrderingOp are already doing.  Any material
change to either of those functions will have to be reflected in
bttext_inject_shim; and if some opclass other than text wants to
provide a sortsupport shim that supplies a comparator only sometimes,
it will need its own copy of the logic.

So I think it's better to just change the sortsupport contract so that
filling in the comparator is optional.  Patch for that attached.
Objections?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
 src/backend/executor/nodeMergejoin.c |8 ++---
 src/backend/utils/cache/lsyscache.c  |   56 --
 src/backend/utils/sort/sortsupport.c |   42 ++---
 src/include/utils/lsyscache.h|2 --
 4 files changed, 35 insertions(+), 73 deletions(-)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Simon Riggs
On 4 August 2014 19:30, Claudio Freire  wrote:
> On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
>  wrote:
>>I really like the proposal of working on a block level incremental
>> backup feature and the idea of considering LSN. However, I'd suggest
>> to see block level as a second step and a goal to keep in mind while
>> working on the first step. I believe that file-level incremental
>> backup will bring a lot of benefits to our community and users anyway.
>
> Thing is, I don't see how the LSN method is that much harder than an
> on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.
>
> Keeping a last-updated-LSN for each segment (or group of blocks) is
> just as easy as keeping a bitmap, and far more flexible and robust.
>
> The complexity and cost of safely keeping the map up-to-date is what's
> in question here, but as was pointed before, there's no really safe
> alternative. Neither modification times nor checksums (nor in-memory
> bitmaps IMV) are really safe enough for backups, so you really want to
> use something like the LSN. It's extra work, but opens up a world of
> possibilities.

OK, some comments on all of this.

* Wikipedia thinks the style of backup envisaged should be called "Incremental"
https://en.wikipedia.org/wiki/Differential_backup

* Base backups are worthless without WAL right up to the *last* LSN
seen during the backup, which is why pg_stop_backup() returns an LSN.
This is the LSN that is the effective viewpoint of the whole base
backup. So if we wish to get all changes since the last backup, we
must re-quote this LSN. (Or put another way - file level LSNs don't
make sense - we just need one LSN for the whole backup).

* When we take an incremental backup we need the WAL from the backup
start LSN through to the backup stop LSN. We do not need the WAL
between the last backup stop LSN and the new incremental start LSN.
That is a huge amount of WAL in many cases and we'd like to avoid
that, I would imagine. (So the space savings aren't just the delta
from the main data files, we should also look at WAL savings).

* For me, file based incremental is a useful and robust feature.
Block-level incremental is possible, but requires either significant
persistent metadata (1 MB per GB file) or access to the original
backup. One important objective here is to make sure we do NOT have to
re-read the last backup when taking the next backup; this helps us to
optimize the storage costs for backups. Plus, block-level recovery
requires us to have a program that correctly re-writes data into the
correct locations in a file, which seems likely to be a slow and bug
ridden process to me. Nice, safe, solid file-level incremental backup
first please. Fancy, bug prone, block-level stuff much later.

* One purpose of this could be to verify the backup. rsync provides a
checksum, pg_basebackup does not. However, checksums are frequently
prohibitively expensive, so perhaps asking for that is impractical and
maybe only a secondary objective.

* If we don't want/have file checksums, then we don't need a profile
file and using just the LSN seems fine. I don't think we should
specify that manually - the correct LSN is written to the backup_label
file in a base backup and we should read it back from there. We should
also write a backup_label file to incremental base backups, then we
can have additional lines saying what the source backups were. So full
base backup backup_labels remain as they are now, but we add one
additional line per increment, so we have the full set of increments,
much like a history file.

Normal backup_label files look like this

START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo

so we would have a file that looks like this

START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo
INCREMENTAL 1
START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo incremental 1
INCREMENTAL 2
START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo incremental 2
... etc ...

which we interpret as showing the original base backup, then the first
increment, then the second increment etc.. which allows us to recover
the backups in the correct sequence.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Jerry Sievers
Using the 'include' keyword in postgresql.conf lets us do groovy things
like pasting together pieces of general-purpose configs into the working file.

-- But since all we can do is set/reset a parameter, concatenating
   fragments that each touch the same list-valued parameter won't
   work.

To wit; suppose we had a fragment for enabling auto_explain and another
for enabling pg_stat_statements.

Both of these require us to put something in shared_preload_libraries
and both make use of custom_variable_classes.

Just FWIW...  Something like...

shared_preload_libraries += auto_explain

Would do the trick.
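
For instance, fragments could then hypothetically compose like this (note
that this "+=" syntax does not exist today; postgresql.conf only supports
plain assignment):

# conf.d/auto_explain.conf
shared_preload_libraries += 'auto_explain'

# conf.d/pg_stat_statements.conf
shared_preload_libraries += 'pg_stat_statements'

# effective result after including both fragments:
# shared_preload_libraries = 'auto_explain,pg_stat_statements'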

I've never heard this mentioned before, so I presume not many have
contemplated this.

Comments?

Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-05 Thread Robert Haas
On Sat, Aug 2, 2014 at 9:15 AM, Daniele Varrazzo
 wrote:
> I'd definitely replace /arg/argument/. Furthermore I'd avoid the form
> "argument 1: something is wrong": the string is likely to end up in
> sentences with other colons so I'd rather use "something is wrong at
> argument 1".
>
> Is the patch attached better?

Looks OK to me.  I thought someone else might comment, but since no
one has, committed.

(As a note for the future, you forgot to update the regression test
outputs; I took care of that before committing.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-08-05 Thread Robert Haas
On Sun, Jul 6, 2014 at 3:12 PM, Andres Freund  wrote:
>> > If you want to do that, it's fine with me.  What I would do is:
>> >
>> > - Back-patch the addition of the sparcv8+ stuff all the way.  If
>> > anyone's running anything older, let them complain...
>> > - Remove the special case for MIPS without gcc intrinsics only in
>> > master, leaving the back-branches broken.  If anyone cares, let them
>> > complain...
>> > - Nothing else.
>>
>> I've gone ahead and done the second of these things.
>
> Thanks.
>
>> Andres, do you want to go take a stab at fixing the SPARC stuff?
>
> Will do, will probably take me till thursday to come up with the brain
> cycles.

Ping?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-08-05 Thread Robert Haas
On Thu, Jun 5, 2014 at 4:43 AM, Amit Kapila  wrote:

> I have improved the patch by making the following changes:
> a. Improved the bgwriter logic to log for xl_running_xacts info and
> removed the hibernate logic as bgwriter will now work only when
> there is scarcity of buffers in free list. Basic idea is when the
> number of buffers on freelist drops below the low threshold, the
> allocating backend sets the latch and bgwriter wakes up and begins
> adding buffers to freelist until it reaches high threshold and then
> again goes back to sleep.

This essentially removes BgWriterDelay, but it's still mentioned in
BgBufferSync().  Looking further, I see that with the patch applied,
BgBufferSync() is still present in the source code but is no longer called
from anywhere.  Please don't submit patches that render things unused
without actually removing them; it makes it much harder to see what you've
changed.  I realize you probably left it that way for testing purposes, but
you need to clean such things up before submitting.  Likewise, if you've
rendered GUCs or statistics counters removed, you need to rip them out, so
that the scope of the changes you've made is clear to reviewers.

A comparison of BgBufferSync() with BgBufferSyncAndMoveBuffersToFreelist()
reveals that you've removed at least one behavior that some people (at
least, me) will care about, which is the guarantee that the background
writer will scan the entire buffer pool at least every couple of minutes.
This is important because it guarantees that dirty data doesn't sit in
memory forever.  When the system becomes busy again after a long idle
period, users will expect the system to have used the idle time to flush
dirty buffers to disk.  This also improves data recovery prospects if, for
example, somebody loses their pg_xlog directory - there may be dirty
buffers whose contents are lost, of course, but they won't be months old.

> b.  New stats for number of buffers on freelist have been added; some
>  old ones like maxwritten_clean can be removed as the new logic for
>  syncing buffers and moving them to free list doesn't use them.
>  However I think it's better to remove them once the new logic is
>  accepted.  Added some new logs for info related to free list under
>  BGW_DEBUG.
>

If I'm reading this right, the new statistic is an incrementing counter
where, every time you update it, you add the number of buffers currently on
the freelist.  That makes no sense.  I think what you should be counting is
the number of allocations that are being satisfied from the free-list.
Then, by comparing the rate at which that value is incrementing to the rate
at which buffers_alloc is incrementing, somebody can figure out what
percentage of allocations are requiring a clock-sweep run.  Actually, I
think it's better to flip it around: count the number of allocations that
require an individual backend to run the clock sweep (vs. being satisfied
from the free-list); call it, say, buffers_backend_clocksweep.  We can then
try to tune the patch to make that number as small as possible under
varying workloads.
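
In other words, something along these lines (a sketch with invented helper
names, not the patch's actual code):

static uint64 buffers_alloc;				/* every buffer allocation */
static uint64 buffers_backend_clocksweep;	/* allocations that ran the sweep */

static Buffer
allocate_buffer(void)
{
	Buffer		buf;

	buffers_alloc++;
	if (try_freelist(&buf))
		return buf;				/* satisfied from the free list */

	/* free list was empty: this backend pays for a clock-sweep run */
	buffers_backend_clocksweep++;
	return run_clock_sweep();
}

The ratio buffers_backend_clocksweep / buffers_alloc then tells you what
fraction of allocations could not be satisfied from the free list.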

> c.  Used the already existing bgwriterLatch in BufferStrategyControl to
>  wake bgwriter when the number of buffers in freelist drops below the
>  threshold.
>

Seems like a good idea.


> d.  Autotune the low and high threshold for freelist for various
>  configurations.  Generally if we keep a small number (200~2000) of buffers
>  always available on freelist, then even for high shared buffers
>  like 15GB, it appears to be sufficient.  However when the value
>  of shared buffer's is less, then we need much smaller number.  I
>  think we can provide these as config knobs for user as well, but for
>  now based on LWLOCK_STATS result, I have chosen some hard
>  coded values for low and high threshold values for freelist.
>  Values for low and high threshold have been decided based on total
>  number of shared buffers, basically I have divided them into 5
>  categories (16~100, 100~1000,  1000~1, 1~10,
>  10 and above) and then ran tests(read-only pgbench) for various
>  configurations falling under these categories.  The reason for keeping
>  lesser categories for larger shared buffers is that if there are small
>  number (200~2000) of buffers available on free list, then it seems to
>  be sufficient for quite high loads, however as the total number of
> shared
>  buffer's decreases we need to be more careful as if we keep the
> number as
>  too low then it will lead to more clock sweep by backends (which means
>  freelist lock contention) and if we keep number higher bgwriter will
> evict
>  many useful buffers.  Results based on LWLOCK_STATS is at end of mail.
>

I think we need to come up with some kind of formula here rather than just
a list of hard-coded constants.  And it definitely needs some comments
explaining the logic.

Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread testman1316
You are correct sir, 4.1 seconds.  Are you a consultant?  We are looking for a
PostgreSQL guru for advice.  We are doing a proof of concept of PostgreSQL on
AWS.

From: Mark Kirkwood-2 [via PostgreSQL] 
[mailto:ml-node+s1045698n5813763...@n5.nabble.com]
Sent: Tuesday, August 05, 2014 12:58 AM
To: Ramirez, Danilo
Subject: Re: PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

On 05/08/14 08:48, testman1316 wrote:

> We are trying to get an idea of the raw performance of Oracle vs PostgreSQL.
> We have extensive oracle experience but are new to PostgreSQL. We are going
> to run lots of queries with our data, etc. But first we wanted to see just
> how they perform on basic kernel tasks, i.e. math and branching since SQL is
> built on that.
>
> In AWS RDS we created two db.m3.2xlarge instances one with oracle
> 11.2.0.4.v1 license included, the other with PostgreSQL (9.3.3)
>
> In both we ran code that did 1 million square roots (from 1 to 1 mill). Then
> did the same but within an If..Then statement.
>
> The results were a bit troubling:
>
> Oracle  4.8 seconds
>
> PostgreSQL  21.803 seconds
>
> adding an if statement:
>
> Oracle  4.78 seconds
>
> PostgreSQL  24.4 seconds
>
> code Oracle square root
>
> SET SERVEROUTPUT ON
> SET TIMING ON
>
> DECLARE
>n NUMBER := 0;
> BEGIN
>FOR f IN 1..10000000
> LOOP
>  n := SQRT (f);
>END LOOP;
> END;
>
> PostgreSQL
>
> DO LANGUAGE plpgsql $$ DECLARE n real;
> DECLARE f integer;
> BEGIN
> FOR f IN 1..10000000 LOOP
> n = SQRT (f);
> END LOOP;
> RAISE NOTICE 'Result => %',n;
> END $$;
>
> oracle adding if
>
> SET SERVEROUTPUT ON
> SET TIMING ON
>
> DECLARE
>n NUMBER := 0;
> BEGIN
>FOR f IN 1..10000000
> LOOP
>if 0 =0 then
>  n := SQRT (f);
>  end if;
>END LOOP;
>
> postgres adding if
>
> DO LANGUAGE plpgsql $$ DECLARE n real;
> DECLARE f integer;
> BEGIN
> FOR f IN 1..10000000 LOOP
> if 0=0 then
> n = SQRT (f);
> end if;
> END LOOP;
> RAISE NOTICE 'Result => %',n;
> END $$;
>
> I used an anonymous block for PostgreSQL. I also did it as a function and
> got identical results
>
> CREATE OR REPLACE FUNCTION testpostgrescpu()
>RETURNS real AS
> $BODY$
> declare
>   n real;
>   f integer;
>
> BEGIN
> FOR f IN 1..10000000 LOOP
>  n = SQRT (f);
> END LOOP;
>
>
> RETURN n;
> END;
> $BODY$
>LANGUAGE plpgsql VOLATILE
>COST 100;
> ALTER FUNCTION testpostgrescpu()
>OWNER TO xxx
>
> Based on what we had heard of PostgreSQL and how it is comparable to Oracle
> in many ways, we were taken aback by the results. Did we code PostgreSQL
> incorrectly? What are we missing or is this the way it is.
>
> Note: once we started running queries on the exact same data in Oracle and
> PostgreSQL we saw a similar pattern. On basic queries little difference, but
> as they started to get more and more complex Oracle was around 3-5x faster.
>
> Again, this was run on identical AWS RDS instances, we ran them many times
> during the day on different days and results were always the same
>
>
>
>
>


Looking at this guy:

DO LANGUAGE plpgsql $$ DECLARE n real;
DECLARE f integer;
BEGIN
FOR f IN 1..10000000 LOOP
   n = SQRT (f);
END LOOP;
RAISE NOTICE 'Result => %',n;
END $$;

Takes about 12s with Postgres 9.4 running on Ubuntu 14.04 hosted on
real HW (Intel i7).

Changing n to be float8 rather than real, i.e:

DO LANGUAGE plpgsql $$ DECLARE n float8;
DECLARE f integer;
BEGIN
FOR f IN 1..10000000 LOOP
   n = SQRT (f);
END LOOP;
RAISE NOTICE 'Result => %',n;
END $$;

...time drops to about 2s (which I'm guessing would get it to about
Oracle speed on your EC2 setup).

The moral of the story for this case is that mapping Oracle to Postgres
datatypes can require some careful thought. Using 'native' types (like
integer, float8 etc) will generally give vastly quicker performance.


Adding in the 'if' in the float8 case increases run time to 4s. So looks
like plpgsql might have a slightly higher cost for handling added
conditionals. Be interesting to dig a bit more and see what is taking
the time.

Regards

Mark



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Shaun Thomas

On 08/05/2014 12:56 AM, Mark Kirkwood wrote:


> The moral of the story for this case is that mapping Oracle to Postgres
> datatypes can require some careful thought. Using 'native' types (like
> integer, float8 etc) will generally give vastly quicker performance.


We've seen a lot of this ourselves. Oracle's NUMERIC is a native type, 
whereas ours is emulated. From the performance, it would appear that 
REAL is another calculated type.


At least you used INT though. I've seen too many Oracle shops using 
NUMERIC in PostgreSQL because it's there, and suffering major 
performance hits because of it.


That said, the documentation here says FLOAT4 is an alias for REAL, so 
it's somewhat nonintuitive for FLOAT4 to be so much slower than FLOAT8, 
which is an alias for DOUBLE PRECISION.


http://www.postgresql.org/docs/9.3/static/datatype.html

Not sure why that would be.

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Audit of logout

2014-08-05 Thread Fujii Masao
On Tue, Jul 15, 2014 at 10:45 PM, Magnus Hagander  wrote:
> On Wed, Jul 2, 2014 at 10:52 PM, Tom Lane  wrote:
>> I wrote:
>>> In short, maybe we ought to invent a new category PGC_SU_BACKEND (not
>>> wedded to this spelling), which is to PGC_BACKEND as PGC_SUSET is to
>>> PGC_USERSET, ie same when-it-can-be-changed behavior but only superusers
>>> are allowed to change it.  I don't have any objection to making these two
>>> settings only adjustable by superusers --- I just don't want to give up
>>> the existing timing restrictions for them.
>>
>> Another idea would be to get rid of PGC_SUSET as a separate category, and
>> instead have a superuser-only bit in the GUC flags, which would apply to
>> all categories.  This would be a bit more orthogonal, though likely a
>> much more invasive change.
>
> That could become interesting in the future now that we have ALTER
> SYSTEM SET. It could allow a non-superuser to make persistent
> configuration changes. Now, I'm not sure we actually *want* that
> though... But having it as a separate bit would make it possible for
> ALTER SYSTEM SET to say that for example regular users would be able
> to change work_mem persistently. But if we want to go down that route,
> we might need a more fine grained permissions model than just
> superuser vs non-superuser...
>
> I think going with the PGC_SU_BACKEND is the right choice at this
> time, until we have an actual usecase for the other :)

Yep, the attached patch introduces PGC_SU_BACKEND and
changes the contexts of log_connections and log_disconnections
to PGC_SU_BACKEND. Review?

Regards,

-- 
Fujii Masao
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
***
*** 3258,3264  get_stats_option_name(const char *arg)
   * argv[0] is ignored in either case (it's assumed to be the program name).
   *
   * ctx is PGC_POSTMASTER for secure options, PGC_BACKEND for insecure options
!  * coming from the client, or PGC_SUSET for insecure options coming from
   * a superuser client.
   *
   * If a database name is present in the command line arguments, it's
--- 3258,3264 
   * argv[0] is ignored in either case (it's assumed to be the program name).
   *
   * ctx is PGC_POSTMASTER for secure options, PGC_BACKEND for insecure options
!  * coming from the client, or PGC_SU_BACKEND for insecure options coming from
   * a superuser client.
   *
   * If a database name is present in the command line arguments, it's
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
***
*** 957,963  process_startup_options(Port *port, bool am_superuser)
  	GucContext	gucctx;
  	ListCell   *gucopts;
  
! 	gucctx = am_superuser ? PGC_SUSET : PGC_BACKEND;
  
  	/*
  	 * First process any command-line switches that were included in the
--- 957,963 
  	GucContext	gucctx;
  	ListCell   *gucopts;
  
! 	gucctx = am_superuser ? PGC_SU_BACKEND : PGC_BACKEND;
  
  	/*
  	 * First process any command-line switches that were included in the
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***
*** 512,517  const char *const GucContext_Names[] =
--- 512,518 
  	 /* PGC_INTERNAL */ "internal",
  	 /* PGC_POSTMASTER */ "postmaster",
  	 /* PGC_SIGHUP */ "sighup",
+ 	 /* PGC_SU_BACKEND */ "superuser-backend",
  	 /* PGC_BACKEND */ "backend",
  	 /* PGC_SUSET */ "superuser",
  	 /* PGC_USERSET */ "user"
***
*** 910,916  static struct config_bool ConfigureNamesBool[] =
  		NULL, NULL, NULL
  	},
  	{
! 		{"log_connections", PGC_BACKEND, LOGGING_WHAT,
  			gettext_noop("Logs each successful connection."),
  			NULL
  		},
--- 911,917 
  		NULL, NULL, NULL
  	},
  	{
! 		{"log_connections", PGC_SU_BACKEND, LOGGING_WHAT,
  			gettext_noop("Logs each successful connection."),
  			NULL
  		},
***
*** 919,925  static struct config_bool ConfigureNamesBool[] =
  		NULL, NULL, NULL
  	},
  	{
! 		{"log_disconnections", PGC_BACKEND, LOGGING_WHAT,
  			gettext_noop("Logs end of a session, including duration."),
  			NULL
  		},
--- 920,926 
  		NULL, NULL, NULL
  	},
  	{
! 		{"log_disconnections", PGC_SU_BACKEND, LOGGING_WHAT,
  			gettext_noop("Logs end of a session, including duration."),
  			NULL
  		},
***
*** 5681,5696  set_config_option(const char *name, const char *value,
  			 * signals to individual backends only.
  			 */
  			break;
  		case PGC_BACKEND:
  			if (context == PGC_SIGHUP)
  			{
  /*
!  * If a PGC_BACKEND parameter is changed in the config file,
!  * we want to accept the new value in the postmaster (whence
!  * it will propagate to subsequently-started backends), but
!  * ignore it in existing backends.  This is a tad klugy, but
!  * necessary because we don't re-read the config file during
!  * backend start.
   *
   * In EXEC_BACKEND builds, this works differently: we load all
   * nondefault settings from the CONFIG_EXEC_PARAMS file duri

[HACKERS] how to debug into InitPostgres() and InitCatalogCache()?

2014-08-05 Thread 土卜皿
hi, all
  I can already debug a general postgres backend using "ddd" and "select
pg_backend_pid();". Now I want to study the details of the system
catalog cache and system cache management, so I need to debug the functions
InitPostgres() and InitCatalogCache(), and I tried the following steps:

[refer: How to debug postgresql during initdb ]


(1) in first terminal:

ddd initdb

and set a breakpoint on the next executable line after “PG_CMD_OPEN” in
function bootstrap_template1() ,

Then within gdb, use “r $myfolder/mydb”,

and found the cursor stop at the breakpoint

(2) in the second terminal:

ps aux | grep "bin\/postgres"

after get the posgtres server pid, start a new ddd without args, and within
gdb, using "attach postgres.pid"

and set a breakpoint as InitCatalogCache()

(3) back in the first ddd window, hit the "cont" button. I hoped it would
stop at the second ddd's breakpoint, but nothing happened.

I must have made a mistake in some step, or my understanding is wrong.
Please give me some advice, thanks a lot!

Dillon


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Roberto Mello
On Tue, Aug 5, 2014 at 9:50 AM, Kevin Grittner  wrote:
>
> Since that is outside the loop, the difference should be nominal;

Apologies. I misread on my phone and thought it was within the loop.

> and in a quick test it was.  On the other hand, reducing the
> procedural code made a big difference.



> test=# DO LANGUAGE plpgsql $$ DECLARE n real;
> BEGIN
> >   PERFORM SQRT(f) FROM generate_series(1, 10000000) x(f);
> END $$;
> DO
> Time: 3916.815 ms

That is a big difference. Are you porting a lot of code from PL/SQL,
and therefore evaluating the performance difference of running this
code? Or is this just a general test where you wish to assess the
performance difference?

PL/pgSQL could definitely use some loving, as far as optimization
goes, but my feeling is that it hasn't happened because there are
other suitable backends that give the necessary flexibility for the
different use cases.

Roberto


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Kevin Grittner
Roberto Mello  wrote:

> In addition to the other suggestions that have been posted (using
> a procedural language more suitable to mathematical ops, etc) I
> noticed that you are using a RAISE in the PostgreSQL version that
> you are not in Oracle.
>
> I am curious as to what the difference is if you use the RAISE in
> both or neither cases.

Since that is outside the loop, the difference should be nominal;
and in a quick test it was.  On the other hand, reducing the
procedural code made a big difference.

test=# \timing on
Timing is on.
test=# DO LANGUAGE plpgsql $$ DECLARE n real;
DECLARE f integer;
BEGIN
FOR f IN 1..10000000 LOOP
n = SQRT (f);
END LOOP;
RAISE NOTICE 'Result => %',n;
END $$;
NOTICE:  Result => 3162.28
DO
Time: 23687.914 ms

test=# DO LANGUAGE plpgsql $$ DECLARE n real;
BEGIN
  PERFORM SQRT(f) FROM generate_series(1, 10000000) x(f);
END $$;
DO
Time: 3916.815 ms

Eliminating the plpgsql function entirely shaved a little more off:

test=# SELECT SQRT(f) FROM generate_series(1, 10000000) x(f);
Time: 3762.886 ms

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-08-05 Thread Fujii Masao
On Wed, Jul 23, 2014 at 5:21 PM, Pavan Deolasee
 wrote:
> 1. Need for compressing full page backups:
> There are a good number of benchmarks done by various people on this list
> which clearly show the need for the feature. Many people have already voiced
> their agreement on having this in core, even as a configurable parameter.

Yes!

> Having said that, IMHO we should go one step at a time. We are using pglz
> for compressing toast data for long, so we can continue to use the same for
> compressing full page images. We can simultaneously work on adding more
> algorithms to core and choose the right candidate for different scenarios
> such as toast or FPW based on test evidences. But that work can happen
> independent of this patch.

This gradual approach looks good to me. And, if the additional compression
algorithm like lz4 is always better than pglz in every scenario, we can just
change the code so that the additional algorithm is always used. Which would
make the code simpler.

> 3. Compressing one block vs all blocks:
> Andres suggested that compressing all backup blocks in one go may give us
> better compression ratio. This is worth trying. I'm wondering what would the
> best way to do so without minimal changes to the xlog insertion code. Today,
> we add more rdata items for backup block header(s) and backup blocks
> themselves (if there is a "hole" then 2 per backup block) beyond what the
> caller has supplied. If we have to compress all the backup blocks together,
> then one approach is to copy the backup block headers and the blocks to a
> temp buffer, compress that and replace the rdata entries added previously
> with a single rdata.

Basically sounds reasonable. But, how does this logic work if there are
multiple rdata and only some of them are backup blocks?

If a "hole" is not copied to that temp buffer, ISTM that we should
change the backup block header so that it contains the info for a
"hole", e.g., the location where the "hole" starts. No?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format and API changes (9.5)

2014-08-05 Thread Michael Paquier
On Sat, Aug 2, 2014 at 1:40 AM, Heikki Linnakangas 
wrote:
> I ran this through my WAL page comparison tool to verify that all the WAL
> record types are replayed correctly (although I did some small cleanup after
> that, so it's not impossible that I broke it again; will re-test before
> committing).
I have also run some tests on it with a master and a standby to check WAL
construction and replay:
- regression tests as well as tests from contrib, isolation
- WAL page comparison tool
- Some tests with pgbench
- When testing pg_xlogdump I found an assertion failure for record
XLOG_GIN_INSERT:
Assertion failed: (((bool) (((const void*)(&insertData->newitem.key) !=
((void*)0)) && ((&insertData->newitem.key)->ip_posid != 0, function
gin_desc, file gindesc.c, line 127.

Then, I had a look at the code, and here are some comments:
- This comment has been introduced with 96ef3b8 that has added page
checksums. Perhaps the authors can tell what was the original purpose of
this comment (Jeff, Simon). For now, it may be better to remove this FIXME
and to leave this comment as is.
+	 * FIXME: I didn't understand below comment. Is it still relevant?
+	 *
	 * This also means there is no corresponding API call for this, so an
	 * smgr implementation has no need to implement anything. Which means
	 * nothing is needed in md.c etc
- Would it be a win to add an assertion check for (CritSectionCount == 0)
in XLogEnsureRecordSpace?
- There is no mention of REGBUF_NO_IMAGE in transam/README.
- This patch adds a lot of blocks like that in the redo code:
+ if (BufferIsValid(buffer))
+   UnlockReleaseBuffer(buffer);
What do you think about adding a new macro like UnlockBufferIfValid(buffer)?
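i.e. a sketch of the suggested macro:

#define UnlockBufferIfValid(buffer) \
	do { \
		if (BufferIsValid(buffer)) \
			UnlockReleaseBuffer(buffer); \
	} while (0)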
- Don't we need a call to XLogBeginInsert() in XLogSaveBufferForHint():
+   int flags = REGBUF_FORCE_IMAGE;
+   if (buffer_std)
+   flags |= REGBUF_STANDARD;
+   XLogRegisterBuffer(0, buffer, flags);
[...]
-   recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI, rdata);
+   recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI);
- There is a typo in the header of xlogrecord.h, the "to" is not needed:
+ * Definitions for to the WAL record format.

- There is still a TODO item in ValidXLogRecordHeader to check if block
data header is valid or not. Just mentioning.
- Just came across that: s/reference refes to/reference refers to
- XLogRecGetBlockRefIds needing an already-allocated array for *out is not
user-friendly. Cannot we just do all the work inside this function?
- All the functions in xlogconstruct.c could be defined in a separate
header xlogconstruct.h. What they do is rather independent from xlog.h. The
same applies to all the structures and functions of xlogconstruct.c in
xlog_internal.h: XLogRecordAssemble, XLogRecordAssemble,
InitXLogRecordConstruct. (worth noticing as well that the only reason
XLogRecData is defined externally of xlogconstruct.c is that it is being
used in XLogInsert()).

The patch is more solid and better structured than the previous versions.
It is in good shape.
Regards,
-- 
Michael


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Roberto Mello
On Monday, August 4, 2014, testman1316 wrote:

>
> SET SERVEROUTPUT ON
> SET TIMING ON
>
> DECLARE
>   n NUMBER := 0;
> BEGIN
>   FOR f IN 1..10000000
> LOOP
> n := SQRT (f);
>   END LOOP;
>
>
In addition to the other suggestions that have been posted (using a
procedural language more suitable to mathematical ops, etc) I noticed that
you are using a RAISE in the PostgreSQL version that you are not in Oracle.

I am curious as to what the difference is if you use the RAISE in both or
neither cases.

Roberto


Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-05 Thread Fujii Masao
On Tue, Aug 5, 2014 at 9:04 PM, Fujii Masao  wrote:
> On Tue, Jul 29, 2014 at 7:07 PM,   wrote:
>> I have improved the patch by making the following changes:
>>
>> 1. Since stream_stop() was redundant, stream_stop() at the time of WAL file 
>> closing was deleted.
>>
>> 2. Changed the flush judging timing for the readability of the source code.
>>    I have changed the flush judging timing from just after receiving a
>>    continuous message to before the feedback message decision of the
>>    continue statement after execution.
>
> Thanks for the updated version of the patch!
>
> While reviewing the patch, I found that HandleCopyStream() is still
> long, which decreases the readability of the source code.
> So I feel inclined to refactor the HandleCopyStream() more for better
> readability. What about the attached refactoring patch?

Sorry, I forgot to attach the patch in the previous email. So attached.

Regards,

-- 
Fujii Masao
*** a/src/bin/pg_basebackup/receivelog.c
--- b/src/bin/pg_basebackup/receivelog.c
***
*** 31,42  static char current_walfile_name[MAXPGPATH] = "";
--- 31,53 
  static bool reportFlushPosition = false;
  static XLogRecPtr lastFlushPosition = InvalidXLogRecPtr;
  
+ static bool still_sending = true;		/* feedback still needs to be sent? */
+ 
  static PGresult *HandleCopyStream(PGconn *conn, XLogRecPtr startpos,
   uint32 timeline, char *basedir,
  			   stream_stop_callback stream_stop, int standby_message_timeout,
   char *partial_suffix, XLogRecPtr *stoppos);
  static int CopyStreamPoll(PGconn *conn, long timeout_ms);
  static int CopyStreamReceive(PGconn *conn, long timeout, char **buffer);
+ static bool ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
+ XLogRecPtr blockpos, int64 *last_status);
+ static bool ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
+ 			   XLogRecPtr *blockpos, uint32 timeline,
+ 			   char *basedir, stream_stop_callback stream_stop,
+ 			   char *partial_suffix);
+ static PGresult *HandleEndOfCopyStream(PGconn *conn, char *copybuf,
+ 	   XLogRecPtr blockpos, char *basedir, char *partial_suffix,
+ 	   XLogRecPtr *stoppos);
  
  static bool ReadEndOfStreamingResult(PGresult *res, XLogRecPtr *startpos,
  		 uint32 *timeline);
***
*** 740,755  HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
  	char	   *copybuf = NULL;
  	int64		last_status = -1;
  	XLogRecPtr	blockpos = startpos;
! 	bool		still_sending = true;
  
  	while (1)
  	{
  		int			r;
- 		int			xlogoff;
- 		int			bytes_left;
- 		int			bytes_written;
  		int64		now;
- 		int			hdr_len;
  		long		sleeptime;
  
  		/*
--- 751,763 
  	char	   *copybuf = NULL;
  	int64		last_status = -1;
  	XLogRecPtr	blockpos = startpos;
! 
! 	still_sending = true;
  
  	while (1)
  	{
  		int			r;
  		int64		now;
  		long		sleeptime;
  
  		/*
***
*** 818,1015  HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
  			goto error;
  		if (r == -2)
  		{
! 			PGresult   *res = PQgetResult(conn);
! 
! 			/*
! 			 * The server closed its end of the copy stream.  If we haven't
! 			 * closed ours already, we need to do so now, unless the server
! 			 * threw an error, in which case we don't.
! 			 */
! 			if (still_sending)
! 			{
! if (!close_walfile(basedir, partial_suffix, blockpos))
! {
! 	/* Error message written in close_walfile() */
! 	PQclear(res);
! 	goto error;
! }
! if (PQresultStatus(res) == PGRES_COPY_IN)
! {
! 	if (PQputCopyEnd(conn, NULL) <= 0 || PQflush(conn))
! 	{
! 		fprintf(stderr,
! _("%s: could not send copy-end packet: %s"),
! progname, PQerrorMessage(conn));
! 		PQclear(res);
! 		goto error;
! 	}
! 	PQclear(res);
! 	res = PQgetResult(conn);
! }
! still_sending = false;
! 			}
! 			if (copybuf != NULL)
! PQfreemem(copybuf);
! 			copybuf = NULL;
! 			*stoppos = blockpos;
! 			return res;
  		}
  
  		/* Check the message type. */
  		if (copybuf[0] == 'k')
  		{
! 			int			pos;
! 			bool		replyRequested;
! 
! 			/*
! 			 * Parse the keepalive message, enclosed in the CopyData message.
! 			 * We just check if the server requested a reply, and ignore the
! 			 * rest.
! 			 */
! 			pos = 1;			/* skip msgtype 'k' */
! 			pos += 8;			/* skip walEnd */
! 			pos += 8;			/* skip sendTime */
! 
! 			if (r < pos + 1)
! 			{
! fprintf(stderr, _("%s: streaming header too small: %d\n"),
! 		progname, r);
  goto error;
- 			}
- 			replyRequested = copybuf[pos];
- 
- 			/* If the server requested an immediate reply, send one. */
- 			if (replyRequested && still_sending)
- 			{
- now = feGetCurrentTimestamp();
- if (!sendFeedback(conn, blockpos, now, false))
- 	goto error;
- last_status = now;
- 			}
  		}
  		else if (copybuf[0] == 'w')
  		{
! 			/*
! 			 * Once we've decided we don't want to receive any more, just
! 			 * ignore any subsequent XLogData m

Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-05 Thread Fujii Masao
On Tue, Jul 29, 2014 at 7:07 PM,   wrote:
> I have improved the patch by making the following changes:
>
> 1. Since stream_stop() was redundant, the stream_stop() call at the time of
> WAL file closing was deleted.
>
> 2. Changed the timing of the flush decision, for source-code readability:
>    the flush is now judged before the feedback-message decision, rather
>    than right after a message is received.

Thanks for the updated version of the patch!

While reviewing the patch, I found that HandleCopyStream() is still
long, which decreases the readability of the source code. So I feel
inclined to refactor HandleCopyStream() further for better readability.
What about the attached refactoring patch?
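
Roughly, the receive loop then reduces to a dispatch on the CopyData
message type. Here is a sketch based on the helper prototypes above (my
illustration, not the attached patch's exact bodies; error handling is
abbreviated, and HandleEndOfCopyStream() is assumed to return NULL on
error):

    r = CopyStreamReceive(conn, sleeptime, &copybuf);
    while (r != 0)
    {
        if (r == -1)
            goto error;
        if (r == -2)
        {
            /* Server ended the COPY stream: close our end and finish up. */
            PGresult   *res = HandleEndOfCopyStream(conn, copybuf, blockpos,
                                                    basedir, partial_suffix,
                                                    stoppos);

            if (res == NULL)
                goto error;
            return res;
        }

        /* Dispatch on the message type byte. */
        if (copybuf[0] == 'k')
        {
            if (!ProcessKeepaliveMsg(conn, copybuf, r, blockpos, &last_status))
                goto error;
        }
        else if (copybuf[0] == 'w')
        {
            if (!ProcessXLogDataMsg(conn, copybuf, r, &blockpos, timeline,
                                    basedir, stream_stop, partial_suffix))
                goto error;
        }
        else
        {
            fprintf(stderr, _("%s: unrecognized streaming header: \"%c\"\n"),
                    progname, copybuf[0]);
            goto error;
        }

        /* Drain any already-buffered messages before sleeping again. */
        r = CopyStreamReceive(conn, 0, &copybuf);
    }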

Regards,

-- 
Fujii Masao




Re: [HACKERS] psql: show only failed queries

2014-08-05 Thread Fujii Masao
On Wed, Jul 16, 2014 at 4:34 AM, Pavel Stehule  wrote:
>
>
>
> 2014-07-15 12:07 GMT+02:00 Fujii Masao :
>
>> On Tue, Jul 15, 2014 at 7:01 PM, Pavel Stehule 
>> wrote:
>> >
>> >
>> >
>> > 2014-07-15 11:29 GMT+02:00 Fujii Masao :
>> >
>> >> On Thu, Jul 10, 2014 at 9:56 PM, Pavel Stehule
>> >> 
>> >> wrote:
>> >> > Hello
>> >> >
>> >> > here is a proposed patch - autocomplete for known psql variable
>> >> > content
>> >>
>> >> Even after applying the patch, the following psql variables were not
>> >> displayed
>> >> on the tab-completion of \set command.
>> >>
>> >> HISTFILE
>> >> HISTSIZE
>> >> HOST
>> >> IGNOREEOF
>> >> LASTOID
>> >>
>> >> I'm not sure if it's worth displaying all of them on the tab-completion
>> >> of
>> >> \set
>> >> because it seems strange to change some of them by \set command, for
>> >> example
>> >> HOST, though.
>> >
>> >
>> > These variables have no default - so they don't exist yet and cannot
>> > be completed by default.
>> >
>> > there are two fixes:
>> >
>> > a) fix autocomplete source - preferred
>>
>> +1
>
>
> here is the patch

Thanks for the patch!

I got the following compiler warnings.

tab-complete.c:4155: warning: assignment discards qualifiers from
pointer target type
tab-complete.c:4166: warning: assignment discards qualifiers from
pointer target type

+"FETCH_COUNT", "HISTCONTROL", "HISTFILE", "HISTSIZE", "HOST",
"IGNOREOFF",

Typo: IGNOREOFF -> IGNOREEOF

+/* all psql known variables are included in list by default */
+for (known_varname = known_varnames; *known_varname; known_varname++)
+varnames[nvars++] = pg_strdup(*known_varname);

Don't we need to append both prefix and suffix to even known variables?

+else if (strcmp(prev2_wd, "\\set") == 0)
+{
+if (strcmp(prev_wd, "AUTOCOMMIT") == 0)

ISTM that some psql variables like IGNOREEOF are not there. Why not?
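
BTW, the qualifier warnings come from walking the const array with a
plain "char **" cursor. A self-contained sketch of the warning-free
form (my illustration, not tab-complete.c itself; strdup stands in for
pg_strdup):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Known psql variable names; the trailing NULL terminates the walk. */
static const char *const known_varnames[] = {
    "AUTOCOMMIT", "COMP_KEYWORD_CASE", "ECHO", "ECHO_HIDDEN", "ENCODING",
    "FETCH_COUNT", "HISTCONTROL", "HISTFILE", "HISTSIZE", "HOST",
    "IGNOREEOF", NULL
};

int
main(void)
{
    const char *const *known_varname;   /* const-correct cursor: no warning */
    char       *varnames[32];
    int         nvars = 0;

    /* All known variables are included in the completion list by default. */
    for (known_varname = known_varnames; *known_varname; known_varname++)
        varnames[nvars++] = strdup(*known_varname);

    for (int i = 0; i < nvars; i++)
    {
        printf("%s\n", varnames[i]);
        free(varnames[i]);
    }
    return 0;
}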

Regards,

-- 
Fujii Masao




Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.

2014-08-05 Thread Ashutosh Bapat
Hi Kyotaro,
I looked at the patches and felt that the approach taken here is too
intrusive, considering that the feature is only for foreign scans.

There are quite a few members added to the generic Path and Plan
structures whose use is induced only through foreign scans. Each path now
stores two sets of costs, one with parallelism and one without. The
parallel values will make sense only when there is a foreign scan in the
plan tree which uses parallelism. So those costs are maintained
unnecessarily, or the memory for those members is wasted, in most cases,
where the tables involved are not foreign. Also, not many foreign tables
will be able to use the parallelism, e.g. file_fdw. That's my opinion,
though; I would like to hear from others.

Instead, an FDW which can use parallelism can add two paths, one with and
one without parallelism, with appropriate costs, and let the logic that
chooses the cheapest path take care of the actual choice. In fact, I
thought parallelism would always be faster than the non-parallel
alternative, except when the foreign server is too heavily loaded. But we
won't be able to check that anyway. Can you point out a case where
parallelism may not win over serial execution?
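
For example, the core of GetForeignPaths could do no more than this (a
sketch only: the cost variables, the parallel fdw_private list and the
max_aux_connections test are placeholders, and the
create_foreignscan_path() argument order is the 9.4-era one, so check
your branch's headers):

    /* The ordinary, serial foreign scan. */
    add_path(baserel, (Path *)
             create_foreignscan_path(root, baserel,
                                     baserel->rows,
                                     serial_startup_cost,
                                     serial_total_cost,
                                     NIL,      /* no pathkeys */
                                     NULL,     /* no required_outer */
                                     NIL));    /* no fdw_private */

    /*
     * The same scan costed for asynchronous fetches over extra
     * connections, offered only when the server permits them.  add_path()
     * then simply keeps whichever path turns out cheaper.
     */
    if (max_aux_connections > 0)
        add_path(baserel, (Path *)
                 create_foreignscan_path(root, baserel,
                                         baserel->rows,
                                         parallel_startup_cost,
                                         parallel_total_cost,
                                         NIL,
                                         NULL,
                                         parallel_fdw_private));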

BTW, the name parallelism seems to be misleading here. All it will be able
to do is fire the queries (or data fetch requests) asynchronously. So we
might want to change the naming appropriately.


On Fri, Aug 1, 2014 at 2:48 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello,
>
> > Hello, this is the new version which is complete to some extent
> > of parallelism based on postgres_fdw.
> >
> > This compares the costs for parallel and non-parallel execution
> > and choose parallel one if it is faster by some extent specified
> > by GUCs. The attached files are,
> >
> >  0001_parallel_exec_planning_v0.patch:
> >- PostgreSQL body stuff for parallel execution planning.
> >
> >  0002_enable_postgres_fdw_to_run_in_parallel_v0.patch:
> >- postgres_fdw parallelization.
> >
> >  0003_file_fdw_changes_to_avoid_error.patch:
> >- error avoidig stuff for file_fdw (not necessary for this patch)
> >
> >  env.sql:
> >- simple test script to try this patch.
> >
> > =
> >
> >  - planner stuff to handle cost of parallel execution. Including
> >indication of parallel execution.
> >
> >  - GUCs to control how easy to go parallel.
> >
> >parallel_cost_threshold is the threshold on path total cost
> >used to decide whether to enable parallel execution.
> >
> >parallel_ratio_threshold is the threshold on the ratio of
> >parallel cost to non-parallel cost used to decide whether to
> >choose the parallel path.
> >
> >  - postgres_fdw which can run in multiple sessions using snapshot
> >export and fetches in parallel for foreign scans on dedicated
> >connections.
>
> But now the effect of async execution of FETCH'es is omitted
> during planning.
>
> >foreign server has a new option 'max_aux_connections', which
> >limits the number of connections for parallel execution per
> >(server, user) pairs.
> >
> >  - change file_fdw to follow the changes of planner stuff.
> >
> >
> > Whth the patch attached, the attached sql script shows the
> > following result (after some line breaks are added).
> >
> > postgres=# EXPLAIN ANALYZE SELECT a.a, a.b, b.c
> >FROM fvs1 a join fvs1_2 b on (a.a = b.a);
> > QUERY PLAN
> >
> 
> > Hash Join  (cost=9573392.96..9573393.34 rows=1 width=40 parallel)
> >(actual time=2213.400..2213.407 rows=12 loops=1)
> >  Hash Cond: (a.a = b.a)
> >  ->  Foreign Scan on fvs1 a
> >(cost=9573392.96..9573393.29 rows=10 width=8 parallel)
> >(actual time=2199.992..2199.993 rows=10 loops=1)
> >  ->  Hash  (cost=9573393.29..9573393.29 rows=10 width=36)
> >(actual time=13.388..13.388 rows=10 loops=1)
> >Buckets: 1024  Batches: 1  Memory Usage: 6kB
> >->  Foreign Scan on fvs1_2 b
> >(cost=9573392.96..9573393.29 rows=10 width=36
> parallel)
> >(actual time=13.376..13.379 rows=10 loops=1)
> >  Planning time: 4.761 ms
> >  Execution time: 2227.462 ms
> > (8 rows)
> > postgres=# SET parallel_ratio_threshold to 0.0;
> > postgres=# EXPLAIN ANALYZE SELECT a.a, a.b, b.c
> >FROM fvs1 a join fvs1 b on (a.a = b.a);
> > QUERY PLAN
> >
> --
> >  Hash Join  (cost=318084.32..318084.69 rows=1 width=40)
> > (actual time=4302.913..4302.928 rows=12 loops=1)
> >Hash Cond: (a.a = b.a)
> >->  Foreign Scan on fvs1 a  (cost=159041.93..159042.26 rows=10
> width=8)
> >(actual time=2122.989..2122.992 rows=10
> loops=1)
> >->  Hash  (cost=159042.26..159042.26 rows=10 width=500)
> >  (actual time=2179.900..2179

[HACKERS] Patch to support SEMI and ANTI join removal

2014-08-05 Thread David Rowley
I've been working away at allowing semi and anti joins to be added to the
list of join types that our join removal code supports.

The basic idea is that we can remove a semi or anti join if the left hand
relation references the relation being semi/anti joined and the join
condition matches a foreign key definition on the left hand relation.

To give an example:

Given the 2 tables:
create table t2 (t1_id int primary key);
create table t1 (value int references t2);

The join to t2 would not be required in:

select * from t1 where value in(select t1_id from t2);

Neither would it be here:

select * from t1 where not exists(select 1 from t2 where t1_id=value);
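
The heart of the removal test is matching the join's equality clauses
against the foreign key's column pairs. A deliberately conservative,
self-contained toy of that check (my own stand-in types, not the
planner's; the real patch must also deal with NULLs, multi-column keys
and clause commutation):

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the planner structures. */
typedef struct { int conkey; int confkey; } FKColPair;  /* referencing, referenced attnums */
typedef struct { int outerattno; int innerattno; } EqClause;

/*
 * True only if every FK column pair is matched by a join equality clause
 * and no clause falls outside the FK; otherwise the inner relation could
 * still filter rows and the join must stay.
 */
static bool
clauses_match_foreign_key(const EqClause *clauses, int nclauses,
                          const FKColPair *fk, int nfkcols)
{
    if (nclauses != nfkcols)
        return false;
    for (int i = 0; i < nfkcols; i++)
    {
        bool        matched = false;

        for (int j = 0; j < nclauses; j++)
        {
            if (clauses[j].outerattno == fk[i].conkey &&
                clauses[j].innerattno == fk[i].confkey)
                matched = true;
        }
        if (!matched)
            return false;
    }
    return true;
}

int
main(void)
{
    FKColPair   fk[] = {{1, 1}};    /* t1.value references t2.t1_id */
    EqClause    quals[] = {{1, 1}}; /* ... where t1_id = value */

    printf("join removable: %s\n",
           clauses_match_foreign_key(quals, 1, fk, 1) ? "yes" : "no");
    return 0;
}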

To give a bit of background, I initially proposed the idea here:

http://www.postgresql.org/message-id/caaphdvq0nai8ceqtnndqg6mhfh__7_a6tn9xu4v0cut9wab...@mail.gmail.com

And some issues were raised around the fact that updates to the referenced
relation would only flush out changes to the referencing tables on
completion of the command, and if we happened to be planning a query that
was located inside a volatile function then we wouldn't know that the
parent query hadn't updated some of these referenced tables.

Noah raised this concern here:
http://www.postgresql.org/message-id/20140603235053.ga351...@tornado.leadboat.com
But proposed a solution here:
http://www.postgresql.org/message-id/20140605000407.ga390...@tornado.leadboat.com

In the attached I've used Noah's solution to the problem, and it seems to
work just fine. (See regression test in the attached patch)

Tom raised a point here:
http://www.postgresql.org/message-id/19326.1401891...@sss.pgh.pa.us

Where he mentioned that it may be possible that the foreign key trigger
queue gets added to after planning has taken place.
I've spent some time looking into this and I've not yet managed to find a
case where this matters, as it seems that updates made in one command are
not visible to that same command. I've tested various test cases in all
transaction isolation levels, and also tested update commands which call
volatile functions that perform updates on the same table that the outer
update will reach later in the command.

The patch (attached) is also now able to detect when a NOT EXISTS clause
cannot produce any records at all.

If I make a simple change to the tables I defined above:

ALTER TABLE t1 ALTER COLUMN value SET NOT NULL;

Then the following will be produced:

explain (costs off) select * from t1 where not exists(select 1 from t2
where t1_id=value);
QUERY PLAN
--
 Result
   One-Time Filter: false
   ->  Seq Scan on t1

A small note on my intentions with this patch:

I'm not seeing the use case for all of this as massive; I'm more
interested in this patch as a stepping stone towards implementing
INNER JOIN removals, which would use foreign keys in a similar way to
attempt to prove that the join is not required. I decided to tackle semi
and anti joins first as these are a far more simple case, and it also adds
quite a bit of the infrastructure that would be required for inner join
removal, plus if nobody manages to poke holes in my ideas with this then I
should have good grounds to begin the work on the inner join removal code.
I also think if we're bothering to load foreign key constraints at planning
time, then only using them for inner join removals wouldn't be making full
use of them, so likely this patch would be a good idea anyway.

Currently most of my changes are in analyzejoins.c, but I did also have to
make changes to load the foreign key constraints so that they were
available to the planner. One thing that is currently lacking, and would
likely be needed before the patch is finished, is a
"relhasfkeys" column in pg_class. Such a column would make it
possible to skip scanning pg_constraint for foreign keys when there are
none to find. I'll delay implementing that until I get a bit more feedback
on whether this patch would be a welcome addition to the existing join
removal code or not.

I'm submitting this (a little early) for the August commitfest, but if
anyone has any time to glance at it before then, that would be a real
help.

Regards

David Rowley


semianti_join_removal_2410c7c_2014-08-05.patch
Description: Binary data



Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Marti Raudsepp
On Mon, Aug 4, 2014 at 11:48 PM, testman1316  wrote:
> In both we ran code that did 1 million square roots (from 1 to 1 mill). Then
> did the same but within an If..Then statement.

> Note: once we started running queries on the exact same data in Oracle and
> PostgreSQL we saw a similar pattern. On basic queries little difference, but
> as they started to get more and more complex Oracle was around 3-5 times faster.

From the test cases you posted, it looks like you're not actually
benchmarking any *queries*; you're comparing the speeds of the
procedural languages. And yes, PL/pgSQL is known to be a fairly slow
language.

If you want fair benchmark results, you should instead concentrate on
what databases are supposed to do: storing and retrieving data, and
finding the most efficient way to execute complicated SQL queries. In
most setups, that's where the majority of database processor time is
spent, not in procedure code.
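
For example, a minimal client-side harness (my sketch only: default
connection parameters, wall-clock timing, nothing scientific) that puts
the executor and PL/pgSQL side by side:

#include <stdio.h>
#include <time.h>
#include <libpq-fe.h>

/* Run one statement and report its wall-clock time. */
static void
time_query(PGconn *conn, const char *label, const char *sql)
{
    struct timespec t0, t1;
    PGresult   *res;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    res = PQexec(conn, sql);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (PQresultStatus(res) != PGRES_TUPLES_OK &&
        PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "%s failed: %s", label, PQerrorMessage(conn));
    else
        printf("%s: %.0f ms\n", label,
               (t1.tv_sec - t0.tv_sec) * 1000.0 +
               (t1.tv_nsec - t0.tv_nsec) / 1e6);
    PQclear(res);
}

int
main(void)
{
    PGconn     *conn = PQconnectdb("");     /* default connection params */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* The executor doing the work: this is what's worth benchmarking. */
    time_query(conn, "set-based",
               "SELECT count(sqrt(i)) FROM generate_series(1, 1000000) i");

    /* The procedural version, for contrast. */
    time_query(conn, "plpgsql",
               "DO $$ DECLARE n numeric; BEGIN"
               "  FOR i IN 1..1000000 LOOP n := sqrt(i); END LOOP;"
               " END $$");

    PQfinish(conn);
    return 0;
}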

Regards,
Marti




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Gabriele Bartolini
Hi Claudio,

  I think there has been a misunderstanding. I agree with you (and I
think Marco does too) that the LSN is definitely a component to consider
in this process. We will come up with an alternate proposal which
considers LSNs either today or tomorrow. ;)

Thanks,
Gabriele
--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it


2014-08-04 20:30 GMT+02:00 Claudio Freire :
> On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
>  wrote:
>>I really like the proposal of working on a block level incremental
>> backup feature and the idea of considering LSN. However, I'd suggest
>> to see block level as a second step and a goal to keep in mind while
>> working on the first step. I believe that file-level incremental
>> backup will bring a lot of benefits to our community and users anyway.
>
> Thing is, I don't see how the LSN method is that much harder than an
> on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.
>
> Keeping a last-updated-LSN for each segment (or group of blocks) is
> just as easy as keeping a bitmap, and far more flexible and robust.
>
> The complexity and cost of safely keeping the map up to date is what's
> in question here, but as was pointed out before, there's no really safe
> alternative. Neither modification times nor checksums (nor in-memory
> bitmaps IMV) are really safe enough for backups, so you really want to
> use something like the LSN. It's extra work, but it opens up a world of
> possibilities.
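
To make the bookkeeping concrete, a toy model of such a map (my
illustration only: made-up group size, a plain uint64 standing in for
XLogRecPtr, and persistence/crash-safety ignored entirely):

#include <stdint.h>
#include <stdio.h>

#define BLOCKS_PER_GROUP 1024   /* map granularity: an assumption */
#define NGROUPS          16

/* One last-updated LSN per group of blocks. */
static uint64_t group_lsn[NGROUPS];

/* Called whenever a block is written at a given WAL position. */
static void
note_block_write(int blkno, uint64_t lsn)
{
    int         g = blkno / BLOCKS_PER_GROUP;

    if (lsn > group_lsn[g])
        group_lsn[g] = lsn;
}

/*
 * A group belongs in the incremental backup iff it changed after the
 * LSN at which the previous backup was taken.
 */
static int
group_needs_backup(int g, uint64_t prev_backup_lsn)
{
    return group_lsn[g] > prev_backup_lsn;
}

int
main(void)
{
    uint64_t    prev_backup_lsn = 150;

    note_block_write(5, 100);       /* written before the previous backup */
    note_block_write(3000, 250);    /* written after it */

    for (int g = 0; g < NGROUPS; g++)
        if (group_needs_backup(g, prev_backup_lsn))
            printf("copy group %d (last-updated LSN %llu)\n",
                   g, (unsigned long long) group_lsn[g]);
    return 0;
}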




Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Mark Kirkwood

On 05/08/14 17:56, Mark Kirkwood wrote:



Adding in the 'if' in the float8 case increases run time to 4s. So it
looks like plpgsql might have a slightly higher cost for handling added
conditionals. It would be interesting to dig a bit more and see what is
taking the time.



Thinking about this a bit more, I wonder if the 'big O' has added some 
optimizations in PL/SQL for trivial conditionals - i.e. you are adding:


IF (0 = 0) THEN

END IF;

...it may be going...'Ah yes, always true...so remove'!

So it might be interesting to try some (hopefully not so easily 
removable) non-trivial ones like:



DO LANGUAGE plpgsql $$
DECLARE i integer;
BEGIN
FOR i IN 1..1000 LOOP
  IF (i%100 = 0) THEN
NULL;
  END IF;
END LOOP;
END $$;

Now I guess there is a chance that PL/SQL might understand that a NULL 
inside a loop means it can remove it...so you may need to experiment 
further. The point to take away here is that for interesting loops and 
conditions there may not be such a significant difference!


Regards

Mark

