Re: [HACKERS] GIN improvements part2: fast scan

2014-01-27 Thread Heikki Linnakangas

On 01/28/2014 05:54 AM, Tomas Vondra wrote:

Then I ran those scripts on:

   * 9.3
   * 9.4 with Heikki's patches (9.4-heikki)
   * 9.4 with Heikki's and first patch (9.4-alex-1)
   * 9.4 with Heikki's and both patches (9.4-alex-2)


It would be good to also test with unpatched 9.4 (i.e. git master). The 
packed posting lists patch might account for a large part of the 
differences between 9.3 and the patched 9.4 versions.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c

2014-01-27 Thread Heikki Linnakangas

On 01/28/2014 08:58 AM, David Fetter wrote:

On Tue, Jan 28, 2014 at 02:51:22PM +0900, Michael Paquier wrote:

On Tue, Jan 28, 2014 at 2:29 PM, David Fetter  wrote:

On Tue, Jan 28, 2014 at 04:48:35PM +1300, Gavin Flower wrote:

I came across that abbreviation in a first-year Maths course,
"Principles of Mathematics", in 1968 at the University of Auckland.


By my rough count (ack -l '\biff\b' |wc -l), it's used to mean
equivalence 81 times in the source tree.  Should we have a glossary of
such terms?

And what about directly replacing those expressions in the comments of
the code with some more understandable language? This would be more
suited for non-native English speakers than maintaining a glossary
that you can surely find here and there after some googling.


I'm interested to find 29 instances of "if and only if" in the source,
which should be the same thing.

Please find attached a mechanically done patch which expands the
remaining instances of "iff" to the longer form, all of which are in
comments.  The patched source passes make -j8, but I have not tested
it further.


"iff" is a well-known abbreviation; I don't see a need to expunge it 
from the source code. There might be places where some other wording, or 
spelling it out as "if and only if", would be better, but a mechanical 
search/replace is not warranted.


FWIW, many other languages use a similar abbreviation for the same 
thing, so it's not impossible for a non-native English speaker with 
basic math education to guess. In Finnish, it's "joss", which stands for 
"jos ja vain jos", and a quick look at Wikipedia shows a similar 
construct in many other languages.


- Heikki




Re: [HACKERS] git-review: linking commits to review discussion in git

2014-01-27 Thread Heikki Linnakangas

On 01/27/2014 11:36 PM, Murtuza Mukadam wrote:

Hello All,
   We have linked peer review discussions on
'pgsql-hackers' to their respective commits within the main
postgresql.git repository. You can view the linked reviews from 2012
until present in the GitHub repo at
https://github.com/mmukadam/postgres/tree/review

If you want to work with these reviews locally, you can use our
git-review tool. It allows you to create reviews and attach them to
commits in git. We didn't modify git, instead we added some scripts
that use standard git commands. git-review is beta, but since it only
adds a detached 'review' branch and modifies the contents of this
branch, it has minimal impact and can easily be removed by deleting
the 'review' branch and scripts.

The online man-page is here:
http://users.encs.concordia.ca/~m_mukada/git-review.html

In order to install git-review, you need to clone the repository:
https://github.com/mmukadam/git-review.git

The online tutorial is available here:
http://users.encs.concordia.ca/~m_mukada/git-review-tutorial.html

The clone of postgresql.git with linked review discussion is here (new
review discussions are linked nightly):
https://github.com/mmukadam/postgres

This work is part of my Master's thesis. If you'd like us to change
the tool to better suit your review process, have another git repo
you'd like us to link commits with review discussion, or have other
feedback, please let us know.


I don't understand what this does. The repository at 
https://github.com/mmukadam/postgres looks like just a clone of the main 
PostgreSQL repository, with no extra links anywhere. And the repository 
at https://github.com/mmukadam/postgres/tree/review looks like a mailing 
list archive turned into a git repository, but I don't see any links to 
the commits in the main repository there.


Am I missing something?

- Heikki




[HACKERS] Re: [COMMITTERS] pgsql: Keep pg_stat_statements' query texts in a file, not in shared me

2014-01-27 Thread Peter Geoghegan
Then I will post my response.

On Mon, Jan 27, 2014 at 8:54 PM, Peter Geoghegan  wrote:
> On Mon, Jan 27, 2014 at 8:20 PM, KONDO Mitsumasa
>  wrote:
>> At least, only the postgres superuser could see the pg_stat_statements
>> view in old versions.
>> And you should update the documentation in these sentences.
>
> No, it was precisely the same situation in every way that matters; the
> texts and other details were serialized to disk, on the same
> filesystem, with the same permissions. The only difference was that
> the texts might not be completely current back then, because time can
> pass between shutdown/serialization. From a security perspective,
> that's obviously 100% equivalent. Now, if you're saying that the
> option to not store the texts existed back then and that made a big
> difference, I'm completely unconvinced. That is not the security model
> that we follow. How could it be? You're asking us to believe that
> there is an environment in which statement execution costs and
> normalized query strings are more confidential than *all* of the
> actual data stored in the database. You're proposing that for some
> users the former is top secret, but the latter is somewhat less
> confidential, so it matters that root can also read query texts, a
> user already naturally entitled to read *all* data in the database.
> You have forced me to say what I would have preferred not to: that's
> nonsense, pure and simple.
>
>> IMHO, I thought your approach was very rough and had some problems.
>> Therefore, I expected it would be returned with feedback by Tom.
>> I'm not sure about your patch in detail.
>
> How did you know it was very rough without reading it?
>
>> However, I think your patch has another problem, which occurs on
>> systems with a weaker write-back cache environment, such as Windows.
>> It may cause much worse performance on those systems. Did you test
>> it? When we use shared_buffers, no physical disk write happens until
>> we want it to. But your patch cannot control that, so it may perform
>> even worse than on Linux systems; it could be slower than removing
>> the LWLock. Your patch might work well only on Linux or on systems
>> with a good write-back cache environment.
>
> This is completely fatuous. Tom spent a number of days scrutinizing it
> in detail. Perhaps you should actually read the patch, or read the
> discussion, and then you'll know why he concluded that my approach was
> fundamentally sound. If you have a filesystem that can see huge spikes
> in latency due to a write() of a relatively tiny query text, you have
> many problems, but blocking other queries that need to record
> statement execution costs in the shared hashtable actually isn't one
> of them.


-- 
Peter Geoghegan




[HACKERS] Union-ifying RangeTblEntry

2014-01-27 Thread Craig Ringer
Hi all

I'm about to have to add _another_ flag to RangeTblEntry, to track
row-security expansion.

In the process I noticed the comment:

/*
 * XXX the fields applicable to only some rte kinds should be
 * merged into a union.  I didn't do this yet because the diffs
 * would impact a lot of code that is being actively worked on.
 * FIXME someday.
 */

and it struck me that the end of the 9.4 commitfest might be a
reasonable time to do this now that PostgreSQL is subject to "pulsed"
development with commitfests.

As part of that, a number of the flag fields on RangeTblEntry could be
merged into a bitfield.

Comments?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Rajeev rastogi


On 27th January, Mitsumasa KONDO wrote:
> 2014-01-26 Simon Riggs wrote:
> > On 21 January 2014 19:48, Simon Riggs wrote:
> >  > On 21 January 2014 12:54, KONDO Mitsumasa wrote:
> >  >> Rebased patch is attached.
> >  >
> >  > Does this fix the Windows bug reported by Kumar on 20/11/2013?
> Sorry, I was misunderstanding. Mr. Rajeev Rastogi's first name is
> Kumar! I searched only by e-mail address and title for his name...
> 
> I don't have a Windows compiler environment, but the attached patch
> might fix it.
> Could I ask Mr. Rajeev Rastogi to test my patch again?

I tried to test this but could not apply the patch on the latest git HEAD.
This may be because of a recent patch related to pg_stat_statements
("pg_stat_statements external query text storage"), which was committed
on 27th January.


Thanks and Regards,
Kumar Rajeev Rastogi




Re: [HACKERS] Infinite recursion in row-security based on updatable s.b. views

2014-01-27 Thread Craig Ringer
On 01/24/2014 07:16 PM, Dean Rasheed wrote:

> My first thought is to add a boolean flag to RangeTblEntry (similar to
> the "inh" flag) to say whether RLS expansion is requested for that
> RTE. Then set it to false on each RTE as you add new RLS quals to its
> securityQuals.

That's what I was getting at with adding flags to RangeTblEntry, yes.

Given the number of flags we're growing, I wonder whether they should be
consolidated into a bitmask, but I'll leave that problem for later.

I'll do this part first, since it seems you agree that a RangeTblEntry
flag is the appropriate path. That'll make row-security checking work
and make the patch testable.

It won't deal with recursive rules; they'll still crash the backend.
I'll deal with that as a further step.

> In addition, when adding RLS quals to an RTE, I think they should be
> fully and recursively expanded immediately, before setting the new
> flag to false and moving on --- think recursively calling the rewriter
> to expand view references in the new RLS qual, and
> expand_security_qual() to expand any additional RLS quals in the
> securityQuals list --- with loop detection there. I'm not pretending
> that's going to be easy, but there are a couple of existing pieces of
> code to borrow ideas from. Doing it this way should make it possible
> to do the loop detection without any global structures.

Ugh. I was really hoping to avoid *another* place where we do recursive
expansion and infinite recursion checking, especially when the rewriter
already does this job.

Unfortunately, so long as the rewriter doesn't know about inheritance,
it cannot fully solve this problem. A mutually recursive rule involving
inheritance wouldn't get detected by rewriter based checking.

The original RLS patch only detects direct recursion, BTW; it looks like
it won't catch mutual recursion.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c

2014-01-27 Thread Michael Paquier
On Tue, Jan 28, 2014 at 2:29 PM, David Fetter  wrote:
> On Tue, Jan 28, 2014 at 04:48:35PM +1300, Gavin Flower wrote:
>> On 28/01/14 16:33, Andrew Dunstan wrote:
>> >On 01/27/2014 10:24 PM, Sawada Masahiko wrote:
>> >>Hi all,
>> >>
>> >>Attached patch fixes the typo which is in
>> >>"src/backend/command/cluster.c".
>> >
>> >Are you sure that's a typo? "iff" is usually shorthand for "if
>> >and only if".
>> >
>> >cheers
>> >
>> >andrew
>> >
>> >
>> Certainly, that is how I would interpret it.
>>
>> I came across that abbreviation in a first-year Maths course,
>> "Principles of Mathematics", in 1968 at the University of Auckland.
>
> By my rough count (ack -l '\biff\b' |wc -l), it's used to mean
> equivalence 81 times in the source tree.  Should we have a glossary of
> such terms?
And what about directly replacing those expressions in the comments of
the code with some more understandable language? This would be more
suited for non-native English speakers than maintaining a glossary
that you can surely find here and there after some googling.
-- 
Michael




Re: [HACKERS] PoC: Partial sort

2014-01-27 Thread Alexander Korotkov
On Tue, Jan 28, 2014 at 7:41 AM, Marti Raudsepp  wrote:

> On Mon, Jan 27, 2014 at 9:26 PM, Alexander Korotkov
>  wrote:
> > For now, I have attempt to fix extra columns in mergejoin problem. It
> would
> > be nice if you test it.
>
> Yes, it solves the test cases I was trying with, thanks.
>
> > 1) With enable_partialsort = off all mergejoin logic should behave as
> > without partial sort patch.
> > 2) With partial sort patch get_cheapest_fractional_path_for_pathkeys
> > function is much more expensive to execute. With enable_partialsort =
> off it
> > should be as cheap as without partial sort patch.
>
> When it comes to planning time, I really don't think you should
> bother. The planner enable_* settings are meant for troubleshooting,
> debugging and learning about the planner. You should not expect people
> to disable them in a production setting. It's not worth complicating
> the code for that rare case.
>
> This is stated in the documentation
> (http://www.postgresql.org/docs/current/static/runtime-config-query.html)
> and repeatedly on the mailing lists.
>
> But some benchmarks of planning performance are certainly warranted.
>

I didn't test it, but I worry that the overhead might be high.
If that's true, it could end up like the constraint_exclusion option,
which is off by default because of planning overhead.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-27 Thread Amit Kapila
On Wed, Jan 22, 2014 at 8:56 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Tue, Jan 21, 2014 at 8:25 PM, Robert Haas  wrote:
>>> While reviewing, I noted that the
>>> "skipping missing configuration file" message in ParseConfigFile()
>>> uses an elevel of LOG, while the other messages in the same file use
>>> "elevel".  I'm thinking that's a bug.
>
>> It might not be right for all cases, but I think this is saving us in few 
>> cases
>> when the caller (directly or indirectly) has sent elevel as ERROR or FATAL,
>> in such cases we just want to LOG it, if strict is not true. Now it might not
>> be appropriate if the caller has sent DEBUG2 and we use LOG, but to
>> fix it (if this sounds an issue), some more analysis is required, so let's
>> keep it as it is for now.
>
> The problem with this coding is that at SIGHUP, *all* the postgres
> processes will bleat about a missing file.
>
> To avoid that kind of log spam, the usual practice is to code the
> elog level as something like "IsUnderPostmaster ? DEBUG2 : LOG";
> see the elevel setting in ProcessConfigFile() for precedent.
> In fact, I'm pretty sure that the elevel passed to ParseConfigFile
> should always have come from that logic.

   The reason I mentioned that it can receive a different elevel than
   what ProcessConfigFile() sends is that ParseConfigFile() gets called
   from ParseConfigFp(), which is also called from other paths
   (recovery, parse_extension.. ) that send elevel as ERROR or higher.
   However, ParseConfigFp(), when called from those paths, will not
   call ParseConfigFile(), as none of them will contain any include
   directive; still, the code as written doesn't ensure that elevel
   always arrives set the way ProcessConfigFile() sets it.

> In any case, your argument is bogus because we could already have
> thrown an error at the given elevel further up in ParseConfigFile,
> or later in ParseConfigFp.

  True, but not for this kind of case (ignore if file not present).



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime

2014-01-27 Thread Amit Langote
On Tue, Jan 28, 2014 at 1:41 PM, Amit Kapila  wrote:
> On Mon, Jan 27, 2014 at 7:48 PM, Amit Langote  wrote:
>> On Mon, Jan 13, 2014 at 2:50 PM, Amit Kapila  wrote:
>>> I have extended test (contrib) module dsm_demo such that now user
>>> can specify during dsm_demo_create the lifespan of segment.
>>
>> Applied dsm_keep_segment_v1.patch and dsm_demo_v1.patch.
>> Got the following warning when I tried above example:
>>
>> postgres=# select dsm_demo_create('this message is from session-new', 1);
>> WARNING:  dynamic shared memory leak: segment 1402373971 still referenced
>> WARNING:  dynamic shared memory leak: segment 1402373971 still referenced
>>  dsm_demo_create
>> -----------------
>>   1402373971
>> (1 row)
>
> Thanks for verification.
> The reason is that resource owner has reference to segment till commit time
> which leads to this warning. The fix is to remove reference from
> resource owner when user calls this new API dsm_keep_segment, similar
> to what currently dsm_keep_mapping does.
>
> Find the updated patch to fix this problem attached with this mail.
>

Thanks, the warnings are gone.

--
Amit




Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c

2014-01-27 Thread David Fetter
On Tue, Jan 28, 2014 at 04:48:35PM +1300, Gavin Flower wrote:
> On 28/01/14 16:33, Andrew Dunstan wrote:
> >On 01/27/2014 10:24 PM, Sawada Masahiko wrote:
> >>Hi all,
> >>
> >>Attached patch fixes the typo which is in
> >>"src/backend/command/cluster.c".
> >
> >Are you sure that's a typo? "iff" is usually shorthand for "if
> >and only if".
> >
> >cheers
> >
> >andrew
> >
> >
> Certainly, that is how I would interpret it.
> 
> I came across that abbreviation in a first-year Maths course,
> "Principles of Mathematics", in 1968 at the University of Auckland.

By my rough count (ack -l '\biff\b' |wc -l), it's used to mean
equivalence 81 times in the source tree.  Should we have a glossary of
such terms?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] pg_basebackup and pg_stat_tmp directory

2014-01-27 Thread Amit Kapila
On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao  wrote:
> Hi,
>
> The files in pg_stat_tmp directory don't need to be backed up because they are
> basically reset at the archive recovery. So I think it's worth
> changing pg_basebackup
> so that it skips any files in pg_stat_tmp directory. Thought?

I think this is a good idea, but can't it also skip
PGSTAT_STAT_PERMANENT_TMPFILE along with the temp files in
pg_stat_tmp?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pg_basebackup and pg_stat_tmp directory

2014-01-27 Thread Fujii Masao
On Tue, Jan 28, 2014 at 1:08 PM, Michael Paquier
 wrote:
> On Tue, Jan 28, 2014 at 12:56 PM, Fujii Masao  wrote:
>> Hi,
>>
>> The files in pg_stat_tmp directory don't need to be backed up because they 
>> are
>> basically reset at the archive recovery. So I think it's worth
>> changing pg_basebackup
>> so that it skips any files in pg_stat_tmp directory. Thought?
> Skipping pgstat_temp_directory in basebackup.c would make more sense
> than directly touching pg_basebackup.
> My 2c.

Yeah, that's what I was thinking :)

Regards,

-- 
Fujii Masao




Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime

2014-01-27 Thread Amit Kapila
On Mon, Jan 27, 2014 at 7:48 PM, Amit Langote  wrote:
> On Mon, Jan 13, 2014 at 2:50 PM, Amit Kapila  wrote:
>> I have extended test (contrib) module dsm_demo such that now user
>> can specify during dsm_demo_create the lifespan of segment.
>
> Applied dsm_keep_segment_v1.patch and dsm_demo_v1.patch.
> Got the following warning when I tried above example:
>
> postgres=# select dsm_demo_create('this message is from session-new', 1);
> WARNING:  dynamic shared memory leak: segment 1402373971 still referenced
> WARNING:  dynamic shared memory leak: segment 1402373971 still referenced
>  dsm_demo_create
> -----------------
>   1402373971
> (1 row)

Thanks for verification.
The reason is that resource owner has reference to segment till commit time
which leads to this warning. The fix is to remove reference from
resource owner when user calls this new API dsm_keep_segment, similar
to what currently dsm_keep_mapping does.

Find the updated patch to fix this problem attached with this mail.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


dsm_keep_segment_v2.patch
Description: Binary data



Re: [HACKERS] inherit support for foreign tables

2014-01-27 Thread Kouhei Kaigai
> > I wonder what the behavior should be when the foreign table is on a
> > server that does not support *all* SQL features.
> >
> > Does an FDW need to have the possible inherit options mentioned in
> > its documentation for this patch?
> 
> The answer is no, in my understanding.  The altering operation simply
> declares some changes for foreign tables in the inheritance tree and
> does nothing to the underlying storage.
>
It seems to me Atri was mentioning the idea of enforcing the constraint
when a partitioned foreign table is referenced...

BTW, isn't it feasible to leave this decision to the FDW drivers?
If a foreign table has a check constraint (X BETWEEN 101 AND 200),
postgres_fdw will be capable of running this check on the remote server,
whereas file_fdw will not. Handling qualifiers attached to a Path node
is already their usual job.
Probably what needs to be done is simple: if the backend appends the
configured check constraint as if it were a user-given qualifier, the
FDW driver will handle it appropriately.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Etsuro Fujita
> Sent: Tuesday, January 28, 2014 1:08 PM
> To: Atri Sharma; David Fetter
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] inherit support for foreign tables
> 
> (2014/01/28 0:55), Atri Sharma wrote:
> >> On 27-Jan-2014, at 21:03, David Fetter  wrote:
> >>> On Mon, Jan 27, 2014 at 05:06:19PM +0900, Etsuro Fujita wrote:
> >>> Hi Hanada-san,
> >>>
> >>> While still reviewing this patch, I feel this patch has given enough
> >>> consideration to interactions with other commands, but found the
> >>> following incorrect? behavior:
> >>>
> >>> postgres=# CREATE TABLE product (id INTEGER, description TEXT);
> >>> CREATE TABLE postgres=# CREATE FOREIGN TABLE product1 () INHERITS
> >>> (product) SERVER fs OPTIONS (filename '/home/foo/product1.csv',
> >>> format 'csv'); CREATE FOREIGN TABLE postgres=# ALTER TABLE product
> >>> ALTER COLUMN description SET STORAGE EXTERNAL;
> >>> ERROR:  "product1" is not a table or materialized view
> >>>
> >>> ISTM the ALTER TABLE simple recursion mechanism (i.e.
> >>> ATSimpleRecursion()) should be modified for the ALTER COLUMN SET
> >>> STORAGE case.
> >>
> >> This points to a larger discussion about what precisely foreign
> >> tables can and cannot inherit from local ones.  I don't think that a
> >> generic solution will be satisfactory, as the PostgreSQL FDW could,
> >> at least in principle, support many more than the CSV FDW, as shown above.
> >>
> >> In my estimation, the outcome of discussion above is not a blocker
> >> for this
> 
> I just thought that among the properties that local tables can alter,
> the ones that foreign tables can also alter by ALTER FOREIGN TABLE are
> inherited, and the others are not.  So for the case shown above, I
> thought we would silently skip executing the ALTER COLUMN SET STORAGE
> command for the foreign table.  I wonder whether that would be a good
> first step.
> 
> > I wonder what the behavior should be when the foreign table is on a
> > server that does not support *all* SQL features.
> >
> > Does an FDW need to have the possible inherit options mentioned in
> > its documentation for this patch?
> 
> The answer is no, in my understanding.  The altering operation simply
> declares some changes for foreign tables in the inheritance tree and
> does nothing to the underlying storage.
> 
> Thanks,
> 
> Best regards,
> Etsuro Fujita
> 
> 




Re: [HACKERS] WIP patch (v2) for updatable security barrier views

2014-01-27 Thread Kouhei Kaigai
> > AFAICS the only area of objection is the handling of inherited
> > relations, which occurs within the planner in the current patch. I can
> > see that would be a cause for concern since the planner is pluggable
> > and it would then be possible to bypass security checks. Obviously
> > installing a new planner isn't trivial, but doing so shouldn't cause
> > collateral damage.
> 
> FWIW, I don't see any way _not_ to do that in RLS. If there are security
> quals on a child table, they must be added, and that can only happen once
> inheritance expansion happens. That's in the planner.
> 
> I don't see it as acceptable to ignore security quals on child tables, and
> if we can't, we've got to do some work in the planner.
> 
> (I'm starting to really loathe inheritance).
>
Let me ask an elementary question. What is the reason the inheritance
expansion logic should be located in the planner stage rather than at
the tail of the rewriter?
A reference to a relation with children is very similar to a reference
to multiple tables using UNION ALL. Would it be such a bad idea to move
the logic into the rewriter stage (if we have no technical reason
against it)?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
> Sent: Tuesday, January 28, 2014 12:35 PM
> To: Simon Riggs; Dean Rasheed
> Cc: Kaigai, Kouhei(海外, 浩平); Tom Lane; PostgreSQL Hackers; Kohei KaiGai;
> Robert Haas
> Subject: Re: [HACKERS] WIP patch (v2) for updatable security barrier views
> 
> On 01/28/2014 12:09 AM, Simon Riggs wrote:
> > On 27 January 2014 15:04, Dean Rasheed  wrote:
> >
> >> So for example, when planning the query to update an inheritance
> >> child, the rtable will contain an RTE for the parent, but it will not
> >> be referenced in the parse tree, and so it will not be expanded while
> >> planning the child update.
> >
> > Am I right in thinking that we have this fully working now?
> 
> I haven't found any further problems, though I've been focusing more on
> reworking RLS on top of it.
> 
> > AFAICS the only area of objection is the handling of inherited
> > relations, which occurs within the planner in the current patch. I can
> > see that would be a cause for concern since the planner is pluggable
> > and it would then be possible to bypass security checks. Obviously
> > installing a new planner isn't trivial, but doing so shouldn't cause
> > collateral damage.
> 
> FWIW, I don't see any way _not_ to do that in RLS. If there are security
> quals on a child table, they must be added, and that can only happen once
> inheritance expansion happens. That's in the planner.
> 
> I don't see it as acceptable to ignore security quals on child tables, and
> if we can't, we've got to do some work in the planner.
> 
> (I'm starting to really loathe inheritance).
> 
> > We have long had restrictions around updateable views. My suggestion
> > from here is that we accept the restriction that we cannot yet have
> > the 3-way combination of updateable views, security views and views on
> > inherited tables.
> 
> That prevents the use of updatable security barrier views over partitioned
> tables, and therefore prevents row-security use on inherited tables.
> 
> That seems like a very big thing to close off. I'm perfectly happy having
> that limitation for 9.4, we just need to make it possible to remove the
> limitation later.
> 
> > Most people aren't using inherited tables
> 
> Again, because we (ab)use them for partitioning, I'm not sure they're
> as little-used as I'd like.
> 
> > and people that are have
> > special measures in place for their apps. We won't lose much by
> > accepting that restriction for 9.4 and re-addressing the issue in a
> > later release.
> 
> Yep, I'd be happy with that.
> 
> 
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
> 
> 




Re: [HACKERS] pg_basebackup and pg_stat_tmp directory

2014-01-27 Thread Michael Paquier
On Tue, Jan 28, 2014 at 12:56 PM, Fujii Masao  wrote:
> Hi,
>
> The files in pg_stat_tmp directory don't need to be backed up because they are
> basically reset at the archive recovery. So I think it's worth
> changing pg_basebackup
> so that it skips any files in pg_stat_tmp directory. Thought?
Skipping pgstat_temp_directory in basebackup.c would make more sense
than directly touching pg_basebackup.
My 2c.
-- 
Michael




Re: [HACKERS] inherit support for foreign tables

2014-01-27 Thread Etsuro Fujita

(2014/01/28 0:55), Atri Sharma wrote:

On 27-Jan-2014, at 21:03, David Fetter  wrote:

On Mon, Jan 27, 2014 at 05:06:19PM +0900, Etsuro Fujita wrote:
Hi Hanada-san,

While still reviewing this patch, I feel this patch has given enough
consideration to interactions with other commands, but I found the
following (seemingly incorrect) behavior:

postgres=# CREATE TABLE product (id INTEGER, description TEXT);
CREATE TABLE
postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product)
SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv');
CREATE FOREIGN TABLE
postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE
EXTERNAL;
ERROR:  "product1" is not a table or materialized view

ISTM the ALTER TABLE simple recursion mechanism (i.e.
ATSimpleRecursion()) should be modified for the ALTER COLUMN SET
STORAGE case.


This points to a larger discussion about what precisely foreign tables
can and cannot inherit from local ones.  I don't think that a generic
solution will be satisfactory, as the PostgreSQL FDW could, at least
in principle, support many more than the CSV FDW, as shown above.

In my estimation, the outcome of discussion above is not a blocker for
this


I just thought that among the properties that local tables can alter,
the ones that foreign tables can also alter via ALTER FOREIGN TABLE are
inherited, and the others are not.  So for the case shown above, I
thought that we would silently ignore the ALTER COLUMN SET STORAGE
command for the foreign table.  I wonder whether that would be a good
first step.



I wonder what should happen in cases where the foreign table is on a server
which does not support *all* SQL features.

Does a FDW need to have the possible inherit options mentioned in its 
documentation for this patch?


The answer is no, in my understanding.  The altering operation simply 
declares some changes for foreign tables in the inheritance tree and does 
nothing to the underlying storage.


Thanks,

Best regards,
Etsuro Fujita




[HACKERS] pg_basebackup and pg_stat_tmp directory

2014-01-27 Thread Fujii Masao
Hi,

The files in the pg_stat_tmp directory don't need to be backed up because they are
basically reset during archive recovery. So I think it's worth changing
pg_basebackup so that it skips any files in the pg_stat_tmp directory. Thoughts?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c

2014-01-27 Thread Sawada Masahiko
>
> On 01/27/2014 10:24 PM, Sawada Masahiko wrote:
>
>> Hi all,
>>
>> Attached patch fixes the typo which is in "src/backend/command/cluster.c".
>>
>>
>
> Are you sure that's a typo? "iff" is usually short hand for "if and only
> if".
>
>
Oops, I made a mistake.
Thanks!

Regards,

-
Masahiko Sawada




Re: [HACKERS] INTERVAL overflow detection is terribly broken

2014-01-27 Thread Bruce Momjian
On Mon, Jan 27, 2014 at 07:19:21PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > Oh, one odd thing about this patch.  I found I needed to use INT64_MAX,
> > but I don't see it used anywhere else in our codebase.  Is this OK?  Is
> > there a better way?
> 
> Most of the overflow tests in int.c and int8.c are coded to avoid relying
> on the MIN or MAX constants; which seemed like better style at the time.

Yes, I looked at those but they seemed like overkill for interval.  For
a case where there was an int64 multiplied by a double, then cast back
to an int64, I checked the double against INT64_MAX before casting to an
int64.

> I'm not sure whether relying on INT64_MAX to exist is portable.

The only use I found was in pgbench:

#ifndef INT64_MAX
#define INT64_MAX   INT64CONST(0x7FFF)
#endif

so I have just added that to my patch, and INT64_MIN:

#ifndef INT64_MIN
#define INT64_MIN   (-INT64CONST(0x7FFF) - 1)
#endif

This is only used for HAVE_INT64_TIMESTAMP.
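To make the pattern concrete, here is a minimal sketch of the kind of check described above (an int64 multiplied by a double, with the result tested against INT64_MAX before casting back). The function name is hypothetical and this is not the patch's actual code; the fallback #defines mirror what pgbench already does.

```c
#include <stdint.h>

/* Fallback definitions, as pgbench does */
#ifndef INT64_MAX
#define INT64_MAX INT64_C(0x7FFFFFFFFFFFFFFF)
#endif
#ifndef INT64_MIN
#define INT64_MIN (-INT64_C(0x7FFFFFFFFFFFFFFF) - 1)
#endif

/* Check whether span * factor would overflow int64, before casting the
 * double result back.  The comparison is done in double: note that
 * (double) INT64_MAX rounds up to exactly 2^63, so ">=" catches the
 * edge case, while (double) INT64_MIN is exactly representable. */
static int
interval_mul_overflows(int64_t span, double factor)
{
    double result = (double) span * factor;

    return (result >= (double) INT64_MAX || result < (double) INT64_MIN);
}
```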

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c

2014-01-27 Thread Gavin Flower

On 28/01/14 16:33, Andrew Dunstan wrote:


On 01/27/2014 10:24 PM, Sawada Masahiko wrote:

Hi all,

Attached patch fixes the typo which is in 
"src/backend/command/cluster.c".





Are you sure that's a typo? "iff" is usually short hand for "if and 
only if".


cheers

andrew



Certainly, that is how I would interpret it.

I came across that abbreviation in a first-year Maths course, 
"Principles of Mathematics", in 1968 at the University of Auckland.



Cheers,
Gavin




Re: [HACKERS] PoC: Partial sort

2014-01-27 Thread Marti Raudsepp
On Mon, Jan 27, 2014 at 9:26 PM, Alexander Korotkov
 wrote:
> For now, I have attempted to fix the extra columns in mergejoin problem. It would
> be nice if you test it.

Yes, it solves the test cases I was trying with, thanks.

> 1) With enable_partialsort = off all mergejoin logic should behave as
> without partial sort patch.
> 2) With partial sort patch get_cheapest_fractional_path_for_pathkeys
> function is much more expensive to execute. With enable_partialsort = off it
> should be as cheap as without partial sort patch.

When it comes to planning time, I really don't think you should
bother. The planner enable_* settings are meant for troubleshooting,
debugging and learning about the planner. You should not expect people
to disable them in a production setting. It's not worth complicating
the code for that rare case.

This is stated in the documentation
(http://www.postgresql.org/docs/current/static/runtime-config-query.html)
and repeatedly on the mailing lists.

But some benchmarks of planning performance are certainly warranted.

Regards,
Marti




Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-01-27 Thread Robert Haas
On Mon, Jan 27, 2014 at 12:03 PM, Amit Kapila  wrote:
>> I think that's a good thing to try.  Can you code it up?
>
> I have tried to improve the algorithm in another way so that we can get
> the benefit of the same chunks during match finding (something similar to LZ).
> The main change is to consider chunks at a fixed boundary (4 bytes)
> and, after finding a match, to try to find whether there is a longer match
> than the current chunk. While finding a longer match, it still takes care
> that the next bigger match should be at a chunk boundary. I am not
> completely sure about the chunk boundary; maybe 8 or 16 can give
> better results.
>
> I think now we can do one run with this patch on a high-end machine.
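A rough sketch of the fixed-boundary matching idea described in the quoted text, with hypothetical names (the actual patch operates on tuple images inside the WAL machinery, and would use a hash table rather than this brute-force scan):

```c
#include <string.h>
#include <stddef.h>

#define CHUNK 4 /* fixed chunk boundary; 8 or 16 might fare better, as noted */

/* Length (a multiple of CHUNK) of the match between old[ooff..] and
 * new_[noff..], extended only in whole chunks. */
static size_t
match_len(const unsigned char *old, size_t oldlen, size_t ooff,
          const unsigned char *new_, size_t newlen, size_t noff)
{
    size_t len = 0;

    while (ooff + len + CHUNK <= oldlen && noff + len + CHUNK <= newlen &&
           memcmp(old + ooff + len, new_ + noff + len, CHUNK) == 0)
        len += CHUNK;
    return len;
}

/* Find the longest chunk-aligned match in 'old' for the data starting at
 * new_[noff]; returns its length and stores its offset in *where. */
static size_t
find_match(const unsigned char *old, size_t oldlen,
           const unsigned char *new_, size_t newlen,
           size_t noff, size_t *where)
{
    size_t best = 0;

    for (size_t ooff = 0; ooff + CHUNK <= oldlen; ooff += CHUNK)
    {
        size_t len = match_len(old, oldlen, ooff, new_, newlen, noff);

        if (len > best)
        {
            best = len;
            *where = ooff;
        }
    }
    return best;
}
```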

Here are the results I got on the community PPC64 box.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgrb-v5 test.ods
Description: application/vnd.oasis.opendocument.spreadsheet



Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-27 Thread Amit Kapila
On Mon, Jan 27, 2014 at 9:01 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Fri, Jan 24, 2014 at 9:10 AM, Amit Kapila  wrote:
>
>> To proceed with the review of this patch, I need to know about
>> whether appending a version number or any other constant to the
>> default Event Source name is acceptable or not; else for now
>> we can remove this part of code from patch and handle non-default
>> case where the change will be that pg_ctl will enquire non-default
>> event_source value from server.
>
>> Could you please let me know your views about same?
>
> Unless I'm missing something, this entire thread is a tempest in a teapot,
> because the default event_source value does not matter, because *by
> default we don't log to eventlog*.  The presumption is that if the user
> turns on logging to eventlog, it's his responsibility to first make sure
> that event_source is set to something appropriate.  And who's to say that
> plain "PostgreSQL" isn't what he wanted, anyway?  Even if he's got
> multiple servers on one machine, maybe directing all their logs to the
> same place is okay by him.

I think it's a matter of user preference how exactly he wants the setup,
and as we currently don't have any strong reason to change the default,
let's keep it intact.

> Also, those who don't run multiple servers are probably not going to
> thank us for moving their logs around unnecessarily.
>
> In short, I think we should just reject this idea as introducing more
> problems than it solves, and not fully solving even the problem it
> purports to solve.
>
> Possibly there's room for a documentation patch reminding users to
> make sure that event_source is set appropriately before they turn
> on eventlog.

 Okay, but in that case, right now pg_ctl still doesn't know the value
 of the event source, so I think that's a clear bug and we should go ahead
 and fix it.

 As you said, I think we can improve documentation in this regard so
 that user will be able to setup event log without any such problems.

 As part of this patch we can fix the issue (make pg_ctl aware of the event
 source name) and improve the documentation.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WIP patch (v2) for updatable security barrier views

2014-01-27 Thread Craig Ringer
On 01/28/2014 12:09 AM, Simon Riggs wrote:
> On 27 January 2014 15:04, Dean Rasheed  wrote:
> 
>> So for example, when planning the query to update an inheritance
>> child, the rtable will contain an RTE for the parent, but it will not
>> be referenced in the parse tree, and so it will not be expanded while
>> planning the child update.
> 
> Am I right in thinking that we have this fully working now?

I haven't found any further problems, though I've been focusing more on
reworking RLS on top of it.

> AFAICS the only area of objection is the handling of inherited
> relations, which occurs within the planner in the current patch. I can
> see that would be a cause for concern since the planner is pluggable
> and it would then be possible to bypass security checks. Obviously
> installing a new planner isn't trivial, but doing so shouldn't cause
> collateral damage.

FWIW, I don't see any way _not_ to do that in RLS. If there are security
quals on a child table, they must be added, and that can only happen
once inheritance expansion happens. That's in the planner.

I don't see it as acceptable to ignore security quals on child tables,
and if we can't, we've got to do some work in the planner.

(I'm starting to really loathe inheritance).

> We have long had restrictions around updateable views. My suggestion
> from here is that we accept the restriction that we cannot yet have
> the 3-way combination of updateable views, security views and views on
> inherited tables.

That prevents the use of updatable security barrier views over
partitioned tables, and therefore prevents row-security use on inherited
tables.

That seems like a very big thing to close off. I'm perfectly happy
having that limitation for 9.4, we just need to make it possible to
remove the limitation later.

> Most people aren't using inherited tables

Again, because we (ab)use them for partitioning, I'm not sure they're as
little-used as I'd like.

> and people that are have
> special measures in place for their apps. We won't lose much by
> accepting that restriction for 9.4 and re-addressing the issue in a
> later release.

Yep, I'd be happy with that.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c

2014-01-27 Thread Andrew Dunstan


On 01/27/2014 10:24 PM, Sawada Masahiko wrote:

Hi all,

Attached patch fixes the typo which is in "src/backend/command/cluster.c".




Are you sure that's a typo? "iff" is usually short hand for "if and only 
if".


cheers

andrew




[HACKERS] Fix comment typo in /src/backend/command/cluster.c

2014-01-27 Thread Sawada Masahiko
Hi all,

Attached patch fixes the typo which is in "src/backend/command/cluster.c".

Regards,

---
Sawada Masahiko


fix_typo-cluster.patch
Description: Binary data



Re: [HACKERS] Add force option to dropdb

2014-01-27 Thread Sawada Masahiko
On 17 January 2014 0:56, salah jubeh  wrote:
>
>>If the user owns objects, that will prevent this from working also.  I
>>have the feeling that adding DROP OWNED BY and/or REASSIGNED OWNED BY
>>calls to this utility would be a bit excessive, but who knows.
>
> Please find attached the first attempt to drop the client connections.
> I have added an option -k, --kill instead of "force", since killing client
> connections does not guarantee a forced drop.
> Regards
>
>
> On Tuesday, January 14, 2014 8:06 PM, Alvaro Herrera
>  wrote:
> salah jubeh wrote:
>
>> For the sake of completeness:
>> 1. I think I also need to temporarily disallow connecting to the
>> database; is that right?
>> 2. Are there other factors that can hinder dropping a database?
>
> If the user owns objects, that will prevent this from working also.  I
> have the feeling that adding DROP OWNED BY and/or REASSIGNED OWNED BY
> calls to this utility would be a bit excessive, but who knows.
>
>
>> 3. Should I write two patches one for pg_version>=9.2 and one for
>> pg_version<9.2
>
>
> No point -- nothing gets applied to branches older than current
> development anyway.
>

Thank you for the patch.
And sorry for delay in reviewing.

I started to look at this patch, so the following is my first review comment.

- This patch does not apply to the master branch
I tried to apply this patch file to the master branch, but I got the following error.
$ cd postgresql
$ patch -d. -p1 < ../dropdb.patch
can't find file to patch at input line 3
Perhaps you used the wrong -p or --strip option?
the text leading up to this was:
--
|--- dropdb_org.c 2014-01-16
|+++ dropdb.c 2014-01-16
--

There is no dropdb_org.c. I think that you made a mistake when the
patch was created.

- This patch does not follow the coding rules
For example, line 71 of the patch:
//new connections are not allowed
It should be:
/* new connections are not allowed */
(Comment blocks that need specific line breaks should be formatted as
block comments, where the comment starts as /*--.)
Please refer to the coding rules.


Regards,

---
Sawada Masahiko




Re: [HACKERS] Freezing without write I/O

2014-01-27 Thread Robert Haas
On Mon, Jan 27, 2014 at 4:16 PM, Simon Riggs  wrote:
> On 26 January 2014 12:58, Andres Freund  wrote:
>> On 2014-01-25 20:26:16 -0800, Peter Geoghegan wrote:
>>> Shouldn't this patch be in the January commitfest?
>>
>> I think we previously concluded that there wasn't much chance to get
>> this into 9.4 and there's significant work to be done on the patch
>> before new reviews are required, so not submitting it imo makes sense.
>
> I think we should make this a priority feature for 9.5

+1.  I can't think of many things we might do that would be more important.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog

2014-01-27 Thread Peter Eisentraut
On 11/30/13, 6:59 AM, Haribabu kommi wrote:
> To detect whether the provided data and xlog directories are the same, I
> reused the existing make_absolute_path() code as follows.

I note that initdb does not detect whether the data and xlog directories
are the same.  I think there is no point in addressing this only in
pg_basebackup.  If we want to forbid it, it should be done in initdb
foremost.

I'm not sure it's worth the trouble, but if I were to do it, I'd just
stat() the two directories and compare their inodes.  That seems much
easier and more robust than comparing path strings.
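The inode-comparison approach suggested above can be sketched as follows. The function name is made up for illustration; it simply compares the (device, inode) pair of the two paths, which handles symlinks and differently-spelled paths that string comparison would miss.

```c
#include <sys/stat.h>

/* Compare two directories by device and inode rather than by path
 * string.  Returns 1 if they are the same directory, 0 if different,
 * and -1 if either stat() fails. */
static int
same_directory(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino) ? 1 : 0;
}
```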





[HACKERS] git-review: linking commits to review discussion in git

2014-01-27 Thread Murtuza Mukadam
Hello All,
  We have linked peer review discussions on
'pgsql-hackers' to their respective commits within the main
postgresql.git repository. You can view the linked reviews from 2012
until present in the GitHub repo at
https://github.com/mmukadam/postgres/tree/review

If you want to work with these reviews locally, you can use our
git-review tool. It allows you to create reviews and attach them to
commits in git. We didn't modify git, instead we added some scripts
that use standard git commands. git-review is beta, but since it only
adds a detached 'review' branch and modifies the contents of this
branch, it has minimal impact and can easily be removed by deleting
the 'review' branch and scripts.

The online man-page is here:
http://users.encs.concordia.ca/~m_mukada/git-review.html

In order to install git-review, you need to clone the repository:
https://github.com/mmukadam/git-review.git

The online tutorial is available here:
http://users.encs.concordia.ca/~m_mukada/git-review-tutorial.html

The clone of postgresql.git with linked review discussion is here (new
review discussion are linked nightly)
https://github.com/mmukadam/postgres

This work is part of my Master's thesis. If you'd like us to change
the tool to better suit your review process, have another git repo
you'd like us to link commits with review discussion, or have other
feedback, please let us know.

Cheers,
Murtuza




Re: [HACKERS] Race condition in b-tree page deletion

2014-01-27 Thread Peter Geoghegan
On Sat, Jan 18, 2014 at 11:45 AM, Kevin Grittner  wrote:
> Heikki Linnakangas  wrote:
>
>> Here's a patch implementing this scheme.

I thought I'd weigh in here too, since this is closely tied to the
page split patch, which is dependent on this patch.

> The basic approach is sound.  The papers on which we based our
> btree implementation did not discuss entry deletion, and our
> current approach introduces new possible states into the on-disk
> index structure which are not covered by the papers and are not
> always handled correctly by the current code.

Lehman and Yao don't discuss deletion. But V. Lanin and D. Shasha. do,
in the paper "A Symmetric Concurrent B-Tree Algorithm"[1], as the
README notes - we currently follow a "simplified" version of that.
Apparently a number of people proposed solutions to address the
omission of L&Y, as one survey of those approaches notes [2]. One of
the approaches appearing in that survey is L&S.

The classic L&Y algorithm only has right-links, and not left-links.
The same applies to L&S. But I'd describe this patch as bringing what
we do closer to L&S, which is more or less a refinement of B-Link
trees (that is, L&Y B-Trees). L&S does have a concept of an outlink,
crucial to their "symmetric deletion approach", which is something
that we don't need, because we already have left-links (we have them
primarily to support backwards index scans). L&S says "In general, the
link technique calls for processes that change sections of a data
structure in a way that would cause other processes to fail to provide
a link to another section of the data structure where recovery is
possible". So loosely speaking we're doing a "conceptual
_bt_moveleft()" for deletion as the inverse of a literal _bt_moveright
for insertion.

L&S says "a deletion needs no more than two write locks at a time
during its ascent" (the descent is just a garden variety _bt_search()
insertion scankey descent). However, we currently may obtain as many
as 4 write locks for "page deletion". As the README notes:

"""
To delete an empty page, we acquire write lock on its left sibling (if
any), the target page itself, the right sibling (there must be one), and
the parent page, in that order.
"""

We obtain 2 write locks in the first phase (on target and parent). In
the second phase, we do what is described above: lock 4 pages in that
order. However, the reason for this difference from the L&S paper is
made clear in "2.5 Freeing Empty nodes", which kind of cops out of
addressing how to *unlink* empty/half-dead pages, with a couple of
brief descriptions of approaches that others have come up with that
are not at all applicable to us.

For that reason, might I suggest that you change this in the README:

+ Page Deletion
+ -

I suggest that it should read "Page Deletion and Unlinking" to
emphasize the distinction.

> The general feeling is that this patch is not suitable for
> back-patching.  I agree with this, but this is a bug which could
> lead to index corruption which exists in all supported branches.
> If nobody can suggest an alternative which we can back-patch, we
> might want to reconsider this after this has been in production
> releases for several months.

That was what I thought. Let's revisit the question of back-patching
this when 9.4 has been released for 6 months or so.

> Other than that, I have not found any problems.

Incidentally, during my research of these techniques, I discovered
that Johnson and Shasha argue that "for most concurrent B-tree
applications, merge-at-empty produces significantly lower
restructuring rates, and only a slightly lower space utilization, than
merge-at-half". This is covered in the paper "B-trees with inserts,
deletes, and modifies: Why Free-at-empty is Better than Merge-at-half"
[3]. I was pleased to learn that there was a sound, well regarded
theoretical basis for that behavior, even if the worst-case bloat can
be unpleasant. Though having said that, that paper doesn't weigh what
we do in the event of non-HOT updates, and that could change the
equation. That isn't an argument against merge-at-empty; it's an
argument against the current handling of non-HOT updates.

Currently, the comments above _bt_doinsert() say at one point:

 *  This routine is called by the public interface routines, btbuild
 *  and btinsert.

This is obsolete; since commit 89bda95d, only the insert public
interface routine calls _bt_doinsert(). Perhaps fix this in passing.

[1] L&S: 
https://archive.org/stream/symmetricconcurr00lani/symmetricconcurr00lani_djvu.txt

[2] http://www.dtic.mil/get-tr-doc/pdf?AD=ADA232287&Location=U2&doc=GetTRDoc.pdf
(Chapter 3, "The Coherent Shared Memory Algorithm")

[3] http://cs.nyu.edu/shasha/papers/bjournal.ps
-- 
Peter Geoghegan




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-01-27 Thread Kouhei Kaigai
Hi Stephen,

Thanks for your comments.

> * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
> > Is somebody available to volunteer to review the custom-scan patch?
> 
> I looked through it a bit and my first take away from it was that the patches
> to actually use the new hooks were also making more changes to the backend
> code, leaving me with the impression that the proposed interface isn't
> terribly stable.  Perhaps those changes should have just been in the first
> patch, but they weren't and that certainly gave me pause.
> 
Yes, the part-1 patch provides the set of interfaces for interaction
between the backend code and extension code. The rest, the part-2 and part-3
portions, are contrib modules that implement their features on top of the
custom-scan API.

> I'm also not entirely convinced that this is the direction to go in when
> it comes to pushing down joins to FDWs.  While that's certainly a goal that
> I think we all share, this seems to be intending to add a completely different
> feature which happens to be able to be used for that.  For FDWs, wouldn't
> we only present the FDW with the paths where the foreign tables for that
> FDW, or perhaps just a given foreign server, are being joined?
>
FDW join push-down is one of the valuable use-cases of this interface,
but not the only one. As you might know, my motivation is to implement a GPU
acceleration feature on top of this interface, which offers an alternative way
to scan or join relations, or potentially to sort or aggregate.
It would probably be too much of a stretch to implement radix sort on top
of FDW. I'd like you to understand that the part-3 patch (FDW join push-down)
is a demonstration of the custom-scan interface for an application, not
something designed for a single special purpose.

Right now, I put all the logic for interaction between the custom-scan
interface and the FDW driver on the postgres_fdw side; it might be an idea to
have common code (like logic to check whether both relations to be joined
belong to the same foreign server) on the backend side, as something like a
gateway between them.


As an aside, what should be the scope of the FDW interface?
In my understanding, it allows an extension to implement "something" on behalf
of a particular data structure declared with CREATE FOREIGN TABLE. In other
words, the extension's responsibility is to generate a view of "something"
according to PostgreSQL's internal data structure, instead of the object
itself. On the other hand, the custom-scan interface allows extensions to
implement alternative methods to scan or join particular relations, but its
role is not to act as a target referenced in queries. In other words, it
provides methods to access objects.
It is natural that both features look similar, because both of them let
extensions hook the planner and executor; however, their purposes are
different.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] INTERVAL overflow detection is terribly broken

2014-01-27 Thread Tom Lane
Bruce Momjian  writes:
> Oh, one odd thing about this patch.  I found I needed to use INT64_MAX,
> but I don't see it used anywhere else in our codebase.  Is this OK?  Is
> there a better way?

Most of the overflow tests in int.c and int8.c are coded to avoid relying
on the MIN or MAX constants; which seemed like better style at the time.
I'm not sure whether relying on INT64_MAX to exist is portable.

regards, tom lane




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 2:01 PM, Andrew Dunstan  wrote:
> I care very much what the module does to the performance of all statements.
> But I don't care much if selecting from pg_stat_statements itself is a bit
> slowed. Perhaps I didn't express myself as clearly as I could have.

Oh, I see. Of course the overhead of calling the pg_stat_statements()
function is much less important. Actually, I think that calling that
function is going to add a lot of noise to any benchmark aimed at
measuring added overhead as the counters struct is expanded.

-- 
Peter Geoghegan




Re: [HACKERS] A minor correction in comment in heaptuple.c

2014-01-27 Thread Kevin Grittner
Bruce Momjian  wrote:
> On Tue, Jun 18, 2013 at 11:04:25AM -0700, Kevin Grittner wrote:
>> D'Arcy J.M. Cain 
>>
>>> Although, the more I think about it, the more I think that the
>>> comment is both confusing and superfluous.  The code itself is
>>> much clearer.
>>
>> Seriously, if there is any comment there at all, it should be a
>> succinct explanation for why we didn't do this

> Is everyone OK with me applying this patch from Kevin, attached?

I guess my intent was misunderstood -- what I was trying to get
across was that the comment added exactly nothing to what you could
get more quickly by reading the code below it.  I felt the comment
should either be removed entirely, or a concise explanation of why
it was right thing to do should be there rather than just echoing
the code in English.  I wasn't suggesting applying the mini-patch,
but suggesting that *if* we have a comment there at all, it should
make clear why such a patch would be wrong; i.e., why is it is OK
to have attnum > tupdesc->natts here?  How would we get to such a
state, and why is NULL the right thing for it?  Such a comment
would actually help someone trying to understand the code, rather
than wasting their time.  After all, in the same file we have this:

    /* Check for caller error */
    if (attnum <= 0 || attnum > slot->tts_tupleDescriptor->natts)
    elog(ERROR, "invalid attribute number %d", attnum);

An explanation of why it is caller error one place and not another
isn't a waste of space.

>> (which passes `make check-world`)

And I was disappointed that our regression tests don't actually
exercise that code path, which would be another good way to make
the point.

So anyway, *I* would object to applying that; it was meant to
illustrate what the comment, if any, should cover; not to be an
actual code change.  I don't think the change that was pushed helps
that comment carry its own weight, either.  If we can't do better
than that, we should just drop it.

I guess I won't try to illustrate a point *that* particular way
again.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-27 Thread Dean Rasheed
On 25 January 2014 02:21, Florian Pflug  wrote:
> I've now split this up into
>
> invtrans_base: Basic machinery plus tests with SQL-language aggregates
> invtrans_arith: COUNT(),SUM(),AVG(),STDDEV() and the like
> invtrans_minmax: MIN(),MAX(),BOOL_AND(),BOOL_OR()
> invtrans_collecting: ARRAY_AGG(), STRING_AGG()
> invtrans_docs: Documentation
>

Thanks. That makes it a bit easier to review, and hopefully easier to commit.

The thing that I'm currently trying to get my head round is the
null/not null, strict/not strict handling in this patch, which I find
a bit odd, particularly when transfn and inv_transfn have different
strictness settings:


strict transfn vs non-strict inv_transfn


This case is explicitly forbidden, both in CREATE AGGREGATE and in the
executor. To me, that seems overly restrictive --- if transfn is
strict, then you know for sure that no NULL values are added to the
aggregate state, so surely it doesn't matter whether or not
inv_transfn is strict. It will never be asked to remove NULL values.

I think there are definite use-cases where a user might want to use a
pre-existing function as the inverse transition function, so it seems
harsh to force them to wrap it in a strict function in this case.

AFAICS, the code in advance/retreat_windowaggregate should just work
if those checks are removed.


non-strict transfn vs strict inv_transfn


At first this seems as though it must be an error --- the forwards
transition function allows NULL values into the aggregate state, and
the inverse transition function is strict, so it cannot remove them.

But actually what the patch is doing in this case is treating the
forwards transition function as partially strict --- it won't be
passed NULL values (at least in a window context), but it may be
passed a NULL state in order to build the initial state when the first
non-NULL value arrives.

It looks like this behaviour is intended to support aggregates that
ignore NULL values, but cannot actually be made to have a strict
transition function. I think typically this is because the aggregate's
initial state is NULL and its state type differs from the type of the
values being aggregated, and so can't be automatically created from
the first non-NULL value.

That all seems quite ugly though, because now you have a transition
function that is not strict, which is passed NULL values when used in
a normal aggregate context, and not passed NULL values when used in a
window context (whether sliding or not), except for the NULL state for
the first non-NULL value.

I'm not sure if there is a better way to do it though. If we disallow
this case, these aggregates would have to use non-strict functions for
both the forward and inverse transition functions, which means they
would have to implement their own non-NULL value counting.
Alternatively, allowing strict transition functions for these
aggregates would require that we provide some other way to initialise
the state from the first non-NULL input, such as a new initfn.
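As a concrete illustration of the interplay being described — a strict forward transition never lets NULLs into the aggregate state, so the inverse transition never has to remove one — here is a hedged toy sketch in C (hypothetical helper, not PostgreSQL executor code), computing a sliding-window SUM with NULLs tracked by a flag array and an all-NULL state tracked by a non-NULL counter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Sliding-window SUM over fixed-width frames, sketching the forward /
 * inverse transition interplay.  Inputs flagged NULL are skipped by the
 * (strict) forward transition, so they never enter the aggregate state
 * and the inverse transition never needs to remove them.  An empty/NULL
 * state is represented by nnonnull == 0.  Illustrative only.
 */
static void
sliding_sum(const int *vals, const bool *isnull, size_t n, size_t width,
            long *out, bool *out_isnull)
{
    long        state = 0;
    size_t      nnonnull = 0;   /* non-NULL inputs currently in the frame */

    for (size_t i = 0; i < n; i++)
    {
        if (!isnull[i])         /* strict forward transition: skip NULLs */
        {
            state += vals[i];
            nnonnull++;
        }
        out[i] = state;
        out_isnull[i] = (nnonnull == 0);    /* all-NULL frame => NULL */

        /* inverse transition: retire the value leaving the frame */
        if (i + 1 >= width && !isnull[i + 1 - width])
        {
            state -= vals[i + 1 - width];
            nnonnull--;
        }
    }
}
```

For input (1, NULL, 2, 3) with width-2 frames this yields 1, 1, 2, 5 — the NULL simply never reaches either transition function.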

Thoughts?

Regards,
Dean




Re: [HACKERS] INTERVAL overflow detection is terribly broken

2014-01-27 Thread Bruce Momjian
On Mon, Jan 27, 2014 at 04:47:22PM -0500, Bruce Momjian wrote:
> The updated attached patch has overflow detection for interval
> subtraction, multiply, and negate.  There are also some comparison
> cleanups.

Oh, one odd thing about this patch.  I found I needed to use INT64_MAX,
but I don't see it used anywhere else in our codebase.  Is this OK?  Is
there a better way?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Andrew Dunstan


On 01/27/2014 04:37 PM, Peter Geoghegan wrote:

On Mon, Jan 27, 2014 at 4:45 AM, Andrew Dunstan  wrote:

I personally don't give a tinker's cuss about whether the patch slows down
pg_stat_statements a bit.

Why not? The assurance that the overhead is generally very low is what
makes it possible to install it widely usually without a second
thought. That's hugely valuable.





I care very much what the module does to the performance of all 
statements. But I don't care much if selecting from pg_stat_statements 
itself is a bit slowed. Perhaps I didn't express myself as clearly as I 
could have.


cheers

andrew





Re: [HACKERS] INTERVAL overflow detection is terribly broken

2014-01-27 Thread Bruce Momjian
On Sun, Jan 26, 2014 at 02:13:03PM +0100, Florian Pflug wrote:
> On Jan26, 2014, at 03:50 , Bruce Momjian  wrote:
> > Patch attached.
> 
> > +   if ((float)tm->tm_year * MONTHS_PER_YEAR + tm->tm_mon > INT_MAX)
> > +   return -1;
> 
> Is this bullet-proof? If float and int are both 32-bit, the float's mantissa
> will be less than 32-bit (24 or so, I think), and thus won't be able to
> represent INT_MAX accurately. This means there's a value of
> tm_year * MONTHS_PER_YEAR which is less than INT_MAX, yet for which
> tm_year * MONTHS_PER_YEAR + tm_mon will return tm_year * MONTHS_PER_YEAR
> unmodified if tm_mon is small enough (e.g., 1). But if tm_year * MONTHS_PER_YEAR
> was close enough to INT_MAX, the actual value of
> tm_year * MONTHS_PER_YEAR + tm_mon might still be larger than INT_MAX,
> the floating-point computation just won't detect it. In that case, the
> check would succeed, yet the actual integer computation would still overflow.
> 
> It should be safe with "double" instead of "float".

You are correct.  I have adjusted the code to use "double".
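(For reference, the hazard Florian describes is easy to demonstrate directly. In this standalone sketch — hypothetical helper names, the patch does the check inline — the values are chosen so that year * 12 + mon lands exactly at INT_MAX + 1:)

```c
#include <assert.h>
#include <limits.h>

/*
 * 178956970 * 12 + 8 == 2147483648, i.e. INT_MAX + 1: a genuine overflow.
 * With a 24-bit float mantissa, both the computed value and INT_MAX
 * round to 2147483648.0f, so the float-based check misses the overflow;
 * a 53-bit double mantissa represents both sides exactly and catches it.
 */
static int
months_overflow_float(int year, int mon)
{
    float       sum = (float) year * 12 + mon;  /* rounded to 24 bits */

    return sum > INT_MAX;
}

static int
months_overflow_double(int year, int mon)
{
    return (double) year * 12 + mon > INT_MAX;
}
```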

> > +   if (SAMESIGN(span1->month, span2->month) &&
> > +   !SAMESIGN(result->month, span1->month))
> 
> This assumes wrapping semantics for signed integral types, which isn't
> guaranteed by the C standard - it says overflows of signed integral types
> produce undefined results. We currently depend on wrapping semantics for
> these types in more places, and therefore need GCC's "-fwrapv" anyway, but
> I still wonder if adding more of these kinds of checks is a good idea.

Well, I copied this from int.c, so what I did was to mention the other
function I got this code from.  If we ever change int.c we would need to
change this too.
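One way to keep the same-sign test without depending on signed wraparound is to do the addition in unsigned arithmetic, where wraparound is well defined. A hedged sketch (not what int.c does — it adds in signed arithmetic and relies on -fwrapv):

```c
#include <assert.h>
#include <limits.h>

#define SAMESIGN(a,b)   (((a) < 0) == ((b) < 0))

/*
 * Overflow-checked int addition.  The sum is computed in unsigned
 * arithmetic (wraparound is well defined by the C standard) and then
 * converted back to int (implementation-defined for out-of-range
 * values, but two's-complement in practice), so the SAMESIGN test
 * works even without -fwrapv.
 */
static int
add_overflows(int a, int b, int *result)
{
    int         r = (int) ((unsigned int) a + (unsigned int) b);

    *result = r;
    return SAMESIGN(a, b) && !SAMESIGN(r, a);
}
```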

The updated attached patch has overflow detection for interval
subtraction, multiply, and negate.  There are also some comparison
cleanups.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index 6bf4cf6..b7d7d80
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*** SELECT E'\\xDEADBEEF';
*** 1587,1593 
 
 
  interval [ fields ] [ (p) ]
! 12 bytes
  time interval
  -17800 years
  17800 years
--- 1587,1593 
 
 
  interval [ fields ] [ (p) ]
! 16 bytes
  time interval
  -17800 years
  17800 years
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
new file mode 100644
index a61b40e..70284e9
*** a/src/backend/utils/adt/datetime.c
--- b/src/backend/utils/adt/datetime.c
*** DecodeInterval(char **field, int *ftype,
*** 2976,2981 
--- 2976,2983 
  	type = DTK_MONTH;
  	if (*field[i] == '-')
  		val2 = -val2;
+ 	if (((double)val * MONTHS_PER_YEAR + val2) > INT_MAX)
+ 		return DTERR_FIELD_OVERFLOW;
  	val = val * MONTHS_PER_YEAR + val2;
  	fval = 0;
  }
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
new file mode 100644
index 4581862..3b8e8e8
*** a/src/backend/utils/adt/timestamp.c
--- b/src/backend/utils/adt/timestamp.c
***
*** 41,46 
--- 41,47 
  #error -ffast-math is known to break this code
  #endif
  
+ #define SAMESIGN(a,b)   (((a) < 0) == ((b) < 0))
  
  /* Set at postmaster start */
  TimestampTz PgStartTime;
*** interval2tm(Interval span, struct pg_tm
*** 1694,1700 
  #ifdef HAVE_INT64_TIMESTAMP
  	tfrac = time / USECS_PER_HOUR;
  	time -= tfrac * USECS_PER_HOUR;
! 	tm->tm_hour = tfrac;		/* could overflow ... */
  	tfrac = time / USECS_PER_MINUTE;
  	time -= tfrac * USECS_PER_MINUTE;
  	tm->tm_min = tfrac;
--- 1695,1705 
  #ifdef HAVE_INT64_TIMESTAMP
  	tfrac = time / USECS_PER_HOUR;
  	time -= tfrac * USECS_PER_HOUR;
! 	tm->tm_hour = tfrac;
! 	if (!SAMESIGN(tm->tm_hour, tfrac))
! 		ereport(ERROR,
! (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
!  errmsg("interval out of range")));
  	tfrac = time / USECS_PER_MINUTE;
  	time -= tfrac * USECS_PER_MINUTE;
  	tm->tm_min = tfrac;
*** recalc:
*** 1725,1731 
  int
  tm2interval(struct pg_tm * tm, fsec_t fsec, Interval *span)
  {
! 	span->month = tm->tm_year * MONTHS_PER_YEAR + tm->tm_mon;
  	span->day = tm->tm_mday;
  #ifdef HAVE_INT64_TIMESTAMP
  	span->time = (tm->tm_hour * INT64CONST(60)) +
--- 1730,1740 
  int
  tm2interval(struct pg_tm * tm, fsec_t fsec, Interval *span)
  {
! 	double total_months = (double)tm->tm_year * MONTHS_PER_YEAR + tm->tm_mon;
! 
! 	if (total_months > INT_MAX || total_months < INT_MIN)
! 		return -1;
! 	span->month = total_months;
  	span->day = tm->tm_mday;
  #ifdef HAVE_INT64_TIMESTAMP
  	span->time = (tm->tm_hour * INT64CONST(60)) +
*** interval_um(PG_FUNCTION_ARGS)
*** 2826,2833 
---

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 4:45 AM, Andrew Dunstan  wrote:
> I personally don't give a tinker's cuss about whether the patch slows down
> pg_stat_statements a bit.

Why not? The assurance that the overhead is generally very low is what
makes it possible to install it widely usually without a second
thought. That's hugely valuable.


-- 
Peter Geoghegan




Re: [HACKERS] Freezing without write I/O

2014-01-27 Thread Simon Riggs
On 26 January 2014 12:58, Andres Freund  wrote:
> On 2014-01-25 20:26:16 -0800, Peter Geoghegan wrote:
>> Shouldn't this patch be in the January commitfest?
>
> I think we previously concluded that there wasn't much chance to get
> this into 9.4 and there's significant work to be done on the patch
> before new reviews are required, so not submitting it imo makes sense.

I think we should make this a priority feature for 9.5

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-01-27 Thread Simon Riggs
On 27 January 2014 20:47, Tom Lane  wrote:
> Peter Geoghegan  writes:
>> On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas  wrote:
>>> I haven't reviewed the patch, but -1 for adding a GUC.
>
>> I'm pretty surprised that it's been suggested that some people might
>> prefer AccessExclusiveLocks. Why would anyone prefer that?
>
> For one thing, so they can back this out if it proves to be broken,
> as the last committed version was.

Agreed

> Given that this patch was marked
> (by its author) as Ready for Committer without any review in the current
> CF

True. The main review happened in a previous commitfest and there was
a small addition for this CF.

It was my understanding that you wanted us to indicate that to allow
you to review. I am happy to set status differently, as you wish,
presumably back to needs review.


>I can't say that I have any faith in it.

That's a shame.


-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] new json funcs

2014-01-27 Thread Alvaro Herrera
Andrew Dunstan escribió:

> I'm not sure I understand the need. This is the difference between
> the _text variants and their parents. Why would you call
> json_object_field when you want the dequoted text?

Because I first need to know its type.  Sometimes it's an array, or an
object, or a boolean, and for those I won't call the _text version
afterwards but just use the original.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] new json funcs

2014-01-27 Thread Andrew Dunstan


On 01/27/2014 03:53 PM, Alvaro Herrera wrote:

Andrew Dunstan escribió:


Note that we can only do this when the result type stays the same.
It does not for json_each/json_each_text or
json_extract_path/json_extract_path_text, which is why we have
different functions for those cases.

In C code, if I extract a value using json_object_field or
json_array_element, is there a way to turn it into the dequoted version,
that is, the value that I would have gotten had I called
json_object_field_text or json_array_element_text instead?

I wrote a quick and dirty hack in the event triggers patch that just
removes the outermost "" and turns any \" into ", but that's probably
incomplete.  Does jsonfuncs.c offer any way to do this?  That might be
useful for the crowd that cares about the detail being discussed in this
subthread.




I'm not sure I understand the need. This is the difference between the 
_text variants and their parents. Why would you call json_object_field 
when you want the dequoted text?


cheers

andrew




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-01-27 Thread Simon Riggs
On 27 January 2014 20:35, Peter Geoghegan  wrote:
> On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas  wrote:
>> I haven't reviewed the patch, but -1 for adding a GUC.
>
> I'm pretty surprised that it's been suggested that some people might
> prefer AccessExclusiveLocks. Why would anyone prefer that?

Nobody has said "prefer". I said...

> Some people may be surprised
> that their programs don't wait in the same places they used to. We
> hope that is a positive and useful behaviour, but it may not always be
> so.

We get the new behaviour by default and I expect we'll be very happy with it.

A second thought is that if we have problems of some kind in the field
as a result of the new lock modes then we will be able to turn them
off. I'm happy to fix any problems that occur, but that doesn't mean
there won't be any. If everybody is confident that we've foreseen
every bug, then no problem, let's remove it. I recall being asked to
add hot_standby = on | off for similar reasons.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] new json funcs

2014-01-27 Thread Alvaro Herrera
Andrew Dunstan escribió:

> Note that we can only do this when the result type stays the same.
> It does not for json_each/json_each_text or
> json_extract_path/json_extract_path_text, which is why we have
> different functions for those cases.

In C code, if I extract a value using json_object_field or
json_array_element, is there a way to turn it into the dequoted version,
that is, the value that I would have gotten had I called
json_object_field_text or json_array_element_text instead?

I wrote a quick and dirty hack in the event triggers patch that just
removes the outermost "" and turns any \" into ", but that's probably
incomplete.  Does jsonfuncs.c offer any way to do this?  That might be
useful for the crowd that cares about the detail being discussed in this
subthread.

Thanks,

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-01-27 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas  wrote:
>> I haven't reviewed the patch, but -1 for adding a GUC.

> I'm pretty surprised that it's been suggested that some people might
> prefer AccessExclusiveLocks. Why would anyone prefer that?

For one thing, so they can back this out if it proves to be broken,
as the last committed version was.  Given that this patch was marked
(by its author) as Ready for Committer without any review in the current
CF, I can't say that I have any faith in it.

regards, tom lane




Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 12:39 PM, Tom Lane  wrote:
> OK.  Committed after a couple of small further revisions.

Great. Thank you.


-- 
Peter Geoghegan




Re: [HACKERS] extension_control_path

2014-01-27 Thread Dimitri Fontaine
Hi,

Sergey Muraviov  writes:
> Now patch applies cleanly and works. :-)

Cool ;-)

> But I have some notes:
>
> 1. There is an odd underscore character in functions
> find_in_extension_control_path and list_extension_control_paths:
> \"extension_control__path\""

Fixed in the new version of the patch, attached.

> 2. If we have several versions of one extension in different directories
> (which are listed in extension_control_path parameter) then we
> get strange output from pg_available_extensions and
> pg_available_extension_versions views (Information about extension, whose
> path is at the beginning of the list, is duplicated). And only one version
> of the extension can be created.

Fixed.

> 3. It would be fine to see an extension control path
> in pg_available_extensions and pg_available_extension_versions views (in
> separate column or within of extension name).

I think the on-disk location is an implementation detail and decided in
the attached version not to change those system view definitions.

> 4. Perhaps the CREATE EXTENSION command should be improved to allow
> creation of the required version of the extension.
> So we can use different versions of extensions in different databases.

Fixed in the attached.

I also fixed ALTER EXTENSION UPDATE to search for update scripts in the
same directory where the main control file is found, but I suspect this
part requires more thinking.

When we ALTER EXTENSION UPDATE we might now have several places where we
find extname.control files, with possibly different default_version
properties.

In the attached, we select the directory containing the control file
where default_version matches the already installed extension version.
That matches with a model where the new version of the extension changes
the default_version in an auxiliary file.

We might want to instead match on the default_version in the control
file to match with the new version we are asked to upgrade to.
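A rough sketch of the path-list handling involved: split the colon-separated list, substitute a leading $extdir, and build the candidate name.control path for each element. (Hypothetical simplification; the real find_in_extension_control_path() also probes the filesystem and handles Windows semicolon separators.)

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the candidate control-file path for one element of
 * extension_control_path, substituting a leading "$extdir" with the
 * compiled-in extension directory.  Illustrative only.
 */
static void
control_file_candidate(const char *elem, const char *extdir,
                       const char *extname, char *buf, size_t buflen)
{
    if (strncmp(elem, "$extdir", 7) == 0)
        snprintf(buf, buflen, "%s%s/%s.control", extdir, elem + 7, extname);
    else
        snprintf(buf, buflen, "%s/%s.control", elem, extname);
}
```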

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 5773,5778  SET XML OPTION { DOCUMENT | CONTENT };
--- 5773,5827 
  
   
  
+  
+   extension_control_path (string)
+   
+extension_control_path configuration parameter
+   
+   extension packaging
+   
+
+ The command CREATE EXTENSION searches for the extension
+ control file in order to install it. The value
+ for extension_control_path is used to search for
+ the name.control files.
+
+ 
+
+ The value for extension_control_path must be a list
+ of absolute directory paths separated by colons (or semi-colons on
+ Windows). If a list element starts with the special
+ string $extdir, the
+ compiled-in PostgreSQL package extension
+ directory is substituted for $extdir; this is where
+ the extensions provided by the standard
+ PostgreSQL distribution are installed.
+ (Use pg_config --extdir to find out the name of
+ this directory.) For example:
+ 
+ extension_control_path = '/usr/local/postgresql/extension:/home/my_project:$extdir'
+ 
+ or, in a Windows environment:
+ 
+ extension_control_path = 'C:\tools\postgresql;H:\my_project\lib;$extdir'
+ 
+
+ 
+
+ The default value for this parameter is '$extdir'.
+
+ 
+
+ This parameter can be changed at run time by superusers, but a
+ setting done that way will only persist until the end of the
+ client connection, so this method should be reserved for
+ development purposes. The recommended way to set this parameter
+ is in the postgresql.conf configuration
+ file.
+
+   
+  
+ 
   
dynamic_library_path (string)

*** a/src/backend/commands/extension.c
--- b/src/backend/commands/extension.c
***
*** 25,30 
--- 25,31 
  
  #include 
  #include 
+ #include 
  #include 
  
  #include "access/htup_details.h"
***
*** 60,71 
--- 61,76 
  bool		creating_extension = false;
  Oid			CurrentExtensionObject = InvalidOid;
  
+ /* GUC extension_control_path */
+ char   *Extension_control_path;
+ 
  /*
   * Internal data structure to hold the results of parsing a control file
   */
  typedef struct ExtensionControlFile
  {
  	char	   *name;			/* name of the extension */
+ 	char	   *filename;		/* full path of the extension control file */
  	char	   *directory;		/* directory for script files */
  	char	   *default_version;	/* default install target version, if any */
  	char	   *module_pathname;	/* string to substitute for MODULE_PATHNAME */
***
*** 342,397  is_extension_script_filename(const char *filename)
  	return (extension != NULL) && (strcmp(extension, ".sql") == 0);
  }
  
  static char *
! get_extension

Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core

2014-01-27 Thread Tom Lane
Peter Geoghegan  writes:
> On Sun, Jan 26, 2014 at 10:38 AM, Tom Lane  wrote:
>> Meh.  This line of argument seems to reduce to "we don't need to worry
>> about performance of this code path because it won't be reached often".

> I think I may have over-elaborated, giving you the false impression
> that this was something I felt strongly about. I'm glad that the
> overhead has been shown to be quite low, and I think that lexing
> without the lock held will be fine.

OK.  Committed after a couple of small further revisions.

regards, tom lane




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas  wrote:
> I haven't reviewed the patch, but -1 for adding a GUC.

I'm pretty surprised that it's been suggested that some people might
prefer AccessExclusiveLocks. Why would anyone prefer that?


-- 
Peter Geoghegan




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-01-27 Thread Robert Haas
On Mon, Jan 27, 2014 at 12:58 PM, Simon Riggs  wrote:
> On 24 January 2014 08:33, Simon Riggs  wrote:
>> On 24 January 2014 07:08, Peter Geoghegan  wrote:
>>> On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs  wrote:
 v15 to fix the above problem.
>>>
>> v16 attached
>
> v17 attached
>
> This version adds a GUC called ddl_exclusive_locks which allows us to
> keep the 9.3 behaviour if we wish it. Some people may be surprised
> that their programs don't wait in the same places they used to. We
> hope that is a positive and useful behaviour, but it may not always be
> so.
>
> I'll commit this on Thurs 30 Jan unless I hear objections.

I haven't reviewed the patch, but -1 for adding a GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 10:58 AM, Heikki Linnakangas
 wrote:
> Okay, promise not to laugh. I did write a bunch of hacks, to generate
> graphviz .dot files from the btree pages, and render them into pictures. It
> consist of multiple parts, all in the attached tarball.

It's funny that you should say that, because I was thinking about
building something similar over the past few days. I find it very
useful to build ad-hoc tools like that, and I thought that it would be
particularly useful to have something visual for testing btree code.
Sure, you can come up with a test and keep the details straight in
your head while coordinating everything, but that won't scale as new
testing scenarios inevitably occur to you. I've done plenty of things
with contrib/pageinspect along these lines in the past, but this is
better. Anything that reduces the friction involved in building an
accurate mental model of things is a very good thing.
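The kind of hack being described can be surprisingly small. A toy sketch (made-up page array; a real tool would read pages via something like contrib/pageinspect, as Heikki's tarball does):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a btree page: block number plus right-link. */
typedef struct
{
    unsigned    blkno;
    unsigned    rightlink;      /* 0 = no right sibling */
} toy_page;

/*
 * Dump one btree level as graphviz .dot: one node per page, one edge
 * per right-link.  Render with "dot -Tpng" to get a picture.
 */
static void
dump_level_dot(const toy_page *pages, int npages, FILE *out)
{
    fprintf(out, "digraph btree {\nrankdir=LR;\n");
    for (int i = 0; i < npages; i++)
    {
        fprintf(out, "p%u [shape=box label=\"blk %u\"];\n",
                pages[i].blkno, pages[i].blkno);
        if (pages[i].rightlink != 0)
            fprintf(out, "p%u -> p%u;\n",
                    pages[i].blkno, pages[i].rightlink);
    }
    fprintf(out, "}\n");
}
```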



-- 
Peter Geoghegan




Re: [HACKERS] PoC: Partial sort

2014-01-27 Thread Alexander Korotkov
Hi!

On Tue, Jan 21, 2014 at 3:24 AM, Marti Raudsepp  wrote:

> On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov
>  wrote:
> >On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp  wrote:
> >> I've been trying it out in a few situations. I implemented a new
> >> enable_partialsort GUC to make it easier to turn on/off, this way it's
> a lot
> >> easier to test. The attached patch applies on top of
> partial-sort-5.patch
> >
> > I thought about such an option. Generally not because of testing convenience,
> > but because of overhead of planning. This way you implement it is quite
> > naive :)
>
> I don't understand. I had another look at this and cost_sort still
> seems like the best place to implement this, since that's where the
> patch decides how many pre-sorted columns to use. Both mergejoin and
> simple order-by plans call into it. If enable_partialsort=false then I
> skip all pre-sorted options except full sort, making cost_sort behave
> pretty much like it did before the patch.
>
> I could change pathkeys_common to return 0, but that seems like a
> generic function that shouldn't be tied to partialsort. The old code
> paths called pathkeys_contained_in anyway, which has similar
> complexity. (Apart for initial_cost_mergejoin, but that doesn't seem
> special enough to make an exception for).
>
> Or should I use?:
>   enable_partialsort ? pathkeys_common(...) : 0
>
> > For instance, merge join rely on partial sort which will be
> > replaced with simple sort.
>
> Are you saying that enable_partialsort=off should keep
> partialsort-based mergejoins enabled?
>
> Or are you saying that merge joins shouldn't use "simple sort" at all?
> But merge join was already able to use full Sort nodes before your
> patch.
>

Sorry that I didn't explain it. In particular I mean the following:
1) With enable_partialsort = off, all mergejoin logic should behave as it
did without the partial sort patch.
2) With the partial sort patch, the get_cheapest_fractional_path_for_pathkeys
function is much more expensive to execute. With enable_partialsort = off
it should be as cheap as without the partial sort patch.
I'll try to implement this option this week.
For now, I have attempted to fix the extra-columns-in-mergejoin problem. It
would be nice if you could test it.
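For reference, the core idea being costed here — rows already sorted on a prefix of the requested keys only need a per-group sort on the remaining keys — can be sketched like this (hypothetical toy, not the patch's executor code):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
    int         a;              /* leading key: input already sorted on a */
    int         b;              /* remaining key */
} row;

static int
cmp_b(const void *x, const void *y)
{
    return ((const row *) x)->b - ((const row *) y)->b;
}

/*
 * Partial sort for ORDER BY a, b: since the input is already ordered on
 * a, a full (a, b) ordering only requires sorting each run of equal a
 * values on b.  Each run is typically small, and results can start
 * streaming before the whole input is consumed.
 */
static void
partial_sort(row *rows, size_t n)
{
    size_t      start = 0;

    for (size_t i = 1; i <= n; i++)
    {
        if (i == n || rows[i].a != rows[start].a)
        {
            qsort(rows + start, i - start, sizeof(row), cmp_b);
            start = i;
        }
    }
}
```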

--
With best regards,
Alexander Korotkov.


partial-sort-7.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-01-27 Thread Alvaro Herrera
Heikki Linnakangas wrote:

> I spent some time whacking this around, new patch version attached.
> I moved the mmap() code into a new function, that leaves the
> PGSharedMemoryCreate more readable.

Did this patch go anywhere?

Someone just pinged me about a kernel scalability problem in Linux with
huge pages; if someone did performance measurements with this patch,
perhaps it'd be good to measure again with the kernel patch in place.

https://lkml.org/lkml/2014/1/26/227

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas
 wrote:
>> I think I see some bugs in _bt_moveright(). If you examine
>> _bt_finish_split() in detail, you'll see that it doesn't just drop the
>> write buffer lock that the caller will have provided (per its
>> comments) - it also drops the buffer pin. It calls _bt_insert_parent()
>> last, which was previously only called last thing by _bt_insertonpg()
>> (some of the time), and _bt_insertonpg() is indeed documented to drop
>> both the lock and the pin. And if you look at _bt_moveright() again,
>> you'll see that there is a tacit assumption that the buffer pin isn't
>> dropped, or at least that "opaque" (the BTPageOpaque BT special page
>> area pointer) continues to point to the same page held in the same
>> buffer, even though the code doesn't set the "opaque' pointer again
>> and it may not point to a pinned buffer or even the appropriate
>> buffer. Ditto "page". So "opaque" may point to the wrong thing on
>> subsequent iterations - you don't do anything with the return value of
>> _bt_getbuf(), unlike all of the existing call sites. I believe you
>> should store its return value, and get the held page and the special
>> pointer on the page, and assign that to the "opaque" pointer before
>> the next iteration (an iteration in which you very probably really do
>> make progress not limited to completing a page split, the occurrence
>> of which the _bt_moveright() loop gets "stuck on"). You know, do what
>> is done in the non-split-handling case. It may not always be the same
>> buffer as before, obviously.
>
>
> Yep, fixed.

Can you explain what the fix was, please?

-- 
Peter Geoghegan




Re: [HACKERS] Standalone synchronous master

2014-01-27 Thread Josh Berkus
On 01/26/2014 07:56 PM, Rajeev rastogi wrote:
> I shall rework to improve this patch. Below are the summarization of all
> discussions, which will be used as input for improving the patch:
> 
> 1. Method of degrading the synchronous mode:
>   a. Expose the configuration variable to a new SQL-callable functions.
>   b. Using ALTER SYSTEM SET.
>   c. Auto-degrade using some sort of configuration parameter as done in 
> current patch.
>   d. Or may be combination of above, which DBA can use depending on their 
> use-cases.  
> 
>   We can discuss further to decide on one of the approach.
> 
> 2. Synchronous mode should be upgraded/restored after at least one synchronous 
> standby comes up and has caught up with the master.
> 
> 3. A better monitoring/administration interfaces, which can be even better if 
> it is made as a generic trap system.
> 
>   I shall propose a better approach for this.
> 
> 4. Send committing clients, a WARNING if they have committed a synchronous 
> transaction and we are in degraded mode.
> 
> 5. Please add more if I am missing something.

I think we actually need two degrade modes:

A. degrade once: if the sync standby connection is ever lost, degrade
and do not resync.

B. reconnect: if the sync standby catches up again, return it to sync
status.

The reason you'd want "degrade once" is to avoid the "flaky network"
issue where you're constantly degrading then reattaching the sync
standby, resulting in horrible performance.

If we did offer "degrade once" though, we'd need some easy way to
determine that the master was in a state of permanent degrade, and a
command to make it resync.

Discuss?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [PATCH] Support for pg_stat_archiver view

2014-01-27 Thread Fujii Masao
On Sat, Jan 25, 2014 at 3:19 PM, Michael Paquier
 wrote:
> On Sat, Jan 25, 2014 at 5:41 AM, Fujii Masao  wrote:
>> On Thu, Jan 23, 2014 at 4:10 PM, Michael Paquier
>>  wrote:
>>> On Thu, Jan 9, 2014 at 6:36 AM, Gabriele Bartolini
>>>  wrote:
 Il 08/01/14 18:42, Simon Riggs ha scritto:
> Not sure I see why it needs to be an SRF. It only returns one row.
 Attached is version 4, which removes management of SRF stages.
>>>
>>> I have been looking at v4 a bit more, and found a couple of small things:
>>> - a warning in pgstatfuncs.c
>>> - some typos and format fixing in the sgml docs
>>> - some corrections in a couple of comments
>>> - Fixed an error message related to pg_stat_reset_shared referring
>>> only to bgwriter and not the new option archiver. Here is how the new
>>> message looks:
>>> =# select pg_stat_reset_shared('popo');
>>> ERROR:  22023: unrecognized reset target: "popo"
>>> HINT:  Target must be "bgwriter" or "archiver"
>>> A new version is attached containing those fixes. I played also with
>>> the patch and pgbench, simulating some archive failures and successes
>>> while running pgbench and the reports given by pg_stat_archiver were
>>> correct, so I am marking this patch as "Ready for committer".
>>
>> I refactored the patch further.
>>
>> * Remove ArchiverStats global variable to simplify pgarch.c.
>> * Remove stats_timestamp field from PgStat_ArchiverStats struct because
>>it's not required.
> Thanks, pgstat_send_archiver is cleaner now.
>
>>
>> I have some review comments:
>>
>> +s.archived_wals,
>> +s.last_archived_wal,
>> +s.last_archived_wal_time,
>> +s.failed_attempts,
>> +s.last_failed_wal,
>> +s.last_failed_wal_time,
>>
>> The column names of pg_stat_archiver look not consistent at least to me.
>> What about the followings?
>>
>>   archive_count
>>   last_archived_wal
>>   last_archived_time
>>   fail_count
>>   last_failed_wal
>>   last_failed_time
> And what about archived_count and failed_count instead of archive_count
> and fail_count, respectively? The rest of the names are better now
> indeed.
>
>> I think that it's time to rename all the variables related to 
>> pg_stat_bgwriter.
>> For example, it's better to change PgStat_GlobalStats to PgStat_Bgwriter.
>> I think that it's okay to make this change as separate patch, though.
> Yep, this is definitely a different patch.
>
>> +char last_archived_wal[MAXFNAMELEN];/* last WAL file archived */
>> +TimestampTz last_archived_wal_timestamp;/* last archival success */
>> +PgStat_Counter failed_attempts;
>> +char last_failed_wal[MAXFNAMELEN];/* last WAL file involved
>> in failure */
>> Some hackers don't like increasing the size of the stats file. In order to
>> reduce the size as much as possible, shouldn't we use the exact size of a
>> WAL file name here instead of MAXFNAMELEN?
> The first versions of the patch used a more limited size length more
> inline with what you say:
> +#define MAX_XFN_CHARS 40
> But this is inconsistent with xlog_internal.h.

Using MAX_XFN_CHARS instead of MAXFNAMELEN has the clear advantage of
reducing the size of the statistics file. Though I'm not sure how much this
change improves the performance of the statistics collector, I'd basically
like to use MAX_XFN_CHARS here.

+(errmsg("corrupted statistics file (global) \"%s\"",
statfile)));

I think that it's better to use the view name "pg_stat_bgwriter" instead of
the internal name "global" here. The view name is easier for a user to
understand.

+(errmsg("corrupted statistics file (archiver)
\"%s\"", statfile)));

Same as above. What about using "pg_stat_archiver" instead of just "archiver"?

+if (fread(&myArchiverStats, 1, sizeof(myArchiverStats),
+  fpin) != sizeof(myArchiverStats))
+{
+ereport(pgStatRunningInCollector ? LOG : WARNING,
+(errmsg("corrupted statistics file \"%s\"", statfile)));

Same as above. Isn't it better to add something like
"pg_stat_archiver" even here?

Regards,

-- 
Fujii Masao




Re: [HACKERS] alternative back-end block formats

2014-01-27 Thread Christian Convey
Hi Craig,

On Sun, Jan 26, 2014 at 5:47 AM, Craig Ringer  wrote:

> On 01/21/2014 07:43 PM, Christian Convey wrote:
> > Hi all,
> >
> > I'm playing around with Postgres, and I thought it might be fun to
> > experiment with alternative formats for relation blocks, to see if I can
> > get smaller files and/or faster server performance.
>
> It's not clear how you'd do this without massively rewriting the guts of
> Pg.
>
> Per the docs on internal structure, Pg has a block header, then tuples
> within the blocks, each with a tuple header and list of Datum values for
> the tuple. Each Datum has a generic Datum header (handling varlena vs
> fixed length values etc) then a type-specific on-disk representation
> controlled by the type output function for that type.
>

I'm still in the process of getting familiar with the pg backend code, so I
don't have a concrete plan yet.  However, I'm working on the assumption
that some set of macros and functions encapsulates the page layout.

If/when I tackle this, I expect to add a layer of indirection somewhere
around that boundary, so that some non-catalog tables, whose schemas meet
certain simplifying assumptions, are read and modified using specialized
code.

I don't want to get into the specific optimizations I'd like to try, only
because I haven't fully studied the code yet, so I don't want to put my
foot in my mouth.

> What concrete problem do you mean to tackle? What idea do you want to
> explore or implement?
>

My real motivation is that I'd like to get more familiar with the pg
backend codebase, and tilting at this windmill seemed like an interesting
way to accomplish that.

If I was focused on really solving a real-world problem, I'd say that this
lays the groundwork for table-schema-specific storage optimizations and
optimized record-filtering code.  But I'd only make that argument if I
planned to (a) perform a careful study with statistically significant
benchmarks, and/or (b) produce a merge-worthy patch.  At this point I have
no intentions of doing so.  My main goal really is just to have fun with
the code.


> > Does anyone know if this has been done before with Postgres?  I would
> > have assumed yes, but I'm not finding anything in Google about people
> > having done this.
>
> AFAIK (and I don't know much in this area) the storage manager isn't
> very pluggable compared to the rest of Pg.
>

Thanks for the warning.  Duly noted.

Kind regards,
Christian


Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-27 Thread Heikki Linnakangas

On 01/23/2014 11:36 PM, Peter Geoghegan wrote:

The first thing I noticed about this patchset is that it completely
expunges btree_xlog_startup(), btree_xlog_cleanup() and
btree_safe_restartpoint(). The post-recovery cleanup that previously
occurred to address both sets of problems (the problem addressed by
this patch, incomplete page splits, and the problem addressed by the
dependency patch, incomplete page deletions) now both occur
opportunistically/lazily. Notably, this means that there are now
exactly zero entries in the resource manager list with a
'restartpoint' callback. I guess we should consider removing it
entirely for that reason. As you said in the commit message where
gin_safe_restartpoint was similarly retired (commit 631118fe):

"""
The post-recovery cleanup mechanism was never totally reliable, as insertion
to the parent could fail e.g because of running out of memory or disk space,
leaving the tree in an inconsistent state.
"""

I'm of the general impression that post-recovery cleanup is
questionable. I'm surprised that you didn't mention this commit
previously. You just mentioned the original analogous work on GiST,
but this certainly seems to be something you've been thinking about
*systematically* eliminating for a while.


Yes, that's correct, these b-tree patches are part of a grand plan to 
eliminate post-recovery cleanup actions altogether.



So while post-recovery callbacks no longer exist for any
rmgr-managed-resource, 100% of remaining startup and cleanup callbacks
concern the simple management of memory of AM-specific recovery
contexts (for GiST, GiN and SP-GiST). I have to wonder if there isn't
a better abstraction than that, such as a generic recovery memory
context, allowing you to retire all 3 callbacks. I mean, StartupXLOG()
just calls those callbacks for each resource at exactly the same time
anyway, just as it initializes resource managers in precisely the same
manner earlier on. Plus if you look at what those AM-local memory
management routines do, it all seems very simple.

I think you should bump XLOG_PAGE_MAGIC.

Perhaps you should increase the elevel here:

+   elog(DEBUG1, "finishing incomplete split of %u/%u",
+BufferGetBlockNumber(lbuf), BufferGetBlockNumber(rbuf));

Since this is a very rare occurrence that involves some subtle
interactions, I can't think why you wouldn't want to LOG this for
forensic purposes.


Hmm. I'm not sure I agree with that line of thinking in general - what 
is an admin supposed to do about the message? But there's a precedent in 
vacuumlazy.c, which logs a message when it finds all-zero pages in the 
heap. So I guess that should be a LOG.



Why did you remove the local linkage of _bt_walk_left(), given that it
is called in exactly the same way as before? I guess that that is just
a vestige of some earlier revision.


Yeah, reverted.


I think I see some bugs in _bt_moveright(). If you examine
_bt_finish_split() in detail, you'll see that it doesn't just drop the
write buffer lock that the caller will have provided (per its
comments) - it also drops the buffer pin. It calls _bt_insert_parent()
last, which was previously only called last thing by _bt_insertonpg()
(some of the time), and _bt_insertonpg() is indeed documented to drop
both the lock and the pin. And if you look at _bt_moveright() again,
you'll see that there is a tacit assumption that the buffer pin isn't
dropped, or at least that "opaque" (the BTPageOpaque BT special page
area pointer) continues to point to the same page held in the same
buffer, even though the code doesn't set the "opaque" pointer again
and it may not point to a pinned buffer or even the appropriate
buffer. Ditto "page". So "opaque" may point to the wrong thing on
subsequent iterations - you don't do anything with the return value of
_bt_getbuf(), unlike all of the existing call sites. I believe you
should store its return value, and get the held page and the special
pointer on the page, and assign that to the "opaque" pointer before
the next iteration (an iteration in which you very probably really do
make progress not limited to completing a page split, the occurrence
of which the _bt_moveright() loop gets "stuck on"). You know, do what
is done in the non-split-handling case. It may not always be the same
buffer as before, obviously.


Yep, fixed.


Do you suppose that there are similar problems in other direct callers
of _bt_finish_split()? I'm thinking in particular of
_bt_findinsertloc().


Hmm, no, the other callers look OK to me.


I'm also not sure about the lock escalation that may occur within
_bt_moveright() for callers with BT_READ access - there is nothing to
prevent a caller from ending up being left with a write lock where
before they only had a read lock if they find that
!P_INCOMPLETE_SPLIT() with the write lock (after lock promotion) and
conclude that there is no split finishing to be done after all. It
looks like all callers currently won't care about that, but it s

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-27 Thread Simon Riggs
On 27 January 2014 17:44, Pavel Stehule  wrote:

> This topic is interesting - we found very bad performance with hashing large
> tables with high work_mem. MergeJoin with quicksort was significantly
> faster.

I've seen this also.

> I didn't do deeper research - there is a possibility of virtualization
> overhead.

I took measurements and the effect was repeatable and happened for all
sizes of work_mem, but nothing more to add.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-01-27 Thread Simon Riggs
On 27 January 2014 17:58, Simon Riggs  wrote:
> On 24 January 2014 08:33, Simon Riggs  wrote:
>> On 24 January 2014 07:08, Peter Geoghegan  wrote:
>>> On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs  wrote:
 v15 to fix the above problem.
>>>
>> v16 attached
>
> v17 attached

Frostbite...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


reduce_lock_levels.v17.patch
Description: Binary data



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-01-27 Thread Simon Riggs
On 24 January 2014 08:33, Simon Riggs  wrote:
> On 24 January 2014 07:08, Peter Geoghegan  wrote:
>> On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs  wrote:
>>> v15 to fix the above problem.
>>
> v16 attached

v17 attached

This version adds a GUC called ddl_exclusive_locks which allows us to
keep the 9.3 behaviour if we wish it. Some people may be surprised
that their programs don't wait in the same places they used to. We
hope that is a positive and useful behaviour, but it may not always be
so.

I'll commit this on Thurs 30 Jan unless I hear objections.

Thanks

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add %z support to elog/ereport?

2014-01-27 Thread Tom Lane
I wrote:
>> the idea that we might get many dozen such warnings on more-current
>> compilers is scarier, as that might well interfere with people's
>> ability to do development on, say, Windows.  Could somebody check
>> whether MSVC for instance complains about format strings using "z"?
>> Or shall I just commit this and we'll see what the buildfarm says?

> Given the lack of response, I've pushed the patch, and will canvass
> the buildfarm results later.

Just to follow up on that, I couldn't find any related warnings in the
buildfarm this morning (although there is one laggard machine, "anole",
which uses an unusual compiler and still hasn't reported in).

regards, tom lane




Re: [HACKERS] new json funcs

2014-01-27 Thread Andrew Dunstan


On 01/27/2014 12:43 PM, Merlin Moncure wrote:

On Fri, Jan 24, 2014 at 3:26 PM, Josh Berkus  wrote:

On 01/24/2014 12:59 PM, Andrew Dunstan wrote:

On 01/24/2014 03:40 PM, Laurence Rowe wrote:

For consistency with the existing json functions (json_each,
json_each_text, etc.) it might be better to add separate
json_to_record_text and json_to_recordset_text functions in place of
the nested_as_text parameter to json_to_record and json_to_recordset.



It wouldn't be consistent with json_populate_record() and
json_populate_recordset(), the two closest relatives, however.

And yes, I appreciate that we have not been 100% consistent. Community
design can be a bit messy that way.

FWIW, I prefer the parameter to having differently named functions.

+1.




Note that we can only do this when the result type stays the same. It 
does not for json_each/json_each_text or 
json_extract_path/json_extract_path_text, which is why we have different 
functions for those cases.


cheers

andrew





Re: [HACKERS] new json funcs

2014-01-27 Thread Merlin Moncure
On Fri, Jan 24, 2014 at 3:26 PM, Josh Berkus  wrote:
> On 01/24/2014 12:59 PM, Andrew Dunstan wrote:
>>
>> On 01/24/2014 03:40 PM, Laurence Rowe wrote:
>>> For consistency with the existing json functions (json_each,
>>> json_each_text, etc.) it might be better to add separate
>>> json_to_record_text and json_to_recordset_text functions in place of
>>> the nested_as_text parameter to json_to_record and json_to_recordset.
>>>
>>>
>>
>> It wouldn't be consistent with json_populate_record() and
>> json_populate_recordset(), the two closest relatives, however.
>>
>> And yes, I appreciate that we have not been 100% consistent. Community
>> design can be a bit messy that way.
>
> FWIW, I prefer the parameter to having differently named functions.

+1.

merlin




Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-27 Thread Pavel Stehule
2014-01-27 Stephen Frost 

> * Simon Riggs (si...@2ndquadrant.com) wrote:
> > I don't see anything for 9.4 in here now.
>
> Attached is what I was toying with (thought I had attached it previously
> somewhere..  perhaps not), but in re-testing, it doesn't appear to do
> enough to move things in the right direction in all cases.  I did play
> with this a fair bit yesterday and while it improved some cases by 20%
> (eg: a simple join between pgbench_accounts and pgbench_history), when
> we decide to *still* hash the larger side (as in my 'test_case2.sql'),
> it can cause a similarly-sized decrease in performance. Of course, if
> we can push that case to hash the smaller side (which I did by hand with
> cpu_tuple_cost), then it goes back to being a win to use a larger number
> of buckets.
>
> I definitely feel that there's room for improvement here but it's not an
> easily done thing, unfortunately.  To be honest, I was pretty surprised
> when I saw that the larger number of buckets performed worse, even if it
> was when we picked the "wrong" side to hash and I plan to look into that
> more closely to try and understand what's happening.  My first guess
> would be what Tom had mentioned over the summer- if the size of the
> bucket array ends up being larger than the CPU cache, we can end up
> paying a great deal more to build the hash table than it costs to scan
> through the deeper buckets that we end up with as a result (particularly
> when we're scanning a smaller table).  Of course, choosing to hash the
> larger table makes that more likely..
>

This topic is interesting - we found very bad performance with hashing
large tables with high work_mem. MergeJoin with quicksort was significantly
faster.

I didn't do deeper research - there is a possibility of virtualization
overhead.

Regards

Pavel


>
> Thanks,
>
> Stephen
>


Re: [HACKERS] Planning time in explain/explain analyze

2014-01-27 Thread Andreas Karlsson

On 01/13/2014 09:48 PM, Robert Haas wrote:

What I'm saying is that if EXPLAIN reports something that's labelled
"Planning Time", it should *be* the planning time, and not anything
else.  When we retrieve a plan from cache, it would be sensible not to
report the planning time at all, and IMHO it would also be sensible to
report the time it actually took to plan whenever we originally did
it.  But reporting a value that is not the planning time and calling
it the planning time does not seem like a good idea to me.


Here is a patch which only prints "Planning time" when a prepared 
statement actually planned the query. I do not really like how I check 
whether it was replanned, but I tried to avoid making changes in plancache.c.


Does this idea look ok?

--
Andreas Karlsson
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
new file mode 100644
index 2af1738..482490b
*** a/doc/src/sgml/perform.sgml
--- b/doc/src/sgml/perform.sgml
*** EXPLAIN SELECT * FROM tenk1;
*** 89,94 
--- 89,95 
   QUERY PLAN
  -
   Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=244)
+  Planning time: 0.113 ms
  
 
  
*** EXPLAIN SELECT * FROM tenk1;
*** 162,167 
--- 163,174 
 
  
 
+ The Planning time shown is the time it took to generate
+ the query plan from the parsed query and optimize it. It does not include
+ rewriting and parsing.
+
+ 
+
  Returning to our example:
  
  
*** EXPLAIN SELECT * FROM tenk1;
*** 170,175 
--- 177,183 
   QUERY PLAN
  -
   Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=244)
+  Planning time: 0.113 ms
  
 
  
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 198,203 
--- 206,212 
  
   Seq Scan on tenk1  (cost=0.00..483.00 rows=7001 width=244)
 Filter: (unique1 < 7000)
+  Planning time: 0.104 ms
  
  
  Notice that the EXPLAIN output shows the WHERE
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 234,239 
--- 243,249 
 Recheck Cond: (unique1 < 100)
 ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0)
   Index Cond: (unique1 < 100)
+  Planning time: 0.093 ms
  
  
  Here the planner has decided to use a two-step plan: the child plan
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 262,267 
--- 272,278 
 Filter: (stringu1 = 'xxx'::name)
 ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0)
   Index Cond: (unique1 < 100)
+  Planning time: 0.089 ms
  
  
  The added condition stringu1 = 'xxx' reduces the
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 283,288 
--- 294,300 
  -
   Index Scan using tenk1_unique1 on tenk1  (cost=0.29..8.30 rows=1 width=244)
 Index Cond: (unique1 = 42)
+  Planning time: 0.076 ms
  
  
  In this type of plan the table rows are fetched in index order, which
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 311,316 
--- 323,329 
 Index Cond: (unique1 < 100)
   ->  Bitmap Index Scan on tenk1_unique2  (cost=0.00..19.78 rows=999 width=0)
 Index Cond: (unique2 > 9000)
+  Planning time: 0.094 ms
  
  
  But this requires visiting both indexes, so it's not necessarily a win
*** EXPLAIN SELECT * FROM tenk1 WHERE unique
*** 331,336 
--- 344,350 
 ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.29..71.27 rows=10 width=244)
   Index Cond: (unique2 > 9000)
   Filter: (unique1 < 100)
+  Planning time: 0.087 ms
  
 
  
*** WHERE t1.unique1 < 10 AND t1.unique2
*** 364,369 
--- 378,384 
 Index Cond: (unique1 < 10)
 ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.29..7.91 rows=1 width=244)
   Index Cond: (unique2 = t1.unique2)
+  Planning time: 0.117 ms
  
 
  
*** WHERE t1.unique1 < 10 AND t2.unique2
*** 415,420 
--- 430,436 
 ->  Materialize  (cost=0.29..8.51 rows=10 width=244)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.29..8.46 rows=10 width=244)
 Index Cond: (unique2 < 10)
+  Planning time: 0.119 ms
  
  
  The condition t1.hundred < t2.hundred can't be
*** WHERE t1.unique1 < 100 AND t1.unique2
*** 462,467 
--- 478,484 
 Recheck Cond: (unique1 < 100)
 ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0)
   Index Cond: (unique1 < 100)
+  Planning time: 0.182 ms
  
 
  
*** WHERE t1.unique1 < 100 AND t1.unique2
*** 492,497 
--- 509,515 -

Re: [HACKERS] Fwd: Request for error explaination || Adding a new integer in indextupleData Structure

2014-01-27 Thread Tom Lane
Rohit Goyal  writes:
> Hi All,
> I was trying to modify the IndexTupleData structure by adding an integer
> variable. But I faced an error message "psql: FATAL:  could not find tuple
> for opclass 10032".

> Could anyone please help me in resolving this issue.

You broke a system catalog index.  Without seeing what you changed and
where, it's impossible to say just how, but that's the bottom line.

In recent versions of PG, opclass 10032 is btree name_ops (unless you've
also added/removed system catalog entries), which is a pretty plausible
thing to be one of the first indexscanned fetches during relcache.c
initialization, so I don't think there's any great significance in this
particular error message.  It's likely that you broke *all* indexscans
not just one specific one.

regards, tom lane




Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly

2014-01-27 Thread Ronan Dunklau
On Tuesday, January 7, 2014 at 17:05:03, Michael Paquier wrote:
> On Sun, Jan 5, 2014 at 3:49 PM, MauMau  wrote:
> > Could you confirm again and tell me what problem is happening?
> 
> FWIW, I just quickly tested those two patches independently and got
> them correctly applied with patch -p1 < $PATCH on master at edc4345.
> They compiled and passed make check as well.
> Regards,

Both patches apply cleanly, I'll focus on the pg_stop_fail_v3 patch.

Tests are running correctly.

The bug this patch is supposed to fix has been reproduced on current HEAD, and 
cannot be reproduced using the patch.

Previous concerns about using both get_pgpid and postmaster_is_alive are 
addressed.

There are no regression tests covering this bugfix, although I don't know if 
it would be practical to implement them.


-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org

signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-27 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote:
> * Simon Riggs (si...@2ndquadrant.com) wrote:
> > I don't see anything for 9.4 in here now.
> 
> Attached [...]

I'm apparently bad at this 'attaching' thing, particularly on this
subject.

Here it is.

Thanks,

Stephen
colordiff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
new file mode 100644
index 6a2f236..a8a8168
*** a/src/backend/executor/nodeHash.c
--- b/src/backend/executor/nodeHash.c
*** ExecChooseHashTableSize(double ntuples,
*** 489,496 
  		/* We expect the hashtable to fit in memory */
  		double		dbuckets;
  
! 		dbuckets = ceil(ntuples / NTUP_PER_BUCKET);
! 		dbuckets = Min(dbuckets, max_pointers);
  		nbuckets = (int) dbuckets;
  
  		nbatch = 1;
--- 489,495 
  		/* We expect the hashtable to fit in memory */
  		double		dbuckets;
  
! 		dbuckets = Min(ntuples, max_pointers);
  		nbuckets = (int) dbuckets;
  
  		nbatch = 1;


signature.asc
Description: Digital signature


Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-27 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> I don't see anything for 9.4 in here now.

Attached is what I was toying with (thought I had attached it previously
somewhere..  perhaps not), but in re-testing, it doesn't appear to do
enough to move things in the right direction in all cases.  I did play
with this a fair bit yesterday and while it improved some cases by 20%
(eg: a simple join between pgbench_accounts and pgbench_history), when
we decide to *still* hash the larger side (as in my 'test_case2.sql'),
it can cause a similarly-sized decrease in performance.  Of course, if
we can push that case to hash the smaller side (which I did by hand with
cpu_tuple_cost), then it goes back to being a win to use a larger number
of buckets.

I definitely feel that there's room for improvement here but it's not an
easily done thing, unfortunately.  To be honest, I was pretty surprised
when I saw that the larger number of buckets performed worse, even if it
was when we picked the "wrong" side to hash and I plan to look into that
more closely to try and understand what's happening.  My first guess
would be what Tom had mentioned over the summer- if the size of the
bucket array ends up being larger than the CPU cache, we can end up
paying a great deal more to build the hash table than it costs to scan
through the deeper buckets that we end up with as a result (particularly
when we're scanning a smaller table).  Of course, choosing to hash the
larger table makes that more likely..

Thanks,

Stephen


signature.asc
Description: Digital signature


[HACKERS] Fwd: Request for error explaination || Adding a new integer in indextupleData Structure

2014-01-27 Thread Rohit Goyal
Hi All,

I was trying to modify the IndexTupleData structure by adding an integer
variable. But I faced an error message "psql: FATAL:  could not find tuple
for opclass 10032".

Could anyone please help me in resolving this issue.


Regards,
Rohit Goyal



-- 
Regards,
Rohit Goyal


Re: [HACKERS] WIP patch (v2) for updatable security barrier views

2014-01-27 Thread Simon Riggs
On 27 January 2014 16:11, Tom Lane  wrote:
> Simon Riggs  writes:
>> Am I right in thinking that we have this fully working now?
>
> I will look at this at some point during the CF, but have not yet,
> and probably won't as long as it's not marked "ready for committer".

I've marked it Ready for Committer, to indicate my personal opinion.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GIN improvements part2: fast scan

2014-01-27 Thread Alexander Korotkov
On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas  wrote:

>> In addition to that, I'm using the ternary consistent function to check
>> if minItem is a match, even if we haven't loaded all the entries yet.
>> That's less important, but I think for something like "rare1 | (rare2 &
>> frequent)" it might be useful. It would allow us to skip fetching
>> 'frequent', when we already know that 'rare1' matches for the current
>> item. I'm not sure if that's worth the cycles, but it seemed like an
>> obvious thing to do, now that we have the ternary consistent function.
>>
>
> So, that clearly isn't worth the cycles :-). At least not with an
> expensive consistent function; it might be worthwhile if we pre-build the
> truth-table, or cache the results of the consistent function.
>

I believe caching consistent-function results is much the same as a lazy
truth-table. I think it's a good option to use with a two-state consistent
function. However, I don't think that's a reason to reject the three-state
consistent function, because the number of entries could be large.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] dynamic shared memory and locks

2014-01-27 Thread Robert Haas
On Thu, Jan 23, 2014 at 11:10 AM, Robert Haas  wrote:
> On Wed, Jan 22, 2014 at 12:42 PM, Andres Freund  
> wrote:
>> On 2014-01-22 12:40:34 -0500, Robert Haas wrote:
>>> On Wed, Jan 22, 2014 at 12:11 PM, Tom Lane  wrote:
>>> > Andres Freund  writes:
>>> >> Shouldn't we introduce a typedef LWLock* LWLockid; or something to avoid
>>> >> breaking external code using lwlocks?
>>> >
>>> > +1, in fact there's probably no reason to touch most *internal* code using
>>> > that type name either.
>>>
>>> I thought about this but figured it was too much of a misnomer to
>>> refer to a pointer as an ID.  But, if we're sure we want to go that
>>> route, I can go revise the patch along those lines.
>>
>> I personally don't care either way for internal code as long as external
>> code continues to work. There's the argument of making the commit better
>> readable by having less noise and less divergence in the branches and
>> there's your argument of that being less clear.
>
> OK, well then, if no one objects violently, I'll stick my current
> approach of getting rid of all core mentions of LWLockId in favor of
> LWLock *, but also add typedef LWLock *LWLockId with a comment that
> this is to minimize breakage of third-party code.

Hearing no objections, violent or otherwise, I've done that, made the
other adjustments suggested by Andres and KaiGai, and committed this.
Let's see what the buildfarm thinks...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP patch (v2) for updatable security barrier views

2014-01-27 Thread Simon Riggs
On 27 January 2014 15:04, Dean Rasheed  wrote:

> So for example, when planning the query to update an inheritance
> child, the rtable will contain an RTE for the parent, but it will not
> be referenced in the parse tree, and so it will not be expanded while
> planning the child update.

Am I right in thinking that we have this fully working now?

If we commit this aspect soon, we stand a chance of also touching upon RLS.

AFAICS the only area of objection is the handling of inherited
relations, which occurs within the planner in the current patch. I can
see that would be a cause for concern since the planner is pluggable
and it would then be possible to bypass security checks. Obviously
installing a new planner isn't trivial, but doing so shouldn't cause
collateral damage.

We have long had restrictions around updatable views. My suggestion
from here is that we accept the restriction that we cannot yet have
the three-way combination of updatable views, security views, and views
on inherited tables.

Most people aren't using inherited tables and people that are have
special measures in place for their apps. We won't lose much by
accepting that restriction for 9.4 and re-addressing the issue in a
later release. We need not adopt an all or nothing approach. Perhaps
we might yet find a solution for 9.4, but again, that need not delay
the rest of the patch.

From a review perspective, I'd want to see some greatly expanded
README comments, but given the Wiki entry, I think we can do that
quickly. Other than that, the code seems clear, modular and well
tested, so is something I could see me committing the uncontentious
parts of.

Thoughts?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WIP patch (v2) for updatable security barrier views

2014-01-27 Thread Tom Lane
Simon Riggs  writes:
> Am I right in thinking that we have this fully working now?

I will look at this at some point during the CF, but have not yet,
and probably won't as long as it's not marked "ready for committer".

regards, tom lane




Re: [HACKERS] inherit support for foreign tables

2014-01-27 Thread Atri Sharma


Sent from my iPad

> On 27-Jan-2014, at 21:03, David Fetter  wrote:
> 
>> On Mon, Jan 27, 2014 at 05:06:19PM +0900, Etsuro Fujita wrote:
>> Hi Hanada-san,
>> 
>> While still reviewing this patch, I feel this patch has given enough
>> consideration to interactions with other commands, but I found the
>> following incorrect(?) behavior:
>> 
>> postgres=# CREATE TABLE product (id INTEGER, description TEXT);
>> CREATE TABLE
>> postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product)
>> SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv');
>> CREATE FOREIGN TABLE
>> postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE
>> EXTERNAL;
>> ERROR:  "product1" is not a table or materialized view
>> 
>> ISTM the ALTER TABLE simple recursion mechanism (ie
>> ATSimpleRecursion()) should be modified for the ALTER COLUMN SET
>> STORAGE case.
> 
> This points to a larger discussion about what precisely foreign tables
> can and cannot inherit from local ones.  I don't think that a generic
> solution will be satisfactory, as the PostgreSQL FDW could, at least
> in principle, support many more than the CSV FDW, as shown above.
> 
> In my estimation, the outcome of discussion above is not a blocker for
> this 

I wonder what the behavior should be when the foreign table is on a server 
which does not support *all* SQL features.

Does an FDW need to document the possible inheritance options for this 
patch?

Regards,

Atri



Re: [HACKERS] inherit support for foreign tables

2014-01-27 Thread David Fetter
On Mon, Jan 27, 2014 at 05:06:19PM +0900, Etsuro Fujita wrote:
> Hi Hanada-san,
> 
> While still reviewing this patch, I feel this patch has given enough
> consideration to interactions with other commands, but I found the
> following incorrect(?) behavior:
> 
> postgres=# CREATE TABLE product (id INTEGER, description TEXT);
> CREATE TABLE
> postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product)
> SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv');
> CREATE FOREIGN TABLE
> postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE
> EXTERNAL;
> ERROR:  "product1" is not a table or materialized view
> 
> ISTM the ALTER TABLE simple recursion mechanism (ie
> ATSimpleRecursion()) should be modified for the ALTER COLUMN SET
> STORAGE case.

This points to a larger discussion about what precisely foreign tables
can and cannot inherit from local ones.  I don't think that a generic
solution will be satisfactory, as the PostgreSQL FDW could, at least
in principle, support many more than the CSV FDW, as shown above.

In my estimation, the outcome of discussion above is not a blocker for
this patch.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-27 Thread Tom Lane
Amit Kapila  writes:
> On Fri, Jan 24, 2014 at 9:10 AM, Amit Kapila  wrote:
>> On Fri, Jan 24, 2014 at 2:38 AM, Robert Haas  wrote:
>>> I mean, if
>>> there's a GUC that controls the event source name, then it can be
>>> changed between restarts, regardless of what you call it.

>> Yes, but not the default value (when the user doesn't provide any value
>> for event_source). Here the question is about the default value of
>> event_source.

> To proceed with the review of this patch, I need to know whether
> appending the version number or any other constant to the default
> event source name is acceptable or not; otherwise, for now we can
> remove this part of the code from the patch and handle the non-default
> case, where the change will be that pg_ctl will enquire the non-default
> event_source value from the server.

> Could you please let me know your views on this?

Unless I'm missing something, this entire thread is a tempest in a teapot,
because the default event_source value does not matter, because *by
default we don't log to eventlog*.  The presumption is that if the user
turns on logging to eventlog, it's his responsibility to first make sure
that event_source is set to something appropriate.  And who's to say that
plain "PostgreSQL" isn't what he wanted, anyway?  Even if he's got
multiple servers on one machine, maybe directing all their logs to the
same place is okay by him.

Also, those who don't run multiple servers are probably not going to
thank us for moving their logs around unnecessarily.

In short, I think we should just reject this idea as introducing more
problems than it solves, and not fully solving even the problem it
purports to solve.

Possibly there's room for a documentation patch reminding users to
make sure that event_source is set appropriately before they turn
on eventlog.

regards, tom lane




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Andrew Dunstan

On 01/27/2014 08:48 AM, Mitsumasa KONDO wrote:
>
>
> The issue of concern is not the performance of pg_stat_statements,
> AIUI. The issue is whether this patch affects performance
> generally, i.e. is there a significant cost in collecting these
> extra stats. To test this you would compare two general pgbench
> runs, one with the patch applied and one without.
>
> I showed the first test result, which compares runs without
> pg_stat_statements and without the patch, the other day. They ran on the
> same server with the same benchmark settings (clients and scale factor)
> as today's result. When you merge and compare the results, you can
> confirm that my patch does not affect performance.
>
>


Yeah, sorry, I misread your message. I think this is good to go from a
performance point of view, although I'm still a bit worried about the
validity of the method (accumulating a sum of squares). OTOH, Welford's
method probably requires slightly more per-statement overhead, and
certainly requires one more accumulator per statement. I guess if we
find ill-conditioned results it wouldn't be terribly hard to change.

cheers

andrew




Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-27 Thread Amit Kapila
On Fri, Jan 24, 2014 at 9:10 AM, Amit Kapila  wrote:
> On Fri, Jan 24, 2014 at 2:38 AM, Robert Haas  wrote:
>> On Thu, Jan 23, 2014 at 9:23 AM, Amit Kapila  wrote:
>>> On Thu, Jan 23, 2014 at 10:26 AM, Tom Lane  wrote:

 I think what we might want to do is redefine the server's behavior
 as creating an event named after the concatenation of event_source
 and port number, or maybe even get rid of event_source entirely and
 just say it's "PostgreSQL" followed by the port number.
>>>
>>>To accomplish this behaviour, each time server starts and stops,
>>>we need to register and unregister event log using mechanism
>>>described at below link to ensure that there is no mismatch between
>>>what server uses and what OS knows.
>>>http://www.postgresql.org/docs/devel/static/event-log-registration.html
>>
>> Why wouldn't that be necessary with your approach, too?
>
>Because in my approach we are using compile time constant
> + #define DEFAULT_EVENT_SOURCE
>"PostgreSQL " PG_MAJORVERSION
>
>> I mean, if
>> there's a GUC that controls the event source name, then it can be
>> changed between restarts, regardless of what you call it.
>
> Yes, but not the default value (when the user doesn't provide any value
> for event_source). Here the question is about the default value of
> event_source.

To proceed with the review of this patch, I need to know whether
appending the version number or any other constant to the default
event source name is acceptable or not; otherwise, for now we can
remove this part of the code from the patch and handle the non-default
case, where the change will be that pg_ctl will enquire the non-default
event_source value from the server.

Could you please let me know your views on this?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WIP patch (v2) for updatable security barrier views

2014-01-27 Thread Dean Rasheed
On 27 January 2014 07:54, Kouhei Kaigai  wrote:
> Hello,
>
> I checked the latest updatable security barrier view patch.
> Even though I couldn't find a major design problem in this revision,
> here are two minor comments below.
>
> I think it needs to be reviewed by a committer to settle the direction
> for implementing this feature. Of course, even though I know Tom argued
> about the current design of this feature upthread, it does not seem
> to me that Dean's design is unreasonable.
>

Thanks for looking at this.


> Below is minor comments of mine:
>
> @@ -932,9 +938,32 @@ inheritance_planner(PlannerInfo *root)
> if (final_rtable == NIL)
> final_rtable = subroot.parse->rtable;
> else
> -   final_rtable = list_concat(final_rtable,
> +   {
> +   List   *tmp_rtable = NIL;
> +   ListCell   *cell1, *cell2;
> +
> +   /*
> +* Planning this new child may have turned some of the original
> +* RTEs into subqueries (if they had security barrier quals). If
> +* so, we want to use these in the final rtable.
> +*/
> +   forboth(cell1, final_rtable, cell2, subroot.parse->rtable)
> +   {
> +   RangeTblEntry *rte1 = (RangeTblEntry *) lfirst(cell1);
> +   RangeTblEntry *rte2 = (RangeTblEntry *) lfirst(cell2);
> +
> +   if (rte1->rtekind == RTE_RELATION &&
> +   rte1->securityQuals != NIL &&
> +   rte2->rtekind == RTE_SUBQUERY)
> +   tmp_rtable = lappend(tmp_rtable, rte2);
> +   else
> +   tmp_rtable = lappend(tmp_rtable, rte1);
> +   }
>
> Do we have a case where rte1 is a regular relation with securityQuals but
> rte2 is not a sub-query? If not, rte2->rtekind == RTE_SUBQUERY should
> be a condition in an Assert rather than the third condition in the if-block.
>

Yes it is possible for rte1 to be a RTE_RELATION with securityQuals
and rte2 to be something other than a RTE_SUBQUERY because the
subquery expansion code in expand_security_quals() only expands RTEs
that are actually used in the query.

So for example, when planning the query to update an inheritance
child, the rtable will contain an RTE for the parent, but it will not
be referenced in the parse tree, and so it will not be expanded while
planning the child update.


> In the case when a sub-query is simple enough (no qualifier and no
> projection on top of the underlying scan), it is pulled up even if the
> sub-query has the security_barrier attribute, isn't it?
> See the example below. The view v2 is defined as follows.
>
> postgres=# CREATE VIEW v2 WITH (security_barrier) AS SELECT * FROM t2 WHERE x 
> % 10 = 5;
> CREATE VIEW
> postgres=# EXPLAIN SELECT * FROM v2 WHERE f_leak(z);
>QUERY PLAN
> -
>  Subquery Scan on v2  (cost=0.00..3.76 rows=1 width=41)
>Filter: f_leak(v2.z)
>->  Seq Scan on t2  (cost=0.00..3.50 rows=1 width=41)
>  Filter: ((x % 10) = 5)
> (4 rows)
>
> postgres=# EXPLAIN SELECT * FROM v2;
> QUERY PLAN
> ---
>  Seq Scan on t2  (cost=0.00..3.50 rows=1 width=41)
>Filter: ((x % 10) = 5)
> (2 rows)
>
> The second EXPLAIN result shows that the underlying sub-query is
> pulled up even though it has the security_barrier attribute.
> (IIRC, this was a new feature in 9.3.)
>

Actually what happens is that it is planned as a subquery scan, then
at the very end of the planning process, it detects that the subquery
scan is trivial (has no quals, and has a no-op targetlist) and it
removes that plan node --- see set_subqueryscan_references() and
trivial_subqueryscan().

That subquery scan removal code requires the targetlist to have
exactly the same number of attributes, in exactly the same order as
the underlying relation. As soon as you add anything non-trivial to
the select list in the above queries, or even just change the order of
its attributes, the subquery scan node is no longer removed.


> On the other hand, this kind of optimization is not applied
> to a sub-query extracted from a relation with securityQuals:
>
> postgres=# EXPLAIN UPDATE v2 SET z = z;
>  QUERY PLAN
> 
>  Update on t2 t2_1  (cost=0.00..3.51 rows=1 width=47)
>->  Subquery Scan on t2  (cost=0.00..3.51 rows=1 width=47)
>  ->  Seq Scan on t2 t2_2  (cost=0.00..3.50 rows=1 width=47)
>Filter: ((x % 10) = 5)
> (4 rows)
>
> If it has no security_barrier option, the view reference is expanded
> in the rewriter stage and pulled up as we expected.
>
> postgres=# ALTER VIEW v2 RESET (security_barrier);
> ALTER VIEW
> postgres=# EXPLAIN UPDATE t2 SET z = z;
> QUERY PLAN
> ---
>  Update on t2  (cost=0.00..3.00 rows=100 width=47)
> 

Re: [HACKERS] Standalone synchronous master

2014-01-27 Thread Robert Haas
On Sun, Jan 26, 2014 at 10:56 PM, Rajeev rastogi
 wrote:
> On 01/25/2014, Josh Berkus wrote:
>> > ISTM the consensus is that we need better monitoring/administration
>> > interfaces so that people can script the behavior they want in
>> > external tools. Also, a new synchronous apply replication mode would
>> > be handy, but that'd be a whole different patch. We don't have a
>> patch
>> > on the table that we could consider committing any time soon, so I'm
>> > going to mark this as rejected in the commitfest app.
>>
>> I don't feel that "we'll never do auto-degrade" is determinative;
>> several hackers were for auto-degrade, and they have a good use-case
>> argument.  However, we do have consensus that we need more scaffolding
>> than this patch supplies in order to make auto-degrade *safe*.
>>
>> I encourage the submitter to resumbit and improved version of this
>> patch (one with more monitorability) for  9.5 CF1.  That'll give us a
>> whole dev cycle to argue about it.
>
> I shall rework and improve this patch. Below is a summary of all the
> discussion, which will be used as input for improving the patch:
>
> 1. Method of degrading the synchronous mode:
> a. Expose the configuration variable to a new SQL-callable functions.
> b. Using ALTER SYSTEM SET.
> c. Auto-degrade using some sort of configuration parameter as done in 
> current patch.
> d. Or may be combination of above, which DBA can use depending on 
> their use-cases.
>
>   We can discuss further to decide on one of the approach.
>
> 2. Synchronous mode should be upgraded/restored after at least one synchronous 
> standby comes up and has caught up with the master.
>
> 3. A better monitoring/administration interfaces, which can be even better if 
> it is made as a generic trap system.
>
>   I shall propose a better approach for this.
>
> 4. Send committing clients a WARNING if they have committed a synchronous 
> transaction and we are in degraded mode.
>
> 5. Please add more if I am missing something.

All of those things have been mentioned, but I'm not sure we have
consensus on which of them we actually want to do, or how.  Figuring
that out seems like the next step.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] shouldn't we log permission errors when accessing the configured trigger file?

2014-01-27 Thread Magnus Hagander
On Mon, Jan 27, 2014 at 3:43 PM, Robert Haas  wrote:

> On Sun, Jan 26, 2014 at 1:03 PM, Andres Freund 
> wrote:
> > For some reason CheckForStandbyTrigger() doesn't report permission
> > errors when stat()ing the trigger file. Shouldn't we fix that?
> >
> > static bool
> > CheckForStandbyTrigger(void)
> > {
> > ...
> > if (stat(TriggerFile, &stat_buf) == 0)
> > {
> > ereport(LOG,
> > (errmsg("trigger file found: %s",
> TriggerFile)));
> > unlink(TriggerFile);
> > triggered = true;
> > fast_promote = true;
> > return true;
> > }
> >
> > Imo the stat() should warn about all errors but ENOENT?
>
> Seems reasonable.  It could lead to quite a bit of log spam, I
> suppose, but the way things are now could be pretty mystifying if
> you've located your trigger file somewhere outside $PGDATA, and a
> parent directory is lacking permissions.
>

+1. Since it actually indicates something that's quite broken (with it,
you can never make the trigger work until you fix the permissions), the
log spam seems appropriate. (Log spam is never nice, but a single log
line is also very easy to miss; this should log enough that you won't
miss it.)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] shouldn't we log permission errors when accessing the configured trigger file?

2014-01-27 Thread Robert Haas
On Sun, Jan 26, 2014 at 1:03 PM, Andres Freund  wrote:
> For some reason CheckForStandbyTrigger() doesn't report permission
> errors when stat()ing the trigger file. Shouldn't we fix that?
>
> static bool
> CheckForStandbyTrigger(void)
> {
> ...
> if (stat(TriggerFile, &stat_buf) == 0)
> {
> ereport(LOG,
> (errmsg("trigger file found: %s", 
> TriggerFile)));
> unlink(TriggerFile);
> triggered = true;
> fast_promote = true;
> return true;
> }
>
> Imo the stat() should warn about all errors but ENOENT?

Seems reasonable.  It could lead to quite a bit of log spam, I
suppose, but the way things are now could be pretty mystifying if
you've located your trigger file somewhere outside $PGDATA, and a
parent directory is lacking permissions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Race condition in b-tree page deletion

2014-01-27 Thread Heikki Linnakangas

On 12/17/2013 04:55 AM, Jim Nasby wrote:

On 11/9/13, 10:02 AM, Heikki Linnakangas wrote:

3. Another approach would be to get rid of the "can't delete
rightmost child" limitation. We currently have that limitation
because it ensures that we never need to change the high-key of a
page. If we delete a page that is the rightmost child of its
parent, we transfer the deleted keyspace from the parent page to
its right sibling. To do that, we need to update the high key of
the parent, as well as the downlink of the right sibling at the
grandparent level. That's a bit complicated, because updating the
high key might require splitting the page.


Is the rightmost child issue likely to affect indexes on increasing
values, like a queue table?


No. In a FIFO queue, you keep adding new items to the right-end of the 
index, so old pages become non-rightmost fairly quickly.


- Heikki




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-01-27 Thread Stephen Frost
KaiGai Kohei,

* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
> Is somebody available to volunteer to review the custom-scan patch?

I looked through it a bit and my first take away from it was that the
patches to actually use the new hooks were also making more changes to
the backend code, leaving me with the impression that the proposed
interface isn't terribly stable.  Perhaps those changes should have just
been in the first patch, but they weren't and that certainly gave me
pause.

I'm also not entirely convinced that this is the direction to go in when
it comes to pushing down joins to FDWs.  While that's certainly a goal
that I think we all share, this seems to be intending to add a
completely different feature which happens to be able to be used for
that.  For FDWs, wouldn't we only present the FDW with the paths where
the foreign tables for that FDW, or perhaps just a given foreign server,
are being joined?

Thanks,

Stephen



Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime

2014-01-27 Thread Amit Langote
On Mon, Jan 27, 2014 at 11:18 PM, Amit Langote  wrote:
> On Mon, Jan 13, 2014 at 2:50 PM, Amit Kapila  wrote:
>> I have extended the test (contrib) module dsm_demo such that the user
>> can now specify the lifespan of the segment in dsm_demo_create.
>> The values it can accept are 0 or 1. Default value is 0.
>> 0 -- means segment will be accessible for session life time
>> 1 -- means segment will be accessible for postmaster life time
>>
>>
>> The behaviour is as below:
>> Test -1 (Session life time)
>> Session - 1
>> -- here it will create segment for session lifetime
>> select dsm_demo_create('this message is from session-1', 0);
>>  dsm_demo_create
>> -
>>82712
>>
>> Session - 2
>> -
>> select dsm_demo_read(82712);
>>dsm_demo_read
>> 
>>  this message is from session-1
>> (1 row)
>>
>>
>> Session-1
>> \q
>>
>> Session-2
>> postgres=# select dsm_demo_read(82712);
>>  dsm_demo_read
>> ---
>>
>> (1 row)
>>
>> Conclusion of Test-1 : As soon as the session which created the segment finishes,
>> the segment becomes inaccessible.
>>
>>
>> Test -2 (Postmaster life time)
>> Session - 1
>> -- here it will create segment for postmaster lifetime
>> select dsm_demo_create('this message is from session-1', 1);
>>  dsm_demo_create
>> -
>>82712
>>
>> Session - 2
>> -
>> select dsm_demo_read(82712);
>>dsm_demo_read
>> 
>>  this message is from session-1
>> (1 row)
>>
>>
>> Session-1
>> \q
>>
>> Session-2
>> postgres=# select dsm_demo_read(82712);
>>  dsm_demo_read
>> ---
>>  this message is from session-1
>> (1 row)
>>
>> Conclusion of Test-2 : a. The segment is accessible for the postmaster's lifetime.
>>  b. If the user restarts the server, the segment
>> is not accessible.
>>
>>
>
> Applied dsm_keep_segment_v1.patch and dsm_demo_v1.patch.
> Got the following warning when I tried above example:
>
> postgres=# select dsm_demo_create('this message is from session-new', 1);
> WARNING:  dynamic shared memory leak: segment 1402373971 still referenced
> WARNING:  dynamic shared memory leak: segment 1402373971 still referenced
>  dsm_demo_create
> -
>   1402373971
> (1 row)
>
>

I see that PrintDSMLeakWarning(), which emits this warning, is for debugging.

--
Amit




Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime

2014-01-27 Thread Amit Langote
On Mon, Jan 13, 2014 at 2:50 PM, Amit Kapila  wrote:
> I have extended test (contrib) module dsm_demo such that now user
> can specify during dsm_demo_create the lifespan of segment.
> The values it can accept are 0 or 1. Default value is 0.
> 0 -- means segment will be accessible for session life time
> 1 -- means segment will be accessible for postmaster life time
>
>
> The behaviour is as below:
> Test -1 (Session life time)
> Session - 1
> -- here it will create segment for session lifetime
> select dsm_demo_create('this message is from session-1', 0);
>  dsm_demo_create
> -
>82712
>
> Session - 2
> -
> select dsm_demo_read(82712);
>dsm_demo_read
> 
>  this message is from session-1
> (1 row)
>
>
> Session-1
> \q
>
> Session-2
> postgres=# select dsm_demo_read(82712);
>  dsm_demo_read
> ---
>
> (1 row)
>
> Conclusion of Test-1 : As soon as the session which created the segment finishes,
> the segment becomes inaccessible.
>
>
> Test -2 (Postmaster life time)
> Session - 1
> -- here it will create segment for postmaster lifetime
> select dsm_demo_create('this message is from session-1', 1);
>  dsm_demo_create
> -
>82712
>
> Session - 2
> -
> select dsm_demo_read(82712);
>dsm_demo_read
> 
>  this message is from session-1
> (1 row)
>
>
> Session-1
> \q
>
> Session-2
> postgres=# select dsm_demo_read(82712);
>  dsm_demo_read
> ---
>  this message is from session-1
> (1 row)
>
> Conclusion of Test-2 : a. The segment is accessible for the postmaster's lifetime.
>  b. If the user restarts the server, the segment
> is not accessible.
>
>

Applied dsm_keep_segment_v1.patch and dsm_demo_v1.patch.
Got the following warning when I tried above example:

postgres=# select dsm_demo_create('this message is from session-new', 1);
WARNING:  dynamic shared memory leak: segment 1402373971 still referenced
WARNING:  dynamic shared memory leak: segment 1402373971 still referenced
 dsm_demo_create
-
  1402373971
(1 row)



--
Amit




Re: [HACKERS] What is happening on buildfarm member crake?

2014-01-27 Thread Robert Haas
On Sat, Jan 25, 2014 at 5:04 PM, Tom Lane  wrote:
> Yeah.  If Robert's diagnosis is correct, and it sounds pretty plausible,
> then this is really just one instance of a bug that's probably pretty
> widespread in our signal handlers.  Somebody needs to go through 'em
> all and look for touches of shared memory.

I haven't made a comprehensive study of every signal handler we have,
but looking at procsignal_sigusr1_handler, the list of functions that
can get called from here is quite short: CheckProcSignal(),
RecoveryConflictInterrupt(), SetLatch(), and latch_sigusr1_handler().
Taking those in reverse order:

- latch_sigusr1_handler() is fine.  Nothing down this path touches
shared memory; moreover, if we've already disowned our latch, the
waiting flag won't be set and this will do nothing at all.
- The call to SetLatch() is problematic as we already know.  This is
new code in 9.4.
- RecoveryConflictInterrupt() does nothing if proc_exit_inprogress is
set.  So it's fine.
- CheckProcSignal() also appears problematic.  If we've already
detached shared memory, MyProcSignalSlot will be pointing to garbage,
but we'll try to dereference it anyway.

I think maybe the best fix is to *clear* MyProc in
ProcKill/AuxiliaryProcKill and MyProcSignalSlot in
CleanupProcSignalState, as shown in the attached patch.  Most places
that dereference those pointers already check that they aren't null,
and we can easily add a NULL guard to the SetLatch() call in
procsignal_sigusr1_handler, which the attached patch also does.

This might not be a complete fix to every problem of this type that
exists anywhere in our code, but I think it's enough to make the world
safe for procsignal_sigusr1_handler.  We also have a *large* number of
signal handlers that do little more than this:

if (MyProc)
SetLatch(&MyProc->procLatch);

...and this change should make all of those safe as well.  So I think
this is a pretty good start.

Assuming nobody objects too much to this basic approach, should I
back-patch the parts of this that apply pre-9.4?  The problem with
CleanupProcSignalState, at least, goes all the way back to 9.0, when
the signal-multiplexing infrastructure was introduced.  But the
probability of an actual crash must be pretty low, or I imagine we
would have noticed this sooner.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 6ebabce..1372a7e 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -149,6 +149,8 @@ CleanupProcSignalState(int status, Datum arg)
 	slot = &ProcSignalSlots[pss_idx - 1];
 	Assert(slot == MyProcSignalSlot);
 
+	MyProcSignalSlot = NULL;
+
 	/* sanity check */
 	if (slot->pss_pid != MyProcPid)
 	{
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ee6c24c..f64e1c4 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -772,6 +772,7 @@ ProcKill(int code, Datum arg)
 {
 	/* use volatile pointer to prevent code rearrangement */
 	volatile PROC_HDR *procglobal = ProcGlobal;
+	PGPROC	   *proc;
 
 	Assert(MyProc != NULL);
 
@@ -796,31 +797,34 @@ ProcKill(int code, Datum arg)
 	 */
 	LWLockReleaseAll();
 
-	/* Release ownership of the process's latch, too */
-	DisownLatch(&MyProc->procLatch);
+	/*
+	 * Clear MyProc first; then disown the process latch.  This is so that
+	 * signal handlers won't try to clear the process latch after it's no
+	 * longer ours.
+	 */
+	proc = MyProc;
+	MyProc = NULL;
+	DisownLatch(&proc->procLatch);
 
 	SpinLockAcquire(ProcStructLock);
 
 	/* Return PGPROC structure (and semaphore) to appropriate freelist */
 	if (IsAnyAutoVacuumProcess())
 	{
-		MyProc->links.next = (SHM_QUEUE *) procglobal->autovacFreeProcs;
-		procglobal->autovacFreeProcs = MyProc;
+		proc->links.next = (SHM_QUEUE *) procglobal->autovacFreeProcs;
+		procglobal->autovacFreeProcs = proc;
 	}
 	else if (IsBackgroundWorker)
 	{
-		MyProc->links.next = (SHM_QUEUE *) procglobal->bgworkerFreeProcs;
-		procglobal->bgworkerFreeProcs = MyProc;
+		proc->links.next = (SHM_QUEUE *) procglobal->bgworkerFreeProcs;
+		procglobal->bgworkerFreeProcs = proc;
 	}
 	else
 	{
-		MyProc->links.next = (SHM_QUEUE *) procglobal->freeProcs;
-		procglobal->freeProcs = MyProc;
+		proc->links.next = (SHM_QUEUE *) procglobal->freeProcs;
+		procglobal->freeProcs = proc;
 	}
 
-	/* PGPROC struct isn't mine anymore */
-	MyProc = NULL;
-
 	/* Update shared estimate of spins_per_delay */
 	procglobal->spins_per_delay = update_spins_per_delay(procglobal->spins_per_delay);
 
@@ -849,6 +853,7 @@ AuxiliaryProcKill(int code, Datum arg)
 {
 	int			proctype = DatumGetInt32(arg);
 	PGPROC	   *auxproc PG_USED_FOR_ASSERTS_ONLY;
+	PGPROC	   *proc;
 
 	Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
 
@@ -859,16 +864,19 @@ AuxiliaryProcKill(int code, Datum arg)
 	/* Release any LW locks I am holding (see

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Mitsumasa KONDO
2014-01-27 Andrew Dunstan 

>
> On 01/27/2014 07:09 AM, KONDO Mitsumasa wrote:
>
>> (2014/01/23 23:18), Andrew Dunstan wrote:
>>
>>> What is more, if the square root calculation is affecting your
>>> benchmarks, I
>>> suspect you are benchmarking the wrong thing.
>>>
>> I ran another test with two pgbench clients running at the same time:
>> one executes a select-only query, and the other runs a 'SELECT * FROM
>> pg_stat_statements' query once per second. I used the v6 patch in this
>> test.
>>
>>
>
> The issue of concern is not the performance of pg_stat_statements, AIUI.
> The issue is whether this patch affects performance generally, i.e. is
> there a significant cost in collecting these extra stats. To test this you
> would compare two general pgbench runs, one with the patch applied and one
> without.

Yesterday I showed a first test result comparing runs without
pg_stat_statements and without the patch.  They ran on the same server with
the same benchmark settings (clients and scale factor) as today's result.
If you merge and compare the results, you can confirm that my patch does
not affect performance.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] Changeset Extraction v7.1

2014-01-27 Thread Robert Haas
On Sat, Jan 25, 2014 at 5:25 PM, Andres Freund  wrote:
>> I'm also wondering about
>> whether we've got the right naming here.  AFAICT, it's not the case
>> that we're going to use the "catalog xmin" for catalogs and the "data
>> xmin" for non-catalogs.  Rather, the "catalog xmin" is going to always
>> be included in globalxmin calculations, so IOW it should just be
>> called "xmin".
>
> Well, not really. That's true for GetSnapshotData(), but not for
> GetOldestXmin() where we calculate an xmin *not* including the catalog
> xmin. And the data_xmin is always used, regardless of
> catalog/non_catalog, we peg the xmin further for catalog tables, based
> on that value.
> The reason for doing things this way is that it makes all current usages
> of RecentGlobalXmin safe, since that is the more conservative
> value. Only in inspected locations can we use RecentGlobalDataXmin, which
> *does* include data_xmin but *not* catalog_xmin.

Well, OK, so I guess I'm turned around.  But I guess my point is - if
one of data_xmin and catalog_xmin is really just xmin, then I think it
would be more clear to call that one "xmin".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] inherit support for foreign tables

2014-01-27 Thread Shigeru Hanada
2014-01-27 Etsuro Fujita :
> While still reviewing this patch, I feel it has given enough
> consideration to interactions with other commands, but I found the
> following (incorrect?) behavior:
>
> postgres=# CREATE TABLE product (id INTEGER, description TEXT);
> CREATE TABLE
> postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product) SERVER fs
> OPTIONS (filename '/home/foo/product1.csv', format 'csv');
> CREATE FOREIGN TABLE
> postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE
> EXTERNAL;
> ERROR:  "product1" is not a table or materialized view
>
> ISTM the ALTER TABLE simple recursion mechanism (i.e. ATSimpleRecursion())
> should be modified for the ALTER COLUMN SET STORAGE case.
>
> I just wanted to quickly tell you this for you to take time to consider.

Thanks for the review.  It must be an oversight, so I'll fix it up soon.

-- 
Shigeru HANADA




Re: [HACKERS] Changeset Extraction v7.1

2014-01-27 Thread Robert Haas
On Sat, Jan 25, 2014 at 5:25 PM, Andres Freund  wrote:
>> Looking over patch 0002, I see that there's code to allow a walsender
>> to create or drop a physical replication slot.  Also, if we've
>> acquired a replication slot, there's code to update it, and code to
>> make sure we disconnect from it, but there's no code to acquire it. I
>> think maybe the hunk in StartReplication() is supposed to be calling
>> ReplicationSlotAcquire() instead of ReplicationSlotRelease(),
>
> Uh. You had me worried here for a minute or two, a hunk or two earlier
> than the ReplicationSlotRelease() you mention. What probably confused
> you is that StartReplication only returns once all streaming is
> finished. Not my idea...

No, what confuses me is that there's no call to
ReplicationSlotAcquire() in patch 0001 or patch 0002; the function
is added but not called.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [Review] inherit support for foreign tables

2014-01-27 Thread Shigeru Hanada
2014-01-27 Etsuro Fujita :
> (2014/01/25 11:27), Shigeru Hanada wrote:
> Yeah, the consistency is essential for its ease of use.  But I'm not sure
> that inherited stats ignoring foreign tables is actually useful for query
> optimization.  What I think about the consistency is a) the ANALYZE command
> with no table names skips ANALYZEing inheritance trees that include at least
> one foreign table as a child, but b) the ANALYZE command with a table name
> indicating an inheritance tree that includes any foreign tables does compute
> the inherited stats in the same way as an inheritance tree consisting of only
> ordinary tables by acquiring the sample rows from each foreign table on the
> far side.

b) sounds a little complex to understand or explain.

Would it be too big a change to make the ANALYZE command handle foreign
tables even when no table name is specified?  IIRC, a performance issue
was the reason for excluding foreign tables from auto-analyze processing,
but ANALYZEing a large database containing huge local data also takes a
long time.  One idea to avoid unexpectedly long processing is to add an
option to foreign tables marking them as "not auto-analyzable".

Thoughts?
-- 
Shigeru HANADA




  1   2   >