Re: [HACKERS] Parallel Index Scans

2017-01-17 Thread Rahila Syed
>+ /* Check if the scan for current scan keys is finished */
>+ if (so->arrayKeyCount < btscan->btps_arrayKeyCount)
>+ *status = false;

>I didn't clearly understand, in which scenario the arrayKeyCount is less
>than btps_arrayKeyCount?
Consider following array scan keys

select * from test2 where j=ANY(ARRAY[1000,2000,3000]);

By the time the current worker has finished reading heap tuples
corresponding to array key 1000 (arrayKeyCount = 0), some other worker
might have advanced the scan to array key 2000 (btps_arrayKeyCount = 1).
In this case, when the current worker fetches the next page to scan, it
must advance its scan keys before scanning the next page of the parallel
scan.
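To make that concrete: the executor processes the array scan one element
at a time, so conceptually the scan behaves much like the hand-expanded
equivalent below (an illustration only, not what the planner produces; it
assumes the array elements are distinct, as in the example above):

select * from test2 where j = 1000
union all
select * from test2 where j = 2000
union all
select * from test2 where j = 3000;

arrayKeyCount tracks which of these per-element scans the current worker
is on, while btps_arrayKeyCount tracks how far the shared parallel scan
as a whole has advanced.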
I hope this helps.

>+BlockNumber
>+_bt_parallel_seize(IndexScanDesc scan, bool *status)

>The return value of the above function is validated only in _bt_first
>function, but in other cases it is not.
In other cases it is validated in _bt_readnextpage() which is called after
_bt_parallel_seize().

>From the function description,
>it is possible to return P_NONE for the workers also with status as
>true. I feel it is better to handle the P_NONE case internally only
>so that callers just check for the status. Am i missing anything?

In case the next block is P_NONE and status is true, the code
calls _bt_parallel_done() to notify other workers, followed by
BTScanPosInvalidate(). A similar check for blkno == P_NONE also
happens in the existing code; see the following in _bt_readnextpage():

if (blkno == P_NONE || !so->currPos.moreRight)
{
    _bt_parallel_done(scan);
    BTScanPosInvalidate(so->currPos);
    return false;
}
So to keep it consistent with the existing code, the check
is kept outside _bt_parallel_seize().

Thank you,
Rahila Syed


On Wed, Jan 18, 2017 at 6:25 AM, Haribabu Kommi 
wrote:

>
>
> On Mon, Jan 16, 2017 at 11:11 PM, Amit Kapila 
> wrote:
>
>>
>> Changed as per suggestion.
>>
>>
>> I have also rebased the optimizer/executor support patch
>> (parallel_index_opt_exec_support_v4.patch) and added a test case in
>> it.
>
>
> Thanks for the patch. Here are comments found during review.
>
> parallel_index_scan_v4.patch:
>
>
> + amtarget = (char *) ((void *) target) + offset;
>
> The above calculation can be moved after the NULL check?
>
> + * index_beginscan_parallel - join parallel index scan
>
> The name and the description don't sync properly, any better description?
>
> + BTPARALLEL_DONE,
> + BTPARALLEL_IDLE
> +} PS_State;
>
> The order of above two enum values can be changed according to their use.
>
> + /* Check if the scan for current scan keys is finished */
> + if (so->arrayKeyCount < btscan->btps_arrayKeyCount)
> + *status = false;
>
> I didn't clearly understand, in which scenario the arrayKeyCount is less
> than btps_arrayKeyCount?
>
>
> +BlockNumber
> +_bt_parallel_seize(IndexScanDesc scan, bool *status)
>
> The return value of the above function is validated only in _bt_first
> function, but in other cases it is not. From the function description,
> it is possible to return P_NONE for the workers also with status as
> true. I feel it is better to handle the P_NONE case internally only
> so that callers just check for the status. Am i missing anything?
>
>
> +extern BlockNumber _bt_parallel_seize(IndexScanDesc scan, bool *status);
> +extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber
> scan_page);
>
> Any better names for the above functions, as these functions will
> provide/set the next page that needs to be read.
>
>
> parallel_index_opt_exec_support_v4.patch:
>
> +#include "access/parallel.h"
>
> Why is it required to be included in nbtree.c? I didn't find
> any code changes in the patch.
>
>
> + /* reset (parallel) index scan */
> + if (node->iss_ScanDesc)
> + {
>
> Why is this if check required? There is an assert check in later function
> calls.
>
>
> Regards,
> Hari Babu
> Fujitsu Australia
>


Re: [HACKERS] RustgreSQL

2017-01-17 Thread David Fetter
On Mon, Jan 09, 2017 at 12:51:43PM -0500, Robert Haas wrote:
> On Sun, Jan 8, 2017 at 4:59 AM, Gavin Flower
>  wrote:
> >> Is this completely unrealistic or is it carved in stone PostgreSQL will
> >> always be a C project forever and ever?
> >>
> > From my very limited understanding, PostgreSQL is more likely to be
> > converted to C++!
> 
> I'm tempted to snarkily reply that we should start by finishing the
> conversion of PostgreSQL from LISP to C before we worry about
> converting it to anything else.  There are various code comments that
> imply that it actually was LISP at one time and I can certainly
> believe that given our incredibly wasteful use of linked lists in so
> many places.  gram.y asserts that this problem was fixed as far as the
> grammar is concerned...
> 
>  *AUTHORDATEMAJOR EVENT
>  *Andrew Yu Sept, 1994
> POSTQUEL to SQL conversion
>  *Andrew Yu Oct, 1994   lispy
> code conversion
> 
> ...but I think it'd be fair to say that even there it was fixed only in part.

David Gould (added to Cc:) mentioned that he had some ideas as to how
to address this.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] pg_hba_file_settings view patch

2017-01-17 Thread Haribabu Kommi
On Tue, Jan 17, 2017 at 5:24 PM, Michael Paquier 
wrote:

> On Tue, Jan 17, 2017 at 10:19 AM, Haribabu Kommi
>  wrote:
> > On Tue, Jan 10, 2017 at 6:35 PM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >> +/* LDAP supports 10 currently, keep this well above the most any
> >> method needs */
> >> +#define MAX_OPTIONS 12
> >> Er, why? There is an assert already, that should be enough.
> >
> >
> > Which Assert? This macro is used to verify that the maximum number
> > of authentication options that are possible for a single hba line.
>
> That one:
> +   Assert(noptions <= MAX_OPTIONS);
> +   if (noptions)
> +   return PointerGetDatum(
> +   construct_array(options, noptions, TEXTOID, -1, false,
> 'i'));


Sorry, I didn't clearly understand your comment. The MAX_OPTIONS
macro is used to fill the Datum array that generates the options text
array data.


> >> =# \d pg_hba_rules
> >>               View "pg_catalog.pg_hba_rules"
> >>       Column      |  Type   | Collation | Nullable | Default
> >> ------------------+---------+-----------+----------+---------
> >>  line_number      | integer |           |          |
> >>  type             | text    |           |          |
> >>  keyword_database | text[]  |           |          |
> >>  database         | text[]  |           |          |
> >>  keyword_user     | text[]  |           |          |
> >>  user_name        | text[]  |           |          |
> >>  keyword_address  | text    |           |          |
> >>  address          | inet    |           |          |
> >>  netmask          | inet    |           |          |
> >>  hostname         | text    |           |          |
> >>  method           | text    |           |          |
> >>  options          | text[]  |           |          |
> >>  error            | text    |           |          |
> >> keyword_database and database map actually to the same thing if you
> >> refer to a raw pg_hba.conf file because they have the same meaning for
> >> user. You could simplify the view and just remove keyword_database,
> >> keyword_user and keyword_address. This would simplify your patch as
> >> well with all hte mumbo-jumbo to see if a string is a dedicated
> >> keyword or not. In its most simple shape pg_hba_rules should show to
> >> the user as an exact map of the entries of the raw file.
> >
> > I removed keyword_database and keyword_user columns, since the data
> > in those columns can easily be represented with the database and user
> > columns. But the address field can contain keywords such as samehost
> > etc. as well as a hostname. Because of this, the field is converted
> > into 3 columns in the view.
>
> Hm. We could as well consider cidr and use just one column. But still,
> the use of inet as a data type in a system view looks like a wrong
> choice to me. Or we could actually just use text... Opinions from
> others are welcome here of course.
>

Changed to text datatype and merged address, keyword_address and hostname
into the address column. netmask is kept as a separate column in the view.
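For illustration, a query against the reworked view would then look
something like this (column names follow the \d output and the
documentation example quoted in this mail; the exact list depends on the
final version of the patch):

SELECT line_number, type, database, user_name, address, netmask,
       auth_method, error
  FROM pg_hba_rules;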


> > Updated patch attached.
>
> This begins to look good. I have found a couple of minor issues.
>
> +  
> +   The pg_hba_rules view can be read only by
> +   superusers.
> +  
> This is not true anymore.
>

removed.

+ 
> +  Line number within client authentication configuration file
> +  the current value was set at
> + 
> I'd tune that without a past sentence. Saying just pg_hba.conf would
> be fine perhaps?
>

changed to - "Line number of the client authentication rule in pg_hba.conf
file"


> +
> + database
> + text[]
> + List of database name
> +
> This should be plural, database nameS.
>

corrected.

+ 
> +  List of keyword address names,
> +  name can be all, samehost and samenet
> + 
> Phrasing looks weird to me, what about "List of keyword address names,
> whose values can be all, samehost or samenet", with  markups.
>

corrected.

+postgres=# select line_number, type, database, user_name, auth_method
> from pg_hba_rules;
> Nit: this could be upper-cased.
>

corrected.

+static Datum
> +getauthmethod(UserAuth auth_method)
> +{
> + ...
> +   default:
> +   elog(ERROR, "unexpected authentication method in parsed HBA
> entry");
> +   break;
> +   }
> I think that you should also remove the default clause here to catchup
> errors at compilation.
>

removed.


> +   switch (hba->conntype)
> +   {
> +   case ctLocal:
> +   values[index] = CStringGetTextDatum("local");
> +   break;
> +   case ctHost:
> +   values[index] = CStringGetTextDatum("host");
> +   break;
> +   case ctHostSSL:
> +   values[index] = CStringGetTextDatum("hostssl");
> +   break;
> +   case ctHostNoSSL:
> +   values[index] = CStringGetTextDatum("hostnossl");

Re: [HACKERS] Cache Hash Index meta page.

2017-01-17 Thread Mithun Cy
On Tue, Jan 17, 2017 at 10:07 AM, Amit Kapila  wrote:
> 1.
> (a) I think you can retain the previous comment or modify it slightly.
> Just removing the whole comment and replacing it with a single line
> seems like a step backward.

-- Fixed. Just modified the previous comment slightly.

> (b) Another somewhat bigger problem is that with this new change it
> won't retain the pin on meta page till the end which means we might
> need to perform an I/O again during operation to fetch the meta page.
> AFAICS, you have just changed it so that you can call new API
> _hash_getcachedmetap, if that's true, then I think you have to find
> some other way of doing it.  BTW, why can't you design your new API
> such that it always take pinned metapage?  You can always release the
> pin in the caller if required.  I understand that you don't always
> need a metapage in that API, but I think the current usage of that API
> is also not that good.


-- Yes, what you say is right. I wanted to make the _hash_getcachedmetap
API self-sufficient. But two consecutive reads of the metapage are
connected, as we keep the page pinned in a buffer to avoid extra I/O. I
have now redesigned the API so that the caller passes a pinned metapage
when the metapage cache needs to be set; this gives us the flexibility
to use the cached metapage data in different places.


> 2.
> + if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
> + bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
> + cachedmetap = _hash_getcachedmetap(rel, true);
>
> I don't understand the meaning of above if check.  It seems like you
> will update the metapage when previous block number is not a valid
> block number which will be true at the first split.  How will you
> ensure that there is a re-split and cached metapage is not relevant.
> I think if there is && in the above condition, then we can ensure it.
>

-- Oops, that was a mistake; corrected as you stated.

> 3.
> + Given a hashkey get the target bucket page with read lock, using cached
> + metapage. The getbucketbuf_from_hashkey method below explains the same.
> +
>
> All the sentences in algorithm start with small letters, then why do
> you need an exception for this sentence.  I think you don't need to
> add an empty line. Also, I think the usage of
> getbucketbuf_from_hashkey seems out of place.  How about writing it
> as:
>
> The usage of cached metapage is explained later.
>

-- Fixed as you have asked.

>
> 4.
> + If target bucket is split before metapage data was cached then we are
> + done.
> + Else first release the bucket page and then update the metapage cache
> + with latest metapage data.
>
> I think it is better to retain original text of readme and add about
> meta page update.
>

-- Fixed. Wherever it is meaningful I have kept the original wording.
But the way we get to the right target buffer after the latest split is
slightly different from what the original code did, so there is a slight
modification to show that we use the metapage cache.

> 5.
> + Loop:
> ..
> ..
> + Loop again to reach the new target bucket.
>
> No need to write "Loop again ..", that is implicit.
>

-- Fixed as you have asked.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


cache_hash_index_meta_page_12.patch
Description: Binary data



Re: [HACKERS] Gather Merge

2017-01-17 Thread Rushabh Lathia
On Tue, Jan 17, 2017 at 6:44 PM, Amit Kapila 
wrote:

> On Tue, Jan 17, 2017 at 5:19 PM, Rushabh Lathia
>  wrote:
> >
> >
> >
> > Here is the benchmark number which I got with the latest (v6) patch:
> >
> > - max_worker_processes = DEFAULT (8)
> > - max_parallel_workers_per_gather = 4
> > - Cold cache environment is ensured. With every query execution - server
> is
> >   stopped and also OS caches were dropped.
> > - The reported values of execution time (in ms) is median of 3
> executions.
> > - power2 machine with 512GB of RAM
> > - With default postgresql.conf
> >
>
> You haven't mentioned scale factor used in these tests.
>

Oops sorry. Those results are for scale factor 10.


>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>



-- 
Rushabh Lathia


Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2017-01-17 Thread Michael Paquier
On Tue, Dec 20, 2016 at 10:47 AM, Michael Paquier
 wrote:
> And Heikki has mentioned me that he'd prefer not having an extra
> dependency for the normalization, which is LGPL-licensed by the way.
> So I have looked at the SASLprep business to see what should be done
> to get a complete implementation in core, completely independent of
> anything known.
>
> The first thing is to be able to understand in the SCRAM code if a
> string is UTF-8 or not, and this code is in src/common/. pg_wchar.c
> offers a set of routines exactly for this purpose, which is built with
> libpq but that's not available for src/common/. So instead of moving
> all the file, I'd like to create a new file in src/common/utf8.c which
> includes pg_utf_mblen() and pg_utf8_islegal(). On top of that I think
> that having a routine able to check a full string would be useful for
> many users, as pg_utf8_islegal() can only check one set of characters.
> If the password string is found to be of UTF-8 format, SASLprepare is
> applied. If not, the string is copied as-is with perhaps unexpected
> effects for the client. But the client is in trouble already if it is not
> using UTF-8.
>
> Then comes the real business... Note that's my first time touching
> encoding, particularly UTF-8 in depth, so please be nice. I may write
> things that are incorrect or sound so from here :)
>
> The second thing is the normalization itself. Per RFC4013, NFKC needs
> to be applied to the string.  The operation is described in [1]
> completely, and it is named as doing 1) a compatibility decomposition
> of the bytes of the string, followed by 2) a canonical composition.
>
> About 1). The compatibility decomposition is defined in [2], "by
> recursively applying the canonical and compatibility mappings, then
> applying the canonical reordering algorithm". Canonical and
> compatibility mapping are some data available in UnicodeData.txt, the
> 6th column of the set defined in [3] to be precise. The meaning of the
> decomposition mappings is defined in [2] as well. The canonical
> decomposition is basically to look for a given UTF-8 character, and
> then apply the multiple characters resulting in its new shape. The
> compatibility mapping should as well be applied, but [5], a perl tool
> called charlint.pl doing this normalization work, does not care about
> this phase... Do we?
>
> About 2)... Once the decomposition has been applied, those bytes need
> to be recomposed using the Canonical_Combining_Class field of
> UnicodeData.txt in [3], which is the 3rd column of the set. Its values
> are defined in [4]. Another interesting thing: charlint.pl [5] does
> not care about this phase. I am wondering if we should as well not
> just drop this part as well...
>
> Once 1) and 2) are done, NKFC is complete, and so is SASLPrepare.
>
> So what we need from Postgres side is a mapping table to, having the
> following fields:
> 1) Hexa sequence of UTF8 character.
> 2) Its canonical combining class.
> 3) The kind of decomposition mapping if defined.
> 4) The decomposition mapping, in hexadecimal format.
> Based on what I looked at, either perl or python could be used to
> process UnicodeData.txt and to generate a header file that would be
> included in the tree. There are 30k entries in UnicodeData.txt, 5k of
> them have a mapping, so that will result in many tables. One thing to
> improve performance would be to store the length of the table in a
> static variable, order the entries by their hexadecimal keys and do a
> dichotomy lookup to find an entry. We could as well use more fancy
> things like a set of tables using a Radix tree using decomposed by
> bytes. We should finish by just doing one lookup of the table for each
> character sets anyway.
>
> In conclusion, at this point I am looking for feedback regarding the
> following items:
> 1) Where to put the UTF8 check routines and what to move.
> 2) How to generate the mapping table using UnicodeData.txt. I'd think
> that using perl would be better.
> 3) The shape of the mapping table, which depends on how many
> operations we want to support in the normalization of the strings.
> The decisions for those items will drive the implementation in one
> sense or another.
>
> [1]: http://www.unicode.org/reports/tr15/#Description_Norm
> [2]: 
> http://www.unicode.org/Public/5.1.0/ucd/UCD.html#Character_Decomposition_Mappings
> [3]: http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt
> [4]: 
> http://www.unicode.org/Public/5.1.0/ucd/UCD.html#Canonical_Combining_Class_Values
> [5]: https://www.w3.org/International/charlint/
>
> Heikki, others, thoughts?

FWIW, this patch is in a "waiting on author" state and that's right.
As the discussion on SASLprepare() and the decisions regarding the way
to implement it, or at least have it, are still pending, I am not
planning to move on with any implementation until we have a plan about
what to do. Just using libidn (LGPL) for a first shot is rather
painless 

Re: [WIP] RE: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG

2017-01-17 Thread Michael Paquier
On Fri, Jan 6, 2017 at 6:10 PM, Ideriha, Takeshi
 wrote:
> Hi
> Thank you for looking over my patch.
>
>> Thank you. Would it be possible for you to re-create the patch without the
>> white-space changes?
>
> I'm sorry for adding redundant white-spaces.
> Attached is a correct version.
>
>> I noticed that the docs say the statement is a PostgreSQL addon.
>> However, I think other databases do have the same statement, or don't they?
>
> Yes, other databases such as Oracle and DB2 also have the same statement.
> I fixed docs and mentioned the above databases in the compatibility section.
>
> But I'm not sure whether I should mention the other databases explicitly
> because the other command docs don't mention Oracle or so explicitly in 
> compatibility section
> as long as I read.

Ideriha-san, this is a very intrusive patch... It really needs a
specific reviewer to begin with, but it would also be nice if you
could review someone else's patch, with a difficulty roughly equivalent
to what we have here.

By the way, I have been able to crash your patch when running the
regression tests:
(lldb) bt
* thread #1: tid = 0x, 0x7fff89a828b0
libsystem_platform.dylib`_platform_strcmp + 176, stop reason = signal
SIGSTOP
  * frame #0: 0x7fff89a828b0 libsystem_platform.dylib`_platform_strcmp + 176
frame #1: 0x00010c835bc3
libecpg.6.dylib`ecpg_release_declared_statement(connection_name="con3")
+ 83 at prepare.c:740
frame #2: 0x00010c838103
libecpg.6.dylib`ECPGdisconnect(lineno=81, connection_name="ALL") + 179
at connect.c:697
frame #3: 0x00010c811922 declare`main(argc=1,
argv=0x7fff533ee320) + 434 at declare.pgc:81
frame #4: 0x7fff932345ad libdyld.dylib`start + 1

You also need to add in src/interfaces/ecpg/test/sql/.gitignore new
entries related to the files you are adding and that get generated.
-- 
Michael




Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2017-01-17 Thread Ashutosh Bapat
On Wed, Jan 18, 2017 at 10:55 AM, Michael Paquier
 wrote:
> On Sun, Jan 8, 2017 at 10:55 AM, Tom Lane  wrote:
>> Ashutosh Bapat  writes:
>>> On Tue, Jan 3, 2017 at 5:57 PM, Rahila Syed  wrote:
 Are you suggesting extending the patch to include coercing from unknown to
 text for all possible cases where a column of unknown type is being 
 created?
>>
>>> I guess, that's what Tom is suggesting.
>>
>> Yes; I think the point is we should change the semantics we assume for
>> "SELECT 'foo'", not only for views but across the board.  There are plenty
>> of non-view-related cases where it doesn't behave well, for example
>>
>> regression=# select * from
>>   (select 'foo' from generate_series(1,3)) ss order by 1;
>> ERROR:  failed to find conversion function from unknown to text
>>
>> Furthermore, it seems like a seriously bad idea for "SELECT 'foo'" to mean
>> something different in the context of CREATE VIEW than it means elsewhere.
>
> And this offers the same semantics for any DDL using SELECT's target
> list to build the list of column's types, which is consistent.
>
>> The trick is to not break cases where we've already hacked things to allow
>> external influence on the resolved types of SELECT-list items.  AFAICT,
>> these cases are just INSERT/SELECT and set operations (eg unions):
>>
>> regression=# create table target (f1 int);
>> CREATE TABLE
>> regression=# explain verbose insert into target select '42' from generate_series(1,3);
>>                                        QUERY PLAN
>> -----------------------------------------------------------------------------------------
>>  Insert on public.target  (cost=0.00..10.00 rows=1000 width=4)
>>    ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=4)
>>          Output: 42
>>          Function Call: generate_series(1, 3)
>> (4 rows)
>>
>> regression=# explain verbose select '42' union all select 43;
>>                    QUERY PLAN
>> ------------------------------------------------
>>  Append  (cost=0.00..0.04 rows=2 width=4)
>>    ->  Result  (cost=0.00..0.01 rows=1 width=4)
>>          Output: 42
>>    ->  Result  (cost=0.00..0.01 rows=1 width=4)
>>          Output: 43
>> (5 rows)
>>
>> In both of the above cases, we're getting an integer constant not a
>> text or "unknown" constant, because the type information gets imported
>> from outside the "SELECT".
>>
>> I spent some time fooling with this today and came up with the attached
>> patch.  I think this is basically the direction we should go in, but
>> there are various loose ends:
>>
>> 1. I adjusted a couple of existing regression tests whose results are
>> affected, but didn't do anything towards adding new tests showing the
>> desirable results of this change (as per the examples in this thread).
>>
>> 2. I didn't do anything about docs, either, though maybe no change
>> is needed.  One user-visible change from this is that queries should
>> never return any "unknown"-type columns to clients anymore.  But I
>> think that is not documented now, so maybe there's nothing to change.
>>
>> 3. We need to look at whether pg_upgrade is affected.  I think we
>> might be able to let it just ignore the issue for views, since they'd
>> get properly updated during the dump and reload anyway.  But if someone
>> had an actual table (or matview) with an "unknown" column, that would
>> be a pg_upgrade hazard, because "unknown" doesn't store like "text".
>>
>> 4. If "unknown" were marked as a pseudotype not base type in pg_type
>> (ie, typtype "p" not "b") then the special case for it in
>> CheckAttributeType could go away completely.  I'm not really sure
>> why it's "b" today --- maybe specifically because of this point that
>> it's currently possible to create tables with unknown columns?  But
>> I've not looked at what all the implications of changing that might
>> be, and anyway it could be left for a followon patch.
>>
>> 5. I noticed that the "resolveUnknown" arguments of transformSortClause
>> and other functions in parse_clause.c could go away: there's really no
>> reason not to just treat them as "true" always.  But that could be
>> separate cleanup as well.
>>
>> Anybody want to hack on those loose ends?
>
> Ashutosh, Rahila, do you have plans to review things here?

I won't be able to work on creating patches for TODOs listed by Tom.
But I can review if someone volunteers to produce the patches.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2017-01-17 Thread Michael Paquier
On Wed, Jan 18, 2017 at 2:23 PM, Noah Misch  wrote:
> The latest versions document this precisely, but I agree with Peter's concern
> about plain "scram".  Suppose it's 2025 and PostgreSQL support SASL mechanisms
> OAUTHBEARER, SCRAM-SHA-256, SCRAM-SHA-256-PLUS, and SCRAM-SHA3-512.  What
> should the pg_hba.conf options look like at that time?  I don't think having a
> single "scram" option fits in such a world.

Sure.

> I see two strategies that fit:
>
> 1. Single "sasl" option, with a GUC, similar to ssl_ciphers, controlling the
>mechanisms to offer.
> 2. Separate options "scram_sha_256", "scram_sha3_512", "oauthbearer", etc.

Or we could have a sasl option, with a mandatory array of mechanisms
to define one or more items, so method entries in pg_hba.conf would
look like this:
sasl mechanism=scram_sha_256,scram_sha3_512

Users could define different methods in each hba line matching a given
user and database. I am not sure if many people would care about that,
though.
-- 
Michael




Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2017-01-17 Thread Michael Paquier
On Sun, Jan 8, 2017 at 10:55 AM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> On Tue, Jan 3, 2017 at 5:57 PM, Rahila Syed  wrote:
>>> Are you suggesting extending the patch to include coercing from unknown to
>>> text for all possible cases where a column of unknown type is being created?
>
>> I guess, that's what Tom is suggesting.
>
> Yes; I think the point is we should change the semantics we assume for
> "SELECT 'foo'", not only for views but across the board.  There are plenty
> of non-view-related cases where it doesn't behave well, for example
>
> regression=# select * from
>   (select 'foo' from generate_series(1,3)) ss order by 1;
> ERROR:  failed to find conversion function from unknown to text
>
> Furthermore, it seems like a seriously bad idea for "SELECT 'foo'" to mean
> something different in the context of CREATE VIEW than it means elsewhere.

And this offers the same semantics for any DDL using SELECT's target
list to build the list of column's types, which is consistent.
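For instance, today a bare literal in CREATE TABLE ... AS yields an
"unknown" column unless it is cast explicitly, which is the usual
workaround; under the semantics discussed here the bare literal should
resolve to text on its own (hand-written example, any table name works):

create table t as select 'foo'::text as col from generate_series(1,3);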

> The trick is to not break cases where we've already hacked things to allow
> external influence on the resolved types of SELECT-list items.  AFAICT,
> these cases are just INSERT/SELECT and set operations (eg unions):
>
> regression=# create table target (f1 int);
> CREATE TABLE
> regression=# explain verbose insert into target select '42' from generate_series(1,3);
>                                        QUERY PLAN
> -----------------------------------------------------------------------------------------
>  Insert on public.target  (cost=0.00..10.00 rows=1000 width=4)
>    ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=4)
>          Output: 42
>          Function Call: generate_series(1, 3)
> (4 rows)
>
> regression=# explain verbose select '42' union all select 43;
>                    QUERY PLAN
> ------------------------------------------------
>  Append  (cost=0.00..0.04 rows=2 width=4)
>    ->  Result  (cost=0.00..0.01 rows=1 width=4)
>          Output: 42
>    ->  Result  (cost=0.00..0.01 rows=1 width=4)
>          Output: 43
> (5 rows)
>
> In both of the above cases, we're getting an integer constant not a
> text or "unknown" constant, because the type information gets imported
> from outside the "SELECT".
>
> I spent some time fooling with this today and came up with the attached
> patch.  I think this is basically the direction we should go in, but
> there are various loose ends:
>
> 1. I adjusted a couple of existing regression tests whose results are
> affected, but didn't do anything towards adding new tests showing the
> desirable results of this change (as per the examples in this thread).
>
> 2. I didn't do anything about docs, either, though maybe no change
> is needed.  One user-visible change from this is that queries should
> never return any "unknown"-type columns to clients anymore.  But I
> think that is not documented now, so maybe there's nothing to change.
>
> 3. We need to look at whether pg_upgrade is affected.  I think we
> might be able to let it just ignore the issue for views, since they'd
> get properly updated during the dump and reload anyway.  But if someone
> had an actual table (or matview) with an "unknown" column, that would
> be a pg_upgrade hazard, because "unknown" doesn't store like "text".
>
> 4. If "unknown" were marked as a pseudotype not base type in pg_type
> (ie, typtype "p" not "b") then the special case for it in
> CheckAttributeType could go away completely.  I'm not really sure
> why it's "b" today --- maybe specifically because of this point that
> it's currently possible to create tables with unknown columns?  But
> I've not looked at what all the implications of changing that might
> be, and anyway it could be left for a followon patch.
>
> 5. I noticed that the "resolveUnknown" arguments of transformSortClause
> and other functions in parse_clause.c could go away: there's really no
> reason not to just treat them as "true" always.  But that could be
> separate cleanup as well.
>
> Anybody want to hack on those loose ends?

Ashutosh, Rahila, do you have plans to review things here?
-- 
Michael




Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2017-01-17 Thread Noah Misch
On Tue, Nov 15, 2016 at 07:52:06AM +0900, Michael Paquier wrote:
> On Sat, Nov 5, 2016 at 9:36 PM, Michael Paquier  
> wrote:
> > On Sat, Nov 5, 2016 at 12:58 AM, Peter Eisentraut 
> >  wrote:
> > pg_hba.conf uses "scram" as keyword, but scram refers to a family of
> > authentication methods. There is as well SCRAM-SHA-1, SCRAM-SHA-256
> > (what this patch does). Hence wouldn't it make sense to use
> > scram_sha256 in pg_hba.conf instead? If for example in the future
> > there is a SHA-512 version of SCRAM we could switch easily to that and
> > define scram_sha512.
> 
> OK, I have added more docs regarding the use of scram in pg_hba.conf,
> particularly in client-auth.sgml to describe what scram is better than
> md5 in terms of protection, and also completed the data of pg_hba.conf
> about the new keyword used in it.

The latest versions document this precisely, but I agree with Peter's concern
about plain "scram".  Suppose it's 2025 and PostgreSQL support SASL mechanisms
OAUTHBEARER, SCRAM-SHA-256, SCRAM-SHA-256-PLUS, and SCRAM-SHA3-512.  What
should the pg_hba.conf options look like at that time?  I don't think having a
single "scram" option fits in such a world.  I see two strategies that fit:

1. Single "sasl" option, with a GUC, similar to ssl_ciphers, controlling the
   mechanisms to offer.
2. Separate options "scram_sha_256", "scram_sha3_512", "oauthbearer", etc.




Re: [HACKERS] Protect syscache from bloating with negative cache entries

2017-01-17 Thread Michael Paquier
On Sat, Jan 14, 2017 at 9:36 AM, Tomas Vondra
 wrote:
> On 01/14/2017 12:06 AM, Andres Freund wrote:
>> On 2017-01-13 17:58:41 -0500, Tom Lane wrote:
>>>
>>> But, again, the catcache isn't the only source of per-process bloat
>>> and I'm not even sure it's the main one.  A more holistic approach
>>> might be called for.
>>
>> It'd be helpful if we'd find a way to make it easy to get statistics
>> about the size of various caches in production systems. Right now
>> that's kinda hard, resulting in us having to make a lot of
>> guesses...
>
> What about a simple C extension, that could inspect those caches? Assuming
> it could be loaded into a single backend, that should be relatively
> acceptable way (compared to loading it to all backends using
> shared_preload_libraries).

This extension could do a small amount of work on a portion of the
syscache entries at each query loop. Still, I am wondering whether it
would not be nicer to get that in core and configurable, which is
basically the approach proposed by Horiguchi-san. At least it seems to
me that it has some merit, and if we could make that behavior
switchable, disabled by default, that would be a win for some class of
applications. What do others think?
-- 
Michael




Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2017-01-17 Thread Haribabu Kommi
On Sat, Jan 14, 2017 at 2:58 AM, Dilip Kumar  wrote:

> On Wed, Jan 11, 2017 at 7:47 PM, Dilip Kumar 
> wrote:
> > +void
> > +pgstat_count_sqlstmt(const char *commandTag)
> > +{
> > + PgStat_SqlstmtEntry *htabent;
> > + bool found;
> > +
> > + if (!pgstat_track_sql)
> > + return
> >
> > Callers of pgstat_count_sqlstmt are always ensuring that
> > pgstat_track_sql is true, seems it's redundant here.
>
> I have done testing of the feature, it's mostly working as per the
> expectation.
>

Thanks for the review and test.

The use case for this view is to find out the number of different SQL
statements that are getting executed successfully on an instance by the
application/user.
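For example, after running a workload one could look at the per-tag
counters with a query along these lines (the column names here are only
illustrative; the exact shape of the view is defined by the patch):

SELECT tag, count FROM pg_stat_sql ORDER BY count DESC;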

I have few comments/questions.
>
> 
> If you execute the query with EXECUTE then commandTag will be EXECUTE
> that way we will not show the actual query type, I mean all the
> statements will get the common tag "EXECUTE".
>
> You can try this.
> PREPARE fooplan (int) AS INSERT INTO t VALUES($1);
> EXECUTE fooplan(1);
>
> --
>

Yes, that's correct. Currently the count is incremented based on the
base command; because of this, the EXECUTE counter gets incremented
instead of the one for the actual operation.



> + /* Destroy the SQL statement counter stats HashTable */
> + hash_destroy(pgstat_sql);
> +
> + /* Create SQL statement counter Stats HashTable */
>
> I think in the patch we have used mixed of "Stats HashTable" and
> "stats HashTable", I think better
> to be consistent throughout the patch. Check other similar instances.
>
> 
>

Corrected.


>
> @@ -1102,6 +1102,12 @@ exec_simple_query(const char *query_string)
>
>   PortalDrop(portal, false);
>
> + /*
> + * Count SQL statement for pg_stat_sql view
> + */
> + if (pgstat_track_sql)
> + pgstat_count_sqlstmt(commandTag);
>
> We are only counting the successful SQL statement, Is that intentional?
>

Yes, I thought of counting the successful SQL operations that changed the
database over a period of time. I am not sure whether there will be many
failed operations on an instance that are worth counting.


> --
> I have noticed that pgstat_count_sqlstmt is called from
> exec_simple_query and exec_execute_message. Don't we want to track the
> statement executed from functions?
> ---


The view is currently designed to count user/application initiated SQL
statements, but not the internal SQL queries that get executed from
functions and so on.

>>+void
>>+pgstat_count_sqlstmt(const char *commandTag)
>>+{
>>+ PgStat_SqlstmtEntry *htabent;
>>+ bool found;
>>+
>>+ if (!pgstat_track_sql)
>>+ return
>
>Callers of pgstat_count_sqlstmt are always ensuring that
>pgstat_track_sql is true, seems it's redundant here.

Removed the check in the pgstat_count_sqlstmt function and added it to
the exec_execute_message function, where the check was missing.

Updated patch attached.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_sql_row_mode_4.patch
Description: Binary data



Re: [HACKERS] Parallel bitmap heap scan

2017-01-17 Thread Dilip Kumar
On Fri, Jan 13, 2017 at 6:36 PM, Dilip Kumar  wrote:
>
> Please verify with the new patch.

Patches 0001 and 0003 required a rebase on the latest head. 0002 is
still the same.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


0001-opt-parallelcost-refactoring-v11.patch
Description: Binary data


0002-hash-support-alloc-free-v11.patch
Description: Binary data


0003-parallel-bitmap-heap-scan-v11.patch
Description: Binary data



Re: [HACKERS] PSQL commands: \quit_if, \quit_unless

2017-01-17 Thread Michael Paquier
On Wed, Jan 18, 2017 at 3:24 AM, Robert Haas  wrote:
> On Sat, Jan 14, 2017 at 12:22 AM, Tom Lane  wrote:
>>
>> $ cat loop.sql
>> \if :x < 1000
>>   \echo :x
>>   \set x :x + 1
>>   \include loop.sql
>> \fi
>> $ psql --set x=0 -f loop.sql
>>
>> Somebody is going to think of that workaround for not having loops, and
>> then whine about how psql runs out of file descriptors and/or stack.
>
> Hmm, I think somebody just DID think of it.
>
> But personally this doesn't upset me a bit.  If somebody complains
> about that particular thing, I think that would be an excellent time
> to suggest that they write a patch to add a looping construct.

Agreed.

As far as I can see on this thread, something could be done; it is
just that we don't know yet to what extent things could be done in
the first shot. There are many things that could be done, but at least
I'd suggest getting \if, \fi and \quit to satisfy the first
requirements of this thread, and leaving loops out of it. I have switched
the patch to "returned with feedback", as getting a new patch is going
to require some thought to get the context handling done correctly on
the psql side.
-- 
Michael




Re: [HACKERS] DROP FUNCTION of multiple functions

2017-01-17 Thread Michael Paquier
On Wed, Jan 18, 2017 at 5:26 AM, Peter Eisentraut
 wrote:
> On 1/10/17 1:52 AM, Michael Paquier wrote:
>> I don't see any problems with 0001.
>
> I was wondering, should we rename funcname -> name, and funcargs ->
> args, or perhaps the whole FuncWithArgs struct, so there is no confusion
> when used with operators?

FuncWithArgs implies that this is related to a function, so removing
func as prefix may make things cleaner.

>> One comment though: there are still many list_make2() or even
>> list_make3 calls for some object types. Would it make sense to replace
>> those lists with a decided number of items by a Node and simplify the
>> interface?
>
> (I don't see any list_make3.)

Indeed, I am watching too much code.

> It would be nice to refine this further,
> but the remaining uses are quite marginal.  The main problem was that
> before you had to create singleton lists and then unpack them, because
> there was no other way.  The remaining uses are more genuine lists or lcons.

OK. Of course, I am not saying that this patch in particular should
shake the world more. I have just been trying to point out potential
future improvements and keep a trace of them in the archives while
thinking about it.

>> In 0005, a nit:
>> +DROP FUNCTION functest_IS_1(int, int, text), functest_IS_2(int),
>> functest_IS_3(int);
>>  -- Cleanups
>> The DROP query could be moved below the cleanup comment.
>
> I can do that, but the idea was that the commands below the cleanups
> line weren't really tests.

That's a nit, you can ignore that.

>> While looking at 0006... DROP POLICY and DROP RULE could be unified. I
>> just noticed that while reading the code.
>
> DROP TRIGGER also looks similar.  drop_type3 then. ;-)

Or drop_type_on, drop_type_on_table, etc.
-- 
Michael




Re: [HACKERS] jsonb_delete with arrays

2017-01-17 Thread Michael Paquier
On Tue, Jan 17, 2017 at 8:45 PM, Magnus Hagander  wrote:
> On Tue, Jan 17, 2017 at 8:25 AM, Michael Paquier 
> wrote:
>> On Sun, Dec 18, 2016 at 1:27 AM, Dmitry Dolgov <9erthali...@gmail.com>
>> wrote:
>> > * use variadic arguments for `jsonb_delete_array`. For rare cases, when
>> > someone decides to use this function directly instead of corresponding
>> > operator. It will be more consistent with `jsonb_delete` from my point
>> > of
>> > view, because it's transition from `jsonb_delete(data, 'key')` to
>> > `jsonb_delete(data, 'key1', 'key2')` is more smooth, than to
>> > `jsonb_delete(data, '{key1, key2}')`.
>>
>> That's a good idea.
>
> I can see the point of that. In the particular usecase I built it for
> originally though, the list of keys came from the application, in which case
> binding them as an array was a lot more efficient (so as not to require a
> whole lot of different prepared statements, one for each number of
> parameters). But that should be workaround-able using the VARIADIC keyword
> in the caller. Or by just using the operator.

Yes that should be enough:
=# select jsonb_delete('{"a":1 , "b":2, "c":3}', 'a', 'b', 'c');
 jsonb_delete
--
 {}
(1 row)
=# select '{"a":1 , "b":2, "c":3}'::jsonb - '{a,b}'::text[];
 ?column?
--
 {"c": 3}
(1 row)
That's a nice bonus, perhaps that's not worth documenting as most
users will likely care only about the operator.
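And for the application case Magnus mentions, the array can still be
bound as a single parameter by spelling out the VARIADIC keyword at call
time, assuming the variadic signature from the patch:
=# select jsonb_delete('{"a":1 , "b":2, "c":3}', VARIADIC '{a,b}'::text[]);
 jsonb_delete
--------------
 {"c": 3}
(1 row)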

>> > I've attached a patch with these modifications. What do you think?
>>
>> Looking at both patches proposed, documentation is still missing in
>> the list of jsonb operators as '-' is missing for arrays. I am marking
>> this patch as waiting on author for now.
>
> Added in updated patch. Do you see that as enough, or do we need it in some
> more places in the docs as well?

I am not seeing other places to update, thanks.

Another victim of 352a24a... Your patch is failing to apply because
the headers of the functions are now generated automatically. And the
OIDs have been taken recently. I have fixed that to test your patch;
the result is attached. The patch is marked as ready for committer.
-- 
Michael
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 10e31868ba..af3d2aa6a8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10845,6 +10845,14 @@ table2-mapping


 -
+text[]
+Delete multiple key/value pairs or string
+elements from left operand.  Key/value pairs are matched based
+on their key value.
+'{"a": "b", "c": "d"}'::jsonb - '{a,c}'::text[] 

+   
+   
+-
 integer
 Delete the array element with specified index (Negative
 integers count from the end).  Throws an error if top level
diff --git a/src/backend/utils/adt/jsonfuncs.c 
b/src/backend/utils/adt/jsonfuncs.c
index 58c721c074..d624fdbf79 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -3438,6 +3438,92 @@ jsonb_delete(PG_FUNCTION_ARGS)
 }
 
 /*
+ * SQL function jsonb_delete (jsonb, variadic text[])
+ *
+ * return a copy of the jsonb with the indicated items
+ * removed.
+ */
+Datum
+jsonb_delete_array(PG_FUNCTION_ARGS)
+{
+   Jsonb  *in = PG_GETARG_JSONB(0);
+   ArrayType  *keys = PG_GETARG_ARRAYTYPE_P(1);
+   Datum  *keys_elems;
+   bool   *keys_nulls;
+   int keys_len;
+   JsonbParseState *state = NULL;
+   JsonbIterator *it;
+   JsonbValue  v,
+  *res = NULL;
+   boolskipNested = false;
+   JsonbIteratorToken r;
+
+   if (ARR_NDIM(keys) > 1)
+   ereport(ERROR,
+   (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+errmsg("wrong number of array subscripts")));
+
+   if (JB_ROOT_IS_SCALAR(in))
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("cannot delete from scalar")));
+
+   if (JB_ROOT_COUNT(in) == 0)
+   PG_RETURN_JSONB(in);
+
+   deconstruct_array(keys, TEXTOID, -1, false, 'i',
+ &keys_elems, &keys_nulls, &keys_len);
+
+   if (keys_len == 0)
+   PG_RETURN_JSONB(in);
+
+   it = JsonbIteratorInit(&in->root);
+
+   while ((r = JsonbIteratorNext(&it, &v, skipNested)) != 0)
+   {
+   skipNested = true;
+
+   if ((r == WJB_ELEM || r == WJB_KEY) && v.type == jbvString)
+   {
+   int i;
+   boolfound = false;
+
+   for (i = 0; i < keys_len; i++)
+   {
+   char   *keyptr;
+   int keylen;
+
+   if (keys_nulls[i])
+

Re: [HACKERS] PoC: Grouped base relation

2017-01-17 Thread Ashutosh Bapat
On Tue, Jan 17, 2017 at 10:07 PM, Antonin Houska  wrote:
> Ashutosh Bapat  wrote:
>> [... snip ]]
>>
>> This all works well, as long as the aggregate is "summing" something
>> across rows. The method doesn't work when aggregation is say
>> "multiplying" across the rows or "concatenating" across the rows like
>> array_agg() or string_agg(). They need a different strategy to combine
>> aggregates across relations.
>
> Good point. The common characteristic of these seems to be that they don't
> have aggcombinefn defined.

I don't think aggcombinefn is missing because we couldn't write it
for array_agg() or string_agg(); I guess it just wouldn't be efficient
to have those aggregates combined across parallel workers.

Also, the point is that naming that kind of function aggtransmultifn
would imply that it is always supposed to multiply, which isn't true for
all aggregates.
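To illustrate the "summing" case that does work, the transformation
under discussion corresponds to this kind of hand-written rewrite (table
and column names are made up; the patch performs this inside the planner
rather than by rewriting SQL):

-- original
SELECT b.j, sum(a.x)
  FROM a JOIN b ON a.j = b.j
 GROUP BY b.j;

-- aggregation pushed down to the base relation "a"
SELECT b.j, sum(a_g.s)
  FROM (SELECT j, sum(x) AS s FROM a GROUP BY j) a_g
       JOIN b ON a_g.j = b.j
 GROUP BY b.j;

Each partial sum s is added once per matching row of b, which matches
what the original query computes; there is no equally simple way to
combine array_agg() or string_agg() results after the join fan-out.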

>
>> IIUC, we are trying to solve multiple problems here:
>
>> 1. Pushing down aggregates/groups down join tree, so that the number of rows
>> to be joined decreases.  This might be a good optimization to have. However
>> there are problems in the current patch. Every path built for a relation
>> (join or base) returns the same result expressed by the relation or its
>> subset restricted by parameterization or unification. But this patch changes
>> that. It creates paths which represent grouping in the base relation.  I
>> think, we need a separate relation to represent that result and hold paths
>> which produce that result. That itself would be a sizable patch.
>
> Whether a separate relation (RelOptInfo) should be created for grouped
> relation is an important design decision indeed. More important than your
> argument about the same result ("partial path", used to implement parallel
> nodes actually does not fit this criterion perfectly - it only returns part of
> the set) is the fact that the data type (target) differs.

Right!

>
> I even spent some time coding a prototype where separate RelOptInfo is created
> for the grouped relation but it was much more invasive. In particular, if only
> some relations are grouped, it's hard to join them with non-grouped ones w/o
> changing make_rel_from_joinlist and subroutines substantially. (Decision
> whether the plain or the grouped relation should be involved in joining makes
> little sense at the leaf level of the join tree.)
>
> So I took the approach that resembles the partial paths - separate pathlists
> within the same RelOptInfo.


Yes, it's hard, but I think the design won't be complete without having
a separate RelOptInfo. Is there a subset of the problem that can be
solved by using a separate RelOptInfo, e.g. pushing aggregates down to
child relations, or anything else?

>
>> 2. Try to push down aggregates based on the equivalence classes, where
>> grouping properties can be transferred from one relation to the other using
>> EC mechanism.
>
> I don't think the EC part should increase the patch complexity a lot. Unless I
> missed something, it's rather isolated to the part where target of the grouped
> paths is assembled. And I think it's important even for initial version of the
> patch.
>
>> This seems to require solving the problem of combining aggregates across the
>> relations. But there might be some usecases which could benefit without
>> solving this problem.
>
> If "combining aggregates ..." refers to joining grouped relations, then I
> insist on doing this in the initial version of the new feature too. Otherwise
> it'd only work if exactly one base relation of the query is grouped.

No. "combining aggregates" refers to what aggtransmultifn does. But,
possibly that problem needs to be solved in the first step itself.

>
>> 3. If the relation to which we push the aggregate is an append relation,
>> push (partial) aggregation/grouping down into the child relations. - We
>> don't do that right now even for grouping aggregation on a single append
>> table. Parallel partial aggregation does that, but not exactly per
>> relation. That may be a sizable project in itself.  Even without this piece
>> the rest of the optimizations proposed by this patch are important.
>
> Yes, this can be done in a separate patch. I'll consider it.
>
>> 4. Additional goal: push down the aggregation to any relation (join/base)
>> where it can be computed.
>
> I think this can be achieved by adding extra aggregation nodes to the join
> tree. As I still anticipate more important design changes, this part is not at
> the top of my TODO list.

I guess, attempting this will reveal some more design changes
required; it may actually simplify the design.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-17 Thread Michael Paquier
On Wed, Jan 18, 2017 at 11:36 AM, Karl O. Pinc  wrote:
> On Wed, 18 Jan 2017 11:08:25 +0900
> Michael Paquier  wrote:
>
>> Our ideas rather map here, ERRCODE_INTERNAL_ERROR would be adapted for
>> this situation. Do any of you want to give it a shot or should I?
>
> You're welcome to it.

What do you think about that?
+   log_filepath = strchr(lbuffer, ' ');
+   if (log_filepath == NULL)
+   {
+   /*
+* Corrupted data found, return NULL to the caller and still
+* inform him on the situation.
+*/
+   ereport(WARNING,
+   (ERRCODE_INTERNAL_ERROR,
+errmsg("corrupted data found in \"%s\"",
+   LOG_METAINFO_DATAFILE)));
+   break;
+   }

Recent commits 352a24a1 (not my fault) and e7b020f7 (my fault) are
creating conflicts, so a rebase was welcome as well.
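For reference, per the documentation hunk attached below, the function
added by the patch is meant to be used like this (the csvlog form only
returns a non-NULL result when that destination is configured):

SELECT pg_current_logfile();
SELECT pg_current_logfile('csvlog');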
-- 
Michael
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 07afa3c77a..3188496bc2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4248,6 +4248,25 @@ SELECT * FROM parent WHERE key = 2400;
  must be enabled to generate
 CSV-format log output.

+   
+When either stderr or
+csvlog are included, the file
+current_logfiles gets created and records the location
+of the log file(s) currently in use by the logging collector and the
+associated logging destination. This provides a convenient way to
+find the logs currently in use by the instance. Here is an example of
+contents of this file:
+
+stderr pg_log/postgresql.log
+csvlog pg_log/postgresql.csv
+
+Note that this file gets updated when a new log file is created
+as an effect of rotation or when log_destination is
+reloaded.  current_logfiles is removed when
+stderr and csvlog
+are not included in log_destination or when the
+logging collector is disabled.
+   
 

 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 10e31868ba..c756fbe066 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15441,6 +15441,13 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS 
t(ls,n);
   
 
   
+   
pg_current_logfile(text)
+   text
+   Primary log file name, or log in the requested format,
+   currently in use by the logging collector
+  
+
+  

pg_my_temp_schema()
oid
OID of session's temporary schema, or 0 if none
@@ -15659,6 +15666,34 @@ SET search_path TO schema , 
schema, ..

 

+pg_current_logfile
+   
+
+   
+Logging
+pg_current_logfile function
+   
+
+   
+pg_current_logfile returns, as text,
+the path of the log file(s) currently in use by the logging collector.
+The path includes the  directory
+and the log file name.  Log collection must be enabled or the return value
+is NULL.  When multiple log files exist, each in a
+different format, pg_current_logfile called
+without arguments returns the path of the file having the first format
+found in the ordered list: stderr, csvlog.
+NULL is returned when no log file has any of these
+formats.  To request a specific file format supply, as text,
+either csvlog or stderr as the value of the
+optional parameter. The return value is NULL when the
+log format requested is not a configured
+.
+pg_current_logfiles reflects the contents of the
+file current_logfiles.
+   
+
+   
 pg_my_temp_schema

 
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 5c52824dfc..388ed344ee 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -61,6 +61,12 @@ Item
 
 
 
+ current_logfiles
+ File recording the log file(s) currently written to by the logging
+  collector
+
+
+
  global
  Subdirectory containing cluster-wide tables, such as
  pg_database
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 31aade102b..1d7f68b8a4 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1055,6 +1055,7 @@ REVOKE EXECUTE ON FUNCTION pg_xlog_replay_pause() FROM 
public;
 REVOKE EXECUTE ON FUNCTION pg_xlog_replay_resume() FROM public;
 REVOKE EXECUTE ON FUNCTION pg_rotate_logfile() FROM public;
 REVOKE EXECUTE ON FUNCTION pg_reload_conf() FROM public;
+REVOKE EXECUTE ON FUNCTION pg_current_logfile() FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset() FROM public;
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_shared(text) FROM public;
diff --git a/src/backend/postmaster/syslogger.c 
b/src/backend/postmaster/syslogger.c
index 13a03014eb..e6899c6246 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -146,6 +146,7 @@ static char *logfile_getname(pg_time_t timestamp, const 
char *suffix);
 static 

Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-17 Thread Peter Eisentraut
On 1/17/17 8:35 PM, Michael Paquier wrote:
> Yeah, it seems to me that we are likely looking for a wait mode saying
> to exit pg_ctl once Postgres is happily rejecting connections, because
> that means that it is up and that it is sorting out something first
> before accepting them. This would basically filter the states in the
> control file that we find as acceptable if the connection test
> continues complaining about PQPING_REJECT.

Note that you can still use pg_ctl start --no-wait if that is what you
want.  But that should be the exception.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-01-17 Thread Michael Paquier
On Tue, Jan 17, 2017 at 7:36 PM, Kyotaro HORIGUCHI
 wrote:
> I managed to reproduce this. A little tweak as the first patch
> lets the standby to suicide as soon as walreceiver sees a
> contrecord at the beginning of a segment.

Good idea.

> I believe that a continuation record cannot span three or
> more *segments* (is it right?), so keeping one spare segment
> would be enough. The attached second patch does this.

I have to admit that I did not think about this problem much yet (I
bookmarked this report weeks ago to be honest as something to look
at), but that does not look right to me. Couldn't a record be spread
across even more segments? Take a random string longer than 64MB or
even longer, for example.
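
As a side note for anyone reproducing this: a plain oversized INSERT will
not produce such a record, because large values are TOASTed into many small
records.  On 9.6 or later, one way to force a single WAL record wider than
several 16MB segments is the logical message facility (just a test sketch,
not part of the patch):

-- emits one WAL record of roughly 64MB, whose continuation
-- crosses multiple segment boundaries
SELECT pg_logical_emit_message(false, 'contrecord-test',
                               repeat('x', 64 * 1024 * 1024));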

> Other possible measures might be,
>
> - Allowing switching wal source while reading a continuation
>   record. Currently ReadRecord assumes that a continuation record
>   can be read from single source. But this needs refactoring
>   involving xlog.c, xlogreader.c and relatives.

This is scary thinking about back-branches.

> - Delaying recycling a segment until the last partial record on it
>   completes. This seems doable in page-wise (coarse resolution)
>   but would cost additional reading of past xlog files (page
>   header of past pages is required).

Hm, yes. That looks like the least invasive way to go. At least that
looks more correct than the others.

> - Delaying write/flush feedback until the current record is
>   completed. walreceiver is not conscious of a WAL record and
>   this might break synchronous replication.

Not sure about this one yet.
-- 
Michael




Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-01-17 Thread Peter Eisentraut
On 12/29/16 10:55 PM, Haribabu Kommi wrote:
> Fujitsu was interested in developing a columnar storage extension with
> minimal changes to the server backend.
> 
> The columnar store is implemented as an extension using index access
> methods.
> This can be easily enhanced with pluggable storage methods once they are
> available.
> 
> A new index method (VCI) is added to create columnar index on the table.

I'm confused.  You say that you are adding an index access method, for
which we have a defined extension mechanism, but the code doesn't do
that.  Instead, it sprinkles a bunch of hooks through the table access
code.  So you are really adding ways to add alternatives to heap
storage, except we have no way to know whether these hooks have been
designed with any kind of generality in mind.  So is it an index access
method or a table access method?

Either way, you shouldn't need a new relkind.  Note that all indexes
have the same relkind, even if they use different access methods.

I think there are two ways to integrate column storage into PostgreSQL:
One is to use the FDW interface.  That has been done before, see
cstore_fdw.  The other is to define a storage manager extension
interface.  That has been tried but has not been completed yet.  Adding
a bunch of custom hooks all over the place seems worse than both of those.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-01-17 Thread Peter Eisentraut
On 1/13/17 9:22 AM, Peter Moser wrote:
> The goal of temporal aligners and normalizers is to split ranges to allow a
> reduction from temporal queries to their non-temporal counterparts.
> Splitting
> ranges is necessary for temporal query processing. Temporal aligners and
> normalizers may then be used as building blocks for any temporal query
> construct.

I would need to see the exact definitions of these constructs.  Please
send some documentation.

> We have published two papers, that contain formal definitions and
> related work
> for the temporal aligner and normalizer. Please see [1] and [2].

I don't have access to those.

>> I think there are probably many interesting applications for normalizing
>> or otherwise adjusting ranges.  I'd like to see an overview and
>> consideration of other applications.
> 
> Please see the attached file adjustment.sql for some interesting
> applications.

That's surely interesting, but without knowing what these operations are
supposed to do, I can only reverse engineer and guess.

>> Ideally, I'd like to see these things implemented as some kind of
>> user-space construct, like an operator or function.  I think we'd need a
>> clearer definition of what it is they do before we can evaluate that.
> 
> Can you please explain what you mean by "user-space construct" in this case.

Implement them using the extensibility features, such as a user-defined
operator.  I don't know if it's possible, but it's something to consider.

Using common terms such as ALIGN and NORMALIZE for such a specific
functionality seems a bit wrong.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-17 Thread Etsuro Fujita

On 2017/01/16 11:38, Etsuro Fujita wrote:

On 2017/01/14 6:39, Jeff Janes wrote:

I do get a compiler warning:

foreign.c: In function 'CreateLocalJoinPath':
foreign.c:832: warning: implicit declaration of function
'pathkeys_contained_in'



Will fix.


Done.  Attached is the new version.  I also adjusted the code a bit further.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1519,1540  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1519,1534 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Nested Loop
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Join Filter: (t1.c1 = t2.c1)
!  ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Foreign Scan on public.ft2 t2
 Output: t2.c1, t2.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***
*** 1563,1584  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, 

Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-17 Thread Karl O. Pinc
On Wed, 18 Jan 2017 11:08:25 +0900
Michael Paquier  wrote:

> Our ideas rather map here, ERRCODE_INTERNAL_ERROR would be adapted for
> this situation. Do any of you want to give it a shot or should I?

You're welcome to it.


Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-17 Thread Michael Paquier
On Wed, Jan 18, 2017 at 7:36 AM, Karl O. Pinc  wrote:
> Maybe.  It's not user-supplied data that's corrupted but it is
> PG generated data which is generated for and supplied to the user.
> I just looked at all uses of XX001 and it is true that they all
> involve corruption of user-supplied data.
>
> If you don't want to use XX001 use XX000.  It does not seem worth
> making a new error code for just this one case.

Our ideas rather map here, ERRCODE_INTERNAL_ERROR would be adapted for
this situation. Do any of you want to give it a shot or should I?
-- 
Michael




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-17 Thread Michael Paquier
On Wed, Jan 18, 2017 at 3:06 AM, Gilles Darold  wrote:
> This has already been discussed previously in this thread; one of my
> previous patch versions implemented this behavior, but we decided that
> what we really want is to be able to use the function with the
> following simple query:
>
> SELECT pg_read_file(pg_current_logfiles());
>
> and not something like
>
> SELECT pg_read_file(SELECT file FROM pg_current_logfiles() LIMIT 1);
> or
> SELECT pg_read_file(SELECT file FROM pg_current_logfiles() WHERE
> method='stderr');
>
> You can also obtain the "kind" of output from the SRF function using:
>
> SELECT pg_read_file('current_logfiles');

I don't really understand this argument as you can do that as well:
SELECT pg_read_file(file) FROM pg_current_logfiles WHERE method = 'stderr';
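
For readers following along, the two shapes being compared look roughly like
this, using the names from this thread (not a final API):

-- one-argument function form
SELECT pg_read_file(pg_current_logfile());           -- first configured format
SELECT pg_read_file(pg_current_logfile('stderr'));   -- a specific format

-- SRF form with (file, method) columns
SELECT pg_read_file(file) FROM pg_current_logfiles() WHERE method = 'stderr';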

> I'm not against adding a warning or error message here even if it may
> never occurs, but we need a new error code as it seems to me that no
> actual error code can be used.

ERRCODE_INTERNAL_ERROR, no?
-- 
Michael




Re: [HACKERS] increasing the default WAL segment size

2017-01-17 Thread Michael Paquier
On Tue, Jan 17, 2017 at 7:19 PM, Beena Emerson  wrote:
> PFA the patch with the documentation included.

It is usually better to keep doc lines within 72-80
characters if possible.

+   /* column 1: Wal segment size */
+   len = strlen(value);
+   pq_sendint(&buf, len, 4);
+   pq_sendbytes(&buf, value, len);
Bip. Error. This is a parameter value, not the WAL segment size.

Except for those minor points, this works as expected and is consistent
with non-replication sessions, so that's nice by itself:
=# create user toto replication login;
CREATE ROLE

$ psql "replication=1" -U toto
=> SHOW foo;
ERROR:  42704: unrecognized configuration parameter "foo"
LOCATION:  GetConfigOptionByName, guc.c:7968
Time: 0.245 ms
=> SHOW log_directory;
ERROR:  42501: must be superuser to examine "log_directory"
LOCATION:  GetConfigOptionByName, guc.c:7974

I think that we could get a committer look at that at the least.
-- 
Michael




Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-17 Thread Michael Paquier
On Wed, Jan 18, 2017 at 10:35 AM, Michael Paquier
 wrote:
> On Wed, Jan 18, 2017 at 7:31 AM, Stephen Frost  wrote:
>> Perhaps we need a way for pg_ctl to realize a cold-standby case and
>> throw an error or warning if --wait is specified then, but that hardly
>> seems like the common use-case.  It also wouldn't make any sense to have
>> anything in the init system which depended on PG being up in such a case
>> because, well, PG isn't ever going to be 'up'.
>
> Yeah, it seems to me that we are likely looking for a wait mode saying
> to exit pg_ctl once Postgres is happily rejecting connections, because
> that means that it is up and that it is sorting out something first
> before accepting them. This would basically filter the states in the
> control file that we find as acceptable if the connection test
> continues complaining about PQPING_REJECT.

Another option would be to log the state of the control file so the
user knows what is currently happening, and to document that
increasing the wait timeout is recommended if recovery since the
last redo point takes longer.
-- 
Michael




Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-17 Thread Michael Paquier
On Wed, Jan 18, 2017 at 7:31 AM, Stephen Frost  wrote:
> Perhaps we need a way for pg_ctl to realize a cold-standby case and
> throw an error or warning if --wait is specified then, but that hardly
> seems like the common use-case.  It also wouldn't make any sense to have
> anything in the init system which depended on PG being up in such a case
> because, well, PG isn't ever going to be 'up'.

Yeah, it seems to me that we are likely looking for a wait mode saying
to exit pg_ctl once Postgres is happily rejecting connections, because
that means that it is up and that it is sorting out something first
before accepting them. This would basically filter the states in the
control file that we find as acceptable if the connection test
continues complaining about PQPING_REJECT.
-- 
Michael




Re: [HACKERS] Parallel Index Scans

2017-01-17 Thread Haribabu Kommi
On Mon, Jan 16, 2017 at 11:11 PM, Amit Kapila 
wrote:

>
> Changed as per suggestion.
>
>
> I have also rebased the optimizer/executor support patch
> (parallel_index_opt_exec_support_v4.patch) and added a test case in
> it.


Thanks for the patch. Here are comments found during review.

parallel_index_scan_v4.patch:


+ amtarget = (char *) ((void *) target) + offset;

The above calcuation can be moved after NULL check?

+ * index_beginscan_parallel - join parallel index scan

The name and the description doesn't sync properly, any better description?

+ BTPARALLEL_DONE,
+ BTPARALLEL_IDLE
+} PS_State;

The order of above two enum values can be changed according to their use.

+ /* Check if the scan for current scan keys is finished */
+ if (so->arrayKeyCount < btscan->btps_arrayKeyCount)
+ *status = false;

I didn't clearly understand, in which scenario the arrayKeyCount is less
than btps_arrayKeyCount?


+BlockNumber
+_bt_parallel_seize(IndexScanDesc scan, bool *status)

The return value of the above function is validated only in _bt_first
function, but in other cases it is not. From the function description,
it is possible to return P_NONE for the workers also with status as
true. I feel it is better to handle the P_NONE case internally only
so that callers just check for the status. Am i missing anything?


+extern BlockNumber _bt_parallel_seize(IndexScanDesc scan, bool *status);
+extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber
scan_page);

Any better names for the above functions, as these function will provide/set
the next page that needs to be read.


parallel_index_opt_exec_support_v4.patch:

+#include "access/parallel.h"

Why is this include required in nbtree.c? I didn't find
any related code changes in the patch.


+ /* reset (parallel) index scan */
+ if (node->iss_ScanDesc)
+ {

Why is this if check required? There is an assert check in later function
calls.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] PoC: Grouped base relation

2017-01-17 Thread Tomas Vondra

On 01/17/2017 08:05 PM, Antonin Houska wrote:

[ Trying to respond to both Tomas and David. I'll check tomorrow if anything
else of the thread needs my comment. ]

Tomas Vondra  wrote:


On 01/17/2017 12:42 AM, David Rowley wrote:

On 10 January 2017 at 06:56, Antonin Houska  wrote:



I've been thinking about this aggtransmultifn and I'm not sure if it's
really needed. Adding a whole series of new transition functions is
quite a pain. At least I think so, and I have a feeling Robert might
agree with me.

Let's imagine some worst case (and somewhat silly) aggregate query:

SELECT count(*)
FROM million_row_table
CROSS JOIN another_million_row_table;

Today that's going to cause 1 TRILLION transitions! Performance will
be terrible.

If we pushed the aggregate down into one of those tables and performed
a Partial Aggregate on that, then a Finalize Aggregate on that single
row result (after the join), then that's 1 million transfn calls, and
1 million combinefn calls, one for each row produced by the join.

If we did it your way (providing I understand your proposal correctly)
there's 1 million transfn calls on one relation, then 1 million on the
other and then 1 multiplyfn call. which does 100 * 100

What did we save vs. using the existing aggcombinefn infrastructure
which went into 9.6? Using this actually costs us 1 extra function
call, right? I'd imagine the size of the patch to use aggcombinefn
instead would be a fraction of the size of the one which included all
the new aggmultiplyfns and pg_aggregate.h changes.




I think the patch relies on the assumption that the grouping reduces
cardinality,


Yes.


so a CROSS JOIN without a GROUP BY clause may not be the best
counterexample.


Yet it tells me that my approach is not ideal in some cases ...


It sounds like a much more manageable project by using aggcombinefn
instead. Then maybe one day when we can detect if a join did not cause
any result duplication (i.e Unique Joins), we could finalise the
aggregates on the first call, and completely skip the combine state
altogether.




I don't quite see how the patch could use aggcombinefn without sacrificing a
lot of the benefits. Sure, it's possible to run the aggcombinefn in a loop
(with number of iterations determined by the group size on the other side of
the join), but that sounds pretty expensive and eliminates the reduction of
transition function calls. The join cardinality would still be reduced,
though.


That's what I think. The generic case is that neither side of the join is
unique. If it appears that both relations should be aggregated below the join,
aggcombinefn would have to be called multiple times on each output row of the
join to achieve the same result as the calling aggmultiplyfn.


I do have other question about the patch, however. It seems to rely on the
fact that the grouping and joins both reference the same columns. I wonder how
uncommon such queries are.

To give a reasonable example, imagine the typical star schema, which is
pretty standard for large analytical databases. A dimension table is
"products" and the fact table is "sales", and the schema might look like this:

CREATE TABLE products (
id            SERIAL PRIMARY KEY,
name  TEXT,
category_id   INT,
producer_id   INT
);

CREATE TABLE sales (
product_id    INT REFERENCES products (id),
nitems        INT,
price NUMERIC
);

A typical query then looks like this:

SELECT category_id, SUM(nitems), SUM(price)
FROM products p JOIN sales s ON (p.id = s.product_id)
GROUP BY p.category_id;

which obviously uses different columns for the grouping and join, and so the
patch won't help with that. Of course, a query grouping by product_id would
allow the patch to work


Right, the current version does not handle this. Thanks for suggestion.
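
For illustration, the product_id variant mentioned above, where the grouping
column is also the join column (same hypothetical schema), would be:

SELECT s.product_id, SUM(s.nitems), SUM(s.price)
FROM products p JOIN sales s ON (p.id = s.product_id)
GROUP BY s.product_id;

Here "sales" could in principle be aggregated by product_id below the join.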

>

So you're saying it's merely a limitation of the initial patch version 
and not an inherent limitation?





Another thing is that in my experience most queries do joins on foreign keys
(so the PK side is unique by definition), so the benefit on practical examples
is likely much smaller.


ok. So in some cases the David's approach might be better.



In which cases would David's approach be more efficient? But even if 
there are such cases, I assume we could generate both paths and decide 
based on cost, just like with all other alternative paths.


>

However I think the ability to join 2 grouped (originally non-unique)
relations is still important. Consider a query involving "sales" as well as
another table which also has many-to-one relationship to "products".



Well, can you give a practical example? What you describe seems like a 
combination of two fact tables + a dimension, something like this:


CREATE TABLE products (
id            SERIAL PRIMARY KEY,
name  TEXT,
category_id   INT,
producer_id   INT
);

CREATE TABLE sales (
product_idREFERENCES 

Re: [HACKERS] [COMMITTERS] pgsql: Generate fmgr prototypes automatically

2017-01-17 Thread Tom Lane
Alvaro Herrera  writes:
> Alvaro Herrera wrote:
>> It is possible to replace many occurrences of builtins.h with
>> fmgrprotos.h.  I just tried this
>> git grep -l 'include.*utils/builtins.h' -- *.c | xargs perl -pi -e 
>> 's{utils/builtins.h}{utils/fmgrprotos.h}'
>> There's a large number of changes that the oneliner produces that must
>> be reverted for the compile to be silent, but a large portion can
>> remain.  (I only tried src/backend/access).

> 92 files are changed, 241 files still require builtins.h.

Let's hold off on that till we decide on the bigger-picture plan here.
If we're going to try to move those functions out of builtins.h,
this result will change considerably.

regards, tom lane




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-17 Thread Karl O. Pinc
On Tue, 17 Jan 2017 23:00:43 +0100
Gilles Darold  wrote:

> Le 17/01/2017 à 19:58, Karl O. Pinc a écrit :
> > On Tue, 17 Jan 2017 19:06:22 +0100
> > Gilles Darold  wrote:
> >  
> >> Le 17/01/2017 à 03:22, Michael Paquier a écrit :  
> >>> On Tue, Jan 17, 2017 at 2:21 AM, Karl O. Pinc 
> >>> wrote:
>  On January 15, 2017 11:47:51 PM CST, Michael Paquier
>   wrote:
> >  
> > Also, I would rather see an ERROR returned to the user in case
> > of bad data in current_logfiles, I did not change that either as
> > that's the original intention of Gilles.
> >> I'm not against adding a warning or error message here even if it
> >> may never occurs, but we need a new error code as it seems to me
> >> that no actual error code can be used.
> > Seems to me the XX001 "data_corrupted" sub-category of
> > XX000 "internal_error" is appropriate.  
> 
> I don't think so, this file doesn't contain any data and we must not
> report such an error in this case. Something like "bad/unknown file
> format" would be better.

Maybe.  It's not user-supplied data that's corrupted but it is
PG generated data which is generated for and supplied to the user.
I just looked at all uses of XX001 and it is true that they all
involve corruption of user-supplied data.

If you don't want to use XX001 use XX000.  It does not seem worth
making a new error code for just this one case.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-17 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Jan 17, 2017 at 4:46 PM, Stephen Frost  wrote:
> >> But what if we're restarting after, say, rebooting?  Then there's
> >> nobody to see the progress messages, perhaps.  The system just seems
> >> to take an eternity to return to the usual runlevel.
> >
> > Not unlike an fsck.
> 
> Right.  That's why people developed journaled filesystems like ext3
> and ext4 - because waiting for increasingly-large disks to be checked
> for errors sucked.  And that made fsck times vastly lower and everyone
> said "huzzah".  Because waiting for things to happen stinks, and
> people want to do as little of it as is reasonably possible.

Sure, but they still have a recovery process that they go through when
recovering from a crash, just as we do, and those things which are
waiting for the filesystem have to wait until it is.  If a PG user has
an issue with waiting for recovery to finish then they should make
checkpoints happen more often (typically by reducing
checkpoint_timeout...), so that we don't have as much to replay through
since the last one.
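
For example, a sketch of trading more checkpoint I/O for shorter recovery
(the value here is made up):

ALTER SYSTEM SET checkpoint_timeout = '2min';
SELECT pg_reload_conf();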

Just as a user could reduce the journal size of ext4 if they're worried
that it'll take too long for the system to replay the last set of
journaled entries during recovery after a crash.

> >> I saw the discussion on this thread, but I didn't realize that it
> >> meant that pg_ctl was going to wait for crash recovery, let alone
> >> archive recovery.  That seems not good.
> >
> > I disagree.  The database isn't done starting up until it's gone through
> > recovery.  If there are other bits of the system which are depending on
> > the database being online, shouldn't they wait until it's actually
> > online to be started?
> 
> They aren't necessarily depending on the database; they could be
> entirely unrelated.

Not in modern boot systems today...  If they aren't depending on the
database then they can get started as soon as everything they *do*
depend on is up and running.  Those daemons or what-have-you which
depend on the database say so through the init dependency system.

> > Admittedly, such processes should probably be prepared to try
> > reconnecting to the database on a failure, but I don't see this as
> > really all that different from how a journaling filesystem operates.
> 
> A journaling filesystem doesn't have a mode where it enters archive
> recovery mode and stays there permanently leaving the system in an
> unusable state.

Now there I agree with you, whatever we're doing with pg_ctl here
shouldn't mean that it never returns, but is that actually what happens
with pg_ctl --wait?  If so, then that's what is wrong, not this
particular patch which is just making --wait the default.

If I'm understanding your concern correctly, you're worried about the
case of a cold standby where the database is only replaying WAL but not
configured to come up as a hot standby and therefore PQping() won't ever
succeed?

Except, that isn't what would ever happen because the timeout for the
--wait option is 60s, according to the pg_ctl docs anyway, after which
it'll throw an error and say the server didn't start up, even if it
would have after a few minutes.  One could wonder why we have the
default set to a value lower than checkpoint_timeout, making it entirely
likely that the database recovery would take longer than the timeout on
a busy/high-volume server that's actually checkpointing on-time, but
just barely.

Perhaps we need a way for pg_ctl to realize a cold-standby case and
throw an error or warning if --wait is specified then, but that hardly
seems like the common use-case.  It also wouldn't make any sense to have
anything in the init system which depended on PG being up in such a case
because, well, PG isn't ever going to be 'up'.

Thanks!

Stephen




Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.

2017-01-17 Thread Michael Paquier
On Tue, Jan 17, 2017 at 11:42 PM, Fujii Masao  wrote:
> With the patch, what happens if pg_stop_backup exits with an error
> after removing backup_label file before resetting the backup state
> to none?

Removing the backup_label file is the last error that can happen
during the time the callback is set. And the counter is reset
immediately after.
-- 
Michael




Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-17 Thread Robert Haas
On Tue, Jan 17, 2017 at 4:46 PM, Stephen Frost  wrote:
>> But what if we're restarting after, say, rebooting?  Then there's
>> nobody to see the progress messages, perhaps.  The system just seems
>> to take an eternity to return to the usual runlevel.
>
> Not unlike an fsck.

Right.  That's why people developed journaled filesystems like ext3
and ext4 - because waiting for increasingly-large disks to be checked
for errors sucked.  And that made fsck times vastly lower and everyone
said "huzzah".  Because waiting for things to happen stinks, and
people want to do as little of it as is reasonably possible.

>> I saw the discussion on this thread, but I didn't realize that it
>> meant that pg_ctl was going to wait for crash recovery, let alone
>> archive recovery.  That seems not good.
>
> I disagree.  The database isn't done starting up until it's gone through
> recovery.  If there are other bits of the system which are depending on
> the database being online, shouldn't they wait until it's actually
> online to be started?

They aren't necessarily depending on the database; they could be
entirely unrelated.

> Admittedly, such processes should probably be prepared to try
> reconnecting to the database on a failure, but I don't see this as
> really all that different from how a journaling filesystem operates.

A journaling filesystem doesn't have a mode where it enters archive
recovery mode and stays there permanently leaving the system in an
unusable state.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-17 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> I'm OK with continuing to use "xlog" as the user-facing name for the
> write-ahead log, and I am OK with switching to wal.  But leaving
> things in the halfway in-between state where they are right now seems
> like a mess.  It conveniences the people who happen to care about the
> names of the parts we haven't renamed yet but not the part we already
> did rename at the price of a permanent inconsistency in naming
> conventions that we can't ever eliminate.  Yuck.

I'm more of a '-1' when it comes to keeping everything called "xlog",
and a '+1' on renaming everything to be WAL, since that's what we
usually refer to this thing as, but I agree with Robert that only doing
a half-way renaming is about a -1000.

Thanks!

Stephen




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-17 Thread Robert Haas
On Mon, Jan 16, 2017 at 10:48 PM, Fujii Masao  wrote:
> If we do that, we should vote on all the "renaming" stuff, i.e., not only
> function names but also program names like pg_receivexlog, directory names
> like clog, option names like xlogdir option of initdb, return value names of
> the functions like xlog_position in pg_create_physical_replication_slot, etc.

Right.  I think a lot of that stuff should also be changed.  If we
weren't OK with breaking compatibility, why'd we change pg_xlog ->
pg_wal?  If we're not willing to change other things to match, let's
revert that change and be done with it.  It's undeniable that there's
going to be some pain here and I'm not committed to incurring that
pain, but I don't really understand the theory that a
half-renaming will save us pain vs. a more thorough renaming.  If we
change some things and not others, people will have to try to remember
in which cases they now have to say wal instead of xlog and in which
cases they still have to say xlog and (perhaps) in which cases they
can choose either.  I think "xlog" is terrible naming; there is no
universe in which "x" is a reasonable short-hand for either
"transaction" or "write-ahead".

Q: OK, where is my WAL stored?
A: pg_wal
Q: How do I reset it?
A: pg_resetxlog
Q: Why is it called pg_resetxlog?
A: Because we call the transaction log "xlog".
Q: Evidently you don't.
A: Well, it used to be called pg_xlog prior to version 10, but then we
renamed it, but the tool has still got the old name.
Q: Are you kidding me?
A: No.
Q: Do you guys suck at picking names for things?
A: Yes.
Q: Wouldn't it at least be better to settle on ONE incomprehensible
abbreviation for any given concept instead of having TWO DIFFERENT
ONES?
A: Hey, look at the time.

If we don't make this all consistent, we'll be having that
conversation -- or some variant of it -- forever.  There will be
constant arguments about whether to give in and rename some more
things.  The people who are now saying that we shouldn't break
compatibility for this change (as if we haven't already) will be even
less happy when they keep having to argue against breaking
compatibility for the same thing a second time.

I'm OK with continuing to use "xlog" as the user-facing name for the
write-ahead log, and I am OK with switching to wal.  But leaving
things in the halfway in-between state where they are right now seems
like a mess.  It conveniences the people who happen to care about the
names of the parts we haven't renamed yet but not the part we already
did rename at the price of a permanent inconsistency in naming
conventions that we can't ever eliminate.  Yuck.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-17 Thread Gilles Darold
Le 17/01/2017 à 19:58, Karl O. Pinc a écrit :
> On Tue, 17 Jan 2017 19:06:22 +0100
> Gilles Darold  wrote:
>
>> Le 17/01/2017 à 03:22, Michael Paquier a écrit :
>>> On Tue, Jan 17, 2017 at 2:21 AM, Karl O. Pinc 
>>> wrote:  
 On January 15, 2017 11:47:51 PM CST, Michael Paquier
  wrote:  
>
> Also, I would rather see an ERROR returned to the user in case of
> bad data in current_logfiles, I did not change that either as
> that's the original intention of Gilles.  
>> I'm not against adding a warning or error message here even if it may
>> never occurs, but we need a new error code as it seems to me that no
>> actual error code can be used.  
> Seems to me the XX001 "data_corrupted" sub-category of
> XX000 "internal_error" is appropriate.

I don't think so, this file doesn't contain any data and we must not
report such an error in this case. Something like "bad/unknown file format"
would be better.

 

-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org





Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2017-01-17 Thread Peter Eisentraut
On 1/3/17 9:09 AM, Heikki Linnakangas wrote:
> Since not everyone agrees with this approach, I split this patch into 
> two. The first patch refactors things, replacing the isMD5() function 
> with get_password_type(), without changing the representation of 
> pg_authid.rolpassword. That is hopefully uncontroversial.

I have checked these patches.

The refactoring in the first patch seems sensible.  As Michael pointed
out, there is still a reference to "plain:" in the first patch.

The commit message needs to be updated, because the function
plain_crypt_verify() was already added in a previous patch.

I'm not fond of this kind of coding

password = encrypt_password(password_type, stmt->role, password);

where the 'password' variable has a different meaning before and after.

This error message might be a mistake:

elog(ERROR, "unrecognized password type conversion");

I think some pieces from the second patch could be included in the first
patch, e.g., the parts for passwordcheck.c and user.c.

> And the second 
> patch adds the "plain:" prefix, which not everyone agrees on.

The code also gets a little bit dubious, as it introduces an "unknown"
password type, which is sometimes treated as plaintext and sometimes as
an error.  I think this is going to be messy.

I would skip this patch for now at least.  Too much controversy, and we
don't know what the rest of the patches for this feature will look like
to be able to know if it's worth it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-17 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Jan 17, 2017 at 11:27 AM, Peter Eisentraut
>  wrote:
> > On 1/15/17 11:40 PM, Fujii Masao wrote:
> >> This change may confuse the users who run "pg_ctl start" to perform a crash
> >> recovery, archive recovery and standby server (with hot_standby=off) 
> >> because
> >> "pg_ctl start" would not return so long time.
> >
> > Well, this change was made because the previous behavior confused people
> > as well, because pg_ctl would return before it was actually done.
> >
> > The new state shouldn't be confusing because pg_ctl prints out progress
> > messages.
> 
> But what if we're restarting after, say, rebooting?  Then there's
> nobody to see the progress messages, perhaps.  The system just seems
> to take an eternity to return to the usual runlevel.

Not unlike an fsck.

> I saw the discussion on this thread, but I didn't realize that it
> meant that pg_ctl was going to wait for crash recovery, let alone
> archive recovery.  That seems not good.

I disagree.  The database isn't done starting up until it's gone through
recovery.  If there are other bits of the system which are depending on
the database being online, shouldn't they wait until it's actually
online to be started?

Admittedly, such processes should probably be prepared to try
reconnecting to the database on a failure, but I don't see this as
really all that different from how a journaling filesystem operates.
Now, perhaps the init process needs to be adjusted so that the progress
reports from pg_ctl are seen by the user, but that's outside of PG.

Thanks!

Stephen




Re: [HACKERS] Logical Replication WIP

2017-01-17 Thread Robert Haas
On Tue, Jan 17, 2017 at 11:15 AM, Petr Jelinek
 wrote:
>> Is there anything stopping anyone from implementing it?
>
> No, just didn't seem priority for the functionality right now.

Why is it OK for this to not support rename like everything else does?
 It shouldn't be more than a few hours of work to fix that, and I
think leaving stuff like that out just because it's a lower priority
is fairly short-sighted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-17 Thread Robert Haas
On Tue, Jan 17, 2017 at 11:27 AM, Peter Eisentraut
 wrote:
> On 1/15/17 11:40 PM, Fujii Masao wrote:
>> This change may confuse the users who run "pg_ctl start" to perform a crash
>> recovery, archive recovery and standby server (with hot_standby=off) because
>> "pg_ctl start" would not return so long time.
>
> Well, this change was made because the previous behavior confused people
> as well, because pg_ctl would return before it was actually done.
>
> The new state shouldn't be confusing because pg_ctl prints out progress
> messages.

But what if we're restarting after, say, rebooting?  Then there's
nobody to see the progress messages, perhaps.  The system just seems
to take an eternity to return to the usual runlevel.

I saw the discussion on this thread, but I didn't realize that it
meant that pg_ctl was going to wait for crash recovery, let alone
archive recovery.  That seems not good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Generate fmgr prototypes automatically

2017-01-17 Thread Alvaro Herrera
Alvaro Herrera wrote:

> It is possible to replace many occurrences of builtins.h with
> fmgrprotos.h.  I just tried this
>git grep -l 'include.*utils/builtins.h' -- *.c | xargs perl -pi -e 
> 's{utils/builtins.h}{utils/fmgrprotos.h}'
> There's a large number of changes that the oneliner produces that must
> be reverted for the compile to be silent, but a large portion can
> remain.  (I only tried src/backend/access).

92 files are changed, 241 files still require builtins.h.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit 36f1297a9214702f8c67ad96dfc28cb99a66722b
Author: Alvaro Herrera 
AuthorDate: Tue Jan 17 18:32:20 2017 -0300
CommitDate: Tue Jan 17 18:34:20 2017 -0300

remove builtins when fmgrprotos is enough

diff --git a/contrib/btree_gist/btree_bit.c b/contrib/btree_gist/btree_bit.c
index f34fa87..9ce06c9 100644
--- a/contrib/btree_gist/btree_bit.c
+++ b/contrib/btree_gist/btree_bit.c
@@ -5,7 +5,7 @@
 
 #include "btree_gist.h"
 #include "btree_utils_var.h"
-#include "utils/builtins.h"
+#include "utils/fmgrprotos.h"
 #include "utils/bytea.h"
 #include "utils/varbit.h"
 
diff --git a/contrib/btree_gist/btree_bytea.c b/contrib/btree_gist/btree_bytea.c
index df6c960..1be494d 100644
--- a/contrib/btree_gist/btree_bytea.c
+++ b/contrib/btree_gist/btree_bytea.c
@@ -5,7 +5,7 @@
 
 #include "btree_gist.h"
 #include "btree_utils_var.h"
-#include "utils/builtins.h"
+#include "utils/fmgrprotos.h"
 #include "utils/bytea.h"
 
 
diff --git a/contrib/btree_gist/btree_date.c b/contrib/btree_gist/btree_date.c
index 56031d4..dc55582 100644
--- a/contrib/btree_gist/btree_date.c
+++ b/contrib/btree_gist/btree_date.c
@@ -5,7 +5,7 @@
 
 #include "btree_gist.h"
 #include "btree_utils_num.h"
-#include "utils/builtins.h"
+#include "utils/fmgrprotos.h"
 #include "utils/date.h"
 
 typedef struct
diff --git a/contrib/btree_gist/btree_gist.c b/contrib/btree_gist/btree_gist.c
index e1dc253..d3f13d6 100644
--- a/contrib/btree_gist/btree_gist.c
+++ b/contrib/btree_gist/btree_gist.c
@@ -27,7 +27,7 @@ gbtreekey_in(PG_FUNCTION_ARGS)
 }
 
 #include "btree_utils_var.h"
-#include "utils/builtins.h"
+#include "utils/fmgrprotos.h"
 Datum
 gbtreekey_out(PG_FUNCTION_ARGS)
 {
diff --git a/contrib/btree_gist/btree_interval.c 
b/contrib/btree_gist/btree_interval.c
index e5cd0a2..afb205e 100644
--- a/contrib/btree_gist/btree_interval.c
+++ b/contrib/btree_gist/btree_interval.c
@@ -5,7 +5,7 @@
 
 #include "btree_gist.h"
 #include "btree_utils_num.h"
-#include "utils/builtins.h"
+#include "utils/fmgrprotos.h"
 #include "utils/timestamp.h"
 
 typedef struct
diff --git a/contrib/btree_gist/btree_macaddr.c 
b/contrib/btree_gist/btree_macaddr.c
index 87d96c0..5a545ac 100644
--- a/contrib/btree_gist/btree_macaddr.c
+++ b/contrib/btree_gist/btree_macaddr.c
@@ -5,7 +5,7 @@
 
 #include "btree_gist.h"
 #include "btree_utils_num.h"
-#include "utils/builtins.h"
+#include "utils/fmgrprotos.h"
 #include "utils/inet.h"
 
 typedef struct
diff --git a/contrib/btree_gist/btree_text.c b/contrib/btree_gist/btree_text.c
index 2e00cb6..751e165 100644
--- a/contrib/btree_gist/btree_text.c
+++ b/contrib/btree_gist/btree_text.c
@@ -5,7 +5,7 @@
 
 #include "btree_gist.h"
 #include "btree_utils_var.h"
-#include "utils/builtins.h"
+#include "utils/fmgrprotos.h"
 
 /*
 ** Text ops
diff --git a/contrib/btree_gist/btree_time.c b/contrib/btree_gist/btree_time.c
index 27f30bc..9d19a10 100644
--- a/contrib/btree_gist/btree_time.c
+++ b/contrib/btree_gist/btree_time.c
@@ -5,7 +5,7 @@
 
 #include "btree_gist.h"
 #include "btree_utils_num.h"
-#include "utils/builtins.h"
+#include "utils/fmgrprotos.h"
 #include "utils/date.h"
 #include "utils/timestamp.h"
 
diff --git a/contrib/btree_gist/btree_utils_var.c 
b/contrib/btree_gist/btree_utils_var.c
index 70b3794..5af5164 100644
--- a/contrib/btree_gist/btree_utils_var.c
+++ b/contrib/btree_gist/btree_utils_var.c
@@ -11,7 +11,7 @@
 
 #include "btree_utils_var.h"
 #include "utils/pg_locale.h"
-#include "utils/builtins.h"
+#include "utils/fmgrprotos.h"
 #include "utils/rel.h"
 
 /* used for key sorting */
diff --git a/contrib/intarray/_int_bool.c b/contrib/intarray/_int_bool.c
index 5d9e676..4973ce6 100644
--- a/contrib/intarray/_int_bool.c
+++ b/contrib/intarray/_int_bool.c
@@ -4,7 +4,7 @@
 #include "postgres.h"
 
 #include "miscadmin.h"
-#include "utils/builtins.h"
+#include "utils/fmgrprotos.h"
 
 #include "_int.h"
 
diff --git a/contrib/intarray/_int_selfuncs.c b/contrib/intarray/_int_selfuncs.c
index 9b4a22f..a89eb57 100644
--- a/contrib/intarray/_int_selfuncs.c
+++ b/contrib/intarray/_int_selfuncs.c
@@ -19,7 +19,7 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
-#include "utils/builtins.h"
+#include "utils/fmgrprotos.h"
 #include "utils/selfuncs.h"
 #include "utils/syscache.h"
 #include "utils/lsyscache.h"
diff --git a/contrib/isn/isn.c 

Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2017-01-17 Thread Robert Haas
On Fri, Jan 13, 2017 at 10:58 AM, Dilip Kumar  wrote:
> If you execute the query with EXECUTE then the commandTag will be EXECUTE;
> that way we will not show the actual query type, I mean all the
> statements will get the common tag "EXECUTE".

Somebody might say that's a feature, not a bug.  But then again,
someone else might say the opposite.
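
A minimal illustration of the case Dilip describes (per his observation, the
statement is classified by its command tag, which here is EXECUTE rather than
SELECT):

PREPARE getone AS SELECT 1;
EXECUTE getone;   -- counted under EXECUTE, not SELECT, in the proposed view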

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum

2017-01-17 Thread Robert Haas
On Fri, Jan 13, 2017 at 8:45 AM, Alvaro Herrera
 wrote:
> Amit Khandekar wrote:
>> In a server where autovacuum is disabled and its databases reach
>> autovacuum_freeze_max_age limit, an autovacuum is forced to prevent
>> xid wraparound issues. At this stage, when the server is loaded with a
>> lot of DML operations, an exceedingly high number of autovacuum
>> workers keep on getting spawned, and these do not do anything, and
>> then quit.
>
> I think this is the same problem as reported in
> https://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=ypy...@mail.gmail.com

If I understand correctly, and it's possible that I don't, the issues
are distinct.  I think that the issue in that thread has to do with
the autovacuum launcher starting workers over and over again in a
tight loop, whereas this issue seems to be about autovacuum workers
restarting the launcher over and over again in a tight loop.  In that
thread, it's the autovacuum launcher that is looping, which can only
happen when autovacuum=on.  In this thread, the autovacuum launcher is
repeatedly exiting and getting restarted, which can only happen when
autovacuum=off.

In general, it seems we've been pretty cavalier about just how often
it's reasonable to start the autovacuum launcher when autovacuum=off.
That code probably doesn't see much real-world use.  Foreground
processes signal the postmaster only every 64k transactions, which on
today's hardware can't happen more than every couple of seconds if
you're not using subtransactions or intentionally burning XIDs, but
hardware keeps getting faster, and you might be using subtransactions.
However, requiring that 65,536 transactions pass between signals does
serve as something of a rate limit.  In the case about which Amit is
complaining, there's no rate limit at all.  As fast as the autovacuum
launcher starts up, it spawns a worker and exits; as fast as the
worker can determine that it can't do anything useful, it starts a new
launcher.  Clearly, some kind of rate control is needed here; the only
question is about where to put it.

I would be tempted to install something directly in postmaster.c.  If
CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) && Shutdown ==
NoShutdown but we last set start_autovac_launcher = true less than 10
seconds ago, don't do it again.  That limits us to launching the
autovacuum launcher at most six times a minute when autovacuum = off.
You could argue that defeats the point of the SendPostmasterSignal in
SetTransactionIdLimit, but I don't think so.  If vacuuming the oldest
database took less than 10 seconds, then we won't vacuum the
next-oldest database until we hit the next 64k transaction ID
boundary, but that can only cause a problem if we've got so many
databases that we don't get to them all before we run out of
transaction IDs, which is almost unthinkable.  If you had ten
million tiny databases that all crossed the threshold at the same
instant, it would take you 640 million transaction IDs to visit them
all.  If you also had autovacuum_freeze_max_age set very close to the
upper limit for that variable, you could conceivably have the system
shut down before all of those databases were reached.  But that's a
pretty artificial scenario.  If someone has that scenario, perhaps
they should consider more sensible configuration choices.
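
As an aside, for anyone checking how close each database is to
autovacuum_freeze_max_age in such a setup:

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;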

I wondered for a while why the existing guard in
vac_update_datfrozenxid() isn't sufficient to prevent this problem.
That turns out to be due to Tom's commit
794e3e81a0e8068de2606015352c1254cb071a78, which causes
ForceTransactionIdLimitUpdate() to always return true when we're past
xidVacLimit.  The commit doesn't contain much in the way of
justification for the change, but I think the issue must be that if
the database nearest to wraparound is dropped, we need some mechanism
for eventually forcing xidVacLimit to get updated, rather than just
spewing warnings.

Another place where we could insert a guard is inside
SetTransactionIdLimit itself.  This is a little tricky.  The easy idea
would be just to skip sending the signal if xidVacLimit hasn't
advanced, but that's wrong in the case where there are multiple
databases with exactly the same oldest XID; vacuuming the first one
doesn't change anything.  It would be correct -- I think -- to skip
sending the signal when xidVacLimit doesn't advance and
vac_update_datfrozenxid() didn't change the current database's value
either, but that requires passing a flag down the call stack a few
levels.  That's only mildly ugly so I'd be fine with it if it were the
best fix, but there seem to be better options.

Amit's chosen yet another possible place to insert the guard: teach
autovacuum that if a worker skips at least one table due to concurrent
autovacuum activity AND ends up vacuuming no tables, don't call
vac_update_datfrozenxid().  Since there is or was another worker
running, vac_update_datfrozenxid() either already has been called or

Re: [HACKERS] Gather Merge

2017-01-17 Thread Peter Geoghegan
On Tue, Jan 17, 2017 at 4:26 AM, Rushabh Lathia
 wrote:
> Another observation is, HashAggregate (case 1) performs better compared to
> GroupAggregate (case 2), but still it doesn't justify the cost difference of
> those two.

It may not be the only issue, or even the main issue, but I'm fairly
suspicious of the fact that cost_sort() doesn't distinguish between
the comparison cost of text and int4, for example.
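
A quick way to see the effect (a sketch, with a hypothetical table t that
has an int4 column i and a text column s): the estimated sort costs come
out essentially the same even though the text comparisons are far more
expensive at run time.

EXPLAIN SELECT * FROM t ORDER BY i;
EXPLAIN SELECT * FROM t ORDER BY s;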

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Generate fmgr prototypes automatically

2017-01-17 Thread Alvaro Herrera
Tom Lane wrote:

> Alternatively ... is there a specific reason why you chose to make
> builtins.h the key inclusion file for this change, rather than having
> callers include fmgrprotos.h directly?  It seems like the stuff remaining
> in builtins.h is just a laundry list of random utility functions.
> Maybe dispersing those to other headers is the thing to do.

It is possible to replace many occurrences of builtins.h with
fmgrprotos.h.  I just tried this
   git grep -l 'include.*utils/builtins.h' -- *.c | xargs perl -pi -e 
's{utils/builtins.h}{utils/fmgrprotos.h}'
There's a large number of changes that the oneliner produces that must
be reverted for the compile to be silent, but a large portion can
remain.  (I only tried src/backend/access).

Anyway I support the idea of creating other header files for specific
purposes, such as ruleutils.h, varlena.h, etc.  (I am against creating a
header file that contains commonly used macros such as
CStringGetTextDatum together with varstr_sortsupport.  I think the
latter should have its own place).

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] generating fmgr prototypes automatically

2017-01-17 Thread Peter Eisentraut
On 1/5/17 12:24 PM, Pavel Stehule wrote:
> I checked last set of patches and I didn't find any issue. 
> 
> There are no problems with patching, compilation and all regress tests
> passed.
> 
> I'll mark this patch as ready for commiter

This has been committed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] DROP FUNCTION of multiple functions

2017-01-17 Thread Peter Eisentraut
On 1/10/17 1:52 AM, Michael Paquier wrote:
> I don't see any problems with 0001.

I was wondering, should we rename funcname -> name, and funcargs ->
args, or perhaps the whole FuncWithArgs struct, so there is no confusion
when used with operators?

> In 0002, the comment of
> class_args/CreateOpClassItem in parsenodes.h needs to be updated,
> because this becomes used by operators as well.

I think that aspect might be a mistake in my patch.  I'll check this
further, but I think the comment ought to remain true.

> One comment though: there are still many list_make2() or even
> list_make3 calls for some object types. Would it make sense to replace
> those lists with a decided number of items by a Node and simplify the
> interface?

(I don't see any list_make3.)  It would be nice to refine this further,
but the remaining uses are quite marginal.  The main problem was that
before you had to create singleton lists and then unpack them, because
there was no other way.  The remaining uses are more genuine lists or lcons.

> drop_type1 and drop_type2 are not really explicit, especially after
> reading that one allows schema-qualified names, not the other. Could
> you use something like drop_type_qualified? The same applies to
> comment_type and security_label.

Yeah, I'll try to find better names.

> 0004 could be merged with 0002, no? That looks good to me.

I think 0001 through 0004 would be committed together.  The split is
just for review.

> In 0005, a nit:
> +DROP FUNCTION functest_IS_1(int, int, text), functest_IS_2(int),
> functest_IS_3(int);
>  -- Cleanups
> The DROP query could be moved below the cleanup comment.

I can do that, but the idea was that the commands below the cleanups
line weren't really tests.

> It may be a good idea to add an example of query dropping two
> functions at once in the docs.

Yes.
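
For instance, the reference page could show something along these lines (the
function names here are purely hypothetical):

    DROP FUNCTION update_foo(integer), update_bar(text);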

> Could you add some regression tests for the drop of aggregates and
> operators to stress the parsing code path?

Yes.

> While looking at 0006... DROP POLICY and DROP RULE could be unified. I
> just noticed that while reading the code.

DROP TRIGGER also looks similar.  drop_type3 then. ;-)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-17 Thread Tom Lane
Andres Freund  writes:
> On 2017-01-17 13:43:38 -0500, Tom Lane wrote:
>> I'm not convinced that that optimization is worth preserving, but if we
>> keep it then ProjectSet isn't le mot juste here, any more than you'd want
>> to rename Result to Project without changing its existing
>> functionality.

> Right. I'd removed that, and re-added it; primarily because the plans
> looked more complex without it. After all, you'd thought it worth adding
> that hack ;)   I'm happy with removing it again too.

Well, it seemed reasonable to do that as long as the only cost was ten or
so lines in create_projection_plan.  But if we're contorting not only the
behavior but the very name of the SRF-evaluation plan node type, that's
not negligible cost anymore.  So now I'm inclined to take it out.

regards, tom lane




Re: [HACKERS] PoC: Grouped base relation

2017-01-17 Thread Antonin Houska
[ Trying to respond to both Tomas and David. I'll check tomorrow if anything
else of the thread needs my comment. ]

Tomas Vondra  wrote:

> On 01/17/2017 12:42 AM, David Rowley wrote:
> > On 10 January 2017 at 06:56, Antonin Houska  wrote:

> > I've been thinking about this aggtransmultifn and I'm not sure if it's
> > really needed. Adding a whole series of new transition functions is
> > quite a pain. At least I think so, and I have a feeling Robert might
> > agree with me.
> >
> > Let's imagine some worst case (and somewhat silly) aggregate query:
> >
> > SELECT count(*)
> > FROM million_row_table
> > CROSS JOIN another_million_row_table;
> >
> > Today that's going to cause 1 TRILLION transitions! Performance will
> > be terrible.
> >
> > If we pushed the aggregate down into one of those tables and performed
> > a Partial Aggregate on that, then a Finalize Aggregate on that single
> > row result (after the join), then that's 1 million transfn calls, and
> > 1 million combinefn calls, one for each row produced by the join.
> >
> > If we did it your way (providing I understand your proposal correctly)
> > there's 1 million transfn calls on one relation, then 1 million on the
> > other and then 1 multiplyfn call, which does 1000000 * 1000000
> >
> > What did we save vs. using the existing aggcombinefn infrastructure
> > which went into 9.6? Using this actually costs us 1 extra function
> > call, right? I'd imagine the size of the patch to use aggcombinefn
> > instead would be a fraction of the size of the one which included all
> > the new aggmultiplyfns and pg_aggregate.h changes.
> >

> I think the patch relies on the assumption that the grouping reduces
> cardinality,

Yes.

> so a CROSS JOIN without a GROUP BY clause may not be the best
> counterexample.

Yet it tells me that my approach is not ideal in some cases ...

> > It sounds like a much more manageable project by using aggcombinefn
> > instead. Then maybe one day when we can detect if a join did not cause
> > any result duplication (i.e Unique Joins), we could finalise the
> > aggregates on the first call, and completely skip the combine state
> > altogether.
> >

> I don't quite see how the patch could use aggcombinefn without sacrificing a
> lot of the benefits. Sure, it's possible to run the aggcombinefn in a loop
> (with number of iterations determined by the group size on the other side of
> the join), but that sounds pretty expensive and eliminates the reduction of
> transition function calls. The join cardinality would still be reduced,
> though.

That's what I think. The generic case is that neither side of the join is
unique. If it appears that both relations should be aggregated below the join,
aggcombinefn would have to be called multiple times on each output row of the
join to achieve the same result as calling aggmultiplyfn.
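
To put rough, purely illustrative numbers on that: if the groups on the other
side of the join contain 1000 rows on average, getting the joined aggregate
right with aggcombinefn alone means on the order of 1000 combine calls per
output row of the join, whereas the proposed aggmultiplyfn needs only one call
per output row, taking the group size as an argument (e.g. for count(), the
combined value is simply count_A * 1000).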

> I do have other question about the patch, however. It seems to rely on the
> fact that the grouping and joins both reference the same columns. I wonder how
> uncommon such queries are.
> 
> To give a reasonable example, imagine the typical star schema, which is
> pretty standard for large analytical databases. A dimension table is
> "products" and the fact table is "sales", and the schema might look like this:
> 
> CREATE TABLE products (
>     id            SERIAL PRIMARY KEY,
>     name          TEXT,
>     category_id   INT,
>     producer_id   INT
> );
> 
> CREATE TABLE sales (
>     product_id    REFERENCES products (id),
>     nitems        INT,
>     price         NUMERIC
> );
> 
> A typical query then looks like this:
> 
> SELECT category_id, SUM(nitems), SUM(price)
> FROM products p JOIN sales s ON (p.id = s.product_id)
> GROUP BY p.category_id;
> 
> which obviously uses different columns for the grouping and join, and so the
> patch won't help with that. Of course, a query grouping by product_id would
> allow the patch to work

Right, the current version does not handle this. Thanks for the suggestion.

> Another thing is that in my experience most queries do joins on foreign keys
> (so the PK side is unique by definition), so the benefit on practical examples
> is likely much smaller.

OK. So in some cases David's approach might be better.

However I think the ability to join 2 grouped (originally non-unique)
relations is still important. Consider a query involving "sales" as well as
another table which also has a many-to-one relationship to "products".

> But I guess my main question is if there are actual examples of queries the
> patch is trying to improve, or whether the general benefit is allowing
> parallel plans for queries where it would not be possible otherwise.

In fact I did all this with postgres_fdw in mind.

From this perspective, David's approach can be slightly more efficient if all
the tables are local, but aggregation of multiple base relations below the
join can save a lot of effort if the tables are remote (as it 

Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-17 Thread Karl O. Pinc
On Tue, 17 Jan 2017 19:06:22 +0100
Gilles Darold  wrote:

> Le 17/01/2017 à 03:22, Michael Paquier a écrit :
> > On Tue, Jan 17, 2017 at 2:21 AM, Karl O. Pinc 
> > wrote:  
> >> On January 15, 2017 11:47:51 PM CST, Michael Paquier
> >>  wrote:  


> >>> Also, I would rather see an ERROR returned to the user in case of
> >>> bad data in current_logfiles, I did not change that either as
> >>> that's the original intention of Gilles.  

> I'm not against adding a warning or error message here even if it may
> never occur, but we need a new error code as it seems to me that no
> existing error code can be used.  

Seems to me the XX001 "data_corrupted" sub-category of
XX000 "internal_error" is appropriate.

https://www.postgresql.org/docs/devel/static/errcodes-appendix.html
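
In backend terms that would presumably look something like this (just a
sketch; the message wording here is made up):

    ereport(ERROR,
            (errcode(ERRCODE_DATA_CORRUPTED),
             errmsg("corrupted data in \"%s\"", "current_logfiles")));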

As a dbadmin/sysadm I'd definitely like to see something in the log if
you're going to raise anything user-side.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-17 Thread Andres Freund
On 2017-01-17 13:43:38 -0500, Tom Lane wrote:
> Although ... looking closer at Andres' patch, the new node type *is*
> channeling Result, in the sense that it might or might not have any input
> plan.  This probably traces to what I wrote in September:
>
> + * XXX Possibly-temporary hack: if the subpath is a dummy ResultPath,
> + * don't bother with it, just make a Result with no input.  This avoids an
> + * extra Result plan node when doing "SELECT srf()".  Depending on what we
> + * decide about the desired plan structure for SRF-expanding nodes, this
> + * optimization might have to go away, and in any case it'll probably look
> + * a good bit different.
>
> I'm not convinced that that optimization is worth preserving, but if we
> keep it then ProjectSet isn't le mot juste here, any more than you'd want
> to rename Result to Project without changing its existing
> functionality.

Right. I'd removed that, and re-added it; primarily because the plans
looked more complex without it. After all, you'd thought it worth adding
that hack ;)   I'm happy with removing it again too.

Andres




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-17 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 17, 2017 at 1:18 PM, Tom Lane  wrote:
>> Using Result for two completely different things is a wart though.  If we
>> had it to do over I think we'd define Result as a scan node that produces
>> rows from no input, and create a separate Project node for the case of
>> projecting from input tuples.  People are used to seeing Result in EXPLAIN
>> output, so it's not worth the trouble of changing that IMO, but we don't
>> have to use it as a model for more node types.

> +1, although I think changing the existing node would be fine too if
> somebody wanted to do the work.  It's not worth having that wart
> forever just to avoid whatever minor pain-of-adjustment might be
> involved.

Although ... looking closer at Andres' patch, the new node type *is*
channeling Result, in the sense that it might or might not have any input
plan.  This probably traces to what I wrote in September:

+ * XXX Possibly-temporary hack: if the subpath is a dummy ResultPath,
+ * don't bother with it, just make a Result with no input.  This avoids an
+ * extra Result plan node when doing "SELECT srf()".  Depending on what we
+ * decide about the desired plan structure for SRF-expanding nodes, this
+ * optimization might have to go away, and in any case it'll probably look
+ * a good bit different.

I'm not convinced that that optimization is worth preserving, but if we
keep it then ProjectSet isn't le mot juste here, any more than you'd want
to rename Result to Project without changing its existing functionality.

regards, tom lane




Re: [HACKERS] PSQL commands: \quit_if, \quit_unless

2017-01-17 Thread Robert Haas
On Sat, Jan 14, 2017 at 12:22 AM, Tom Lane  wrote:
>
> $ cat loop.sql
> \if :x < 1000
>   \echo :x
>   \set x :x + 1
>   \include loop.sql
> \fi
> $ psql --set x=0 -f loop.sql
>
> Somebody is going to think of that workaround for not having loops, and
> then whine about how psql runs out of file descriptors and/or stack.

Hmm, I think somebody just DID think of it.

But personally this doesn't upset me a bit.  If somebody complains
about that particular thing, I think that would be an excellent time
to suggest that they write a patch to add a looping construct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-17 Thread Robert Haas
On Tue, Jan 17, 2017 at 1:18 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> I'd not have gone for SetResult if we didn't already have Result.  I'm
>> not super happy ending up having Project in ProjectSet but not in the
>> Result that ends up doing the majority of the projection.  But eh, we can
>> live with it.
>
> Using Result for two completely different things is a wart though.  If we
> had it to do over I think we'd define Result as a scan node that produces
> rows from no input, and create a separate Project node for the case of
> projecting from input tuples.  People are used to seeing Result in EXPLAIN
> output, so it's not worth the trouble of changing that IMO, but we don't
> have to use it as a model for more node types.

+1, although I think changing the existing node would be fine too if
somebody wanted to do the work.  It's not worth having that wart
forever just to avoid whatever minor pain-of-adjustment might be
involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-17 Thread Tom Lane
Andres Freund  writes:
> I'd not have gone for SetResult if we didn't already have Result.  I'm
> not super happy ending up having Project in ProjectSet but not in the
> Result that ends up doing the majority of the projection.  But eh, we can
> live with it.

Using Result for two completely different things is a wart though.  If we
had it to do over I think we'd define Result as a scan node that produces
rows from no input, and create a separate Project node for the case of
projecting from input tuples.  People are used to seeing Result in EXPLAIN
output, so it's not worth the trouble of changing that IMO, but we don't
have to use it as a model for more node types.

regards, tom lane




Re: [HACKERS] patch: function xmltable

2017-01-17 Thread Pavel Stehule
2017-01-16 23:51 GMT+01:00 Alvaro Herrera :

> Given
> https://www.postgresql.org/message-id/20170116210019.
> a3glfwspg5lnf...@alap3.anarazel.de
> which is going to heavily change how the executor works in this area, I
> am returning this patch to you again.  I would like a few rather minor
> changes:
>
> 1. to_xmlstr can be replaced with calls to xmlCharStrdup.
> 2. don't need xml_xmlnodetostr either -- just use xml_xmlnodetoxmltype
>(which returns text*) and extract the cstring from the varlena.  It's
>a bit more wasteful in terms of cycles, but I don't think we care.
>If we do care, change the function so that it returns cstring, and
>have the callers that want text wrap it in cstring_to_text.
> 3. have a new perValueCxt memcxt in TableExprState, child of buildercxt,
>and switch to it just before GetValue() (reset it just before
>switching).  Then, don't worry about leaks in GetValue.  This way,
>the text* conversions et al don't matter.
>
> After that I think we're going to need to get this working on top of
> Andres' changes.  Which I'm afraid is going to be rather major surgery,
> but I haven't looked.
>

I'll try to clean up the xml part first, and then I can reflect the SRF
changes. I am not sure I understand all your proposed changes here; I have
to look into them.

Regards

Pavel

>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-17 Thread Gilles Darold
Le 17/01/2017 à 03:22, Michael Paquier a écrit :
> On Tue, Jan 17, 2017 at 2:21 AM, Karl O. Pinc  wrote:
>> On January 15, 2017 11:47:51 PM CST, Michael Paquier 
>>  wrote:
>>> On Mon, Jan 16, 2017 at 1:14 PM, Karl O. Pinc  wrote:
>>> With all those issues fixed, I finish with the attached, that I am
>>> fine to pass down to a committer. I still think that this should be
>>> only one function using a SRF with rows shaped as (type, path) for
>>> simplicity, but as I am visibly outnumbered I won't insist further.
>> That also makes a certain amount of sense to me but I can't say I have 
>> thought deeply about it. Haven't paid any attention to this issue and am 
>> fine letting things move forward as is.
> Gilles, what's your opinion here? As the author that's your call at
> the end. The point here would be to change
> pg_current_logfiles() to something like this:
> =# SELECT * FROM pg_current_logfiles();
>  method |         file
> --------+-----------------------
>  stderr | pg_log/postgresql.log
>  csvlog | pg_log/postgresql.csv
> And using this SRF users can filter the method with a WHERE clause.
> And as a result the 1-arg version is removed. No rows are returned if
> current_logfiles does not exist. I don't think there is much need for
> a system view either.

This has already been discussed previously in this thread; one of my
previous patch versions implemented this behavior, but we decided that
what we really want is to be able to use the function with the
following simple query:

SELECT pg_read_file(pg_current_logfiles());

and not something like

SELECT pg_read_file(SELECT file FROM pg_current_logfiles() LIMIT 1);
or
SELECT pg_read_file(SELECT file FROM pg_current_logfiles() WHERE
method='stderr');

You can also obtain the "kind" of output from the SRF function using:

SELECT pg_read_file('current_logfiles');


>>> Also, I would rather see an ERROR returned to the user in case of bad
>>> data in current_logfiles, I did not change that either as that's the
>>> original intention of Gilles.
>> I could be wrong but I seem to recall that Robert recommended against an 
>> error message. If there is bad data then something is really wrong, up to 
>> some sort of an attack on the back end. Because this sort of thing simply 
>> shouldn't happen it's enough in my opinion to guard against buffer overruns 
>> etc and just get on with life. If something goes unexpectedly and badly 
>> wrong with internal data structures in general there would have to be all 
>> kinds of additional code to cover every possible problem and that doesn't 
>> seem to make sense.
> Hm... OK. At the same time not throwing at least a WARNING is
> confusing, because a NULL result is returned as well even if the input
> method is incorrect and even if the method is correct but it is not
> present in current_logfiles. As the user is considered a trusted user
> since they have access to this function, I would think that being verbose on
> the error handling, or at least warnings, would make things easier to
> analyze.

I'm not against adding a warning or error message here even if it may
never occur, but we need a new error code as it seems to me that no
existing error code can be used.  


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org





Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-17 Thread Andres Freund
Hi,

On 2017-01-17 12:52:20 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > On Mon, Jan 16, 2017 at 2:13 PM, Tom Lane  wrote:
> >> "Srf" is ugly as can be, and unintelligible.  SetResult might be OK.
>
> > The operation we're performing here, IIUC, is projection.  SetResult
> > lacks a verb, although Set could be confused with one; someone might
> > think this is the node that sets a result, whatever that means.
> > Anyway, I suggest working Project in there somehow.  If Project by
> > itself seems like it's too generic, perhaps ProjectSet or
> > ProjectSetResult would be suitable.

I'd not have gone for SetResult if we didn't already have Result.  I'm
not super happy ending up having Project in ProjectSet but not in the
Result that ends up doing the majority of the projection.  But eh, we can
live with it.


> Andres' patch is already using "SetProjectionPath" for the path struct
> type.  Maybe make that "ProjectSetPath", giving rise to a "ProjectSet"
> plan node?

WFM.


> I'm happy to do a global-search-and-replace while I'm reviewing the
> patch, but let's decide on names PDQ.

Yes, let's decide soon please.


Greeting,

Andres




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-17 Thread Robert Haas
On Tue, Jan 17, 2017 at 12:52 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Jan 16, 2017 at 2:13 PM, Tom Lane  wrote:
>>> "Srf" is ugly as can be, and unintelligible.  SetResult might be OK.
>
>> The operation we're performing here, IIUC, is projection.  SetResult
>> lacks a verb, although Set could be confused with one; someone might
>> think this is the node that sets a result, whatever that means.
>> Anyway, I suggest working Project in there somehow.  If Project by
>> itself seems like it's too generic, perhaps ProjectSet or
>> ProjectSetResult would be suitable.
>
> Andres' patch is already using "SetProjectionPath" for the path struct
> type.  Maybe make that "ProjectSetPath", giving rise to a "ProjectSet"
> plan node?

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Index Scans

2017-01-17 Thread Robert Haas
On Mon, Jan 16, 2017 at 7:11 AM, Amit Kapila  wrote:
> Fixed.

Thanks for the update.  Some more comments:

It shouldn't be necessary for MultiExecBitmapIndexScan to modify the
IndexScanDesc.  That seems really broken.  If a parallel scan isn't
supported here (and I'm sure that's the case right now) then no such
IndexScanDesc should be getting created.

WAIT_EVENT_PARALLEL_INDEX_SCAN is in fact btree-specific.  There's no
guarantee that any other AMs that implement parallel index scans will
use that wait event, and they might have others instead.  I would make
it a lot more specific, like WAIT_EVENT_BTREE_PAGENUMBER.  (Waiting
for the page number needed to continue a parallel btree scan to become
available.)

Why do we need separate AM support for index_parallelrescan()?  Why
can't index_rescan() cover that case?  If the AM-specific method is
getting the IndexScanDesc, it can see for itself whether it is a
parallel scan.

I'd rename PS_State to BTPS_State, to match the other renamings.

If we're going to update all of the AMs to set the new support
functions to NULL, we should also update contrib/bloom.

index_parallelscan_estimate has multiple lines that go over 80
characters for no really good reason.  Separate the initialization of
index_scan from the variable declaration.  Do the same for
amindex_size.  Also, you don't really need to encase the end of the
function in an "else" block when the "if" block is guaranteed to
return.

Several function header comments still use the style where the first
word of the description is "It".  Say "this function" or similar the
first time, instead of "it". Then when you say "it" later, it's clear
that it refers back to where you said "this function".

index_parallelscan_initialize also has a line more than 80 characters
that looks easy to fix by splitting the declaration from the
initialization.
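
(Generically, that is — for example, in the pre-existing code style, something
like the sketch below rather than one over-long declaration-with-initializer;
the variable and argument names here are placeholders, not the patch code:

    IndexScanDesc scan;

    scan = index_beginscan_internal(indexrel, nkeys, norderbys, snapshot);
)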

I think it's a bad idea to add a ParallelIndexScanDesc argument to
index_beginscan().  That function is used in lots of places, and
somebody might think that they are allowed to actually pass a non-NULL
value there, which they aren't: they must go through
index_beginscan_parallel.  I think that the additional argument should
only be added to index_beginscan_internal, and
index_beginscan_parallel should remain unchanged.  Either that, or get
rid of index_beginscan_parallel altogether and have everyone use
index_beginscan directly, and put the snapshot-restore logic in that
function.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-17 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jan 16, 2017 at 2:13 PM, Tom Lane  wrote:
>> "Srf" is ugly as can be, and unintelligible.  SetResult might be OK.

> The operation we're performing here, IIUC, is projection.  SetResult
> lacks a verb, although Set could be confused with one; someone might
> think this is the node that sets a result, whatever that means.
> Anyway, I suggest working Project in there somehow.  If Project by
> itself seems like it's too generic, perhaps ProjectSet or
> ProjectSetResult would be suitable.

Andres' patch is already using "SetProjectionPath" for the path struct
type.  Maybe make that "ProjectSetPath", giving rise to a "ProjectSet"
plan node?

I'm happy to do a global-search-and-replace while I'm reviewing the
patch, but let's decide on names PDQ.

regards, tom lane




Re: [HACKERS] Improving RLS planning

2017-01-17 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Here's an updated version of the RLS planning patch that gets rid of
> >> the incorrect interaction with Query.hasRowSecurity and adjusts
> >> terminology as agreed.
> 
> > I've spent a fair bit of time going over this change to understand it,
> > how it works, and how it changes the way RLS and security barrier views
> > work.
> 
> Thanks for the review.  Attached is an updated patch that I believe
> addresses all of the review comments so far.  The code is unchanged from
> v2, but I improved the README, some comments, and the regression tests.

I've reviewed your updates and they answer all of my comments and I
appreciate the EC regression tests you added.

I also agree with Dean's down-thread suggested regression test change.

Thanks!

Stephen




Re: [HACKERS] Packages: Again

2017-01-17 Thread Robert Haas
On Fri, Jan 13, 2017 at 7:24 PM, Peter Geoghegan  wrote:
> MERGE isn't UPSERT, and isn't even in competition with UPSERT as a
> feature. I've written reams of text explaining why this is so in
> precise detail, ...

No matter how much text you write, I doubt that I will ever believe
that statement.

Leaving that aside, Serge is entirely right about the underlying
principle.  We extend the standard all the time.  The only policy we
have is that we try pretty hard not to adopt standard syntax with
non-standard semantics.  If we pick our own semantics we pick our own
syntax, too, so as not to get in the way of later attempts to
implement the standard syntax with the standard semantics.

Of course, the other principle here is that the people who write the
code get a large share in deciding what ultimately happens.  If
somebody submits an actual patch in this area, fine.  If this is just
an argument about what would be better for someone else to implement,
it's mostly a waste of time.  Anybody who does work in this area is
likely to have their own strong opinion on what should be implemented
- and they are likely to implement precisely that thing.  That doesn't
guarantee acceptance, but it gets you further than bikeshedding from
the sidelines.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] move collation import to backend

2017-01-17 Thread Peter Eisentraut
On 1/9/17 10:04 PM, Euler Taveira wrote:
> On 18-12-2016 18:30, Peter Eisentraut wrote:
>> Updated patch with that fix.
>>
> Peter, I reviewed and improved your patch.
> 
> * I document the new function. Since collation is a database object, I
> chose "Database Object Management Functions" section.

OK

> * I've added a check to any-encoding database because I got 'FATAL:
> collation "C" already exists' on Debian 8, although, I didn't get on
> CentOS 7. The problem is that Debian has two locales for C (C and
> C.UTF-8) and CentOS has just one (C).

OK

> * I've added OidIsValid to test the new collation row.

OK

> * I've changed the parameter order. Schema seems more important than
> if_not_exists. Also, we generally leave those boolean parameters for the
> end of the list. I didn't make if_not_exists optional but IMO it would be a
> good idea (default = true).

I put them that way because in an SQL command the "IF NOT EXISTS" comes
before the schema, but I can see how it is weird that way in a function.

> * You removed some #if and #ifdef while moving things around. I put it back.
> * You didn't pgindent some lines of code, but I'm sure that was to keep
> the patch footprint small.

I had left the #if in initdb, but I think your changes are better.

> I'm attaching the complete and also a patch at the top of your last patch.

Thanks.  If there are no more comments, I will proceed with that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-17 Thread Robert Haas
On Mon, Jan 16, 2017 at 2:13 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> That worked quite well.  So we have a few questions, before I clean this
>> up:
>
>> - For now the node is named 'Srf' both internally and in explain - not
>>   sure if we want to make that something longer/easier to understand for
>>   others? Proposals? TargetFunctionScan? SetResult?
>
> "Srf" is ugly as can be, and unintelligible.  SetResult might be OK.

The operation we're performing here, IIUC, is projection.  SetResult
lacks a verb, although Set could be confused with one; someone might
think this is the node that sets a result, whatever that means.
Anyway, I suggest working Project in there somehow.  If Project by
itself seems like it's too generic, perhaps ProjectSet or
ProjectSetResult would be suitable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash support for grouping sets

2017-01-17 Thread Robert Haas
On Mon, Jan 16, 2017 at 10:59 AM, Finnerty, Jim  wrote:
> The ability to exploit hashed aggregation within sorted groups, when the 
> order of the input stream can be exploited this way, is potentially a useful 
> way to improve aggregation performance more generally.  This would 
> potentially be beneficial when the input size is expected to be larger than 
> the amount of working memory available for hashed aggregation, but where 
> there is enough memory to hash-aggregate just the unsorted grouping key 
> combinations, and when the cumulative cost of rebuilding the hash table for 
> each sorted subgroup is less than the cost of sorting the entire input.  In 
> other words, if most of the grouping key combinations are already segregated 
> by virtue of the input order, then hashing the remaining combinations within 
> each sorted group might be done in memory, at the cost of rebuilding the hash 
> table for each sorted subgroup.

Neat idea.

> I haven’t looked at the code for this change yet (I hope I will have the time 
> to do that).  Ideally the decision to choose the aggregation method as 
> sorted, hashed, or mixed hash/sort should be integrated into the cost model, 
> but given the notorious difficulty of estimating intermediate cardinalities 
> accurately it would be difficult to develop a cardinality model and a cost 
> model accurate enough to choose among these options consistently well.

Yes, that might be a little tricky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2017-01-17 Thread Robert Haas
On Mon, Jan 16, 2017 at 4:09 AM, Amit Langote
 wrote:
> The problem is that whereas the SlotValueDescription that we build to show
> in the error message should be based on the tuple that was passed to
> ExecInsert() or whatever NextCopyFrom() returned for CopyFrom() to
> process, it might fail to be the case if the tuple was needed to be
> converted after tuple routing.  slot (the tuple in it and its tuple
> descriptor) and resultRelInfo that ExecConstraint() receives *do*
> correspond with each other, even after possible tuple conversion following
> tuple-routing, and hence constraint checking itself works fine (since
> commit 2ac3ef7a01 [1]).  As said, it's the val_desc built to show in the
> error message being based on the converted-for-partition tuple that could
> be seen as a problem - is it acceptable if we showed in the error message
> whatever the converted-for-partition tuple looks like which might have
> columns ordered differently from the root table?  If so, we could simply
> forget the whole thing, including reverting f1b4c771 [2].
>
> An example:
>
> create table p (a int, b char, c int) partition by list (a);
> create table p1 (b char, c int, a int);  -- note the column order
> alter table p attach partition p1 for values in (1);
> alter table p add constraint check_b check (b = 'x');
>
> insert into p values (1, 'y', 1);
> ERROR:  new row for relation "p1" violates check constraint "check_b"
> DETAIL:  Failing row contains (y, 1, 1).
>
> Note that "(y, 1, 1)" results from using p1's descriptor on the converted
> tuple.  As long that's clear and acceptable, I think we need not worry
> about this patch and revert the previously committed patch for this "problem".

Hmm.  It would be fine, IMHO, if the detail message looked like the
one that BuildIndexValueDescription produces.  Without the column
names, the clarity is somewhat lessened.
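
(That is, for the example above, presumably something along the lines of

    DETAIL:  Failing row contains (b, c, a)=(y, 1, 1).

so the partition's column order is still visible but no longer ambiguous.)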

Anybody else have an opinion on this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Fix cardinality estimates for parallel joins.

2017-01-17 Thread Robert Haas
On Mon, Jan 16, 2017 at 7:23 AM, Amit Kapila  wrote:
> On Sat, Jan 14, 2017 at 12:07 AM, Robert Haas  wrote:
>> Fix cardinality estimates for parallel joins.
>>
>
> +   /*
> +    * In the case of a parallel plan, the row count needs to represent
> +    * the number of tuples processed per worker.
> +    */
> +   path->rows = clamp_row_est(path->rows / parallel_divisor);
> }
>
> path->startup_cost = startup_cost;
> @@ -2014,6 +1996,10 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
> else
> path->path.rows = path->path.parent->rows;
>
> +   /* For partial paths, scale row estimate. */
> +   if (path->path.parallel_workers > 0)
> +   path->path.rows /= get_parallel_divisor(&path->path);
>
>
> Isn't it better to call clamp_row_est in join costing functions as we
> are doing in cost_seqscan()?  Is there a reason to keep those
> different?

No, those should probably be changed to match.
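
(I.e., presumably something like

    path->path.rows = clamp_row_est(path->path.rows /
                                    get_parallel_divisor(&path->path));

in the join costing functions, mirroring what cost_seqscan() does.)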

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-17 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 1/13/17 10:18 AM, Stephen Frost wrote:
> > Certainly, check_postgres is going to have to be changed to address this
> > and, unsurprisingly, it's already had to address a variety of major
> > version differences that have been introduced over the years.
> 
> check_postgres will not need to be changed except for the actions that
> check the disk, which you don't need unless you are using archiving.

That isn't really the point though, is it?  check_postgres will need to
be changed because there are actions which check the pg_xlog directory.
That'll cause a new release, which will be the "release that works with
PG10."

Perhaps if you're following along with -hackers and know how
check_postgres works then you'll realize that you might not *have* to
upgrade your check_postgres installation if only the directory is
changed and nothing else is, but I've got a pretty hard time seeing that
as a very common user use-case.

The implication here seems to be that because the older version of
check_postgres might appear to continue working for *some* set of
actions (but not all) that we should encourage users to keep using that
older version with PG10.  That doesn't make any sense to me and I
certainly don't agree with it.

Thanks!

Stephen




Re: [HACKERS] Logical Replication WIP

2017-01-17 Thread Petr Jelinek
On 17/01/17 17:11, Peter Eisentraut wrote:
> On 1/15/17 1:48 PM, Petr Jelinek wrote:
>> It's meant to decouple the synchronous commit setting for logical
>> replication workers from the one set for normal clients. Now that we
>> have owners for subscription and subscription runs as that owner, maybe
>> we could do that via ALTER USER.
> 
> I was thinking about that as well.
> 
>> However I think the apply should by
>> default run with sync commit turned off as the performance benefits are
>> important there given that there is one worker that has to replicate in
>> serialized manner and the success of replication is not confirmed by
>> responding to COMMIT but by reporting LSNs of various replication stages.
> 
> Hmm, I don't think we should ship with an "unsafe" default.  Do we have
> any measurements of the performance impact?
> 

I will have to do some for the patch specifically; I only have numbers for
pglogical/BDR, where it's quite significant.

The default is not really unsafe; we still report the correct flush position
to the publisher. The synchronous replication on the publisher will still
work even if the synchronous standby is a subscription which itself has sync
commit off (that's why the complicated
send_feedback()/get_flush_position()), but it will have higher latency as
flushes don't happen immediately. Cascading should be fine as well, even
around crashes, as logical decoding only picks up flushed WAL.

It could however be argued that there may be some consistency issues around
a crash, as other transactions could have already seen data that
disappeared after postgres recovery and then reappeared again when the
replication caught up. That might indeed be a show-stopper for making
off the default.
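
For reference, if the ALTER USER route mentioned earlier were taken, it would
presumably look something like this (role name is hypothetical), since the
apply worker runs as the subscription owner:

    ALTER ROLE subscription_owner SET synchronous_commit = off;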

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PoC: Grouped base relation

2017-01-17 Thread Antonin Houska
Ashutosh Bapat  wrote:
> [ ... snip ]
>
> This all works well, as long as the aggregate is "summing" something
> across rows. The method doesn't work when aggregation is say
> "multiplying" across the rows or "concatenating" across the rows like
> array_agg() or string_agg(). They need a different strategy to combine
> aggregates across relations.

Good point. The common characteristic of these seems to be that they don't
have aggcombinefn defined.
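
(That is easy to check from the catalog; for example:

    SELECT aggfnoid FROM pg_aggregate WHERE aggcombinefn = 0;

lists the aggregates that currently have no combine function.)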

> IIUC, we are trying to solve multiple problems here:

> 1. Pushing down aggregates/groups down join tree, so that the number of rows
> to be joined decreases.  This might be a good optimization to have. However
> there are problems in the current patch. Every path built for a relation
> (join or base) returns the same result expressed by the relation or its
> subset restricted by parameterization or unification. But this patch changes
> that. It creates paths which represent grouping in the base relation.  I
> think, we need a separate relation to represent that result and hold paths
> which produce that result. That itself would be a sizable patch.

Whether a separate relation (RelOptInfo) should be created for a grouped
relation is an important design decision indeed. More important than your
argument about the same result (a "partial path", used to implement parallel
nodes, actually does not fit this criterion perfectly - it only returns part
of the set) is the fact that the data type (target) differs.

I even spent some time coding a prototype where separate RelOptInfo is created
for the grouped relation but it was much more invasive. In particular, if only
some relations are grouped, it's hard to join them with non-grouped ones w/o
changing make_rel_from_joinlist and subroutines substantially. (Decision
whether the plain or the grouped relation should be involved in joining makes
little sense at the leaf level of the join tree.)

So I took the approach that resembles the partial paths - separate pathlists
within the same RelOptInfo.

> 2. Try to push down aggregates based on the equivalence classes, where
> grouping properties can be transferred from one relation to the other using
> EC mechanism.

I don't think the EC part should increase the patch complexity a lot. Unless I
missed something, it's rather isolated to the part where target of the grouped
paths is assembled. And I think it's important even for initial version of the
patch.

> This seems to require solving the problem of combining aggregates across the
> relations. But there might be some usecases which could benefit without
> solving this problem.

If "combining aggregates ..." refers to joining grouped relations, then I
insist on doing this in the initial version of the new feature too. Otherwise
it'd only work if exactly one base relation of the query is grouped.

> 3. If the relation to which we push the aggregate is an append relation,
> push (partial) aggregation/grouping down into the child relations. - We
> don't do that right now even for grouping aggregation on a single append
> table. Parallel partial aggregation does that, but not exactly per
> relation. That may be a sizable project in itself.  Even without this piece
> the rest of the optimizations proposed by this patch are important.

Yes, this can be done in a separate patch. I'll consider it.

> 4. Additional goal: push down the aggregation to any relation (join/base)
> where it can be computed.

I think this can be achieved by adding extra aggregation nodes to the join
tree. As I still anticipate more important design changes, this part is not at
the top of my TODO list.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-17 Thread Peter Eisentraut
On 1/13/17 10:18 AM, Stephen Frost wrote:
The point I was making was that serious monitoring systems would have to
be changed, and I stand by that.

I don't think my monitoring systems are any less serious than yours.

> Certainly, check_postgres is going to have to be changed to address this
> and, unsurprisingly, it's already had to address a variety of major
> version differences that have been introduced over the years.

check_postgres will not need to be changed except for the actions that
check the disk, which you don't need unless you are using archiving.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-17 Thread Peter Eisentraut
On 1/15/17 11:40 PM, Fujii Masao wrote:
> This change may confuse users who run "pg_ctl start" to perform crash
> recovery or archive recovery, or to start a standby server (with
> hot_standby=off), because "pg_ctl start" would not return for a long time.

Well, this change was made because the previous behavior confused people
as well, because pg_ctl would return before it was actually done.

The new state shouldn't be confusing because pg_ctl prints out progress
messages.

> Also during that long time,
> the error message "FATAL:  the database system is starting up" keeps
> being output.

We could potentially avoid some of this by using the new facilities in
pg_ctl to read pg_control and not use PQping() before the state is out
of DB_IN_CRASH_RECOVERY.

Note, however, that this isn't a new problem.  The way pg_ctl start -w
is implemented hasn't changed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] New CORRESPONDING clause design

2017-01-17 Thread Tom Lane
Surafel Temsgen  writes:
>  My design is
>  *In parsing stage*
> 1. Check that at least one common column name appears in the queries
> 2. If the corresponding column list is not specified, then make the
> corresponding list from the common column names in the queries' target
> lists, in the order that those column names appear in the first query
> 3. If the corresponding column list is specified, then check that every
> column name in the corresponding column list appears in the column names
> of both queries
> *In planner*
> 1. Take the projection column names from the corresponding list
> 2. Reorder the left and right queries' target lists according to the
> corresponding list
> 3. For each query, project as many columns as the number of corresponding
> columns.

FWIW, I think you need to do most of that work in the parser, not the
planner.  The parser certainly has to determine the actual output column
names and types of the setop construct, and we usually consider that the
parsing stage is expected to fully determine the semantics of the query.
So I'd envision the parser as emitting some explicit representation of
the column mapping for each side, rather than expecting the planner to
determine for itself what maps to what.

It probably does make sense for the actual implementation to involve
inserting projection nodes below the setop.  I'm pretty sure the executor
code for setops expects to just pass input tuples through without
projection.  You could imagine changing that, but it would add a penalty
for non-CORRESPONDING setops, which IMO would be the wrong tradeoff.

regards, tom lane




Re: [HACKERS] Logical Replication WIP

2017-01-17 Thread Petr Jelinek
On 17/01/17 17:09, Peter Eisentraut wrote:
> 
>> Yes, that will need some discussion about corner case behaviour. For
>> example, have partitioned table 'foo' which is in publication, then you
>> have table 'bar' which is not in publication, you attach it to the
>> partitioned table 'foo', should it automatically be added to
>> publication? Then you detach it, should it then be removed from publication?
>> What if 'bar' was in publication before it was attached/detached to/from
>> 'foo'? What if 'foo' wasn't in publication but 'bar' was? Should we
>> allow ONLY syntax for partitioned table when they are being added and
>> removed?
> 
> Let's think about that in a separate thread.
> 

Agreed.

>>> reread_subscription() complains if the subscription name was changed.
>>> I don't know why that is a problem.
>>
>> Because we don't have ALTER SUBSCRIPTION RENAME currently. Maybe should
>> be Assert?
> 
> Is there anything stopping anyone from implementing it?
> 

No, just didn't seem priority for the functionality right now.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical Replication WIP

2017-01-17 Thread Peter Eisentraut
On 1/15/17 1:48 PM, Petr Jelinek wrote:
> It's meant to decouple the synchronous commit setting for logical
> replication workers from the one set for normal clients. Now that we
> have owners for subscription and subscription runs as that owner, maybe
> we could do that via ALTER USER.

I was thinking about that as well.

> However I think the apply should by
> default run with sync commit turned off as the performance benefits are
> important there given that there is one worker that has to replicate in
> serialized manner and the success of replication is not confirmed by
> responding to COMMIT but by reporting LSNs of various replication stages.

Hmm, I don't think we should ship with an "unsafe" default.  Do we have
any measurements of the performance impact?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical Replication WIP

2017-01-17 Thread Peter Eisentraut
On 1/15/17 5:20 PM, Petr Jelinek wrote:
> Well, it's 4 because max_worker_processes is 8, I think default
> max_worker_processes should be higher than
> max_logical_replication_workers so that's why I picked 4. If we are okay
> with bumping the max_worker_processes a bit, I am all for increasing
> max_logical_replication_workers as well.
> 
> The quick setup mentions 10 mainly for consistency with slots and wal
> senders (those IMHO should also not be 0 by default at this point...).

Those defaults have now been changed, so the "Quick setup" section could
potentially be simplified a bit.

> I did this somewhat differently, with struct that defines options and
> has different union members for physical and logical replication. What
> do you think of that?

Looks good.

>>  Not sure about
>> worker_internal.h.  Maybe rename apply.c to worker.c?
>>
> 
> Hmm I did that, seems reasonably okay. Original patch in fact had both
> worker.c and apply.c and I eventually moved the worker.c functions to
> either apply.c or launcher.c.

I'm not too worried about this.

> Yes, that will need some discussion about corner case behaviour. For
> example, have partitioned table 'foo' which is in publication, then you
> have table 'bar' which is not in publication, you attach it to the
> partitioned table 'foo', should it automatically be added to
> publication? Then you detach it, should it then be removed from publication?
> What if 'bar' was in publication before it was attached/detached to/from
> 'foo'? What if 'foo' wasn't in publication but 'bar' was? Should we
> allow ONLY syntax for partitioned table when they are being added and
> removed?

Let's think about that in a separate thread.

>> reread_subscription() complains if the subscription name was changed.
>> I don't know why that is a problem.
> 
> Because we don't have ALTER SUBSCRIPTION RENAME currently. Maybe should
> be Assert?

Is there anything stopping anyone from implementing it?


I'm happy with these patches now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical Replication WIP

2017-01-17 Thread Peter Eisentraut
On 1/15/17 2:28 PM, Petr Jelinek wrote:
> Well the preinstalled information_schema is excluded by the
> FirstNormalObjectId filter as it's created by initdb. If user drops and
> recreates it that means it was created as user object.
> 
> My opinion is that FOR ALL TABLES should replicate all user tables (ie,
> anything that has Oid >= FirstNormalObjectId), if those are added to
> information_schema that's up to user. We also replicate user created
> tables in pg_catalog even if it's system catalog so I don't see why
> information_schema should be filtered on schema level.

Fair enough.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] New CORRESPONDING clause design

2017-01-17 Thread David Fetter
On Tue, Jan 17, 2017 at 08:20:25AM -0600, Merlin Moncure wrote:
> On Tue, Jan 17, 2017 at 12:37 AM, Surafel Temsgen  
> wrote:
> > I am new here and I really want to contribute. I have read some resources
> > that help in understanding database systems and PostgreSQL. I would like
> > to start by implementing the SQL syntax CORRESPONDING BY, because I
> > believe implementing SQL syntax gives an opportunity to become familiar
> > with many parts of the PostgreSQL source code. A previous implementation
> > is here and has an issue with EXPLAIN queries and breaks on unlabeled
> > NULLs.
> > To repeat what a CORRESPONDING BY clause means:
> > a CORRESPONDING clause either contains a BY(...) clause or not. If it
> > doesn't have a BY(...) clause the usage is as follows.
> 
> This is great stuff. Does the syntax only apply to UNION?  I would
> imagine it would also apply to INTERSECT/EXCEPT?  What about UNION
> ALL?

My draft working standard from 2011 says in 7IWD-02-Foundation section 7.13
<query expression>:

a) If CORRESPONDING is specified, then:
i) Within the columns of T1, equivalent <column name>s shall not be specified
more than once and within the columns of T2, equivalent <column name>s shall
not be specified more than once.
ii) At least one column of T1 shall have a <column name> that is the
<column name> of some column of T2.
iii) Case:
1) If <corresponding column list> is not specified, then let SL be a
<select list> of those <column name>s that are <column name>s of both T1 and
T2 in the order that those <column name>s appear in T1.
2) If <corresponding column list> is specified, then let SL be a <select list>
of those <column name>s explicitly appearing in the <corresponding column
list> in the order that these <column name>s appear in the <corresponding
column list>. Every <column name> in the <corresponding column list> shall be
a <column name> of both T1 and T2.
iv) The <query expression body> or <query term> is equivalent to:
( SELECT SL FROM TN1 ) OP ( SELECT SL FROM TN2 )
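
A concrete reading of iv), with made-up table names: given t1(a, b, c) and
t2(c, a, d),

    SELECT * FROM t1 UNION CORRESPONDING SELECT * FROM t2;

would be equivalent to

    ( SELECT a, c FROM t1 ) UNION ( SELECT a, c FROM t2 );

i.e. SL is the set of common column names in the order they appear in t1.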

Earlier, it defines ( UNION | EXCEPT ) [ ALL | DISTINCT ] to have a
meaning in this context, to wit:

 ::=

|  UNION [ ALL | DISTINCT ]
[  ] 
|  EXCEPT [ ALL | DISTINCT ]
[  ] 
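
To make rule iv concrete, here is a sketch of the rewrite it describes (my
own example, not from the standard; CORRESPONDING is not implemented in
PostgreSQL today, so only the rewritten form actually runs):

SELECT 1 AS a, 2 AS b, 3 AS c
UNION CORRESPONDING
SELECT 4 AS b, 5 AS d, 6 AS c;

-- SL is (b, c): the column names common to both sides, in the order they
-- appear in the first query.  Per rule iv this is equivalent to:

SELECT b, c FROM (SELECT 1 AS a, 2 AS b, 3 AS c) AS t1
UNION
SELECT b, c FROM (SELECT 4 AS b, 5 AS d, 6 AS c) AS t2;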

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] GSoC 2017

2017-01-17 Thread Anastasia Lubennikova

I'm ready to be a mentor.

10.01.2017 12:53, Alexander Korotkov:

Hi all!

In 2016 the PostgreSQL project wasn't accepted into the GSoC program.  In my 
understanding, the reasons for that are the following.


1. We did a last-minute submission of our application to GSoC.
2. In 2016 the GSoC application form for mentoring organizations was 
changed.  In particular, it required more detailed information about 
possible projects.


As a result, we didn't manage to make a good enough application that 
time.  Thus, our application was declined. See [1] and [2] for details.


I think the right way to manage this in 2017 is to start 
collecting the required information in advance. According to the GSoC 2017 
timeline [3], mentoring organizations can submit their applications from 
January 19 to February 9. So now is a good time to start 
collecting project ideas and to make a call for mentors.  Also, we need to 
decide who will be our admin this year.


In sum, we have the following questions:
1. What project ideas do we have?
2. Who is going to be a mentor this year?
3. Who is going to be the project admin this year?

BTW, I'm ready to be a mentor this year.  I'm also open to being an admin 
if needed.


[1] 
https://www.postgresql.org/message-id/flat/CAA-aLv4p1jfuMpsRaY2jDUQqypkEXUxeb7z8Mp-0mW6M03St7A%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/CALxAEPuGpAjBSN-PTuxHfuLLqDS47BEbO_ZYxUYQR3ud1nwbww%40mail.gmail.com

[3] https://developers.google.com/open-source/gsoc/timeline

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] ISO/IEC 9075-2:2016 for postgres community

2017-01-17 Thread Oleg Bartunov
On Tue, Jan 17, 2017 at 6:26 PM, Alvaro Herrera 
wrote:

> Oleg Bartunov wrote:
> > Hi there,
> >
> > I just bought ISO/IEC 9075-2:2016
> > http://www.iso.org/iso/home/store/catalogue_tc/catalogue_
> detail.htm?csnumber=63556
> > to satisfy my interest on json support in SQL.  I am not ready to discuss
> > here implementation details, but there is one problem with the status of
> > this document, which is copyright protected.
> >
> > "All rights reserved. Unless otherwise specified, no part of this
> > publication may be reproduced or utilized otherwise in any form or by any
> > means, electronic or mechanical, including photocopying, or posting on
> the
> > internet or an intranet, without prior written permission. Permission can
> > be requested from either ISO at the address below or ISO’s member body in
> > the country of the requester."
> >
> > My question is should we ask ISO for permission to use my copy in our
> > community or  should we buy at least one another copy (for reviewer) ?
>
> With previous versions, we have relied on draft versions
> "indistinguishable from latest" posted by third parties.  I couldn't
> find such a version for sql2016 in an (admittedly very quick) search.
>

I found several mentions of such a draft, but failed to find one.  I
think we can wait until we have a prototype or somebody shares a link.


> I doubt ISO would give the community permission to use your copy, but it
> wouldn't hurt to ask them.
>

> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] ISO/IEC 9075-2:2016 for postgres community

2017-01-17 Thread Alvaro Herrera
Oleg Bartunov wrote:
> Hi there,
> 
> I just bought ISO/IEC 9075-2:2016
> http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=63556
> to satisfy my interest on json support in SQL.  I am not ready to discuss
> here implementation details, but there is one problem with the status of
> this document, which is copyright protected.
> 
> "All rights reserved. Unless otherwise specified, no part of this
> publication may be reproduced or utilized otherwise in any form or by any
> means, electronic or mechanical, including photocopying, or posting on the
> internet or an intranet, without prior written permission. Permission can
> be requested from either ISO at the address below or ISO’s member body in
> the country of the requester."
> 
> My question is should we ask ISO for permission to use my copy in our
> community or  should we buy at least one another copy (for reviewer) ?

With previous versions, we have relied on draft versions
"indistinguishable from latest" posted by third parties.  I couldn't
find such a version for sql2016 in an (admittedly very quick) search.
I doubt ISO would give the community permission to use your copy, but it
wouldn't hurt to ask them.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-17 Thread Alvaro Herrera
Reading through the track_root_lp patch now.

> + /*
> +  * For HOT (or WARM) updated tuples, we store the offset of the 
> root
> +  * line pointer of this chain in the ip_posid field of the new 
> tuple.
> +  * Usually this information will be available in the 
> corresponding
> +  * field of the old tuple. But for aborted updates or 
> pg_upgraded
> +  * databases, we might be seeing the old-style CTID chains and 
> hence
> +  * the information must be obtained by hard way
> +  */
> + if (HeapTupleHeaderHasRootOffset(oldtup.t_data))
> + root_offnum = 
> HeapTupleHeaderGetRootOffset(oldtup.t_data);
> + else
> + heap_get_root_tuple_one(page,
> + ItemPointerGetOffsetNumber(&(oldtup.t_self)),
> + &root_offnum);

Hmm.  So the HasRootOffset tests the HEAP_LATEST_TUPLE bit, which is
reset temporarily during an update.  So that case shouldn't occur often.

Oh, I just noticed that HeapTupleHeaderSetNextCtid also clears the flag.

> @@ -4166,10 +4189,29 @@ l2:
>   HeapTupleClearHotUpdated();
>   HeapTupleClearHeapOnly(heaptup);
>   HeapTupleClearHeapOnly(newtup);
> + root_offnum = InvalidOffsetNumber;
>   }
>  
> - RelationPutHeapTuple(relation, newbuf, heaptup, false); /* 
> insert new tuple */
> + /* insert new tuple */
> + RelationPutHeapTuple(relation, newbuf, heaptup, false, root_offnum);
> + HeapTupleHeaderSetHeapLatest(heaptup->t_data);
> + HeapTupleHeaderSetHeapLatest(newtup->t_data);
>  
> + /*
> +  * Also update the in-memory copy with the root line pointer information
> +  */
> + if (OffsetNumberIsValid(root_offnum))
> + {
> + HeapTupleHeaderSetRootOffset(heaptup->t_data, root_offnum);
> + HeapTupleHeaderSetRootOffset(newtup->t_data, root_offnum);
> + }
> + else
> + {
> + HeapTupleHeaderSetRootOffset(heaptup->t_data,
> + ItemPointerGetOffsetNumber(&(heaptup->t_self)));
> + HeapTupleHeaderSetRootOffset(newtup->t_data,
> + ItemPointerGetOffsetNumber(&(heaptup->t_self)));
> + }

This is repetitive.  I think after RelationPutHeapTuple it'd be better
to assign root_offnum from heaptup->t_self, so that we can just call
SetRootOffset() on each tuple without the if().


> + HeapTupleHeaderSetHeapLatest((HeapTupleHeader) item);
> + if (OffsetNumberIsValid(root_offnum))
> + HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
> + root_offnum);
> + else
> + HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
> + offnum);

Just a matter of style, but this reads nicer IMO:

HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
OffsetNumberIsValid(root_offnum) ? root_offnum : offnum);


> @@ -740,8 +742,9 @@ heap_page_prune_execute(Buffer buffer,
>   * holds a pin on the buffer. Once pin is released, a tuple might be pruned
>   * and reused by a completely unrelated tuple.
>   */
> -void
> -heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
> +static void
> +heap_get_root_tuples_internal(Page page, OffsetNumber target_offnum,
> + OffsetNumber *root_offsets)
>  {
>   OffsetNumber offnum,

I think this function deserves more/better/updated commentary.

> @@ -439,7 +439,9 @@ rewrite_heap_tuple(RewriteState state,
>* set the ctid of this tuple to point to the new 
> location, and
>* insert it right away.
>*/
> - new_tuple->t_data->t_ctid = mapping->new_tid;
> + HeapTupleHeaderSetNextCtid(new_tuple->t_data,
> + ItemPointerGetBlockNumber(&mapping->new_tid),
> + ItemPointerGetOffsetNumber(&mapping->new_tid));

I think this would be nicer:
HeapTupleHeaderSetNextTid(new_tuple->t_data, &mapping->new_tid);
AFAICS all the callers are doing ItemPointerGetFoo for a TID, so this is
overly verbose for no reason.  Also, the "c" in Ctid stands for
"current"; I think we can omit that.

> @@ -525,7 +527,9 @@ rewrite_heap_tuple(RewriteState state,
>   new_tuple = unresolved->tuple;
>   free_new = true;
>   old_tid = unresolved->old_tid;
> - new_tuple->t_data->t_ctid = new_tid;
> + HeapTupleHeaderSetNextCtid(new_tuple->t_data,
> + ItemPointerGetBlockNumber(&new_tid),
> + ItemPointerGetOffsetNumber(&new_tid));

Did you forget to SetHeapLatest 

[HACKERS] reminder: PGCon 2017 CFP

2017-01-17 Thread Dan Langille
Hello,

There are two days left in the PGCon 2017 CFP, which closes on 19 January.
Please get your submissions in soon.

PGCon 2017 will be on 23-26 May 2017 at University of Ottawa.

* 23-24 (Tue-Wed) tutorials
* 24 (Wed) The Unconference
* 25-26 (Thu-Fri) talks - the main part of the conference

See http://www.pgcon.org/2017/ 


We are now accepting proposals for the main part of the conference (25-26 May).
Proposals can be quite simple. We do not require academic-style papers.

If you are doing something interesting with PostgreSQL, please submit
a proposal.  You might be one of the backend hackers or work on a
PostgreSQL related project and want to share your know-how with
others. You might be developing an interesting system using PostgreSQL
as the foundation. Perhaps you migrated from another database to
PostgreSQL and would like to share details.  These, and other stories
are welcome. Both users and developers are encouraged to share their
experiences.

Here are some ideas to jump-start your proposal process:

- novel ways in which PostgreSQL is used
- migration of production systems from another database
- data warehousing
- tuning PostgreSQL for different work loads
- replication and clustering
- hacking the PostgreSQL code
- PostgreSQL derivatives and forks
- applications built around PostgreSQL
- benchmarking and performance engineering
- case studies
- location-aware and mapping software with PostGIS
- The latest PostgreSQL features and features in development
- research and teaching with PostgreSQL
- things the PostgreSQL project could do better
- integrating PostgreSQL with 3rd-party software

Both users and developers are encouraged to share their experiences.

The schedule is:

1 Dec 2016 Proposal acceptance begins
19 Jan 2017 Proposal acceptance ends
19 Feb 2017 Confirmation of accepted proposals

NOTE: the call for lightning talks will go out very close to the conference.
Do not submit lightning talks proposals until then.

See also >

Instructions for submitting a proposal to PGCon 2017 are available
from: >

-- 
Dan Langille - BSDCan / PGCon
d...@langille.org 



Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.

2017-01-17 Thread Fujii Masao
On Tue, Jan 17, 2017 at 10:37 PM, Michael Paquier
 wrote:
> On Tue, Jan 17, 2017 at 5:40 PM, Fujii Masao  wrote:
>> Fix an assertion failure related to an exclusive backup.
>>
>> Previously multiple sessions could execute pg_start_backup() and
>> pg_stop_backup() to start and stop an exclusive backup at the same time.
>> This could trigger the assertion failure of
>> "FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)".
>> This happened because, even while pg_start_backup() was starting
>> an exclusive backup, other session could run pg_stop_backup()
>> concurrently and mark the backup as not-in-progress unconditionally.
>>
>> This patch introduces ExclusiveBackupState indicating the state of
>> an exclusive backup. This state is used to ensure that there is only
>> one session running pg_start_backup() or pg_stop_backup() at
>> the same time, to avoid the assertion failure.
>
> Please note that this commit message is not completely exact. This fix
> not only avoids triggering this assertion failure, it also makes
> sure that no manual on-disk intervention is needed by the user to
> remove a backup_label file after a failure of pg_stop_backup(). Before
> this patch, what happened is that the exclusive backup counter in
> XLogCtl got decremented before removing backup_label. However, after
> the counter was decremented, if an error occurred, the shared memory
> counter would have been at 0 with a backup_label file on disk.
> Subsequent attempts to start pg_start_backup() would have failed, and
> putting the system backup into a consistent state would have required
> an operator to remove by hand the backup_label file. The heart of the
> logic here is in the callback of pg_stop_backup() when an error
> happens during the deletion of the backup_label file.

With the patch, what happens if pg_stop_backup exits with an error
after removing the backup_label file but before resetting the backup
state to none?

Regards,

-- 
Fujii Masao




Re: [HACKERS] New CORRESPONDING clause design

2017-01-17 Thread Merlin Moncure
On Tue, Jan 17, 2017 at 12:37 AM, Surafel Temsgen  wrote:
> I am new here and I really want to contribute, I have read same resource
> that help understanding database system and postgresql. I would like to
> start implementing sql syntax corresponding by clause because I believe
> implementing sql syntax gives an opportunity to familiarize  many part of
> postgresql source code. Previous implementation is here and have an issue on
> explain query and break cases on unlabeled NULLs
> To repeat what a corresponding by clause means
> Corresponding clause either contains a BY(...) clause or not. If it
> doesn't have a BY(...) clause the usage is as follows.

This is great stuff. Does the syntax only apply to UNION?  I would
imagine it would also apply to INTERSECT/EXCEPT?  What about UNION
ALL?

merlin




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-17 Thread Karl O. Pinc
On Tue, 17 Jan 2017 07:58:43 -0600
"Karl O. Pinc"  wrote:

> On Tue, 17 Jan 2017 11:22:45 +0900
> Michael Paquier  wrote:
> 
> > On Tue, Jan 17, 2017 at 2:21 AM, Karl O. Pinc 
> > wrote:  
> 
> > >>Also, I would rather see an ERROR returned to the user in case of
> > >>bad data in current_logfiles,

> > 
> > Hm... OK. At the same time not throwing at least a WARNING is
> > confusing

What I find a little disconcerting is that there'd be nothing in the
logs.



Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-17 Thread Karl O. Pinc
On Tue, 17 Jan 2017 11:22:45 +0900
Michael Paquier  wrote:

> On Tue, Jan 17, 2017 at 2:21 AM, Karl O. Pinc  wrote:

> >>Also, I would rather see an ERROR returned to the user in case of
> >>bad data in current_logfiles, I did not change that either as
> >>that's the original intention of Gilles.  
> >
> > I could be wrong but I seem to recall that Robert recommended
> > against an error message. If there is bad data then something is
> > really wrong, up to some sort of an attack on the back end. Because
> > this sort of thing simply shouldn't happen it's enough in my
> > opinion to guard against buffer overruns etc and just get on with
> > life. If something goes unexpectedly and badly wrong with internal
> > data structures in general there would have to be all kinds of
> > additional code to cover every possible problem and that doesn't
> > seem to make sense.  
> 
> Hm... OK. At the same time not throwing at least a WARNING is
> confusing, because a NULL result is returned as well even if the input
> method is incorrect and even if the method is correct but it is not
> present in current_logfiles. As the user is thought as a trusted user
> as it has access to this function, I would think that being verbose on
> the error handling, or at least warnings, would make things easier to
> analyze.

These are all valid points.

In the context of reliability it's worth noting that
pg_current_logfile() results are inherently unreliable.
If the OS returns ENFILE or EMFILE when opening the current_logfiles
file (but not previously), the content, and so the pg_current_logfile()
results, will be outdated until the next logfile rotation.
You wouldn't expect this to happen, but it might, which
is similar to the situation where the content of current_logfiles
is corrupted: very unexpected, but possible.


Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




[HACKERS] Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.

2017-01-17 Thread Michael Paquier
On Tue, Jan 17, 2017 at 5:40 PM, Fujii Masao  wrote:
> Fix an assertion failure related to an exclusive backup.
>
> Previously multiple sessions could execute pg_start_backup() and
> pg_stop_backup() to start and stop an exclusive backup at the same time.
> This could trigger the assertion failure of
> "FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)".
> This happened because, even while pg_start_backup() was starting
> an exclusive backup, other session could run pg_stop_backup()
> concurrently and mark the backup as not-in-progress unconditionally.
>
> This patch introduces ExclusiveBackupState indicating the state of
> an exclusive backup. This state is used to ensure that there is only
> one session running pg_start_backup() or pg_stop_backup() at
> the same time, to avoid the assertion failure.

Please note that this commit message is not completely exact. This fix
not only avoids triggering this assertion failure, it also makes
sure that no manual on-disk intervention is needed by the user to
remove a backup_label file after a failure of pg_stop_backup(). Before
this patch, what happened is that the exclusive backup counter in
XLogCtl got decremented before removing backup_label. However, after
the counter was decremented, if an error occurred, the shared memory
counter would have been at 0 with a backup_label file on disk.
Subsequent attempts to start pg_start_backup() would have failed, and
putting the system backup into a consistent state would have required
an operator to remove by hand the backup_label file. The heart of the
logic here is in the callback of pg_stop_backup() when an error
happens during the deletion of the backup_label file.
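
To sketch that pre-patch failure mode in terms of the SQL-level calls
(a hypothetical session; the exact error timing is only illustrative):

SELECT pg_start_backup('b1'); -- writes backup_label, marks an exclusive
                              -- backup as in progress
SELECT pg_stop_backup();      -- the shared counter was reset first; if the
                              -- backup_label removal then failed, the file
                              -- stayed behind with the counter at 0
SELECT pg_start_backup('b2'); -- kept failing until an operator removed
                              -- backup_label by hand
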
-- 
Michael




Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2017-01-17 Thread Michael Paquier
I sent this email one month ago but forgot to cc pgsql-hackers ;)
For the record, it is the set of patches attached that have been
pushed as 974ece5, and only Fujii-san has received them... Thanks for
committing the fix by the way!

On Thu, Dec 15, 2016 at 3:30 AM, Fujii Masao  wrote:
> On Wed, Dec 14, 2016 at 11:10 PM, Fujii Masao  wrote:
>> On second thought, do users really want to distinguish those three erroneous
>> states? I'm inclined to merge the checks for those three conditions into one,
>> that is,
>>
>> if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_NONE)
>> {
>> WALInsertLockRelease();
>> ereport(ERROR,
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>  errmsg("a backup is already in progress"),
>>
>> Also it may be better to handle the similar checks in pg_stop_backup,
>> in the same way.
>
> Here is the updated version of the patch that I applied the above "merge" to.

Okay.

> Unfortunately this patch is not applied cleanly to old versions.
> So we need to create more patches for back-patch.

Attached are patches for all supported branches. I have tweaked the
following comment for master/9.6, that's why I am re-sending it:
+* (see comments of ExclusiveBackupState for more details).

Getting master and 9.6 right was straightforward. 9.5 showed a small
conflict. 9.4 had bigger conflicts, but the code blocks are
straightforward to place, except that the tablespace map does not
exist there and we need to be careful that the same flow is used
for the removal of the backup_label file. In 9.3, conflicts are caused
by API changes for WAL insert locking. Finally, 9.2 showed
no conflicts relative to 9.3, which was a bit surprising.
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 457a83452e..b33e3697e8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -349,6 +349,29 @@ typedef struct XLogwrtResult
 } XLogwrtResult;
 
 /*
+ * State of an exclusive backup, necessary to control concurrent activities
+ * across sessions when working on exclusive backups.
+ *
+ * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually
+ * running, to be more precise pg_start_backup() is not being executed for
+ * an exclusive backup and there is no exclusive backup in progress.
+ * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an
+ * exclusive backup.
+ * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() has finished
+ * running and an exclusive backup is in progress. pg_stop_backup() is
+ * needed to finish it.
+ * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an
+ * exclusive backup.
+ */
+typedef enum ExclusiveBackupState
+{
+	EXCLUSIVE_BACKUP_NONE = 0,
+	EXCLUSIVE_BACKUP_STARTING,
+	EXCLUSIVE_BACKUP_IN_PROGRESS,
+	EXCLUSIVE_BACKUP_STOPPING
+} ExclusiveBackupState;
+
+/*
  * Shared state data for XLogInsert.
  */
 typedef struct XLogCtlInsert
@@ -370,13 +393,15 @@ typedef struct XLogCtlInsert
 	bool		fullPageWrites;
 
 	/*
-	 * exclusiveBackup is true if a backup started with pg_start_backup() is
-	 * in progress, and nonExclusiveBackups is a counter indicating the number
-	 * of streaming base backups currently in progress. forcePageWrites is set
-	 * to true when either of these is non-zero. lastBackupStart is the latest
-	 * checkpoint redo location used as a starting point for an online backup.
+	 * exclusiveBackupState indicates the state of an exclusive backup
+	 * (see comments of ExclusiveBackupState for more details).
+	 * nonExclusiveBackups is a counter indicating the number of streaming
+	 * base backups currently in progress. forcePageWrites is set to true
+	 * when either of these is non-zero. lastBackupStart is the latest
+	 * checkpoint redo location used as a starting point for an online
+	 * backup.
 	 */
-	bool		exclusiveBackup;
+	ExclusiveBackupState exclusiveBackupState;
 	int			nonExclusiveBackups;
 	XLogRecPtr	lastBackupStart;
 } XLogCtlInsert;
@@ -693,6 +718,7 @@ static bool CheckForStandbyTrigger(void);
 static void xlog_outrec(StringInfo buf, XLogRecord *record);
 #endif
 static void pg_start_backup_callback(int code, Datum arg);
+static void pg_stop_backup_callback(int code, Datum arg);
 static bool read_backup_label(XLogRecPtr *checkPointLoc,
   bool *backupEndRequired, bool *backupFromStandby);
 static void rm_redo_error_callback(void *arg);
@@ -8699,7 +8725,12 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
 	if (exclusive)
 	{
-		if (XLogCtl->Insert.exclusiveBackup)
+		/*
+		 * At first, mark that we're now starting an exclusive backup,
+		 * to ensure that there are no other sessions currently running
+		 * pg_start_backup() or pg_stop_backup().
+		 */
+		if (XLogCtl->Insert.exclusiveBackupState != 

Re: [HACKERS] Gather Merge

2017-01-17 Thread Amit Kapila
On Tue, Jan 17, 2017 at 5:19 PM, Rushabh Lathia
 wrote:
>
>
>
> Here is the benchmark number which I got with the latest (v6) patch:
>
> - max_worker_processes = DEFAULT (8)
> - max_parallel_workers_per_gather = 4
> - Cold cache environment is ensured. With every query execution - server is
>   stopped and also OS caches were dropped.
> - The reported values of execution time (in ms) is median of 3 executions.
> - power2 machine with 512GB of RAM
> - With default postgres.conf
>

You haven't mentioned scale factor used in these tests.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] ISO/IEC 9075-2:2016 for postgres community

2017-01-17 Thread Oleg Bartunov
Hi there,

I just bought ISO/IEC 9075-2:2016
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=63556
to satisfy my interest in json support in SQL.  I am not ready to discuss
implementation details here, but there is one problem with the status of
this document, which is copyright protected.

"All rights reserved. Unless otherwise specified, no part of this
publication may be reproduced or utilized otherwise in any form or by any
means, electronic or mechanical, including photocopying, or posting on the
internet or an intranet, without prior written permission. Permission can
be requested from either ISO at the address below or ISO’s member body in
the country of the requester."

My question is: should we ask ISO for permission to use my copy in our
community, or should we buy at least one other copy (for a reviewer)?

Regards,

Oleg


[HACKERS] New CORRESPONDING clause design

2017-01-17 Thread Surafel Temsgen
I am new here and I really want to contribute. I have read some resources
that help in understanding database systems and PostgreSQL. I would like to
start by implementing the SQL CORRESPONDING BY clause, because I believe
implementing SQL syntax is a good way to become familiar with many parts of
the PostgreSQL source code. A previous implementation is here
and has issues with EXPLAIN queries and breaks on unlabeled NULLs.
To recap what a CORRESPONDING BY clause means:
a CORRESPONDING clause either contains a BY(...) clause or not. If it
doesn't have a BY(...) clause, the usage is as follows.

SELECT 1 a, 2 b, 3 c UNION CORRESPONDING SELECT 4 b, 5 d, 6 c, 7 f;

with output:
b c
-
2 3
4 6

i.e. matching column names are filtered and are only output from the
whole set operation clause.

If we introduce a BY(...) clause, then column names are further
intersected with that BY clause:

SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) SELECT 4 b, 5 d, 6 c, 7 f;

with output:

b
--
2
4
My design is:

*In the parsing stage*
1. Check that at least one common column name appears in the queries.
2. If the corresponding column list is not specified, build it from the
common column names in the queries' target lists, in the order that those
column names appear in the first query.
3. If the corresponding column list is specified, check that every column
name in the corresponding column list appears in the column names of both
queries.

*In the planner*
1. Take the projection column names from the corresponding list.
2. Reorder the left and right queries' target lists according to the
corresponding list.
3. For each query, project as many columns as there are corresponding
columns.

Continue with the normal set operation processing.
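
As a sketch of what these steps are meant to achieve (my own example; since
CORRESPONDING is not implemented yet, only the rewritten form runs today):

SELECT 1 AS a, 2 AS b, 3 AS c
UNION CORRESPONDING BY (c, b)
SELECT 6 AS c, 4 AS b, 5 AS d;

-- after projecting and reordering both target lists to (c, b),
-- this would behave like:

SELECT c, b FROM (SELECT 1 AS a, 2 AS b, 3 AS c) AS t1
UNION
SELECT c, b FROM (SELECT 6 AS c, 4 AS b, 5 AS d) AS t2;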


Re: [HACKERS] pageinspect: Hash index support

2017-01-17 Thread Amit Kapila
On Fri, Jan 13, 2017 at 12:49 AM, Jesper Pedersen
 wrote:
> Hi,
>
> On 01/11/2017 03:16 PM, Ashutosh Sharma wrote:
>>
>>
>> I have rephrased it to make it more clear.
>>
>
> Rebased, and removed the compile warn in hashfuncs.c
>

Review comments:

1.
+static Page
+verify_hash_page(bytea *raw_page, int flags)

Few checks for meta page are missing, refer _hash_checkpage.

2.
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to use pageinspect functions";

Isn't it better to use "raw page" instead of "pageinspect" in the
above error message? If you agree, then fix other similar occurrences
in the patch.

3.
+ values[j++] = CStringGetTextDatum(psprintf("(%u,%u)",
+  BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)),
+ itup->t_tid.ip_posid));

Fix indentation in the third line.

4.
+Datum
+hash_page_items(PG_FUNCTION_ARGS)
+{
+ bytea   *raw_page = PG_GETARG_BYTEA_P(0);


+Datum
+hash_bitmap_info(PG_FUNCTION_ARGS)
+{
+ Oid indexRelid = PG_GETARG_OID(0);
+ uint32 ovflblkno = PG_GETARG_UINT32(1);

Is there a reason for keeping the input arguments for
hash_bitmap_info() different from hash_page_items()?
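
For illustration, here are hypothetical calls based only on the signatures
quoted above (the index name and block number are made up):

-- hash_page_items() takes a raw page image:
SELECT * FROM hash_page_items(get_raw_page('hash_idx', 1));
-- hash_bitmap_info() instead takes a relation OID plus an overflow block number:
SELECT * FROM hash_bitmap_info('hash_idx'::regclass, 12345);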

5.
+hash_metapage_info(PG_FUNCTION_ARGS)
{
..
+ spares = palloc0(HASH_MAX_SPLITPOINTS * 5 + 1);
..
+ mapp = palloc0(HASH_MAX_BITMAPS * 5 + 1);
..
}

Don't you think we should free the allocated memory in this function?
Also, why are you using 5 as a multiplier in both the above pallocs,
shouldn't it be 4?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2017-01-17 Thread Rushabh Lathia
On Tue, Jan 17, 2017 at 5:19 PM, Rushabh Lathia 
wrote:

>
>
> On Fri, Jan 13, 2017 at 10:52 AM, Rushabh Lathia  > wrote:
>
>>
>>
>> On Thu, Jan 12, 2017 at 8:50 AM, Robert Haas 
>> wrote:
>>
>>> On Sun, Dec 4, 2016 at 7:36 PM, Haribabu Kommi 
>>> wrote:
>>> > On Thu, Nov 24, 2016 at 11:12 PM, Rushabh Lathia <
>>> rushabh.lat...@gmail.com>
>>> > wrote:
>>> >> PFA latest patch with fix as well as few cosmetic changes.
>>> >
>>> > Moved to next CF with "needs review" status.
>>>
>>> I spent quite a bit of time on this patch over the last couple of
>>> days.  I was hoping to commit it, but I think it's not quite ready for
>>> that yet and I hit a few other issues along the way.  Meanwhile,
>>> here's an updated version with the following changes:
>>>
>>> * Adjusted cost_gather_merge because we don't need to worry about less
>>> than 1 worker.
>>> * Don't charge double maintenance cost of the heap per 34ca0905.  This
>>> was pointed out previous and Rushabh said it was fixed, but it wasn't
>>> fixed in v5.
>>> * cost_gather_merge claimed to charge a slightly higher IPC cost
>>> because we have to block, but didn't.  Fix it so it does.
>>> * Move several hunks to more appropriate places in the file, near
>>> related code or in a more logical position relative to surrounding
>>> code.
>>> * Fixed copyright dates for the new files.  One said 2015, one said 2016.
>>> * Removed unnecessary code from create_gather_merge_plan that tried to
>>> handle an empty list of pathkeys (shouldn't happen).
>>> * Make create_gather_merge_plan more consistent with
>>> create_merge_append_plan.  Remove make_gather_merge for the same
>>> reason.
>>> * Changed generate_gather_paths to generate gather merge paths.  In
>>> the previous coding, only the upper planner nodes ever tried to
>>> generate gather merge nodes, but that seems unnecessarily limiting,
>>> since it could be useful to generate a gathered path with pathkeys at
>>> any point in the tree where we'd generate a gathered path with no
>>> pathkeys.
>>> * Rewrote generate_ordered_paths() logic to consider only the one
>>> potentially-useful path not now covered by the new code in
>>> generate_gather_paths().
>>> * Reverted changes in generate_distinct_paths().  I think we should
>>> add something here but the existing logic definitely isn't right
>>> considering the change to generate_gather_paths().
>>> * Assorted cosmetic cleanup in nodeGatherMerge.c.
>>> * Documented the new GUC enable_gathermerge.
>>> * Improved comments.  Dropped one that seemed unnecessary.
>>> * Fixed parts of the patch to be more pgindent-clean.
>>>
>>>
>> Thanks Robert for hacking into this.
>>
>>
>>> Testing this against the TPC-H queries at 10GB with
>>> max_parallel_workers_per_gather = 4, seq_page_cost = 0.1,
>>> random_page_cost = 0.1, work_mem = 64MB initially produced somewhat
>>> demoralizing results.  Only Q17, Q4, and Q8 picked Gather Merge, and
>>> of those only Q17 got faster.  Investigating this led to me realizing
>>> that join costing for parallel joins is all messed up: see
>>> https://www.postgresql.org/message-id/CA+TgmoYt2pyk2CTyvYCtF
>>> ySXN=jsorgh8_mjttlowu5qkjo...@mail.gmail.com
>>>
>>> With that patch applied, in my testing, Gather Merge got picked for
>>> Q3, Q4, Q5, Q6, Q7, Q8, Q10, and Q17, but a lot of those queries get a
>>> little slower instead of a little faster.  Here are the timings --
>>> these are with EXPLAIN ANALYZE, so take them with a grain of salt --
>>> first number is without Gather Merge, second is with Gather Merge:
>>>
>>> Q3 16943.938 ms -> 18645.957 ms
>>> Q4 3155.350 ms -> 4179.431 ms
>>> Q5 13611.484 ms -> 13831.946 ms
>>> Q6 9264.942 ms -> 8734.899 ms
>>> Q7 9759.026 ms -> 10007.307 ms
>>> Q8 2473.899 ms -> 2459.225 ms
>>> Q10 13814.950 ms -> 12255.618 ms
>>> Q17 49552.298 ms -> 46633.632 ms
>>>
>>>
>> This is strange, I will re-run the test again and post the results soon.
>>
>>
> Here is the benchmark number which I got with the latest (v6) patch:
>
> - max_worker_processes = DEFAULT (8)
> - max_parallel_workers_per_gather = 4
> - Cold cache environment is ensured. With every query execution - server is
>   stopped and also OS caches were dropped.
> - The reported values of execution time (in ms) is median of 3 executions.
> - power2 machine with 512GB of RAM
> - With default postgres.conf
>
> Timing with v6 patch on REL9_6_STABLE branch
> (last commit: 8a70d8ae7501141d283e56b31e10c66697c986d5).
>
> Query 3: 49888.300 -> 45914.426
> Query 4: 8531.939 -> 7790.498
> Query 5: 40668.920 -> 38403.658
> Query 9: 90922.825 -> 50277.646
> Query 10: 45776.445 -> 39189.086
> Query 12: 21644.593 -> 21180.402
> Query 15: 63889.061 -> 62027.351
> Query 17: 142208.463 -> 118704.424
> Query 18: 244051.155 -> 186498.456
> Query 20: 212046.605 -> 159360.520
>
> Timing with v6 patch on master branch:
> (last commit: 0777f7a2e8e0a51f0f60cfe164d538bb459bf9f2)
>
> 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-01-17 Thread Amit Kapila
On Tue, Jan 17, 2017 at 11:39 AM, Dilip Kumar  wrote:
> On Wed, Jan 11, 2017 at 10:55 AM, Dilip Kumar  wrote:
>> I have reviewed the latest patch and I don't have any more comments.
>> So if there is no objection from other reviewers I can move it to
>> "Ready For Committer"?
>
> Seeing no objections, I have moved it to Ready For Committer.
>

Thanks for the review.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



