Re: [HACKERS] pageinspect: Hash index support

2017-01-16 Thread Mithun Cy
>
> 7. I think it is not your bug, but probably a bug in Hash index
> itself; page flag is set to 66 (for below test); So the page is both
> LH_BUCKET_PAGE and LH_BITMAP_PAGE. Is not this a bug in hash index?
>
> I have inserted 3000 records. Hash index is on integer column.
> select hasho_flag FROM hash_page_stats(get_raw_page('i1', 1));
>  hasho_flag
> ------------
>  66
>

Here is a test for the same. After inserting 3000 records, I think at the
first split we can see that the bucket page flag is set with LH_BITMAP_PAGE.

create table t1( ti int);
create index i1 on t1 using hash(ti);
postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
------------
  2
postgres=# insert into t1 select generate_series(1, 1000);
INSERT 0 1000
postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
------------
  2
(1 row)

postgres=# insert into t1 select generate_series(1, 1000);
INSERT 0 1000
postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
------------
  2
(1 row)

postgres=# insert into t1 select generate_series(1, 1000);
INSERT 0 1000
postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
------------
 66
(1 row)
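
As a side note, the flag value is just a bitmask, so it is easy to see which
bits are set. Which LH_* flag each bit stands for is defined in
src/include/access/hash.h and can differ between versions, so only the
arithmetic is shown here:

-- decompose a hasho_flag value such as 66 into its individual bits
SELECT flag_bit, (66 & flag_bit) <> 0 AS is_set
FROM unnest(ARRAY[1, 2, 4, 8, 16, 32, 64, 128]) AS flag_bit;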

I think if this is a bug, then the example given in pageinspect.sgml should
be changed in your patch after the bug fix.
+hasho_prevblkno | -1
+hasho_nextblkno | 8474
+hasho_bucket    | 0
+hasho_flag      | 66
+hasho_page_id   | 65408



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2017-01-16 Thread Michael Paquier
On Sat, Jan 14, 2017 at 12:58 AM, Dilip Kumar  wrote:
> On Wed, Jan 11, 2017 at 7:47 PM, Dilip Kumar  wrote:
>> +void
>> +pgstat_count_sqlstmt(const char *commandTag)
>> +{
>> + PgStat_SqlstmtEntry *htabent;
>> + bool found;
>> +
>> + if (!pgstat_track_sql)
>> + return
>>
>> Callers of pgstat_count_sqlstmt are always ensuring that
>> pgstat_track_sql is true, seems it's redundant here.
>
> I have done testing of the feature, it's mostly working as per the 
> expectation.
>
> I have few comments/questions.
>
> 
> If you execute the query with EXECUTE then commandTag will be EXECUTE
> that way we will not show the actual query type, I mean all the
> statements will get the common tag "EXECUTE".
>
> You can try this.
> PREPARE fooplan (int) AS INSERT INTO t VALUES($1);
> EXECUTE fooplan(1);
>
> --
>
> + /* Destroy the SQL statement counter stats HashTable */
> + hash_destroy(pgstat_sql);
> +
> + /* Create SQL statement counter Stats HashTable */
>
> I think in the patch we have used mixed of "Stats HashTable" and
> "stats HashTable", I think better
> to be consistent throughout the patch. Check other similar instances.
>
> 
>
> @@ -1102,6 +1102,12 @@ exec_simple_query(const char *query_string)
>
>   PortalDrop(portal, false);
>
> + /*
> + * Count SQL statement for pg_stat_sql view
> + */
> + if (pgstat_track_sql)
> + pgstat_count_sqlstmt(commandTag);
>
> We are only counting the successful SQL statement, Is that intentional?
>
> --
> I have noticed that pgstat_count_sqlstmt is called from
> exec_simple_query and exec_execute_message. Don't we want to track the
> statement executed from functions?
> ---

The latest available patch has rotted; could you rebase, please?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow backslash continuations in \set expressions

2017-01-16 Thread Rafia Sabih
On Fri, Jan 13, 2017 at 9:15 PM, Fabien COELHO  wrote:
>
>>> Could it be related to transformations operated when the file is
>>> downloaded
>>> and saved, because it is a text file?
>>
>>
>> I think this is delaying the patch unnecessarily, I have attached a
>> version, please see if you can apply it successfully, we can proceed
>> with that safely then...
>
>
> This is the same file I sent:
>
>  sh> sha1sum pgbench-continuation-4.patch pgbench-continuation-3.patch
>  97fe805a89707565210699694467f9256ca02dab  pgbench-continuation-4.patch
>  97fe805a89707565210699694467f9256ca02dab  pgbench-continuation-3.patch
>
> The difference is that your version is mime-typed with a generic
>
> Application/OCTET-STREAM
>
> While the one I sent was mime-typed as
>
> text/x-diff
>
> as defined by the system in /etc/mime.types on my xenial laptop.
>
> My guess is that with the later your mail client changes the file when
> saving it, because it sees "text".
>

Okay, I am marking it as 'Ready for committer' with the following note to the
committer: I am getting whitespace errors in v3 of the patch, which I corrected
in v4; however, Fabien is of the opinion that v3 is clean and that it shows
whitespace errors only because of download or mail-client issues in my setup.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2017-01-16 Thread Michael Paquier
On Tue, Nov 29, 2016 at 1:35 PM, Michael Paquier
 wrote:
> On Tue, Nov 22, 2016 at 8:35 PM, Haribabu Kommi
>  wrote:
>> Hi Craig,
>>
>> This is a gentle reminder.
>>
>> you assigned as reviewer to the current patch in the 11-2016 commitfest.
>> But you haven't shared your review yet. Please share your review about
>> the patch. This will help us in smoother operation of commitfest.
>>
>> Please Ignore if you already shared your review.
>
> I have moved this CF entry to 2017-01, the remaining, still unreviewed
> patch are for renaming pg_subxact and pg_clog.

The second patch has rotted a little because of the backup documentation
changes. By the way, is something going to happen in this CF?
Craig, you are a reviewer of this patch.
-- 
Michael


0001-Rename-pg_clog-to-pg_xact.patch
Description: invalid/octet-stream


0002-Rename-pg_subtrans-to-pg_subxact.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb_delete with arrays

2017-01-16 Thread Michael Paquier
On Sun, Dec 18, 2016 at 1:27 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> Speaking about implementation of `jsonb_delete_array` - it's fine, but I
> would like to suggest two modifications:
>
> * create a separate helper function for jsonb delete operation, to use it in
> both `jsonb_delete` and `jsonb_delete_array`. It will help to concentrate
> related logic in one place.

I am not sure that it is much of a win, as the code loses readability for a
minimal refactoring. What would have been nice is to group jsonb_delete_idx
in as well, but the handling of element deletion is really different there.

> * use variadic arguments for `jsonb_delete_array`. For rare cases, when
> someone decides to use this function directly instead of corresponding
> operator. It will be more consistent with `jsonb_delete` from my point of
> view, because it's transition from `jsonb_delete(data, 'key')` to
> `jsonb_delete(data, 'key1', 'key2')` is more smooth, than to
> `jsonb_delete(data, '{key1, key2}')`.

That's a good idea.
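
For illustration, this is how the two call styles would compare from SQL; both
forms shown here follow the proposal in this thread rather than an
already-committed API:

SELECT jsonb_delete('{"a": 1, "b": 2, "c": 3}'::jsonb, 'a', 'b');        -- variadic keys
SELECT jsonb_delete('{"a": 1, "b": 2, "c": 3}'::jsonb, '{a,b}'::text[]); -- array of keys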

> I've attached a patch with these modifications. What do you think?

Looking at both proposed patches, documentation is still missing: the list of
jsonb operators does not cover '-' for arrays. I am marking this patch as
waiting on author for now.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-01-16 Thread Beena Emerson
On Tue, Jan 17, 2017 at 12:38 PM, Michael Paquier  wrote:

> On Tue, Jan 17, 2017 at 4:06 PM, Beena Emerson 
> wrote:
> > I have already made patch for the generic SHOW replication command
> > (attached) and am working on the new initdb patch based on that.
> > I have not yet fixed the pg_standby issue. I am trying to address all the
> > comments and bugs still.
>
> Having documentation for this patch in protocol.sgml would be nice.
>

Yes. I will add that.

-- 
Thank you,

Beena Emerson

Have a Great Day!


Re: [HACKERS] increasing the default WAL segment size

2017-01-16 Thread Michael Paquier
On Tue, Jan 17, 2017 at 4:06 PM, Beena Emerson  wrote:
> I have already made patch for the generic SHOW replication command
> (attached) and am working on the new initdb patch based on that.
> I have not yet fixed the pg_standby issue. I am trying to address all the
> comments and bugs still.

Having documentation for this patch in protocol.sgml would be nice.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-01-16 Thread Beena Emerson
On Tue, Jan 17, 2017 at 12:18 PM, Michael Paquier  wrote:

> On Sun, Jan 8, 2017 at 9:52 AM, Jim Nasby 
> wrote:
> >> I agree pretty_print_kb would have been a better name for this function.
> >> However, I have realised that using the show hook and this function is
> >> not suitable and have found a better way of handling the removal of
> >> GUC_UNIT_XSEGS which no longer needs this function : using the
> >> GUC_UNIT_KB, convert the value in bytes to wal segment count instead in
> >> the assign hook. The next version of patch will use this.
> >
> >
> > ... it sounds like you're going back to exposing KB to users, and that's
> all
> > that really matters.
> >
> >> IMHO it'd be better to use the n & (n-1) check detailed at [3].
>
> That would be better.
>
> So I am looking at the proposed patch, though there have been reviews
> the patch was in "Needs Review" state, and as far as I can see it is a
> couple of things for frontends. Just by grepping for XLOG_SEG_SIZE I
> have spotted the following problems:
> - pg_standby uses it to know about the next segment available.
>

Yes. I am aware of this and had mentioned it in my post.


> - pg_receivexlog still uses it in segment handling.
> It may be a good idea to just remove XLOG_SEG_SIZE and fix the code
> paths that fail to compile without it, frontend utilities included
> because a lot of them now rely on the value coded in xlog_internal.h,
> but with this patch the value is set up in the context of initdb. And
> this would induce major breakages in many backup tools, pg_rman coming
> first in mind... We could replace it with for example a macro that
> frontends could use to check if the size of the WAL segment is in a
> valid range if the tool does not have direct access to the Postgres
> instance (aka the size of the WAL segment used there) as there are as
> well offline tools.
>

I will see what's the best way to do this.


>
> -#define XLogSegSize((uint32) XLOG_SEG_SIZE)
> +
> +extern uint32 XLogSegSize;
> +#define XLOG_SEG_SIZE XLogSegSize
> This bit is really bad for frontend declaring xlog_internal.h...
>
> --- a/src/bin/pg_test_fsync/pg_test_fsync.c
> +++ b/src/bin/pg_test_fsync/pg_test_fsync.c
> @@ -62,7 +62,7 @@ static const char *progname;
>
>  static int secs_per_test = 5;
>  static int needs_unlink = 0;
> -static char full_buf[XLOG_SEG_SIZE],
> +static char full_buf[DEFAULT_XLOG_SEG_SIZE],
> This would make sense as a new option of pg_test_fsync.
>
> A performance study would be a good idea as well. Regarding the
> generic SHOW command in the replication protocol, I may do it for next
> CF, I have use cases for it in my pocket.
>
>
Thank you for your review.

I have already made patch for the generic SHOW replication command
(attached) and am working on the new initdb patch based on that.
I have not yet fixed the pg_standby issue. I am trying to address all the
comments and bugs still.


-- 


Beena Emerson

Have a Great Day!


repl_show_cmd.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

2017-01-16 Thread Michael Paquier
On Wed, Jan 11, 2017 at 4:50 PM, Michael Paquier
 wrote:
> Looking at this patch, I am not sure that it is worth worrying about.
> This is a recipe to make back-patching a bit more complicated, and it
> makes the code more complicated to understand. So I would vote for
> rejecting it and move on.

I have done so for now to keep the CF moving; if somebody wants to
complain, feel free...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-01-16 Thread Michael Paquier
On Sun, Jan 8, 2017 at 9:52 AM, Jim Nasby  wrote:
>> I agree pretty_print_kb would have been a better name for this function.
>> However, I have realised that using the show hook and this function is
>> not suitable and have found a better way of handling the removal of
>> GUC_UNIT_XSEGS which no longer needs this function : using the
>> GUC_UNIT_KB, convert the value in bytes to wal segment count instead in
>> the assign hook. The next version of patch will use this.
>
>
> ... it sounds like you're going back to exposing KB to users, and that's all
> that really matters.
>
>> IMHO it'd be better to use the n & (n-1) check detailed at [3].

That would be better.
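
For reference, that is the standard power-of-two bit trick: for n > 0,
(n & (n - 1)) = 0 exactly when n is a power of two. A small SQL illustration
(the patch itself would of course do this check in C when validating the
segment size):

SELECT n, (n & (n - 1)) = 0 AS is_power_of_two
FROM (VALUES (1024), (1536), (2048), (16384)) AS v(n);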

So I am looking at the proposed patch. Though there have been reviews, the
patch was in the "Needs Review" state, and as far as I can see a couple of
things remain to be done for frontends. Just by grepping for XLOG_SEG_SIZE I
have spotted the following problems:
- pg_standby uses it to know about the next segment available.
- pg_receivexlog still uses it in segment handling.
It may be a good idea to just remove XLOG_SEG_SIZE and fix the code paths
that fail to compile without it, frontend utilities included, because a lot
of them now rely on the value coded in xlog_internal.h, while with this patch
the value is set up in the context of initdb. That would induce major breakage
in many backup tools, pg_rman coming first to mind... We could replace it
with, for example, a macro that frontends could use to check whether the size
of the WAL segment is in a valid range when the tool does not have direct
access to the Postgres instance (and hence to the WAL segment size actually
used there), as there are offline tools as well.

-#define XLogSegSize((uint32) XLOG_SEG_SIZE)
+
+extern uint32 XLogSegSize;
+#define XLOG_SEG_SIZE XLogSegSize
This bit is really bad for frontend declaring xlog_internal.h...

--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -62,7 +62,7 @@ static const char *progname;

 static int secs_per_test = 5;
 static int needs_unlink = 0;
-static char full_buf[XLOG_SEG_SIZE],
+static char full_buf[DEFAULT_XLOG_SEG_SIZE],
This would make sense as a new option of pg_test_fsync.

A performance study would be a good idea as well. Regarding the
generic SHOW command in the replication protocol, I may do it for next
CF, I have use cases for it in my pocket.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring replay lag

2017-01-16 Thread Fujii Masao
On Thu, Dec 22, 2016 at 6:14 AM, Thomas Munro
 wrote:
> On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao  wrote:
>> I agree that the capability to measure the remote_apply lag is very useful.
>> Also I want to measure the remote_write and remote_flush lags, for example,
>> in order to diagnose the cause of replication lag.
>
> Good idea.  I will think about how to make that work.  There was a
> proposal to make writing and flushing independent[1].  I'd like that
> to go in.  Then the write_lag and flush_lag could diverge
> significantly, and it would be nice to be able to see that effect as
> time (though you could already see it with LSN positions).
>
>> For that, what about maintaining the pairs of send-timestamp and LSN in
>> *sender side* instead of receiver side? That is, walsender adds the pairs
>> of send-timestamp and LSN into the buffer every sampling period.
>> Whenever walsender receives the write, flush and apply locations from
>> walreceiver, it calculates the write, flush and apply lags by comparing
>> the received and stored LSN and comparing the current timestamp and
>> stored send-timestamp.
>
> I thought about that too, but I couldn't figure out how to make the
> sampling work.  If the primary is choosing (LSN, time) pairs to store
> in a buffer, and the standby is sending replies at times of its
> choosing (when wal_receiver_status_interval has been exceeded), then
> you can't accurately measure anything.

Yeah, even though the primary stores (100, 2017-01-17 00:00:00) as the pair of
(LSN, timestamp), for example, the standby may not send back the reply for
LSN 100 itself. The primary may receive the reply for larger LSN like 200,
instead. So the measurement of the lag in the primary side would not be so
accurate.

But we can calculate the "sync rep" lag by comparing the stored timestamp of
LSN 100 and the timestamp when the reply for LSN 200 is received. In sync rep,
since the transaction waiting for LSN 100 to be replicated is actually released
after the reply for LSN 200 is received, the above calculated lag is basically
accurate as sync rep lag.

Therefore I'm still thinking that it's better to maintain the pairs of LSN
and timestamp on the *primary* side. Thoughts?
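
For context, the lag that is already visible "with LSN positions", as Thomas
puts it, can be queried on the primary today; this is only a byte-based view,
using the 9.6-era column and function names, not the time-based lag being
designed here:

SELECT application_name,
       pg_xlog_location_diff(sent_location, write_location)  AS write_lag_bytes,
       pg_xlog_location_diff(sent_location, flush_location)  AS flush_lag_bytes,
       pg_xlog_location_diff(sent_location, replay_location) AS replay_lag_bytes
FROM pg_stat_replication;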

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-01-16 Thread Amit Kapila
On Tue, Jan 10, 2017 at 8:49 AM, Tsunakawa, Takayuki
 wrote:
> Hello, Amit, Magnus,
>
> I'm sorry for my late reply.  Yesterday was a national holiday in Japan.
>
>
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
>> PGSharedMemoryReAttach is called after the startup of new process whereas
>> pgwin32_ReserveSharedMemoryRegion is called before the new process could
>> actually start.  Basically, pgwin32_ReserveSharedMemoryRegion is used to
>> reserve shared memory for each backend, so calling VirtualAlloc there should
>> follow spec for huge pages.  If you have some reason for not using, then
>> it is not clear from your reply, can you try to explain in detail.
>
> OK.  The processing takes place in the following order:
>
> 1. postmaster calls CreateProcess() to start a child postgres in a suspended 
> state.
> 2. postmaster calls VirtualAlloc(MEM_RESERVE) in 
> pgwin32_ReserveSharedMemoryRegion() to reserve the virtual address space in 
> the child to secure space for the shared memory.  This call just affects the 
> virtual address space and does not allocate physical memory.  So the large 
> page is still irrelevant.

Okay, reading the VirtualAlloc specs again today, I could see that
MEM_LARGE_PAGES is not required to reserve the memory region.  It
is only required during allocation.


>
> I succeeded by following the same procedure using secpol.msc on Win10, 
> running 64-bit PostgreSQL.  I started PostgreSQL as a Windows service because 
> it's the normal way, with the service account being a regular Windows user 
> account(i.e. not LocalSystem).
>
> But... I failed to start PostgreSQL by running "pg_ctl start" from a command 
> prompt, receiving the same error message as you.  On the other hand, I could 
> start PostgreSQL on a command prompt with administrative privileges 
> (right-click on "Command prompt" from the Start menu and select "Run as an 
> administrator" in the menu.


Hmm.  It doesn't work even on a command prompt with administrative
privileges. It gives below error:

waiting for server to start....
2017-01-17 11:20:13.780 IST [4788] FATAL:  could not create shared memory segment: error code 1450
2017-01-17 11:20:13.780 IST [4788] DETAIL:  Failed system call was CreateFileMapping(size=148897792, name=Global/PostgreSQL:E:/WorkSpace/PostgreSQL/master/Data).
2017-01-17 11:20:13.780 IST [4788] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.


Now, error code 1450 can occur due to insufficient system resources,
so I tried increasing the size of shared memory (a higher value of
shared_buffers) without your patch, and it works.  This indicates
some problem with the patch.
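
For completeness, these are the settings involved in the test above; the
values are only illustrative, and both require a server restart to take
effect:

ALTER SYSTEM SET huge_pages = 'on';       -- 'try' would fall back to normal pages
ALTER SYSTEM SET shared_buffers = '1GB';  -- larger value used to vary the request size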

>  It seems that Windows removes many privileges, including "Lock Pages in 
> Memory", when starting the normal command prompt.  As its evidence, you can 
> use the attached priv.c to see what privileges are assigned and and 
> enabled/disabled.  Build it like "cl priv.c" and just run priv.exe on each 
> command prompt.  Those runs show different privileges.
>

This is bad.

> Should I need to do something, e.g. explain in the document that the user 
> should use the command prompt with administrative privileges when he uses 
> huge_pages?
>

I think it is better to document this in some way if we decide to go ahead
with the patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_hba_file_settings view patch

2017-01-16 Thread Michael Paquier
On Tue, Jan 17, 2017 at 10:19 AM, Haribabu Kommi
 wrote:
> On Tue, Jan 10, 2017 at 6:35 PM, Michael Paquier 
> wrote:
>> +/* LDAP supports 10 currently, keep this well above the most any
>> method needs */
>> +#define MAX_OPTIONS 12
>> Er, why? There is an assert already, that should be enough.
>
>
> Which Assert? This macro is used to verify that the maximum number
> of authentication options that are possible for a single hba line.

That one:
+   Assert(noptions <= MAX_OPTIONS);
+   if (noptions)
+   return PointerGetDatum(
+   construct_array(options, noptions, TEXTOID, -1, false, 'i'));

>> =# \d pg_hba_rules
>>View "pg_catalog.pg_hba_rules"
>>   Column  |  Type   | Collation | Nullable | Default
>> --+-+---+--+-
>>  line_number  | integer |   |  |
>>  type | text|   |  |
>>  keyword_database | text[]  |   |  |
>>  database | text[]  |   |  |
>>  keyword_user | text[]  |   |  |
>>  user_name| text[]  |   |  |
>>  keyword_address  | text|   |  |
>>  address  | inet|   |  |
>>  netmask  | inet|   |  |
>>  hostname | text|   |  |
>>  method   | text|   |  |
>>  options  | text[]  |   |  |
>>  error| text|   |  |
>> keyword_database and database map actually to the same thing if you
>> refer to a raw pg_hba.conf file because they have the same meaning for
>> user. You could simplify the view and just remove keyword_database,
>> keyword_user and keyword_address. This would simplify your patch as
>> well with all the mumbo-jumbo to see if a string is a dedicated
>> keyword or not. In its most simple shape pg_hba_rules should show to
>> the user as an exact map of the entries of the raw file.
>
> I removed the keyword_database and keyword_user columns, since the data
> in those columns can easily be represented with the database and user columns.
> But the address field can contain keywords such as samehost etc. and
> also a hostname. Because of this, the field is converted into
> 3 columns in the view.

Hm. We could as well consider cidr and use just one column. But still,
the use of inet as a data type in a system view looks like a wrong
choice to me. Or we could actually just use text... Opinions from
others are welcome here of course.

>> I have copied the example file of pg_hba.conf, reloaded it:
>> https://www.postgresql.org/docs/devel/static/auth-pg-hba-conf.html
>> And then the output result gets corrupted by showing up free()'d results:
>> null   | null| \x7F\x7F\x7F\x7F\x7F
>
> There was a problem in resetting the error string, working with attached
> patch.

Thanks. Now that works.

> Updated patch attached.

This begins to look good. I have found a couple of minor issues.

+  
+   The pg_hba_rules view can be read only by
+   superusers.
+  
This is not true anymore.

+ 
+  Line number within client authentication configuration file
+  the current value was set at
+ 
I'd rephrase that to avoid the past tense. Saying just pg_hba.conf would
perhaps be fine?

+
+ database
+ text[]
+ List of database name
+
This should be plural, database nameS.

+ 
+  List of keyword address names,
+  name can be all, samehost and samenet
+ 
Phrasing looks weird to me, what about "List of keyword address names,
whose values can be all, samehost or samenet", with  markups.

+postgres=# select line_number, type, database, user_name, auth_method
from pg_hba_rules;
Nit: this could be upper-cased.

+static Datum
+getauthmethod(UserAuth auth_method)
+{
+ ...
+   default:
+   elog(ERROR, "unexpected authentication method in parsed HBA entry");
+   break;
+   }
I think that you should also remove the default clause here so that such
errors are caught at compilation time.

+   switch (hba->conntype)
+   {
+   case ctLocal:
+   values[index] = CStringGetTextDatum("local");
+   break;
+   case ctHost:
+   values[index] = CStringGetTextDatum("host");
+   break;
+   case ctHostSSL:
+   values[index] = CStringGetTextDatum("hostssl");
+   break;
+   case ctHostNoSSL:
+   values[index] = CStringGetTextDatum("hostnossl");
+   break;
+   default:
+   elog(ERROR, "unexpected connection type in parsed HBA entry");
+   }
You could go without the default clause here as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-01-16 Thread Dilip Kumar
On Wed, Jan 11, 2017 at 10:55 AM, Dilip Kumar  wrote:
> I have reviewed the latest patch and I don't have any more comments.
> So if there is no objection from other reviewers I can move it to
> "Ready For Committer"?

Seeing no objections, I have moved it to Ready For Committer.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-01-16 Thread Amit Langote
Hi Amit,

On 2016/12/23 14:21, Amit Khandekar wrote:
> Currently an Append plan node does not execute its subplans in
> parallel. There is no distribution of workers across its subplans. The
> second subplan starts running only after the first subplan finishes,
> although the individual subplans may be running parallel scans.
> 
> Secondly, we create a partial Append path for an appendrel, but we do
> that only if all of its member subpaths are partial paths. If one or
> more of the subplans is a non-parallel path, there will be only a
> non-parallel Append. So whatever node is sitting on top of Append is
> not going to do a parallel plan; for example, a select count(*) won't
> divide it into partial aggregates if the underlying Append is not
> partial.
> 
> The attached patch removes both of the above restrictions.  There has
> already been a mail thread [1] that discusses an approach suggested by
> Robert Haas for implementing this feature. This patch uses this same
> approach.

I was looking at the executor portion of this patch and I noticed that in
exec_append_initialize_next():

if (appendstate->as_padesc)
return parallel_append_next(appendstate);

/*
 * Not parallel-aware. Fine, just go on to the next subplan in the
 * appropriate direction.
 */
if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
appendstate->as_whichplan++;
else
appendstate->as_whichplan--;

which seems to mean that executing Append in parallel mode disregards the
scan direction.  I am not immediately sure what implications that has, so
I checked what heap scan does when executing in parallel mode, and found
this in heapgettup():

else if (backward)
{
/* backward parallel scan not supported */
Assert(scan->rs_parallel == NULL);

Perhaps, AppendState.as_padesc would not have been set if scan direction
is backward, because parallel mode would be disabled for the whole query
in that case (PlannerGlobal.parallelModeOK = false).  Maybe add an
Assert() similar to one in heapgettup().

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Grouped base relation

2017-01-16 Thread David Rowley
On 17 January 2017 at 16:30, Tomas Vondra  wrote:
> Let's instead use an example similar to what Antonin mentioned in the
> initial post - two tables, with two columns each.
>
> CREATE TABLE t1 (a INT, b INT);
> CREATE TABLE t2 (c INT, d INT);
>
> And let's assume each table has 100.000 rows, but only 100 groups in the
> first column, with 1000 rows per group. Something like
>
> INSERT INTO t1
> SELECT mod(i,100), i FROM generate_series(1, 1e5) s(i);
>
> INSERT INTO t2
> SELECT mod(i,100), i FROM generate_series(1, 1e5) s(i);
>
> And let's assume this query
>
> SELECT t1.a, count(t2.d) FROM t1 JOIN t2 ON (t1.a = t2.c)
>  GROUP BY t1.a;
>
> On master, EXPLAIN (COSTS OFF, TIMING OFF, ANALYZE) looks like this:
>
>                           QUERY PLAN
> ---------------------------------------------------------------
>  HashAggregate (actual rows=100 loops=1)
>    Group Key: t1.a
>    ->  Hash Join (actual rows=10000000 loops=1)
>          Hash Cond: (t2.c = t1.a)
>          ->  Seq Scan on t2 (actual rows=100000 loops=1)
>          ->  Hash (actual rows=100000 loops=1)
>                Buckets: 131072  Batches: 2  Memory Usage: 2716kB
>                ->  Seq Scan on t1 (actual rows=100000 loops=1)
>  Planning time: 0.167 ms
>  Execution time: 17005.300 ms
> (10 rows)
>
> while with the patch it looks like this
>
>                           QUERY PLAN
> ---------------------------------------------------------------
>  Finalize HashAggregate (actual rows=100 loops=1)
>    Group Key: t1.a
>    ->  Hash Join (actual rows=100 loops=1)
>          Hash Cond: (t1.a = t2.c)
>          ->  Partial HashAggregate (actual rows=100 loops=1)
>                Group Key: t1.a
>                ->  Seq Scan on t1 (actual rows=100000 loops=1)
>          ->  Hash (actual rows=100 loops=1)
>                Buckets: 1024  Batches: 1  Memory Usage: 14kB
>                ->  Partial HashAggregate (actual rows=100 loops=1)
>                      Group Key: t2.c
>                      ->  Seq Scan on t2 (actual rows=100000 loops=1)
>  Planning time: 0.105 ms
>  Execution time: 31.609 ms
> (14 rows)
>
> This of course happens because with the patch we run the transition function
> 200k-times (on each side of the join) and aggtransmultifn on the 100 rows
> produced by the join, while on master the join produces 10.000.000 rows
> (which already takes much more time), and then have to run the transition
> function on all those rows.
>
> The performance difference is pretty obvious, I guess.

An exceptional improvement.

For the combine aggregate example of this query, since no patch exists
yet, we could simply mock what the planner would do by rewriting the
query. I'll use SUM() in place of the combinefn for COUNT():

explain analyze
SELECT t1.a, sum(t2.d)
FROM t1
JOIN (SELECT c, count(d) d FROM t2 GROUP BY c) t2 ON t1.a = t2.c
GROUP BY t1.a;

this seems to be 100,000 aggtransfn calls (for t2), then 100,000
aggcombinefn calls (for t1) (total = 200,000), whereas the patch
would perform 100,000 aggtransfn calls (for t2), then 100,000
aggtransfn calls (for t1), then 100 aggtransmultifn calls (total =
200,100)

Is my maths ok?

> I don't quite see how the patch could use aggcombinefn without sacrificing a
> lot of the benefits. Sure, it's possible to run the aggcombinefn in a loop
> (with number of iterations determined by the group size on the other side of
> the join), but that sounds pretty expensive and eliminates the reduction of
> transition function calls. The join cardinality would still be reduced,
> though.

I'd be interested in seeing the run time of my example query above. I
can't quite see a reason for it to be slower, but please let me know.

> I do have other question about the patch, however. It seems to rely on the
> fact that the grouping and joins both reference the same columns. I wonder
> how uncommon such queries are.
>
> To give a reasonable example, imagine the typical star schema, which is
> pretty standard for large analytical databases. A dimension table is
> "products" and the fact table is "sales", and the schema might look like
> this:
>
> CREATE TABLE products (
>     id            SERIAL PRIMARY KEY,
>     name          TEXT,
>     category_id   INT,
>     producer_id   INT
> );
>
> CREATE TABLE sales (
>     product_id    INT REFERENCES products (id),
>     nitems        INT,
>     price         NUMERIC
> );
>
> A typical query then looks like this:
>
> SELECT category_id, SUM(nitems), SUM(price)
> FROM products p JOIN sales s ON (p.id = s.product_id)
> GROUP BY p.category_id;
>
> which obviously uses different columns for the grouping and join, and so the
> patch won't help with that. Of course, a query grouping by product_id would
> allow the patch to work
>
> SELECT category_id, SUM(nitems), SUM(price)
> FROM products p JOIN sales s ON (p.id = s.product_id)
> GROUP BY p.id;
>
> Another thing is that in my 

Re: [HACKERS] Retiring from the Core Team

2017-01-16 Thread Steve Crawford
Thanks, Josh, for everything. I especially enjoyed your monthly updates at
SFPUG.

Cheers,
Steve

On Thu, Jan 12, 2017 at 1:59 PM, Merlin Moncure  wrote:

> On Wed, Jan 11, 2017 at 6:29 PM, Josh Berkus  wrote:
> > Hackers:
> >
> > You will have noticed that I haven't been very active for the past year.
> >  My new work on Linux containers and Kubernetes has been even more
> > absorbing than I anticipated, and I just haven't had a lot of time for
> > PostgreSQL work.
> >
> > For that reason, as of today, I am stepping down from the PostgreSQL
> > Core Team.
>
> Thanks for all your hard work.  FWIW, your blog posts, 'Primary
> Keyvil' are some of my favorite of all time!
>
> merlin
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum

2017-01-16 Thread Amit Khandekar
On 16 January 2017 at 15:54, Masahiko Sawada  wrote:
> Since the autovacuum worker wakes up the autovacuum launcher after it is
> launched, the autovacuum launcher could try to spawn worker processes at
> high frequency if you have a database with a very large table in it that
> has just passed autovacuum_freeze_max_age.
>
> autovacuum.c:L1605
> /* wake up the launcher */
> if (AutoVacuumShmem->av_launcherpid != 0)
> kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);
>
> I think we should deal with this case as well.

When autovacuum is enabled, after getting SIGUSR2, the worker is
launched only when it's time to launch. Doesn't look like it will be
immediately launched :

/* We're OK to start a new worker */
if (dlist_is_empty(&DatabaseList))
{ /* Special case when the list is empty */ }
else
{
 .
 .
   /* launch a worker if next_worker is right now or it is in the past */
   if (TimestampDifferenceExceeds(avdb->adl_next_worker,
current_time, 0))
  launch_worker(current_time);
}

So from the above, it looks as if there will not be a storm of workers.


Whereas, if autovacuum is disabled, autovacuum launcher does not wait
for the worker to start; it just starts the worker and quits; so the
issue won't show up here :

/*
   * In emergency mode, just start a worker (unless shutdown was requested)
   * and go away.
*/
if (!AutoVacuumingActive())
{
if (!got_SIGTERM)
do_start_worker();
proc_exit(0); /* done */
}


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

On 16 January 2017 at 15:54, Masahiko Sawada  wrote:
> On Mon, Jan 16, 2017 at 1:50 PM, Amit Khandekar  
> wrote:
>> On 13 January 2017 at 19:15, Alvaro Herrera  wrote:
>>> I think this is the same problem as reported in
>>> https://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=ypy...@mail.gmail.com
>>
>> Ah yes, this is the same problem. Not sure why I didn't land on that
>> thread when I tried to search pghackers using relevant keywords.
>>>
 === Fix ===
>>> [...]
 Instead, the attached patch (prevent_useless_vacuums.patch) prevents
 the repeated cycle by noting that there's no point in doing whatever
 vac_update_datfrozenxid() does, if we didn't find anything to vacuum
 and there's already another worker vacuuming the same database. Note
 that it uses wi_tableoid field to check concurrency. It does not use
 wi_dboid field to check for already-processing worker, because using
 this field might cause each of the workers to think that there is some
 other worker vacuuming, and eventually no one vacuums. We have to be
 certain that the other worker has already taken a table to vacuum.
>>>
>>> Hmm, it seems reasonable to skip the end action if we didn't do any
>>> cleanup after all. This would normally give enough time between vacuum
>>> attempts for the first worker to make further progress and avoid causing
>>> a storm.  I'm not really sure that it fixes the problem completely, but
>>> perhaps it's enough.
>>
>> I had thought about this : if we didn't clean up anything, skip the
>> end action unconditionally without checking if there was any
>> concurrent worker. But then thought it is better to skip only if we
>> know there is another worker doing the same job, because :
>> a) there might be some reason we are just calling
>> vac_update_datfrozenxid() without any condition. But I am not sure
>> whether it was intentionally kept like that. Didn't get any leads from
>> the history.
>> b) it's no harm in updating datfrozenxid() it if there was no other
>> worker. In this case, we *know* that there was indeed nothing to be
>> cleaned up. So the next time this database won't be chosen again, so
>> there's no harm just calling this function.
>>
>
> Since the autovacuum worker wakes up the autovacuum launcher after it is
> launched, the autovacuum launcher could try to spawn worker processes at
> high frequency if you have a database with a very large table in it that
> has just passed autovacuum_freeze_max_age.
>
> autovacuum.c:L1605
> /* wake up the launcher */
> if (AutoVacuumShmem->av_launcherpid != 0)
> kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);
>
> I think we should deal with this case as well.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cache Hash Index meta page.

2017-01-16 Thread Amit Kapila
On Fri, Jan 13, 2017 at 9:58 AM, Mithun Cy  wrote:
> On Fri, Jan 6, 2017 at 11:43 AM, Amit Kapila  wrote:

Below are review comments on latest version of patch.

1.
  /*
- * Read the metapage to fetch original bucket and tuple counts.  Also, we
- * keep a copy of the last-seen metapage so that we can use its
- * hashm_spares[] values to compute bucket page addresses.  This is a bit
- * hokey but perfectly safe, since the interesting entries in the spares
- * array cannot change under us; and it beats rereading the metapage for
- * each bucket.
+ * update and get the metapage cache data.
  */
- metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
- metap = HashPageGetMeta(BufferGetPage(metabuf));
- orig_maxbucket = metap->hashm_maxbucket;
- orig_ntuples = metap->hashm_ntuples;
- memcpy(&local_metapage, metap, sizeof(local_metapage));
- /* release the lock, but keep pin */
- LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+ cachedmetap = _hash_getcachedmetap(rel, true);
+ orig_maxbucket = cachedmetap->hashm_maxbucket;
+ orig_ntuples = cachedmetap->hashm_ntuples;

(a) I think you can retain the previous comment or modify it slightly.
Just removing the whole comment and replacing it with a single line
seems like a step backward.
(b) Another somewhat bigger problem is that with this new change it
won't retain the pin on the meta page till the end, which means we might
need to perform I/O again during the operation to fetch the meta page.
AFAICS, you have just changed it so that you can call the new API
_hash_getcachedmetap; if that's true, then I think you have to find
some other way of doing it.  BTW, why can't you design your new API
such that it always takes a pinned metapage?  You can always release the
pin in the caller if required.  I understand that you don't always
need a metapage in that API, but I think the current usage of that API
is also not that good.


2.
+ if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
+ bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+ cachedmetap = _hash_getcachedmetap(rel, true);

I don't understand the meaning of the above if check.  It seems like you
will update the metapage when the previous block number is not a valid
block number, which will be true at the first split.  How will you
ensure that there has been a re-split and the cached metapage is no longer
relevant?  I think if there is && in the above condition, then we can ensure it.

3.
+ Given a hashkey get the target bucket page with read lock, using cached
+ metapage. The getbucketbuf_from_hashkey method below explains the same.
+

All the sentences in the algorithm start with lowercase letters, so why do
you need an exception for this sentence?  I think you don't need to
add an empty line.  Also, I think the usage of
getbucketbuf_from_hashkey seems out of place.  How about writing it
as:

The usage of cached metapage is explained later.


4.
+ If target bucket is split before metapage data was cached then we are
+ done.
+ Else first release the bucket page and then update the metapage cache
+ with latest metapage data.

I think it is better to retain original text of readme and add about
meta page update.

5.
+ Loop:
..
..
+ Loop again to reach the new target bucket.

No need to write "Loop again ..", that is implicit.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-01-16 Thread Haribabu Kommi
On Sun, Jan 8, 2017 at 2:01 PM, Jim Nasby  wrote:

> On 12/29/16 9:55 PM, Haribabu Kommi wrote:
>
>> The tuples which don't have multiple copies or frozen data will be moved
>> from WOS to ROS periodically by the background worker process or autovacuum
>> process. Every column data is stored separately in its relation file.
>> There
>> is no transaction information is present in ROS. The data in ROS can be
>> referred with tuple ID.
>>
>
> Would updates be handled via the delete mechanism you described then?
>

Updates are handled similarly to delete operations, but some extra index
insert operations occur in this index even when the update is a HOT update,
because of the TID-CRID mapping.


> In this approach, the column data is present in both heap and columnar
>> storage.
>>
>
> ISTM one of the biggest reasons to prefer a column store over heap is to
> ditch the 24 byte overhead, so I'm not sure how much of a win this is.
>

Yes, that's correct. Currently, with this approach, it is not possible to
ditch the heap completely. This approach is useful for cases where the user
wants to store only some columns as part of the clustered index.


Another complication is that one of the big advantages of a CSTORE is
> allowing analysis to be done efficiently on a column-by-column (as opposed
> to row-by-row) basis. Does your patch by chance provide that?
>

Not in the base patch that I shared, but the further patches provide
column-by-column data access using the custom plan methods.


> Generally speaking, I do think the idea of adding support for this as an
> "index" is a really good starting point, since that part of the system is
> pluggable. It might be better to target getting only what needs to be in
> core into core to begin with, allowing the other code to remain an
> extension for now. I think there's a lot of things that will be discovered
> as we start moving into column stores, and it'd be very unfortunate to
> accidentally paint the core code into a corner somewhere.
>

Yes, it is possible to add only the code that is required into core and keep
the other part as an extension. Without providing the complete clustered index
approach, though, I doubt whether the necessary hooks and their code would get
accepted into core.


> As a side note, it's possible to get a lot of the benefits of a column
> store by using arrays. I've done some experiments with that and got an
> 80-90% space reduction, and most queries saw improved performance as well
> (there were a few cases that weren't better). The biggest advantage to this
> approach is people could start using it today, on any recent version of
> Postgres.


Interesting experiment.
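
For readers following along, a minimal sketch of the array-packing idea
described above; the schema is purely hypothetical and only meant to show the
shape of it. Each stored row packs many logical rows, one array element per
logical row, so the per-tuple heap overhead is paid once per batch:

CREATE TABLE sales_packed (
    batch_id   bigserial PRIMARY KEY,
    sale_time  timestamptz[],
    product_id int[],
    price      numeric[]
);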


> That would be a great way to gain knowledge on what users would want to
> see in a column store, something else I suspect we need. It would also be
> far less code than what you or Alvaro are proposing. When it comes to large
> changes that don't have crystal-clear requirements, I think that's really
> important.
>

The main use case of this patch is to support mixed-load environments,
where both OLTP and OLAP queries are possible. The advantage of the
proposed patch design is that it provides good performance for OLAP queries
without affecting OLTP.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-16 Thread Fujii Masao
On Sat, Jan 14, 2017 at 3:10 AM, Vladimir Rusinov  wrote:
> Attached are two new versions of the patch: one keeps aliases, one doesn't.
> Also, remove stray reference to xlog function in one of the tests.
>
> I've lost vote count. Should we create a form to calculate which one of the
> patches should be commited?

If we do that, we should vote on all the "renaming" stuff, i.e., not only
function names but also program names like pg_receivexlog, directory names
like clog, option names like xlogdir option of initdb, return value names of
the functions like xlog_position in pg_create_physical_replication_slot, etc.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-01-16 Thread Haribabu Kommi
On Sun, Jan 8, 2017 at 4:20 AM, Bruce Momjian  wrote:

> On Fri, Dec 30, 2016 at 02:55:39PM +1100, Haribabu Kommi wrote:
> >
> > Hi All,
> >
> > Fujitsu was interested in developing a columnar storage extension with
> minimal
> > changes the server backend.
> >
> > The columnar store is implemented as an extension using index access
> methods.
> > This can be easily enhanced with pluggable storage methods once they are
> > available.
>
> Have you see this post from 2015:
>
> https://www.postgresql.org/message-id/20150831225328.GM2912%40alvherre.pgsql
>


Thanks for the information.
Yes, I already checked that mail thread. The proposal in that thread was
trying to add columnar storage into the core itself. The patch proposed here
is an extension that provides columnar storage with the help of an index.

Maybe we can discuss the pros and cons of adding a columnar store in the core
itself versus a pluggable storage approach.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] PoC: Grouped base relation

2017-01-16 Thread Tomas Vondra

On 01/17/2017 12:42 AM, David Rowley wrote:

On 10 January 2017 at 06:56, Antonin Houska  wrote:

Before performing the final aggregation, we need to multiply sum(a.x) by
count(j.*) because w/o the aggregation at base relation level the input
of the query-level aggregation would look like

 a.i | a.x | b.j
-----+-----+-----
 1   |   3 |  1
 1   |   4 |  1
 1   |   3 |  1
 1   |   4 |  1

In other words, grouping of the base relation "b" below the join prevents the
join from bringing per-group input set to the aggregate input multiple
times. To compensate for this effect, I've added a new field "aggtransmultifn"
to pg_aggregate catalog. It "multiplies" the aggregate state w/o repeated
processing of the same input set many times. sum() is an example of an
aggregate that needs such processing, avg() is one that does not.


First off, I'd like to say that making improvements in this area
sounds like a great thing. I'm very keen to see progress here.



+1

Pushing down aggregates would be very useful for analytical queries.

>

I've been thinking about this aggtransmultifn and I'm not sure if it's
really needed. Adding a whole series of new transition functions is
quite a pain. At least I think so, and I have a feeling Robert might
agree with me.

Let's imagine some worst case (and somewhat silly) aggregate query:

SELECT count(*)
FROM million_row_table
CROSS JOIN another_million_row_table;

Today that's going to cause 1 TRILLION transitions! Performance will
be terrible.

If we pushed the aggregate down into one of those tables and performed
a Partial Aggregate on that, then a Finalize Aggregate on that single
row result (after the join), then that's 1 million transfn calls, and
1 million combinefn calls, one for each row produced by the join.

If we did it your way (providing I understand your proposal correctly)
there's 1 million transfn calls on one relation, then 1 million on the
other, and then 1 multiplyfn call, which does 1,000,000 * 1,000,000.

What did we save vs. using the existing aggcombinefn infrastructure
which went into 9.6? Using this actually costs us 1 extra function
call, right? I'd imagine the size of the patch to use aggcombinefn
instead would be a fraction of the size of the one which included all
the new aggmultiplyfns and pg_aggregate.h changes.



I think the patch relies on the assumption that the grouping reduces 
cardinality, so a CROSS JOIN without a GROUP BY clause may not be the 
best counterexample.


Let's instead use an example similar to what Antonin mentioned in the 
initial post - two tables, with two columns each.


CREATE TABLE t1 (a INT, b INT);
CREATE TABLE t2 (c INT, d INT);

And let's assume each table has 100.000 rows, but only 100 groups in the 
first column, with 1000 rows per group. Something like


INSERT INTO t1
SELECT mod(i,100), i FROM generate_series(1, 1e5) s(i);

INSERT INTO t2
SELECT mod(i,100), i FROM generate_series(1, 1e5) s(i);

And let's assume this query

SELECT t1.a, count(t2.d) FROM t1 JOIN t2 ON (t1.a = t2.c)
 GROUP BY t1.a;

On master, EXPLAIN (COSTS OFF, TIMING OFF, ANALYZE) looks like this:

                          QUERY PLAN
---------------------------------------------------------------
 HashAggregate (actual rows=100 loops=1)
   Group Key: t1.a
   ->  Hash Join (actual rows=10000000 loops=1)
         Hash Cond: (t2.c = t1.a)
         ->  Seq Scan on t2 (actual rows=100000 loops=1)
         ->  Hash (actual rows=100000 loops=1)
               Buckets: 131072  Batches: 2  Memory Usage: 2716kB
               ->  Seq Scan on t1 (actual rows=100000 loops=1)
 Planning time: 0.167 ms
 Execution time: 17005.300 ms
(10 rows)

while with the patch it looks like this

                          QUERY PLAN
---------------------------------------------------------------
 Finalize HashAggregate (actual rows=100 loops=1)
   Group Key: t1.a
   ->  Hash Join (actual rows=100 loops=1)
         Hash Cond: (t1.a = t2.c)
         ->  Partial HashAggregate (actual rows=100 loops=1)
               Group Key: t1.a
               ->  Seq Scan on t1 (actual rows=100000 loops=1)
         ->  Hash (actual rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 14kB
               ->  Partial HashAggregate (actual rows=100 loops=1)
                     Group Key: t2.c
                     ->  Seq Scan on t2 (actual rows=100000 loops=1)
 Planning time: 0.105 ms
 Execution time: 31.609 ms
(14 rows)

This of course happens because with the patch we run the transition 
function 200k-times (on each side of the join) and aggtransmultifn on 
the 100 rows produced by the join, while on master the join produces 
10.000.000 rows (which already takes much more time), and then have to 
run the transition function on all those rows.


The performance difference is pretty obvious, I guess.

>

There's already a lot of infrastructure in there to help you detect
when this optimisation can be applied. For example,

Re: [HACKERS] GSoC 2017

2017-01-16 Thread Jim Nasby

On 1/13/17 3:09 PM, Peter van Hardenberg wrote:

A new data type, and/or a new index type could both be nicely scoped
bits of work.


Did you have any particular data/index types in mind?

Personally I'd love something that worked like a python dictionary, but 
I'm not sure how that'd work without essentially supporting a variant 
data type. I've got code for a variant type[1], and I don't think 
there's any holes in it, but the casting semantics are rather ugly. IIRC 
that problem appeared to be solvable if there was a hook in the current 
casting code right before Postgres threw in the towel and said a cast 
was impossible.


1: https://github.com/BlueTreble/variant/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2017

2017-01-16 Thread Jim Nasby

On 1/13/17 4:08 PM, Alvaro Herrera wrote:

Jim Nasby wrote:

On 1/10/17 1:53 AM, Alexander Korotkov wrote:

1. What project ideas we have?


Perhaps allowing SQL-only extensions without requiring filesystem files
would be a good project.


Don't we already have that in patch form?  Dimitri submitted it as I
recall.


My recollection is that he tried to boil the ocean and also support 
handing compiled C libraries to the database, which was enough to sink 
the patch. It might be nice to support that if we could, and maybe it 
could be a follow-on project.


I do think complete lack of support for non-FS extensions is *seriously* 
hurting use of the feature thanks to environments like RDS and heroku. 
As Pavel mentioned, untrusted languages are in a similar boat. So maybe 
the best way to address these things is to advertise them as "increase 
usability in cloud environments" since cloud excites people.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PSQL commands: \quit_if, \quit_unless

2017-01-16 Thread Jim Nasby

On 1/13/17 11:22 PM, Tom Lane wrote:

I wonder what depth of include-file nesting
psql can support, or whether we'll be able to fix it to optimize tail
recursion of an include file.  Because somebody will be asking for that
if this is the toolset you give them.


I think the solution to that is straightforward: tell users that we hope 
to eventually support loops and that in the meantime if you try to work 
around that with recursion you get to keep both pieces when it breaks. 
While not ideal I think that's a lot better than throwing the whole idea 
out because some people will abuse it...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Packages: Again

2017-01-16 Thread Jim Nasby

On 1/13/17 10:56 PM, Serge Rielau wrote:

But what irks me in this debate is that any reasoned and detailed
argumentation of value of the principle itself is shut down with
un-reasoned and un-detailed one-liners.
“I’m not convinced” is not an argument.
Counterpoints require content. Something starting with “because …”


+1.

I really can't fathom how someone can flatly say that a nested namespace 
is a dumb idea. Your filesystem does this. So does plpgsql. I could 
absolutely make use of nested "schemas" (or make it some other feature 
if "nested schema" offends you so much).


I agree that a nested namespace might violate ANSI (depending on if you 
overload schema/module), and that it might be hard to do with how 
Postgres currently works. And those may be reason enough not to attempt 
the effort. But those have *nothing* to do with how useful such a 
feature would be to users.


This is similar to the 10+ years users would ask for "DDL triggers" and 
get jumped all over because of how hard it would be to actually put a 
trigger on a catalog table. Thankfully someone that knew the code AND 
understood the user desire came up with the notion of event triggers 
that put hooks into every individual DDL command. Users got what they 
wanted, without any need to put "real triggers" on catalog tables.


FWIW, what I wish for in this area is:

- SOME kind of nested namespace for multiple kinds of objects (not just 
functions)

- A way to mark those namespaces (and possibly other objects) as private.
- A way to reference extensions from other extensions and deal with 
extensions being moved to a different schema (or namespace).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-16 Thread Michael Paquier
On Tue, Jan 17, 2017 at 2:21 AM, Karl O. Pinc  wrote:
> On January 15, 2017 11:47:51 PM CST, Michael Paquier 
>  wrote:
>>On Mon, Jan 16, 2017 at 1:14 PM, Karl O. Pinc  wrote:
>>With all those issues fixed, I finish with the attached, that I am
>>fine to pass down to a committer. I still think that this should be
>>only one function using a SRF with rows shaped as (type, path) for
>>simplicity, but as I am visibly outnumbered I won't insist further.
>
> That also makes a certain amount of sense to me but I can't say I have 
> thought deeply about it. Haven't paid any attention to this issue and am fine 
> letting things move forward as is.

Gilles, what's your opinion here? As the author that's your call at
the end. The point here would be to change
pg_current_logfiles() to something like this:
=# SELECT * FROM pg_current_logfiles();
 method |          file
--------+------------------------
 stderr | pg_log/postgresql.log
 csvlog | pg_log/postgresql.csv
And using this SRF users can filter the method with a WHERE clause.
And as a result the 1-arg version is removed. No rows are returned if
current_logfiles does not exist. I don't think there is much need for
a system view either.
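
For instance, filtering would then be as simple as this (just a sketch of the
proposed interface, nothing is implemented yet):

=# SELECT file FROM pg_current_logfiles() WHERE method = 'csvlog';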

>>Also, I would rather see an ERROR returned to the user in case of bad
>>data in current_logfiles, I did not change that either as that's the
>>original intention of Gilles.
>
> I could be wrong but I seem to recall that Robert recommended against an 
> error message. If there is bad data then something is really wrong, up to 
> some sort of an attack on the back end. Because this sort of thing simply 
> shouldn't happen it's enough in my opinion to guard against buffer overruns 
> etc and just get on with life. If something goes unexpectedly and badly wrong 
> with internal data structures in general there would have to be all kinds of 
> additional code to cover every possible problem and that doesn't seem to make 
> sense.

Hm... OK. At the same time not throwing at least a WARNING is
confusing, because a NULL result is returned as well even if the input
method is incorrect and even if the method is correct but it is not
present in current_logfiles. As the user is considered trusted, given that
they have access to this function, I would think that being verbose in the
error handling, or at least emitting warnings, would make things easier to
analyze.

> Sorry about the previous email with empty content. My email client got away 
> from me.

No problem. That happens.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Polyphase merge is obsolete

2017-01-16 Thread Peter Geoghegan
On Wed, Oct 12, 2016 at 10:16 AM, Heikki Linnakangas  wrote:
> The number of *input* tapes we can use in each merge pass is still limited,
> by the memory needed for the tape buffers and the merge heap, but only one
> output tape is active at any time. The inactive output tapes consume very
> few resources. So the whole idea of trying to efficiently reuse input tapes
> as output tapes is pointless

I picked this up again. The patch won't apply cleanly. Can you rebase?
Also, please look at my bugfix for logtape.c free block management [1]
before doing so, as that might be prerequisite. Finally, I don't think
that the Logical tape pause/resume idea is compelling. Is it hard to
not do that, but still do everything else that you propose here?
That's what I lean towards doing right now.

Anyway, efficient use of tapes certainly mattered a lot more when
rewinding meant sitting around for a magnetic tape deck to physically
rewind. There is another algorithm in AoCP Vol III that lets us write
to tapes backwards, actually, which is motivated by similar obsolete
considerations about hardware. Why not write while we rewind, to avoid
doing nothing else while rewinding?!

Perhaps this patch should make a clean break from the "rewinding"
terminology. Perhaps you should rename LogicalTapeRewindForRead() to
LogicalTapePrepareForRead(), and so on. It's already a bit awkward
that that routine is called LogicalTapeRewindForRead(), because it
behaves significantly differently when a tape is frozen, and because
the whole point of logtape.c is space reuse that is completely
dissimilar to rewinding. (Space reuse is thus totally unlike how
polyphase merge is supposed to reuse space, which is all about
rewinding, and isn't nearly as eager. Same applies to K-way balanced
merge, of course.)

I think that the "rewinding" terminology does more harm than good, now
that it doesn't even help the Knuth reader to match Knuth's remarks to
what's going on in tuplesort.c. Just a thought.

> Let's switch over to a simple k-way balanced merge. Because it's simpler. If
> you're severely limited on memory, like when sorting 1GB of data with
> work_mem='1MB' or less, it's also slightly faster. I'm not too excited about
> the performance aspect, because in the typical case of a single-pass merge,
> there's no difference. But it would be worth changing on simplicity grounds,
> since we're mucking around in tuplesort.c anyway.

I actually think that the discontinuities in the merge scheduling are
worse than you suggest here. There doesn't have to be as extreme a
difference between work_mem and the size of input as you describe
here. As an example:

create table seq_tab(t int);
insert into seq_tab select generate_series(1, 1000);
set work_mem = '4MB';
select count(distinct t) from seq_tab;

The trace_sort output ends like this:

30119/2017-01-16 17:17:05 PST LOG:  begin datum sort: workMem = 4096,
randomAccess = f
30119/2017-01-16 17:17:05 PST LOG:  switching to external sort with 16
tapes: CPU: user: 0.07 s, system: 0.00 s, elapsed: 0.06 s
30119/2017-01-16 17:17:05 PST LOG:  starting quicksort of run 1: CPU:
user: 0.07 s, system: 0.00 s, elapsed: 0.06 s
30119/2017-01-16 17:17:05 PST LOG:  finished quicksort of run 1: CPU:
user: 0.07 s, system: 0.00 s, elapsed: 0.07 s
*** SNIP ***
30119/2017-01-16 17:17:08 PST LOG:  finished writing run 58 to tape 0:
CPU: user: 2.50 s, system: 0.27 s, elapsed: 2.78 s
30119/2017-01-16 17:17:08 PST LOG:  using 4095 KB of memory for read
buffers among 15 input tapes
30119/2017-01-16 17:17:08 PST LOG:  finished 1-way merge step: CPU:
user: 2.52 s, system: 0.28 s, elapsed: 2.80 s
30119/2017-01-16 17:17:08 PST LOG:  finished 4-way merge step: CPU:
user: 2.58 s, system: 0.30 s, elapsed: 2.89 s
30119/2017-01-16 17:17:08 PST LOG:  finished 14-way merge step: CPU:
user: 2.86 s, system: 0.34 s, elapsed: 3.20 s
30119/2017-01-16 17:17:08 PST LOG:  finished 14-way merge step: CPU:
user: 3.09 s, system: 0.41 s, elapsed: 3.51 s
30119/2017-01-16 17:17:09 PST LOG:  finished 15-way merge step: CPU:
user: 3.61 s, system: 0.52 s, elapsed: 4.14 s
30119/2017-01-16 17:17:09 PST LOG:  performsort done (except 15-way
final merge): CPU: user: 3.61 s, system: 0.52 s, elapsed: 4.14 s
30119/2017-01-16 17:17:10 PST LOG:  external sort ended, 14678 disk
blocks used: CPU: user: 4.93 s, system: 0.57 s, elapsed: 5.51 s

(This is the test case that Cy posted earlier today, for the bug that
was just fixed in master.)

The first 1-way merge step is clearly kind of a waste of time. We
incur no actual comparisons during this "merge", since there is only
one real run from each input tape (all other active tapes contain only
dummy runs). We are, in effect, just shoveling the tuples from that
single run from one tape to another (from one range in the underlying
logtape.c BufFile space to another). I've seen this quite a lot
before, over the years, while working on sorting. It's not that big of
a deal, but it's a degenerate case that 

Re: [HACKERS] pg_hba_file_settings view patch

2017-01-16 Thread Haribabu Kommi
On Tue, Jan 10, 2017 at 6:35 PM, Michael Paquier 
wrote:

> On Thu, Jan 5, 2017 at 1:58 PM, Michael Paquier
>  wrote:
> > Could you hold on a bit to commit that? I'd like to look at it in more
> > details. At quick glance, there is for example no need to use
> > CreateTemplateTupleDesc and list the columns both in pg_proc.h and the
> > C routine itself. And memset() can be used in fill_hba_line for the
> > error code path.
>
> And here we go.
>

Thanks for the review.


> +
> +postgres=# select * from pg_hba_rules;
> [... large example ...]
> +
> +
> It would be nice to reduce the width of this example. That's not going
> to be friendly with the generated html.
>

Added a small example.

+   switch (hba->conntype)
> +   {
> +   case ctLocal:
> +   values[index] = CStringGetTextDatum("local");
> +   break;
> +   case ctHost:
> +   values[index] = CStringGetTextDatum("host");
> +   break;
> +   case ctHostSSL:
> +   values[index] = CStringGetTextDatum("hostssl");
> +   break;
> +   case ctHostNoSSL:
> +   values[index] = CStringGetTextDatum("hostnossl");
> +   break;
> +   default:
> +   elog(ERROR, "unexpected connection type in parsed HBA entry");
> +   break;
> +   }
> Here let's remove the break clause and let compilers catch problem
> when they show up.
>

Removed.


> +   if (hba->pamservice)
> +   {
> +   initStringInfo(&str);
> +   appendStringInfoString(&str, "pamservice=");
> +   appendStringInfoString(&str, hba->pamservice);
> +   options[noptions++] = CStringGetTextDatum(str.data);
> +   }
> There is a bunch of code duplication here. I think that it would make
> more sense to encapsulate that into a routine, at least let's use
> appendStringInfo and let's group the two calls together.
>

Use a new function to reduce the repeated lines of code.
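
Roughly along these lines (a simplified sketch, not the exact code in the
attached patch; the helper name here is only illustrative):

static void
add_hba_option(Datum *options, int *noptions,
               const char *name, const char *value)
{
    StringInfoData str;

    initStringInfo(&str);
    appendStringInfo(&str, "%s=%s", name, value);
    options[(*noptions)++] = CStringGetTextDatum(str.data);
}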


> +/* LDAP supports 10 currently, keep this well above the most any method needs */
> +#define MAX_OPTIONS 12
> Er, why? There is an assert already, that should be enough.
>

Which Assert? This macro is used to set the maximum number of
authentication options that are possible for a single hba line.



> =# \d pg_hba_rules
>              View "pg_catalog.pg_hba_rules"
>       Column      |  Type   | Collation | Nullable | Default
> ------------------+---------+-----------+----------+---------
>  line_number      | integer |           |          |
>  type             | text    |           |          |
>  keyword_database | text[]  |           |          |
>  database         | text[]  |           |          |
>  keyword_user     | text[]  |           |          |
>  user_name        | text[]  |           |          |
>  keyword_address  | text    |           |          |
>  address          | inet    |           |          |
>  netmask          | inet    |           |          |
>  hostname         | text    |           |          |
>  method           | text    |           |          |
>  options          | text[]  |           |          |
>  error            | text    |           |          |
> keyword_database and database map actually to the same thing if you
> refer to a raw pg_hba.conf file because they have the same meaning for
> user. You could simplify the view and just remove keyword_database,
> keyword_user and keyword_address. This would simplify your patch as
> well with all hte mumbo-jumbo to see if a string is a dedicated
> keyword or not. In its most simple shape pg_hba_rules should show to
> the user as an exact map of the entries of the raw file.
>

I removed the keyword_database and keyword_user columns, since the data
in those columns can easily be represented with the database and user columns.
But the address field can contain keywords such as "samehost" and so on, and
it can also hold a hostname. For this reason, that field is split into
3 columns in the view.

> I have copied the example file of pg_hba.conf, reloaded it:
> https://www.postgresql.org/docs/devel/static/auth-pg-hba-conf.html
> And then the output result gets corrupted by showing up free()'d results:
> null   | null| \x7F\x7F\x7F\x7F\x7F
>

There was a problem with resetting the error string; it works with the
attached patch.


> +   if (err_msg)
> +   {
> +   /* type */
> +   index++;
> +   nulls[index] = true;
> [... long sequence ...]
> Please let's use MemSet here and remove this large chunk of code..
>

Done.


> +   if (!superuser())
> +   ereport(ERROR,
> +   (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> +(errmsg("must be superuser to view pg_hba.conf settings"))));
> Access to the function is already revoked, so what's the point of this
> superuser check?
>

Removed.


>
> +   tupdesc = CreateTemplateTupleDesc(NUM_PG_HBA_LOOKUP_ATTS, false);
> +   TupleDescInitEntry(tupdesc, (AttrNumber) 1, 

Re: [HACKERS] PoC: Grouped base relation

2017-01-16 Thread David Rowley
On 10 January 2017 at 06:56, Antonin Houska  wrote:
> Before performing the final aggregation, we need to multiply sum(a.x) by
> count(j.*) because w/o the aggregation at base relation level the input
> of the query-level aggregation would look like
>
>  a.i | a.x | b.j
> -----+-----+-----
>  1   |   3 |  1
>  1   |   4 |  1
>  1   |   3 |  1
>  1   |   4 |  1
>
> In other words, grouping of the base relation "b" below the join prevents the
> join from bringing per-group input set to the aggregate input multiple
> times. To compensate for this effect, I've added a new field "aggtransmultifn"
> to pg_aggregate catalog. It "multiplies" the aggregate state w/o repeated
> processing of the same input set many times. sum() is an example of an
> aggregate that needs such processing, avg() is one that does not.

First off, I'd like to say that making improvements in this area
sounds like a great thing. I'm very keen to see progress here.

I've been thinking about this aggtransmultifn and I'm not sure if it's
really needed. Adding a whole series of new transition functions is
quite a pain. At least I think so, and I have a feeling Robert might
agree with me.

Let's imagine some worst case (and somewhat silly) aggregate query:

SELECT count(*)
FROM million_row_table
CROSS JOIN another_million_row_table;

Today that's going to cause 1 TRILLION transitions! Performance will
be terrible.

If we pushed the aggregate down into one of those tables and performed
a Partial Aggregate on that, then a Finalize Aggregate on that single
row result (after the join), then that's 1 million transfn calls, and
1 million combinefn calls, one for each row produced by the join.

If we did it your way (providing I understand your proposal correctly)
there's 1 million transfn calls on one relation, then 1 million on the
other, and then 1 multiplyfn call, which does 100 * 100.

What did we save vs. using the existing aggcombinefn infrastructure
which went into 9.6? Using this actually costs us 1 extra function
call, right? I'd imagine the size of the patch to use aggcombinefn
instead would be a fraction of the size of the one which included all
the new aggmultiplyfns and pg_aggregate.h changes.
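
To spell out the arithmetic, the combinefn-style plan is roughly what you'd
get from hand-writing the query like this (an illustrative rewrite only; the
planner would of course do this internally):

SELECT sum(c)
FROM (SELECT count(*) AS c FROM million_row_table) pa
CROSS JOIN another_million_row_table;

-- 1 million transfn calls inside the subquery, then 1 million
-- "combine" steps over the join output, instead of 1 trillion transitions.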

There's already a lot of infrastructure in there to help you detect
when this optimisation can be applied. For example,
AggClauseCosts.hasNonPartial will be true if any aggregates don't have
a combinefn, or if there's any DISTINCT or ORDER BY aggregates,
which'll also mean you can't apply the optimisation.

It sounds like a much more manageable project by using aggcombinefn
instead. Then maybe one day when we can detect if a join did not cause
any result duplication (i.e Unique Joins), we could finalise the
aggregates on the first call, and completely skip the combine state
altogether.

Thanks for your work on this.

Regards

David Rowley


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2017-01-16 Thread Alvaro Herrera
In case this still matters, I think GetValue should look more or less
like this (untested):


/*
 * Return the value for column number 'colnum' for the current row.  If column
 * -1 is requested, return representation of the whole row.
 *
 * This leaks memory, so be sure to reset often the context in which it's
 * called.
 */
static Datum
XmlTableGetValue(TableExprState *state, int colnum, bool *isnull)
{
#ifdef USE_LIBXML
XmlTableBuilderData *xtCxt;
Datum   result = (Datum) 0;
xmlNodePtr  cur;
char   *cstr = NULL;
volatile xmlXPathObjectPtr xpathobj;

xtCxt = GetXmlTableBuilderPrivateData(state, "XmlTableGetValue");

Assert(xtCxt->xpathobj &&
   xtCxt->xpathobj->type == XPATH_NODESET &&
   xtCxt->xpathobj->nodesetval != NULL);

/* Propagate context related error context to libxml2 */
xmlSetStructuredErrorFunc((void *) xtCxt->xmlerrcxt, xml_errorHandler);

cur = xtCxt->xpathobj->nodesetval->nodeTab[xtCxt->row_count - 1];
if (cur->type != XML_ELEMENT_NODE)
elog(ERROR, "unexpected xmlNode type");

/* Handle whole row case the easy way. */
if (colnum == -1)
{
text   *txt;

txt = xml_xmlnodetoxmltype(cur, xtCxt->xmlerrcxt);
		result = InputFunctionCall(&state->in_functions[0],
								   text_to_cstring(txt),
								   state->typioparams[0],
								   -1);
*isnull = false;

return result;
}

Assert(xtCxt->xpathscomp[colnum] != NULL);

xpathobj = NULL;
PG_TRY();
{
Form_pg_attribute attr;

attr = state->resultSlot->tts_tupleDescriptor->attrs[colnum];

/* Set current node as entry point for XPath evaluation */
xmlXPathSetContextNode(cur, xtCxt->xpathcxt);

/* Evaluate column path */
		xpathobj = xmlXPathCompiledEval(xtCxt->xpathscomp[colnum], xtCxt->xpathcxt);
		if (xpathobj == NULL || xtCxt->xmlerrcxt->err_occurred)
			xml_ereport(xtCxt->xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
						"could not create XPath object");

if (xpathobj->type == XPATH_NODESET)
{
int count;
Oid targettypid = attr->atttypid;

if (xpathobj->nodesetval != NULL)
count = xpathobj->nodesetval->nodeNr;

			/*
			 * There are four possible cases, depending on the number of
			 * nodes returned by the XPath expression and the type of the
			 * target column: a) XPath returns no nodes.  b) One node is
			 * returned, and column is of type XML.  c) One node, column
			 * type other than XML.  d) Multiple nodes are returned.
			 */
if (xpathobj->nodesetval == NULL)
{
*isnull = true;
}
else if (count == 1 && targettypid == XMLOID)
{
				textstr = xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[0],
											   xtCxt->xmlerrcxt);
				cstr = text_to_cstring(textstr);
}
else if (count == 1)
{
				xmlChar    *str;

				str = xmlNodeListGetString(xtCxt->doc,
										   xpathobj->nodesetval->nodeTab[0]->xmlChildrenNode,
										   1);
if (str)
{
PG_TRY();
{
cstr = pstrdup(str);
}
PG_CATCH();
{
xmlFree(str);
PG_RE_THROW();
}
PG_END_TRY();
xmlFree(str);
}
else
   

Re: [HACKERS] patch: function xmltable

2017-01-16 Thread Alvaro Herrera
Given
https://www.postgresql.org/message-id/20170116210019.a3glfwspg5lnf...@alap3.anarazel.de
which is going to heavily change how the executor works in this area, I
am returning this patch to you again.  I would like a few rather minor
changes:

1. to_xmlstr can be replaced with calls to xmlCharStrdup.
2. don't need xml_xmlnodetostr either -- just use xml_xmlnodetoxmltype
   (which returns text*) and extract the cstring from the varlena.  It's
   a bit more wasteful in terms of cycles, but I don't think we care.
   If we do care, change the function so that it returns cstring, and
   have the callers that want text wrap it in cstring_to_text.
3. have a new perValueCxt memcxt in TableExprState, child of buildercxt,
   and switch to it just before GetValue() (reset it just before
   switching).  Then, don't worry about leaks in GetValue.  This way,
   the text* conversions et al don't matter.
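
For point 3, I'm thinking of something like this at the call site (only a
sketch; perValueCxt and the surrounding variable names are placeholders):

    /* evaluate each column value in a short-lived, per-value context */
    MemoryContextReset(state->perValueCxt);
    oldcxt = MemoryContextSwitchTo(state->perValueCxt);
    values[colnum] = XmlTableGetValue(state, colnum, &nulls[colnum]);
    MemoryContextSwitchTo(oldcxt);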

After that I think we're going to need to get this working on top of
Andres' changes.  Which I'm afraid is going to be rather major surgery,
but I haven't looked.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Permit dump/reload of not-too-large >1GB tuples

2017-01-16 Thread Tom Lane
Alvaro Herrera  writes:
> Permit dump/reload of not-too-large >1GB tuples

I noticed that this commit has created an overflow hazard on 64-bit
hardware: it's not difficult to attempt to make a tuple exceeding 4GB.
You just need a few GB-sized input values.  But that's uncool for
a couple of reasons:

* HeapTupleData.t_len can't represent it (it's only uint32)

* heap_form_tuple sets up the tuple to possibly become a composite
datum, meaning it applies SET_VARSIZE.  That's going to silently
overflow for any size above 1GB.

In short, as this stands it's a security problem.

I'm not sure whether it's appropriate to try to change t_len to type Size
to address the first problem.  We could try, but I don't know what the
fallout would be.  Might be better to just add a check disallowing tuple
size >= 4GB.
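
Something as simple as this at the point where the tuple length is computed
might do for that (just a sketch, not a worked-out patch):

    /* t_len is uint32, so reject anything at or above 4GB */
    if (len > (Size) PG_UINT32_MAX)
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("tuple size %zu exceeds maximum %u",
                        (size_t) len, PG_UINT32_MAX)));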

As for the second problem, we clearly don't have any prospect of allowing
composite datums that exceed 1GB, since they have to have varlena length
words.  We can allow plain HeapTuples to exceed that, but it'd be
necessary to distinguish whether the value being built is going to become
a composite datum or not.  That requires an API break for heap_form_tuple,
or else a separate function to convert a HeapTuple to Datum (and apply a
suitable length check).  Either way it's a bit of a mess.

In any case, a great deal more work is needed before this can
possibly be safe.  I recommend reverting it pending that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-16 Thread Alvaro Herrera
Andres Freund wrote:
> On 2017-01-16 16:04:34 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > On 2017-01-16 12:17:46 -0300, Alvaro Herrera wrote:
> > > > Hmm.  I wonder if your stuff could be used as support code for
> > > > XMLTABLE[1].
> > > 
> > > I don't immediately see what functionality overlaps, could you expand on
> > > that?
> > 
> > Well, I haven't read any previous patches in this area, but the xmltable
> > patch adds a new way of handling set-returning expressions, so it
> > appears vaguely related.
> 
> Ugh. That's not good - I'm about to remove isDone. Like completely.
> That's why I'm actually working on all this, because random expressions
> returning more rows makes optimizing expression evaluation a lot harder.

Argh.

> > These aren't properly functions in the current sense of the word,
> > though.
> 
> Why aren't they? Looks like it'd be doable to do so, at least below the
> parser level?

Hmm ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] check_srf_call_placement() isn't always setting p_hasTargetSRFs

2017-01-16 Thread Andres Freund
On 2017-01-16 14:34:58 -0500, Tom Lane wrote:
> Will go fix these things.

Thanks!

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-16 Thread Andres Freund
On 2017-01-16 16:04:34 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2017-01-16 12:17:46 -0300, Alvaro Herrera wrote:
> > > Hmm.  I wonder if your stuff could be used as support code for
> > > XMLTABLE[1].
> > 
> > I don't immediately see what functionality overlaps, could you expand on
> > that?
> 
> Well, I haven't read any previous patches in this area, but the xmltable
> patch adds a new way of handling set-returning expressions, so it
> appears vaguely related.

Ugh. That's not good - I'm about to remove isDone. Like completely.
That's why I'm actually working on all this, because random expressions
returning more rows makes optimizing expression evaluation a lot harder.

> These aren't properly functions in the current sense of the word,
> though.

Why aren't they? Looks like it'd be doable to do so, at least below the
parser level?


Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-16 Thread Andres Freund
Hi,

On 2017-01-16 14:13:18 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > That worked quite well.  So we have a few questions, before I clean this
> > up:
>
> > - For now the node is named 'Srf' both internally and in explain - not
> >   sure if we want to make that something longer/easier to understand for
> >   others? Proposals? TargetFunctionScan? SetResult?
>
> "Srf" is ugly as can be, and unintelligible.  SetResult might be OK.

Named it SetResult - imo looks ok.  I think I do prefer the separate
node type over re-using Result.  The planner integration looks cleaner
to me due to not needing the srfpp special cases and such.


> > Comments?
>
> Hard to comment on your other points without a patch to look at.

Attached the current version. There's a *lot* of pending cleanup needed
(especially in execQual.c) removing outdated code/comments etc, but this
seems good enough for a first review.  I'd want that cleanup done in a
separate patch anyway.


Attached are two patches. The first is just a rebased version (just some
hunk offset changed) of your planner patch, on top of that is my
executor patch.  My patch moves some minor detail in yours around, and I
do think they should eventually be merged; but leaving it split for a
round displays the changes more cleanly.

Additional questions:
- do we care about SRFs that don't actually return a set? If so we need
  to change the error checking code in ExecEvalFunc/Oper and move it to
  the actual invocation.
- the FuncExpr/OpExpr check in ExecMakeFunctionResult is fairly ugly imo
  - but I don't quite see a much better solution.

Greetings,

Andres
>From 2c16e67f46f418239ab90a51611f168508bac66e Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 15 Jan 2017 19:23:22 -0800
Subject: [PATCH 1/2] Put SRF into a separate node v1.

Author: Tom Lane
Discussion: https://postgr.es/m/557.1473895...@sss.pgh.pa.us
---
 src/backend/nodes/outfuncs.c |   1 +
 src/backend/optimizer/plan/createplan.c  |  33 -
 src/backend/optimizer/plan/planner.c | 219 +--
 src/backend/optimizer/util/clauses.c | 104 ++-
 src/backend/optimizer/util/pathnode.c|  75 +++
 src/backend/optimizer/util/tlist.c   | 199 
 src/include/nodes/relation.h |   1 +
 src/include/optimizer/clauses.h  |   1 -
 src/include/optimizer/pathnode.h |   4 +
 src/include/optimizer/tlist.h|   3 +
 src/test/regress/expected/aggregates.out |   3 +-
 src/test/regress/expected/limit.out  |  10 +-
 src/test/regress/expected/rangefuncs.out |  10 +-
 src/test/regress/expected/subselect.out  |  26 ++--
 src/test/regress/expected/tsrf.out   |  11 +-
 15 files changed, 544 insertions(+), 156 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index cf0a6059e9..73fdc9706d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1805,6 +1805,7 @@ _outProjectionPath(StringInfo str, const ProjectionPath *node)
 
 	WRITE_NODE_FIELD(subpath);
 	WRITE_BOOL_FIELD(dummypp);
+	WRITE_BOOL_FIELD(srfpp);
 }
 
 static void
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c7bcd9b84c..875de739a8 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1421,8 +1421,21 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
 	Plan	   *subplan;
 	List	   *tlist;
 
-	/* Since we intend to project, we don't need to constrain child tlist */
-	subplan = create_plan_recurse(root, best_path->subpath, 0);
+	/*
+	 * XXX Possibly-temporary hack: if the subpath is a dummy ResultPath,
+	 * don't bother with it, just make a Result with no input.  This avoids an
+	 * extra Result plan node when doing "SELECT srf()".  Depending on what we
+	 * decide about the desired plan structure for SRF-expanding nodes, this
+	 * optimization might have to go away, and in any case it'll probably look
+	 * a good bit different.
+	 */
+	if (IsA(best_path->subpath, ResultPath) &&
+		((ResultPath *) best_path->subpath)->path.pathtarget->exprs == NIL &&
+		((ResultPath *) best_path->subpath)->quals == NIL)
+		subplan = NULL;
+	else
+		/* Since we intend to project, we don't need to constrain child tlist */
+		subplan = create_plan_recurse(root, best_path->subpath, 0);
 
 	tlist = build_path_tlist(root, &best_path->path);
 
@@ -1441,8 +1454,9 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
 	 * creation, but that would add expense to creating Paths we might end up
 	 * not using.)
 	 */
-	if (is_projection_capable_path(best_path->subpath) ||
-		tlist_same_exprs(tlist, subplan->targetlist))
+	if (!best_path->srfpp &&
+		(is_projection_capable_path(best_path->subpath) ||
+		 tlist_same_exprs(tlist, subplan->targetlist)))
 	{
 		/* Don't need a separate Result, just assign tlist to subplan */
 

Re: [HACKERS] check_srf_call_placement() isn't always setting p_hasTargetSRFs

2017-01-16 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> I wonder if there should be a seperate expression type for
>> the INSERT ... VALUES(exactly-one-row); since that behaves quite
>> differently.

> Perhaps.  Or maybe we should just use EXPR_KIND_SELECT_TARGET for that?

After looking around, I think we probably better use a different
EXPR_KIND; even if all the functionality is identical, we don't want
ParseExprKindName() to say "SELECT" when we're throwing an error for
INSERT...VALUES.

Also, I noticed that we don't actually allow SRFs in VALUES RTEs:

regression=# select * from (values(1,generate_series(11,13)),(2,0)) v;
ERROR:  set-valued function called in context that cannot accept a set

That's because ValuesNext doesn't handle it.  I'm not particularly
excited about fixing that, given that it's always been that way and
no one has complained yet.  But check_srf_call_placement() is misinformed,
since it thinks the case works.

Will go fix these things.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2017-01-16 Thread Alvaro Herrera
I just realized that your new xml_xmlnodetostr is line-by-line identical
to the existing xml_xmlnodetoxmltype except for two or three lines.
I think that's wrong.  I'm going to patch the existing function so that
they can share code.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-16 Thread Tom Lane
Andres Freund  writes:
> That worked quite well.  So we have a few questions, before I clean this
> up:

> - For now the node is named 'Srf' both internally and in explain - not
>   sure if we want to make that something longer/easier to understand for
>   others? Proposals? TargetFunctionScan? SetResult?

"Srf" is ugly as can be, and unintelligible.  SetResult might be OK.

> - I continued with the division of Labor that Tom had set up, so we're
>   creating one Srf node for each "nested" set of SRFs.  We'd discussed
>   nearby to change that for one node/path for all nested SRFs, partially
>   because of costing.  But I don't like the idea that much anymore. The
>   implementation seems cleaner (and probably faster) this way, and I
>   don't think nested targetlist SRFs are something worth optimizing
>   for.  Anybody wants to argue differently?

Not me.

> Comments?

Hard to comment on your other points without a patch to look at.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] check_srf_call_placement() isn't always setting p_hasTargetSRFs

2017-01-16 Thread Tom Lane
Andres Freund  writes:
> Is there a reason not to just set p_hasTargetSRFs once towards the end
> of the function, instead of doing so for all the non-error cases?

Yes: it's not supposed to get set when the SRF is in FROM.

> I wonder if there should be a seperate expression type for
> the INSERT ... VALUES(exactly-one-row); since that behaves quite
> differently.

Perhaps.  Or maybe we should just use EXPR_KIND_SELECT_TARGET for that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-16 Thread Alvaro Herrera
Andres Freund wrote:
> On 2017-01-16 12:17:46 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > 
> > > That worked quite well.  So we have a few questions, before I clean this
> > > up:
> > > 
> > > - For now the node is named 'Srf' both internally and in explain - not
> > >   sure if we want to make that something longer/easier to understand for
> > >   others? Proposals? TargetFunctionScan? SetResult?
> > > 
> > > - We could alternatively add all this into the Result node - it's not
> > >   actually a lot of new code, and most of that is boilerplate stuff
> > >   about adding a new node.  I'm ok with both.
> > 
> > Hmm.  I wonder if your stuff could be used as support code for
> > XMLTABLE[1].
> 
> I don't immediately see what functionality overlaps, could you expand on
> that?

Well, I haven't read any previous patches in this area, but the xmltable
patch adds a new way of handling set-returning expressions, so it
appears vaguely related.  These aren't properly functions in the current
sense of the word, though.  There is some parallel to what
ExecMakeFunctionResult does, which I suppose is related.

> > Currently it has a bit of additional code of its own,
> > though admittedly it's very little code executor-side.  Would you mind
> > sharing a patch, or more details on how it works?
> 
> Can do both; cleaning up the patch now. What we're talking about here is
> a way to implement targetlist SRF that is based on:
> 
> 1) a patch by Tom that creates additional Result (or now Srf) executor
>    nodes containing SRF evaluation. This guarantees that only Result/Srf
>    nodes have to deal with targetlist SRF evaluation.
> 
> 2) new code to evaluate SRFs in the new Result/Srf node, that doesn't
>    rely on ExecEvalExpr et al. to have an IsDone argument. Instead
>    there's special code to handle that in the new node. That's possible
>    because it's now guaranteed that all SRFs are "toplevel" in the
>    relevant targetlist(s).
> 
> 3) Removal of nearly all tSRF-related code in execQual.c and other
>    executor/ files, including the node->ps.ps_TupFromTlist checks
>    everywhere.
> 
> makes sense?

Hmm, okay.  (The ps_TupFromTlist thing has long seemed an ugly
construction.)  I think the current term for this kind of thing is
TableFunction -- are you really naming this "Srf" literally?  It seems
strange, but maybe it's just me.

> > Nested targetlist SRFs make my head spin.  I suppose they may have some
> > use, but where would you really want this:
> 
> I think there's some cases where it can be useful. Targetlist SRFs as a
> whole really are much more about backward compatibility than anything :)

Sure.

> > ?  If supporting the former makes it harder to support/optimize more
> > reasonable cases, it seems fair game to leave them behind.
> 
> I don't want to desupport them, just don't want to restructure (one node
> doing several levels of SRFs, instead of one per level) just to make it
> easier to give good estimates.

No objections.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

2017-01-16 Thread Pavel Stehule
2017-01-16 19:43 GMT+01:00 Peter Geoghegan :

> On Mon, Jan 16, 2017 at 10:38 AM, Pavel Stehule 
> wrote:
> > Shouldn't the regress tests be enhanced too?
>
> We already have coverage of multi-pass external tuplesorts, as of a
> few months back. That didn't catch this bug only because it was a
> pass-by-value datum tuplesort. The relevant RELEASE_SLAB_SLOT() call
> does have line coverage already.
>
> I wouldn't object to adding a test case that would have exercised this
> bug, too. It took me a while to talk Tom into the test that was added
> several months back, which discouraged me from adding another test
> case here. (There were concerns about the overhead of an external sort
> test on slower buildfarm animals.)
>

ok



>
> --
> Peter Geoghegan
>


Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

2017-01-16 Thread Tom Lane
Peter Geoghegan  writes:
> The problem was that one particular call to the macro
> RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed
> by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls
> already have such a check.

> Attached patch fixes the bug.

Pushed, thanks.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-16 Thread Andres Freund
On 2017-01-16 12:17:46 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > That worked quite well.  So we have a few questions, before I clean this
> > up:
> > 
> > - For now the node is named 'Srf' both internally and in explain - not
> >   sure if we want to make that something longer/easier to understand for
> >   others? Proposals? TargetFunctionScan? SetResult?
> > 
> > - We could alternatively add all this into the Result node - it's not
> >   actually a lot of new code, and most of that is boilerplate stuff
> >   about adding a new node.  I'm ok with both.
> 
> Hmm.  I wonder if your stuff could be used as support code for
> XMLTABLE[1].

I don't immediately see what functionality overlaps, could you expand on
that?


> Currently it has a bit of additional code of its own,
> though admittedly it's very little code executor-side.  Would you mind
> sharing a patch, or more details on how it works?

Can do both; cleaning up the patch now. What we're talking about here is
a way to implement targetlist SRF that is based on:

1) a patch by Tom that creates additional Result (or now Srf) executor
   nodes containing SRF evaluation. This guarantees that only Result/Srf
   nodes have to deal with targetlist SRF evaluation.

2) new code to evaluate SRFs in the new Result/Srf node, that doesn't
   rely on ExecEvalExpr et al. to have an IsDone argument. Instead
   there's special code to handle that in the new node. That's possible
   because it's now guaranteed that all SRFs are "toplevel" in the
   relevant targetlist(s).

3) Removal of nearly all tSRF-related code in execQual.c and other
   executor/ files, including the node->ps.ps_TupFromTlist checks
   everywhere.

makes sense?


> > - I continued with the division of Labor that Tom had set up, so we're
> >   creating one Srf node for each "nested" set of SRFs.  We'd discussed
> >   nearby to change that for one node/path for all nested SRFs, partially
> >   because of costing.  But I don't like the idea that much anymore. The
> >   implementation seems cleaner (and probably faster) this way, and I
> >   don't think nested targetlist SRFs are something worth optimizing
> >   for.  Anybody wants to argue differently?
> 
> Nested targetlist SRFs make my head spin.  I suppose they may have some
> use, but where would you really want this:

I think there's some cases where it can be useful. Targetlist SRFs as a
whole really are much more about backward compatibility than anything :)


> ?  If supporting the former makes it harder to support/optimize more
> reasonable cases, it seems fair game to leave them behind.

I don't want to desupport them, just don't want to restructure (one node
doing several levels of SRFs, instead of one per level) just to make it
easier to give good estimates.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

2017-01-16 Thread Peter Geoghegan
On Mon, Jan 16, 2017 at 10:38 AM, Pavel Stehule  wrote:
> Shouldn't the regress tests be enhanced too?

We already have coverage of multi-pass external tuplesorts, as of a
few months back. That didn't catch this bug only because it was a
pass-by-value datum tuplesort. The relevant RELEASE_SLAB_SLOT() call
does have line coverage already.

I wouldn't object to adding a test case that would have exercised this
bug, too. It took me a while to talk Tom into the test that was added
several months back, which discouraged me from adding another test
case here. (There were concerns about the overhead of an external sort
test on slower buildfarm animals.)

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-16 Thread Alvaro Herrera
Karl O. Pinc wrote:

> I do have a question here regards code formatting.
> The patch now contains:
> 
> if (log_filepath == NULL)
> {
> /* Bad data. Avoid segfaults etc. and return NULL to caller. */
> break;
> }
> 
> I'm not sure how this fits in with PG coding style,
> whether the {} should be removed or what.  I've looked
> around and can't find an example of an if with a single
> line then block and a comment.  Maybe this means that
> I shouldn't be doing this, but if I put the comment above
> the if then it will "run into" the comment block which
> immediately precedes the above code which describes
> a larger body of code.  So perhaps someone should look
> at this and tell me how to improve it.

I think this style is good.  Following the style guide to the letter
would lead to removing the braces and keeping the comment where it is;
pgindent will correctly keep it at its current indentation.  We do use that
style in a couple of places.  It looks a bit clunky.  In most places we
do keep those braces, for readability and future-proofing in case
somebody inadvertently introduces another statement to the "block".  We
don't automatically remove braces anyway.  (We used to do that, but
stopped a few years ago shortly after introducing PG_TRY).

Putting the comment outside (above) the "if" would be wrong, too; you'd
have to rephrase the comment in a conditional tense.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

2017-01-16 Thread Pavel Stehule
2017-01-16 19:24 GMT+01:00 Peter Geoghegan :

> On Mon, Jan 16, 2017 at 3:48 AM, Michael Paquier
>  wrote:
> > Indeed. It crashes for me immediately by adding an ORDER BY:
> > select count(distinct t) from seq_tab order by 1;
>
> The problem was that one particular call to the macro
> RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed
> by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls
> already have such a check.
>
> Attached patch fixes the bug.
>
>
Shouldn't the regress tests be enhanced too?

Regards

Pavel


> --
> Peter Geoghegan
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

2017-01-16 Thread Peter Geoghegan
On Mon, Jan 16, 2017 at 3:48 AM, Michael Paquier
 wrote:
> Indeed. It crashes for me immediately by adding an ORDER BY:
> select count(distinct t) from seq_tab order by 1;

The problem was that one particular call to the macro
RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed
by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls
already have such a check.

Attached patch fixes the bug.

-- 
Peter Geoghegan
From ce24bff1aad894b607ee1ce67757efe72c5acb93 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Mon, 16 Jan 2017 10:14:02 -0800
Subject: [PATCH] Fix NULL pointer dereference in tuplesort.c

This could cause a crash when an external datum tuplesort of a
pass-by-value type required multiple passes.
---
 src/backend/utils/sort/tuplesort.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index cbaf009..e1e692d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2800,7 +2800,8 @@ mergeonerun(Tuplesortstate *state)
 		WRITETUP(state, destTape, &state->memtuples[0]);
 
 		/* recycle the slot of the tuple we just wrote out, for the next read */
-		RELEASE_SLAB_SLOT(state, state->memtuples[0].tuple);
+		if (state->memtuples[0].tuple)
+			RELEASE_SLAB_SLOT(state, state->memtuples[0].tuple);
 
 		/*
 		 * pull next tuple from the tape, and replace the written-out tuple in
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackups and slots

2017-01-16 Thread Magnus Hagander
On Mon, Jan 16, 2017 at 4:33 PM, Fujii Masao  wrote:

> On Mon, Jan 16, 2017 at 10:00 PM, Magnus Hagander 
> wrote:
> >
> >
> > On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut
> >  wrote:
> >>
> >> This patch looks reasonable to me.
> >>
> >> Attached is a top-up patch with a few small fixups.
> >>
> >> I suggest to wait for the resolution of the "Replication/backup
> >> defaults" thread.  I would not want to be in a situation where users who
> >> have not been trained to use replication slots now have yet another
> >> restart-only parameter to set before they can take a backup.
> >>
> >
> > Thanks for the review and the top-up patch. I've applied it and pushed.
>
> - if (replication_slot && includewal != STREAM_WAL)
> + if ((replication_slot || no_slot) && includewal != STREAM_WAL)
>
> Why do we need to check "no_slot" here? Since no_slot=true means that
> no temporary replication slot is specified, it's fine even with both -X
> none
> and fetch.
>

Um yeah, that looks like a bad merge actually. Will fix.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-16 Thread Karl O. Pinc


On January 15, 2017 11:47:51 PM CST, Michael Paquier 
 wrote:
>On Mon, Jan 16, 2017 at 1:14 PM, Karl O. Pinc  wrote:
>
>
>> Attached also are 2 patches which abstract some hardcoded
>> constants into symbols.  Whether to apply them is a matter
>> of style and left to the maintainers, which is why these
>> are separate patches.
>
>Making the strings csvlog, stderr and eventlog variables? Why not
>because the patch presented here uses them in new places. Now it is
>not like it is a huge maintenance burden to keep them as strings, so I
>would personally not bother much.

To my mind it's a matter of readability. It is useful to be able to search for
or identify quickly when reading, e.g., all the places where the keyword stderr 
relates to the output log destination and not some other more common use. The 
downside is that it is more code...


>OK. I have done a round of hands-on review on this patch, finishing
>with the attached. In the things I have done:

Thank you.

>With all those issues fixed, I finish with the attached, that I am
>fine to pass down to a committer. I still think that this should be
>only one function using a SRF with rows shaped as (type, path) for
>simplicity, but as I am visibly outnumbered I won't insist further.

That also makes a certain amount of sense to me but I can't say I have thought 
deeply about it. Haven't paid any attention to this issue and am fine letting 
things move forward as is.

>Also, I would rather see an ERROR returned to the user in case of bad
>data in current_logfiles, I did not change that either as that's the
>original intention of Gilles.

I could be wrong but I seem to recall that Robert recommended against an error 
message. If there is bad data then something is really wrong, up to some sort 
of an attack on the back end. Because this sort of thing simply shouldn't 
happen it's enough in my opinion to guard against buffer overruns etc and just 
get on with life. If something goes unexpectedly and badly wrong with internal 
data structures in general there would have to be all kinds of additional code 
to cover every possible problem and that doesn't seem to make sense.

Sorry about the previous email with empty content. My email client got away 
from me.


Karl 
Free  Software:  "You don't pay back you pay forward."
-- Robert A. Heinlein



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2017-01-16 Thread Tom Lane
"Daniel Verite"  writes:
>   Tom Lane wrote:
>> Also, why aren't you using ParseVariableBool's existing ability to report
>> errors?

> Its' because there are two cases:
> - either only a boolean can be given to the command or variable,
> in which we let ParseVariableBool() tell:
>unrecognized value "bogus" for "command": boolean expected
> - or other parameters besides boolean are acceptable, in which case we
> can't say "boolean expected", because in fact a boolean is no more or
> less expected than the other valid values.

Ah.  Maybe it's time for a ParseVariableEnum, or some other way of
dealing with those cases in a more unified fashion.

> Do we care about the "boolean expected" part of the error message?

I'm not especially in love with that particular wording, but I'm doubtful
that we should give up all indication of what valid values are, especially
in the cases where there are more than just bool-equivalent values.
I think the right thing to do here is to fix it so that the input routine
has full information about all the valid values.  On the backend side,
we've gone to considerable lengths to make sure that error messages are
helpful for such cases, eg:

regression=# set backslash_quote to foo;
ERROR:  invalid value for parameter "backslash_quote": "foo"
HINT:  Available values: safe_encoding, on, off.

and I think it may be worth working equally hard here.

> I get the idea, except that in this example, the compiler is
> unhappy because popt->topt.expanded is not a bool, and an
> explicit cast here would be kludgy.

Yeah, you'll need an intermediate variable if you're trying to use
ParseVariableBool for such a case.
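
That is, roughly (a sketch against the proposed bool-returning signature; the
intermediate variable name is arbitrary):

    bool    expanded;

    success = ParseVariableBool(value, param, &expanded);
    if (success)
        popt->topt.expanded = expanded ? 1 : 0;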

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2017-01-16 Thread Daniel Verite
Tom Lane wrote:

> "Daniel Verite"  writes:
> > [ psql-var-hooks-v6.patch ]
> 
> I took a quick look through this.  It seems to be going in generally
> the right direction, but here's a couple of thoughts:

Thanks for looking into this!
 
> I'm inclined to think that the best choice is for the function result
> to be the ok/not ok flag, and pass the variable-to-be-modified as an
> output parameter.  That fits better with the notion that the variable
> is not to be modified on failure.

Agreed, if never clobbering the variable, having the valid/invalid state
returned by ParseVariableBool() allows for simpler code. I'm changing it
that way.

> Also, why aren't you using ParseVariableBool's existing ability to report
> errors?

It's because there are two cases:
- either only a boolean can be given to the command or variable,
in which we let ParseVariableBool() tell:
   unrecognized value "bogus" for "command": boolean expected

- or other parameters besides boolean are acceptable, in which case we
can't say "boolean expected", because in fact a boolean is no more or
less expected than the other valid values.

We could shave code by reducing ParseVariableBool()'s error message to:
  unrecognized value "bogus" for "name"
clearing the distinction between [only booleans are expected]
and [booleans or enum are expected].
Then almost all callers that have their own message could rely
on ParseVariableBool() instead, as they did previously.

Do we care about the "boolean expected" part of the error message?

>  else if (value)
> ! {
> ! 	bool	valid;
> ! 	bool	newval = ParseVariableBool(value, NULL, &valid);
> ! 	if (valid)
> ! 		popt->topt.expanded = newval;
> ! 	else
> ! 	{
> ! 		psql_error("unrecognized value \"%s\" for \"%s\"\n",
> ! 				   value, param);
> ! 		return false;
> ! 	}
> ! }
> is really the hard way and you could have just done
> 
> - popt->topt.expanded = ParseVariableBool(value, param);
> + success = ParseVariableBool(value, param, &popt->topt.expanded);

I get the idea, except that in this example, the compiler is
unhappy because popt->topt.expanded is not a bool, and an
explicit cast here would be kludgy.

For the error printing part, it would go away with the above suggestion.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-16 Thread Karl O. Pinc


On January 15, 2017 11:47:51 PM CST, Michael Paquier 
 wrote:
>On Mon, Jan 16, 2017 at 1:14 PM, Karl O. Pinc  wrote:
>> I do have a question here regards code formatting.
>> The patch now contains:
>>
>> if (log_filepath == NULL)
>> {
>> /* Bad data. Avoid segfaults etc. and return NULL to caller.
>*/
>> break;
>> }
>>
>> I'm not sure how this fits in with PG coding style,
>> whether the {} should be removed or what.  I've looked
>> around and can't find an example of an if with a single
>> line then block and a comment.  Maybe this means that
>> I shouldn't be doing this, but if I put the comment above
>> the if then it will "run into" the comment block which
>> immediately precedes the above code which describes
>> a larger body of code.  So perhaps someone should look
>> at this and tell me how to improve it.
>
>Including brackets in this case makes a more readable code. That's the
>pattern respected the most in PG code. See for example
>XLogInsertRecord():
>else
>{
>/*
>  * This was an xlog-switch record, but the current insert location was
>  * already exactly at the beginning of a segment, so there was no need
> * to do anything.
> */
>}
>
>> Attached also are 2 patches which abstract some hardcoded
>> constants into symbols.  Whether to apply them is a matter
>> of style and left to the maintainers, which is why these
>> are separate patches.
>
>Making the strings csvlog, stderr and eventlog variables? Why not
>because the patch presented here uses them in new places. Now it is
>not like it is a huge maintenance burden to keep them as strings, so I
>would personally not bother much.
>
>> The first patch changes only code on the master
>> branch, the 2nd patch changes the new code.
>
>Thanks for keeping things separated.
>
>> I have not looked further at the patch or checked
>> to see that all changes previously recommended have been
>> made.  Michael, if you're confident that everything is good
>> go ahead and move the patchs forward to the maintainers.  Otherwise
>> please let me know and I'll see about finding time
>> for further review.
>
>OK. I have done a round of hands-on review on this patch, finishing
>with the attached. In the things I have done:
>- Elimination of useless noise diff
>- Fixes some typos (logile, log file log, etc.)
>- Adjusted docs.
>- Docs were overdoing it in storage.sgml. Let's keep the description
>simple.
>- Having a paragraph at the beginning of "Error Reporting and Logging"
>in config.sgml does not look like a good idea to me. As the updates of
>current_logfiles is associated with log_destination I'd rather see
>this part added in the description of the GUC.
>- Missed a (void) in the definition of update_metainfo_datafile().
>- Moved update_metainfo_datafile() before the signal handler routines
>in syslogger.c for clarity.
>- The last "return" is useless in update_metainfo_datafile().
>- In pg_current_logfile(), it is useless to call PG_RETURN_NULL after
>emitting an ERROR message.
>- Two calls to FreeFile(fd) have been forgotten in
>pg_current_logfile(). In one case, errno needs to be saved beforehand
>to be sure that the correct error string is generated for the user.
>- A portion of 010_pg_basebackup.pl was not updated.
>- Incorrect header ordering in basebackup.c.
>- Decoding of CR and LF characters seems to work fine when
>log_file_name includes some.
>
>With all those issues fixed, I finish with the attached, that I am
>fine to pass down to a committer. I still think that this should be
>only one function using a SRF with rows shaped as (type, path) for
>simplicity, but as I am visibly outnumbered I won't insist further.
>Also, I would rather see an ERROR returned to the user in case of bad
>data in current_logfiles, I did not change that either as that's the
>original intention of Gilles.
>-- 
>Michael


Karl 
Free  Software:  "You don't pay back you pay forward."
-- Robert A. Heinlein



Re: [HACKERS] Hash support for grouping sets

2017-01-16 Thread Finnerty, Jim
The ability to use hashed aggregation within sorted groups, when the order 
of the input stream can be exploited this way, is potentially a useful way to 
improve aggregation performance more generally.  This would potentially be 
beneficial when the input size is expected to be larger than the amount of 
working memory available for hashed aggregation, but where there is enough 
memory to hash-aggregate just the unsorted grouping key combinations, and when 
the cumulative cost of rebuilding the hash table for each sorted subgroup is 
less than the cost of sorting the entire input.  In other words, if most of the 
grouping key combinations are already segregated by virtue of the input order, 
then hashing the remaining combinations within each sorted group might be done 
in memory, at the cost of rebuilding the hash table for each sorted subgroup.
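
As a hypothetical illustration (table and column names made up, not taken 
from the patch or its tests), a query of this shape is where such a mixed 
strategy could pay off: if the input arrives sorted on region, the (region) 
grouping can be aggregated group by group while the remaining grouping-set 
combinations are hashed within each sorted subgroup.

    SELECT region, product, sum(amount)
    FROM sales
    GROUP BY GROUPING SETS ((region), (product), (region, product));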

I haven’t looked at the code for this change yet (I hope I will have the time 
to do that).  Ideally the decision to choose the aggregation method as sorted, 
hashed, or mixed hash/sort should be integrated into the cost model, but given 
the notorious difficulty of estimating intermediate cardinalities accurately it 
would be difficult to develop a cardinality model and a cost model accurate 
enough to choose among these options consistently well.

Jim Finnerty
Amazon Corp.

On 1/10/17, 10:22 AM, "pgsql-hackers-ow...@postgresql.org on behalf of Robert 
Haas"  
wrote:

On Thu, Jan 5, 2017 at 10:48 PM, Andrew Gierth
 wrote:
> Herewith a patch for doing grouping sets via hashing or mixed hashing
> and sorting.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-16 Thread Amit Kapila
On Thu, Jan 12, 2017 at 7:56 PM, Amit Kapila  wrote:
> On Thu, Jan 12, 2017 at 8:51 AM, Robert Haas  wrote:
>> On Wed, Jan 11, 2017 at 9:58 PM, Amit Kapila  wrote:
>>> The other alternative is to remember this information in SubPlan.  We
>>> can retrieve parallel_safe information from best_path and can use it
>>> while generating SubPlan.  The main reason for storing it in the plan
>>> was to avoid explicitly passing parallel_safe information while
>>> generating SubPlan as plan was already available at that time.
>>> However, it seems there are only two places in code (refer
>>> build_subplan) where this information needs to be propagated.  Let me
>>> know if you prefer to remember the parallel_safe information in
>>> SubPlan instead of in Plan or if you have something else in mind?
>>
>> I think we should try doing it in the SubPlan, at least, and see if
>> that comes out more elegant than what you have at the moment.
>>
>
> Okay, done that way.  I have fixed the review comments raised by Dilip
> as well and added the test case in attached patch.
>

After commit-ab1f0c8, this patch needs a rebase.  Attached find
rebased version of the patch.

Thanks to Kuntal for informing me offlist that this patch needs rebase.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_subplan_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackups and slots

2017-01-16 Thread Fujii Masao
On Mon, Jan 16, 2017 at 10:00 PM, Magnus Hagander  wrote:
>
>
> On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut
>  wrote:
>>
>> This patch looks reasonable to me.
>>
>> Attached is a top-up patch with a few small fixups.
>>
>> I suggest to wait for the resolution of the "Replication/backup
>> defaults" thread.  I would not want to be in a situation where users who
>> have not been trained to use replication slots now have yet another
>> restart-only parameter to set before they can take a backup.
>>
>
> Thanks for the review and the top-up patch. I've applied it and pushed.

- if (replication_slot && includewal != STREAM_WAL)
+ if ((replication_slot || no_slot) && includewal != STREAM_WAL)

Why do we need to check "no_slot" here? Since no_slot=true means that
no temporary replication slot is specified, it should be fine even with
-X none and -X fetch.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-16 Thread Alvaro Herrera
Andres Freund wrote:

> That worked quite well.  So we have a few questions, before I clean this
> up:
> 
> - For now the node is named 'Srf' both internally and in explain - not
>   sure if we want to make that something longer/easier to understand for
>   others? Proposals? TargetFunctionScan? SetResult?
> 
> - We could alternatively add all this into the Result node - it's not
>   actually a lot of new code, and most of that is boilerplate stuff
>   about adding a new node.  I'm ok with both.

Hmm.  I wonder if your stuff could be used as support code for
XMLTABLE[1].  Currently it has a bit of additional code of its own,
though admittedly it's very little code executor-side.  Would you mind
sharing a patch, or more details on how it works?

[1] 
https://www.postgresql.org/message-id/cafj8pra_keukobxvs4v-imoeesxu0pd0ashv0-mwrfdrwte...@mail.gmail.com

> - I continued with the division of labor that Tom had set up, so we're
>   creating one Srf node for each "nested" set of SRFs.  We'd discussed
>   nearby to change that for one node/path for all nested SRFs, partially
>   because of costing.  But I don't like the idea that much anymore. The
>   implementation seems cleaner (and probably faster) this way, and I
>   don't think nested targetlist SRFs are something worth optimizing
>   for.  Anybody wants to argue differently?

Nested targetlist SRFs make my head spin.  I suppose they may have some
use, but where would you really want this:

alvherre=# select generate_series(1, generate_series(2, 4));
 generate_series 
─────────────────
   1
   2
   1
   2
   3
   1
   2
   3
   4
(9 rows)

instead of the much more sensible

alvherre=# select i, j from generate_series(2, 4) i, generate_series(1, i) j;
 i │ j 
───┼───
 2 │ 1
 2 │ 2
 3 │ 1
 3 │ 2
 3 │ 3
 4 │ 1
 4 │ 2
 4 │ 3
 4 │ 4
(9 rows)

?  If supporting the former makes it harder to support/optimize more
reasonable cases, it seems fair game to leave them behind.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackups and slots

2017-01-16 Thread Magnus Hagander
On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> This patch looks reasonable to me.
>
> Attached is a top-up patch with a few small fixups.
>
> I suggest to wait for the resolution of the "Replication/backup
> defaults" thread.  I would not want to be in a situation where users who
> have not been trained to use replication slots now have yet another
> restart-only parameter to set before they can take a backup.
>
>
Thanks for the review and the top-up patch. I've applied it and pushed.

I just realized I missed crediting you with the review and the top-up in
the commit message. My apologies for this.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Support for pg_receivexlog --format=plain|tar

2017-01-16 Thread Michael Paquier
On Mon, Jan 16, 2017 at 9:12 PM, Magnus Hagander  wrote:
> Where did your research point to then? :) I just read the gzip rfc
> (http://www.zlib.org/rfc-gzip.html) which seems to call it that at least?

Well, OK. I was not aware of this RFC. I guessed it by looking at the
code of gzip, which uses the CRC as well. I also found a reference
in a blog post.

>> > Finally, I think we should make the error message clearly say
>> > "compressed
>> > segment file" - just to make things extra clear.
>>
>> Sure.
>
> AFAICT the
> +   iscompress ? "compressed" : "",
> part of the error handling is unnecessary, because iscompressed will always
> be true in that block. All the other error messages in that codepath have
> compressed hardcoded in them, as should this one.

Fat-fingered here..

>> Hm. It looks that you are right. zlib goes down to _tr_flush_bits() to
>> flush some output, but this finishes only with put_byte(). As the fd
>> is opaque in gzFile, it would be just better to open() the file first,
>> and then use gzdopen to get the gzFile. Let's use as well the existing
>> fd field to save it. gzclose() closes as well the parent fd per the
>> documentation of zlib.
>
> This version throws a warning:
> walmethods.c: In function ‘dir_open_for_write’:
> walmethods.c:170:11: warning: ‘gzfp’ may be used uninitialized in this
> function [-Wmaybe-uninitialized]
>f->gzfp = gzfp;

gcc and clang did not complain here; what did you use?

> I can't see that there is any code path where this can actually happen
> though, so we should probably just initialize it to NULL at variable
> declaration.  Or do you see a path where this could actually be incorrect?

Not that I see. All the code paths using gzfp are under
data_dir->compression > 0.
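
Roughly, the open()-then-gzdopen() approach could look like this - a
sketch with made-up naming, not the actual walmethods.c code:

#include <fcntl.h>
#include <unistd.h>
#include <zlib.h>

/*
 * Sketch: open the file ourselves so a plain fd stays available for
 * fsync(), then hand the same fd to zlib with gzdopen().  gzclose()
 * also closes the underlying fd, per the zlib documentation.
 */
static int
write_compressed(const char *path, const void *buf, unsigned len)
{
	gzFile		gzfp = NULL;	/* initialized, keeps -Wmaybe-uninitialized quiet */
	int			fd = open(path, O_WRONLY | O_CREAT, 0600);

	if (fd < 0)
		return -1;
	gzfp = gzdopen(fd, "wb");
	if (gzfp == NULL)
	{
		close(fd);
		return -1;
	}
	if (gzwrite(gzfp, buf, len) != (int) len ||
		gzflush(gzfp, Z_SYNC_FLUSH) != Z_OK ||
		fsync(fd) != 0)			/* gzflush() alone does not reach disk */
	{
		gzclose(gzfp);
		return -1;
	}
	return gzclose(gzfp) == Z_OK ? 0 : -1;
}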

> If you agree with those two comments, I will go ahead and push with those
> minor fixes.

No problem for me, thanks for the review!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for pg_receivexlog --format=plain|tar

2017-01-16 Thread Magnus Hagander
On Sun, Jan 15, 2017 at 6:44 AM, Michael Paquier 
wrote:

> On Sun, Jan 15, 2017 at 1:31 AM, Magnus Hagander 
> wrote:
> >
> >
> > On Tue, Jan 10, 2017 at 3:01 AM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >> Based on some analysis, it is enough to look at the last 4 bytes of
> >> the compressed file to get the size output data with a single call to
> >> lseek() and then read(). So as there is a simple way to do things and
> >> that's far cheaper than decompressing perhaps hundred of segments I'd
> >> rather do it this way. Attached is the implementation. This code is
> >> using 2 booleans for 4 states of the file names: full non-compressed,
> >> partial non-compressed, full compressed and partial compressed. This
> >> keeps the last check of FindStreamingStart() more simple, but that's
> >> quite fancy lately to have an enum for such things :D
> >
> >
> > I think you need to document that analysis at least in a code comment. I
> > assume this is in reference to the ISIZE member of the gzip format?
>
> Good question. I am not sure on this one.
>

Where did your research point to then? :) I just read the gzip rfc (
http://www.zlib.org/rfc-gzip.html) which seems to call it that at least?
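
Concretely, the trick amounts to reading the last four bytes of the file as
a little-endian 32-bit value - per RFC 1952 that is ISIZE, the uncompressed
size modulo 2^32. A standalone sketch, independent of the pg_receivexlog
code:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read the ISIZE field from a gzip file's trailer: the last four bytes,
 * little-endian, holding the uncompressed size modulo 2^32.  Junk data in
 * a corrupt file is harmless for the use discussed here, since the result
 * is only ever compared against the expected WAL segment size.
 */
static int
gzip_uncompressed_size(const char *path, uint32_t *isize)
{
	uint8_t		tail[4];
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (lseek(fd, -4, SEEK_END) < 0 || read(fd, tail, 4) != 4)
	{
		close(fd);
		return -1;
	}
	close(fd);
	*isize = (uint32_t) tail[0] | ((uint32_t) tail[1] << 8) |
		((uint32_t) tail[2] << 16) | ((uint32_t) tail[3] << 24);
	return 0;
}

int
main(int argc, char **argv)
{
	uint32_t	isize;

	if (argc > 1 && gzip_uncompressed_size(argv[1], &isize) == 0)
		printf("%u\n", isize);
	return 0;
}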


>
> > I was going to say what happens if the file is corrupt and we get random
> > junk data there, but as we only compare it to XlogSegSize that should be
> > safe. But we might want to put a note in about that too?
>
> Perhaps. I have made a better try in FindStreamingStart.


Looks much better now. I think that one is fine as it is now.


>
> > Finally, I think we should make the error message clearly say "compressed
> > segment file" - just to make things extra clear.
>
> Sure.
>

AFAICT the
+   iscompress ? "compressed" : "",
part of the error handling is unnecessary, because iscompressed will always
be true in that block. All the other error messages in that codepath have
compressed hardcoded in them, as should this one.



>
> >> > I found another problem with it -- it is completely broken in sync
> mode.
> >> > You
> >> > need to either forbid sync mode together with compression, or teach
> >> > dir_sync() about it. The later would sound better, but I wonder how
> much
> >> > that's going to kill compression due to the small blocks? Is it a
> >> > reasonable
> >> > use-case?
> >>
> >> Hm. Looking at the docs I think that --compress defined with
> >> --synchronous maps to the use of Z_SYNC_FLUSH with gzflush(). FWIW I
> >> don't have a direct use case for it, but it is not a major effort to
> >> add it, so done. There is no actual reason to forbid this combinations
> >> of options either.
> >>
> > It is enough to just gzflush(), don't we also need to still fsync()? I
> can't
> > see any documentation that gzflush() does that, and a quick look at the
> code
> > of zlib indicates it doesn't (but I didn't check in detail, so if you did
> > please point me to it).
>
> Hm. It looks that you are right. zlib goes down to _tr_flush_bits() to
> flush some output, but this finishes only with put_byte(). As the fd
> is opaque in gzFile, it would be just better to open() the file first,
> and then use gzdopen to get the gzFile. Let's use as well the existing
> fd field to save it. gzclose() closes as well the parent fd per the
> documentation of zlib.
>
>
This version throws a warning:
walmethods.c: In function ‘dir_open_for_write’:
walmethods.c:170:11: warning: ‘gzfp’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
   f->gzfp = gzfp;

I can't see that there is any code path where this can actually happen
though, so we should probably just initialize it to NULL at variable
declaration.  Or do you see a path where this could actually be incorrect?


If you agree with those two comments, I will go ahead and push with those
minor fixes.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Parallel Index Scans

2017-01-16 Thread Amit Kapila
On Fri, Jan 13, 2017 at 11:06 PM, Robert Haas  wrote:
> On Fri, Jan 13, 2017 at 9:28 AM, Anastasia Lubennikova
>  wrote:
>> I didn't find any case of noticeable performance degradation,
>> so set status to "Ready for committer".
>
> The very first hunk of doc changes looks like it makes the whitespace
> totally wrong - surely it can't be right to have 0-space indents in C
> code.
>

Fixed.

> +The index_size parameter indicate the size of generic 
> parallel
>
> indicate -> indicates
> size of generic -> size of the generic
>

Fixed.

> +   index-type-specific parallel information which will be stored immediatedly
>
> Typo.
>

Fixed.

> +   Initialize the parallel index scan state.  It will be used to initialize
> +   index-type-specific parallel information which will be stored immediatedly
> +   after generic parallel information required for parallel index scans.  The
> +   required state information will be set in target.
> +  
> +
> +   
> + The aminitparallelscan and
> amestimateparallelscan
> + functions need only be provided if the access method supports
> parallel
> + index scans.  If it doesn't, the aminitparallelscan and
> + amestimateparallelscan fields in its
> IndexAmRoutine
> + struct must be set to NULL.
> +   
>
> Inconsistent indentation.

Fixed.

>   seems like a strange choice of tag.
>

I have seen that  is used in indexam.sgml in multiple places to
refer to "bitmap" and "plain" index scans.  So I thought of using the same
for "parallel" index scans.

> +/* amestimateparallelscan is optional; assume no-op if not
> provided by AM */
>
> The fact that amestimateparallelscan is optional even when parallel
> index scans are supported is undocumented.
>

Okay, I have added that information in docs.

>  Similarly for the other
> functions, which also seem to be optional but not documented as such.
> The code and documentation need to match.
>

All the functions introduced by this patch are documented in
indexam.sgml as optional.  Not sure which other place you are
expecting an update.

> +void   *amtarget = (char *) ((void *) target) + offset;
>
> Passing an unaligned pointer to the AM sounds like a recipe for
> crashes on obscure platforms that can't tolerate alignment violations,
> and possibly bad performance on others.  I'd arrange MAXALIGN the size
> of the generic structure in index_parallelscan_estimate and
> index_parallelscan_initialize.

Right, changed as per suggestion.

>  Also, why pass the size of the generic
> structure to the AM-specific estimate routine, anyway?  It can't
> legally return a smaller value, and we can add_size() just as well as
> the AM-specific code.  Wouldn't it make more sense for the AM-specific
> code to return the amount of space that is needed for AM-specific
> stuff, and let the generic code deal with the generic stuff?
>

makes sense, so changed accordingly.
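
In rough terms the division of labor then becomes: the generic code accounts
for its own MAXALIGNed header and simply add_size()s whatever the AM callback
asks for.  A sketch of the generic side only (struct and callback names
follow the discussion above, not necessarily the final patch):

	Size		nbytes = MAXALIGN(sizeof(ParallelIndexScanDescData));

	/* the AM callback reports just the space needed for its private area */
	if (indexRelation->rd_amroutine->amestimateparallelscan != NULL)
		nbytes = add_size(nbytes,
						  indexRelation->rd_amroutine->amestimateparallelscan());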

> + *status - True indicates that the block number returned is valid and 
> scan
> + * is continued or block number is invalid and scan has just 
> begun
> + * or block number is P_NONE and scan is finished.  False 
> indicates
> + * that we have reached the end of scan for current
> scankeys and for
> + * that we return block number as P_NONE.
>
> It is hard to parse a sentence with that many "and" and "or" clauses,
> especially since English, unlike C, does not have strict operator
> precedence rules. Perhaps you could reword to make it more clear.
>

Okay, I have changed the comment.

> Also, does that survive pgindent with that indentation?
>

Yes.

> +BTParallelScanDesc btscan = (BTParallelScanDesc) OffsetToPointer(
> +(void *) scan->parallel_scan,
> + scan->parallel_scan->ps_offset);
>
> You could avoid these uncomfortable line breaks by declaring the
> variable on one line and the initializing it on a separate line.
>

Okay, changed.

> +SpinLockAcquire(&btscan->ps_mutex);
> +pageStatus = btscan->ps_pageStatus;
> +if (so->arrayKeyCount < btscan->ps_arrayKeyCount)
> +    *status = false;
> +else if (pageStatus == BTPARALLEL_DONE)
> +    *status = false;
> +else if (pageStatus != BTPARALLEL_ADVANCING)
> +{
> +    btscan->ps_pageStatus = BTPARALLEL_ADVANCING;
> +    nextPage = btscan->ps_nextPage;
> +    exit_loop = true;
> +}
> +SpinLockRelease(&btscan->ps_mutex);
>
> IMHO, this needs comments.
>

Sure, added a comment.
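
One guess at what such comments might convey (illustrative wording only,
not the committed text):

	SpinLockAcquire(&btscan->ps_mutex);
	pageStatus = btscan->ps_pageStatus;
	if (so->arrayKeyCount < btscan->ps_arrayKeyCount)
		/* Other workers have already advanced to the next set of array keys. */
		*status = false;
	else if (pageStatus == BTPARALLEL_DONE)
		/* The parallel scan for the current scan keys is already finished. */
		*status = false;
	else if (pageStatus != BTPARALLEL_ADVANCING)
	{
		/* Nobody else is advancing the scan; claim that job for ourselves. */
		btscan->ps_pageStatus = BTPARALLEL_ADVANCING;
		nextPage = btscan->ps_nextPage;
		exit_loop = true;
	}
	SpinLockRelease(&btscan->ps_mutex);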

> + * It updates the value of next page that allows parallel scan to move 
> forward
> + * or backward depending on scan direction.  It wakes up one of the sleeping
> + * workers.
>
> This construction is commonly used in India but sounds awkward to
> other English-speakers, or at least to me.  You can either drop the
> word "it" and just start with the verb 

Re: [HACKERS] Tuple sort is broken. It crashes on simple test.

2017-01-16 Thread Michael Paquier
(Adding Peter in CC who also worked on that)

On Mon, Jan 16, 2017 at 7:14 PM, Mithun Cy  wrote:
> Test:
> --
> create table seq_tab(t int);
> insert into seq_tab select generate_series(1, 1000);
> select count(distinct t) from seq_tab;
>
> #0  0x0094a9ad in pfree (pointer=0x0) at mcxt.c:1007
> #1  0x00953be3 in mergeonerun (state=0x1611450) at tuplesort.c:2803
> #2  0x00953824 in mergeruns (state=0x1611450) at tuplesort.c:2721
> #3  0x009521bc in tuplesort_performsort (state=0x1611450) at
> tuplesort.c:1813
> #4  0x00662b85 in process_ordered_aggregate_single
> (aggstate=0x160b540, pertrans=0x160d440, pergroupstate=0x160d3f0) at
> nodeAgg.c:1178
> #5  0x006636e0 in finalize_aggregates (aggstate=0x160b540,
> peraggs=0x160c5e0, pergroup=0x160d3f0, currentSet=0) at nodeAgg.c:1600
> #6  0x00664509 in agg_retrieve_direct (aggstate=0x160b540) at
> nodeAgg.c:2266
> #7  0x00663f66 in ExecAgg (node=0x160b540) at nodeAgg.c:1946
> #8  0x00650ad2 in ExecProcNode (node=0x160b540) at execProcnode.c:503

Indeed. It crashes for me immediately by adding an ORDER BY:
select count(distinct t) from seq_tab order by 1;
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An isolation test for SERIALIZABLE READ ONLY DEFERRABLE

2017-01-16 Thread Thomas Munro
On Mon, Jan 16, 2017 at 10:37 PM, Craig Ringer
 wrote:
> On 16 Jan. 2017 17:09, "Michael Paquier"  wrote:
>
> On Mon, Jan 16, 2017 at 9:40 AM, Thomas Munro
>  wrote:
>> I also have longer term plans to show the first and third of them
>> running with the read-only transaction moved to a standby server.
>> Kevin Grittner gave me the idea of multi-server isolation tests when I
>> mentioned my WIP SERIALIZABLE DEFERRABLE-on-standbys patch off-list,
>
> Being able to do that with the isolation tester has been mentioned a
> couple of times, particularly by 2ndQ folks who worked on BDR. I
> think it has actually been patched in this sense, so perhaps you
> could reuse the work that has been done there.
>
>
> Yep. See the bdr-pg/REL9_4_STABLE branch of the BDR repo, src/test/isolation
> .
>
> I think Abhijit submitted a patch to the list too.

Very nice.  That's almost exactly what I had in mind.  One thing that
jumps out at me from README.multinode is that it's surprising to see
conninfo strings in the actual spec file: I assumed that the isolation
tester would extend what it does today by accepting multiple conninfo
command line arguments, and I imagined that the spec info could say
something like SESSION "s3" CONNECTION 2.  I don't really care about
names vs numbers but I thought it might be nice if connections were
controlled separately from specs so that you could use the same spec
for tests that are run on one node and two node setups, perhaps by
making CONNECTION 2 fall back to the default connection if there is no
connection 2 provided on the command line.  For example, with
read-only-anomaly.spec (from the patch in this thread), the output
should be identical if you point s3 at a standby using syncrep and
remote_apply.  In future, if SERIALIZABLE DEFERRABLE is available on
standbys, then read-only-anomaly-3.spec should also work that way.
It'd be cool to find a way to express that without having to use a
separate spec/expected files for the multi-node version when the goal
is to show that it works the same as the single-node version.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum

2017-01-16 Thread Masahiko Sawada
On Mon, Jan 16, 2017 at 1:50 PM, Amit Khandekar  wrote:
> On 13 January 2017 at 19:15, Alvaro Herrera  wrote:
>> I think this is the same problem as reported in
>> https://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=ypy...@mail.gmail.com
>
> Ah yes, this is the same problem. Not sure why I didn't land on that
> thread when I tried to search pghackers using relevant keywords.
>>
>>> === Fix ===
>> [...]
>>> Instead, the attached patch (prevent_useless_vacuums.patch) prevents
>>> the repeated cycle by noting that there's no point in doing whatever
>>> vac_update_datfrozenxid() does, if we didn't find anything to vacuum
>>> and there's already another worker vacuuming the same database. Note
>>> that it uses wi_tableoid field to check concurrency. It does not use
>>> wi_dboid field to check for already-processing worker, because using
>>> this field might cause each of the workers to think that there is some
>>> other worker vacuuming, and eventually no one vacuums. We have to be
>>> certain that the other worker has already taken a table to vacuum.
>>
>> Hmm, it seems reasonable to skip the end action if we didn't do any
>> cleanup after all. This would normally give enough time between vacuum
>> attempts for the first worker to make further progress and avoid causing
>> a storm.  I'm not really sure that it fixes the problem completely, but
>> perhaps it's enough.
>
> I had thought about this: if we didn't clean up anything, skip the
> end action unconditionally without checking if there was any
> concurrent worker. But then thought it is better to skip only if we
> know there is another worker doing the same job, because :
> a) there might be some reason we are just calling
> vac_update_datfrozenxid() without any condition. But I am not sure
> whether it was intentionally kept like that. Didn't get any leads from
> the history.
>> b) there's no harm in updating datfrozenxid() if there was no other
>> worker. In this case, we *know* that there was indeed nothing to be
>> cleaned up, so the next time this database won't be chosen again, and
>> there's no harm in just calling this function.
>

Since an autovacuum worker wakes up the autovacuum launcher after it is
launched, the launcher could try to spawn worker processes at a high
frequency if you have a database with a very large table in it that has
just passed autovacuum_freeze_max_age.

autovacuum.c:L1605
/* wake up the launcher */
if (AutoVacuumShmem->av_launcherpid != 0)
kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);

I think we should deal with this case as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Tuple sort is broken. It crashes on simple test.

2017-01-16 Thread Mithun Cy
Test:
--
create table seq_tab(t int);
insert into seq_tab select generate_series(1, 1000);
select count(distinct t) from seq_tab;

#0  0x0094a9ad in pfree (pointer=0x0) at mcxt.c:1007
#1  0x00953be3 in mergeonerun (state=0x1611450) at tuplesort.c:2803
#2  0x00953824 in mergeruns (state=0x1611450) at tuplesort.c:2721
#3  0x009521bc in tuplesort_performsort (state=0x1611450) at
tuplesort.c:1813
#4  0x00662b85 in process_ordered_aggregate_single
(aggstate=0x160b540, pertrans=0x160d440, pergroupstate=0x160d3f0) at
nodeAgg.c:1178
#5  0x006636e0 in finalize_aggregates (aggstate=0x160b540,
peraggs=0x160c5e0, pergroup=0x160d3f0, currentSet=0) at nodeAgg.c:1600
#6  0x00664509 in agg_retrieve_direct (aggstate=0x160b540) at
nodeAgg.c:2266
#7  0x00663f66 in ExecAgg (node=0x160b540) at nodeAgg.c:1946
#8  0x00650ad2 in ExecProcNode (node=0x160b540) at execProcnode.c:503

Git bisect shows me the following commit has caused it.

commit Id e94568ecc10f2638e542ae34f2990b821bbf90ac

Author: Heikki Linnakangas   2016-10-03 16:07:49
Committer: Heikki Linnakangas   2016-10-03 16:07:49

Change the way pre-reading in external sort's merge phase works.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typo in condition_variable.c

2017-01-16 Thread Fujii Masao
On Mon, Jan 16, 2017 at 6:34 PM, Masahiko Sawada  wrote:
> On Mon, Jan 16, 2017 at 6:27 PM, Fujii Masao  wrote:
>> On Mon, Jan 16, 2017 at 5:42 PM, Masahiko Sawada  
>> wrote:
>>> Hi,
>>>
>>> Attached patch fixes comment typos in condition_variable.c
>>
>> Seems the patch still has the typo. "initiailly" should be "initially"?
>>
>
> Yes, it should be "initially".

OK, pushed. Thanks!

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An isolation test for SERIALIZABLE READ ONLY DEFERRABLE

2017-01-16 Thread Craig Ringer
On 16 Jan. 2017 17:09, "Michael Paquier"  wrote:

On Mon, Jan 16, 2017 at 9:40 AM, Thomas Munro
 wrote:
> I also have longer term plans to show the first and third of them
> running with the read-only transaction moved to a standby server.
> Kevin Grittner gave me the idea of multi-server isolation tests when I
> mentioned my WIP SERIALIZABLE DEFERRABLE-on-standbys patch off-list,

Being able to do that with the isolation tester has been mentioned a
couple of times, particularly by 2ndQ folks who worked on BDR. I
think it has actually been patched in this sense, so perhaps you
could reuse the work that has been done there.


Yep. See the bdr-pg/REL9_4_STABLE branch of the BDR repo,
src/test/isolation .

I think Abhijit submitted a patch to the list too.


Re: [HACKERS] Typo in condition_variable.c

2017-01-16 Thread Masahiko Sawada
On Mon, Jan 16, 2017 at 6:27 PM, Fujii Masao  wrote:
> On Mon, Jan 16, 2017 at 5:42 PM, Masahiko Sawada  
> wrote:
>> Hi,
>>
>> Attached patch fixes comment typos in condition_variable.c
>
> Seems the patch still has the typo. "initiailly" should be "initially"?
>

Yes, it should be "initially".

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typo in condition_variable.c

2017-01-16 Thread Fujii Masao
On Mon, Jan 16, 2017 at 5:42 PM, Masahiko Sawada  wrote:
> Hi,
>
> Attached patch fixes comment typos in condition_variable.c

Seems the patch still has the typo. "initiailly" should be "initially"?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2017-01-16 Thread Amit Langote
On 2017/01/14 6:24, Robert Haas wrote:
> On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
>>
>> Thanks!  I realized however that the approach I used in 0002 of passing
>> the original slot to ExecConstraints() fails in certain situations.  For
>> example, if a BR trigger changes the tuple, the original slot would not
>> receive those changes, so it will be wrong to use such a tuple anymore.
>> In attached 0001, I switched back to the approach of converting the
>> partition-tupdesc-based tuple back to the root partitioned table's format.
>>  The converted tuple contains the changes by BR triggers, if any.  Sorry
>> about some unnecessary work.
> 
> Hmm.  Even with this patch, I wonder if this is really correct.  I
> mean, isn't the root of the problem here that ExecConstraints() is
> expecting that resultRelInfo matches slot, and it doesn't?

The problem is that the SlotValueDescription we build to show in the
error message should be based on the tuple that was passed to
ExecInsert() or that NextCopyFrom() returned for CopyFrom() to process,
but that may not be the case if the tuple had to be converted after
tuple routing.  The slot (the tuple in it and its tuple descriptor) and
the resultRelInfo that ExecConstraints() receives *do* correspond with
each other, even after possible tuple conversion following tuple
routing, and hence constraint checking itself works fine (since commit
2ac3ef7a01 [1]).  As said, it's the val_desc built to show in the error
message being based on the converted-for-partition tuple that could be
seen as a problem - is it acceptable if the error message shows the
converted-for-partition tuple, which might have columns ordered
differently from the root table?  If so, we could simply forget the
whole thing, including reverting f1b4c771 [2].

An example:

create table p (a int, b char, c int) partition by list (a);
create table p1 (b char, c int, a int);  -- note the column order
alter table p attach partition p1 for values in (1);
alter table p add constraint check_b check (b = 'x');

insert into p values (1, 'y', 1);
ERROR:  new row for relation "p1" violates check constraint "check_b"
DETAIL:  Failing row contains (y, 1, 1).

Note that "(y, 1, 1)" results from using p1's descriptor on the converted
tuple.  As long that's clear and acceptable, I think we need not worry
about this patch and revert the previously committed patch for this "problem".

> And why
> isn't that also a problem for the things that get passed resultRelInfo
> and slot after tuple routing and before ExecConstraints?  In
> particular, in copy.c, ExecBRInsertTriggers.

If I explained the problem (as I'm seeing it) well enough above, you may
see why that's not an issue in other cases.  Other sub-routines viz.
ExecBRInsertTriggers, ExecWithCheckOptions for RLS INSERT WITH CHECK
policies, ExecARInsertTriggers, etc. do receive the correct slot and
resultRelInfo for whatever they do with a given tuple and the relation
(partition) it is being inserted into.  However, if we do consider the
above to be a problem, then we would need to fix ExecWithCheckOptions() to
do whatever we decide ExecConstraints() should do, because it can also
report WITH CHECK violation for a tuple.

> Committed 0002.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ac3ef7
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f1b4c77




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Typo in condition_variable.c

2017-01-16 Thread Masahiko Sawada
Hi,

Attached patch fixes comment typos in condition_variable.c

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_type_in_condition_variable_c.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-16 Thread Andres Freund
On 2017-01-15 19:29:52 -0800, Andres Freund wrote:
> On 2016-10-31 09:06:39 -0700, Andres Freund wrote:
> > On 2016-09-14 19:28:25 -0400, Tom Lane wrote:
> > > Andres Freund  writes:
> > > > On 2016-09-12 19:35:22 -0400, Tom Lane wrote:
> > > Here's a draft patch that is just meant to investigate what the planner
> > > changes might look like if we do it in the pipelined-result way.
> > > Accordingly, I didn't touch the executor, but just had it emit regular
> > > Result nodes for SRF-execution steps.  However, the SRFs are all
> > > guaranteed to appear at top level of their respective tlists, so that
> > > those Results could be replaced with something that works like
> > > nodeFunctionscan.
> >
> > > So I think we should continue investigating this way of doing things.
> > > I'll try to take a look at the executor end of it tomorrow.  However
> > > I'm leaving Friday for a week's vacation, and may not have anything to
> > > show before that.
> >
> > Are you planning to work on the execution side of things? I otherwise
> > can take a stab...
>
> Doing so now.

That worked quite well.  So we have a few questions, before I clean this
up:

- For now the node is named 'Srf' both internally and in explain - not
  sure if we want to make that something longer/easier to understand for
  others? Proposals? TargetFunctionScan? SetResult?

- We could alternatively add all this into the Result node - it's not
  actually a lot of new code, and most of that is boilerplate stuff
  about adding a new node.  I'm ok with both.

- I continued with the division of labor that Tom had set up, so we're
  creating one Srf node for each "nested" set of SRFs.  We'd discussed
  nearby to change that for one node/path for all nested SRFs, partially
  because of costing.  But I don't like the idea that much anymore. The
  implementation seems cleaner (and probably faster) this way, and I
  don't think nested targetlist SRFs are something worth optimizing
  for.  Anybody wants to argue differently?

- I chose to error out if a retset function appears in ExecEvalFunc/Oper
  and make both unconditionally set evalfunc to
  ExecMakeFunctionResultNoSets.  ExecMakeFunctionResult() is now
  externally visible.  That seems like the least noisy way to change
  things over, but I'm open for different proposals.

Comments?

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] check_srf_call_placement() isn't always setting p_hasTargetSRFs

2017-01-16 Thread Andres Freund
Hi,

while working on [1] (works, harmless regression check failures aside,
needs cleanup), I noticed $subject.  Without fixing EXPR_KIND_VALUES to
set p_hasTargetSRFs, the query ends up with Query->hasTargetSRFs set wrongly.
Which in turn breaks your planner code, as it's not being triggered
anymore.

Is there a reason not to just set p_hasTargetSRFs once towards the end
of the function, instead of doing so for all the non-error cases?  That
fixes INSERT ... VALUES(generate_series()) with your approach for me.
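
A minimal reproducer for the case in question (scratch table name made up):

create table vals (x int);
-- with EXPR_KIND_VALUES not setting p_hasTargetSRFs, hasTargetSRFs
-- ends up wrong for this statement
insert into vals values (generate_series(1, 3));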

I also noticed that we currently don't actually handle
Query->hasTargetSRFs correctly when said SRF is in a VALUES() block,
since there the SRFs aren't in the targetlist.  This code

PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse,
                 PlannerInfo *parent_root,
                 bool hasRecursion, double tuple_fraction)
...
    /* Constant-folding might have removed all set-returning functions */
    if (parse->hasTargetSRFs)
        parse->hasTargetSRFs =
            expression_returns_set((Node *) parse->targetList);

unsets the flag even though SRFs exist.  I'm not really sure whether that's
bad or good - right now it doesn't cause active problems because of
transformInsertStmt's
    /*
     * Process INSERT ... VALUES with a single VALUES sublist.  We treat
     * this case separately for efficiency.  The sublist is just computed
     * directly as the Query's targetlist, with no VALUES RTE.  So it
     * works just like a SELECT without any FROM.
     */

As we don't support SRFs in any other kind of VALUES (ValuesNext calls
ExecEvalExpr with isDone = NULL), that appears to have no effect right now.
But we'd get a better error message if we made check_srf_call_placement()
error out. I wonder if there should be a separate expression type for
INSERT ... VALUES(exactly-one-row), since that behaves quite
differently.

Greetings,

Andres Freund

[1] 
http://archives.postgresql.org/message-id/20170116032952.z2re55hcfhzbkmrm%40alap3.anarazel.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers