Re: [HACKERS] pageinspect: Hash index support
> > 7. I think it is not your bug, but probably a bug in Hash index
> > itself; page flag is set to 66 (for below test); So the page is both
> > LH_BUCKET_PAGE and LH_BITMAP_PAGE. Is not this a bug in hash index?
> >
> > I have inserted 3000 records. Hash index is on integer column.
> > select hasho_flag FROM hash_page_stats(get_raw_page('i1', 1));
> >  hasho_flag
> >          66

Here is the test for same. After insertion of 3000 records, I think at
first split we can see the bucket page flag is set with LH_BITMAP_PAGE.

create table t1( ti int);
create index i1 on t1 using hash(ti);

postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
          2

postgres=# insert into t1 select generate_series(1, 1000);
INSERT 0 1000
postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
          2
(1 row)

postgres=# insert into t1 select generate_series(1, 1000);
INSERT 0 1000
postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
          2
(1 row)

postgres=# insert into t1 select generate_series(1, 1000);
INSERT 0 1000
postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
         66
(1 row)

I think if this is a bug, then the example given in pageinspect.sgml should
be changed in your patch after the bug fix:

+hasho_prevblkno | -1
+hasho_nextblkno | 8474
+hasho_bucket    | 0
+hasho_flag      | 66
+hasho_page_id   | 65408

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
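A quick way to see which bit positions make up hasho_flag = 66 is a plain
SQL decoding query (purely illustrative and independent of the patch; the
masks it prints can be compared against the LH_* flag macros defined in
src/include/access/hash.h):

-- list the bit positions and masks that are set in 66
SELECT bitpos, (1 << bitpos) AS mask
FROM generate_series(0, 15) AS bitpos
WHERE (66 & (1 << bitpos)) <> 0;

This returns bit 1 (mask 2) and bit 6 (mask 64), i.e. exactly two flag bits
are set on the page in question.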
Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)
On Sat, Jan 14, 2017 at 12:58 AM, Dilip Kumar wrote:
> On Wed, Jan 11, 2017 at 7:47 PM, Dilip Kumar wrote:
>> +void
>> +pgstat_count_sqlstmt(const char *commandTag)
>> +{
>> +    PgStat_SqlstmtEntry *htabent;
>> +    bool found;
>> +
>> +    if (!pgstat_track_sql)
>> +        return
>>
>> Callers of pgstat_count_sqlstmt are always ensuring that
>> pgstat_track_sql is true, so it seems redundant here.
>
> I have done testing of the feature, it's mostly working as per the
> expectation.
>
> I have a few comments/questions.
>
> If you execute the query with EXECUTE then commandTag will be EXECUTE,
> so we will not show the actual query type; I mean all the statements
> will get the common tag "EXECUTE".
>
> You can try this:
> PREPARE fooplan (int) AS INSERT INTO t VALUES($1);
> EXECUTE fooplan(1);
>
> --
>
> + /* Destroy the SQL statement counter stats HashTable */
> + hash_destroy(pgstat_sql);
> +
> + /* Create SQL statement counter Stats HashTable */
>
> I think in the patch we have used a mix of "Stats HashTable" and
> "stats HashTable"; I think it's better to be consistent throughout the
> patch. Check other similar instances.
>
> @@ -1102,6 +1102,12 @@ exec_simple_query(const char *query_string)
>
>   PortalDrop(portal, false);
>
> + /*
> +  * Count SQL statement for pg_stat_sql view
> +  */
> + if (pgstat_track_sql)
> +     pgstat_count_sqlstmt(commandTag);
>
> We are only counting the successful SQL statements. Is that intentional?
>
> --
> I have noticed that pgstat_count_sqlstmt is called from
> exec_simple_query and exec_execute_message. Don't we want to track the
> statements executed from functions?
> ---

The latest patch available has rotten, could you rebase please?
--
Michael
Re: [HACKERS] pgbench - allow backslash continuations in \set expressions
On Fri, Jan 13, 2017 at 9:15 PM, Fabien COELHO wrote:
>>> Could it be related to transformations operated when the file is
>>> downloaded and saved, because it is a text file?
>>
>> I think this is delaying the patch unnecessarily, I have attached a
>> version, please see if you can apply it successfully, we can proceed
>> with that safely then...
>
> This is the same file I sent:
>
> sh> sha1sum pgbench-continuation-4.patch pgbench-continuation-3.patch
> 97fe805a89707565210699694467f9256ca02dab  pgbench-continuation-4.patch
> 97fe805a89707565210699694467f9256ca02dab  pgbench-continuation-3.patch
>
> The difference is that your version is mime-typed with a generic
>
>     Application/OCTET-STREAM
>
> while the one I sent was mime-typed as
>
>     text/x-diff
>
> as defined by the system in /etc/mime.types on my xenial laptop.
>
> My guess is that with the latter your mail client changes the file when
> saving it, because it sees "text".

Okay, I am marking it as 'Ready for committer' with the following note to
the committer: I am getting whitespace errors in v3 of the patch, which I
corrected in v4; however, Fabien is of the opinion that v3 is clean and is
showing whitespace errors because of downloader, etc. issues in my setup.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Tue, Nov 29, 2016 at 1:35 PM, Michael Paquier wrote:
> On Tue, Nov 22, 2016 at 8:35 PM, Haribabu Kommi wrote:
>> Hi Craig,
>>
>> This is a gentle reminder.
>>
>> You were assigned as reviewer to the current patch in the 11-2016
>> commitfest, but you haven't shared your review yet. Please share your
>> review of the patch. This will help us in smoother operation of the
>> commitfest.
>>
>> Please ignore this if you have already shared your review.
>
> I have moved this CF entry to 2017-01; the remaining, still-unreviewed
> patches are for renaming pg_subxact and pg_clog.

The second patch has rotten a little because of the backup documentation.
By the way, is something going to happen in this CF? Craig, you are a
reviewer of this patch.
--
Michael

Attachments:
0001-Rename-pg_clog-to-pg_xact.patch
0002-Rename-pg_subtrans-to-pg_subxact.patch
Re: [HACKERS] jsonb_delete with arrays
On Sun, Dec 18, 2016 at 1:27 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> Speaking about implementation of `jsonb_delete_array` - it's fine, but I
> would like to suggest two modifications:
>
> * create a separate helper function for the jsonb delete operation, to use
> it in both `jsonb_delete` and `jsonb_delete_array`. It will help to
> concentrate related logic in one place.

I am not sure that it is much of a win, as the code loses readability for a
minimal refactoring. What would have been nice is to group jsonb_delete_idx
as well, but handling of the element deletion is really different there.

> * use variadic arguments for `jsonb_delete_array`, for rare cases when
> someone decides to use this function directly instead of the corresponding
> operator. It will be more consistent with `jsonb_delete` from my point of
> view, because the transition from `jsonb_delete(data, 'key')` to
> `jsonb_delete(data, 'key1', 'key2')` is smoother than to
> `jsonb_delete(data, '{key1, key2}')`.

That's a good idea.

> I've attached a patch with these modifications. What do you think?

Looking at both patches proposed, documentation is still missing in the
list of jsonb operators, as '-' is missing for arrays. I am marking this
patch as waiting on author for now.
--
Michael
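For context, the behaviour under discussion can be sketched with the
operators; the single-key form already exists, while the array form is what
the proposed patch adds, so treat the second statement as illustrating the
intended interface rather than committed behaviour:

-- delete a single key (existing behaviour)
SELECT '{"a": 1, "b": 2, "c": 3}'::jsonb - 'a';
-- delete several keys at once (the proposed array variant)
SELECT '{"a": 1, "b": 2, "c": 3}'::jsonb - '{a,b}'::text[];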
Re: [HACKERS] increasing the default WAL segment size
On Tue, Jan 17, 2017 at 12:38 PM, Michael Paquier wrote:
> On Tue, Jan 17, 2017 at 4:06 PM, Beena Emerson wrote:
> > I have already made patch for the generic SHOW replication command
> > (attached) and am working on the new initdb patch based on that.
> > I have not yet fixed the pg_standby issue. I am trying to address all
> > the comments and bugs still.
>
> Having documentation for this patch in protocol.sgml would be nice.

Yes. I will add that.

--
Thank you,
Beena Emerson
Have a Great Day!
Re: [HACKERS] increasing the default WAL segment size
On Tue, Jan 17, 2017 at 4:06 PM, Beena Emerson wrote:
> I have already made patch for the generic SHOW replication command
> (attached) and am working on the new initdb patch based on that.
> I have not yet fixed the pg_standby issue. I am trying to address all the
> comments and bugs still.

Having documentation for this patch in protocol.sgml would be nice.
--
Michael
Re: [HACKERS] increasing the default WAL segment size
On Tue, Jan 17, 2017 at 12:18 PM, Michael Paquier wrote:
> On Sun, Jan 8, 2017 at 9:52 AM, Jim Nasby wrote:
> >> I agree pretty_print_kb would have been a better for this function.
> >> However, I have realised that using the show hook and this function is
> >> not suitable and have found a better way of handling the removal of
> >> GUC_UNIT_XSEGS which no longer needs this function : using the
> >> GUC_UNIT_KB, convert the value in bytes to wal segment count instead in
> >> the assign hook. The next version of patch will use this.
> >
> > ... it sounds like you're going back to exposing KB to users, and
> > that's all that really matters.
> >
> >> IMHO it'd be better to use the n & (n-1) check detailed at [3].
>
> That would be better.
>
> So I am looking at the proposed patch, though there have been reviews
> the patch was in "Needs Review" state, and as far as I can see it is a
> couple of things for frontends. Just by grepping for XLOG_SEG_SIZE I
> have spotted the following problems:
> - pg_standby uses it to know about the next segment available.

Yes, I am aware of this and had mentioned it in my post.

> - pg_receivexlog still uses it in segment handling.
> It may be a good idea to just remove XLOG_SEG_SIZE and fix the code
> paths that fail to compile without it, frontend utilities included
> because a lot of them now rely on the value coded in xlog_internal.h,
> but with this patch the value is set up in the context of initdb. And
> this would induce major breakages in many backup tools, pg_rman coming
> first in mind... We could replace it with for example a macro that
> frontends could use to check if the size of the WAL segment is in a
> valid range if the tool does not have direct access to the Postgres
> instance (aka the size of the WAL segment used there) as there are as
> well offline tools.

I will see what's the best way to do this.

> -#define XLogSegSize ((uint32) XLOG_SEG_SIZE)
> +
> +extern uint32 XLogSegSize;
> +#define XLOG_SEG_SIZE XLogSegSize
> This bit is really bad for frontend declaring xlog_internal.h...
>
> --- a/src/bin/pg_test_fsync/pg_test_fsync.c
> +++ b/src/bin/pg_test_fsync/pg_test_fsync.c
> @@ -62,7 +62,7 @@ static const char *progname;
>
>  static int secs_per_test = 5;
>  static int needs_unlink = 0;
> -static char full_buf[XLOG_SEG_SIZE],
> +static char full_buf[DEFAULT_XLOG_SEG_SIZE],
> This would make sense as a new option of pg_test_fsync.
>
> A performance study would be a good idea as well. Regarding the
> generic SHOW command in the replication protocol, I may do it for next
> CF, I have use cases for it in my pocket.

Thank you for your review. I have already made a patch for the generic SHOW
replication command (attached) and am working on the new initdb patch based
on that. I have not yet fixed the pg_standby issue. I am trying to address
all the comments and bugs still.

--
Beena Emerson
Have a Great Day!

Attachment: repl_show_cmd.patch
Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code
On Wed, Jan 11, 2017 at 4:50 PM, Michael Paquier wrote:
> Looking at this patch, I am not sure that it is worth worrying about.
> This is a recipe for making back-patching a bit more complicated, and it
> makes the code more complicated to understand. So I would vote for
> rejecting it and moving on.

I have done so for now to make the CF move; if somebody wants to complain,
feel free...
--
Michael
Re: [HACKERS] increasing the default WAL segment size
On Sun, Jan 8, 2017 at 9:52 AM, Jim Nasby wrote:
>> I agree pretty_print_kb would have been a better for this function.
>> However, I have realised that using the show hook and this function is
>> not suitable and have found a better way of handling the removal of
>> GUC_UNIT_XSEGS which no longer needs this function : using the
>> GUC_UNIT_KB, convert the value in bytes to wal segment count instead in
>> the assign hook. The next version of patch will use this.
>
> ... it sounds like you're going back to exposing KB to users, and that's
> all that really matters.
>
>> IMHO it'd be better to use the n & (n-1) check detailed at [3].

That would be better.

So I am looking at the proposed patch. Though there have been reviews, the
patch was in "Needs Review" state, and as far as I can see what remains is
a couple of things for frontends. Just by grepping for XLOG_SEG_SIZE I have
spotted the following problems:
- pg_standby uses it to know about the next segment available.
- pg_receivexlog still uses it in segment handling.

It may be a good idea to just remove XLOG_SEG_SIZE and fix the code paths
that fail to compile without it, frontend utilities included, because a lot
of them now rely on the value coded in xlog_internal.h, but with this patch
the value is set up in the context of initdb. And this would induce major
breakages in many backup tools, pg_rman coming first to mind... We could
replace it with, for example, a macro that frontends could use to check if
the size of the WAL segment is in a valid range if the tool does not have
direct access to the Postgres instance (aka the size of the WAL segment
used there), as there are offline tools as well.

-#define XLogSegSize ((uint32) XLOG_SEG_SIZE)
+
+extern uint32 XLogSegSize;
+#define XLOG_SEG_SIZE XLogSegSize

This bit is really bad for frontends declaring xlog_internal.h...

--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -62,7 +62,7 @@ static const char *progname;

 static int secs_per_test = 5;
 static int needs_unlink = 0;
-static char full_buf[XLOG_SEG_SIZE],
+static char full_buf[DEFAULT_XLOG_SEG_SIZE],

This would make sense as a new option of pg_test_fsync.

A performance study would be a good idea as well. Regarding the generic
SHOW command in the replication protocol, I may do it for next CF, I have
use cases for it in my pocket.
--
Michael
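For reference, the "n & (n-1)" test mentioned above is the usual
power-of-two check; a minimal SQL illustration of the check itself (not of
where it would live in initdb or the frontend code):

-- n is a power of two iff it is positive and n & (n - 1) = 0
SELECT n, (n > 0 AND (n & (n - 1)) = 0) AS is_power_of_two
FROM (VALUES (512), (1024), (1536), (4096)) AS t(n);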
Re: [HACKERS] Measuring replay lag
On Thu, Dec 22, 2016 at 6:14 AM, Thomas Munro wrote:
> On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao wrote:
>> I agree that the capability to measure the remote_apply lag is very
>> useful. Also I want to measure the remote_write and remote_flush lags,
>> for example, in order to diagnose the cause of replication lag.
>
> Good idea. I will think about how to make that work. There was a
> proposal to make writing and flushing independent[1]. I'd like that
> to go in. Then the write_lag and flush_lag could diverge
> significantly, and it would be nice to be able to see that effect as
> time (though you could already see it with LSN positions).
>
>> For that, what about maintaining the pairs of send-timestamp and LSN in
>> *sender side* instead of receiver side? That is, walsender adds the
>> pairs of send-timestamp and LSN into the buffer every sampling period.
>> Whenever walsender receives the write, flush and apply locations from
>> walreceiver, it calculates the write, flush and apply lags by comparing
>> the received and stored LSN and comparing the current timestamp and
>> stored send-timestamp.
>
> I thought about that too, but I couldn't figure out how to make the
> sampling work. If the primary is choosing (LSN, time) pairs to store
> in a buffer, and the standby is sending replies at times of its
> choosing (when wal_receiver_status_interval has been exceeded), then
> you can't accurately measure anything.

Yeah, even though the primary stores (100, 2017-01-17 00:00:00) as the pair
of (LSN, timestamp), for example, the standby may not send back the reply
for LSN 100 itself. The primary may receive the reply for a larger LSN like
200, instead. So the measurement of the lag on the primary side would not
be so accurate.

But we can calculate the "sync rep" lag by comparing the stored timestamp
of LSN 100 and the timestamp when the reply for LSN 200 is received. In
sync rep, since the transaction waiting for LSN 100 to be replicated is
actually released after the reply for LSN 200 is received, the above
calculated lag is basically accurate as sync rep lag.

Therefore I'm still thinking that it's better to maintain the pairs of LSN
and timestamp on the *primary* side. Thoughts?

Regards,

--
Fujii Masao
Re: [HACKERS] Supporting huge pages on Windows
On Tue, Jan 10, 2017 at 8:49 AM, Tsunakawa, Takayuki wrote:
> Hello, Amit, Magnus,
>
> I'm sorry for my late reply. Yesterday was a national holiday in Japan.
>
>> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
>> PGSharedMemoryReAttach is called after the startup of new process whereas
>> pgwin32_ReserveSharedMemoryRegion is called before the new process could
>> actually start. Basically, pgwin32_ReserveSharedMemoryRegion is used to
>> reserve shared memory for each backend, so calling VirtualAlloc there
>> should follow spec for huge pages. If you have some reason for not using,
>> then it is not clear from your reply, can you try to explain in detail.
>
> OK. The processing takes place in the following order:
>
> 1. postmaster calls CreateProcess() to start a child postgres in a
> suspended state.
> 2. postmaster calls VirtualAlloc(MEM_RESERVE) in
> pgwin32_ReserveSharedMemoryRegion() to reserve the virtual address space
> in the child to secure space for the shared memory. This call just affects
> the virtual address space and does not allocate physical memory. So the
> large page is still irrelevant.

Okay, today again reading VirtualAlloc specs, I could see that
MEM_LARGE_PAGES is not required to reserve the memory region. It is only
required during allocation.

> I succeeded by following the same procedure using secpol.msc on Win10,
> running 64-bit PostgreSQL. I started PostgreSQL as a Windows service
> because it's the normal way, with the service account being a regular
> Windows user account (i.e. not LocalSystem).
>
> But... I failed to start PostgreSQL by running "pg_ctl start" from a
> command prompt, receiving the same error message as you. On the other
> hand, I could start PostgreSQL on a command prompt with administrative
> privileges (right-click on "Command prompt" from the Start menu and select
> "Run as an administrator" in the menu).

Hmm. It doesn't work even on a command prompt with administrative
privileges. It gives the below error:

waiting for server to start
2017-01-17 11:20:13.780 IST [4788] FATAL: could not create shared memory
segment: error code 1450
2017-01-17 11:20:13.780 IST [4788] DETAIL: Failed system call was
CreateFileMapping(size=148897792,
name=Global/PostgreSQL:E:/WorkSpace/PostgreSQL/master/Data).
2017-01-17 11:20:13.780 IST [4788] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

Now, error code 1450 can occur due to insufficient system resources, so I
have tried by increasing the size of shared memory (a higher value of
shared_buffers) without your patch and it works. This indicates some
problem with the patch.

> It seems that Windows removes many privileges, including "Lock Pages in
> Memory", when starting the normal command prompt. As its evidence, you can
> use the attached priv.c to see what privileges are assigned and
> enabled/disabled. Build it like "cl priv.c" and just run priv.exe on each
> command prompt. Those runs show different privileges.

This is bad.

> Should I need to do something, e.g. explain in the document that the user
> should use the command prompt with administrative privileges when he uses
> huge_pages?

I think it is better to document it in some way if we decide to go ahead
with the patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pg_hba_file_settings view patch
On Tue, Jan 17, 2017 at 10:19 AM, Haribabu Kommi wrote:
> On Tue, Jan 10, 2017 at 6:35 PM, Michael Paquier wrote:
>> +/* LDAP supports 10 currently, keep this well above the most any
>> method needs */
>> +#define MAX_OPTIONS 12
>> Er, why? There is an assert already, that should be enough.
>
> Which Assert? This macro is used to verify the maximum number of
> authentication options that are possible for a single hba line.

That one:

+ Assert(noptions <= MAX_OPTIONS);
+ if (noptions)
+     return PointerGetDatum(
+         construct_array(options, noptions, TEXTOID, -1, false, 'i'));

>> =# \d pg_hba_rules
>>              View "pg_catalog.pg_hba_rules"
>>       Column      |  Type   | Collation | Nullable | Default
>> ------------------+---------+-----------+----------+---------
>>  line_number      | integer |           |          |
>>  type             | text    |           |          |
>>  keyword_database | text[]  |           |          |
>>  database         | text[]  |           |          |
>>  keyword_user     | text[]  |           |          |
>>  user_name        | text[]  |           |          |
>>  keyword_address  | text    |           |          |
>>  address          | inet    |           |          |
>>  netmask          | inet    |           |          |
>>  hostname         | text    |           |          |
>>  method           | text    |           |          |
>>  options          | text[]  |           |          |
>>  error            | text    |           |          |
>>
>> keyword_database and database actually map to the same thing if you
>> refer to a raw pg_hba.conf file because they have the same meaning for
>> the user. You could simplify the view and just remove keyword_database,
>> keyword_user and keyword_address. This would simplify your patch as
>> well with all the mumbo-jumbo to see if a string is a dedicated
>> keyword or not. In its most simple shape pg_hba_rules should show to
>> the user an exact map of the entries of the raw file.
>
> I removed the keyword_database and keyword_user columns, where the data
> in those columns can easily be represented with the database and user
> columns. But the address field can contain keywords such as "samehost"
> etc. and also a hostname; because of this, that field is converted into
> three columns in the view.

Hm. We could as well consider cidr and use just one column. But still, the
use of inet as a data type in a system view looks like a wrong choice to
me. Or we could actually just use text... Opinions from others are welcome
here of course.

>> I have copied the example file of pg_hba.conf, reloaded it:
>> https://www.postgresql.org/docs/devel/static/auth-pg-hba-conf.html
>> And then the output result gets corrupted by showing up free()'d results:
>> null | null | \x7F\x7F\x7F\x7F\x7F
>
> There was a problem in resetting the error string; it works with the
> attached patch.

Thanks. Now that works.

> Updated patch attached.

This begins to look good. I have found a couple of minor issues.

+    The pg_hba_rules view can be read only by
+    superusers.
This is not true anymore.

+       Line number within client authentication configuration file
+       the current value was set at
I'd tune that without a past sentence. Saying just pg_hba.conf would be
fine perhaps?

+       database
+       text[]
+       List of database name
This should be plural, database nameS.

+       List of keyword address names,
+       name can be all, samehost and samenet
Phrasing looks weird to me, what about "List of keyword address names,
whose values can be all, samehost or samenet", with markups.

+postgres=# select line_number, type, database, user_name, auth_method from pg_hba_rules;
Nit: this could be upper-cased.

+static Datum
+getauthmethod(UserAuth auth_method)
+{
+ ...
+   default:
+       elog(ERROR, "unexpected authentication method in parsed HBA entry");
+       break;
+   }
I think that you should also remove the default clause here to catch
errors at compilation.

+   switch (hba->conntype)
+   {
+       case ctLocal:
+           values[index] = CStringGetTextDatum("local");
+           break;
+       case ctHost:
+           values[index] = CStringGetTextDatum("host");
+           break;
+       case ctHostSSL:
+           values[index] = CStringGetTextDatum("hostssl");
+           break;
+       case ctHostNoSSL:
+           values[index] = CStringGetTextDatum("hostnossl");
+           break;
+       default:
+           elog(ERROR, "unexpected connection type in parsed HBA entry");
+   }
You could go without the default clause here as well.
--
Michael
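With the view shaped as above, a quick sanity check against the proposed
pg_hba_rules view (a sketch of intended usage, not committed behaviour) is
to list only the pg_hba.conf lines that failed to parse:

-- show only the entries that produced a parse error
SELECT line_number, error
FROM pg_hba_rules
WHERE error IS NOT NULL;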
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Jan 11, 2017 at 10:55 AM, Dilip Kumar wrote:
> I have reviewed the latest patch and I don't have any more comments.
> So if there is no objection from other reviewers I can move it to
> "Ready For Committer"?

Seeing no objections, I have moved it to Ready For Committer.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Append implementation
Hi Amit,

On 2016/12/23 14:21, Amit Khandekar wrote:
> Currently an Append plan node does not execute its subplans in
> parallel. There is no distribution of workers across its subplans. The
> second subplan starts running only after the first subplan finishes,
> although the individual subplans may be running parallel scans.
>
> Secondly, we create a partial Append path for an appendrel, but we do
> that only if all of its member subpaths are partial paths. If one or
> more of the subplans is a non-parallel path, there will be only a
> non-parallel Append. So whatever node is sitting on top of Append is
> not going to do a parallel plan; for example, a select count(*) won't
> divide it into partial aggregates if the underlying Append is not
> partial.
>
> The attached patch removes both of the above restrictions. There has
> already been a mail thread [1] that discusses an approach suggested by
> Robert Haas for implementing this feature. This patch uses this same
> approach.

I was looking at the executor portion of this patch and I noticed that in
exec_append_initialize_next():

    if (appendstate->as_padesc)
        return parallel_append_next(appendstate);

    /*
     * Not parallel-aware. Fine, just go on to the next subplan in the
     * appropriate direction.
     */
    if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
        appendstate->as_whichplan++;
    else
        appendstate->as_whichplan--;

which seems to mean that executing Append in parallel mode disregards the
scan direction. I am not immediately sure what implications that has, so I
checked what heap scan does when executing in parallel mode, and found this
in heapgettup():

    else if (backward)
    {
        /* backward parallel scan not supported */
        Assert(scan->rs_parallel == NULL);

Perhaps, AppendState.as_padesc would not have been set if scan direction is
backward, because parallel mode would be disabled for the whole query in
that case (PlannerGlobal.parallelModeOK = false). Maybe add an Assert()
similar to the one in heapgettup().

Thanks,
Amit
Re: [HACKERS] PoC: Grouped base relation
On 17 January 2017 at 16:30, Tomas Vondra wrote:
> Let's instead use an example similar to what Antonin mentioned in the
> initial post - two tables, with two columns each.
>
> CREATE TABLE t1 (a INT, b INT);
> CREATE TABLE t2 (c INT, d INT);
>
> And let's assume each table has 100.000 rows, but only 100 groups in the
> first column, with 1000 rows per group. Something like
>
> INSERT INTO t1
> SELECT mod(i,100), i FROM generate_series(1, 1e5) s(i);
>
> INSERT INTO t2
> SELECT mod(i,100), i FROM generate_series(1, 1e5) s(i);
>
> And let's assume this query
>
> SELECT t1.a, count(t2.d) FROM t1 JOIN t2 ON (t1.a = t2.c)
> GROUP BY t1.a;
>
> On master, EXPLAIN (COSTS OFF, TIMING OFF, ANALYZE) looks like this:
>
>                        QUERY PLAN
> -------------------------------------------------------------
>  HashAggregate (actual rows=100 loops=1)
>    Group Key: t1.a
>    ->  Hash Join (actual rows=1 loops=1)
>          Hash Cond: (t2.c = t1.a)
>          ->  Seq Scan on t2 (actual rows=10 loops=1)
>          ->  Hash (actual rows=10 loops=1)
>                Buckets: 131072  Batches: 2  Memory Usage: 2716kB
>                ->  Seq Scan on t1 (actual rows=10 loops=1)
>  Planning time: 0.167 ms
>  Execution time: 17005.300 ms
> (10 rows)
>
> while with the patch it looks like this
>
>                        QUERY PLAN
> -------------------------------------------------------------
>  Finalize HashAggregate (actual rows=100 loops=1)
>    Group Key: t1.a
>    ->  Hash Join (actual rows=100 loops=1)
>          Hash Cond: (t1.a = t2.c)
>          ->  Partial HashAggregate (actual rows=100 loops=1)
>                Group Key: t1.a
>                ->  Seq Scan on t1 (actual rows=10 loops=1)
>          ->  Hash (actual rows=100 loops=1)
>                Buckets: 1024  Batches: 1  Memory Usage: 14kB
>                ->  Partial HashAggregate (actual rows=100 loops=1)
>                      Group Key: t2.c
>                      ->  Seq Scan on t2 (actual rows=10 loops=1)
>  Planning time: 0.105 ms
>  Execution time: 31.609 ms
> (14 rows)
>
> This of course happens because with the patch we run the transition
> function 200k-times (on each side of the join) and aggtransmultifn on the
> 100 rows produced by the join, while on master the join produces
> 10.000.000 rows (which already takes much more time), and then we have to
> run the transition function on all those rows.
>
> The performance difference is pretty obvious, I guess.

An exceptional improvement.

For the combine aggregate example of this query, since no patch exists yet,
we could simply mock what the planner would do by rewriting the query. I'll
use SUM() in place of the combinefn for COUNT():

explain analyze
SELECT t1.a, sum(t2.d)
FROM t1 JOIN (SELECT c, count(d) d FROM t2 GROUP BY c) t2 ON t1.a = t2.c
GROUP BY t1.a;

This seems to be 100,000 aggtransfn calls (for t2), then 100,000
aggcombinefn calls (for t1) (total = 200,000), whereas the patch would
perform 100,000 aggtransfn calls (for t2), then 100,000 aggtransfn calls
(for t1), then 100 aggtransmultifn calls (total = 200,100).

Is my maths ok?

> I don't quite see how the patch could use aggcombinefn without
> sacrificing a lot of the benefits. Sure, it's possible to run the
> aggcombinefn in a loop (with the number of iterations determined by the
> group size on the other side of the join), but that sounds pretty
> expensive and eliminates the reduction of transition function calls. The
> join cardinality would still be reduced, though.

I'd be interested in seeing the run time of my example query above. I can't
quite see a reason for it to be slower, but please let me know.

> I do have other question about the patch, however. It seems to rely on
> the fact that the grouping and joins both reference the same columns. I
> wonder how uncommon such queries are.
>
> To give a reasonable example, imagine the typical star schema, which is
> pretty standard for large analytical databases. A dimension table is
> "products" and the fact table is "sales", and the schema might look like
> this:
>
> CREATE TABLE products (
>     id          SERIAL PRIMARY KEY,
>     name        TEXT,
>     category_id INT,
>     producer_id INT
> );
>
> CREATE TABLE sales (
>     product_id  REFERENCES products (id),
>     nitems      INT,
>     price       NUMERIC
> );
>
> A typical query then looks like this:
>
> SELECT category_id, SUM(nitems), SUM(price)
> FROM products p JOIN sales s ON (p.id = s.product_id)
> GROUP BY p.category_id;
>
> which obviously uses different columns for the grouping and join, and so
> the patch won't help with that. Of course, a query grouping by product_id
> would allow the patch to work
>
> SELECT category_id, SUM(nitems), SUM(price)
> FROM products p JOIN sales s ON (p.id = s.product_id)
> GROUP BY p.product_id;
>
> Another thing is that in my
Re: [HACKERS] Retiring from the Core Team
Thanks, Josh, for everything. I especially enjoyed your monthly updates at
SFPUG.

Cheers,
Steve

On Thu, Jan 12, 2017 at 1:59 PM, Merlin Moncure wrote:
> On Wed, Jan 11, 2017 at 6:29 PM, Josh Berkus wrote:
> > Hackers:
> >
> > You will have noticed that I haven't been very active for the past
> > year. My new work on Linux containers and Kubernetes has been even more
> > absorbing than I anticipated, and I just haven't had a lot of time for
> > PostgreSQL work.
> >
> > For that reason, as of today, I am stepping down from the PostgreSQL
> > Core Team.
>
> Thanks for all your hard work. FWIW, your blog posts, 'Primary Keyvil',
> are some of my favorite of all time!
>
> merlin
Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum
On 16 January 2017 at 15:54, Masahiko Sawada wrote:
> Since autovacuum worker wakes up autovacuum launcher after launched
> the autovacuum launcher could try to spawn worker process at high
> frequently if you have database with very large table in it that has
> just passed autovacuum_freeze_max_age.
>
> autovacuum.c:L1605
> /* wake up the launcher */
> if (AutoVacuumShmem->av_launcherpid != 0)
>     kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);
>
> I think we should deal with this case as well.

When autovacuum is enabled, after getting SIGUSR2 the worker is launched
only when it's time to launch; it doesn't look like it will be launched
immediately:

/* We're OK to start a new worker */
if (dlist_is_empty())
{
    /* Special case when the list is empty */
}
else
{
    ...
    /* launch a worker if next_worker is right now or it is in the past */
    if (TimestampDifferenceExceeds(avdb->adl_next_worker, current_time, 0))
        launch_worker(current_time);
}

So from the above, it looks as if there will not be a storm of workers.

Whereas, if autovacuum is disabled, the autovacuum launcher does not wait
for the worker to start; it just starts the worker and quits, so the issue
won't show up there:

/*
 * In emergency mode, just start a worker (unless shutdown was requested)
 * and go away.
 */
if (!AutoVacuumingActive())
{
    if (!got_SIGTERM)
        do_start_worker();
    proc_exit(0);       /* done */
}

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

On 16 January 2017 at 15:54, Masahiko Sawada wrote:
> On Mon, Jan 16, 2017 at 1:50 PM, Amit Khandekar wrote:
>> On 13 January 2017 at 19:15, Alvaro Herrera wrote:
>>> I think this is the same problem as reported in
>>> https://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=ypy...@mail.gmail.com
>>
>> Ah yes, this is the same problem. Not sure why I didn't land on that
>> thread when I tried to search pghackers using relevant keywords.
>
>>> === Fix ===
>>> [...] Instead, the attached patch (prevent_useless_vacuums.patch)
>>> prevents the repeated cycle by noting that there's no point in doing
>>> whatever vac_update_datfrozenxid() does, if we didn't find anything to
>>> vacuum and there's already another worker vacuuming the same database.
>>> Note that it uses wi_tableoid field to check concurrency. It does not
>>> use wi_dboid field to check for already-processing worker, because
>>> using this field might cause each of the workers to think that there
>>> is some other worker vacuuming, and eventually no one vacuums. We have
>>> to be certain that the other worker has already taken a table to
>>> vacuum.
>>>
>>> Hmm, it seems reasonable to skip the end action if we didn't do any
>>> cleanup after all. This would normally give enough time between vacuum
>>> attempts for the first worker to make further progress and avoid
>>> causing a storm. I'm not really sure that it fixes the problem
>>> completely, but perhaps it's enough.
>>
>> I had thought about this : if we didn't clean up anything, skip the
>> end action unconditionally without checking if there was any
>> concurrent worker. But then thought it is better to skip only if we
>> know there is another worker doing the same job, because :
>> a) there might be some reason we are just calling
>> vac_update_datfrozenxid() without any condition. But I am not sure
>> whether it was intentionally kept like that. Didn't get any leads from
>> the history.
>> b) it's no harm in updating datfrozenxid() if there was no other
>> worker. In this case, we *know* that there was indeed nothing to be
>> cleaned up. So the next time this database won't be chosen again, so
>> there's no harm just calling this function.
>
> Since autovacuum worker wakes up autovacuum launcher after launched
> the autovacuum launcher could try to spawn worker process at high
> frequently if you have database with very large table in it that has
> just passed autovacuum_freeze_max_age.
>
> autovacuum.c:L1605
> /* wake up the launcher */
> if (AutoVacuumShmem->av_launcherpid != 0)
>     kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);
>
> I think we should deal with this case as well.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
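To check whether a database is actually in the situation described (a large
table just past autovacuum_freeze_max_age), a standard diagnostic query
such as the following can be used; it is independent of the patch under
discussion:

-- tables closest to triggering a forced anti-wraparound autovacuum
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 5;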
Re: [HACKERS] Cache Hash Index meta page.
On Fri, Jan 13, 2017 at 9:58 AM, Mithun Cy wrote:
> On Fri, Jan 6, 2017 at 11:43 AM, Amit Kapila wrote:

Below are review comments on the latest version of the patch.

1.
- * Read the metapage to fetch original bucket and tuple counts. Also, we
- * keep a copy of the last-seen metapage so that we can use its
- * hashm_spares[] values to compute bucket page addresses. This is a bit
- * hokey but perfectly safe, since the interesting entries in the spares
- * array cannot change under us; and it beats rereading the metapage for
- * each bucket.
+ * update and get the metapage cache data.
  */
- metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
- metap = HashPageGetMeta(BufferGetPage(metabuf));
- orig_maxbucket = metap->hashm_maxbucket;
- orig_ntuples = metap->hashm_ntuples;
- memcpy(_metapage, metap, sizeof(local_metapage));
- /* release the lock, but keep pin */
- LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+ cachedmetap = _hash_getcachedmetap(rel, true);
+ orig_maxbucket = cachedmetap->hashm_maxbucket;
+ orig_ntuples = cachedmetap->hashm_ntuples;

(a) I think you can retain the previous comment or modify it slightly.
Just removing the whole comment and replacing it with a single line seems
like a step backward.
(b) Another somewhat bigger problem is that with this new change it won't
retain the pin on the meta page till the end, which means we might need to
perform an I/O again during the operation to fetch the meta page. AFAICS,
you have just changed it so that you can call the new API
_hash_getcachedmetap; if that's true, then I think you have to find some
other way of doing it. BTW, why can't you design your new API such that it
always takes a pinned metapage? You can always release the pin in the
caller if required. I understand that you don't always need a metapage in
that API, but I think the current usage of that API is also not that good.

2.
+ if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
+     bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+     cachedmetap = _hash_getcachedmetap(rel, true);

I don't understand the meaning of the above if check. It seems like you
will update the metapage when the previous block number is not a valid
block number, which will be true at the first split. How will you ensure
that there is a re-split and the cached metapage is not relevant? I think
if there is && in the above condition, then we can ensure it.

3.
+ Given a hashkey get the target bucket page with read lock, using cached
+ metapage. The getbucketbuf_from_hashkey method below explains the same.
+

All the sentences in the algorithm start with small letters, then why do
you need an exception for this sentence? I think you don't need to add an
empty line. Also, the usage of getbucketbuf_from_hashkey seems out of
place. How about writing it as: The usage of cached metapage is explained
later.

4.
+ If the target bucket is split before metapage data was cached then we are
+ done.
+ Else first release the bucket page and then update the metapage cache
+ with latest metapage data.

I think it is better to retain the original text of the readme and add
about the metapage update.

5.
+ Loop:
...
+ Loop again to reach the new target bucket.

No need to write "Loop again ...", that is implicit.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)
On Sun, Jan 8, 2017 at 2:01 PM, Jim Nasby wrote:
> On 12/29/16 9:55 PM, Haribabu Kommi wrote:
>> The tuples which don't have multiple copies or frozen data will be moved
>> from WOS to ROS periodically by the background worker process or
>> autovacuum process. Every column's data is stored separately in its own
>> relation file. There is no transaction information present in ROS. The
>> data in ROS can be referred to with a tuple ID.
>
> Would updates be handled via the delete mechanism you described then?

Updates are handled similarly to delete operations, but there are some
extra index insert operations in this index even when the update is of HOT
type, because of the TID-CRID mapping.

>> In this approach, the column data is present in both heap and columnar
>> storage.
>
> ISTM one of the biggest reasons to prefer a column store over heap is to
> ditch the 24 byte overhead, so I'm not sure how much of a win this is.

Yes, that's correct. Currently with this approach, it is not possible to
ditch the heap completely. This approach is useful for the cases where the
user wants to store only some columns as part of the clustered index.

> Another complication is that one of the big advantages of a CSTORE is
> allowing analysis to be done efficiently on a column-by-column (as opposed
> to row-by-row) basis. Does your patch by chance provide that?

Not the base patch that I shared. But the further patches provide data
access on a column-by-column basis using the custom plan methods.

> Generally speaking, I do think the idea of adding support for this as an
> "index" is a really good starting point, since that part of the system is
> pluggable. It might be better to target getting only what needs to be in
> core into core to begin with, allowing the other code to remain an
> extension for now. I think there's a lot of things that will be discovered
> as we start moving into column stores, and it'd be very unfortunate to
> accidentally paint the core code into a corner somewhere.

Yes, it is possible to add only the code that is required in the core and
keep the other part as an extension. Without providing the complete
clustered index approach, I doubt whether the necessary hooks and code
would get accepted into core.

> As a side note, it's possible to get a lot of the benefits of a column
> store by using arrays. I've done some experiments with that and got an
> 80-90% space reduction, and most queries saw improved performance as well
> (there were a few cases that weren't better). The biggest advantage to
> this approach is people could start using it today, on any recent version
> of Postgres.

Interesting experiment.

> That would be a great way to gain knowledge on what users would want to
> see in a column store, something else I suspect we need. It would also be
> far less code than what you or Alvaro are proposing. When it comes to
> large changes that don't have crystal-clear requirements, I think that's
> really important.

The main use case of this patch is to support mixed-load environments,
where both OLTP and OLAP queries are possible. The advantage of the
proposed patch design is providing good performance to OLAP queries
without affecting OLTP.

Regards,
Hari Babu
Fujitsu Australia
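For what it's worth, the array-based approach mentioned as a side note
above can be sketched roughly as follows (the table and column names here
are hypothetical, and how source rows are packed into blocks would be up to
the application or a helper job):

-- pack N source rows per stored row, one array per column
CREATE TABLE sales_packed (
    block_id int PRIMARY KEY,
    nitems   int[],
    price    numeric[]
);

-- analytical queries then unnest only the columns they need
SELECT sum(n) FROM sales_packed, unnest(nitems) AS n;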
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Sat, Jan 14, 2017 at 3:10 AM, Vladimir Rusinov wrote:
> Attached are two new versions of the patch: one keeps aliases, one
> doesn't. Also, removed a stray reference to an xlog function in one of the
> tests.
>
> I've lost the vote count. Should we create a form to calculate which one
> of the patches should be committed?

If we do that, we should vote on all the "renaming" stuff, i.e., not only
function names but also program names like pg_receivexlog, directory names
like clog, option names like the xlogdir option of initdb, return value
names of the functions like xlog_position in
pg_create_physical_replication_slot, etc.

Regards,

--
Fujii Masao
Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)
On Sun, Jan 8, 2017 at 4:20 AM, Bruce Momjian wrote:
> On Fri, Dec 30, 2016 at 02:55:39PM +1100, Haribabu Kommi wrote:
>> Hi All,
>>
>> Fujitsu was interested in developing a columnar storage extension with
>> minimal changes to the server backend.
>>
>> The columnar store is implemented as an extension using index access
>> methods. This can be easily enhanced with pluggable storage methods once
>> they are available.
>
> Have you seen this post from 2015:
>
> https://www.postgresql.org/message-id/20150831225328.GM2912%40alvherre.pgsql

Thanks for the information. Yes, I already checked that mail thread. The
proposal in that thread was trying to add the columnar storage in the core
itself. The patch proposed here is an extension that provides columnar
storage with the help of an index.

Maybe we can discuss the pros and cons of adding a columnar store to the
core itself versus a pluggable storage approach.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] PoC: Grouped base relation
On 01/17/2017 12:42 AM, David Rowley wrote:
> On 10 January 2017 at 06:56, Antonin Houska wrote:
>> Before performing the final aggregation, we need to multiply sum(a.x) by
>> count(j.*) because w/o the aggregation at base relation level the input
>> of the query-level aggregation would look like
>>
>>  a.i | a.x | b.j
>>  ----+-----+----
>>    1 |   3 |   1
>>    1 |   4 |   1
>>    1 |   3 |   1
>>    1 |   4 |   1
>>
>> In other words, grouping of the base relation "b" below the join prevents
>> the join from bringing per-group input set to the aggregate input
>> multiple times. To compensate for this effect, I've added a new field
>> "aggtransmultifn" to pg_aggregate catalog. It "multiplies" the aggregate
>> state w/o repeated processing of the same input set many times. sum() is
>> an example of an aggregate that needs such processing, avg() is one that
>> does not.
>
> First off, I'd like to say that making improvements in this area sounds
> like a great thing. I'm very keen to see progress here.

+1

Pushing down aggregates would be very useful for analytical queries.

> I've been thinking about this aggtransmultifn and I'm not sure if it's
> really needed. Adding a whole series of new transition functions is quite
> a pain. At least I think so, and I have a feeling Robert might agree with
> me.
>
> Let's imagine some worst case (and somewhat silly) aggregate query:
>
> SELECT count(*)
> FROM million_row_table
> CROSS JOIN another_million_row_table;
>
> Today that's going to cause 1 TRILLION transitions! Performance will be
> terrible.
>
> If we pushed the aggregate down into one of those tables and performed a
> Partial Aggregate on that, then a Finalize Aggregate on that single row
> result (after the join), then that's 1 million transfn calls, and 1
> million combinefn calls, one for each row produced by the join.
>
> If we did it your way (providing I understand your proposal correctly)
> there's 1 million transfn calls on one relation, then 1 million on the
> other and then 1 multiplyfn call. which does 100 * 100
>
> What did we save vs. using the existing aggcombinefn infrastructure which
> went into 9.6? Using this actually costs us 1 extra function call, right?
> I'd imagine the size of the patch to use aggcombinefn instead would be a
> fraction of the size of the one which included all the new
> aggmultiplyfns and pg_aggregate.h changes.

I think the patch relies on the assumption that the grouping reduces
cardinality, so a CROSS JOIN without a GROUP BY clause may not be the best
counterexample.

Let's instead use an example similar to what Antonin mentioned in the
initial post - two tables, with two columns each.

CREATE TABLE t1 (a INT, b INT);
CREATE TABLE t2 (c INT, d INT);

And let's assume each table has 100.000 rows, but only 100 groups in the
first column, with 1000 rows per group. Something like

INSERT INTO t1
SELECT mod(i,100), i FROM generate_series(1, 1e5) s(i);

INSERT INTO t2
SELECT mod(i,100), i FROM generate_series(1, 1e5) s(i);

And let's assume this query

SELECT t1.a, count(t2.d) FROM t1 JOIN t2 ON (t1.a = t2.c)
GROUP BY t1.a;

On master, EXPLAIN (COSTS OFF, TIMING OFF, ANALYZE) looks like this:

                       QUERY PLAN
-------------------------------------------------------------
 HashAggregate (actual rows=100 loops=1)
   Group Key: t1.a
   ->  Hash Join (actual rows=1 loops=1)
         Hash Cond: (t2.c = t1.a)
         ->  Seq Scan on t2 (actual rows=10 loops=1)
         ->  Hash (actual rows=10 loops=1)
               Buckets: 131072  Batches: 2  Memory Usage: 2716kB
               ->  Seq Scan on t1 (actual rows=10 loops=1)
 Planning time: 0.167 ms
 Execution time: 17005.300 ms
(10 rows)

while with the patch it looks like this

                       QUERY PLAN
-------------------------------------------------------------
 Finalize HashAggregate (actual rows=100 loops=1)
   Group Key: t1.a
   ->  Hash Join (actual rows=100 loops=1)
         Hash Cond: (t1.a = t2.c)
         ->  Partial HashAggregate (actual rows=100 loops=1)
               Group Key: t1.a
               ->  Seq Scan on t1 (actual rows=10 loops=1)
         ->  Hash (actual rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 14kB
               ->  Partial HashAggregate (actual rows=100 loops=1)
                     Group Key: t2.c
                     ->  Seq Scan on t2 (actual rows=10 loops=1)
 Planning time: 0.105 ms
 Execution time: 31.609 ms
(14 rows)

This of course happens because with the patch we run the transition
function 200k times (on each side of the join) and aggtransmultifn on the
100 rows produced by the join, while on master the join produces 10.000.000
rows (which already takes much more time), and then we have to run the
transition function on all those rows.

The performance difference is pretty obvious, I guess.

> There's already a lot of infrastructure in there to help you detect when
> this optimisation can be applied. For example,
Re: [HACKERS] GSoC 2017
On 1/13/17 3:09 PM, Peter van Hardenberg wrote:
> A new data type, and/or a new index type could both be nicely scoped bits
> of work.

Did you have any particular data/index types in mind?

Personally I'd love something that worked like a python dictionary, but I'm
not sure how that'd work without essentially supporting a variant data
type. I've got code for a variant type[1], and I don't think there's any
holes in it, but the casting semantics are rather ugly. IIRC that problem
appeared to be solvable if there was a hook in the current casting code
right before Postgres threw in the towel and said a cast was impossible.

1: https://github.com/BlueTreble/variant/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
Re: [HACKERS] GSoC 2017
On 1/13/17 4:08 PM, Alvaro Herrera wrote:
> Jim Nasby wrote:
>> On 1/10/17 1:53 AM, Alexander Korotkov wrote:
>>> 1. What project ideas we have?
>>
>> Perhaps allowing SQL-only extensions without requiring filesystem files
>> would be a good project.
>
> Don't we already have that in patch form? Dimitri submitted it as I
> recall.

My recollection is that he tried to boil the ocean and also support handing
compiled C libraries to the database, which was enough to sink the patch.
It might be nice to support that if we could, and maybe it could be a
follow-on project.

I do think the complete lack of support for non-FS extensions is
*seriously* hurting use of the feature, thanks to environments like RDS and
Heroku. As Pavel mentioned, untrusted languages are in a similar boat. So
maybe the best way to address these things is to advertise them as
"increase usability in cloud environments", since cloud excites people.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
Re: [HACKERS] PSQL commands: \quit_if, \quit_unless
On 1/13/17 11:22 PM, Tom Lane wrote:
> I wonder what depth of include-file nesting psql can support, or whether
> we'll be able to fix it to optimize tail recursion of an include file.
> Because somebody will be asking for that if this is the toolset you give
> them.

I think the solution to that is straightforward: tell users that we hope to
eventually support loops and that in the meantime if you try to work around
that with recursion you get to keep both pieces when it breaks. While not
ideal, I think that's a lot better than throwing the whole idea out because
some people will abuse it...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
Re: [HACKERS] Packages: Again
On 1/13/17 10:56 PM, Serge Rielau wrote:
> But what irks me in this debate is that any reasoned and detailed
> argumentation of the value of the principle itself is shut down with
> un-reasoned and un-detailed one-liners.
> "I'm not convinced" is not an argument.
> Counterpoints require content. Something starting with "because ..."

+1. I really can't fathom how someone can flatly say that a nested
namespace is a dumb idea. Your filesystem does this. So does plpgsql. I
could absolutely make use of nested "schemas" (or make it some other
feature if "nested schema" offends you so much).

I agree that a nested namespace might violate ANSI (depending on whether
you overload schema/module), and that it might be hard to do with how
Postgres currently works. And those may be reason enough not to attempt the
effort. But those have *nothing* to do with how useful such a feature would
be to users.

This is similar to the 10+ years users would ask for "DDL triggers" and get
jumped all over because of how hard it would be to actually put a trigger
on a catalog table. Thankfully someone that knew the code AND understood
the user desire came up with the notion of event triggers that put hooks
into every individual DDL command. Users got what they wanted, without any
need to put "real triggers" on catalog tables.

FWIW, what I wish for in this area is:

- SOME kind of nested namespace for multiple kinds of objects (not just
  functions)
- A way to mark those namespaces (and possibly other objects) as private.
- A way to reference extensions from other extensions and deal with
  extensions being moved to a different schema (or namespace).
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Tue, Jan 17, 2017 at 2:21 AM, Karl O. Pinc wrote:
> On January 15, 2017 11:47:51 PM CST, Michael Paquier wrote:
>> On Mon, Jan 16, 2017 at 1:14 PM, Karl O. Pinc wrote:
>> With all those issues fixed, I finish with the attached, that I am
>> fine to pass down to a committer. I still think that this should be
>> only one function using a SRF with rows shaped as (type, path) for
>> simplicity, but as I am visibly outnumbered I won't insist further.
>
> That also makes a certain amount of sense to me but I can't say I have
> thought deeply about it. I haven't paid any attention to this issue and
> am fine letting things move forward as is.

Gilles, what's your opinion here? As the author, that's your call at the
end. The point here would be to change pg_current_logfiles() to something
like that:

=# SELECT * FROM pg_current_logfiles();
 method |         file
--------+-----------------------
 stderr | pg_log/postgresql.log
 csvlog | pg_log/postgresql.csv

And using this SRF users can filter the method with a WHERE clause. And as
a result the 1-arg version is removed. No rows are returned if
current_logfiles does not exist. I don't think there is much need for a
system view either.

>> Also, I would rather see an ERROR returned to the user in case of bad
>> data in current_logfiles, I did not change that either as that's the
>> original intention of Gilles.
>
> I could be wrong but I seem to recall that Robert recommended against an
> error message. If there is bad data then something is really wrong, up to
> some sort of an attack on the back end. Because this sort of thing simply
> shouldn't happen, it's enough in my opinion to guard against buffer
> overruns etc. and just get on with life. If something goes unexpectedly
> and badly wrong with internal data structures in general, there would
> have to be all kinds of additional code to cover every possible problem
> and that doesn't seem to make sense.

Hm... OK. At the same time not throwing at least a WARNING is confusing,
because a NULL result is returned as well even if the input method is
incorrect, and even if the method is correct but it is not present in
current_logfiles. As the user is considered a trusted user, having access
to this function, I would think that being verbose on the error handling,
or at least warnings, would make things easier to analyze.

> Sorry about the previous email with empty content. My email client got
> away from me.

No problem. That happens.
--
Michael
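Under that proposal, filtering would indeed just be an ordinary WHERE
clause over the SRF; a sketch of the suggested interface (not of any
committed function signature):

-- pick the csvlog entry, if any, from the proposed two-column SRF
SELECT file
FROM pg_current_logfiles()
WHERE method = 'csvlog';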
Re: [HACKERS] Polyphase merge is obsolete
On Wed, Oct 12, 2016 at 10:16 AM, Heikki Linnakangaswrote: > The number of *input* tapes we can use in each merge pass is still limited, > by the memory needed for the tape buffers and the merge heap, but only one > output tape is active at any time. The inactive output tapes consume very > few resources. So the whole idea of trying to efficiently reuse input tapes > as output tapes is pointless I picked this up again. The patch won't apply cleanly. Can you rebase? Also, please look at my bugfix for logtape.c free block management [1] before doing so, as that might be prerequisite. Finally, I don't think that the Logical tape pause/resume idea is compelling. Is it hard to not do that, but still do everything else that you propose here? That's what I lean towards doing right now. Anyway, efficient use of tapes certainly mattered a lot more when rewinding meant sitting around for a magnetic tape deck to physically rewind. There is another algorithm in AoCP Vol III that lets us write to tapes backwards, actually, which is motivated by similar obsolete considerations about hardware. Why not write while we rewind, to avoid doing nothing else while rewinding?! Perhaps this patch should make a clean break from the "rewinding" terminology. Perhaps you should rename LogicalTapeRewindForRead() to LogicalTapePrepareForRead(), and so on. It's already a bit awkward that that routine is called LogicalTapeRewindForRead(), because it behaves significantly differently when a tape is frozen, and because the whole point of logtape.c is space reuse that is completely dissimilar to rewinding. (Space reuse is thus totally unlike how polyphase merge is supposed to reuse space, which is all about rewinding, and isn't nearly as eager. Same applies to K-way balanced merge, of course.) I think that the "rewinding" terminology does more harm than good, now that it doesn't even help the Knuth reader to match Knuth's remarks to what's going on in tuplesort.c. Just a thought. > Let's switch over to a simple k-way balanced merge. Because it's simpler. If > you're severely limited on memory, like when sorting 1GB of data with > work_mem='1MB' or less, it's also slightly faster. I'm not too excited about > the performance aspect, because in the typical case of a single-pass merge, > there's no difference. But it would be worth changing on simplicity grounds, > since we're mucking around in tuplesort.c anyway. I actually think that the discontinuities in the merge scheduling are worse than you suggest here. There doesn't have to be as extreme a difference between work_mem and the size of input as you describe here. 
As an example: create table seq_tab(t int); insert into seq_tab select generate_series(1, 1000); set work_mem = '4MB'; select count(distinct t) from seq_tab; The trace_sort output ends like this: 30119/2017-01-16 17:17:05 PST LOG: begin datum sort: workMem = 4096, randomAccess = f 30119/2017-01-16 17:17:05 PST LOG: switching to external sort with 16 tapes: CPU: user: 0.07 s, system: 0.00 s, elapsed: 0.06 s 30119/2017-01-16 17:17:05 PST LOG: starting quicksort of run 1: CPU: user: 0.07 s, system: 0.00 s, elapsed: 0.06 s 30119/2017-01-16 17:17:05 PST LOG: finished quicksort of run 1: CPU: user: 0.07 s, system: 0.00 s, elapsed: 0.07 s *** SNIP *** 30119/2017-01-16 17:17:08 PST LOG: finished writing run 58 to tape 0: CPU: user: 2.50 s, system: 0.27 s, elapsed: 2.78 s 30119/2017-01-16 17:17:08 PST LOG: using 4095 KB of memory for read buffers among 15 input tapes 30119/2017-01-16 17:17:08 PST LOG: finished 1-way merge step: CPU: user: 2.52 s, system: 0.28 s, elapsed: 2.80 s 30119/2017-01-16 17:17:08 PST LOG: finished 4-way merge step: CPU: user: 2.58 s, system: 0.30 s, elapsed: 2.89 s 30119/2017-01-16 17:17:08 PST LOG: finished 14-way merge step: CPU: user: 2.86 s, system: 0.34 s, elapsed: 3.20 s 30119/2017-01-16 17:17:08 PST LOG: finished 14-way merge step: CPU: user: 3.09 s, system: 0.41 s, elapsed: 3.51 s 30119/2017-01-16 17:17:09 PST LOG: finished 15-way merge step: CPU: user: 3.61 s, system: 0.52 s, elapsed: 4.14 s 30119/2017-01-16 17:17:09 PST LOG: performsort done (except 15-way final merge): CPU: user: 3.61 s, system: 0.52 s, elapsed: 4.14 s 30119/2017-01-16 17:17:10 PST LOG: external sort ended, 14678 disk blocks used: CPU: user: 4.93 s, system: 0.57 s, elapsed: 5.51 s (This is the test case that Cy posted earlier today, for the bug that was just fixed in master.) The first 1-way merge step is clearly kind of a waste of time. We incur no actual comparisons during this "merge", since there is only one real run from each input tape (all other active tapes contain only dummy runs). We are, in effect, just shoveling the tuples from that single run from one tape to another (from one range in the underlying logtape.c BufFile space to another). I've seen this quite a lot before, over the years, while working on sorting. It's not that big of a deal, but it's a degenerate case that
Re: [HACKERS] pg_hba_file_settings view patch
On Tue, Jan 10, 2017 at 6:35 PM, Michael Paquierwrote: > On Thu, Jan 5, 2017 at 1:58 PM, Michael Paquier > wrote: > > Could you hold on a bit to commit that? I'd like to look at it in more > > details. At quick glance, there is for example no need to use > > CreateTemplateTupleDesc and list the columns both in pg_proc.h and the > > C routine itself. And memset() can be used in fill_hba_line for the > > error code path. > > And here we go. > Thanks for the review. > + > +postgres=# select * from pg_hba_rules; > [... large example ...] > + > + > It would be nice to reduce the width of this example. That's not going > to be friendly with the generated html. > Added a small example. + switch (hba->conntype) > + { > + case ctLocal: > + values[index] = CStringGetTextDatum("local"); > + break; > + case ctHost: > + values[index] = CStringGetTextDatum("host"); > + break; > + case ctHostSSL: > + values[index] = CStringGetTextDatum("hostssl"); > + break; > + case ctHostNoSSL: > + values[index] = CStringGetTextDatum("hostnossl"); > + break; > + default: > + elog(ERROR, "unexpected connection type in parsed HBA > entry"); > + break; > + } > Here let's remove the break clause and let compilers catch problem > when they show up. > Removed. > + if (hba->pamservice) > + { > + initStringInfo(); > + appendStringInfoString(, "pamservice="); > + appendStringInfoString(, hba->pamservice); > + options[noptions++] = CStringGetTextDatum(str.data); > + } > There is a bunch of code duplication here. I think that it would make > more sense to encapsulate that into a routine, at least let's use > appendStringInfo and let's group the two calls together. > Use a new function to reduce the repeated lines of code. > +/* LDAP supports 10 currently, keep this well above the most any > method needs */ > +#define MAX_OPTIONS 12 > Er, why? There is an assert already, that should be enough. > Which Assert? This macro is used to verify that the maximum number of authentication options that are possible for a single hba line. > =# \d pg_hba_rules >View "pg_catalog.pg_hba_rules" > Column | Type | Collation | Nullable | Default > --+-+---+--+- > line_number | integer | | | > type | text| | | > keyword_database | text[] | | | > database | text[] | | | > keyword_user | text[] | | | > user_name| text[] | | | > keyword_address | text| | | > address | inet| | | > netmask | inet| | | > hostname | text| | | > method | text| | | > options | text[] | | | > error| text| | | > keyword_database and database map actually to the same thing if you > refer to a raw pg_hba.conf file because they have the same meaning for > user. You could simplify the view and just remove keyword_database, > keyword_user and keyword_address. This would simplify your patch as > well with all hte mumbo-jumbo to see if a string is a dedicated > keyword or not. In its most simple shape pg_hba_rules should show to > the user as an exact map of the entries of the raw file. > I removed keyword_database and keyword_user columns where the data in those columns can easily represent with the database and user columns. But for address filed can contains keywords such as "same host" and etc and also a hostname also. Because of this reason, this filed is converted into 3 columns in the view. 
I have copied the example file of pg_hba.conf, reloaded it: > https://www.postgresql.org/docs/devel/static/auth-pg-hba-conf.html > And then the output result gets corrupted by showing up free()'d results: > null | null| \x7F\x7F\x7F\x7F\x7F > There was a problem in resetting the error string, working with attached patch. > + if (err_msg) > + { > + /* type */ > + index++; > + nulls[index] = true; > [... long sequence ...] > Please let's use MemSet here and remove this large chunk of code.. > Done. > + if (!superuser()) > + ereport(ERROR, > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > +(errmsg("must be superuser to view pg_hba.conf > settings"; > Access to the function is already revoked, so what's the point of this > superuser check? > Removed. > > + tupdesc = CreateTemplateTupleDesc(NUM_PG_HBA_LOOKUP_ATTS, false); > + TupleDescInitEntry(tupdesc, (AttrNumber) 1,
Re: [HACKERS] PoC: Grouped base relation
On 10 January 2017 at 06:56, Antonin Houskawrote: > Before performing the final aggregation, we need to multiply sum(a.x) by > count(j.*) because w/o the aggregation at base relation level the input > of the query-level aggregation would look like > > a.i | a.x | b.j > > 1 | 3 | 1 > 1 | 4 | 1 > 1 | 3 | 1 > 1 | 4 | 1 > > In other words, grouping of the base relation "b" below the join prevents the > join from bringing per-group input set to the aggregate input multiple > times. To compensate for this effect, I've added a new field "aggtransmultifn" > to pg_aggregate catalog. It "multiplies" the aggregate state w/o repeated > processing of the same input set many times. sum() is an example of an > aggregate that needs such processing, avg() is one that does not. First off, I'd like to say that making improvements in this area sounds like a great thing. I'm very keen to see progress here. I've been thinking about this aggtransmultifn and I'm not sure if it's really needed. Adding a whole series of new transition functions is quite a pain. At least I think so, and I have a feeling Robert might agree with me. Let's imagine some worst case (and somewhat silly) aggregate query: SELECT count(*) FROM million_row_table CROSS JOIN another_million_row_table; Today that's going to cause 1 TRILLION transitions! Performance will be terrible. If we pushed the aggregate down into one of those tables and performed a Partial Aggregate on that, then a Finalize Aggregate on that single row result (after the join), then that's 1 million transfn calls, and 1 million combinefn calls, one for each row produced by the join. If we did it your way (providing I understand your proposal correctly) there's 1 million transfn calls on one relation, then 1 million on the other and then 1 multiplyfn call. which does 100 * 100 What did we save vs. using the existing aggcombinefn infrastructure which went into 9.6? Using this actually costs us 1 extra function call, right? I'd imagine the size of the patch to use aggcombinefn instead would be a fraction of the size of the one which included all the new aggmultiplyfns and pg_aggregate.h changes. There's already a lot of infrastructure in there to help you detect when this optimisation can be applied. For example, AggClauseCosts.hasNonPartial will be true if any aggregates don't have a combinefn, or if there's any DISTINCT or ORDER BY aggregates, which'll also mean you can't apply the optimisation. It sounds like a much more manageable project by using aggcombinefn instead. Then maybe one day when we can detect if a join did not cause any result duplication (i.e Unique Joins), we could finalise the aggregates on the first call, and completely skip the combine state altogether. Thanks for your work on this. Regards David Rowley -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
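To make the arithmetic above concrete, here is a hypothetical hand-written rewrite (only an illustration of partial aggregation below the join, not the patch itself; the table names come from the example above). Because every row of one side of a CROSS JOIN matches every row of the other, the count over the join is simply the product of the per-relation counts, so each side can be collapsed to a single partial result before the join:

    SELECT a.cnt * b.cnt AS total
    FROM (SELECT count(*) AS cnt FROM million_row_table) a
    CROSS JOIN
         (SELECT count(*) AS cnt FROM another_million_row_table) b;

This is the effect the aggcombinefn/aggtransmultifn discussion is trying to achieve inside the planner rather than by rewriting the query by hand.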
Re: [HACKERS] patch: function xmltable
In case this still matters, I think GetValue should look more or less like this (untested): /* * Return the value for column number 'colnum' for the current row. If column * -1 is requested, return representation of the whole row. * * This leaks memory, so be sure to reset often the context in which it's * called. */ static Datum XmlTableGetValue(TableExprState *state, int colnum, bool *isnull) { #ifdef USE_LIBXML XmlTableBuilderData *xtCxt; Datum result = (Datum) 0; xmlNodePtr cur; char *cstr = NULL; volatile xmlXPathObjectPtr xpathobj; xtCxt = GetXmlTableBuilderPrivateData(state, "XmlTableGetValue"); Assert(xtCxt->xpathobj && xtCxt->xpathobj->type == XPATH_NODESET && xtCxt->xpathobj->nodesetval != NULL); /* Propagate context related error context to libxml2 */ xmlSetStructuredErrorFunc((void *) xtCxt->xmlerrcxt, xml_errorHandler); cur = xtCxt->xpathobj->nodesetval->nodeTab[xtCxt->row_count - 1]; if (cur->type != XML_ELEMENT_NODE) elog(ERROR, "unexpected xmlNode type"); /* Handle whole row case the easy way. */ if (colnum == -1) { text *txt; txt = xml_xmlnodetoxmltype(cur, xtCxt->xmlerrcxt); result = InputFunctionCall(>in_functions[0], text_to_cstring(txt), state->typioparams[0], -1); *isnull = false; return result; } Assert(xtCxt->xpathscomp[colnum] != NULL); xpathobj = NULL; PG_TRY(); { Form_pg_attribute attr; attr = state->resultSlot->tts_tupleDescriptor->attrs[colnum]; /* Set current node as entry point for XPath evaluation */ xmlXPathSetContextNode(cur, xtCxt->xpathcxt); /* Evaluate column path */ xpathobj = xmlXPathCompiledEval(xtCxt->xpathscomp[colnum], xtCxt->xpathcxt); if (xpathobj == NULL || xtCxt->xmlerrcxt->err_occurred) xml_ereport(xtCxt->xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR, "could not create XPath object"); if (xpathobj->type == XPATH_NODESET) { int count; Oid targettypid = attr->atttypid; if (xpathobj->nodesetval != NULL) count = xpathobj->nodesetval->nodeNr; /* * There are four possible cases, depending on the number of * nodes returned by the XPath expression and the type of the * target column: a) XPath returns no nodes. b) One node is * returned, and column is of type XML. c) One node, column type * other than XML. d) Multiple nodes are returned. */ if (xpathobj->nodesetval == NULL) { *isnull = true; } else if (count == 1 && targettypid == XMLOID) { textstr = xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[0], xtCxt->xmlerrcxt); cstr = text_to_cstring(textstr); } else if (count == 1) { xmlChar*str; str = xmlNodeListGetString(xtCxt->doc, xpathobj->nodesetval->nodeTab[0]->xmlChildrenNode, 1); if (str) { PG_TRY(); { cstr = pstrdup(str); } PG_CATCH(); { xmlFree(str); PG_RE_THROW(); } PG_END_TRY(); xmlFree(str); } else
Re: [HACKERS] patch: function xmltable
Given https://www.postgresql.org/message-id/20170116210019.a3glfwspg5lnf...@alap3.anarazel.de which is going to heavily change how the executor works in this area, I am returning this patch to you again. I would like a few rather minor changes:
1. to_xmlstr can be replaced with calls to xmlCharStrdup.
2. don't need xml_xmlnodetostr either -- just use xml_xmlnodetoxmltype (which returns text*) and extract the cstring from the varlena. It's a bit more wasteful in terms of cycles, but I don't think we care. If we do care, change the function so that it returns cstring, and have the callers that want text wrap it in cstring_to_text.
3. have a new perValueCxt memcxt in TableExprState, child of buildercxt, and switch to it just before GetValue() (reset it just before switching). Then, don't worry about leaks in GetValue. This way, the text* conversions et al don't matter.
After that I think we're going to need to get this working on top of Andres' changes. Which I'm afraid is going to be rather major surgery, but I haven't looked. -- Álvaro Herrera    https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
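To make points 2 and 3 concrete, a rough sketch follows — it assumes the names keep the shapes used elsewhere in this thread and is not a drop-in patch:

    /* 2. Reuse the existing text*-returning conversion and unwrap it. */
    cstr = text_to_cstring(xml_xmlnodetoxmltype(cur, xtCxt->xmlerrcxt));

    /*
     * 3. Reset the per-value context just before fetching each value, so
     * whatever GetValue() leaks is reclaimed on the next call.
     */
    MemoryContextReset(state->perValueCxt);
    oldcontext = MemoryContextSwitchTo(state->perValueCxt);
    values[colnum] = XmlTableGetValue(state, colnum, &isnull);
    MemoryContextSwitchTo(oldcontext);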
Re: [HACKERS] [COMMITTERS] pgsql: Permit dump/reload of not-too-large >1GB tuples
Alvaro Herrera writes: > Permit dump/reload of not-too-large >1GB tuples I noticed that this commit has created an overflow hazard on 64-bit hardware: it's not difficult to attempt to make a tuple exceeding 4GB. You just need a few GB-sized input values. But that's uncool for a couple of reasons: * HeapTupleData.t_len can't represent it (it's only uint32) * heap_form_tuple sets up the tuple to possibly become a composite datum, meaning it applies SET_VARSIZE. That's going to silently overflow for any size above 1GB. In short, as this stands it's a security problem. I'm not sure whether it's appropriate to try to change t_len to type Size to address the first problem. We could try, but I don't know what the fallout would be. Might be better to just add a check disallowing tuple size >= 4GB. As for the second problem, we clearly don't have any prospect of allowing composite datums that exceed 1GB, since they have to have varlena length words. We can allow plain HeapTuples to exceed that, but it'd be necessary to distinguish whether the value being built is going to become a composite datum or not. That requires an API break for heap_form_tuple, or else a separate function to convert a HeapTuple to Datum (and apply a suitable length check). Either way it's a bit of a mess. In any case, a great deal more work is needed before this can possibly be safe. I recommend reverting it pending that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
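As an aside, a minimal sketch of the "disallow tuple size >= 4GB" check mentioned above could look like the following — where it goes and what the length variable is called are assumptions, not the actual fix:

    /*
     * Hypothetical guard, assuming "len" (a Size) is the freshly computed
     * total tuple length, e.g. in heap_form_tuple.  t_len is only a uint32,
     * so reject anything it cannot represent.
     */
    if (len > (Size) PG_UINT32_MAX)
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("cannot form a tuple larger than 4GB")));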
Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
Andres Freund wrote: > On 2017-01-16 16:04:34 -0300, Alvaro Herrera wrote: > > Andres Freund wrote: > > > On 2017-01-16 12:17:46 -0300, Alvaro Herrera wrote: > > > > Hmm. I wonder if your stuff could be used as support code for > > > > XMLTABLE[1]. > > > > > > I don't immediately see what functionality overlaps, could you expand on > > > that? > > > > Well, I haven't read any previous patches in this area, but the xmltable > > patch adds a new way of handling set-returning expressions, so it > > appears vaguely related. > > Ugh. That's not good - I'm about to remove isDone. Like completely. > That's why I'm actually working on all this, because random expressions > returning more rows makes optimizing expression evaluation a lot harder. Argh. > > These aren't properly functions in the current sense of the word, > > though. > > Why aren't they? Looks like it'd be doable to do so, at least below the > parser level? Hmm ... -- Álvaro Herrera    https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] check_srf_call_placement() isn't always setting p_hasTargetSRFs
On 2017-01-16 14:34:58 -0500, Tom Lane wrote: > Will go fix these things. Thanks! Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
On 2017-01-16 16:04:34 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > On 2017-01-16 12:17:46 -0300, Alvaro Herrera wrote: > > > Hmm. I wonder if your stuff could be used as support code for > > > XMLTABLE[1]. > > > > I don't immediately see what functionality overlaps, could you expand on > > that? > > Well, I haven't read any previous patches in this area, but the xmltable > patch adds a new way of handling set-returning expressions, so it > appears vaguely related. Ugh. That's not good - I'm about to remove isDone. Like completely. That's why I'm actually working on all this, because random expressions returning more rows makes optimizing expression evaluation a lot harder. > These aren't properly functions in the current sense of the word, > though. Why aren't they? Looks like it'd be doable to do so, at least below the parser level? Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
Hi, On 2017-01-16 14:13:18 -0500, Tom Lane wrote: > Andres Freundwrites: > > That worked quite well. So we have a few questions, before I clean this > > up: > > > - For now the node is named 'Srf' both internally and in explain - not > > sure if we want to make that something longer/easier to understand for > > others? Proposals? TargetFunctionScan? SetResult? > > "Srf" is ugly as can be, and unintelligible. SetResult might be OK. Named it SetResult - imo looks ok. I think I do prefer the separate node type over re-using Result. The planner integration looks cleaner to me due to not needing the srfpp special cases and such. > > Comments? > > Hard to comment on your other points without a patch to look at. Attached the current version. There's a *lot* of pending cleanup needed (especially in execQual.c) removing outdated code/comments etc, but this seems good enough for a first review. I'd want that cleanup done in a separate patch anyway. Attached are two patches. The first is just a rebased version (just some hunk offset changed) of your planner patch, on top of that is my executor patch. My patch moves some minor detail in yours around, and I do think they should eventually be merged; but leaving it split for a round displays the changes more cleanly. Additional questions: - do we care about SRFs that don't actually return a set? If so we need to change the error checking code in ExecEvalFunc/Oper and move it to the actual invocation. - the FuncExpr/OpExpr check in ExecMakeFunctionResult is fairly ugly imo - but I don't quite see a much better solution. Greetings, Andres >From 2c16e67f46f418239ab90a51611f168508bac66e Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Sun, 15 Jan 2017 19:23:22 -0800 Subject: [PATCH 1/2] Put SRF into a separate node v1. Author: Tom Lane Discussion: https://postgr.es/m/557.1473895...@sss.pgh.pa.us --- src/backend/nodes/outfuncs.c | 1 + src/backend/optimizer/plan/createplan.c | 33 - src/backend/optimizer/plan/planner.c | 219 +-- src/backend/optimizer/util/clauses.c | 104 ++- src/backend/optimizer/util/pathnode.c| 75 +++ src/backend/optimizer/util/tlist.c | 199 src/include/nodes/relation.h | 1 + src/include/optimizer/clauses.h | 1 - src/include/optimizer/pathnode.h | 4 + src/include/optimizer/tlist.h| 3 + src/test/regress/expected/aggregates.out | 3 +- src/test/regress/expected/limit.out | 10 +- src/test/regress/expected/rangefuncs.out | 10 +- src/test/regress/expected/subselect.out | 26 ++-- src/test/regress/expected/tsrf.out | 11 +- 15 files changed, 544 insertions(+), 156 deletions(-) diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index cf0a6059e9..73fdc9706d 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -1805,6 +1805,7 @@ _outProjectionPath(StringInfo str, const ProjectionPath *node) WRITE_NODE_FIELD(subpath); WRITE_BOOL_FIELD(dummypp); + WRITE_BOOL_FIELD(srfpp); } static void diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index c7bcd9b84c..875de739a8 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -1421,8 +1421,21 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path) Plan *subplan; List *tlist; - /* Since we intend to project, we don't need to constrain child tlist */ - subplan = create_plan_recurse(root, best_path->subpath, 0); + /* + * XXX Possibly-temporary hack: if the subpath is a dummy ResultPath, + * don't bother with it, just make a Result with no input. 
This avoids an + * extra Result plan node when doing "SELECT srf()". Depending on what we + * decide about the desired plan structure for SRF-expanding nodes, this + * optimization might have to go away, and in any case it'll probably look + * a good bit different. + */ + if (IsA(best_path->subpath, ResultPath) && + ((ResultPath *) best_path->subpath)->path.pathtarget->exprs == NIL && + ((ResultPath *) best_path->subpath)->quals == NIL) + subplan = NULL; + else + /* Since we intend to project, we don't need to constrain child tlist */ + subplan = create_plan_recurse(root, best_path->subpath, 0); tlist = build_path_tlist(root, _path->path); @@ -1441,8 +1454,9 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path) * creation, but that would add expense to creating Paths we might end up * not using.) */ - if (is_projection_capable_path(best_path->subpath) || - tlist_same_exprs(tlist, subplan->targetlist)) + if (!best_path->srfpp && + (is_projection_capable_path(best_path->subpath) || + tlist_same_exprs(tlist, subplan->targetlist))) { /* Don't need a separate Result, just assign tlist to subplan */
Re: [HACKERS] check_srf_call_placement() isn't always setting p_hasTargetSRFs
I wrote: > Andres Freund writes: >> I wonder if there should be a separate expression type for >> the INSERT ... VALUES(exactly-one-row); since that behaves quite >> differently. > Perhaps. Or maybe we should just use EXPR_KIND_SELECT_TARGET for that? After looking around, I think we'd probably better use a different EXPR_KIND; even if all the functionality is identical, we don't want ParseExprKindName() to say "SELECT" when we're throwing an error for INSERT...VALUES. Also, I noticed that we don't actually allow SRFs in VALUES RTEs: regression=# select * from (values(1,generate_series(11,13)),(2,0)) v; ERROR: set-valued function called in context that cannot accept a set That's because ValuesNext doesn't handle it. I'm not particularly excited about fixing that, given that it's always been that way and no one has complained yet. But check_srf_call_placement() is misinformed, since it thinks the case works. Will go fix these things. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
I just realized that your new xml_xmlnodetostr is line-by-line identical to the existing xml_xmlnodetoxmltype except for two or three lines. I think that's wrong. I'm going to patch the existing function so that they can share code. -- Álvaro Herrera    https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
Andres Freund writes: > That worked quite well. So we have a few questions, before I clean this > up: > - For now the node is named 'Srf' both internally and in explain - not > sure if we want to make that something longer/easier to understand for > others? Proposals? TargetFunctionScan? SetResult? "Srf" is ugly as can be, and unintelligible. SetResult might be OK. > - I continued with the division of Labor that Tom had set up, so we're > creating one Srf node for each "nested" set of SRFs. We'd discussed > nearby to change that for one node/path for all nested SRFs, partially > because of costing. But I don't like the idea that much anymore. The > implementation seems cleaner (and probably faster) this way, and I > don't think nested targetlist SRFs are something worth optimizing > for. Anybody wants to argue differently? Not me. > Comments? Hard to comment on your other points without a patch to look at. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] check_srf_call_placement() isn't always setting p_hasTargetSRFs
Andres Freund writes: > Is there a reason not to just set p_hasTargetSRFs once towards the end > of the function, instead of doing so for all the non-error cases? Yes: it's not supposed to get set when the SRF is in FROM. > I wonder if there should be a separate expression type for > the INSERT ... VALUES(exactly-one-row); since that behaves quite > differently. Perhaps. Or maybe we should just use EXPR_KIND_SELECT_TARGET for that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
Andres Freund wrote: > On 2017-01-16 12:17:46 -0300, Alvaro Herrera wrote: > > Andres Freund wrote: > > > > > That worked quite well. So we have a few questions, before I clean this > > > up: > > > > > > - For now the node is named 'Srf' both internally and in explain - not > > > sure if we want to make that something longer/easier to understand for > > > others? Proposals? TargetFunctionScan? SetResult? > > > > > > - We could alternatively add all this into the Result node - it's not > > > actually a lot of new code, and most of that is boilerplate stuff > > > about adding a new node. I'm ok with both. > > > > Hmm. I wonder if your stuff could be used as support code for > > XMLTABLE[1]. > > I don't immediately see what functionality overlaps, could you expand on > that? Well, I haven't read any previous patches in this area, but the xmltable patch adds a new way of handling set-returning expressions, so it appears vaguely related. These aren't properly functions in the current sense of the word, though. There is some parallel to what ExecMakeFunctionResult does, which I suppose is related. > > Currently it has a bit of additional code of its own, > > though admittedly it's very little code executor-side. Would you mind > > sharing a patch, or more details on how it works? > > Can do both; cleaning up the patch now. What we're talking about here is > a way to implement targetlist SRF that is based on: > > 1) a patch by Tom that creates additional Result (or now Srf) executor >nodes containing SRF evaluation. This guarantees that only Result/Srf >nodes have to deal with targetlist SRF evaluation. > > 2) new code to evaluate SRFs in the new Result/Srf node, that doesn't >rely on ExecEvalExpr et al. to have a IsDone argument. Instead >there's special code to handle that in the new node. That's possible >because it's now guaranted that all SRFs are "toplevel" in the >relevant targetlist(s). > > 3) Removal of all nearly tSRF related code execQual.c and other >executor/ files, including the node->ps.ps_TupFromTlist checks >everywhere. > > makes sense? Hmm, okay. (The ps_TupFromTlist thing has long seemed an ugly construction.) I think the current term for this kind of thing is TableFunction -- are you really naming this "Srf" literally? It seems strange, but maybe it's just me. > > Nested targetlist SRFs make my head spin. I suppose they may have some > > use, but where would you really want this: > > I think there's some cases where it can be useful. Targetlist SRFs as a > whole really are much more about backward compatibility than anything :) Sure. > > ? If supporting the former makes it harder to support/optimize more > > reasonable cases, it seems fair game to leave them behind. > > I don't want to desupport them, just don't want to restructure (one node > doing several levels of SRFs, instead of one per level) just to make it > easier to give good estimates. No objections. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tuple sort is broken. It crashes on simple test.
2017-01-16 19:43 GMT+01:00 Peter Geoghegan: > On Mon, Jan 16, 2017 at 10:38 AM, Pavel Stehule > wrote: > > Should not be enhanced regress tests too? > > We already have coverage of multi-pass external tuplesorts, as of a > few months back. That didn't catch this bug only because it was a > pass-by-value datum tuplesort. The relevant RELEASE_SLAB_SLOT() call > does have line coverage already. > > I wouldn't object to adding a test case that would have exercised this > bug, too. It took me a while to talk Tom into the test that was added > several months back, which discouraged me from adding another test > case here. (There were concerns about the overhead of an external sort > test on slower buildfarm animals.) > ok > > -- > Peter Geoghegan >
Re: [HACKERS] Tuple sort is broken. It crashes on simple test.
Peter Geoghegan writes: > The problem was that one particular call to the macro > RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed > by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls > already have such a check. > Attached patch fixes the bug. Pushed, thanks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
On 2017-01-16 12:17:46 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > > That worked quite well. So we have a few questions, before I clean this > > up: > > > > - For now the node is named 'Srf' both internally and in explain - not > > sure if we want to make that something longer/easier to understand for > > others? Proposals? TargetFunctionScan? SetResult? > > > > - We could alternatively add all this into the Result node - it's not > > actually a lot of new code, and most of that is boilerplate stuff > > about adding a new node. I'm ok with both. > > Hmm. I wonder if your stuff could be used as support code for > XMLTABLE[1]. I don't immediately see what functionality overlaps, could you expand on that? > Currently it has a bit of additional code of its own, > though admittedly it's very little code executor-side. Would you mind > sharing a patch, or more details on how it works? Can do both; cleaning up the patch now. What we're talking about here is a way to implement targetlist SRFs that is based on: 1) a patch by Tom that creates additional Result (or now Srf) executor nodes containing SRF evaluation. This guarantees that only Result/Srf nodes have to deal with targetlist SRF evaluation. 2) new code to evaluate SRFs in the new Result/Srf node, that doesn't rely on ExecEvalExpr et al. to have an IsDone argument. Instead there's special code to handle that in the new node. That's possible because it's now guaranteed that all SRFs are "toplevel" in the relevant targetlist(s). 3) Removal of nearly all tSRF-related code in execQual.c and other executor/ files, including the node->ps.ps_TupFromTlist checks everywhere. Makes sense? > > - I continued with the division of Labor that Tom had set up, so we're > > creating one Srf node for each "nested" set of SRFs. We'd discussed > > nearby to change that for one node/path for all nested SRFs, partially > > because of costing. But I don't like the idea that much anymore. The > > implementation seems cleaner (and probably faster) this way, and I > > don't think nested targetlist SRFs are something worth optimizing > > for. Anybody wants to argue differently? > > Nested targetlist SRFs make my head spin. I suppose they may have some > use, but where would you really want this: I think there are some cases where it can be useful. Targetlist SRFs as a whole really are much more about backward compatibility than anything :) > > ? If supporting the former makes it harder to support/optimize more > > reasonable cases, it seems fair game to leave them behind. I don't want to desupport them, just don't want to restructure (one node doing several levels of SRFs, instead of one per level) just to make it easier to give good estimates. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tuple sort is broken. It crashes on simple test.
On Mon, Jan 16, 2017 at 10:38 AM, Pavel Stehule wrote: > Should not be enhanced regress tests too? We already have coverage of multi-pass external tuplesorts, as of a few months back. That didn't catch this bug only because it was a pass-by-value datum tuplesort. The relevant RELEASE_SLAB_SLOT() call does have line coverage already. I wouldn't object to adding a test case that would have exercised this bug, too. It took me a while to talk Tom into the test that was added several months back, which discouraged me from adding another test case here. (There were concerns about the overhead of an external sort test on slower buildfarm animals.) -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to implement pg_current_logfile() function
Karl O. Pinc wrote: > I do have a question here regarding code formatting. > The patch now contains: > > if (log_filepath == NULL) > { > /* Bad data. Avoid segfaults etc. and return NULL to caller. */ > break; > } > > I'm not sure how this fits in with PG coding style, > whether the {} should be removed or what. I've looked > around and can't find an example of an if with a single > line then block and a comment. Maybe this means that > I shouldn't be doing this, but if I put the comment above > the if then it will "run into" the comment block which > immediately precedes the above code which describes > a larger body of code. So perhaps someone should look > at this and tell me how to improve it. I think this style is good. Following the style guide to the letter would lead to removing the braces and keeping the comment where it is; pgindent will correctly keep it at its current indentation. We do use that style in a couple of places. It looks a bit clunky. In most places we do keep those braces, for readability and future-proofing in case somebody inadvertently introduces another statement to the "block". We don't automatically remove braces anyway. (We used to do that, but stopped a few years ago shortly after introducing PG_TRY). Putting the comment outside (above) the "if" would be wrong, too; you'd have to rephrase the comment in a conditional tense. -- Álvaro Herrera    https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
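For illustration, the two styles being contrasted would look like this (a sketch only, not part of the patch):

    /* Braced form, as kept in the patch: */
    if (log_filepath == NULL)
    {
        /* Bad data. Avoid segfaults etc. and return NULL to caller. */
        break;
    }

    /* To-the-letter alternative: braces dropped, comment moved above the
     * "if" and rephrased in a conditional tense. */

    /* If the data is bad, avoid segfaults etc. and return NULL to caller. */
    if (log_filepath == NULL)
        break;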
Re: [HACKERS] Tuple sort is broken. It crashes on simple test.
2017-01-16 19:24 GMT+01:00 Peter Geoghegan: > On Mon, Jan 16, 2017 at 3:48 AM, Michael Paquier > wrote: > > Indeed. It crashes for me immediately by adding an ORDER BY: > > select count(distinct t) from seq_tab order by 1; > > The problem was that one particular call to the macro > RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed > by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls > already have such a check. > > Attached patch fixes the bug. > > Should not be enhanced regress tests too? Regards Pavel > -- > Peter Geoghegan > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >
Re: [HACKERS] Tuple sort is broken. It crashes on simple test.
On Mon, Jan 16, 2017 at 3:48 AM, Michael Paquierwrote: > Indeed. It crashes for me immediately by adding an ORDER BY: > select count(distinct t) from seq_tab order by 1; The problem was that one particular call to the macro RELEASE_SLAB_SLOT() happened to lack a test-for-NULL-argument needed by pass-by-value datum cases. The other two RELEASE_SLAB_SLOT() calls already have such a check. Attached patch fixes the bug. -- Peter Geoghegan From ce24bff1aad894b607ee1ce67757efe72c5acb93 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Mon, 16 Jan 2017 10:14:02 -0800 Subject: [PATCH] Fix NULL pointer dereference in tuplesort.c This could cause a crash when an external datum tuplesort of a pass-by-value type required multiple passes. --- src/backend/utils/sort/tuplesort.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c index cbaf009..e1e692d 100644 --- a/src/backend/utils/sort/tuplesort.c +++ b/src/backend/utils/sort/tuplesort.c @@ -2800,7 +2800,8 @@ mergeonerun(Tuplesortstate *state) WRITETUP(state, destTape, >memtuples[0]); /* recycle the slot of the tuple we just wrote out, for the next read */ - RELEASE_SLAB_SLOT(state, state->memtuples[0].tuple); + if (state->memtuples[0].tuple) + RELEASE_SLAB_SLOT(state, state->memtuples[0].tuple); /* * pull next tuple from the tape, and replace the written-out tuple in -- 2.7.4 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackups and slots
On Mon, Jan 16, 2017 at 4:33 PM, Fujii Masao wrote: > On Mon, Jan 16, 2017 at 10:00 PM, Magnus Hagander > wrote: > > > > > > On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut > > wrote: > >> > >> This patch looks reasonable to me. > >> > >> Attached is a top-up patch with a few small fixups. > >> > >> I suggest to wait for the resolution of the "Replication/backup > >> defaults" thread. I would not want to be in a situation where users who > >> have not been trained to use replication slots now have yet another > >> restart-only parameter to set before they can take a backup. > >> > > > > Thanks for the review and the top-up patch. I've applied it and pushed. > > - if (replication_slot && includewal != STREAM_WAL) > + if ((replication_slot || no_slot) && includewal != STREAM_WAL) > > Why do we need to check "no_slot" here? Since no_slot=true means that > no temporary replication slot is specified, it's fine even with both -X > none > and fetch. > Um yeah, that looks like a bad merge actually. Will fix. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Patch to implement pg_current_logfile() function
On January 15, 2017 11:47:51 PM CST, Michael Paquierwrote: >On Mon, Jan 16, 2017 at 1:14 PM, Karl O. Pinc wrote: > > >> Attached also are 2 patches which abstract some hardcoded >> constants into symbols. Whether to apply them is a matter >> of style and left to the maintainers, which is why these >> are separate patches. > >Making the strings csvlog, stderr and eventlog variables? Why not >because the patch presented here uses them in new places. Now it is >not like it is a huge maintenance burden to keep them as strings, so I >would personally not bother much. To my mind it's a matter readability. It is useful to be able to search for or identify quickly when reading, e. g., all the places where the keyword stderr relates to output log destination and not some other more common use. The downside is it is more code... >OK. I have done a round of hands-on review on this patch, finishing >with the attached. In the things I have done: Thank you. >With all those issues fixed, I finish with the attached, that I am >fine to pass down to a committer. I still think that this should be >only one function using a SRF with rows shaped as (type, path) for >simplicity, but as I am visibly outnumbered I won't insist further. That also makes a certain amount of sense to me but I can't say I have thought deeply about it. Haven't paid any attention to this issue and am fine letting things move forward as is. >Also, I would rather see an ERROR returned to the user in case of bad >data in current_logfiles, I did not change that either as that's the >original intention of Gilles. I could be wrong but I seem to recall that Robert recommended against an error message. If there is bad data then something is really wrong, up to some sort of an attack on the back end. Because this sort of thing simply shouldn't happen it's enough in my opinion to guard against buffer overruns etc and just get on with life. If something goes unexpectedly and badly wrong with internal data structures in general there would have to be all kinds of additional code to cover every possible problem and that doesn't seem to make sense. Sorry about the previous email with empty content. My email client got away from me. Karl Free Software: "You don't pay back you pay forward." -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvements in psql hooks for variables
"Daniel Verite"writes: > Tom Lane wrote: >> Also, why aren't you using ParseVariableBool's existing ability to report >> errors? > Its' because there are two cases: > - either only a boolean can be given to the command or variable, > in which we let ParseVariableBool() tell: >unrecognized value "bogus" for "command": boolean expected > - or other parameters besides boolean are acceptable, in which case we > can't say "boolean expected", because in fact a boolean is no more or > less expected than the other valid values. Ah. Maybe it's time for a ParseVariableEnum, or some other way of dealing with those cases in a more unified fashion. > Do we care about the "boolean expected" part of the error message? I'm not especially in love with that particular wording, but I'm doubtful that we should give up all indication of what valid values are, especially in the cases where there are more than just bool-equivalent values. I think the right thing to do here is to fix it so that the input routine has full information about all the valid values. On the backend side, we've gone to considerable lengths to make sure that error messages are helpful for such cases, eg: regression=# set backslash_quote to foo; ERROR: invalid value for parameter "backslash_quote": "foo" HINT: Available values: safe_encoding, on, off. and I think it may be worth working equally hard here. > I get the idea, except that in this example, the compiler is > unhappy because popt->topt.expanded is not a bool, and an > explicit cast here would be kludgy. Yeah, you'll need an intermediate variable if you're trying to use ParseVariableBool for such a case. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvements in psql hooks for variables
Tom Lane wrote: > "Daniel Verite"writes: > > [ psql-var-hooks-v6.patch ] > > I took a quick look through this. It seems to be going in generally > the right direction, but here's a couple of thoughts: Thanks for looking into this! > I'm inclined to think that the best choice is for the function result > to be the ok/not ok flag, and pass the variable-to-be-modified as an > output parameter. That fits better with the notion that the variable > is not to be modified on failure. Agreed, if never clobbering the variable, having the valid/invalid state returned by ParseVariableBool() allows for simpler code. I'm changing it that way. > Also, why aren't you using ParseVariableBool's existing ability to report > errors? Its' because there are two cases: - either only a boolean can be given to the command or variable, in which we let ParseVariableBool() tell: unrecognized value "bogus" for "command": boolean expected - or other parameters besides boolean are acceptable, in which case we can't say "boolean expected", because in fact a boolean is no more or less expected than the other valid values. We could shave code by reducing ParseVariableBool()'s error message to: unrecognized value "bogus" for "name" clearing the distinction between [only booleans are expected] and [booleans or enum are expected]. Then almost all callers that have their own message could rely on ParseVariableBool() instead, as they did previously. Do we care about the "boolean expected" part of the error message? > else if (value) > ! { > ! boolvalid; > ! boolnewval = ParseVariableBool(value, NULL, ); > ! if (valid) > ! popt->topt.expanded = newval; > ! else > ! { > ! psql_error("unrecognized value \"%s\" for \"%s\"\n", > !value, param); > ! return false; > ! } > ! } > is really the hard way and you could have just done > > - popt->topt.expanded = ParseVariableBool(value, param); > + success = ParseVariableBool(value, param, > >topt.expanded); I get the idea, except that in this example, the compiler is unhappy because popt->topt.expanded is not a bool, and an explicit cast here would be kludgy. For the error printing part, it would go away with the above suggestion Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
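For readers following along, here is a rough sketch of the calling convention being converged on — the exact signature is an assumption, not the committed code:

    /* result reports validity; the parsed boolean goes to an output parameter */
    bool ParseVariableBool(const char *value, const char *name, bool *result);

    /*
     * Caller side, e.g. for expanded output.  An intermediate variable is
     * needed because popt->topt.expanded is not a bool, and the target is
     * assigned only on success, so it stays unmodified on bad input.
     */
    bool    newval;

    if (ParseVariableBool(value, param, &newval))
        popt->topt.expanded = newval;
    else
        success = false;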
Re: [HACKERS] Patch to implement pg_current_logfile() function
On January 15, 2017 11:47:51 PM CST, Michael Paquierwrote: >On Mon, Jan 16, 2017 at 1:14 PM, Karl O. Pinc wrote: >> I do have a question here regards code formatting. >> The patch now contains: >> >> if (log_filepath == NULL) >> { >> /* Bad data. Avoid segfaults etc. and return NULL to caller. >*/ >> break; >> } >> >> I'm not sure how this fits in with PG coding style, >> whether the {} should be removed or what. I've looked >> around and can't find an example of an if with a single >> line then block and a comment. Maybe this means that >> I shouldn't be doing this, but if I put the comment above >> the if then it will "run into" the comment block which >> immediately precedes the above code which describes >> a larger body of code. So perhaps someone should look >> at this and tell me how to improve it. > >Including brackets in this case makes a more readable code. That's the >pattern respected the most in PG code. See for example >XLogInsertRecord(): >else >{ >/* > * This was an xlog-switch record, but the current insert location was > * already exactly at the beginning of a segment, so there was no need > * to do anything. > */ >} > >> Attached also are 2 patches which abstract some hardcoded >> constants into symbols. Whether to apply them is a matter >> of style and left to the maintainers, which is why these >> are separate patches. > >Making the strings csvlog, stderr and eventlog variables? Why not >because the patch presented here uses them in new places. Now it is >not like it is a huge maintenance burden to keep them as strings, so I >would personally not bother much. > >> The first patch changes only code on the master >> branch, the 2nd patch changes the new code. > >Thanks for keeping things separated. > >> I have not looked further at the patch or checked >> to see that all changes previously recommended have been >> made. Michael, if you're confident that everything is good >> go ahead and move the patchs forward to the maintainers. Otherwise >> please let me know and I'll see about finding time >> for further review. > >OK. I have done a round of hands-on review on this patch, finishing >with the attached. In the things I have done: >- Elimination of useless noise diff >- Fixes some typos (logile, log file log, etc.) >- Adjusted docs. >- Docs were overdoing in storage.sgml. Let's keep the description >simple. >- Having a paragraph at the beginning of "Error Reporting and Logging" >in config.sgml does not look like a good idea to me. As the updates of >current_logfiles is associated with log_destination I'd rather see >this part added in the description of the GUC. >- Missed a (void) in the definition of update_metainfo_datafile(). >- Moved update_metainfo_datafile() before the signal handler routines >in syslogger.c for clarity. >- The last "return" is useless in update_metainfo_datafile(). >- In pg_current_logfile(), it is useless to call PG_RETURN_NULL after >emitting an ERROR message. >- Two calls to FreeFile(fd) have been forgotten in >pg_current_logfile(). In one case, errno needs to be saved beforehand >to be sure that the correct error string is generated for the user. >- A portion of 010_pg_basebackup.pl was not updated. >- Incorrect header ordering in basebackup.c. >- Decoding of CR and LF characters seem to work fine when >log_file_name include some. > >With all those issues fixed, I finish with the attached, that I am >fine to pass down to a committer. 
I still think that this should be >only one function using a SRF with rows shaped as (type, path) for >simplicity, but as I am visibly outnumbered I won't insist further. >Also, I would rather see an ERROR returned to the user in case of bad >data in current_logfiles, I did not change that either as that's the >original intention of Gilles. >-- >Michael Karl Free Software: "You don't pay back you pay forward." -- Robert A. Heinlein
Re: [HACKERS] Hash support for grouping sets
The ability to use hashed aggregation within sorted groups, when the order of the input stream allows it, is potentially a useful way to improve aggregation performance more generally. This would potentially be beneficial when the input size is expected to be larger than the amount of working memory available for hashed aggregation, but where there is enough memory to hash-aggregate just the unsorted grouping key combinations, and when the cumulative cost of rebuilding the hash table for each sorted subgroup is less than the cost of sorting the entire input. In other words, if most of the grouping key combinations are already segregated by virtue of the input order, then hashing the remaining combinations within each sorted group might be done in memory, at the cost of rebuilding the hash table for each sorted subgroup. I haven’t looked at the code for this change yet (I hope I will have the time to do that). Ideally the decision to choose the aggregation method as sorted, hashed, or mixed hash/sort should be integrated into the cost model, but given the notorious difficulty of estimating intermediate cardinalities accurately it would be difficult to develop a cardinality model and a cost model accurate enough to choose among these options consistently well. Jim Finnerty Amazon Corp. On 1/10/17, 10:22 AM, "pgsql-hackers-ow...@postgresql.org on behalf of Robert Haas" wrote: On Thu, Jan 5, 2017 at 10:48 PM, Andrew Gierth wrote: > Herewith a patch for doing grouping sets via hashing or mixed hashing > and sorting. Cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
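A hypothetical illustration of the case described above (the table and column names are invented): if the rows arrive already ordered by "day", a mixed strategy could hash-aggregate only the region values seen within one day, flushing and rebuilding the hash table at each day boundary, rather than sorting the entire input on (day, region):

    SELECT day, region, sum(amount)
    FROM sales            -- assume the input is delivered in "day" order
    GROUP BY day, region;

Whether that beats a full sort or a plain hash aggregate is exactly the cost-model question raised above.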
Re: [HACKERS] parallelize queries containing subplans
On Thu, Jan 12, 2017 at 7:56 PM, Amit Kapila wrote: > On Thu, Jan 12, 2017 at 8:51 AM, Robert Haas wrote: >> On Wed, Jan 11, 2017 at 9:58 PM, Amit Kapila wrote: >>> The other alternative is to remember this information in SubPlan. We >>> can retrieve parallel_safe information from best_path and can use it >>> while generating SubPlan. The main reason for storing it in the plan >>> was to avoid explicitly passing parallel_safe information while >>> generating SubPlan as plan was already available at that time. >>> However, it seems there are only two places in code (refer >>> build_subplan) where this information needs to be propagated. Let me >>> know if you prefer to remember the parallel_safe information in >>> SubPlan instead of in Plan or if you have something else in mind? >> >> I think we should try doing it in the SubPlan, at least, and see if >> that comes out more elegant than what you have at the moment. >> > > Okay, done that way. I have fixed the review comments raised by Dilip > as well and added the test case in attached patch. > After commit-ab1f0c8, this patch needs a rebase. Attached is a rebased version of the patch. Thanks to Kuntal for informing me offlist that this patch needs a rebase. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_subplan_v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackups and slots
On Mon, Jan 16, 2017 at 10:00 PM, Magnus Hagander wrote:
>
> On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut wrote:
>>
>> This patch looks reasonable to me.
>>
>> Attached is a top-up patch with a few small fixups.
>>
>> I suggest to wait for the resolution of the "Replication/backup
>> defaults" thread. I would not want to be in a situation where users who
>> have not been trained to use replication slots now have yet another
>> restart-only parameter to set before they can take a backup.
>>
> Thanks for the review and the top-up patch. I've applied it and pushed.

- if (replication_slot && includewal != STREAM_WAL)
+ if ((replication_slot || no_slot) && includewal != STREAM_WAL)

Why do we need to check "no_slot" here? Since no_slot=true means that no temporary replication slot is specified, it's fine even with both -X none and fetch.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
Andres Freund wrote:

> That worked quite well. So we have a few questions, before I clean this
> up:
>
> - For now the node is named 'Srf' both internally and in explain - not
>   sure if we want to make that something longer/easier to understand for
>   others? Proposals? TargetFunctionScan? SetResult?
>
> - We could alternatively add all this into the Result node - it's not
>   actually a lot of new code, and most of that is boilerplate stuff
>   about adding a new node.

I'm ok with both.

Hmm. I wonder if your stuff could be used as support code for XMLTABLE[1]. Currently it has a bit of additional code of its own, though admittedly it's very little code executor-side. Would you mind sharing a patch, or more details on how it works?

[1] https://www.postgresql.org/message-id/cafj8pra_keukobxvs4v-imoeesxu0pd0ashv0-mwrfdrwte...@mail.gmail.com

> - I continued with the division of Labor that Tom had set up, so we're
>   creating one Srf node for each "nested" set of SRFs. We'd discussed
>   nearby to change that for one node/path for all nested SRFs, partially
>   because of costing. But I don't like the idea that much anymore. The
>   implementation seems cleaner (and probably faster) this way, and I
>   don't think nested targetlist SRFs are something worth optimizing
>   for. Anybody wants to argue differently?

Nested targetlist SRFs make my head spin. I suppose they may have some use, but where would you really want this:

alvherre=# select generate_series(1, generate_series(2, 4));
 generate_series
─────────────────
               1
               2
               1
               2
               3
               1
               2
               3
               4
(9 filas)

instead of the much more sensible

alvherre=# select i, j from generate_series(2, 4) i, generate_series(1, i) j;
 i │ j
───┼───
 2 │ 1
 2 │ 2
 3 │ 1
 3 │ 2
 3 │ 3
 4 │ 1
 4 │ 2
 4 │ 3
 4 │ 4
(9 filas)

? If supporting the former makes it harder to support/optimize more reasonable cases, it seems fair game to leave them behind.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackups and slots
On Wed, Jan 4, 2017 at 3:32 PM, Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > This patch looks reasonable to me. > > Attached is a top-up patch with a few small fixups. > > I suggest to wait for the resolution of the "Replication/backup > defaults" thread. I would not want to be in a situation where users who > have not been trained to use replication slots now have yet another > restart-only parameter to set before they can take a backup. > > Thanks for the review and the top-up patch. I've applied it and pushed. I just realized I missed crediting you with the review and the top-up in the commit message. My apologies for this. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Support for pg_receivexlog --format=plain|tar
On Mon, Jan 16, 2017 at 9:12 PM, Magnus Hagander wrote:
> Where did your research point to then? :) I just read the gzip rfc
> (http://www.zlib.org/rfc-gzip.html) which seems to call it that at least?

Well, OK. I was not aware of this RFC. I guessed it by looking at the code of gzip, which uses the CRC as well. I also found some reference to it in a blog post.

>> > Finally, I think we should make the error message clearly say
>> > "compressed segment file" - just to make things extra clear.
>>
>> Sure.
>
> AFAICT the
> + iscompress ? "compressed" : "",
> part of the error handling is unnecessary, because iscompressed will always
> be true in that block. All the other error messages in that codepath have
> compressed hardcoded in them, as should this one.

Fat-fingered here..

>> Hm. It looks that you are right. zlib goes down to _tr_flush_bits() to
>> flush some output, but this finishes only with put_byte(). As the fd
>> is opaque in gzFile, it would be just better to open() the file first,
>> and then use gzdopen to get the gzFile. Let's use as well the existing
>> fd field to save it. gzclose() closes as well the parent fd per the
>> documentation of zlib.
>
> This version throws a warning:
> walmethods.c: In function ‘dir_open_for_write’:
> walmethods.c:170:11: warning: ‘gzfp’ may be used uninitialized in this
> function [-Wmaybe-uninitialized]
>     f->gzfp = gzfp;

gcc and clang did not complain here, what did you use?

> I can't see that there is any code path where this can actually happen
> though, so we should probably just initialize it to NULL at variable
> declaration. Or do you see a path where this could actually be incorrect?

Not that I see. All the code paths using gzfp are under data_dir->compression > 0.

> If you agree with those two comments, I will go ahead and push with those
> minor fixes.

No problem for me, thanks for the review!

--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
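For readers following the zlib details here, the open()-then-gzdopen() arrangement being discussed might look roughly like the sketch below. This is only an illustration of the idea, not the patch's walmethods.c code: the function names are invented and error handling is minimal. The point it demonstrates is that gzflush(Z_SYNC_FLUSH) only pushes data into the kernel, so an explicit fsync() on the retained file descriptor is still needed for durability, and gzclose() later closes both the gzFile and that descriptor.

#include <fcntl.h>
#include <unistd.h>
#include <zlib.h>

/* Open a compressed file while keeping the raw fd around so the caller can
 * fsync() it; gzclose() later closes both the gzFile and the fd. */
static int
open_compressed(const char *path, gzFile *gzfp_out)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);

    if (fd < 0)
        return -1;
    *gzfp_out = gzdopen(fd, "wb");
    if (*gzfp_out == NULL)
    {
        close(fd);
        return -1;
    }
    return fd;                  /* keep this for fsync() */
}

/* A "synchronous" flush: gzflush() pushes compressed data into the kernel,
 * but only fsync() on the underlying fd makes it durable. */
static int
sync_compressed(gzFile gzfp, int fd)
{
    if (gzflush(gzfp, Z_SYNC_FLUSH) != Z_OK)
        return -1;
    return fsync(fd);
}

int
main(void)
{
    gzFile gz;
    int fd = open_compressed("/tmp/demo.gz", &gz);

    if (fd < 0)
        return 1;
    gzputs(gz, "hello\n");
    sync_compressed(gz, fd);
    gzclose(gz);                /* also closes fd, per the zlib documentation */
    return 0;
}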
Re: [HACKERS] Support for pg_receivexlog --format=plain|tar
On Sun, Jan 15, 2017 at 6:44 AM, Michael Paquierwrote: > On Sun, Jan 15, 2017 at 1:31 AM, Magnus Hagander > wrote: > > > > > > On Tue, Jan 10, 2017 at 3:01 AM, Michael Paquier < > michael.paqu...@gmail.com> > > wrote: > >> Based on some analysis, it is enough to look at the last 4 bytes of > >> the compressed file to get the size output data with a single call to > >> lseek() and then read(). So as there is a simple way to do things and > >> that's far cheaper than decompressing perhaps hundred of segments I'd > >> rather do it this way. Attached is the implementation. This code is > >> using 2 booleans for 4 states of the file names: full non-compressed, > >> partial non-compressed, full compressed and partial compressed. This > >> keeps the last check of FindStreamingStart() more simple, but that's > >> quite fancy lately to have an enum for such things :D > > > > > > I think you need to document that analysis at least in a code comment. I > > assume this is in reference to the ISIZE member of the gzip format? > > Good question. I am not sure on this one. > Where did your research point to then? :) I just read the gzip rfc ( http://www.zlib.org/rfc-gzip.html) which seems to call it that at least? > > > I was going to say what happens if the file is corrupt and we get random > > junk data there, but as we only compare it to XlogSegSize that should be > > safe. But we might want to put a note in about that too? > > Perhaps. I have made a better try in FindStreamingStart. Looks much better now. I think that one is fine as it is now. > > > Finally, I think we should make the error message clearly say "compressed > > segment file" - just to make things extra clear. > > Sure. > AFAICT the + iscompress ? "compressed" : "", part of the error handling is unnecessary, because iscompressed will always be true in that block. All the other error messages in that codepath has compressed hardcoded in them, as should this one. > > >> > I found another problem with it -- it is completely broken in sync > mode. > >> > You > >> > need to either forbid sync mode together with compression, or teach > >> > dir_sync() about it. The later would sound better, but I wonder how > much > >> > that's going to kill compression due to the small blocks? Is it a > >> > reasonable > >> > use-case? > >> > >> Hm. Looking at the docs I think that --compress defined with > >> --synchronous maps to the use of Z_SYNC_FLUSH with gzflush(). FWIW I > >> don't have a direct use case for it, but it is not a major effort to > >> add it, so done. There is no actual reason to forbid this combinations > >> of options either. > >> > > It is enough to just gzflush(), don't we also need to still fsync()? I > can't > > see any documentation that gzflush() does that, and a quick look at the > code > > of zlib indicates it doesn't (but I didn't check in detail, so if you did > > please point me to it). > > Hm. It looks that you are right. zlib goes down to _tr_flush_bits() to > flush some output, but this finishes only with put_byte(). As the fd > is opaque in gzFile, it would be just better to open() the file first, > and then use gzdopen to get the gzFile. Let's use as well the existing > fd field to save it. gzclose() closes as well the parent fd per the > documentation of zlib. 
>
>

This version throws a warning:

walmethods.c: In function ‘dir_open_for_write’:
walmethods.c:170:11: warning: ‘gzfp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    f->gzfp = gzfp;

I can't see that there is any code path where this can actually happen though, so we should probably just initialize it to NULL at variable declaration. Or do you see a path where this could actually be incorrect?

If you agree with those two comments, I will go ahead and push with those minor fixes.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
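As background for the "last 4 bytes" approach discussed earlier in this thread: that trailer field is the gzip ISIZE member (RFC 1952), the size of the uncompressed input modulo 2^32, stored little-endian. A minimal sketch of reading it with a single lseek() and read() - illustrative only, not the patch's FindStreamingStart() code, and with no handling of files shorter than a gzip header plus trailer - could look like this:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Return the uncompressed size (mod 2^32) recorded in a gzip file's trailer
 * (the ISIZE field), or 0 on error.  For a compressed WAL segment this is
 * the value one would compare against the segment size. */
static uint32_t
gzip_uncompressed_size(const char *path)
{
    unsigned char trailer[4];
    uint32_t      isize = 0;
    int           fd = open(path, O_RDONLY);

    if (fd < 0)
        return 0;
    if (lseek(fd, -4, SEEK_END) >= 0 && read(fd, trailer, 4) == 4)
        isize = (uint32_t) trailer[0] |
                ((uint32_t) trailer[1] << 8) |
                ((uint32_t) trailer[2] << 16) |
                ((uint32_t) trailer[3] << 24);
    close(fd);
    return isize;
}

int
main(int argc, char **argv)
{
    if (argc < 2)
        return 1;
    printf("uncompressed size: %u bytes\n",
           (unsigned) gzip_uncompressed_size(argv[1]));
    return 0;
}

Because the value is only compared against a known segment size, random junk in a corrupt file simply fails the comparison, which matches the reasoning in the review above.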
Re: [HACKERS] Parallel Index Scans
On Fri, Jan 13, 2017 at 11:06 PM, Robert Haaswrote: > On Fri, Jan 13, 2017 at 9:28 AM, Anastasia Lubennikova > wrote: >> I didn't find any case of noticeable performance degradation, >> so set status to "Ready for committer". > > The very first hunk of doc changes looks like it makes the whitespace > totally wrong - surely it can't be right to have 0-space indents in C > code. > Fixed. > +The index_size parameter indicate the size of generic > parallel > > indicate -> indicates > size of generic -> size of the generic > Fixed. > + index-type-specific parallel information which will be stored immediatedly > > Typo. > Fixed. > + Initialize the parallel index scan state. It will be used to initialize > + index-type-specific parallel information which will be stored immediatedly > + after generic parallel information required for parallel index scans. The > + required state information will be set in target. > + > + > + > + The aminitparallelscan and > amestimateparallelscan > + functions need only be provided if the access method supports > parallel > + index scans. If it doesn't, the aminitparallelscan and > + amestimateparallelscan fields in its > IndexAmRoutine > + struct must be set to NULL. > + > > Inconsistent indentation. Fixed. > seems like a strange choice of tag. > I have seen that is used in indexam.sgml at multiple places to refer to "bitmap" and "plain" index scans. So thought of using same for "parallel" index scans. > +/* amestimateparallelscan is optional; assume no-op if not > provided by AM */ > > The fact that amestimateparallelscan is optional even when parallel > index scans are supported is undocumented. > Okay, I have added that information in docs. > Similarly for the other > functions, which also seem to be optional but not documented as such. > The code and documentation need to match. > All the functions introduced by this patch are documented in indexam.sgml as optional. Not sure, which other place you are expecting an update. > +void *amtarget = (char *) ((void *) target) + offset; > > Passing an unaligned pointer to the AM sounds like a recipe for > crashes on obscure platforms that can't tolerate alignment violations, > and possibly bad performance on others. I'd arrange MAXALIGN the size > of the generic structure in index_parallelscan_estimate and > index_parallelscan_initialize. Right, changed as per suggestion. > Also, why pass the size of the generic > structure to the AM-specific estimate routine, anyway? It can't > legally return a smaller value, and we can add_size() just as well as > the AM-specific code. Wouldn't it make more sense for the AM-specific > code to return the amount of space that is needed for AM-specific > stuff, and let the generic code deal with the generic stuff? > makes sense, so changed accordingly. > + *status - True indicates that the block number returned is valid and > scan > + * is continued or block number is invalid and scan has just > begun > + * or block number is P_NONE and scan is finished. False > indicates > + * that we have reached the end of scan for current > scankeys and for > + * that we return block number as P_NONE. > > It is hard to parse a sentence with that many "and" and "or" clauses, > especially since English, unlike C, does not have strict operator > precedence rules. Perhaps you could reword to make it more clear. > Okay, I have changed the comment. > Also, does that survive pgindent with that indentation? > Yes. 
> +BTParallelScanDesc btscan = (BTParallelScanDesc) OffsetToPointer(
> +                (void *) scan->parallel_scan,
> +                scan->parallel_scan->ps_offset);
>
> You could avoid these uncomfortable line breaks by declaring the
> variable on one line and then initializing it on a separate line.
>

Okay, changed.

> +    SpinLockAcquire(&btscan->ps_mutex);
> +    pageStatus = btscan->ps_pageStatus;
> +    if (so->arrayKeyCount < btscan->ps_arrayKeyCount)
> +        *status = false;
> +    else if (pageStatus == BTPARALLEL_DONE)
> +        *status = false;
> +    else if (pageStatus != BTPARALLEL_ADVANCING)
> +    {
> +        btscan->ps_pageStatus = BTPARALLEL_ADVANCING;
> +        nextPage = btscan->ps_nextPage;
> +        exit_loop = true;
> +    }
> +    SpinLockRelease(&btscan->ps_mutex);
>
> IMHO, this needs comments.
>

Sure, added a comment.

> + * It updates the value of next page that allows parallel scan to move forward
> + * or backward depending on scan direction. It wakes up one of the sleeping
> + * workers.
>
> This construction is commonly used in India but sounds awkward to
> other English-speakers, or at least to me. You can either drop the
> word "it" and just start with the verb
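On the MAXALIGN point raised in the review above, a standalone sketch of why the generic parallel-scan state is rounded up before the AM-specific area is placed after it may help. It uses a simplified stand-in MAXALIGN macro and made-up struct names, so it is purely illustrative rather than the patch's code:

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for PostgreSQL's MAXALIGN: round up to 8 bytes. */
#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(LEN) \
    (((size_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & ~((size_t) (MAXIMUM_ALIGNOF - 1)))

/* Made-up generic and AM-specific parallel scan state, for illustration. */
typedef struct { int ps_relid; char ps_snapshot_data[13]; } GenericParallelScan;
typedef struct { long ps_nextPage; int ps_pageStatus; } AMParallelScan;

int
main(void)
{
    /* MAXALIGN the generic part so the AM never sees a misaligned pointer,
     * then allocate both areas in a single chunk. */
    size_t offset = MAXALIGN(sizeof(GenericParallelScan));
    size_t total  = offset + sizeof(AMParallelScan);
    char  *target = malloc(total);

    GenericParallelScan *generic = (GenericParallelScan *) target;
    AMParallelScan      *amarea  = (AMParallelScan *) (target + offset);

    printf("generic part is %zu bytes; AM area starts at offset %zu\n",
           sizeof(GenericParallelScan), offset);
    generic->ps_relid = 0;
    amarea->ps_nextPage = -1;   /* safe: the pointer is suitably aligned */

    free(target);
    return 0;
}

Without the rounding, the AM-specific struct would start at whatever odd offset the generic struct happens to end on, which some platforms tolerate slowly and others not at all.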
Re: [HACKERS] Tuple sort is broken. It crashes on simple test.
(Adding Peter in CC, who also worked on that)

On Mon, Jan 16, 2017 at 7:14 PM, Mithun Cy wrote:
> Test:
> --
> create table seq_tab(t int);
> insert into seq_tab select generate_series(1, 1000);
> select count(distinct t) from seq_tab;
>
> #0 0x0094a9ad in pfree (pointer=0x0) at mcxt.c:1007
> #1 0x00953be3 in mergeonerun (state=0x1611450) at tuplesort.c:2803
> #2 0x00953824 in mergeruns (state=0x1611450) at tuplesort.c:2721
> #3 0x009521bc in tuplesort_performsort (state=0x1611450) at tuplesort.c:1813
> #4 0x00662b85 in process_ordered_aggregate_single (aggstate=0x160b540, pertrans=0x160d440, pergroupstate=0x160d3f0) at nodeAgg.c:1178
> #5 0x006636e0 in finalize_aggregates (aggstate=0x160b540, peraggs=0x160c5e0, pergroup=0x160d3f0, currentSet=0) at nodeAgg.c:1600
> #6 0x00664509 in agg_retrieve_direct (aggstate=0x160b540) at nodeAgg.c:2266
> #7 0x00663f66 in ExecAgg (node=0x160b540) at nodeAgg.c:1946
> #8 0x00650ad2 in ExecProcNode (node=0x160b540) at execProcnode.c:503

Indeed. It crashes for me immediately by adding an ORDER BY:

select count(distinct t) from seq_tab order by 1;

--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An isolation test for SERIALIZABLE READ ONLY DEFERRABLE
On Mon, Jan 16, 2017 at 10:37 PM, Craig Ringer wrote:
> On 16 Jan. 2017 17:09, "Michael Paquier" wrote:
>
> On Mon, Jan 16, 2017 at 9:40 AM, Thomas Munro wrote:
>> I also have longer term plans to show the first and third of them
>> running with the read-only transaction moved to a standby server.
>> Kevin Grittner gave me the idea of multi-server isolation tests when I
>> mentioned my WIP SERIALIZABLE DEFERRABLE-on-standbys patch off-list,
>
> Being able to do that with the isolation tester has been mentioned a
> couple of times, particularly from 2ndQ folks who worked on BDR. I
> think that it has been actually patched in this sense, so perhaps you
> could reuse the work that has been done there.
>
> Yep. See the bdr-pg/REL9_4_STABLE branch of the BDR repo, src/test/isolation.
>
> I think Abhijit submitted a patch to the list too.

Very nice. That's almost exactly what I had in mind.

One thing that jumps out at me from README.multinode is that it's surprising to see conninfo strings in the actual spec file: I assumed that the isolation tester would extend what it does today by accepting multiple conninfo command line arguments, and I imagined that the spec file could say something like SESSION "s3" CONNECTION 2. I don't really care about names vs numbers, but I thought it might be nice if connections were controlled separately from specs, so that you could use the same spec for tests that run on one-node and two-node setups, perhaps by making CONNECTION 2 fall back to the default connection if there is no connection 2 provided on the command line.

For example, with read-only-anomaly.spec (from the patch in this thread), the output should be identical if you point s3 at a standby using syncrep and remote_apply. In future, if SERIALIZABLE DEFERRABLE is available on standbys, then read-only-anomaly-3.spec should also work that way. It'd be cool to find a way to express that without having to use separate spec/expected files for the multi-node version when the goal is to show that it works the same as the single-node version.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum
On Mon, Jan 16, 2017 at 1:50 PM, Amit Khandekar wrote:
> On 13 January 2017 at 19:15, Alvaro Herrera wrote:
>> I think this is the same problem as reported in
>> https://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=ypy...@mail.gmail.com
>
> Ah yes, this is the same problem. Not sure why I didn't land on that
> thread when I tried to search pghackers using relevant keywords.
>
>>> === Fix ===
>> [...]
>>> Instead, the attached patch (prevent_useless_vacuums.patch) prevents
>>> the repeated cycle by noting that there's no point in doing whatever
>>> vac_update_datfrozenxid() does, if we didn't find anything to vacuum
>>> and there's already another worker vacuuming the same database. Note
>>> that it uses wi_tableoid field to check concurrency. It does not use
>>> wi_dboid field to check for already-processing worker, because using
>>> this field might cause each of the workers to think that there is some
>>> other worker vacuuming, and eventually no one vacuums. We have to be
>>> certain that the other worker has already taken a table to vacuum.
>>
>> Hmm, it seems reasonable to skip the end action if we didn't do any
>> cleanup after all. This would normally give enough time between vacuum
>> attempts for the first worker to make further progress and avoid causing
>> a storm. I'm not really sure that it fixes the problem completely, but
>> perhaps it's enough.
>
> I had thought about this: if we didn't clean up anything, skip the
> end action unconditionally without checking if there was any
> concurrent worker. But then I thought it is better to skip only if we
> know there is another worker doing the same job, because:
> a) there might be some reason we are just calling
> vac_update_datfrozenxid() without any condition. But I am not sure
> whether it was intentionally kept like that. Didn't get any leads from
> the history.
> b) there is no harm in updating datfrozenxid if there was no other
> worker. In this case, we *know* that there was indeed nothing to be
> cleaned up. So the next time this database won't be chosen again, so
> there's no harm in just calling this function.
>

Since an autovacuum worker wakes up the autovacuum launcher after it is launched, the launcher could try to spawn worker processes at a high frequency if you have a database with a very large table in it that has just passed autovacuum_freeze_max_age.

autovacuum.c:L1605
    /* wake up the launcher */
    if (AutoVacuumShmem->av_launcherpid != 0)
        kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);

I think we should deal with this case as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Tuple sort is broken. It crashes on simple test.
Test:
--
create table seq_tab(t int);
insert into seq_tab select generate_series(1, 1000);
select count(distinct t) from seq_tab;

#0 0x0094a9ad in pfree (pointer=0x0) at mcxt.c:1007
#1 0x00953be3 in mergeonerun (state=0x1611450) at tuplesort.c:2803
#2 0x00953824 in mergeruns (state=0x1611450) at tuplesort.c:2721
#3 0x009521bc in tuplesort_performsort (state=0x1611450) at tuplesort.c:1813
#4 0x00662b85 in process_ordered_aggregate_single (aggstate=0x160b540, pertrans=0x160d440, pergroupstate=0x160d3f0) at nodeAgg.c:1178
#5 0x006636e0 in finalize_aggregates (aggstate=0x160b540, peraggs=0x160c5e0, pergroup=0x160d3f0, currentSet=0) at nodeAgg.c:1600
#6 0x00664509 in agg_retrieve_direct (aggstate=0x160b540) at nodeAgg.c:2266
#7 0x00663f66 in ExecAgg (node=0x160b540) at nodeAgg.c:1946
#8 0x00650ad2 in ExecProcNode (node=0x160b540) at execProcnode.c:503

Git bisect shows me that the following commit caused it:

commit e94568ecc10f2638e542ae34f2990b821bbf90ac
Author: Heikki Linnakangas  2016-10-03 16:07:49
Committer: Heikki Linnakangas  2016-10-03 16:07:49

    Change the way pre-reading in external sort's merge phase works.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typo in condition_variable.c
On Mon, Jan 16, 2017 at 6:34 PM, Masahiko Sawada wrote:
> On Mon, Jan 16, 2017 at 6:27 PM, Fujii Masao wrote:
>> On Mon, Jan 16, 2017 at 5:42 PM, Masahiko Sawada wrote:
>>> Hi,
>>>
>>> Attached patch fixes comment typos in condition_variable.c
>>
>> Seems the patch still has the typo. "initiailly" should be "initially"?
>>
> Yes, it should be "initially".

OK, pushed. Thanks!

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An isolation test for SERIALIZABLE READ ONLY DEFERRABLE
On 16 Jan. 2017 17:09, "Michael Paquier" wrote:

On Mon, Jan 16, 2017 at 9:40 AM, Thomas Munro wrote:
> I also have longer term plans to show the first and third of them
> running with the read-only transaction moved to a standby server.
> Kevin Grittner gave me the idea of multi-server isolation tests when I
> mentioned my WIP SERIALIZABLE DEFERRABLE-on-standbys patch off-list,

Being able to do that with the isolation tester has been mentioned a
couple of times, particularly from 2ndQ folks who worked on BDR. I
think that it has been actually patched in this sense, so perhaps you
could reuse the work that has been done there.

Yep. See the bdr-pg/REL9_4_STABLE branch of the BDR repo, src/test/isolation.

I think Abhijit submitted a patch to the list too.
Re: [HACKERS] Typo in condition_variable.c
On Mon, Jan 16, 2017 at 6:27 PM, Fujii Masao wrote:
> On Mon, Jan 16, 2017 at 5:42 PM, Masahiko Sawada wrote:
>> Hi,
>>
>> Attached patch fixes comment typos in condition_variable.c
>
> Seems the patch still has the typo. "initiailly" should be "initially"?
>
Yes, it should be "initially".

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typo in condition_variable.c
On Mon, Jan 16, 2017 at 5:42 PM, Masahiko Sawada wrote:
> Hi,
>
> Attached patch fixes comment typos in condition_variable.c

Seems the patch still has the typo. "initiailly" should be "initially"?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/14 6:24, Robert Haas wrote:
> On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
>>
>> Thanks! I realized however that the approach I used in 0002 of passing
>> the original slot to ExecConstraints() fails in certain situations. For
>> example, if a BR trigger changes the tuple, the original slot would not
>> receive those changes, so it will be wrong to use such a tuple anymore.
>> In attached 0001, I switched back to the approach of converting the
>> partition-tupdesc-based tuple back to the root partitioned table's format.
>> The converted tuple contains the changes by BR triggers, if any. Sorry
>> about some unnecessary work.
>
> Hmm. Even with this patch, I wonder if this is really correct. I
> mean, isn't the root of the problem here that ExecConstraints() is
> expecting that resultRelInfo matches slot, and it doesn't?

The problem is that whereas the SlotValueDescription that we build to show in the error message should be based on the tuple that was passed to ExecInsert() or whatever NextCopyFrom() returned for CopyFrom() to process, that may not be the case if the tuple needed to be converted after tuple routing. The slot (the tuple in it and its tuple descriptor) and the resultRelInfo that ExecConstraints() receives *do* correspond with each other, even after possible tuple conversion following tuple routing, and hence constraint checking itself works fine (since commit 2ac3ef7a01 [1]).

As said, it's the val_desc built to show in the error message being based on the converted-for-partition tuple that could be seen as a problem - is it acceptable if we showed in the error message whatever the converted-for-partition tuple looks like, which might have columns ordered differently from the root table? If so, we could simply forget the whole thing, including reverting f1b4c771 [2]. An example:

create table p (a int, b char, c int) partition by list (a);
create table p1 (b char, c int, a int);  -- note the column order
alter table p attach partition p1 for values in (1);
alter table p add constraint check_b check (b = 'x');

insert into p values (1, 'y', 1);
ERROR:  new row for relation "p1" violates check constraint "check_b"
DETAIL:  Failing row contains (y, 1, 1).

Note that "(y, 1, 1)" results from using p1's descriptor on the converted tuple. As long as that's clear and acceptable, I think we need not worry about this patch, and we can revert the previously committed patch for this "problem".

> And why
> isn't that also a problem for the things that get passed resultRelInfo
> and slot after tuple routing and before ExecConstraints? In
> particular, in copy.c, ExecBRInsertTriggers.

If I explained the problem (as I'm seeing it) well enough above, you may see why that's not an issue in the other cases. Other sub-routines, viz. ExecBRInsertTriggers, ExecWithCheckOptions for RLS INSERT WITH CHECK policies, ExecARInsertTriggers, etc., do receive the correct slot and resultRelInfo for whatever they do with a given tuple and the relation (partition) it is being inserted into. However, if we do consider the above to be a problem, then we would need to fix ExecWithCheckOptions() to do whatever we decide ExecConstraints() should do, because it can also report a WITH CHECK violation for a tuple.

> Committed 0002.
Thanks, Amit [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ac3ef7 [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f1b4c77 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
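For reference, the "converting the partition-tupdesc-based tuple back to the root table's format" step that this subthread keeps coming back to is usually expressed with the tuple-conversion-map routines. A rough sketch of that shape - assuming the PostgreSQL 10-era signatures of convert_tuples_by_name() and do_convert_tuple(), with an invented function name, and not taken from the patch itself - might be:

/*
 * Sketch only: map a tuple in a partition's column order back to the root
 * table's column order, e.g. before building an error detail message.
 */
#include "postgres.h"
#include "access/htup_details.h"
#include "access/tupconvert.h"

static HeapTuple
tuple_in_root_format(HeapTuple parttuple,
                     TupleDesc part_tupdesc,
                     TupleDesc root_tupdesc)
{
    TupleConversionMap *map;

    /* Build a column-name-based conversion map, if the layouts differ. */
    map = convert_tuples_by_name(part_tupdesc, root_tupdesc,
                                 "could not convert row type");
    if (map == NULL)
        return parttuple;       /* descriptors already match */

    /* Re-order the columns; the result uses the root table's layout. */
    return do_convert_tuple(parttuple, map);
}

With something along these lines, the error detail in the example above would show the row in the root table's column order rather than p1's.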
[HACKERS] Typo in condition_variable.c
Hi, Attached patch fixes comment typos in condition_variable.c Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center fix_type_in_condition_variable_c.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
On 2017-01-15 19:29:52 -0800, Andres Freund wrote: > On 2016-10-31 09:06:39 -0700, Andres Freund wrote: > > On 2016-09-14 19:28:25 -0400, Tom Lane wrote: > > > Andres Freundwrites: > > > > On 2016-09-12 19:35:22 -0400, Tom Lane wrote: > > > Here's a draft patch that is just meant to investigate what the planner > > > changes might look like if we do it in the pipelined-result way. > > > Accordingly, I didn't touch the executor, but just had it emit regular > > > Result nodes for SRF-execution steps. However, the SRFs are all > > > guaranteed to appear at top level of their respective tlists, so that > > > those Results could be replaced with something that works like > > > nodeFunctionscan. > > > > > So I think we should continue investigating this way of doing things. > > > I'll try to take a look at the executor end of it tomorrow. However > > > I'm leaving Friday for a week's vacation, and may not have anything to > > > show before that. > > > > Are you planning to work on the execution side of things? I otherwise > > can take a stab... > > Doing so now. That worked quite well. So we have a few questions, before I clean this up: - For now the node is named 'Srf' both internally and in explain - not sure if we want to make that something longer/easier to understand for others? Proposals? TargetFunctionScan? SetResult? - We could alternatively add all this into the Result node - it's not actually a lot of new code, and most of that is boilerplate stuff about adding a new node. I'm ok with both. - I continued with the division of Labor that Tom had set up, so we're creating one Srf node for each "nested" set of SRFs. We'd discussed nearby to change that for one node/path for all nested SRFs, partially because of costing. But I don't like the idea that much anymore. The implementation seems cleaner (and probably faster) this way, and I don't think nested targetlist SRFs are something worth optimizing for. Anybody wants to argue differently? - I chose to error out if a retset function appears in ExecEvalFunc/Oper and make both unconditionally set evalfunc to ExecMakeFunctionResultNoSets. ExecMakeFunctionResult() is now externally visible. That seems like the least noisy way to change things over, but I'm open for different proposals. Comments? Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] check_srf_call_placement() isn't always setting p_hasTargetSRFs
Hi,

While working on [1] (works, harmless regression check failures aside, needs cleanup), I noticed $subject. Without fixing EXPR_KIND_VALUES to set p_hasTargetSRFs, the query ends up with Query->hasTargetSRFs set wrongly, which in turn breaks your planner code, as it's not being triggered anymore.

Is there a reason not to just set p_hasTargetSRFs once towards the end of the function, instead of doing so for all the non-error cases? That fixes INSERT ... VALUES(generate_series()) with your approach for me.

I also noticed that we currently don't actually handle Query->hasTargetSRFs correctly when said SRF is in a VALUES() block. As the SRFs there aren't in the targetlist,

PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse,
                 PlannerInfo *parent_root,
                 bool hasRecursion, double tuple_fraction)
...
    /* Constant-folding might have removed all set-returning functions */
    if (parse->hasTargetSRFs)
        parse->hasTargetSRFs = expression_returns_set((Node *) parse->targetList);

unsets that SRFs exist. I'm not really sure whether that's bad or good - right now it doesn't cause active problems because of transformInsertStmt's

    /*
     * Process INSERT ... VALUES with a single VALUES sublist. We treat
     * this case separately for efficiency. The sublist is just computed
     * directly as the Query's targetlist, with no VALUES RTE. So it
     * works just like a SELECT without any FROM.
     */

optimization. As we don't support SRFs in any other kind of VALUES (ValuesNext calls ExecEvalExpr with isDone = NULL), that appears to have no effect right now. But we'd get a better error message if we made check_srf_call_placement() error out.

I wonder if there should be a separate expression type for INSERT ... VALUES(exactly-one-row), since that behaves quite differently.

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20170116032952.z2re55hcfhzbkmrm%40alap3.anarazel.de

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers