Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
Hi Rajkumar,

On Fri, Sep 16, 2016 at 6:00 PM, Rajkumar Raghuwanshi wrote:
> On Fri, Sep 9, 2016 at 3:17 PM, Ashutosh Bapat wrote:
>> Hi All,
>>
>> PFA the patch to support partition-wise joins for partitioned tables. The
>> patch is based on the declarative partition support patches provided by
>> Amit Langote on 26th August 2016.
>
> I have applied the declarative partitioning patches posted by Amit Langote
> on 26 Aug 2016 and then the partition-wise join patch, and I am getting the
> error below during make install.
>
> ../../../../src/include/nodes/relation.h:706: error: redefinition of typedef ‘PartitionOptInfo’
> ../../../../src/include/nodes/relation.h:490: note: previous declaration of ‘PartitionOptInfo’ was here
> make[4]: *** [gistbuild.o] Error 1
> make[4]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend/access/gist'
> make[3]: *** [gist-recursive] Error 2
> make[3]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend/access'
> make[2]: *** [access-recursive] Error 2
> make[2]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend'
> make[1]: *** [all-backend-recurse] Error 2
> make[1]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src'
> make: *** [all-src-recurse] Error 2
>
> PS: I am using gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17).

Thanks for the report and the patch. This is fixed by the patch posted with
https://www.postgresql.org/message-id/CAFjFpRdRFWMc4zNjeJB6p1Ncpznc9DMdXfZJmVK5X_us5zeD9Q%40mail.gmail.com.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
Hi,

I got a server crash with partition_wise_join; steps to reproduce are given below.

postgres=# set enable_partition_wise_join=true;
SET
postgres=# CREATE TABLE tbl (a int, c text) PARTITION BY LIST(a);
CREATE TABLE
postgres=# CREATE TABLE tbl_p1 PARTITION OF tbl FOR VALUES IN (1, 2);
CREATE TABLE
postgres=# CREATE TABLE tbl_p2 PARTITION OF tbl FOR VALUES IN (3, 4);
CREATE TABLE
postgres=# INSERT INTO tbl VALUES (1,'P1'),(2,'P1'),(3,'P2'),(4,'P2');
INSERT 0 4
postgres=# EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM tbl t1 INNER JOIN tbl t2
           ON (t1.a = t2.a) WHERE t1.c = 'P1' AND t1.c = 'P2';
NOTICE:  join between relations (b 1) and (b 2) is considered for partition-wise join.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Re: [HACKERS] Quorum commit for multiple synchronous replication.
On Sat, Sep 17, 2016 at 2:04 AM, Masahiko Sawada wrote:
> Since we already released 9.6RC1, I understand that it's quite hard to
> change the syntax of 9.6.
> But considering that we support the quorum commit, this could be one
> of the solutions in order to avoid breaking backward compatibility and
> to provide a useful user interface.
> So I attached these patches.

standby_config:
-standby_list                      { $$ = create_syncrep_config("1", $1); }
-| FIRST NUM '(' standby_list ')'  { $$ = create_syncrep_config($1, $4); }
+standby_list                      { $$ = create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
+| ANY NUM '(' standby_list ')'    { $$ = create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
+| FIRST NUM '(' standby_list ')'  { $$ = create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }

Reading the thread again, it seems that my previous post [1] was a bit
misunderstood. My position is to not introduce any new behavior changes in
9.6, so we could just make the FIRST NUM grammar equivalent to NUM.

[1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4njgmmgiagvcs...@mail.gmail.com
-- 
Michael
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting SJIS as a database encoding
Hello,

At Tue, 13 Sep 2016 11:44:01 +0300, Heikki Linnakangas wrote in
<7ff67a45-a53e-4d38-e25d-3a121afea...@iki.fi>
> On 09/08/2016 09:35 AM, Kyotaro HORIGUCHI wrote:
> > Returning in UTF-8 bloats the result string by about 1.5 times, so
> > it doesn't seem to make sense comparing with it. But it takes
> > real = 47.35s.
>
> Nice!

Thanks!

> I was hoping that this would also make the binaries smaller. A few
> dozen kB of storage is perhaps not a big deal these days, but
> still. And smaller tables would also consume less memory and CPU
> cache.

Agreed.

> I removed the #include "../../Unicode/utf8_to_sjis.map" line, so that
> the old table isn't included anymore, compiled, and ran "strip
> utf8_and_sjis.so". Without this patch, it's 126 kB, and with it, it's
> 160 kB. So the radix tree takes a little bit more space.
>
> That's not too bad, and I'm sure we could live with that, but with a
> few simple tricks, we could do better. First, since all the values we
> store in the tree are < 0xffff, we could store them in int16 instead
> of int32, and halve the size of the table right off the bat. That won't
> work for all encodings, of course, but it might be worth it to
> have two versions of the code, one for int16 and another for int32.

That's right. I used int imprudently. All of the characters in the
patch, and most of the characters in encodings other than the
Unicode-related ones, fit in 2 bytes. 3-byte characters could be kept in
a separate table in the struct for that case. Otherwise, two or more
versions of the structs are possible, since currently the radix struct
is utf8_and_sjis's own, in spite of the fact that it is in pg_wchar.h in
the patch.

> Another trick is to eliminate redundancies in the tables. Many of the
> tables contain lots of zeros, as in:
>
> > /* c3xx */ {
> > ...
> > 0x817e,
> > /* c398 */ 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
> > /* c3a0 */ 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
> > /* c3a8 */ 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
> > /* c3b0 */ 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x8180,
> > /* c3b8 */ 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
> > },
>
> and
>
> > /* e388xx */ {
> > /* e38880 */ 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
> > /* e38888 */ 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
> > /* e38890 */ 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
> > /* e38898 */ 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
> > /* e388a0 */ 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
> > ...
> > },
>
> You could overlay the last row of the first table, which is all zeros,
> with the first row of the second table, which is also all zeros. (Many
> of the tables have a lot more zero-rows than this example.)

Yes, the bunch of zeros was an annoyance. Several compression techniques
are available in exchange for some additional CPU time, but the
technique you suggested doesn't need such a sacrifice, which sounds
nice.

> But yes, this patch looks very promising in general. I think we should
> switch over to radix trees for all the encodings.

The result was more than I expected for a character set with about 7000
characters. We can expect a certain amount of advantage even for
character sets that have less than a hundred characters. I'll work on
this for the next CF.

Thanks.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On Wed, Sep 21, 2016 at 3:25 AM, Jesper Pedersen wrote:
> On 09/20/2016 12:45 PM, Jeff Janes wrote:
>> Is the 2nd "1" in this call needed?
>>
>> SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1)
>>
>> As far as I can tell, that argument is only used to stuff into the output
>> field "blkno"; it is not used to instruct the interpretation of the raw
>> page itself. It doesn't seem worthwhile to have a parameter that only
>> echoes back to the user what the user already put in (twice). The only
>> existing functions which take the blkno argument are those that don't use
>> the get_raw_page style.
>>
>> Also, should we document what the single-letter values mean in the
>> hash_page_stats.type column? It is not obvious that 'i' means bitmap, for
>> example.
>
> Adjusted in v4.

Thanks for the new version.

> Code/doc will need an update once the CHI patch goes in.

If you are referring to the WAL support, I guess that this patch or the
other patch could just rebase and adjust as needed. hash_page_items and
hash_page_stats share a lot of common points with their btree
equivalents; still, a refactoring would just impact the visibility of
the code, which is wanted as educational in this module, so let's go
with things the way you suggest.

+   The type information will be 'm' for a metadata page,
+   'v' for an overflow page, 'b' for a bucket page,
+   'i' for a bitmap page, and 'u' for an unused page.

Other functions don't go into this level of detail, so I would just rip
out this paragraph.

The patch looks in pretty good shape to me, so I am switching it as
ready for committer.
-- 
Michael
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Sep 21, 2016 at 3:48 AM, Tomas Vondra wrote:
>
> I have no idea what's causing this - it might be related to the kernel, but
> I'm not sure why it should affect the patches differently. Let's see how the
> new kernel affects this.
>
> dilip / sync=off       16     32     64    128    192
> ----------------------------------------------------
> master              26198  37901  37211  14441   8315
> granular-locking    25829  38395  40626  14299   8160
> no-content-lock     25872  38994  41053  14058   8169
> group-update        26503  38911  42993  19474   8325
>
> dilip / sync=on        16     32     64    128    192
> ----------------------------------------------------
> master              26138  37790  38492  13653   8337
> granular-locking    25661  38586  40692  14535   8311
> no-content-lock     25653  39059  41169  14370   8373
> group-update        26472  39170  42126  18923   8366
>
> pgbench / sync=off     16     32     64    128    192
> ----------------------------------------------------
> master              23001  35762  41202  31789   8005
> granular-locking    23218  36130  42535  45850   8701
> no-content-lock     23322  36553  42772  47394   8204
> group-update        23129  36177  41788  46419   8163
>
> pgbench / sync=on      16     32     64    128    192
> ----------------------------------------------------
> master              22904  36077  41295  35574   8297
> granular-locking    23323  36254  42446  43909   8959
> no-content-lock     23304  36670  42606  48440   8813
> group-update        23127  36696  41859  46693   8345
>
> So there is some improvement due to the patches for 128 clients (+30% in
> some cases), but it's rather useless as 64 clients either give you
> comparable performance (pgbench workload) or a way better one (Dilip's
> workload).

I think these results are somewhat similar to what Dilip has reported.
Here, in both cases, the performance improvement is seen when the client
count is greater than the number of cores (including HT). As far as I
know, the machine on which Dilip is running the tests also has 64 HT.
The point is that the CLOGControlLock contention is noticeable only at
that client count, so it is not the fault of the patch that it doesn't
improve things at lower client counts. I guess that we will see a
performance improvement between 64 and 128 client counts as well.

> Also, pretty much no difference between synchronous_commit on/off, probably
> thanks to running on unlogged tables.

Yeah.
> I'll repeat the test on the 4-socket machine with a newer kernel, but that's
> probably the last benchmark I'll do for this patch for now.

Okay, but I think it is better to see the results between 64 and 128
clients, and maybe at greater than 128 clients, because it is clear that
the patch won't improve performance below that.

> I agree with
> Robert that the cases the patch is supposed to improve are a bit contrived
> because of the very high client counts.

No issues. I have already explained why I think it is important to
reduce the remaining CLOGControlLock contention in yesterday's mail and
this one. If none of you is convinced, then I think we have no choice
but to drop this patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Tue, Sep 20, 2016 at 10:24 PM, Jeff Janes wrote:
> On Thu, Sep 15, 2016 at 11:42 PM, Amit Kapila wrote:
>>
>> Okay, thanks for pointing out the same. I have fixed it. Apart from
>> that, I have changed _hash_alloc_buckets() to initialize the page
>> instead of making it completely zero because of problems discussed in
>> another related thread [1]. I have also updated README.
>
> With v7 of the concurrent hash patch and v4 of the write ahead log patch and
> the latest relcache patch (I don't know how important that is to reproducing
> this, I suspect it is not), I once got this error:
>
> 38422  00000 2016-09-19 16:25:50.055 PDT:LOG:  database system was
> interrupted; last known up at 2016-09-19 16:25:49 PDT
> 38422  00000 2016-09-19 16:25:50.057 PDT:LOG:  database system was not
> properly shut down; automatic recovery in progress
> 38422  00000 2016-09-19 16:25:50.057 PDT:LOG:  redo starts at 3F/2200DE90
> 38422  01000 2016-09-19 16:25:50.061 PDT:WARNING:  page verification failed,
> calculated checksum 65067 but expected 21260
> 38422  01000 2016-09-19 16:25:50.061 PDT:CONTEXT:  xlog redo at 3F/22053B50
> for Hash/ADD_OVFL_PAGE: bmsize 4096, bmpage_found T
> 38422  XX001 2016-09-19 16:25:50.071 PDT:FATAL:  invalid page in block 9 of
> relation base/16384/17334
> 38422  XX001 2016-09-19 16:25:50.071 PDT:CONTEXT:  xlog redo at 3F/22053B50
> for Hash/ADD_OVFL_PAGE: bmsize 4096, bmpage_found T
>
> The original page with the invalid checksum is:

I think this is an example of the torn page problem, which seems to be
happening because of the below code in your test.

!	if (JJ_torn_page > 0 && counter++ > JJ_torn_page && !RecoveryInProgress())
!	{
!		nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ/3);
!		ereport(FATAL,
!				(errcode(ERRCODE_DISK_FULL),
!				 errmsg("could not write block %u of relation %s: wrote only %d of %d bytes",
!						blocknum,
!						relpath(reln->smgr_rnode, forknum),
!						nbytes, BLCKSZ),
!				 errhint("JJ is screwing with the database.")));
!	}
!	else
!	{
!		nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ);
!	}

If you are running the above test with JJ_torn_page disabled, then it is
a different matter and we need to investigate it, but I assume you are
running with it enabled. I think this could happen if the actual change
in the page is in the last 2/3 of the page, which the above code does not
write. The checksum in the page header, written as part of the partial
page write (the first 1/3 of the page), would have covered the change
you made, whereas after restart, when the page is read again to apply
redo, the checksum calculation won't include the change in the last 2/3
of the page. Today, Ashutosh has shared the logs of his test run where he
has shown a similar problem for a heap page. I think this could happen,
though rarely, for any page with the above kind of test. Does this
explanation explain the reason for the problem you are seeing?

> If I ignore the checksum failure and re-start the system, the page gets
> restored to be a bitmap page.

Okay, but have you ensured in some way that redo is applied to the bitmap
page? Today, while thinking on this problem, I realized that currently in
the patch we are using REGBUF_NO_IMAGE for the bitmap page because of one
of the problems reported by you [1]. That change will fix the problem you
reported, but it will expose bitmap pages to torn-page hazards. I think
the right fix there is to make pd_lower equal to pd_upper for the bitmap
page, so that full page writes don't exclude the data in the bitmap page.
Thoughts?

[1] - https://www.postgresql.org/message-id/CAA4eK1KJOfVvFUmi6dcX9Y2-0PFHkomDzGuyoC%3DaD3Qj9WPpFA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Error running custom plugin: “output plugins have to declare the _PG_output_plugin_init symbol”
On Tue, Sep 20, 2016 at 4:32 PM, valeriof wrote:
> Hi Ashutosh,
> Thank you for your answer. In the end I realized that the PGDLLEXPORT
> keyword was missing from the function definitions.
>
> As a side question, what are the options to debug the plugin while it's
> executing? I've seen a debug plugin for Postgres but it seems more for
> SQL functions and stored procedures. Is it possible to attach the process
> from the Visual Studio debugger?

I have never used Visual Studio, but you might find something useful at
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tracking wait event for latches
On Wed, Sep 21, 2016 at 1:03 PM, Thomas Munro wrote:
> Yeah, but that's at run time. I meant you could help developers
> discover ASAP if they add a new item to one place but not the other
> with a compile time assertion:
>
> const char *
> GetWaitEventIdentifier(uint16 eventId)
> {
>     StaticAssertStmt(lengthof(WaitEventNames) == WE_LAST_TYPE + 1,
>                      "WaitEventNames must match WaitEventIdentifiers");
>     if (eventId > WE_LAST_TYPE)
>         return "???";
>     return WaitEventNames[eventId];
> }

Ah, OK, good idea. I had AssertStmt in mind, not StaticAssertStmt.

> You missed a couple that are hiding inside #ifdef WIN32:
>
> From pgstat.c:
> +                          EVENT_PGSTAT_MAIN);
>
> From syslogger.c:
> +                          EVENT_SYSLOGGER_MAIN);

Oops. Fixed those ones and checked the builds on WIN32.
-- 
Michael

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 8ca1c1c..7cecef8 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -496,7 +496,7 @@ pgfdw_get_result(PGconn *conn, const char *query)
 			wc = WaitLatchOrSocket(MyLatch,
 								   WL_LATCH_SET | WL_SOCKET_READABLE,
 								   PQsocket(conn),
-								   -1L);
+								   -1L, WE_EXTENSION);
 			ResetLatch(MyLatch);
 
 			CHECK_FOR_INTERRUPTS();
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0776428..5185d65 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -679,6 +679,18 @@ postgres  27093  0.0  0.0  30096  2752 ?  Ss  11:34  0:00 postgres: ser
          buffer in question.
+
+         EventSet: The server process is waiting on one or more
+         sockets, a timer or an inter-process latch. The wait happens in a
+         WaitEventSet, PostgreSQL's portable
+         I/O multiplexing abstraction using boolean variables letting a
+         process sleep until it is set. It supports several types of
+         operations, like postmaster death handling, timeout and socket
+         activity lookup, and is a useful replacement for sleep
+         where signal handling matters.
+
+
@@ -1085,6 +1097,139 @@ postgres  27093  0.0  0.0  30096  2752 ?  Ss  11:34  0:00 postgres: ser
          BufferPin
          Waiting to acquire a pin on a buffer.
+
+         EventSet
+         ArchiverMain
+         Waiting in main loop of the archiver process.
+
+         AutoVacuumMain
+         Waiting in main loop of autovacuum launcher process.
+
+         BaseBackupThrottle
+         Waiting during base backup when throttling activity.
+
+         BgWorkerShutdown
+         Waiting for background worker to shut down.
+
+         BgWorkerStartup
+         Waiting for background worker to start up.
+
+         BgWriterHibernate
+         Waiting in background writer process, hibernating.
+
+         BgWriterMain
+         Waiting in main loop of background writer process background worker.
+
+         CheckpointerMain
+         Waiting in main loop of checkpointer process.
+
+         ExecuteGather
+         Waiting for activity from child process when executing Gather node.
+
+         Extension
+         Waiting in the code path of an extension; should be used by custom plugins and modules.
+
+         MessageQueueInternal
+         Waiting for other process to be attached in shared message queue.
+
+         MessageQueuePutMessage
+         Waiting to put new message in shared message queue.
+
+         MessageQueueReceive
+         Waiting to receive bytes in shared message queue.
+
+         MessageQueueSend
+         Waiting to send bytes in shared message queue.
+
+         ParallelFinish
+         Waiting for parallel workers to finish computing.
+
+         PgSleep
+         Waiting in process that called pg_sleep.
+
+         PgStatMain
+         Waiting in main loop of the statistics collector process.
+
+         ProcSignal
+         Waiting for signal from another backend.
+
+         ProcSleep
+         Waiting for a specific lock.
+
+         RecoveryApplyDelay
+         Waiting to apply WAL at recovery because it is delayed.
+
+         RecoveryWalAll
+         Waiting for WAL from any kind of source (local, archive or s
Re: [HACKERS] Tracking wait event for latches
On Wed, Sep 21, 2016 at 3:40 PM, Michael Paquier wrote:
> On Wed, Sep 21, 2016 at 8:13 AM, Thomas Munro wrote:
>> It looks like this array wants to be in alphabetical order, but it
>> isn't quite. Also, perhaps a compile time assertion about the size of
>> the array matching EVENT_LAST_TYPE could be useful?
>
> In GetWaitEventIdentifier()? I'd think that just returning ??? would
> have been fine if there is a non-matching call.

Yeah, but that's at run time. I meant you could help developers
discover ASAP if they add a new item to one place but not the other
with a compile time assertion:

const char *
GetWaitEventIdentifier(uint16 eventId)
{
    StaticAssertStmt(lengthof(WaitEventNames) == WE_LAST_TYPE + 1,
                     "WaitEventNames must match WaitEventIdentifiers");
    if (eventId > WE_LAST_TYPE)
        return "???";
    return WaitEventNames[eventId];
}

>> +1 from me too for avoiding the overly general term 'event'. It does
>> seem a little odd to leave the enumerator names as EVENT_... though;
>> shouldn't these be WAIT_EVENT_... or WE_...? Or perhaps you could
>> consider WaitPointIdentifier and WP_SECURE_READ or
>> WaitEventPointIdentifier and WEP_SECURE_READ, if you buy my earlier
>> argument that what we are really naming here is the point in the code
>> where we wait, not the events we're waiting for. Contrast with
>> LWLocks, where we report the lock that you're waiting for, not the
>> place in the code where you're waiting for that lock.
>
> Well, WE_ if I need to make a choice for something else than EVENT_.

You missed a couple that are hiding inside #ifdef WIN32:

From pgstat.c:
+                          EVENT_PGSTAT_MAIN);

From syslogger.c:
+                          EVENT_SYSLOGGER_MAIN);

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tracking wait event for latches
On Wed, Sep 21, 2016 at 8:13 AM, Thomas Munro wrote:
> +
> + EventSet: The server process is waiting on a socket
> + or a timer. This wait happens in a latch, an inter-process facility
> + using boolean variables letting a process sleep until it is set.
> + Latches support several types of operations, like postmaster death
> + handling, timeout and socket activity lookup, and are a useful
> + replacement for sleep where signal handling matters.
> +
>
> This paragraph seems a bit confused. I suggest something more like
> this: "The server process is waiting for one or more sockets, a timer
> or an interprocess latch. The wait happens in a WaitEventSet,
> PostgreSQL's portable IO multiplexing abstraction."

OK, I have tweaked the paragraph as follows using your suggestion:

+
+ EventSet: The server process is waiting on one or more
+ sockets, a timer or an inter-process latch. The wait happens in a
+ WaitEventSet, PostgreSQL's portable
+ I/O multiplexing abstraction using boolean variables letting a
+ process sleep until it is set. It supports several types of
+ operations, like postmaster death handling, timeout and socket
+ activity lookup, and is a useful replacement for sleep
+ where signal handling matters.
+

> On Tue, Aug 23, 2016 at 7:01 PM, Michael Paquier wrote:
>>> On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov wrote:
>>>> Would it be better to use an array here?
>>
>> Okay, I have switched to an array.
>
> I looked at various macro tricks [1] but they're all pretty unpleasant!
> +1 for the simple array with carefully managed order. About that
> order...
>
> [1] http://stackoverflow.com/questions/147267/easy-way-to-use-variables-of-enum-types-as-string-in-c

Yes, I recall bumping on this one, or something really similar to that...

> +const char *const WaitEventNames[] = {
> [...]
> +};
>
> It looks like this array wants to be in alphabetical order, but it
> isn't quite.
> Also, perhaps a compile time assertion about the size of
> the array matching EVENT_LAST_TYPE could be useful?

In GetWaitEventIdentifier()? I'd think that just returning ??? would
have been fine if there is a non-matching call.

> +typedef enum WaitEventIdentifier
> +{
> [...]
> +} WaitEventIdentifier;
>
> This is also nearly but not exactly in alphabetical order
> (EVENT_PROC_SIGNAL comes after EVENT_PROC_SLEEP, for example). But
> it's not currently possible to have the strings and the enumerators
> both in alphabetical order, because they're not the same, even
> accounting for camel-case to upper-case transliteration. I think at
> least one of them should be sorted. Shouldn't they match fully and
> then *both* be sorted alphabetically? For example,
> "MessageQueueInternal" doesn't match EVENT_MQ_WAIT_INTERNAL. Then
> there are some cases where I'd expect underscores for consistency with
> other enumerators and with the corresponding camel-case strings: you
> have EVENT_WAL_SENDER_MAIN, but EVENT_WALWRITER_MAIN.

Not wrong...

> The documentation is in a slightly different order again but also not
> exactly alphabetical: for example ProcSleep is listed before
> ProcSignal.

Right.

> Sorry if this all sounds terribly picky but I think we should try to
> be strictly systematic here.

No worries about that; it matters a lot for this patch. The user-facing
documentation is what should drive the decision-making, I think. So
let's order the names, and adapt the enum depending on that. I have
done so after double-checking both lists, and added a comment for
anybody updating them in the future.

>>> EventIdentifier seems too general a name for me, isn't it? Could we
>>> name it WaitEventIdentifier? Or WaitEventId for shortcut?
>>
>> ... And WaitEventIdentifier.

> +1 from me too for avoiding the overly general term 'event'. It does
> seem a little odd to leave the enumerator names as EVENT_... though;
> shouldn't these be WAIT_EVENT_... or WE_...?
> Or perhaps you could
> consider WaitPointIdentifier and WP_SECURE_READ or
> WaitEventPointIdentifier and WEP_SECURE_READ, if you buy my earlier
> argument that what we are really naming here is the point in the code
> where we wait, not the events we're waiting for. Contrast with
> LWLocks, where we report the lock that you're waiting for, not the
> place in the code where you're waiting for that lock.

Well, WE_ if I need to make a choice for something else than EVENT_.

> On the other hand, if we could *accurately* report whether it's a
> "Latch", "Socket" or "Latch|Socket" that we're waiting for, it might
> be cool to do that instead. One way to do that would be to use up
> several class IDs: WAIT_EVENT_LATCH, WAIT_EVENT_LATCH_OR_SOCKET,
> WAIT_EVENT_SOCKET (or perhaps WAIT_EVENT_LATCH | WAIT_EVENT_SOCKET,
> reserving 2 or 3 upper bits from the 8 bit class ID for this). Then
> we could fig
Re: [HACKERS] Logical Replication WIP
On Tue, 20 Sep 2016, Peter Eisentraut wrote:
> On 9/18/16 4:17 PM, Steve Singer wrote:
>> I think if we want to prevent the creation of subscriptions that point
>> to self, then we need to create a magic token when the postmaster
>> starts and check for that when we connect. So more of a
>> running-instance identifier instead of an instance-on-disk identifier.
>
> The other option is that we just allow it and make it more robust.

I think we should go with the second option for now. I feel that the
effort is better spent making sure that initial syncs that don't
subscribe (for whatever reason) can be aborted, instead of trying to
build a concept of node identity before we really need it.

Steve

> -- 
> Peter Eisentraut http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Tue, Sep 20, 2016 at 9:15 AM, Dilip Kumar wrote:
> +1
>
> My tests are running; I will post the results soon.

I have some more results now.

8 socket machine
10 min run (median of 3 runs)
synchronous_commit = off
scale factor = 300
shared_buffers = 8GB

test1: Simple update (pgbench)

Clients   Head     GroupLock
32        45702    45402
64        46974    51627
128       35056    55362

test2: TPCB (pgbench)

Clients   Head     GroupLock
32        27969    27765
64        33140    34786
128       21555    38848

Summary:
--------
At 32 clients there is no gain; I think at this workload the CLog lock
is not a problem. At 64 clients we can see a ~10% gain with simple
update and ~5% with TPCB. At 128 clients we can see a gain of more
than 50%.

Currently I have tested with synchronous_commit = off; later I can try
with it on. I can also test at 80 clients; I think we will see some
significant gain at this client count as well, but as of now I haven't
tested it.

With the above results, what do we think? Should we continue our testing?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix Error Message for allocate_recordbuf() Failure
On Wed, Sep 21, 2016 at 10:25 AM, Michael Paquier wrote:
> On Wed, Sep 21, 2016 at 12:32 AM, Robert Haas wrote:
>> For what it's worth, I think it's fine. Good error messages are a useful
>> thing.
>>
>> More generally, I think the whole patch looks good and should be committed.
>
> Hm. I'd think that it is still more portable to just issue a WARNING
> message in palloc_extended() when MCXT_ALLOC_NO_OOM is used. Other code
> paths could benefit from that as well, and the patch proposed does
> nothing for the other places calling it. I am fine to write a patch
> for this purpose if you agree on that.

Or, in short, the attached. All the other callers of palloc_extended
benefit from that.
-- 
Michael

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 5cf388f..52b9c1b 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -973,6 +973,10 @@ palloc_extended(Size size, int flags)
 					 errmsg("out of memory"),
 					 errdetail("Failed on request of size %zu.", size)));
 		}
+		ereport(WARNING,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed on request of size %zu.", size)));
 		return NULL;
 	}
diff --git a/src/common/fe_memutils.c b/src/common/fe_memutils.c
index 58c5c4c..0d44ef6 100644
--- a/src/common/fe_memutils.c
+++ b/src/common/fe_memutils.c
@@ -30,9 +30,9 @@ pg_malloc_internal(size_t size, int flags)
 	tmp = malloc(size);
 	if (tmp == NULL)
 	{
+		fprintf(stderr, _("could not allocate %zu bytes of memory\n"), size);
 		if ((flags & MCXT_ALLOC_NO_OOM) == 0)
 		{
-			fprintf(stderr, _("out of memory\n"));
 			exit(EXIT_FAILURE);
 		}
 		return NULL;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)
On Thu, Sep 15, 2016 at 6:01 AM, Robert Haas wrote:
> On Fri, Sep 2, 2016 at 2:33 AM, Haribabu Kommi wrote:
>> On Wed, Aug 31, 2016 at 3:19 AM, Alvaro Herrera wrote:
>>> Haribabu Kommi wrote:
>>>
>>>> Apart from the above, here is the list of command tags that are
>>>> generated in the code. I took only the first word of each command tag
>>>> just to see how many categories are present. The number indicates the
>>>> subset of operations or number of types it is used for, like CREATE
>>>> TABLE, CREATE FUNCTION, etc.
>>>
>>> Sounds about right. I suppose all those cases that you aggregated here
>>> would expand to full tags in the actual code. I furthermore suppose
>>> that some of these could be ignored, such as the transaction ones and
>>> things like load, lock, move, fetch, discard, deallocate (maybe lump
>>> them all together under "other", or some other rough categorization, as
>>> Tom suggests).
>>
>> Following is the pg_stat_sql view with the SQL categories that I
>> considered important. The rest of them will be shown under the others
>> category.
>>
>> postgres=# \d pg_stat_sql
>>            View "pg_catalog.pg_stat_sql"
>>      Column      |           Type           | Modifiers
>> -----------------+--------------------------+-----------
>>  inserts         | bigint                   |
>>  deletes         | bigint                   |
>>  updates         | bigint                   |
>>  selects         | bigint                   |
>>  declare_cursors | bigint                   |
>>  closes          | bigint                   |
>>  creates         | bigint                   |
>>  drops           | bigint                   |
>>  alters          | bigint                   |
>>  imports         | bigint                   |
>>  truncates       | bigint                   |
>>  copies          | bigint                   |
>>  grants          | bigint                   |
>>  revokes         | bigint                   |
>>  clusters        | bigint                   |
>>  vacuums         | bigint                   |
>>  analyzes        | bigint                   |
>>  refreshs        | bigint                   |
>>  locks           | bigint                   |
>>  checkpoints     | bigint                   |
>>  reindexes       | bigint                   |
>>  deallocates     | bigint                   |
>>  others          | bigint                   |
>>  stats_reset     | timestamp with time zone |
>>
>> If any additions/deletions are needed, I can accommodate them.
> I think it is not a good idea to make the command names used here the
> plural forms of the command tags. Instead of "inserts", "updates",
> "imports", etc. just use "INSERT", "UPDATE", "IMPORT". That's simpler
> and less error prone - e.g. you won't end up with things like
> "refreshs", which is not a word.

Thanks for your suggestion. I also thought of changing the name while
originally writing "refreshs" as a column name of the view.

A small restriction with the change to command names is that the user
needs to execute the query as follows:

select pg_stat_sql.select from pg_stat_sql;

Updated patch is attached with the corrections. Apart from the name
change, I added documentation and tests.

Regards,
Hari Babu
Fujitsu Australia

pg_stat_sql_2.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix Error Message for allocate_recordbuf() Failure
On Wed, Sep 21, 2016 at 12:32 AM, Robert Haas wrote: > For what it's worth, I think it's fine. Good error messages are a useful > thing. > > More generally, I think the whole patch looks good and should be committed. Hm. I'd think that it is still more portable to just issue a WARNING message in palloc_extended() when MCXT_ALLOC_NO_OOM then. Other code paths could benefit from that as well, and the patch proposed does nothing for the other places calling it. I am fine to write a patch for this purpose if you agree on that. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] get_home_path: use HOME
2016-09-20 18:35 GMT+02:00, Tom Lane:
> Rudolf Gavlas writes:
>> The usage of HOME environment variable (if set) is IMO the right,
>> standard and faster way to get_home_path().
>
> Can you provide some evidence for that claim? I can believe "faster"
> but the rest sounds like wishful thinking.

1) NetBSD glob(3)
   http://netbsd.gw.com/cgi-bin/man-cgi?glob+3+NetBSD-current
   ENVIRONMENT
     HOME  If defined, used as the home directory of the current user in
           tilde expansions.

2) BIND
   https://nxr.netbsd.org/xref/src/external/bsd/bind/dist/bin/dig/dig.c#1765

3) less
   https://nxr.netbsd.org/xref/src/external/bsd/less/dist/cmdbuf.c#1403
   (https://nxr.netbsd.org/xref/src/external/bsd/less/dist/decode.c#533)

4) NetBSD sh(1)
   http://netbsd.gw.com/cgi-bin/man-cgi?sh+1+NetBSD-current
   ENVIRONMENT
     HOME  Set automatically by login(1) from the user's login directory
           in the password file (passwd(5)). This environment variable
           also functions as the default argument for the cd built-in.

5) bash(1) (version 4.3.39)
   Shell Variables
     The following variables are used by the shell. In some cases, bash
     assigns a default value to a variable; these cases are noted below.
     HOME  The home directory of the current user; the default argument
           for the cd builtin command. The value of this variable is also
           used when performing tilde expansion.

6) OpenLDAP
   https://nxr.netbsd.org/xref/src/external/bsd/openldap/dist/libraries/libldap/init.c#331

I've just grabbed what I have at hand; the list could go on ...

>> I work in an environment, where servers are administered by people
>> with different user names and identical uid (0).
>
> I think what you have there is an incredibly badly-designed system that
> can be expected to break outside software (eg, Postgres). If we take
> this patch, what's to stop someone from complaining that we broke *their*
> badly-designed system that abuses the HOME variable? I'm pretty hesitant
> to touch code that's worked the same way for a decade or two on such a
> basis.
I don't think this system is incredibly bad. But that's off-topic.

If you think that using the value of the HOME variable as the user's home
directory is a bad idea, I won't argue with that; I've already expressed
my opinion. The real problem here is using the home directory of user A
as the home directory for user B. That's clearly a bug, and if you want
to solve it without using HOME, I am fine with that.

r.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] get_home_path: use HOME
2016-09-20 18:55 GMT+02:00, Alvaro Herrera:
> Rudolf Gavlas wrote:
>
>> I work in an environment, where servers are administered by people
>> with different user names and identical uid (0).
>
> So everyone is superuser there? That sounds, um, unorthodox.

Yes, the administrators of the servers, that is, the people responsible
for installing, configuring and running all of the software on the
servers day and night, are superusers there. I am quite surprised it may
sound unorthodox. I am only used to unix environments, though. What is
the orthodox way of doing that, btw?

r.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exclude schema during pg_restore
On 9/19/16 3:23 PM, Michael Banck wrote: > Version 2 attached. Committed, thanks. I added the new option to the help output in pg_restore. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On Mon, Sep 19, 2016 at 03:50:38PM -0400, Robert Haas wrote:
> It will probably have some bugs, but they probably won't be worse than
> the status quo:
>
> WARNING: hash indexes are not WAL-logged and their use is discouraged
>
> Personally, I think it's outright embarrassing that we've had that
> limitation for years; it boils down to "hey, we have this feature but
> it doesn't work", which is a pretty crummy position for the world's
> most advanced open-source database to take.

No question. We inherited the technical debt of hash indexes 20 years
ago and haven't really solved it yet. We keep making incremental
improvements, which keeps it from being removed, but hash is still far
behind other index types.

> > I'm rather unenthused about having a hash index implementation that's
> > mildly better in some corner cases, but otherwise doesn't have much
> > benefit. That'll mean we'll have to step up our user education a lot,
> > and we'll have to maintain something for little benefit.
>
> If it turns out that it has little benefit, then we don't really need
> to step up our user education. People can just keep using btree like

The big problem is people coming from other databases and assuming our
hash indexes have the same benefits over btree that exist in some other
database software. The 9.5 warning at least helps with that.

--
Bruce Momjian http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] less expensive pg_buffercache on big shmem
On 09/02/2016 11:01 AM, Robert Haas wrote: On Fri, Sep 2, 2016 at 8:49 AM, Andres Freund wrote: On 2016-09-02 08:31:42 +0530, Robert Haas wrote: I wonder whether we ought to just switch from the consistent method to the semiconsistent method and call it good. +1. I think, before long, we're going to have to switch away from having locks & partitions in the first place. So I don't see a problem relaxing this. It's not like that consistency really buys you anything... I'd even consider not using any locks. I think we certainly want to lock the buffer header, because otherwise we might get a torn read of the buffer tag, which doesn't seem good. But it's not obvious to me that there's any point in taking the lock on the buffer mapping partition; I'm thinking that doesn't really do anything unless we lock them all, and we all seem to agree that's going too far. +1 from me to only locking the buffer headers. IMHO that's perfectly fine for the purpose of this extension. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] less expensive pg_buffercache on big shmem
On 09/05/2016 06:19 PM, Ivan Kartyshov wrote: On 09/03/2016 05:04 AM, Tomas Vondra wrote: This patch needs a rebase, as 06d7fd6e bumped the version to 1.2. Thank you for a valuable hint. So, will we get a rebased patch? I see the patch is back in 'needs review' but there's no new version. > If we will replace consistent method, then we should replace it with the > partially consistent method (called "nonconsistent") because: > 1) it's based on fast spinlocks (it's not fully up to its name, though) In other words, it does exactly the thing proposed up-thread, i.e. locking only buffer headers. What do you mean by fast spinlocks? And why aren't they up to the name? Not they (spinlocks), but the name “nonconsistent” is somewhat misleading. At the moment we can't implement concurrent shared memory access without locks in general, so most inconsistent method that has been proposed was the "nonconsistent" one. But roughly speaking *nonconsistent* is not as such by the name, because it contains a locking mechanism (spinlocks). I'm a bit confused by the naming, but my understanding is that nonconsistent only acquires (spin)locks on individual buffers, so it does not have a consistent view on shared_buffers as a whole, but data about individual buffers are consistent. I think that's fine and perfectly acceptable for this extension. > 2) it's *almost* the fastest one (the less time needed for execution of > method, the less data core will change and as a consequence the more > consistent snapshot will be) I'm not sure I understand what you're saying here? What do you mean by "almost the fastest one"? I mean the fastest one that has been proposed ("consistent"| "semiconsistent"| "nonconsistent"). I'm a bit confused by the results you reported before, i.e. that 1) nonconsistent is 10% faster than consistent method 2) semiconsistent is 5-times slower than nonconsistent method What does that mean? 
Are you referring to the duration of the queries reading data from
pg_buffercache, or to the impact on the other queries?

Here I mean "working duration time".

I have no idea what "working duration time" is. How can semiconsistent be
5x slower than nonconsistent? Wouldn't that make it significantly slower
than the consistent method?

Firstly, when we want to measure the quality of pg_buffercache, we must
measure several values:
1) Execution time (duration of the queries reading data from pg_buffercache)
2) How it influences the system (and other queries) during its work

Secondly, the semiconsistent method is slower than the nonconsistent and
consistent methods, but it has less influence on other queries than the
consistent method.

Thirdly, it is slower because it locks the partitions of shared memory in
a different way than the consistent or nonconsistent methods. The
semi-consistent strategy implies that only one buffer is locked at a
time. Therefore it has no significant effect on the system, but it takes
more time.

It'd be really useful if you could provide actual numbers and explain
what metrics you compare and how. I'm sure it makes sense to you but it's
utterly confusing for everyone else, and it also makes it impossible to
reproduce the benchmark.

I've looked at the code (applied on top of e3b607c) and I see this in
pg_buffercache_pages:

    for (j = 0; j < num_partit; j++)
    {
        if (snaptype == CONS_SNAP)
            for (i = 0; i < NUM_BUFFER_PARTITIONS; i++)
                LWLockAcquire(BufMappingPartitionLockByIndex(i), LW_SHARED);
        else if (snaptype == SEMICONS_SNAP)
            LWLockAcquire(BufMappingPartitionLockByIndex(j), LW_SHARED);

        for (i = 0; i < NBuffers; i++)
        {
            ... walk through all shared buffers ...
        }
    }

How is the SEMICONS_SNAP case supposed to only scan the currently locked
partition, when it simply scans all shared buffers for SEMICONS_SNAP?
That seems outright broken; it should scan only the currently locked
partition.
Secondly, I see this bit added to the loop over buffers:

    if (bufHdr->tag.forkNum == -1)
    {
        fctx->record[i].blocknum = InvalidBlockNumber;
        continue;
    }

and I have no idea why this is needed (when it was not needed before).

The patch also breaks a few comments, because it adds code between the
comment and the related code. For example the (likely unnecessary) bit of
code changes this:

    /* Lock each buffer header before inspecting. */
    buf_state = LockBufHdr(bufHdr);

into this

    /* Lock each buffer header before inspecting. */
    if (bufHdr->tag.forkNum == -1)
    {
        fctx->record[i].blocknum = InvalidBlockNumber;
        continue;
    }
    buf_state = LockBufHdr(bufHdr);

which is confusing (and I'd argue broken). Similarly when releasing the
locks, the original comment/code block looks like this:

    /*
     * And release locks. We do this in reverse order for two reasons:
     * (1) Anyone else who needs more than one of the locks w
Re: [HACKERS] Tracking wait event for latches
+
+      EventSet: The server process is waiting on a socket
+      or a timer. This wait happens in a latch, an inter-process facility
+      using boolean variables letting a process sleep until it is set.
+      Latches support several type of operations, like postmaster death
+      handling, timeout and socket activity lookup, and are a useful
+      replacement for sleep where signal handling matters.
+

This paragraph seems a bit confused. I suggest something more like this:
"The server process is waiting for one or more sockets, a timer or an
interprocess latch. The wait happens in a WaitEventSet, PostgreSQL's
portable IO multiplexing abstraction."

On Tue, Aug 23, 2016 at 7:01 PM, Michael Paquier wrote:
>> On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov
>> wrote:
>>> Would it be better to use an array here?
>
> Okay, I have switched to an array

I looked at various macro tricks[1] but they're all pretty unpleasant!
+1 for the simple array with carefully managed order. About that order...

+const char *const WaitEventNames[] = {
+	"ArchiverMain",
+	"AutoVacuumMain",
+	"BaseBackupThrottle",
+	"BgWorkerStartup",
+	"BgWorkerShutdown",
+	"BgWriterMain",
+	"BgWriterHibernate",
+	"CheckpointerMain",
+	"ExecuteGather",
+	"Extension",
+	"MessageQueuePutMessage",
+	"MessageQueueSend",
+	"MessageQueueReceive",
+	"MessageQueueInternal",
+	"ParallelFinish",
+	"PgStatMain",
+	"ProcSleep",
+	"ProcSignal",
+	"PgSleep",
+	"SecureRead",
+	"SecureWrite",
+	"SSLOpenServer",
+	"SyncRep",
+	"SysLoggerMain",
+	"RecoveryApplyDelay",
+	"RecoveryWalAll",
+	"RecoveryWalStream",
+	"WalReceiverWaitStart",
+	"WalReceiverMain",
+	"WalSenderMain",
+	"WalSenderWaitForWAL",
+	"WalSenderWriteData"
+	"WalWriterMain",
+};

It looks like this array wants to be in alphabetical order, but it isn't
quite. Also, perhaps a compile time assertion about the size of the array
matching EVENT_LAST_TYPE could be useful?
+typedef enum WaitEventIdentifier
+{
+	EVENT_ARCHIVER_MAIN,
+	EVENT_AUTOVACUUM_MAIN,
+	EVENT_BASEBACKUP_THROTTLE,
+	EVENT_BGWORKER_STARTUP,
+	EVENT_BGWORKER_SHUTDOWN,
+	EVENT_BGWRITER_MAIN,
+	EVENT_BGWRITER_HIBERNATE,
+	EVENT_CHECKPOINTER_MAIN,
+	EVENT_EXECUTE_GATHER,
+	EVENT_EXTENSION,
+	EVENT_MQ_PUT_MESSAGE,
+	EVENT_MQ_SEND_BYTES,
+	EVENT_MQ_RECEIVE_BYTES,
+	EVENT_MQ_WAIT_INTERNAL,
+	EVENT_PARALLEL_WAIT_FINISH,
+	EVENT_PGSTAT_MAIN,
+	EVENT_PROC_SLEEP,
+	EVENT_PROC_SIGNAL,
+	EVENT_PG_SLEEP,
+	EVENT_SECURE_READ,
+	EVENT_SECURE_WRITE,
+	EVENT_SSL_OPEN_SERVER,
+	EVENT_SYNC_REP,
+	EVENT_SYSLOGGER_MAIN,
+	EVENT_RECOVERY_APPLY_DELAY,
+	EVENT_RECOVERY_WAL_ALL,
+	EVENT_RECOVERY_WAL_STREAM,
+	EVENT_WAL_RECEIVER_WAIT_START,
+	EVENT_WAL_RECEIVER_MAIN,
+	EVENT_WAL_SENDER_WRITE_DATA,
+	EVENT_WAL_SENDER_MAIN,
+	EVENT_WAL_SENDER_WAIT_WAL,
+	EVENT_WALWRITER_MAIN
+} WaitEventIdentifier;

This is also nearly but not exactly in alphabetical order
(EVENT_PROC_SIGNAL comes after EVENT_PROC_SLEEP, for example). But it's
not currently possible to have the strings and the enumerators both in
alphabetical order because they're not the same, even accounting for
camel-case to upper-case transliteration. I think at least one of them
should be sorted. Shouldn't they match fully and then *both* be sorted
alphabetically? For example "MessageQueueInternal" doesn't match
EVENT_MQ_WAIT_INTERNAL. Then there are some cases where I'd expect
underscores for consistency with other enumerators and with the
corresponding camel-case strings: you have EVENT_WAL_SENDER_MAIN, but
EVENT_WALWRITER_MAIN.

The documentation is in a slightly different order again, but also not
exactly alphabetical: for example, ProcSleep is listed before ProcSignal.
Sorry if this all sounds terribly picky but I think we should try to be
strictly systematic here.

>>> EventIdentifier seems too general name for me, isn't it? Could we name it
>>> WaitEventIdentifier? Or WaitEventId for shortcut?
>
> ... And WaitEventIdentifier.
+1 from me too for avoiding the overly general term 'event'. It does seem a little odd to leave the enumerators names as EVENT_... though; shouldn't these be WAIT_EVENT_... or WE_...? Or perhaps you could consider WaitPointIdentifier and WP_SECURE_READ or WaitEventPointIdentifier and WEP_SECURE_READ, if you buy my earlier argument that what we are really naming here is point in the code where we wait, not the events we're waiting for. Contrast with LWLocks where we report the lock that you're waiting for, not the place in the code where you're waiting for that lock. On Wed, Aug 3, 2016 at 1:31 AM, Michael Paquier wrote: > Finally, I have changed the patch so as it does not track "Latch" > events, but "EventSet" events instead, per the suggestion of Thomas. > "WaitEventSet" is too close to wait_event in my taste so I shortened > the suggeston. This is good, because showing "Latch" when we were really waiting for a socket was misleading. On the other hand, if we could *accurately* report whether it's a "
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Hi,

On 09/19/2016 09:10 PM, Robert Haas wrote:
> It's possible that the effect of this patch depends on the number of
> sockets. EDB test machine cthulhu has 8 sockets, and power2 has 4
> sockets. I assume Dilip's tests were run on one of those two, although
> he doesn't seem to have mentioned which one. Your system is probably 2
> or 4 sockets, which might make a difference. Results might also depend
> on CPU architecture; power2 is, unsurprisingly, a POWER system, whereas
> I assume you are testing x86. Maybe somebody who has access should test
> on hydra.pg.osuosl.org, which is a community POWER resource. (Send me a
> private email if you are a known community member who wants access for
> benchmarking purposes.)

Yes, I'm using x86 machines:

1) large but slightly old
   - 4 sockets, e5-4620 (so a bit old CPU, 32 cores in total)
   - kernel 3.2.80

2) smaller but fresh
   - 2 sockets, e5-2620 v4 (newest type of Xeons, 16 cores in total)
   - kernel 4.8.0

> Personally, I find the results so far posted on this thread thoroughly
> unimpressive. I acknowledge that Dilip's results appear to show that in
> a best-case scenario these patches produce a rather large gain.
> However, that gain seems to happen in a completely contrived scenario:
> astronomical client counts, unlogged tables, and a test script that
> maximizes pressure on CLogControlLock. If you have to work that hard to
> find a big win, and tests under more reasonable conditions show no
> benefit, it's not clear to me that it's really worth the time we're all
> spending benchmarking and reviewing this, or the risk of bugs, or the
> damage to the SLRU abstraction layer. I think there's a very good
> chance that we're better off moving on to projects that have a better
> chance of helping in the real world.

I'm posting results from two types of workloads - traditional r/w
pgbench and Dilip's transaction - with synchronous_commit on/off.
Full results (including the script driving the benchmark) are available
here, if needed:

    https://bitbucket.org/tvondra/group-clog-benchmark/src

It'd be good if someone could try to reproduce this on a comparable
machine, to rule out my stupidity.


2 x e5-2620 v4 (16 cores, 32 with HT)
=====================================

On the "smaller" machine the results look like this - I have only tested
up to 64 clients, as higher values seem rather uninteresting on a machine
with only 16 physical cores. These are averages of 5 runs, where the
min/max for each group are within ~5% in most cases (see the "spread"
sheet). The "e5-2620" sheet also shows the numbers as % compared to
master.

 dilip / sync=off       1      4      8     16     32     64
 -----------------------------------------------------------
 master              4756  17672  35542  57303  74596  82138
 granular-locking    4745  17728  35078  56105  72983  77858
 no-content-lock     4646  17650  34887  55794  73273  79000
 group-update        4582  17757  35383  56974  74387  81794

 dilip / sync=on        1      4      8     16     32     64
 -----------------------------------------------------------
 master              4819  17583  35636  57437  74620  82036
 granular-locking    4568  17816  35122  56168  73192  78462
 no-content-lock     4540  17662  34747  55560  73508  79320
 group-update        4495  17612  35474  57095  74409  81874

 pgbench / sync=off     1      4      8     16     32     64
 -----------------------------------------------------------
 master              3791  14368  27806  43369  54472  62956
 granular-locking    3822  14462  27597  43173  56391  64669
 no-content-lock     3725  14212  27471  43041  55431  63589
 group-update        3895  14453  27574  43405  56783  62406

 pgbench / sync=on      1      4      8     16     32     64
 -----------------------------------------------------------
 master              3907  14289  27802  43717  56902  62916
 granular-locking    3770  14503  27636  44107  55205  63903
 no-content-lock     3772  14111  27388  43054  56424  64386
 group-update        3844  14334  27452  43621  55896  62498

There's pretty much no improvement at all - most of the results are
within 1-2% of master, in both directions. Hardly a win. Actually, with 1
client there seems to be a ~5% regression, but it might also be noise and
verifying it would require further testing.


4 x e5-4620 v1 (32 cores, 64 with HT)
=====================================

These are averages of 10 runs, and there are a few strange things here.

Firstly, for Dilip's workload the results get much (much) worse between
64 and 128 clients, for some reason.
I suspect this might be due to a fairly old kernel (3.2.80), so I'll
reboot the machine with a 4.5.x kernel and try again.

Secondly, the min/max differences get much larger than the ~5% on the smaller
Re: [HACKERS] gratuitous casting away const
> On Sep 20, 2016, at 1:06 PM, Tom Lane wrote:
>
> Mark Dilger writes:
>> Would patches to not cast away const be considered?
>
> In general, yes, but I'm not especially in favor of something like this:
>
>> bool
>> PageIndexTupleOverwrite(Page page, OffsetNumber offnum,
>> -                       Item newtup, Size newsize)
>> +                       const char *newtup, Size newsize)
>> {
>
> since that seems to be discarding type information in order to add
> "const"; does not seem like a net benefit from here.

The following seems somewhere in between, with ItemPointer changing to
const ItemPointerData *. I expect you would not care for this change, but
thought I'd check to see where you draw the line:

diff --git a/src/backend/access/gin/ginbulk.c b/src/backend/access/gin/ginbulk.c
index 71c64e4..903b01f 100644
--- a/src/backend/access/gin/ginbulk.c
+++ b/src/backend/access/gin/ginbulk.c
@@ -244,7 +244,7 @@ ginInsertBAEntries(BuildAccumulator *accum,
 static int
 qsortCompareItemPointers(const void *a, const void *b)
 {
-	int res = ginCompareItemPointers((ItemPointer) a, (ItemPointer) b);
+	int res = ginCompareItemPointers((const ItemPointerData *) a, (const ItemPointerData *) b);
 
 	/* Assert that there are no equal item pointers being sorted */
 	Assert(res != 0);
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index bf589ab..2e5a7dff 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -968,7 +968,7 @@ extern ItemPointer ginMergeItemPointers(ItemPointerData *a, uint32 na,
  * so we want this to be inlined.
  */
 static inline int
-ginCompareItemPointers(ItemPointer a, ItemPointer b)
+ginCompareItemPointers(const ItemPointerData *a, const ItemPointerData *b)
 {
 	uint64 ia = (uint64) a->ip_blkid.bi_hi << 32 | (uint64) a->ip_blkid.bi_lo << 16 | a->ip_posid;
 	uint64 ib = (uint64) b->ip_blkid.bi_hi << 32 | (uint64) b->ip_blkid.bi_lo << 16 | b->ip_posid;

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] needlessly casting away const in localtime.c
Mark Dilger writes: > Friends, per the recent thread "gratuitous casting away const", the > assignment on line 1247 of localtime.c has const lvalue and rvalue, > yet casts through (char *) rather than (const char *). Fix attached. If you want to propose cosmetic improvements in localtime.c, or the majority of the other stuff in src/timezone, you need to send it to the IANA folk. Otherwise it'll just be lost next time we sync with them. See src/timezone/README. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] needlessly casting away const in localtime.c
Friends, per the recent thread "gratuitous casting away const", the assignment on line 1247 of localtime.c has const lvalue and rvalue, yet casts through (char *) rather than (const char *). Fix attached. Mark Dilger localtime.c.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] increasing the default WAL segment size
On 2016-09-20 16:32:46 -0400, Robert Haas wrote:
> > Requiring a non-default compile time or even just cluster creation time
> > option for tuning isn't something worth expending energy on imo.
>
> I don't agree. The latency requirements on an archive_command when
> you're churning out 16MB files multiple times per second are insanely
> tight, and saying that we shouldn't increase the size because it's
> better to go redesign a bunch of other things that will eventually
> *maybe* remove the need for archive_command does not seem like a
> reasonable response.

Oh, I'm on board with increasing the default size a bit. A different
default size isn't a non-default compile time option anymore though, and
I don't think 1GB is a reasonable default.

Running multiple archive_commands concurrently - pretty easy to
implement - isn't the same as removing the need for archive_command. I'm
pretty sure that continuously, and if necessary concurrently, archiving
a bunch of 64MB files is going to work better than irregularly creating
/ transferring 1GB files.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] increasing the default WAL segment size
On Tue, Sep 20, 2016 at 4:25 PM, Andres Freund wrote: >> Even with a 1GB segment size, some of them will fill multiple files >> per minute. At the current limit of 64MB, a few of them would still >> fill more than one file per second. That is not sane. > > I doubt generating much larger files actually helps a lot there. I bet > you a patch review that 1GB files are going to regress in pretty much > every situation; especially when taking latency into account. Well, you have a point: let's find out. Suppose we create a cluster that generates WAL very quickly, and then try different WAL segment sizes and see what works out best. Maybe something like: create N relatively small tables, with 100 or so tuples in each one. Have N backends, each assigned one of those tables, and it just updates all the rows over and over in a tight loop. Or feel free to suggest something else. > I think what's actually needed for that is: > - make it easier to implement archiving via streaming WAL; i.e. make > pg_receivexlog actually usable > - make archiving parallel > - decouple WAL write & fsyncing granularity from segment size > > Requiring a non-default compile time or even just cluster creation time > option for tuning isn't something worth expanding energy on imo. I don't agree. The latency requirements on an archive_command when you're churning out 16MB files multiple times per second are insanely tight, and saying that we shouldn't increase the size because it's better to go redesign a bunch of other things that will eventually *maybe* remove the need for archive_command does not seem like a reasonable response. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] increasing the default WAL segment size
On 2016-09-20 16:18:02 -0400, Robert Haas wrote:
> On Tue, Sep 20, 2016 at 4:09 PM, Andres Freund wrote:
> > That sounds way too big to me. WAL file allocation would trigger pretty
> > massive IO storms during zeroing, max_wal_size is going to be hard to
> > tune, the amounts of dirty data during bulk loads is going to be very
> > hard to control. If somebody wants to do something like this they
> > better be well informed enough to override a #define.
>
> EnterpriseDB has customers generating multiple TB of WAL per day.

Sure, that's kind of common.

> Even with a 1GB segment size, some of them will fill multiple files
> per minute. At the current limit of 64MB, a few of them would still
> fill more than one file per second. That is not sane.

I doubt generating much larger files actually helps a lot there. I bet
you a patch review that 1GB files are going to regress in pretty much
every situation, especially when taking latency into account.

I think what's actually needed for that is:
- make it easier to implement archiving via streaming WAL; i.e. make
  pg_receivexlog actually usable
- make archiving parallel
- decouple WAL write & fsyncing granularity from segment size

Requiring a non-default compile time or even just cluster creation time
option for tuning isn't something worth expending energy on imo.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] increasing the default WAL segment size
On Tue, Sep 20, 2016 at 4:09 PM, Andres Freund wrote: > On 2016-09-20 16:05:44 -0400, Robert Haas wrote: >> On Tue, Sep 20, 2016 at 2:49 PM, Peter Eisentraut >> wrote: >> > On 8/24/16 9:31 PM, Robert Haas wrote: >> >> I'd like to propose that we increase the default WAL segment size, >> >> which is currently 16MB. >> > >> > While the discussion about the best default value is ongoing, maybe we >> > should at least *allow* some larger sizes, for testing out. Currently, >> > configure says "Allowed values are 1,2,4,8,16,32,64.". What might be a >> > good new upper limit? > > I'm doubtful it's worth increasing this. > >> 1GB? > > That sounds way too big to me. WAL file allocation would trigger pretty > massive IO storms during zeroing, max_wal_size is going to be hard to > tune, the amounts of dirty data during bulk loads is going to be very > hard to control. If somebody wants to do something like this they > better be well informed enough to override a #define. EnterpriseDB has customers generating multiple TB of WAL per day. Even with a 1GB segment size, some of them will fill multiple files per minute. At the current limit of 64MB, a few of them would still fill more than one file per second. That is not sane. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] increasing the default WAL segment size
On 2016-09-20 16:05:44 -0400, Robert Haas wrote: > On Tue, Sep 20, 2016 at 2:49 PM, Peter Eisentraut > wrote: > > On 8/24/16 9:31 PM, Robert Haas wrote: > >> I'd like to propose that we increase the default WAL segment size, > >> which is currently 16MB. > > > > While the discussion about the best default value is ongoing, maybe we > > should at least *allow* some larger sizes, for testing out. Currently, > > configure says "Allowed values are 1,2,4,8,16,32,64.". What might be a > > good new upper limit? I'm doubtful it's worth increasing this. > 1GB? That sounds way too big to me. WAL file allocation would trigger pretty massive IO storms during zeroing, max_wal_size is going to be hard to tune, the amounts of dirty data during bulk loads is going to be very hard to control. If somebody wants to do something like this they better be well informed enough to override a #define. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, Sep 20, 2016 at 3:31 PM, Peter Eisentraut wrote: > On 9/20/16 11:47 AM, Robert Haas wrote: >> #define BGWORKER_CLASS_MASK 0x00f0 >> #define BGWORKER_CLASS_PARALLEL 0x0010 >> /* add additional bgworker classes here */ >>> > >>> > Unless we have a mechanism that makes use of this somehow, this attempt >>> > at categorizing might be premature. >> My main reason for wanting is that I suspect there will be a similar >> desire to limit the number of replication workers at some point. > > Would that work for something developed as an extension? Would we all > have to agree on non-conflicting numbers? Maybe a string tag would work > better? No, I'm assuming that the classes would be built-in. A string tag seems like over-engineering to me, particularly because the postmaster needs to switch on the tag, and we need to be very careful about the degree to which the postmaster trusts the contents of shared memory. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
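Robert's bitmask scheme, sketched minimally (the replication class value and the helper function are hypothetical additions for illustration, not part of his proposal):

```c
#include <assert.h>

/* Class bits carved out of bgw_flags, per Robert's sketch upthread. */
#define BGWORKER_CLASS_MASK         0x00f0
#define BGWORKER_CLASS_PARALLEL     0x0010
/* add additional bgworker classes here, e.g. (hypothetical): */
#define BGWORKER_CLASS_REPLICATION  0x0020

/* Extract the class, so a launcher could count workers per class. */
static int
bgworker_class(int bgw_flags)
{
    return bgw_flags & BGWORKER_CLASS_MASK;
}
```

With built-in numeric classes the postmaster only ever switches over a handful of known integer values, which is the trust argument made above against string tags read from shared memory.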
Re: [HACKERS] gratuitous casting away const
Mark Dilger writes: > Would patches to not cast away const be considered? In general, yes, but I'm not especially in favor of something like this: > bool > PageIndexTupleOverwrite(Page page, OffsetNumber offnum, > - Item newtup, Size newsize) > + const char *newtup, Size > newsize) > { since that seems to be discarding type information in order to add "const"; does not seem like a net benefit from here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] increasing the default WAL segment size
On Tue, Sep 20, 2016 at 2:49 PM, Peter Eisentraut wrote: > On 8/24/16 9:31 PM, Robert Haas wrote: >> I'd like to propose that we increase the default WAL segment size, >> which is currently 16MB. > > While the discussion about the best default value is ongoing, maybe we > should at least *allow* some larger sizes, for testing out. Currently, > configure says "Allowed values are 1,2,4,8,16,32,64.". What might be a > good new upper limit? 1GB? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] gratuitous casting away const
Friends, There are places in the code that cast away const for no apparent reason, fixed like such: diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c index 3143bd9..fe2cfc2 100644 --- a/src/backend/executor/nodeIndexscan.c +++ b/src/backend/executor/nodeIndexscan.c @@ -409,8 +409,8 @@ static int reorderqueue_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg) { - ReorderTuple *rta = (ReorderTuple *) a; - ReorderTuple *rtb = (ReorderTuple *) b; + const ReorderTuple *rta = (const ReorderTuple *) a; + const ReorderTuple *rtb = (const ReorderTuple *) b; IndexScanState *node = (IndexScanState *) arg; return -cmp_orderbyvals(rta->orderbyvals, rta->orderbynulls, There are also places which appear to cast away const due to using typedefs that don't include const, which make for a more messy fix, like such: diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c index 73aa0c0..9e157a3 100644 --- a/src/backend/storage/page/bufpage.c +++ b/src/backend/storage/page/bufpage.c @@ -1037,7 +1037,7 @@ PageIndexTupleDeleteNoCompact(Page page, OffsetNumber offnum) */ bool PageIndexTupleOverwrite(Page page, OffsetNumber offnum, - Item newtup, Size newsize) + const char *newtup, Size newsize) { PageHeader phdr = (PageHeader) page; ItemId tupid; diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h index ad4ab5f..cd97a1a 100644 --- a/src/include/storage/bufpage.h +++ b/src/include/storage/bufpage.h @@ -431,7 +431,7 @@ extern void PageIndexTupleDelete(Page page, OffsetNumber offset); extern void PageIndexMultiDelete(Page page, OffsetNumber *itemnos, int nitems); extern void PageIndexTupleDeleteNoCompact(Page page, OffsetNumber offset); extern bool PageIndexTupleOverwrite(Page page, OffsetNumber offnum, - Item newtup, Size newsize); + const char *newtup, Size newsize); extern char *PageSetChecksumCopy(Page page, BlockNumber blkno); extern void PageSetChecksumInplace(Page page, 
BlockNumber blkno); Are these castings away of const consistent with the project's coding standards? Would patches to not cast away const be considered? Thanks, Mark Dilger -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
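The first category of fix Mark shows, keeping const through the cast, looks like this in a standalone comparator (a generic sketch, not the PostgreSQL pairingheap types):

```c
#include <assert.h>

typedef struct ReorderTupleSketch
{
    int sortkey;
} ReorderTupleSketch;

/* Comparator that preserves const on both operands instead of
 * casting it away. */
static int
reorder_cmp(const void *a, const void *b)
{
    const ReorderTupleSketch *ra = (const ReorderTupleSketch *) a;
    const ReorderTupleSketch *rb = (const ReorderTupleSketch *) b;

    return (ra->sortkey > rb->sortkey) - (ra->sortkey < rb->sortkey);
}
```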
Re: [HACKERS] Rename max_parallel_degree?
On 9/20/16 11:47 AM, Robert Haas wrote: >>> >> #define BGWORKER_CLASS_MASK 0x00f0 >>> >> #define BGWORKER_CLASS_PARALLEL 0x0010 >>> >> /* add additional bgworker classes here */ >> > >> > Unless we have a mechanism that makes use of this somehow, this attempt >> > at categorizing might be premature. > My main reason for wanting is that I suspect there will be a similar > desire to limit the number of replication workers at some point. Would that work for something developed as an extension? Would we all have to agree on non-conflicting numbers? Maybe a string tag would work better? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE
On 9/20/16 3:04 PM, David Fetter wrote: > On Fri, Sep 09, 2016 at 09:57:21AM -0400, Peter Eisentraut wrote: >> Review of the patch in the commit fest: >> >> - The documentation is a bit incorrect about the ways to load this >> module. shared_preload_libraries is not necessary. session_ and >> local_ (with prep) should also work. > > Would you be so kind as to describe how you got > local_preload_libraries to work? I'm stuck on getting Makefile to > realize that the hook should be installed in $libdir/plugins rather > than $libdir itself. There is no Makefile support for that, and AFAICT, that particular feature is kind of obsolescent. I wouldn't worry about it too much. The main point was, there are multiple ways to load modules. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE
On Fri, Sep 09, 2016 at 09:57:21AM -0400, Peter Eisentraut wrote: > Review of the patch in the commit fest: > > - The documentation is a bit incorrect about the ways to load this > module. shared_preload_libraries is not necessary. session_ and > local_ (with prep) should also work. Would you be so kind as to describe how you got local_preload_libraries to work? I'm stuck on getting Makefile to realize that the hook should be installed in $libdir/plugins rather than $libdir itself. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] get_home_path: use HOME
Rudolf Gavlas wrote: > 2016-09-20 18:55 GMT+02:00, Alvaro Herrera : > > Rudolf Gavlas wrote: > > > >> I work in an environment, where servers are administered by people > >> with different user names and identical uid (0). > > > > So everyone is superuser there? That sounds, um, unorthodox. > > Yes, the administrators of the servers, that means people responsible > for installing, configuring and running all of the software on the > servers day and night are superusers there. I am quite surprised it > may sound unorthodox. I am only used to unix environment though. What > is the orthodox way of doing that, btw? In my view of the world, each of the admins would have a regular user, with the privilege of running commands as superuser using something like "sudo" (including running a shell). get_home_path is psql's code. I would expect client connections to come from regular users, as it is considered risky to run all code with elevated privileges, anyway. As I recall, if you tried to start the postgres server using a superuser account you would quickly find out that it completely refuses to start. I suppose it works because some start script su's to the postgres unprivileged account to run pg_ctl. (Windows is an exception to this, where it used to be customary to run servers using administrator privileges, where instead of outright refusing to run, pg_ctl would drop all privileges first.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] "Some tests to cover hash_index"
Why not use generate_series() queries to insert the appropriate number of tuples, instead of a handful of INSERT lines each time? Since each insert is a separate transaction, that would probably be faster. Why do you have a plpgsql function just to create a cursor? Wouldn't it be simpler to create the cursor in an SQL statement? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
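Alvaro's two suggestions in SQL form (a sketch; the table name and row count are placeholders, not taken from the test patch):

```sql
-- Instead of many single-row INSERT statements, each its own transaction:
INSERT INTO hash_test (keycol)
SELECT i FROM generate_series(1, 1000) AS g(i);

-- And a cursor can be declared directly in SQL, no plpgsql wrapper needed:
BEGIN;
DECLARE c CURSOR FOR SELECT keycol FROM hash_test ORDER BY keycol;
FETCH 5 FROM c;
COMMIT;
```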
Re: [HACKERS] increasing the default WAL segment size
On 8/24/16 9:31 PM, Robert Haas wrote: > I'd like to propose that we increase the default WAL segment size, > which is currently 16MB. While the discussion about the best default value is ongoing, maybe we should at least *allow* some larger sizes, for testing out. Currently, configure says "Allowed values are 1,2,4,8,16,32,64.". What might be a good new upper limit? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
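For reference, the limit under discussion is a configure-time option, so testing a larger size today means rebuilding (the value shown is one of the currently allowed sizes):

```shell
# Build with 64 MB WAL segments instead of the default 16 MB;
# configure rejects values outside the allowed 1,2,4,8,16,32,64 list.
./configure --with-wal-segsize=64
make
```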
Re: [HACKERS] Tuplesort merge pre-reading
On Fri, Sep 9, 2016 at 9:51 PM, Claudio Freire wrote: > On Fri, Sep 9, 2016 at 8:13 AM, Heikki Linnakangas wrote: >> >> Claudio, if you could also repeat the tests you ran on Peter's patch set on >> the other thread, with these patches, that'd be nice. These patches are >> effectively a replacement for >> 0002-Use-tuplesort-batch-memory-for-randomAccess-sorts.patch. And review >> would be much appreciated too, of course. >> >> Attached are new versions. Compared to last set, they contain a few comment >> fixes, and a change to the 2nd patch to not allocate tape buffers for tapes >> that were completely unused. > > > Will do so Well, here they are, the results. ODS format only (unless you've got issues opening the ODS). The results seem all over the map. Some regressions seem significant (both in the amount of performance lost and their significance, since all 4 runs show a similar regression). The worst being "CREATE INDEX ix_lotsofitext_zz2ijw ON lotsofitext (z, z2, i, j, w);" with 4GB work_mem, which should be an in-memory sort, which makes it odd. I will re-run it overnight just in case to confirm the outcome. logtape_preload_timings.ods Description: application/vnd.oasis.opendocument.spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 09/16/2016 02:42 AM, Amit Kapila wrote: Okay, Thanks for pointing out the same. I have fixed it. Apart from that, I have changed _hash_alloc_buckets() to initialize the page instead of making it completely Zero because of problems discussed in another related thread [1]. I have also updated README. Thanks. This needs a rebase against the CHI v8 [1] patch. [1] https://www.postgresql.org/message-id/CAA4eK1+X=8sud1uczdzne3d9cgi9kw+kjxp2tnw7sx5w8pl...@mail.gmail.com Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
Hi, On 09/20/2016 01:46 PM, Ashutosh Sharma wrote: I am getting a fatal error along with some warnings when trying to apply the v3 patch using "git apply" command. [ashu@localhost postgresql]$ git apply 0001-pageinspect-Hash-index-support_v3.patch 0001-pageinspect-Hash-index-support_v3.patch:29: trailing whitespace. brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) 0001-pageinspect-Hash-index-support_v3.patch:36: trailing whitespace. DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ 0001-pageinspect-Hash-index-support_v3.patch:37: trailing whitespace. pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ 0001-pageinspect-Hash-index-support_v3.patch:38: trailing whitespace. pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ 0001-pageinspect-Hash-index-support_v3.patch:39: trailing whitespace. pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql fatal: git apply: bad git-diff - expected /dev/null on line 46 Which platform are you on ? The development branch is @ https://github.com/jesperpedersen/postgres/tree/pageinspect_hash Also, i think the USAGE for hash_metap() is refering to hash_metap_bytea(). Please correct it. I have just started reviewing the patch, will keep on posting my comments upon further review. Fixed in v4. Thanks for the review. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On 09/20/2016 12:45 PM, Jeff Janes wrote: Is the 2nd "1" in this call needed? SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1) As far as I can tell, that argument is only used to stuff into the output field "blkno", it is not used to instruct the interpretation of the raw page itself. It doesn't seem worthwhile to have the parameter that only echos back to the user what the user already put in (twice). The only existing funtions which take the blkno argument are those that don't use the get_raw_page style. Also, should we document what the single letter values mean in the hash_page_stats.type column? It is not obvious that 'i' means bitmap, for example. Adjusted in v4. Code/doc will need an update once the CHI patch goes in. Best regards, Jesper >From 1f27a2bb28cc6dfea9cba015d7cceab768f67d0a Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 403 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 63 contrib/pageinspect/pageinspect--1.5.sql | 279 -- contrib/pageinspect/pageinspect--1.6.sql | 338 + contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 104 +++ 7 files changed, 914 insertions(+), 285 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - 
pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..6efbf22 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,403 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "funcapi.h" +#include "miscadmin.h" + +PG_FUNCTION_INFO_V1(hash_metap); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + char type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* + * Verify that the given bytea contains a HASH page, or die in the attempt. + * A pointer to the page is returned. 
+ */ +static Page +verify_hash_page(bytea *raw_page) +{ + Page page; + int raw_page_size; + HashPageOpaque pageopaque; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + + if (raw_page_size != BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); + + page = VARDATA(raw_page); + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page); + if (pageopaque->hasho_page_id != HASHO_PAGE_ID) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", + HASHO_PAGE_ID, pageopaque->hasho_page_id))); + + return page; +} + +/* - + * GetHashPageStatistics() + * + * Collect statistics of single hash page + * - + */ +static void +GetHashPageStatistics(Page page, HashPageStat *stat) +{ + PageHeader phdr = (PageHeader) page; + OffsetNumber maxoff = PageGetMaxOffsetNumber(pa
Re: [HACKERS] pageinspect: Hash index support
Hi, I am getting a fatal error along with some warnings when trying to apply the v3 patch using "git apply" command. [ashu@localhost postgresql]$ git apply 0001-pageinspect-Hash-index-support_v3.patch 0001-pageinspect-Hash-index-support_v3.patch:29: trailing whitespace. brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) 0001-pageinspect-Hash-index-support_v3.patch:36: trailing whitespace. DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ 0001-pageinspect-Hash-index-support_v3.patch:37: trailing whitespace. pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ 0001-pageinspect-Hash-index-support_v3.patch:38: trailing whitespace. pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ 0001-pageinspect-Hash-index-support_v3.patch:39: trailing whitespace. pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql fatal: git apply: bad git-diff - expected /dev/null on line 46 Also, I think the USAGE for hash_metap() is referring to hash_metap_bytea(). Please correct it. I have just started reviewing the patch, will keep on posting my comments upon further review. Thanks. With Regards, Ashutosh Sharma. EnterpriseDB: http://www.enterprisedb.com On Tue, Sep 20, 2016 at 10:15 PM, Jeff Janes wrote: > On Tue, Sep 20, 2016 at 5:40 AM, Jesper Pedersen > wrote: >> >> On 09/20/2016 03:19 AM, Michael Paquier wrote: >>> >>> You did not get right the comments from Alvaro upthread. 
The following >>> functions are added with this patch: >>> function hash_metap(text) >>> function hash_metap_bytea(bytea) >>> function hash_page_items(text,integer) >>> function hash_page_items_bytea(bytea) >>> function hash_page_stats(text,integer) >>> function hash_page_stats_bytea(bytea,integer) >>> >>> Now the following set of functions would be sufficient: >>> function hash_metapage_info(bytea) >>> function hash_page_items(bytea) >>> function hash_page_stats(bytea) >>> The last time pageinspect has been updated, when BRIN functions have >>> been added, it has been discussed to just use (bytea) as an argument >>> interface and just rely on get_raw_page() to get the pages wanted, so >>> I think that we had better stick with that and keep things simple. >>> >> >> Yes, I know, Alvaro and you voted for the bytea methods, and Jeff asked >> for both. >> >> Attached is v3 with only the bytea based methods. >> >> Alvaro, Michael and Jeff - Thanks for the review ! > > > > Is the 2nd "1" in this call needed? > > SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1) > > As far as I can tell, that argument is only used to stuff into the output > field "blkno", it is not used to instruct the interpretation of the raw page > itself. It doesn't seem worthwhile to have the parameter that only echos > back to the user what the user already put in (twice). The only existing > funtions which take the blkno argument are those that don't use the > get_raw_page style. > > Also, should we document what the single letter values mean in the > hash_page_stats.type column? It is not obvious that 'i' means bitmap, for > example. > > Cheers, > > Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvements in psql hooks for variables
Ashutosh Bapat wrote: > You may want to add this to the November commitfest > https://commitfest.postgresql.org/11/. Done. It's at https://commitfest.postgresql.org/11/799/ Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 20/09/16 17:47, Robert Haas wrote: > On Tue, Sep 20, 2016 at 11:27 AM, Peter Eisentraut > wrote: >> On 9/19/16 2:18 PM, Robert Haas wrote: >>> I suggest that we make it part of bgw_flags, but use a bitmask for it, >>> like this: >>> >>> #define BGWORKER_CLASS_MASK 0x00f0 >>> #define BGWORKER_CLASS_PARALLEL 0x0010 >>> /* add additional bgworker classes here */ >> >> Unless we have a mechanism that makes use of this somehow, this attempt >> at categorizing might be premature. > > My main reason for wanting is that I suspect there will be a similar > desire to limit the number of replication workers at some point. > Yes there definitely is desire to use this for replication workers as well. The logical replication launcher currently handles the limit by itself but I'd definitely prefer to reuse something more generic. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/Postgr
On Tue, Sep 20, 2016 at 1:03 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, May 26, 2016 at 7:44 PM, Michael Paquier >> wrote: >>> Yeah, it is definitely better to register it. And I have switched the >>> patch as ready for committer just now. > >> Committed and back-patched to 9.4, where DSM was introduced. > > ISTM both the previous coding and this version can fail for no good > reason, that is what if GetLastError() happens to return one of these > error codes as a result of some unrelated failure from before this > subroutine is entered? That is, wouldn't it be a good idea to > do SetLastError(0) before calling CreateFileMapping? Or is the latter > guaranteed to do that on success? I don't see that stated in its > man page. I'll leave it to people who know more about Windows than I do to opine on that. I suspect it's not too seriously broken because we've managed to cut two (almost three) major releases since this code was written without any complaints about that. But there may well be something that can be tightened up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.851401618”: Permission
On Tue, Sep 20, 2016 at 12:53 PM, Tom Lane wrote: > Robert Haas writes: >> Yeah, random() is the wrong thing. It should use PostmasterRandom(). >> Fixed to do that instead. > > I am not very happy about this patch; have you considered the security > implications of what you just did? Hmm. No. > If you haven't, I'll tell you: > you just made the postmaster's selection of "random" cancel keys and > password salts a lot more predictable. Formerly, the srandom() seed > for those depended on both the postmaster start time and the time of > the first connection request, but this change removes the first > connection request from the equation. If you know the postmaster start > time --- which we will happily tell any asker --- it will not take too > many trials to find the seed that's in use. Realistically, in some large percentage of the real-world installs, that's not going to take too many trials anyway. People don't generally start a postmaster so that they can NOT connect to it, and there are plenty of real-world installations where you can count on the first connection happening in well under 1s. I'd suggest that if you're relying on that time being a secret for anything very security-critical, you're already in trouble. > I'd be the first to agree that this point is inadequately documented > in the code, but PostmasterRandom should be reserved for its existing > security-related uses, not exposed to the world for (ahem) random other > uses. So, we could have dsm_postmaster_startup() seed the random number generator itself, and then let PostmasterRandom() override the seed later. Like maybe: struct timeval tv; gettimeofday(&tv, NULL); srandom(tv.tv_sec); ... dsm_control_handle = random(); dsm_postmaster_startup() doesn't care very much about whether an adversary can predict the chosen DSM control segment ID, but it doesn't want to keep picking the same one. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
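Tom's attack, reduced to a toy (the timestamps below are made up, and real cancel-key generation is more involved; the thread discusses srandom()/random(), but ISO C rand() is used here so the sketch stays portable):

```c
#include <assert.h>
#include <stdlib.h>

/* A value derived from a PRNG seeded only with a publicly known time is
 * no secret: an attacker who knows roughly when the seed was taken can
 * replay nearby seeds until the output matches. */
static int
key_from_seed(unsigned int seed)
{
    srand(seed);
    return rand();
}

static unsigned int
recover_seed(int observed, unsigned int first, unsigned int count)
{
    for (unsigned int s = first; s < first + count; s++)
        if (key_from_seed(s) == observed)
            return s;
    return 0;                   /* not found */
}
```

This is why a seed that also depends on a value the attacker cannot observe (such as the first-connection time, in the old scheme) meaningfully enlarges the search space.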
Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.85140
Robert Haas writes: > On Thu, May 26, 2016 at 7:44 PM, Michael Paquier > wrote: >> Yeah, it is definitely better to register it. And I have switched the >> patch as ready for committer just now. > Committed and back-patched to 9.4, where DSM was introduced. ISTM both the previous coding and this version can fail for no good reason, that is what if GetLastError() happens to return one of these error codes as a result of some unrelated failure from before this subroutine is entered? That is, wouldn't it be a good idea to do SetLastError(0) before calling CreateFileMapping? Or is the latter guaranteed to do that on success? I don't see that stated in its man page. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
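Tom's concern, a stale error code from an earlier failure being misattributed to the call just made, is the same pattern as the classic errno pitfall, sketched here with errno standing in for GetLastError() and clearing it standing in for SetLastError(0):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Clear the error indicator before the call, so that a nonzero value
 * afterwards is known to have come from *this* call and not from some
 * unrelated earlier failure. */
static long
parse_long_checked(const char *s, int *had_error)
{
    long v;

    errno = 0;                  /* analogous to SetLastError(0) */
    v = strtol(s, NULL, 10);
    *had_error = (errno != 0);
    return v;
}
```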
Re: [HACKERS] [PATCH] get_home_path: use HOME
Rudolf Gavlas wrote: > I work in an environment, where servers are administered by people > with different user names and identical uid (0). So everyone is superuser there? That sounds, um, unorthodox. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Thu, Sep 15, 2016 at 11:42 PM, Amit Kapila wrote: > > Okay, Thanks for pointing out the same. I have fixed it. Apart from > that, I have changed _hash_alloc_buckets() to initialize the page > instead of making it completely Zero because of problems discussed in > another related thread [1]. I have also updated README. > > with v7 of the concurrent hash patch and v4 of the write ahead log patch and the latest relcache patch (I don't know how important that is to reproducing this, I suspect it is not), I once got this error: 38422 0 2016-09-19 16:25:50.055 PDT:LOG: database system was interrupted; last known up at 2016-09-19 16:25:49 PDT 38422 0 2016-09-19 16:25:50.057 PDT:LOG: database system was not properly shut down; automatic recovery in progress 38422 0 2016-09-19 16:25:50.057 PDT:LOG: redo starts at 3F/2200DE90 38422 01000 2016-09-19 16:25:50.061 PDT:WARNING: page verification failed, calculated checksum 65067 but expected 21260 38422 01000 2016-09-19 16:25:50.061 PDT:CONTEXT: xlog redo at 3F/22053B50 for Hash/ADD_OVFL_PAGE: bmsize 4096, bmpage_found T 38422 XX001 2016-09-19 16:25:50.071 PDT:FATAL: invalid page in block 9 of relation base/16384/17334 38422 XX001 2016-09-19 16:25:50.071 PDT:CONTEXT: xlog redo at 3F/22053B50 for Hash/ADD_OVFL_PAGE: bmsize 4096, bmpage_found T The original page with the invalid checksum is: $ od 16384_17334_9 000 32 00 015420 077347 020404 00 30 017760 020 017760 020004 00 00 00 00 00 00 040 00 00 00 00 00 00 00 00 * 0017760 07 02 17 17 07 00 02 177600 002 If I ignore the checksum failure and re-start the system, the page gets restored to be a bitmap page. Cheers, Jeff
Re: [HACKERS] Use of SizeOfIptrData - is that obsolete?
On Tue, Sep 20, 2016 at 8:34 PM, Tom Lane wrote: > Pavan Deolasee writes: > > I happened to notice this comment in src/include/storage/itemptr.h > > > * Note: because there is an item pointer in each tuple header and index > > * tuple header on disk, it's very important not to waste space with > > * structure padding bytes. The struct is designed to be six bytes long > > * (it contains three int16 fields) but a few compilers will pad it to > > * eight bytes unless coerced. We apply appropriate persuasion where > > * possible, and to cope with unpersuadable compilers, we try to use > > * "SizeOfIptrData" rather than "sizeof(ItemPointerData)" when computing > > * on-disk sizes. > > */ > > > Is that now obsolete? > > Realistically, because struct HeapTupleHeaderData contains a field of > type ItemPointerData, it's probably silly to imagine that we can save > anything if the compiler can't be persuaded to believe that > sizeof(ItemPointerData) is 6. It may well be that the structure pragmas > work on everything that wouldn't natively believe that anyway. > Yeah, that's what I thought and rest of the code seems to make that assumption as well. Attached patch removes the last remaining reference to SizeOfIptrData and also removes the macro and the associated comment. TBH I couldn't fully trace how the TID array gets generated in nodeTidscan.c, but I'm fairly confident that this should not break anything because this was the only remaining reference. While htup.h refactoring happened in 9.5, I don't see any point in back patching this. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services pg_remove_itemptr_size.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
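The padding concern behind SizeOfIptrData, in isolation (a sketch using the GCC/Clang attribute syntax; field names mirror the real struct but the types are simplified, and the real tree uses its own compiler-specific packing pragmas):

```c
#include <assert.h>
#include <stdint.h>

/* Three 16-bit fields carry six bytes of payload, but without
 * persuasion a compiler is in principle free to pad the struct to
 * eight.  PostgreSQL applies packing pragmas where possible, which is
 * why sizeof() can now be trusted and the separate SizeOfIptrData
 * macro became redundant. */
typedef struct ItemPointerSketch
{
    uint16_t bi_hi;
    uint16_t bi_lo;
    uint16_t ip_posid;
} __attribute__((packed)) ItemPointerSketch;
```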
Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.851401618”: Permission denied”
Robert Haas writes: > Yeah, random() is the wrong thing. It should use PostmasterRandom(). > Fixed to do that instead. I am not very happy about this patch; have you considered the security implications of what you just did? If you haven't, I'll tell you: you just made the postmaster's selection of "random" cancel keys and password salts a lot more predictable. Formerly, the srandom() seed for those depended on both the postmaster start time and the time of the first connection request, but this change removes the first connection request from the equation. If you know the postmaster start time --- which we will happily tell any asker --- it will not take too many trials to find the seed that's in use. I'd be the first to agree that this point is inadequately documented in the code, but PostmasterRandom should be reserved for its existing security-related uses, not exposed to the world for (ahem) random other uses. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pageinspect: Hash index support
On Tue, Sep 20, 2016 at 5:40 AM, Jesper Pedersen wrote: > On 09/20/2016 03:19 AM, Michael Paquier wrote: > >> You did not get right the comments from Alvaro upthread. The following >> functions are added with this patch: >> function hash_metap(text) >> function hash_metap_bytea(bytea) >> function hash_page_items(text,integer) >> function hash_page_items_bytea(bytea) >> function hash_page_stats(text,integer) >> function hash_page_stats_bytea(bytea,integer) >> >> Now the following set of functions would be sufficient: >> function hash_metapage_info(bytea) >> function hash_page_items(bytea) >> function hash_page_stats(bytea) >> The last time pageinspect has been updated, when BRIN functions have >> been added, it has been discussed to just use (bytea) as an argument >> interface and just rely on get_raw_page() to get the pages wanted, so >> I think that we had better stick with that and keep things simple. >> >> > Yes, I know, Alvaro and you voted for the bytea methods, and Jeff asked > for both. > > Attached is v3 with only the bytea based methods. > > Alvaro, Michael and Jeff - Thanks for the review ! > Is the 2nd "1" in this call needed? SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1) As far as I can tell, that argument is only used to stuff into the output field "blkno", it is not used to instruct the interpretation of the raw page itself. It doesn't seem worthwhile to have a parameter that only echoes back to the user what the user already put in (twice). The only existing functions which take the blkno argument are those that don't use the get_raw_page style. Also, should we document what the single letter values mean in the hash_page_stats.type column? It is not obvious that 'i' means bitmap, for example. Cheers, Jeff
Re: [HACKERS] [PATCH] get_home_path: use HOME
Rudolf Gavlas writes: > The usage of HOME environment variable (if set) is IMO the right, > standard and faster way to get_home_path(). Can you provide some evidence for that claim? I can believe "faster" but the rest sounds like wishful thinking. > I work in an environment, where servers are administered by people > with different user names and identical uid (0). I think what you have there is an incredibly badly-designed system that can be expected to break outside software (eg, Postgres). If we take this patch, what's to stop someone from complaining that we broke *their* badly-designed system that abuses the HOME variable? I'm pretty hesitant to touch code that's worked the same way for a decade or two on such a basis. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.851401618”: Permission denied”
On Thu, Oct 15, 2015 at 11:32 PM, Amit Kapila wrote: > On Thu, Oct 15, 2015 at 8:35 PM, Dmitry Vasilyev > wrote: >> >> I think that function dsm_impl_windows() with EACCES error should not >> do ereport() with FATAL level. It works, but it is likely to make an >> infinite loop if the user will receive EACCES error. >> > > Currently we are using error level as ERROR for creating dsm during > postmaster startup which is not right and rather we should use error > level as LOG. Can you please try with the attached patch and see > if the issue is fixed for you. > > Another some what related point is currently we are using random() > function to ensure a unique name for dsm and it seems to me that > it is always going to generate same number on first invocation (at least > thats what happening on windows) due to which you are seeing the > error. Yeah, random() is the wrong thing. It should use PostmasterRandom(). Fixed to do that instead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.851401618
On Thu, May 26, 2016 at 7:44 PM, Michael Paquier wrote: >> Thanks for reviewing the patch. I have added the entry for this patch in >> next CF (https://commitfest.postgresql.org/10/636/), feel free to mark it >> as Ready for committer if you think patch is good. > > Yeah, it is definitely better to register it. And I have switched the > patch as ready for committer just now. Committed and back-patched to 9.4, where DSM was introduced. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] get_home_path: use HOME
Hi,

I work in an environment, where servers are administered by people with
different user names and identical uid (0). The attached patch fixes a bug
exposed in such environments: where the logic of retrieving a personal
configuration file relies solely on get_home_path(), the different users
are forced to share the file of the first user with given uid.

The usage of HOME environment variable (if set) is IMO the right,
standard and faster way to get_home_path().

r.

diff --git a/src/port/path.c b/src/port/path.c
index 7bf7cbc..33cb790 100644
--- a/src/port/path.c
+++ b/src/port/path.c
@@ -807,15 +807,24 @@ bool
 get_home_path(char *ret_path)
 {
 #ifndef WIN32
-	char pwdbuf[BUFSIZ];
-	struct passwd pwdstr;
-	struct passwd *pwd = NULL;
-
-	(void) pqGetpwuid(geteuid(), &pwdstr, pwdbuf, sizeof(pwdbuf), &pwd);
-	if (pwd == NULL)
-		return false;
-	strlcpy(ret_path, pwd->pw_dir, MAXPGPATH);
-	return true;
+	char *envhome = getenv("HOME");
+
+	if (envhome != NULL && strlen(envhome) > 0)
+	{
+		strlcpy(ret_path, envhome, MAXPGPATH);
+		return true;
+	}
+	else
+	{
+		char pwdbuf[BUFSIZ];
+		struct passwd pwdstr;
+		struct passwd *pwd = NULL;
+
+		(void) pqGetpwuid(geteuid(), &pwdstr, pwdbuf, sizeof(pwdbuf), &pwd);
+		if (pwd == NULL)
+			return false;
+		strlcpy(ret_path, pwd->pw_dir, MAXPGPATH);
+		return true;
+	}
 #else
 	char *tmppath;

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On Tue, Sep 20, 2016 at 11:27 AM, Peter Eisentraut wrote: > On 9/19/16 2:18 PM, Robert Haas wrote: >> I suggest that we make it part of bgw_flags, but use a bitmask for it, >> like this: >> >> #define BGWORKER_CLASS_MASK 0x00f0 >> #define BGWORKER_CLASS_PARALLEL 0x0010 >> /* add additional bgworker classes here */ > > Unless we have a mechanism that makes use of this somehow, this attempt > at categorizing might be premature. My main reason for wanting it is that I suspect there will be a similar desire to limit the number of replication workers at some point. However, that could be wrong. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Odd system-column handling in postgres_fdw join pushdown patch
On Fri, Jul 1, 2016 at 3:10 AM, Etsuro Fujita wrote: > On 2016/04/14 4:57, Robert Haas wrote: >> >> 1. For a regular FDW scan, zero the xmin, xmax, and cid of the tuple >> before returning it from postgres_fdw, so that we don't expose the >> datum-tuple fields. I can't see any reason this isn't safe, but I >> might be missing something. > > I noticed odd behavior of this in EvalPlanQual. Consider: > > -- session 1 > postgres=# create extension postgres_fdw; > CREATE EXTENSION > postgres=# create server fs foreign data wrapper postgres_fdw options > (dbname 'postgres'); > CREATE SERVER > postgres=# create user mapping for public server fs; > CREATE USER MAPPING > postgres=# create table t1 (a int, b int); > CREATE TABLE > postgres=# create table t2 (a int, b int); > CREATE TABLE > postgres=# insert into t1 values (1, 1); > INSERT 0 1 > postgres=# insert into t2 values (1, 1); > INSERT 0 1 > postgres=# create foreign table ft1 (a int, b int) server fs options > (table_name 't1'); > CREATE FOREIGN TABLE > postgres=# select xmin, xmax, cmin, * from ft1; > xmin | xmax | cmin | a | b > --+--+--+---+--- > 0 |0 |0 | 1 | 1 > (1 row) > > -- session 2 > postgres=# begin; > BEGIN > postgres=# update t2 set a = a; > UPDATE 1 > > -- session 1 > postgres=# select ft1.xmin, ft1.xmax, ft1.cmin, ft1.* from ft1, t2 for > update; > > -- session 2 > postgres=# commit; > COMMIT > > -- session 1 (will show the following) > xmin |xmax| cmin | a | b > --++---+---+--- >128 | 4294967295 | 16398 | 1 | 1 > (1 row) > > The values of xmin, xmax, and cmin are not 0! The reason for that is that > we don't zero these values in a test tuple stored by > EvalPlanQualFetchRowMarks for EvalPlanQual re-evaluations. > > That cleanup applies to the file_fdw foreign table case as well, so I think > xmin, xmax, and cid in tuples from such tables should be set to 0, too. And > ISTM that it's better that the core (ie, ForeignNext) supports doing that, > rather than each FDW does that work. 
That would also minimize the overhead > because ForeignNext does that if needed. Please find attached a patch. If you want this to be considered, you'll need to rebase it and submit it to the next CommitFest. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix Error Message for allocate_recordbuf() Failure
On Mon, Jul 11, 2016 at 12:04 AM, Michael Paquier wrote: > On Sat, Jul 9, 2016 at 2:58 AM, Shoaib Lari wrote: >> Besides making the error message more informative, we had to modify >> allocate_recordbuf() to return the actual number of bytes that were being >> allocated. > > - report_invalid_record(state, "record length %u at %X/%X too long", > - total_len, > - (uint32) (RecPtr >> 32), (uint32) RecPtr); > + report_invalid_record(state, > + "cannot allocate %u bytes for record > length %u at %X/%X", > + newSizeOut, total_len, (uint32) (RecPtr >> 32), > + (uint32) RecPtr); > > It does not look like a good idea to me to complicate the interface of > allocate_recordbuf just to make more verbose one error message, ... For what it's worth, I think it's fine. Good error messages are a useful thing. More generally, I think the whole patch looks good and should be committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical Replication WIP
On 9/18/16 4:17 PM, Steve Singer wrote: > When I create my subscriber database by doing a physical backup of the > publisher cluster (with cp before I add any data) then I am unable to > connect subscribe. > ie > initdb ../data > cp -r ../data ../data2 > ./postgres -D ../data > ./postgres -D ../data2 > > This make sense when I look at your code, but it might not be what we want I think if we want to prevent the creation of subscriptions that point to self, then we need to create a magic token when the postmaster starts and check for that when we connect. So more of a running-instance identifier instead of an instance-on-disk identifier. The other option is that we just allow it and make it more robust. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 9/19/16 2:18 PM, Robert Haas wrote: > I suggest that we make it part of bgw_flags, but use a bitmask for it, > like this: > > #define BGWORKER_CLASS_MASK 0x00f0 > #define BGWORKER_CLASS_PARALLEL 0x0010 > /* add additional bgworker classes here */ Unless we have a mechanism that makes use of this somehow, this attempt at categorizing might be premature. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] "Some tests to cover hash_index"
On Tue, Sep 20, 2016 at 8:52 AM, Amit Kapila wrote: > I think you have a point, but not sure if it is worth to add a > separate file. It might be tricky to choose which file to add new > tests for hash_indexes. Anybody else have opinion on this point? I think all the tests should be added to hash_index.sql. A new file doesn't seem warranted. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On Thu, Sep 15, 2016 at 11:11:41AM +0530, Amit Kapila wrote: > I think it is possible without breaking pg_upgrade, if we match all > items of a page at once (and save them as local copy), rather than > matching item-by-item as we do now. We are already doing similar for > btree, refer explanation of BTScanPosItem and BTScanPosData in > nbtree.h. FYI, pg_upgrade has code to easily mark indexes as invalid and create a script the user can run to recreate the indexes as valid. I have received no complaints when this was used. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more parallel query documentation
On 9/19/16 1:22 PM, Robert Haas wrote: > On Fri, Sep 16, 2016 at 4:28 PM, Alvaro Herrera > wrote: >> I agree it should be added. I suggest that it could even be added to >> the 9.6 docs, if you can make it. > > Here's a patch. I intend to commit this pretty quickly unless > somebody objects, and also to backpatch it into 9.6. I'm sure it's > not perfect, but imperfect documentation is better than no > documentation. Looks reasonable to me. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use of SizeOfIptrData - is that obsolete?
Pavan Deolasee writes: > I happened to notice this comment in src/include/storage/itemptr.h > * Note: because there is an item pointer in each tuple header and index > * tuple header on disk, it's very important not to waste space with > * structure padding bytes. The struct is designed to be six bytes long > * (it contains three int16 fields) but a few compilers will pad it to > * eight bytes unless coerced. We apply appropriate persuasion where > * possible, and to cope with unpersuadable compilers, we try to use > * "SizeOfIptrData" rather than "sizeof(ItemPointerData)" when computing > * on-disk sizes. > */ > Is that now obsolete? Realistically, because struct HeapTupleHeaderData contains a field of type ItemPointerData, it's probably silly to imagine that we can save anything if the compiler can't be persuaded to believe that sizeof(ItemPointerData) is 6. It may well be that the structure pragmas work on everything that wouldn't natively believe that anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel sec scan in plpgsql
On Tue, Sep 20, 2016 at 10:58 AM, Tom Lane wrote: > Maybe it would be better to fix the rule against workers > invoking their own parallel queries. That rule does have the advantage of preventing us from having one user backend launch N^2 workers. I don't think it would be that much work to fix it, but the results might be pretty exciting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel sec scan in plpgsql
On Tue, Sep 20, 2016 at 9:24 AM, Amit Kapila wrote: > I think here point is that for any case where there is count of rows > to be selected, we disable parallelism. There are many genuine cases > like select count(*) into cnt ... which will run to completion, but as > plpgsql passes row count to be 1 or 2, it doesn't enter parallel mode. > There are couple other cases like that. Do you see a reason for not > enabling parallelism for such cases? If we can somehow know that the rowcount which is passed is greater than or equal to the actual number of rows which will be generated, then it's fine to enable parallelism. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?
On Tue, Sep 20, 2016 at 2:20 AM, Tsunakawa, Takayuki wrote: > There's no apparent evidence to indicate the cause, but I could guess a few > reasons. What do you think these are correct and should fix PostgreSQL? (I > think so) I think that we shouldn't start changing things based on guesses about what the problem is, even if they're fairly smart guesses. The thing to do would be to construct a test rig, crash the server repeatedly, and add debugging instrumentation to figure out where the time is actually going. I do think your theory about the stats collector might be worth pursuing. It seems that the stats collector only responds to SIGQUIT, ignoring SIGTERM. Making it do a clean shutdown on SIGTERM and a fast exit on SIGQUIT seems possibly worthwhile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel sec scan in plpgsql
Amit Kapila writes: > On Mon, Sep 19, 2016 at 11:39 PM, Robert Haas wrote: >> However, I think the chances of supporting parallel query for queries >> not executed to completion any time in the near future are very poor. > I think here point is that for any case where there is count of rows > to be selected, we disable parallelism. There are many genuine cases > like select count(*) into cnt ... which will run to completion, but as > plpgsql passes row count to be 1 or 2, it doesn't enter parallel mode. > There are couple other cases like that. Do you see a reason for not > enabling parallelism for such cases? The other problem that would have to be confronted here is nesting, ie it would only be OK for a plpgsql function to invoke a parallel query if it wasn't itself being executed by a parallel worker --- or maybe even if it's being executed by the leader process but there's an active Gather somewhere else in the calling query's plan tree. (Not sure about the implementation's properties for that case.) We'd have to decide whether we want plancache to track both parallel and nonparallel plans for plpgsql queries. Do-able no doubt but pretty ugly. Maybe it would be better to fix the rule against workers invoking their own parallel queries. However that's probably moot unless the not-executing-to-completion issue can be solved. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)
On Mon, Sep 19, 2016 at 9:54 PM, Craig Ringer wrote: >> You appear to have attached the wrong patch set. > > Whoops, multitasking fail. > > Sorry for the late response, hospitals are "fun". I did some cleanup of 0001 (see attached) and was all set to commit it when I realized what I think is a problem: holding XidGenLock doesn't seem to help with the race condition between this function and CLOG truncation, because vac_truncate_clog() updates the shared memory limits AFTER performing the truncation. If the order of those operations were reversed, we'd be fine, because it would get stuck trying to update the shared memory limits and wouldn't be able to truncate until it did - and conversely, if it updated the shared memory limits before we examined them, that would be OK, too, because we'd be sure not to consult the pages that are about to be truncated. As it is, though, I don't see that there's any real interlock here. (BTW, the stuff you moved from clog.c to clog.h doesn't actually need to be moved; one of the changes I made here was to undo that.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company txid-status-rmh.patch Description: invalid/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
> > On 20 Sep 2016, at 09:40, Michael Paquier wrote: > Thanks for digging into this. > + StartupCLOG(); > + StartupSUBTRANS(checkPoint.oldestActiveXid); > + RecoverPreparedFromFiles(); > [...] >/* > -* Startup commit log and subtrans only. MultiXact and commit > -* timestamp have already been started up and other SLRUs are not > -* maintained during recovery and need not be started yet. > -*/ > - StartupCLOG(); > - StartupSUBTRANS(oldestActiveXID); > Something is definitely wrong in this patch if we begin to do that. Putting that before actual WAL replay just follows the historical order of events. Prepared files are pieces of WAL that happened before the checkpoint, so ISTM there is no conceptual problem in restoring their state before replay. Moreover, I think this approach is better than the old one. There is a kind of circular dependency between StartupSUBTRANS() and restoring old prepared transactions: StartupSUBTRANS requires oldestActiveXID, but you can get it only after recovering the 2pc files, and that recovery in turn requires a working SUBTRANS. In the old code that was resolved by two passes through the 2pc files: the first one finds oldestActiveXmin but doesn’t restore their state, then subtrans is started, and only after that is RecoverPreparedTransactions() called. And that logic was repeated three times in xlog.c with the help of the following functions: PrescanPreparedTransactions() — 3 calls StandbyRecoverPreparedTransactions() — 2 calls RecoverPreparedTransactions() — 1 call Now, since we know that 2pc files are written only at checkpoint, we can boil all those cases down to one: get oldestActiveXID from the checkpoint and restore it before WAL replay. So there is only one call of RecoverPreparedFromFiles() and one call of CleanupPreparedTransactionsOnPITR() in the case of PITR. And each of them does only what its name states, without side-effects like advancing nextXid as in previous versions. 
So, to summarise, I think keeping the old interfaces here is the bigger evil, because in each of the mentioned functions we would need to deal with both in-memory and on-file 2pc states. > There is no need to move those two calls normally, and I think that we > had better continue to do that only for hot standbys just to improve > 2PC handling… We can simulate the old interface, it’s not a big problem. But that would complicate the 2pc code and keep useless code in xlog.c, because it was written on the assumption that a 2pc file can be created before a checkpoint, and that is no longer true. > CleanupPreparedTransactionsOnPITR() cleans up pg_twophase, but isn't > it necessary to do something as well for what is in memory? Yes, that is necessary. Thanks, will fix. And probably add a prepared transaction to the PITR test. > I have been thinking about this patch quite a bit, and the approach > taken looks really over-complicated to me. Basically what we are > trying to do here is to reuse as much as possible code paths that are > being used by non-recovery code paths during recovery, and then try to > adapt a bunch of things like SUBTRANS structures, CLOG, XID handling > and PGXACT things in sync to handle the 2PC information in memory > correctly. I am getting to think that this is *really* bug-prone in > the future and really going to make maintenance hard. With this patch we reuse one infrastructure for normal work, recovery and replay. I don’t think that we will gain much reliability if we split that into different code paths. > Taking one step back, and I know that I am a noisy one, but looking at > this patch is making me aware of the fact that it is trying more and > more to patch things around the more we dig into issues, and I'd > suspect that trying to adapt for example sub-transaction and clog > handling is just the tip of the iceberg and that there are more things > that need to be taken care of if we continue to move on with this > approach. 
Again, it isn’t patching around to fix issues, it’s totally possible to keep old interface. However it’s possible that current approach is wrong because of some aspect that i didn’t think of, but now I don’t see good counterarguments. > Coming to the point: why don't we simplify things? In short: > - Just use a static list and allocate a chunk of shared memory as > needed. DSM could come into play here to adapt to the size of a 2PC > status file, this way there is no need to allocate a chunk of memory > with an upper bound. Still we could use an hardcoded upper-bound of > say 2k with max_prepared_transactions, and just write the 2PC status > file to disk if its size is higher than what can stick into memory. > - save 2PC information in memory instead of writing it to a 2PC file > when XLOG_XACT_PREPARE shows up. > - flush information to disk when there is a valid restart point in > RecoveryRestartPoint(). We would need as well to tweak > PrescanPreparedTransactions accordingly than everything we are trying > to do here. > That would
Re: [HACKERS] Declarative partitioning - another take
On Thu, Sep 15, 2016 at 4:53 AM, Amit Langote wrote: > [ new patches ] Re-reviewing 0001. + partexprs + pg_node_tree This documentation doesn't match pg_partition_table.h, which has partexprsrc and partexprbin. I don't understand why it's a good idea to have both, and there seem to be no comments or documentation supporting that choice anywhere. + The optional PARTITION BY clause specifies a method of + partitioning the table and the corresponding partition key. Table + thus created is called partitioned table. Key + consists of an ordered list of column names and/or expressions when + using the RANGE method, whereas only a single column or + expression can be specified when using the LIST method. + The type of a key column or an expression must have an associated + btree operator class or one must be specified along with the column + or the expression. Both of the sentences in this paragraph that do not begin with "the" need to begin with "the". (In my experience, it's generally a feature of English as spoken in India that connecting words like "the" and "a" are sometimes left out where non-Indian speakers of English would include them, so it would be good to watch out for this issue in general.) Also, I think this should be rephrased a bit to be more clear about how the partitioning key works, like this: The optional PARTITION BY clause specifies a method of partitioning the table. The table thus created is called a partitioned table. The parenthesized list of expressions forms the partitioning key for the table. When using range partitioning, the partioning key can include multiple columns or expressions, but for list partitioning, the partitioning key must consist of a single column or expression. If no btree operator class is specified when creating a partitioned table, the default btree operator class for the datatype will be used. If there is none, an error will be reported. 
+case RELKIND_PARTITIONED_TABLE: options = heap_reloptions(classForm->relkind, datum, false); Why? None of the reloptions that pertain to heap seem relevant to a relkind without storage. But, ah, do partitioned tables have storage? I mean, do we end up with an empty file, or no relfilenode at all? Can I CLUSTER, VACUUM, etc. a partitioned table? It would seem cleaner for the parent to have no relfilenode at all, but I'm guessing we might need some more changes for that to work out. +pg_collation.h pg_range.h pg_transform.h pg_partitioned_table.h\ Whitespace. Also, here and elsewhere, how about using alphabetical order, or anyway preserving it insofar as the existing list is alphabetized? +/* Remove NO INHERIT flag if rel is a partitioned table */ +if (is_no_inherit && +rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) +ereport(ERROR, +(errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("cannot add NO INHERIT constraint to partitioned table \"%s\"", + RelationGetRelationName(rel; The code and the comment disagree. I think the code is right and the comment should be adjusted to say something like /* Partitioned tables do not have storage, so a NO INHERIT constraint makes no sense. */ + * IDENTIFICATION + *src/backend/utils/misc/partition.c Wrong. +} KeyTypeCollInfo; I don't like this type name very much. Can we get "Part" in there someplace? It doesn't seem to be very well-designed, either. The number of entries in each array is determined by the partnatts flag in PartitionKeyData, which has also got various other arrays whose lengths are determined by partnatts. Why do we have some arrays in one structure and some arrays in another structure? Would it hurt anything to merge everything into one structure? Or could PartitionKeyData include a field of type KeyTypeCollInfo rather than KeyTypeCollInfo *, saving one pointer reference every place we access this data? +/* Allocate in the supposedly short-lived working context */ Why supposedly? 
+datum = fastgetattr(tuple, Anum_pg_partitioned_table_partattrs, +RelationGetDescr(catalog), +&isnull); Isn't the point of putting the fixed-length fields first that we can use GETSTRUCT() here? And even for partattrs as the first variable-length thing? +/* + * Run the expressions through eval_const_expressions. This is + * not just an optimization, but is necessary, because eventually + * the planner will be comparing them to similarly-processed qual + * clauses, and may fail to detect valid matches without this. + * We don't bother with canonicalize_qual, however. + */ I'm a bit confused by this, because I would think this processing ought to have been done before storing anything in the system catalogs. I don't see why it should be necessary to do it again after pulli
Re: [HACKERS] Parallel sec scan in plpgsql
On Mon, Sep 19, 2016 at 11:39 PM, Robert Haas wrote: > On Sat, Sep 17, 2016 at 11:54 PM, Amit Kapila wrote: >> In general, I think we should support the cases as required (or >> written) by you from plpgsql or sql functions. We need more work to >> support such cases. There are probably two ways of supporting such >> cases, we can build some intelligence in plpgsql execution such that >> it can recognise such queries and allow to use parallelism or we need >> to think of enabling parallelism for cases where we don't run the plan >> to completion. Most of the use cases from plpgsql or sql function >> fall into later category as they don't generally run the plan to >> completion. > > I think there's certainly more work that could be done to teach > PL/pgsql about cases where the query will run to completion. I didn't > work very hard to make sure we covered all of those; there are > probably several different cases where parallelism could be safely > enabled but currently isn't. Also, I didn't do anything at all to > update the other PLs, and that would be good, too. > > However, I think the chances of supporting parallel query for queries > not executed to completion any time in the near future are very poor. > I think the point here is that for any case where there is a count of rows to be selected, we disable parallelism. There are many genuine cases like select count(*) into cnt ... which will run to completion, but as plpgsql passes the row count to be 1 or 2, it doesn't enter parallel mode. There are a couple of other cases like that. Do you see a reason for not enabling parallelism for such cases? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
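A minimal sketch of the case being described, with made-up object names: the aggregate itself must run to completion, but because plpgsql fetches the result of the statement with a small row limit, the planner currently refuses to consider a parallel plan:

```sql
CREATE TABLE t (a int);
INSERT INTO t SELECT generate_series(1, 1000000);

CREATE OR REPLACE FUNCTION count_t() RETURNS bigint AS $$
DECLARE
    cnt bigint;
BEGIN
    -- count(*) scans the whole table, yet plpgsql executes this
    -- statement asking for only 1-2 rows, so parallel mode is not
    -- entered even with the parallel-cost GUCs set to zero.
    SELECT count(*) INTO cnt FROM t;
    RETURN cnt;
END;
$$ LANGUAGE plpgsql;
```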
Re: [HACKERS] "Some tests to cover hash_index"
On Mon, Sep 19, 2016 at 8:44 PM, Mithun Cy wrote: > On Sat, Aug 6, 2016 at 9:41 AM, Amit Kapila wrote: >> I wonder why you have included a new file for these tests, why can't >> these be added to the existing hash_index.sql. > tests in hash_index.sql did not cover overflow pages, above tests were > mainly for them. The name of the file hash_index_split suggests it focuses on splits. Are the tests focussed more on overflow pages or on splits of the hash index? > So I thought having a separate test file can help > enabling/disabling them in schedule files, when we do not want them running > as it takes slightly more time. If you think otherwise I will reconsider and > add tests to hash_index.sql. I think you have a point, but I am not sure it is worth adding a separate file. It might be tricky to choose which file to add new hash index tests to. Does anybody else have an opinion on this point? Can you check how much time it takes as compared to the btree or brin index tests? I am facing the below diff with your new patch. *** *** 1,4 ! -- -- Cause some overflow insert and splits. -- CREATE TABLE hash_split_heap (keycol INT); --- 1,4 ! -- -- Cause some overflow insert and splits. -- CREATE TABLE hash_split_heap (keycol INT); == There is an extra space in the expected file which is leading to the above failure. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pageinspect: Hash index support
On 09/20/2016 03:19 AM, Michael Paquier wrote: You did not get right the comments from Alvaro upthread. The following functions are added with this patch: function hash_metap(text) function hash_metap_bytea(bytea) function hash_page_items(text,integer) function hash_page_items_bytea(bytea) function hash_page_stats(text,integer) function hash_page_stats_bytea(bytea,integer) Now the following set of functions would be sufficient: function hash_metapage_info(bytea) function hash_page_items(bytea) function hash_page_stats(bytea) The last time pageinspect has been updated, when BRIN functions have been added, it has been discussed to just use (bytea) as an argument interface and just rely on get_raw_page() to get the pages wanted, so I think that we had better stick with that and keep things simple. Yes, I know, Alvaro and you voted for the bytea methods, and Jeff asked for both. Attached is v3 with only the bytea based methods. Alvaro, Michael and Jeff - Thanks for the review ! Best regards, Jesper >From 0aff82ccb40f00efe9e48cacef9c8a45c1327da2 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 408 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 64 contrib/pageinspect/pageinspect--1.5.sql | 279 -- contrib/pageinspect/pageinspect--1.6.sql | 339 + contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 100 +++ 7 files changed, 917 insertions(+), 285 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big 
= pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..4d2cd2a --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,408 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "funcapi.h" +#include "miscadmin.h" + +PG_FUNCTION_INFO_V1(hash_metap); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 blkno; + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + char type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* + * Verify that the given bytea contains a HASH page, or die in the attempt. + * A pointer to the page is returned. 
+ */ +static Page +verify_hash_page(bytea *raw_page) +{ + Page page; + int raw_page_size; + HashPageOpaque pageopaque; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + + if (raw_page_size != BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); + + page = VARDATA(raw_page); + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page); + if (pageopaque->hasho_page_id != HASHO_PAGE_ID) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", + HASHO_PAGE_ID, pageopaque->hasho_page_id))); + + return page; +} + +/* - + * GetHashPageStatistics() + *
Re: [HACKERS] kqueue
Hi, On 16/09/2016 05:11, Thomas Munro wrote: Still no change measurable on my laptop. Keith, would you be able to test this on your rig and see if it sucks any less than the last one? I've tested kqueue-v6.patch on the Celeron NetBSD machine and numbers were consistently lower by about 5-10% vs fairly recent HEAD (same as my last pgbench runs). Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
Re: [HACKERS] Parallel sec scan in plpgsql
On 18.09.2016 06:54, Amit Kapila wrote: On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov wrote: On 16.09.2016 16:50, Amit Kapila wrote: Can you try by setting force_parallel_mode = off;? I think it is sending the whole function execution to worker due to force_parallel_mode. No changes: Okay, it just skipped from my mind that we don't support parallel queries for SQL statement execution (or statements executed via exec_stmt_execsql) from plpgsql. For detailed explanation of why that is not feasible you can refer one of my earlier e-mails [1] on similar topic. I think if we can somehow get the results via Perform statement, then it could be possible to use parallelism via plpgsql. However, you can use it via SQL functions, an example is below: set min_parallel_relation_size =0; set parallel_tuple_cost=0; set parallel_setup_cost=0; Load 'auto_explain'; set auto_explain.log_min_duration = 0; set auto_explain.log_analyze = true; set auto_explain.log_nested_statements = true; create table test_plpgsql(c1 int, c2 char(1000)); insert into test_plpgsql values(generate_series(1,10),'aaa'); create or replace function parallel_test_set_sql() returns setof bigint as $$ select count(*) from test_plpgsql; $$language sql PARALLEL SAFE STRICT STABLE; Then execute function as: select * from parallel_test_set_sql(); You can see below plan if auto_explain module is loaded. Finalize Aggregate (cost=14806.85..14806.86 rows=1 width=8) (actual time=1094.966..1094.967 rows=1 loops=1) -> Gather (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.
216..1094.943 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=14806.83..14806.84 rows=1 width=8) (actual time=177.867..177.868 rows=1 loops=3) -> Parallel Seq Scan on test_plpgsql (cost=0.00..14702.67 rows=41667 width=0) (actual time=0.384..142.565 rows=3 loops=3) CONTEXT: SQL function "parallel_test_set_sql" statement 1 LOG: duration: 2965.040 ms plan: Query Text: select * from parallel_test_set_sql(); Function Scan on parallel_test_set_sql (cost=0.25..10.25 rows=1000 width=8) (actual time=2538.620..2776.955 rows=1 loops=1) In general, I think we should support the cases as required (or written) by you from plpgsql or sql functions. We need more work to support such cases. There are probably two ways of supporting such cases, we can build some intelligence in plpgsql execution such that it can recognise such queries and allow to use parallelism or we need to think of enabling parallelism for cases where we don't run the plan to completion. Most of the use cases from plpgsql or sql function fall into the latter category as they don't generally run the plan to completion. [1] - https://www.postgresql.org/message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com Thank you for your suggestion! That works.
But what can we do with this function: create or replace function parallel_test_sql(t int) returns setof bigint as $$ select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1 group by a,b,c,d,e)t; $$ language sql PARALLEL SAFE STRICT STABLE; explain (analyze,buffers) select * from parallel_test_sql(2); "Function Scan on parallel_test_sql (cost=0.25..10.25 rows=1000 width=8) (actual time=2410.789..2410.790 rows=1 loops=1)" " Buffers: shared hit=63696" "Planning time: 0.082 ms" "Execution time: 2410.841 ms" 2016-09-20 14:09:04 MSK [13037]: [75-1] user=ipdr,db=ipdr,app=pgAdmin III - Query Tool,client=127.0.0.1 LOG: duration: 2410.135 ms plan: Query Text: select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1 group by a,b,c,d,e)t; Aggregate (cost=230701.42..230701.43 rows=1 width=8) -> HashAggregate (cost=230363.59..230513.74 rows=15015 width=28) Group Key: test.a, test.b, test.c, test.d, test.e -> Seq Scan on test (cost=0.00..188696.44 rows=372 width=20) Filter: (a >= $1) No parallelism again. It looks like the Filter: (a >= $1) breaks parallelism. Alex Ignatov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Error running custom plugin: “output plugins have to declare the _PG_output_plugin_init symbol”
Hi Ashutosh, Thank you for your answer. In the end I realized that the PGDLLEXPORT keyword was missing from the function definitions. As a side question, what are the options to debug the plugin while it's executing? I've seen a debug plugin for Postgres but it seems more for SQL functions and stored procedures. Is it possible to attach the process from the Visual Studio debugger? Thanks, Valerio
Re: [HACKERS] WIP: Covering + unique indexes.
On Tue, Sep 20, 2016 at 10:51 AM, Amit Kapila wrote: > On Tue, Sep 6, 2016 at 10:18 PM, Anastasia Lubennikova > wrote: >> 28.08.2016 09:13, Amit Kapila: >> >> On Mon, Aug 15, 2016 at 8:15 PM, Anastasia Lubennikova >> wrote: >> >> >> So the patch is correct. >> We can go further and remove this index_truncate_tuple() call, because >> the first key of any inner (or root) page doesn't need any key at all. >> > > Anyway, I think truncation happens if the page is at leaf level and > that is ensured by check, so I think we can't remove this: > + if (indnkeyatts != indnatts && P_ISLEAF(pageop)) > > > -- I have one more question regarding this truncate high-key concept. > I think if the high key is truncated, then during insertion, for cases > like below it moves to the next page, whereas the current page needs to be > split. > > Assume index on c1,c2,c3 and c2,c3 are including columns. > > Actual high key on leaf Page X - > 3, 2 , 2 > Truncated high key on leaf Page X > 3 > > New insertion key > 3, 1, 2 > > Now, I think for such cases during insertion if the page X doesn't > have enough space, it will move to the next page whereas ideally, it > should split the current page. Refer function _bt_findinsertloc() for > this logic. > Basically, what I wanted to know here is: do we maintain ordering for keys with respect to including columns while storing them (in the above example, do we ensure that 3,1,2 is always stored before 3,2,2)? > > > -- I am getting an Assertion failure when I use this patch with a database > created with a build before this patch. However, if I create a fresh > database it works fine. Assertion failure details are as below: > I have tried this test on my Windows m/c only. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
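To make the scenario above concrete, here is a sketch of the index in question. It uses an INCLUDE-style clause for the non-key columns; the exact keyword has varied across versions of the covering-index patch, and the names are illustrative:

```sql
-- Key column c1; c2 and c3 are carried as non-key ("included") columns.
CREATE TABLE t (c1 int, c2 int, c3 int);
CREATE UNIQUE INDEX t_c1_idx ON t (c1) INCLUDE (c2, c3);

-- Amit's question: with the high key truncated to just (3), given an
-- existing tuple (3,2,2) and a new tuple (3,1,2), is (3,1,2) guaranteed
-- to be stored before (3,2,2), i.e. do the included columns participate
-- in the sort order within equal key values?
```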
Re: [HACKERS] Refactoring speculative insertion with unique indexes a little
On 03/17/2016 01:43 AM, Peter Geoghegan wrote: I attach a revision that makes all the changes that Heikki suggested, except one. As already noted several times, following this suggestion would have added a bug. Alvaro preferred my original approach here in any case. I refer to my original approach of making the new UNIQUE_CHECK_SPECULATIVE case minimally different from the existing UNIQUE_CHECK_PARTIAL case currently used for deferred unique constraints and speculative insertion, as opposed to making the new UNIQUE_CHECK_SPECULATIVE "like CHECK_UNIQUE_YES, but return FALSE instead of throwing an error on conflict". That was broken because CHECK_UNIQUE_YES waits for the outcome of an xact, which UNIQUE_CHECK_PARTIAL never does, and so UNIQUE_CHECK_SPECULATIVE must never do. Any and all waits happen in the first phase of speculative insertion, and never the second. I could give a complicated explanation for why, involving a deadlock scenario, but a simple explanation will do: it has always worked that way, and was tested to work that way. Feedback from Heikki led to these changes for this revision: * The use of arguments within ExecInsert() was simplified. * More concise AM documentation. The docs essentially describe two new concepts: - What unique index insertion needs to know about speculative insertion in general. This doesn't just apply to speculative inserters themselves, of course. - What speculative insertion is. Why it exists (why we don't just wait on xact). In other words, what "unprincipled deadlocks" are, and how they are avoided (from a relatively high level). I feel like I have a responsibility to make sure that this mechanism is well documented, especially given that not that many people were involved in its design. It's possible that no more than the 3 original authors of UPSERT fully understand speculative insertion -- it's easy to miss some of the subtleties.
Thanks for working on this, and sorry for disappearing and dropping this on the floor earlier. This doesn't apply anymore, thanks to 9c810a2e. Shouldn't be hard to fix. I was in support of this earlier in general, even though I had some questions on the details. But now that I look at the patch again, I'm not so sure anymore. I don't think this actually makes things clearer. It adds new cases that the code needs to deal with. Index AM writers now have to care about the difference between a UNIQUE_CHECK_PARTIAL and UNIQUE_CHECK_SPECULATIVE. You can have the exact same implementation for both, but at the very least the index AM author needs to read the paragraph you added to the docs to understand the difference. That adds some cognitive load. Likewise, in ExecInsertIndexTuples(), this makes the deferred-index case work slightly differently from speculative insertion. It's not a big difference, but it again forces you to keep one more scenario in mind, when reading the code. So overall, I think we should just drop this. Maybe a comment somewhere would be in order, to point out that ExecInsertIndexTuples() and index_insert might perform some unnecessary work, by inserting index tuples for a doomed heap tuple, if a speculative insertion fails. But that's all. - Heikki
Re: [HACKERS] IF (NOT) EXISTS in psql-completion
On Mon, Sep 19, 2016 at 6:11 PM, Pavel Stehule wrote: > I am thinking so commit's description should be inside README Horiguchi-san, your patch has some whitespace issues, you may want to give it a run with git diff --check. Here are some things I have spotted: src/bin/psql/tab-complete.c:1074: trailing whitespace. +"MATERIALIZED VIEW", src/bin/psql/tab-complete.c:2621: trailing whitespace. + COMPLETE_WITH_QUERY(Query_for_list_of_roles, This set of patches is moving psql tab completion into a better shape, particularly with 0001, which removes the legendary huge if-elif and just has the routine return immediately in case of a keyword match. Things could be shortened a little bit more by, for example, not doing the refactoring of the tab macros, because they are only needed in tab-complete.c. The other patches introduce further improvements for the existing infrastructure, but that's a lot of things just for adding IF [NOT] EXISTS, to be honest. Testing a bit, I have noticed that, for example, after typing "create table if", attempting tab completion does not make "not exists" show up. I suspect that the other commands are failing at that as well. -- Michael
[HACKERS] Use of SizeOfIptrData - is that obsolete?
Hi, I happened to notice this comment in src/include/storage/itemptr.h * Note: because there is an item pointer in each tuple header and index * tuple header on disk, it's very important not to waste space with * structure padding bytes. The struct is designed to be six bytes long * (it contains three int16 fields) but a few compilers will pad it to * eight bytes unless coerced. We apply appropriate persuasion where * possible, and to cope with unpersuadable compilers, we try to use * "SizeOfIptrData" rather than "sizeof(ItemPointerData)" when computing * on-disk sizes. */ Is that now obsolete? I mean I can still find one reference in src/backend/executor/nodeTidscan.c, which uses SizeOfIptrData instead of sizeof (ItemPointerData), but all other places such as heapam_xlog.h now use sizeof operator. It was changed in 2c03216d831160b when we moved a bunch of xlog related stuff from htup.h to this new file. Hard to tell if there were other users before that and they were all dropped in this one commit or various commits leading to this. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pageinspect: Hash index support
On Tue, Sep 20, 2016 at 1:11 AM, Jesper Pedersen wrote: > This version adds the overloaded get_raw_page based methods. However, I'm > not verifying the page other than page_id, as hasho_flag can contain > multiple flags in Amit's patches. And, I don't see having different page ids > as a benefit - at least not part of this patch. > > We should probably just have one of the method sets, so I'll send a v3 with > the set voted for. > > I kept the 'data' field as is, for now. $ git diff master --check contrib/pageinspect/hashfuncs.c:758: space before tab in indent. + values); That's always good to run to check the format of a patch before sending it. You did not get right the comments from Alvaro upthread. The following functions are added with this patch: function hash_metap(text) function hash_metap_bytea(bytea) function hash_page_items(text,integer) function hash_page_items_bytea(bytea) function hash_page_stats(text,integer) function hash_page_stats_bytea(bytea,integer) Now the following set of functions would be sufficient: function hash_metapage_info(bytea) function hash_page_items(bytea) function hash_page_stats(bytea) The last time pageinspect has been updated, when BRIN functions have been added, it has been discussed to just use (bytea) as an argument interface and just rely on get_raw_page() to get the pages wanted, so I think that we had better stick with that and keep things simple. Note: the patch checks if a superuser is calling the new functions, which is a good thing. I am switching the patch back to "waiting on author". As far as I can see the patch is in rather good shape, you just need to adapt the docs and shave all the parts that are not needed for the final result. -- Michael
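For reference, the bytea-only interface being requested would be used like the existing BRIN functions in pageinspect: pipe the output of get_raw_page() into the inspection function. The hash function names below are the set Michael proposes, so they belong to the patch under review, not to a released pageinspect; the index name is illustrative:

```sql
-- Page 0 of a hash index is the metapage.
SELECT * FROM hash_metapage_info(get_raw_page('my_hash_index', 0));

-- Inspect items and per-page statistics of a bucket or overflow page.
SELECT * FROM hash_page_items(get_raw_page('my_hash_index', 1));
SELECT * FROM hash_page_stats(get_raw_page('my_hash_index', 1));
```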