Re: [HACKERS] tracking commit timestamps
On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:
> Well, Michael has a point that the extradata is pretty much useless
> currently; perhaps it would help to add an interface to set extradata?

"Only accessible via C" and "useless" aren't the same thing. But sure,
add it.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 11/04/2014 07:31 AM, Álvaro Hernández Tortosa wrote:
> Thank you for your comment, Tom. However I think this behavior, as seen
> from a user perspective, is not the expected one.

That may be the case, but I think it's the SQL-standard behaviour, so we
can't really mess with it. The spec requires SET TRANSACTION ISOLATION,
and you can't implement that if you take a snapshot at BEGIN.

> If it is still the intended behavior, I think it should be clearly
> documented as such, and a recommendation similar to "issue a
> 'SELECT 1' right after BEGIN to freeze the data before any own query"
> or a similar comment should be added.
>
> Again, as I said in my email, the documentation clearly says that the
> transaction only sees data committed before the transaction began. And
> this is clearly not the real behavior.

It's more a difference in when the transaction begins. Arguably, BEGIN
says "I intend to begin a new transaction with the next query" rather
than "immediately begin executing a new transaction". This concept could
be clearer in the docs.

> Sure, there are, that was the link I pointed out, but I found no
> explicit mention of the fact that I'm raising here.

I'm sure it's documented *somewhere*, in that I remember reading about
this detail in the docs, but I can't find _where_ in the docs. It
doesn't seem to be in:

http://www.postgresql.org/docs/current/static/transaction-iso.html

where I'd expect it.

In any case, we simply cannot take the snapshot at BEGIN time, because
it's permitted to run:

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

in a DB that has default serializable isolation, or that has a SET
SESSION CHARACTERISTICS isolation mode of serializable. Note that SET
TRANSACTION is SQL-standard. AFAIK deferring the snapshot is consistent
with other RDBMSes that use snapshots, too.

The docs of that command allude to, but don't explicitly state, the
behaviour you mention.
http://www.postgresql.org/docs/current/static/sql-set-transaction.html

--
 Craig Ringer                      http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] tracking commit timestamps
On Sat, Nov 1, 2014 at 10:00 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> More comments:

I have done a couple of tests on my laptop with pgbench like this, to
generate a maximum of transaction commits:

    $ pgbench --no-vacuum -f ~/Desktop/commit.sql -T 60 -c 24 -j 24
    $ cat ~/Desktop/commit.sql
    SELECT txid_current();

Here is an average of 5 runs (tps):
- master: 49842.44
- GUC off, patched: 49688.71
- GUC on, patched: 49459.73
So the differences are within the noise.

Here are also more comments about the code that I found while focusing
on committs.c:

1) When the GUC is not enabled, and when the transaction ID provided is
not in a correct range, a dummy value based on InvalidTransactionId (?!)
is returned, like this:

    +   /* Return empty if module not enabled */
    +   if (!commit_ts_enabled)
    +   {
    +       if (ts)
    +           *ts = InvalidTransactionId;
    +       if (data)
    +           *data = (CommitExtraData) 0;
    +       return;
    +   }

This leads to some incorrect results:

    =# select pg_get_transaction_committime('1');
     pg_get_transaction_committime
    -------------------------------
     2000-01-01 09:00:00+09
    (1 row)

Or for example:

    =# SELECT txid_current();
     txid_current
    --------------
             1006
    (1 row)
    =# select pg_get_transaction_committime('1006');
     pg_get_transaction_committime
    -------------------------------
     2014-11-04 14:54:37.589381+09
    (1 row)
    =# select pg_get_transaction_committime('1007');
     pg_get_transaction_committime
    -------------------------------
     2000-01-01 09:00:00+09
    (1 row)
    =# SELECT txid_current();
     txid_current
    --------------
             1007
    (1 row)

And at other times the error is not that helpful for the user:

    =# select pg_get_transaction_committime('1');
    ERROR:  XX000: could not access status of transaction 1
    DETAIL:  Could not read from file pg_committs/ at offset 114688:
    Undefined error: 0.
    LOCATION:  SlruReportIOError, slru.c:896

I think that you should simply return an error in
TransactionIdGetCommitTsData when the xid is not in the supported range.
That would be more transparent for the user.

2) You may as well want to return a different error if the GUC
track_commit_timestamps is disabled.
3) This comment should be updated in committs.c; we are not talking
about CLOG here:

    +/*
    + * Link to shared-memory data structures for CLOG control
    + */

4) Similarly, I think more comments should be put in here. It is OK to
truncate the commit timestamp stuff similarly to CLOG so as to have a
consistent status context available, but let's explain it:

         * multixacts; that will be done by the next checkpoint.
         */
        TruncateCLOG(frozenXID);
    +   TruncateCommitTs(frozenXID)

5) Reading the code, TransactionTreeSetCommitTimestamp is always called
with do_xlog = false, meaning that actually no timestamps are WAL'd...
Hence WriteSetTimestampXlogRec is just dead code with the latest version
of the patch. IMO, this should always be WAL-logged when
track_commit_timestamp is on.

6) Shouldn't any value update of track_commit_timestamp be tracked in
XLogReportParameters? That's thinking about making the commit timestamp
available on standbys as well.

7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would
be useful for developers.

8) The redo and xlog routines of committs should be moved out into a
dedicated file, like committsxlog.c or similar. The other rmgrs do so,
so let's be consistent.

Regards,
--
Michael
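As a sketch of the validation Michael is asking for in points 1) and 2)
- all names and return codes here are invented for illustration, not
taken from the patch - the lookup would report distinct errors instead
of silently returning a zeroed timestamp (which renders as the
2000-01-01 epoch):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical return codes for this sketch */
enum { TS_OK = 0, TS_ERR_DISABLED = -1, TS_ERR_OUT_OF_RANGE = -2 };

/*
 * Sketch of TransactionIdGetCommitTsData-style validation: error out
 * both when the module is disabled and when the xid is outside the
 * tracked range, rather than handing back a dummy value.
 */
static int
get_commit_ts(uint32_t xid, uint32_t oldest_xid, uint32_t next_xid,
              int track_enabled, int64_t *ts)
{
    if (!track_enabled)
        return TS_ERR_DISABLED;        /* distinct error, per point 2 */
    if (xid < oldest_xid || xid >= next_xid)
        return TS_ERR_OUT_OF_RANGE;    /* no dummy epoch, per point 1 */
    *ts = 42;                          /* stand-in for the SLRU lookup */
    return TS_OK;
}
```

With this shape, asking for xid 1007 while txid_current() is still 1006
would raise an out-of-range error instead of returning 2000-01-01.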
Re: [HACKERS] tracking commit timestamps
On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
> 5) Reading the code, TransactionTreeSetCommitTimestamp is always called
> with do_xlog = false, meaning that actually no timestamps are WAL'd...
> Hence WriteSetTimestampXlogRec is just dead code with the latest
> version of the patch. IMO, this should always be WAL-logged when
> track_commit_timestamp is on.

It's callable via an 'extern' function. So I'd not consider it dead. And
the WAL logging is provided by xact.c's own WAL logging - it always does
the corresponding committs calls.

> 6) Shouldn't any value update of track_commit_timestamp be tracked in
> XLogReportParameters? That's thinking about making the commit timestamp
> available on standbys as well.

Yes, it should.

> 7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would
> be useful for developers.

What do you mean by that? There's the corresponding rmgrdesc.c support,
I think?

> 8) The redo and xlog routines of committs should be moved out into a
> dedicated file, like committsxlog.c or similar. The other rmgrs do so,
> so let's be consistent.

Seems pointless to me. The file isn't that big, and the other SLRUs
don't do it.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] tracking commit timestamps
On Tue, Nov 4, 2014 at 5:05 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:
>> Well, Michael has a point that the extradata is pretty much useless
>> currently; perhaps it would help to add an interface to set extradata?
>
> "Only accessible via C" and "useless" aren't the same thing. But sure,
> add it.

I'm still at -1 for that. You mentioned that there is perhaps no reason
to delay a decision on this matter, but IMO there is no reason to rush
into doing something we may regret either. And I am not the only one on
this thread expressing concern about this extra data thingy.

If this extra data field is going to be used to identify from which node
a commit comes, then it is another feature than what is written on the
subject of this thread. In that case let's discuss it in the thread
dedicated to replication identifiers, or come up with an extra patch
once the feature for commit timestamps is done.
--
Michael
Re: [HACKERS] tracking commit timestamps
On Tue, Nov 4, 2014 at 5:23 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
>> 5) Reading the code, TransactionTreeSetCommitTimestamp is always
>> called with do_xlog = false, meaning that actually no timestamps are
>> WAL'd... Hence WriteSetTimestampXlogRec is just dead code with the
>> latest version of the patch. IMO, this should always be WAL-logged
>> when track_commit_timestamp is on.
>
> It's callable via an 'extern' function. So I'd not consider it dead.
> And the WAL logging is provided by xact.c's own WAL logging - it always
> does the corresponding committs calls.

The code path is unused. We'd better make the XLOG record mandatory if
tracking is enabled, as this information is useful on standbys as well.

>> 7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would
>> be useful for developers.
>
> What do you mean by that? There's the corresponding rmgrdesc.c support,
> I think?

Oops, sorry. I thought there was some big switch in pg_xlogdump when
writing this comment. Yeah, that's fine.
--
Michael
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
David Fetter wrote:
> On Tue, Nov 04, 2014 at 07:51:06AM +0900, Tatsuo Ishii wrote:
>> Just out of curiosity, why is Oracle's NUMBER (I assume you are
>> talking about this) so fast?
>
> I suspect that what happens is that NUMBER is stored as a native type
> (int2, int4, int8, int16) that depends on its size and then cast to the
> next upward thing as needed, taking any performance hits at that point.
> The documentation hints (38 decimal places) at a 128-bit internal
> representation as the maximum. I don't know what happens when you get
> past what 128 bits can represent.

No, Oracle stores NUMBERs as a variable-length field (up to 22 bytes),
where the first byte encodes the sign and the decimal point position,
and the remaining bytes encode the digits, each byte representing two
digits in base-100 notation (see Oracle Metalink note 1007641.6).

So it's not so different from PostgreSQL. No idea why their arithmetic
should be faster.

Yours,
Laurenz Albe
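The base-100 packing Laurenz describes can be sketched in a few lines of
C. This is a toy encoder for non-negative integers only; a real Oracle
NUMBER also folds the sign and a base-100 exponent into a leading byte,
which is omitted here:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy base-100 packing: each output byte holds two decimal digits
 * (a value 0..99), most-significant pair first.  Returns the number
 * of bytes used.
 */
static int
encode_base100(uint64_t value, uint8_t *out, int outlen)
{
    uint8_t tmp[16];
    int n = 0;

    do {
        tmp[n++] = value % 100;   /* peel off two decimal digits */
        value /= 100;
    } while (value > 0 && n < (int) sizeof(tmp));

    /* reverse so the most-significant pair comes first */
    for (int i = 0; i < n && i < outlen; i++)
        out[i] = tmp[n - 1 - i];
    return n;
}
```

For example, 123456 packs into the three bytes {12, 34, 56}, so a
38-digit value fits comfortably in the 22-byte limit mentioned above.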
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
On 3 November 2014 22:18, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> So here's v21. I also attach a partial diff from v20, just in case
> anyone wants to give it a look.

Looks really good.

I'd like to reword this sentence in the README, since one of the main
use cases would be tables without btrees:

    It's unlikely that BRIN would be the only
    + indexes in a table, though, because primary keys can be btrees only, and so
    + we don't implement this optimization.

I don't see a regression test. Create, use, VACUUM - just so we know it
hasn't regressed after commit.

--
 Simon Riggs                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] tracking commit timestamps
On 2014-11-04 17:29:04 +0900, Michael Paquier wrote:
> On Tue, Nov 4, 2014 at 5:23 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
>>> 5) Reading the code, TransactionTreeSetCommitTimestamp is always
>>> called with do_xlog = false, meaning that actually no timestamps are
>>> WAL'd... Hence WriteSetTimestampXlogRec is just dead code with the
>>> latest version of the patch. IMO, this should always be WAL-logged
>>> when track_commit_timestamp is on.
>>
>> It's callable via an 'extern' function. So I'd not consider it dead.
>> And the WAL logging is provided by xact.c's own WAL logging - it
>> always does the corresponding committs calls.
>
> The code path is unused.

No, it is not. It can be called by extensions.

> We'd better make the XLOG record mandatory if tracking is enabled, as
> this information is useful on standbys as well.

Did you read what I wrote? To quote: "And the WAL logging is provided by
xact.c's own WAL logging - it always does the corresponding committs
calls."

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] What exactly is our CRC algorithm?
On 10/27/2014 06:02 PM, Heikki Linnakangas wrote:
> I came up with the attached patches. They do three things:
>
> 1. Get rid of the 64-bit CRC code. It's not used for anything, and
> hasn't been for years, so it doesn't seem worth spending any effort to
> fix it.
>
> 2. Switch to CRC-32C (Castagnoli) for WAL and other places that don't
> need to remain compatible across major versions.
>
> 3. Use the same lookup table for hstore and ltree as is used for the
> legacy almost-CRC-32 algorithm. The tables are identical, so we might
> as well.
>
> Any objections?

I hear none, so committed with some small fixes.

- Heikki
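For reference, the Castagnoli variant the WAL code switched to differs
from plain CRC-32 only in its polynomial (reflected form 0x82F63B78).
Below is a bit-at-a-time sketch, not the actual pg_crc code: real
implementations use a lookup table (the slice-by-N idea discussed
later in the thread) or the SSE4.2 crc32 instruction, all of which
compute the same value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reference CRC-32C (Castagnoli): reflected polynomial 0x82F63B78,
 * initial value and final XOR of 0xFFFFFFFF.  One bit per loop
 * iteration, so correct but slow.
 */
static uint32_t
crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFF;

    while (len--)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1));
    }
    return crc ^ 0xFFFFFFFF;
}
```

The standard check value for CRC-32C is 0xE3069283 over the ASCII bytes
"123456789", which makes a handy self-test for any faster variant.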
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
On Mon, Nov 3, 2014 at 2:18 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> So here's v21. I also attach a partial diff from v20, just in case
> anyone wants to give it a look.

This needs a bump to 1.3, or the extension won't install:
contrib/pageinspect/pageinspect.control

During crash recovery, I am getting a segfault:

    #0  0x0067ee35 in LWLockRelease (lock=0x0) at lwlock.c:1161
    #1  0x00664f4a in UnlockReleaseBuffer (buffer=0) at bufmgr.c:2888
    #2  0x00465a88 in brin_xlog_revmap_extend (lsn=<value optimized out>,
        record=<value optimized out>) at brin_xlog.c:261
    #3  brin_redo (lsn=<value optimized out>, record=<value optimized out>)
        at brin_xlog.c:284
    #4  0x004ce505 in StartupXLOG () at xlog.c:6795

I failed to preserve the data directory; I'll try to repeat this later
this week if needed.

Cheers,

Jeff
[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Mon, 3 Nov 2014 22:55:02 -0200 от Fabrízio de Royes Mello
<fabriziome...@gmail.com>:
> On Mon, Nov 3, 2014 at 7:25 PM, Alexey Vasiliev <leopard...@inbox.ru> wrote:
>> Should I change my patch and send it by email? And also, as I
>> understand, I should change the message ID for
>> https://commitfest.postgresql.org/action/patch_view?id=1636 , isn't it?
>
> You should just send another version of your patch by email and add a
> new comment to the commitfest using the "Patch" comment type.

Added new patch.

> And you should add the patch to an open commitfest, not to one in
> progress. The next open one is 2014-12 [1].

Moved. Thanks.

> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> Timbira: http://www.timbira.com.br
> Blog: http://fabriziomello.github.io
> Linkedin: http://br.linkedin.com/in/fabriziomello
> Twitter: http://twitter.com/fabriziomello
> Github: http://github.com/fabriziomello

--
Alexey Vasiliev
Re: [HACKERS] Sequence Access Method WIP
On 10/13/2014 01:01 PM, Petr Jelinek wrote:
> Hi,
>
> I rewrote the patch with a different API along the lines of what was
> discussed.

Thanks, that's better. It would be good to see an alternative seqam
implement this API, to see how it really works. The "local" one is too
dummy to expose any possible issues.

> The API now consists of the following functions:
>
> sequence_alloc - allocating a range of new values
> The function receives the sequence relation, current value, number of
> requested values, amdata and relevant sequence options like min/max,
> and returns new amdata, the new current value, the number of values
> allocated, and also whether it needs a WAL write (that should be
> returned if amdata has changed, plus other reasons the AM might have
> to force the WAL update).
>
> sequence_setval - notification that setval is happening
> This function gets the sequence relation, previous value and new value
> plus the amdata, and returns amdata (I can imagine some complex
> sequence AMs will want to throw an error that setval can't be done on
> them).
>
> sequence_request_update/sequence_update - used for background processing
> Basically the AM can call sequence_request_update, and the backend will
> then call the sequence_update method of the AM with current amdata and
> will write the updated amdata to disk.
>
> sequence_seqparams - function to process/validate the standard sequence
> options like start position, min/max, increment by etc. in the AM; it's
> called in addition to the standard processing.
>
> sequence_reloptions - this is the only thing that remained unchanged
> from the previous patch; it's meant to pass custom options to the AM.
>
> Only the alloc and reloptions methods are required (and implemented by
> the "local" AM). The caching, xlog writing, updating the page, etc. are
> handled by the backend; the AM does not see the tuple at all.
> I decided to not pass even the struct around and just pass the relevant
> options, because I think if we want to abstract the storage properly
> then the AM should not care about how pg_sequence looks at all, even if
> it means that the sequence_alloc parameter list is a bit long.

Hmm. The division of labour between the seqam and commands/sequence.c
still feels a bit funny. sequence.c keeps track of how many values have
been WAL-logged, and thus are usable immediately, but we still call
sequence_alloc even when using up those already WAL-logged values. If
you think of using this for something like a centralized sequence server
in a replication cluster, you certainly don't want to make a call to the
remote server for every value - you'll want to cache them.

With the local seqam, there are two levels of caching. Each backend
caches some values (per the CACHE option in CREATE SEQUENCE). In
addition to that, the server WAL-logs 32 values at a time. If you have a
remote seqam, it would most likely add a third cache, but it would
interact in strange ways with the second cache.

Considering a non-local seqam, the locking is also a bit strange. The
server keeps the sequence page locked throughout nextval(). But if the
actual state of the sequence is maintained elsewhere, there's no need to
serialize the calls to the remote allocator, i.e. the sequence_alloc()
calls.

I'm not exactly sure what to do about that. One option is to completely
move the maintenance of the current value, i.e. sequence.last_value, to
the seqam. That makes sense from an abstraction point of view. For
example with a remote server managing the sequence, storing the current
value in the local catalog table makes no sense as it's always going to
be out-of-date. The local seqam would store it as part of the AM-private
data. However, you would need to move the responsibility of locking and
WAL-logging to the seqam. Maybe that's OK, but we'll need to provide an
API that the seqam can call to do that.
Perhaps just let the seqam call heap_inplace_update on the sequence
relation.

> For the amdata handling (which is the AM's private data variable) the
> API assumes that (Datum) 0 is NULL; this seems to work well for
> reloptions, so it should work here also, and it simplifies things a
> little compared to passing pointers to pointers around and making sure
> everything is allocated, etc.
>
> Sadly, the fact that amdata is not fixed-size and can be NULL made the
> page updates of the sequence relation quite a bit more complex than
> they used to be.

It would be nice if the seqam could define exactly the columns it needs,
with any datatypes. There would be a set of common attributes:
sequence_name, start_value, cache_value, increment_by, max_value,
min_value, is_cycled. The local seqam would add last_value, log_cnt and
is_called to that. A remote seqam that calls out to some other server
might store the remote server's hostname etc. There could be a seqam
function that returns a TupleDesc with the required columns, for
example.

- Heikki
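The "WAL-logs 32 values at a time" batching Heikki describes can be
sketched as a tiny range allocator. This is an invented illustration of
the sequence_alloc contract, not the patch's actual code: the AM hands
out values one at a time but only requests a WAL write when the
pre-reserved batch is exhausted:

```c
#include <assert.h>
#include <stdint.h>

/* How many values get reserved ("WAL-logged") ahead per batch */
#define SEQ_LOG_VALS 32

/* Hypothetical AM-private state */
typedef struct
{
    int64_t last_value;    /* last value handed to a caller */
    int64_t reserved_upto; /* values below this are already logged */
    int64_t increment;
} SeqState;

/*
 * Sketch of sequence_alloc(): most calls are served from the reserved
 * range with no WAL activity; crossing the range boundary reserves a
 * new batch and asks the caller for exactly one WAL record.
 */
static int64_t
seq_alloc(SeqState *s, int *need_wal)
{
    s->last_value += s->increment;
    *need_wal = 0;
    if (s->last_value >= s->reserved_upto)
    {
        s->reserved_upto = s->last_value + SEQ_LOG_VALS * s->increment;
        *need_wal = 1;     /* one WAL record per 32 values */
    }
    return s->last_value;
}
```

A remote seqam would replace the reservation step with a network call to
the sequence server, which is exactly where a third cache layer - and
the interaction Heikki worries about - would appear.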
Re: [HACKERS] tracking commit timestamps
Michael Paquier wrote:
> I'm still at -1 for that. You mentioned that there is perhaps no reason
> to delay a decision on this matter, but IMO there is no reason to rush
> into doing something we may regret either. And I am not the only one on
> this thread expressing concern about this extra data thingy.
>
> If this extra data field is going to be used to identify from which
> node a commit comes, then it is another feature than what is written on
> the subject of this thread. In that case let's discuss it in the thread
> dedicated to replication identifiers, or come up with an extra patch
> once the feature for commit timestamps is done.

Introducing the extra data field in a later patch would mean an on-disk
representation change, i.e. pg_upgrade trouble.

--
 Álvaro Herrera                    http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] TAP test breakage on MacOS X
On Mon, Nov 3, 2014 at 4:44 PM, Peter Eisentraut <pete...@gmx.net> wrote:
> On 10/29/14 8:42 AM, Robert Haas wrote:
>> I'm sympathetic to that line of reasoning, but I really think that if
>> you want to keep this infrastructure, it needs to be made portable.
>
> Let me clarify that this was my intention. I have looked at many test
> frameworks, many of which are much nicer than what we have, but the
> portability and dependency implications for this project would have
> been between shocking and outrageous. I settled for what I felt was the
> absolute minimum: Perl + IPC::Run.
>
> It was only later on that I learned that 1) subtests don't work in Perl
> 5.10, and 2) subtests are broken in Perl 5.12. So we removed the use of
> subtests, and now we are back at the baseline I started with.

Thanks. At last check, which I think was approximately last week, the
tests were running and passing on my machine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] What exactly is our CRC algorithm?
On Tue, Nov 4, 2014 at 4:47 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
> I hear none, so committed with some small fixes.

Are you going to get the slice-by-N stuff working next, to speed this
up?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] tracking commit timestamps
On 2014-11-04 10:01:00 -0300, Alvaro Herrera wrote:
> Michael Paquier wrote:
>> I'm still at -1 for that. You mentioned that there is perhaps no
>> reason to delay a decision on this matter, but IMO there is no reason
>> to rush into doing something we may regret either. And I am not the
>> only one on this thread expressing concern about this extra data
>> thingy. If this extra data field is going to be used to identify from
>> which node a commit comes, then it is another feature than what is
>> written on the subject of this thread. In that case let's discuss it
>> in the thread dedicated to replication identifiers, or come up with an
>> extra patch once the feature for commit timestamps is done.
>
> Introducing the extra data field in a later patch would mean an on-disk
> representation change, i.e. pg_upgrade trouble.

It's also simply not necessarily related to replication identifiers.
This is useful whether replication identifiers make it in or not. It
allows implementing something like replication identifiers outside of
core (albeit with a hefty overhead in OLTP workloads).

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
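Alvaro's pg_upgrade point can be made concrete with a toy layout
calculation. The struct names and the 4-byte extradata width here are
invented for illustration, not taken from the patch: once fixed-width
per-transaction entries are packed into SLRU pages, widening the entry
later moves every transaction's slot, so old data files become
unreadable without conversion:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-transaction entry layouts */
typedef struct { int64_t commit_time; } EntryV1;                       /* ts only */
typedef struct { int64_t commit_time; uint32_t extradata; } EntryV2;   /* ts + extra */

/* Entries that fit on one 8 kB SLRU page under a given layout */
static int
entries_per_page(int entry_size)
{
    return 8192 / entry_size;
}
```

Because an xid is mapped to (page, offset) by dividing by
entries_per_page, the two layouts put the same xid in different places
on disk - hence shipping extradata from the start, even if unused,
avoids a format break later.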
Re: [HACKERS] What exactly is our CRC algorithm?
On 2014-11-04 08:21:13 -0500, Robert Haas wrote:
> On Tue, Nov 4, 2014 at 4:47 AM, Heikki Linnakangas
> <hlinnakan...@vmware.com> wrote:
>> I hear none, so committed with some small fixes.
>
> Are you going to get the slice-by-N stuff working next, to speed this
> up?

I don't plan to do anything serious with it, but I've hacked up the CRC
code to use the hardware instruction. The results are quite good - CRC
vanishes entirely from the profile for most workloads. It's still
visible for bulk copy, but that's pretty much it.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Hi,

On 2014-11-03 14:04:00 +0300, Alexey Vasiliev wrote:
> * What the patch does in a short paragraph: This patch adds the option
> recovery_timeout, which controls the wait time after restore_command
> returns a nonzero status code. Right now the default value is 5
> seconds. This is useful if I am restoring WAL logs from some external
> storage (like AWS S3), and no matter what, the slave database will lag
> behind the master. The problem is that each request to AWS S3 costs
> money, so for N nodes which try to get the next WAL log every 5
> seconds the price will be higher than for, say, every 30 seconds.
>
> Before, I did this as:
>
>     if ! (/usr/local/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e
>         wal-fetch "%f" "%p"); then sleep 60; fi
>
> But in this case database restart/stop is slower.

Without saying that the feature is unnecessary: wouldn't this better be
solved by using streaming rep most of the time?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On Tue, Nov 04, 2014 at 11:44:22AM +0900, Michael Paquier wrote:
> On Sun, Nov 2, 2014 at 2:30 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> In the case of hash indexes, because we still have to have the hash
>> opclasses in core, there's no way that it could be pushed out as an
>> extension module even if we otherwise had full support for AMs as
>> extensions. So what I hear you proposing is "let's break this so
>> thoroughly that it *can't* be fixed". I'm not on board with that. I
>> think the WARNING will do just fine to discourage novices who are not
>> familiar with the state of the hash AM. In the meantime, we could push
>> forward with the idea of making hash indexes automatically unlogged,
>> so that recovering from a crash wouldn't be quite so messy/dangerous.
>
> There is as well another way: finally support WAL-logging for hash
> indexes.

+1

Ken
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
Feng Tian <ft...@vitessedata.com> wrote:
> Performance difference between money and numeric is *HUGE*. For TPC-H
> Q1, the performance difference is 5x for stock postgres, and ~20x for
> Vitesse. Stock postgres, on my laptop, TPC-H 1G, Q1: using the money
> type takes ~9s; using numeric(15, 2) takes ~53s.
>
>> test=# do $$ begin perform sum('1.01'::numeric) from
>> generate_series(1,1000); end; $$;
>
> This may not reflect the difference between the two data types. One
> aggregate is not where most of the time is spent. TPC-H Q1 has much
> more computation.

That's why I gave the count(*) baseline. If you subtract that from the
sum() timings for money and numeric, numeric takes 5x as long as money
in my quick-and-dirty benchmark. I have no problem believing that some
applications can see that level of performance hit from using numeric
instead of money.

It's a little surprising to me to see more than a factor of five
slowdown from using numeric instead of money; it leaves me a little
curious how that happens. Perhaps that workload is more conducive to
keeping those money amounts in registers and doing register arithmetic.

In any event, I'm against removing or re-deprecating the money type.
There are some problems with money; there are other problems with
numeric. If the docs are failing to make the trade-offs clear, we should
fix the docs. And we can always look at improving either or both types.
The fact that numeric is 5x to 20x slower than money in important
applications, combined with the fact that it performs far worse than a
similar type in another product, suggests that numeric could stand a
serious optimization pass.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On 11/02/2014 06:41 PM, Tom Lane wrote:
> I wrote:
>> Either way, requiring a dump/reload for upgrade is surely a better
>> answer for users of the type than just summarily screwing them.
>
> BTW, after reflecting a bit more I'm less than convinced that this
> datatype is completely useless. Even if you prefer to store currency
> values in numeric columns, casting to or from money provides a way to
> accept or emit values in whatever monetary format the LC_MONETARY
> locale setting specifies. That seems like a useful feature, and it's
> one you could not easily duplicate using to_char/to_number (not to
> mention that those functions aren't without major shortcomings of their
> own).

I personally find that one of the money type's misfeatures, since the
currency symbol varies not just per language/locale but also with which
currency you are working. For databases with multiple currencies, the
output format of money is just confusing.

Andreas
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On Tue, Nov 4, 2014 at 8:16 AM, Kevin Grittner <kgri...@ymail.com> wrote:
> In any event, I'm against removing or re-deprecating the money type.
> There are some problems with money; there are other problems with
> numeric. If the docs are failing to make the trade-offs clear, we
> should fix the docs. And we can always look at improving either or
> both types. The fact that numeric is 5x to 20x slower than money in
> important applications, combined with the fact that it performs far
> worse than a similar type in another product, suggests that numeric
> could stand a serious optimization pass.

Money should stay. It is the only fixed-point, integer-based numeric
type that comes with the database. Should a non-locale-dependent
fixed-point type that offers the same performance come along, then you
would have a good argument to deprecate it.

numeric is definitely a dog, which is the fundamental problem IMNSHO.
I'm guessing optimization paths are going to revolve around utilizing
binary integer ops in cases where it's known to be safe to do so.

merlin
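The performance gap Kevin and Merlin are discussing comes down to
representation: money is stored as a 64-bit integer count of fractional
currency units, so summing it is one native CPU add per row, while
numeric works through variable-length digit groups. A minimal sketch of
the fixed-point approach (invented names, not PostgreSQL's cash type
itself):

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point currency: $1.01 is stored as 101 cents */
typedef int64_t money_cents;

/*
 * Summing fixed-point values is a plain integer loop - the kind of
 * code the compiler can keep in registers and vectorize, which is why
 * money-style types beat digit-by-digit decimal arithmetic.
 */
static money_cents
sum_prices(const money_cents *vals, int n)
{
    money_cents total = 0;

    for (int i = 0; i < n; i++)
        total += vals[i];   /* single integer add per element */
    return total;
}
```

A numeric-style implementation has to loop over each value's digit
groups with carries instead, which is where the 5x-20x difference in
Q1-style aggregation plausibly comes from.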
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On 11/03/2014 10:11 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 11/02/2014 11:41 AM, Tom Lane wrote: Nothing that I recall at the moment, but there is certainly plenty of stuff of dubious quality in there. I'd argue that chkpass, intagg, intarray, isn, spi, and xml2 are all in worse shape than the money type. Why are we holding on to xml2 again? IIRC, there's some xpath-related functionality in there that's not yet available in core (and needs some redesign before it'd ever get accepted into core, so there's not a real quick fix to be had). Yes, xpath_table is badly broken, as Robert Haas documented and I expanded on a while back. See for example http://www.postgresql.org/message-id/4d6bcc1e.3010...@dunslane.net I don't have any time of my own to work on this any time soon. But I know that it has users, so just throwing it out would upset some people. cheers andrew
[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Tue, 4 Nov 2014 14:41:56 +0100 from Andres Freund and...@2ndquadrant.com: Hi, On 2014-11-03 14:04:00 +0300, Alexey Vasiliev wrote: * What the patch does in a short paragraph: This patch adds an option, recovery_timeout, which controls how long to wait before retrying restore_command after it returns a nonzero status code. Right now the default value is 5 seconds. This is useful if I am using some external storage (like AWS S3) to restore WAL logs, and it does not matter that the slave database lags behind the master. The problem is that each request to AWS S3 costs money, which is why N nodes each trying to get the next WAL log every 5 seconds will cost more than, for example, every 30 seconds. Before, I did it this way: if ! (/usr/local/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch %f %p); then sleep 60; fi . But in this case database restart/stop is slower. Without saying that the feature is unnecessary, wouldn't this better be solved by using streaming rep most of the time? But we don't need streaming rep. The master database does not need to know about the slaves (and there is no need to add this little overhead to the master). Slaves read WAL logs from S3, and the same S3 WAL logs are used as backups. -- Alexey Vasiliev
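The workaround quoted above can be generalized into a small restore_command wrapper. A minimal sketch under stated assumptions: retry_fetch and its arguments are illustrative names, not part of wal-e or PostgreSQL, and the recovery.conf line shows only where such a script would plug in:

```shell
# retry_fetch CMD DELAY
# Run the WAL-fetch command; if it fails (e.g. the segment is not
# archived yet), sleep DELAY seconds before reporting failure, so that
# PostgreSQL's immediate retry of restore_command does not poll a
# paid-per-request archive (like S3) every few seconds.
retry_fetch() {
    fetch_cmd=$1
    delay=$2
    if $fetch_cmd; then
        return 0
    fi
    sleep "$delay"
    return 1
}

# In recovery.conf one would point restore_command at a script built
# around this idea, e.g. (hypothetical path):
#   restore_command = '/usr/local/bin/fetch_wal.sh %f %p'
```

The patch under discussion moves this sleep into the server itself, which avoids the drawback Alexey mentions: a shell-side sleep delays shutdown/restart of the standby.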
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On 7/11/14 4:36 AM, Ali Akbar wrote: But I found a bug in libxml2's code: because we call xmlCopyNode with 1 as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't check whether the return of xmlStaticCopyNode is NULL. So if someday the recursion returns NULL (maybe because of memory exhaustion), it will SEGFAULT. Knowing this bug is in libxml2's code, is this patch still acceptable? This problem was fixed in libxml2 2.9.2 (see https://bugzilla.gnome.org/show_bug.cgi?id=708681).
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On 7/8/14 6:03 AM, Ali Akbar wrote: If we put 1 as the parameter, the resulting Node will link to the existing children. In this case, the namespace problem for the children might still exist. If any of the children uses different namespace(s) than the parent, the namespace definition will not be copied correctly. This analysis might very well be right, but I think we should prove it with a test case.
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On 10/6/14 10:24 PM, Ali Akbar wrote: While reviewing the patch myself, I spotted some formatting problems in the code. Fixed in this v5 patch. Also, this patch uses context patch format (in the first versions, because of the small modification, context patch format obfuscated the changes; after the reimplementation this isn't the case anymore). I think the problem this patch is addressing is real, and your approach is sound, but I'd ask you to go back to the xmlCopyNode() version, and try to add a test case for why the second argument = 1 is necessary. I don't see any other problems.
Re: [HACKERS] GIN pageinspect functions
On Tue, Oct 7, 2014 at 10:33 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Some time ago, when debugging a GIN bug, I wrote these pageinspect functions to inspect GIN indexes. They were very useful; we should add them. I think these functions will be quite useful for debugging purposes, and we already have similar functions for another index type (btree). A few suggestions for the patch:

1. Documentation seems to be missing; the other APIs exposed via pageinspect are documented at: http://www.postgresql.org/docs/devel/static/pageinspect.html

2.
+CREATE FUNCTION gin_metapage(IN page bytea,
+    OUT pending_head bigint,
+    OUT pending_tail bigint,
+    OUT tail_free_size int4,
+    OUT n_pending_pages bigint,
+    OUT n_pending_tuples bigint,
+    OUT n_total_pages bigint,
+    OUT n_entry_pages bigint,
+    OUT n_data_pages bigint,
+    OUT n_entries bigint,
+    OUT version int4)
+AS 'MODULE_PATHNAME', 'gin_metapage'
+LANGUAGE C STRICT;

a. Isn't it better to name the function gin_metap(..), similar to the existing function bt_metap(..)?
b. Can this function have a similar signature to bt_metap(), which means it should take relname as input?

3. Can the gin_dataleafpage() API have a similar name and signature to the bt_page_items() API exposed for btree?

4. Can we have a better name for gin_pageopaq (the other API names in this module are self-explanatory)?

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
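For comparison, the existing btree inspection API that the review suggests mirroring takes the index name rather than a raw page image. A psql sketch of that convention (columns abbreviated from pageinspect's documented bt_metap output; the table and index names are invented for illustration):

```sql
-- The btree convention suggested as a model for the GIN functions:
-- pass the index's relname and let the function read the metapage.
CREATE EXTENSION IF NOT EXISTS pageinspect;

CREATE TABLE t (x int);
CREATE INDEX t_x_idx ON t (x);

SELECT magic, version, root FROM bt_metap('t_x_idx');
```

The proposed gin_metapage(page bytea, ...) instead requires the caller to fetch the page with get_raw_page() first, which is the usability gap suggestion 2b points at.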
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
Craig Ringer cr...@2ndquadrant.com writes: On 11/03/2014 03:41 AM, Tom Lane wrote: Nothing that I recall at the moment, but there is certainly plenty of stuff of dubious quality in there. I'd argue that chkpass, intagg, intarray, isn, spi, and xml2 are all in worse shape than the money type. What's wrong with intarray? The single biggest practical problem with it is that it creates new definitions of the @> and <@ operators, which cause confusion with the core operators of those names. For example, we've repeatedly seen bug reports along the lines of "why does this query not use this index" because of people trying to use the contrib @> operator with a core GIN index or vice versa. (In fairness, I think intarray might be older than the core operators, but that doesn't make this less of a problem.) Another thing that grates on me is the mostly-arbitrary restriction to non-null-containing arrays. That's an implementation artifact rather than something semantically required by the operations. More generally, it seems like a grab bag of not terribly well designed features, and the features that do seem well designed seem like they ought to be more generic than just for int4 arrays. So to me it feels like proof-of-concept experimentation rather than a production-grade thing that we could feel good about moving into core. regards, tom lane
[HACKERS] Re: [BUGS] BUG #11867: Strange behaviour with composite types after resetting database tablespace
On Mon, 2014-11-03 at 22:08 -0500, Tom Lane wrote: <spock>Fascinating.</spock> :-) I believe what is happening is: [ . . . ] This is not mission-critical for me but I'd be grateful for suggestions for work-arounds. I don't see any workaround that's much easier than fixing the bug. But what's your use case that involves flipping databases from one tablespace to another and then back again within the life expectancy of unused shared-buffers pages? It doesn't seem terribly surprising that nobody noticed this before ... It's a completely unreal use case. I am working on a diff tool, and this was found as part of my test suite: build db x, drop db x, build db y, apply diff y-x, compare with original x, apply diff x-y, compare with original y. So this is a fairly minor inconvenience for me. Marc
Re: [HACKERS] alter user set local_preload_libraries.
On 10/9/14 1:58 PM, Fujii Masao wrote: Also I think that it's useful to allow ALTER ROLE/DATABASE SET to set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what about applying the attached patch? This patch allows that and changes the context of session_preload_libraries to PGC_SU_BACKEND. After looking through this again, I wonder whether there is any reason why ignore_system_indexes cannot be plain PGC_USERSET. With this change, we'd allow setting it via ALTER ROLE, but the access to pg_db_role_setting happens before it. So if there is anything unsafe about changing ignore_system_indexes, then this would be a problem, but I don't see anything.
Re: [JDBC] [HACKERS] Pipelining executions to postgresql server
On Nov 4, 2014, at 12:55 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 11/04/2014 07:56 AM, Mikko Tiihonen wrote: I also think the async I/O is the way to go. Luckily that has already been done in the pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top of netty java NIO library. It has quite good feature parity with the original pgjdbc driver. Huh, it does seem to now. The big omission, unless I'm blind, is support for COPY. (It also lacks support for JDBC3 and JDBC4, requiring JDK 7 and JDBC 4.1, but that's reasonable enough.) I focused on 4.1 compliance instead of legacy support. I am proud to say I believe pgjdbc-ng is 100% feature compliant for JDBC 4.1. Legacy support is still not, and likely will never be, planned. It does lack COPY support as I have never used it myself from the driver, only from tools outside the JVM. It now has LISTEN/NOTIFY by the looks, and of course it's built around binary protocol support. I freely admit I haven't looked at it too closely. I'm a tad frustrated by the parallel development of a different driver when the official driver has so much in the way of low hanging fruit to improve. I’ll only nibble this bait…. I outlined my reasons when I started the project. They were all valid then and still are now. My apologies for causing any frustration with this new path. It’s a lot cleaner, simpler and provides more JDBC features than the original driver because of it. Although I must say, without the original driver and its exhausting battery of unit tests, building a new driver would seem impossible. I have to say I like some aspects of how pgjdbc-ng is being done - in particular use of Maven and being built around non-blocking I/O. OTOH, the use of Google Guava I find pretty inexplicable in a JDBC driver, and since it hard-depends on JDK 7 I don't understand why Netty was used instead of the async / non-blocking features of the core JVM. 
That may simply be my inexperience with writing non-blocking socket I/O code on Java though. It confuses me as to why you consider using stable, well implemented, well tested and well cared for libraries as inexplicable. Just because we are building a “driver” means we have to write every line of code ourselves? No thanks. You can imagine our differences on this philosophy are one of the reasons why I consider pgjdbc-ng’s parallel development to be a godsend rather than hacking on the original code. Concerning Guava… A great library with an amazing number of classes that make everyday Java easier. The requirement for JDK 7 was chosen before Guava was considered, not because of it. Using it seemed obvious after that decision. Also, we have internalized the classes we use out of Guava to remove it as a dependency. This is more work to maintain on our part but makes it worth it when deploying a single JAR. Concerning Netty… All options were entertained at the beginning. The original version actually used a basic NIO socket. After I realized I had to basically write my own framework to work with this socket correctly, I searched for an alternative and found Netty. The deciding factor was that implementing SSL on top of the NIO API was considered next to impossible to get right, according to all prevailing wisdom at the time. Whereas with Netty, SSL support is basically a single-line change. I'm also concerned about how it'll handle new JDBC versions, as it seems to lack the JDBC interface abstraction of the core driver. My plan is to handle adding support for 4.2 and beyond by using a Maven based conditional preprocessor. If that fails, or seems just too ugly, I’ll probably have to look at an abstract class based method like that of the original driver. I do not think the JDBC batch interface even allows doing updates to multiple tables when using prepared statements? Ah, I missed this before. That's correct. 
You get prepared statements _or_ multiple different statements. That's a more understandable reason to concoct a new API, and explains why you're not just solving the issues with batches. Though I still think you're going to have to fix the buffer management code before you do anything like this. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-jdbc mailing list (pgsql-j...@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-jdbc
Re: [HACKERS] [BUGS] BUG #11867: Strange behaviour with composite types after resetting database tablespace
I wrote: However: at no point in this sequence did we flush the original catalog blocks out of shared buffers. Therefore, a read of the database's pg_class or pg_type at this point is likely to find the previous states of those pages (pre-composite-type-drop) as valid, matching entries, so that we skip reading the up-to-date page contents back from disk. A quick fix for this would be to issue DropDatabaseBuffers during movedb() BTW, I wondered whether the exact same hazard didn't exist for ALTER TABLE SET TABLESPACE. At first glance it looks bad, because ATExecSetTableSpace doesn't forcibly drop old shared buffers for the moved table either. Closer examination says we're safe though: * The buffers will be dropped during smgrdounlink at end of transaction, so you could only be at risk if you moved a table, modified it, and moved it back in a single transaction. * A new relfilenode will be assigned during each movement. * When assigning a relfilenode during the move-back, we'd be certain to choose a relfilenode different from the original one, because the old relation still exists on-disk at this point. So it's safe as things stand; but this seems a bit, um, rickety. Should we insert a DropRelFileNodesAllBuffers call into ATExecSetTableSpace to make it safer? It's kind of annoying to have to scan the buffer pool twice, but I'm afraid that sometime in the future somebody will try to get clever about optimizing away the end-of-transaction buffer pool scan, and break this case. Or maybe I'm just worrying too much. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alter user set local_preload_libraries.
Peter Eisentraut pete...@gmx.net writes: On 10/9/14 1:58 PM, Fujii Masao wrote: Also I think that it's useful to allow ALTER ROLE/DATABASE SET to set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what about applying the attached patch? This patch allows that and changes the context of session_preload_libraries to PGC_SU_BACKEND. After looking through this again, I wonder whether there is any reason why ignore_system_indexes cannot be plain PGC_USERSET. With this change, we'd allow setting it via ALTER ROLE, but the access to pg_db_role_setting happens before it. So if there is anything unsafe about changing ignore_system_indexes, then this would be a problem, but I don't see anything. There are some, um, interesting consequences of setting ignore_system_indexes; AFAIK, none that put data integrity at risk, but it can destroy performance in ways beyond the obvious ones. See for example the comments for get_mergejoin_opfamilies and get_ordering_op_properties. I don't particularly want to answer user bug reports about such behaviors, nor do I care to put any effort into making the behavior without system indexes smarter than it is now. (We should also consider the risk that there might be as-yet-unrecognized dependencies on catalog scan order that would amount to actual bugs.) So I'm -1 on any change that might make it look like we were encouraging people to use ignore_system_indexes except as a very last resort. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] superuser() shortcuts
On 10/27/14 11:40 AM, Stephen Frost wrote: * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: As I started looking at this, there are multiple other places where these types of error messages occur (opclasscmds.c, user.c, postinit.c, miscinit.c are just a few), not just around the changes in this patch. If we change them in one place, wouldn't it be best to change them in the rest? If that is the case, I'm afraid that might distract from the purpose of this patch. Perhaps, if we want to change them, then that should be submitted as a separate patch? Yeah. I'm just saying that maybe this patch should adopt whatever wording we agree to, not that we need to change other places. On the other hand, since so many other places have adopted the different wording, maybe there's a reason for it and if so, does anybody know what it is. But I have to say that it does look inconsistent to me. Updated patch attached. Comments welcome. The changes in src/backend/commands/alter.c src/backend/commands/foreigncmds.c src/backend/commands/tablecmds.c src/backend/commands/typecmds.c are OK, because the superuser() calls are redundant, and cleaning this up is clearly in line with planned future work. I suggest moving the rest of the changes into separate patches. The error message wording changes might well be worth considering, but there are tons more must be $someone to do $something error messages, and changing two of them isn't making anything more or less consistent. The ha*_something_privilege() changes are also not very consistent. We already have have_createrole_privilege(), which does include a superuser check, and you add has_replication_privilege() with a superuser check, but has_catupdate_privilege() and has_inherit_privilege() don't include a superuser check. That's clearly a mess. Btw., why rename have_createrole_privilege()? Also, your patch has spaces between tabs. Check for whitespace errors with git. 
[HACKERS] get_cast_func syscache utility function
here's a patch for a utility function to look up the cast function for a from/to pair of types, as recently suggested by Alvaro. Although it only contains one use (in json.c), the upcoming jsonb generators would also use it twice. I'd like to get this committed fairly quickly so I can prepare an updated patch for the jsonb generators. cheers andrew
Re: [HACKERS] get_cast_func syscache utility function
missing patch Regards Pavel 2014-11-04 18:57 GMT+01:00 Andrew Dunstan and...@dunslane.net: here's a patch for a utility function to look up the cast function for a from/to pair of types, as recently suggested by Alvaro. Although it only contains one use (in json.c), the upcoming jsonb generators would also use it twice. I'd like to get this committed fairly quickly so I can prepare an updated patch for the jsonb generators. cheers andrew
Re: [HACKERS] get_cast_func syscache utility function
On 11/04/2014 12:57 PM, Andrew Dunstan wrote: here's a patch for a utility function to look up the cast function for a from/to pair of types, as recently suggested by Alvaro. Although it only contains one use (in json.c), the upcoming jsonb generators would also use it twice. I'd like to get this committed fairly quickly so I can prepare an updated patch for the jsonb generators. This time with patch. cheers andrew

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d2bf640..fe9cf83 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -1299,22 +1299,12 @@ json_categorize_type(Oid typoid,
 			/* but let's look for a cast to json, if it's not built-in */
 			if (typoid >= FirstNormalObjectId)
 			{
-				HeapTuple	tuple;
+				Oid			castfunc = get_cast_func(typoid, JSONOID);
 
-				tuple = SearchSysCache2(CASTSOURCETARGET,
-										ObjectIdGetDatum(typoid),
-										ObjectIdGetDatum(JSONOID));
-				if (HeapTupleIsValid(tuple))
+				if (OidIsValid(castfunc))
 				{
-					Form_pg_cast castForm = (Form_pg_cast) GETSTRUCT(tuple);
-
-					if (castForm->castmethod == COERCION_METHOD_FUNCTION)
-					{
-						*tcategory = JSONTYPE_CAST;
-						*outfuncoid = castForm->castfunc;
-					}
-
-					ReleaseSysCache(tuple);
+					*tcategory = JSONTYPE_CAST;
+					*outfuncoid = castfunc;
 				}
 			}
 		}
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 552e498..0da08fe 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -21,6 +21,7 @@
 #include "bootstrap/bootstrap.h"
 #include "catalog/pg_amop.h"
 #include "catalog/pg_amproc.h"
+#include "catalog/pg_cast.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_namespace.h"
@@ -2889,3 +2890,36 @@ get_range_subtype(Oid rangeOid)
 	else
 		return InvalidOid;
 }
+
+/*
+ *
+ * get_cast_func(fromtype, totype) -> funcoid
+ *
+ * find a cast function if any from fromtype to totype.
+ *
+ * return the Oid of the function found or InvalidOid otherwise.
+ */
+
+Oid
+get_cast_func(Oid from_type, Oid to_type)
+{
+	HeapTuple	tuple;
+	Oid			castfunc = InvalidOid;
+
+	tuple = SearchSysCache2(CASTSOURCETARGET,
+							ObjectIdGetDatum(from_type),
+							ObjectIdGetDatum(to_type));
+	if (HeapTupleIsValid(tuple))
+	{
+		Form_pg_cast castForm = (Form_pg_cast) GETSTRUCT(tuple);
+
+		if (castForm->castmethod == COERCION_METHOD_FUNCTION)
+		{
+			castfunc = castForm->castfunc;
+		}
+
+		ReleaseSysCache(tuple);
+	}
+
+	return castfunc;
+}
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 07d24d4..0291b96 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -151,6 +151,7 @@
 extern void free_attstatsslot(Oid atttype, float4 *numbers, int nnumbers);
 extern char *get_namespace_name(Oid nspid);
 extern Oid	get_range_subtype(Oid rangeOid);
+extern Oid	get_cast_func(Oid from_type, Oid to_type);
 
 #define type_is_array(typid)	(get_element_type(typid) != InvalidOid)
 /* type_is_array_domain accepts both plain arrays and domains over arrays */
Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
--On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've checked git master and 9.x and all show the same behaviour. I came up with the patch below, which is against current git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification committed by other backends. Well, I have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. -- Thanks Bernd
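Bernd's explanation can be illustrated with a toy model of snapshot visibility. This is purely illustrative Python, a drastic simplification of PostgreSQL's real visibility rules (which live in the server's MVCC code), with all names invented for the sketch:

```python
# Toy model: a tuple is visible to a snapshot only if the transaction
# that wrote it (its xmin) committed before the snapshot was taken.
# Here we treat every XID below the snapshot's XID as committed.
def visible(tuple_xmin: int, snapshot_xid: int) -> bool:
    return tuple_xmin < snapshot_xid

copy_snapshot = 100           # COPY took its snapshot, then waited on the lock
original_rows = [50, 60, 70]  # rows written long before the snapshot
rewritten_rows = [120] * 3    # same rows after ALTER TABLE rewrote them (XID 120)

# Before the rewrite, all rows are visible to COPY's snapshot:
assert all(visible(x, copy_snapshot) for x in original_rows)
# After the rewrite, none are -- hence the empty COPY result:
assert not any(visible(x, copy_snapshot) for x in rewritten_rows)
```

The proposed fix (take the snapshot only after the lock is acquired) makes COPY's snapshot XID exceed the rewriting transaction's XID, so the rewritten tuples become visible, matching what plain SELECT already does.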
Re: [HACKERS] get_cast_func syscache utility function
Andrew Dunstan and...@dunslane.net writes: here's a patch for a utility function to look up the cast function for a from/to pair of types, as recently suggested by Alvaro. Although it only contains one use (in json.c), the upcoming jsonb generators would also use it twice. I'd like to get this committed fairly quickly so I can prepare an updated patch for the jsonb generators. I'm not exactly convinced that this is an appropriate utility function, because casts for which a COERCION_METHOD_FUNCTION entry exists in pg_cast are by no means the whole universe of casts. I'm concerned that creating this function will encourage patch authors to blow off other possibilities when they should not. In the particular context at hand, it seems like you might be better advised to think about how to refactor json_categorize_type to be more helpful to your new use-case. A concrete example of what I'm worried about is that json_categorize_type already ignores the possibility of binary-compatible (WITHOUT FUNCTION) casts to json. Since it's eliminated the domain case earlier, that's perhaps not too horridly broken; but it'd be very easy for new uses of this get_cast_func() function to overlook such considerations. In short, I'd rather see this addressed through functions with slightly higher-level APIs that are capable of covering more cases. In most cases it'd be best if callers were using find_coercion_pathway() rather than taking shortcuts. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
Bernd Helmle maili...@oopsware.de writes: --On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've check git master and 9.x and all show the same behaviour. I came up with the patch below, which is against curent git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification commited by other backends. Well, i have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. Not sure. The OP's point is that in a SELECT, you do get unsurprising results, because a SELECT will acquire its execution snapshot after it's gotten AccessShareLock on the table. Arguably COPY should behave likewise. Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably already acts like he wants, so why isn't plain COPY equivalent to that? What concerns me more is whether there are other utility statements that have comparable issues. We should not fix just COPY if there are more cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
On 11/04/2014 01:51 PM, Tom Lane wrote: Bernd Helmle maili...@oopsware.de writes: --On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've check git master and 9.x and all show the same behaviour. I came up with the patch below, which is against curent git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification commited by other backends. Well, i have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. Not sure. The OP's point is that in a SELECT, you do get unsurprising results, because a SELECT will acquire its execution snapshot after it's gotten AccessShareLock on the table. Arguably COPY should behave likewise. Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably already acts like he wants, so why isn't plain COPY equivalent to that? Yes, that seems like an outright bug. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
On Tue, 4 Nov 2014, Bernd Helmle wrote: --On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've checked git master and 9.x and all show the same behaviour. I came up with the patch below, which is against current git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification committed by other backends. Well, I have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. No, but with REPEATABLE READ and SERIALIZABLE the plain SELECT case is currently broken too. The issue I'm fixing is that COPY (SELECT ...) TO and SELECT behave differently. So, how should ALTER TABLE behave transaction-wise? PostgreSQL claims transactional DDL support. As mentioned above, a parallel SELECT with SERIALIZABLE returns an empty result, but it also sees the schema change; at least the change shows up in the result tuple description. The change doesn't show up in pg_catalog, so that bit is handled correctly. The schema change transaction can be rolled back the way it is now, so it's kind of transactional, but other transactions seeing the schema change in query results is broken. The empty result might be fixed by just keeping the old XID of rewritten tuples, as the exclusive lock ALTER TABLE holds should be enough to make sure nobody is actively accessing the rewritten table. But I'm pondering whether there is a solution for the visible schema change that doesn't involve keeping the old data files around, or whether to just raise a serialization error. Coming back to my mentioned solution of setting the XID of rewritten tuples to the FrozenXID. 
That's broken as in the REPEATABLE READ/SERIALIZABLE case there might be tuples that should not be seen by older transactions and moving the tuples to the FrozenXID would make them visible. Sven -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
On Tue, 4 Nov 2014, Tom Lane wrote: Bernd Helmle maili...@oopsware.de writes: --On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've checked git master and 9.x and all show the same behaviour. I came up with the patch below, which is against current git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification committed by other backends. Well, I have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. Not sure. The OP's point is that in a SELECT, you do get unsurprising results, because a SELECT will acquire its execution snapshot after it's gotten AccessShareLock on the table. Arguably COPY should behave likewise. Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably already acts like he wants, so why isn't plain COPY equivalent to that? No, both plain COPY and COPY (SELECT ...) TO are broken. Sven -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On 11/04/2014 07:33 AM, Tom Lane wrote: More generally, it seems like a grab bag of not terribly well designed features, and the features that do seem well designed seem like they ought to be more generic than just for int4 arrays. So to me it feels like proof-of-concept experimentation rather than a production-grade thing that we could feel good about moving into core. Yah, as a user of intarray, I think it belongs as an external extension. Heck, I think it might get more attention as an external extension, and if it were external then folks could fork it and create versions that don't have operator conflicts. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] superuser() shortcuts
Thanks for looking at this patch. I suggest moving the rest of the changes into separate patches. Hmmm... perhaps the following? * superuser-cleanup - contains above mentioned superuser shortcuts only. * has_privilege-cleanup - contains has_*_privilege cleanup only. Would that also require a separate commitfest entry? The has_*_privilege() changes are also not very consistent. We already have have_createrole_privilege(), which does include a superuser check, and you add has_replication_privilege() with a superuser check, but has_catupdate_privilege() and has_inherit_privilege() don't include a superuser check. That's clearly a mess. Good catch. Though, according to the documentation, not even a superuser is allowed to bypass CATUPDATE. http://www.postgresql.org/docs/9.4/static/catalog-pg-authid.html. However, I can't think of a reason why inherit wouldn't need the superuser check. Obviously a superuser is considered a member of every role, but is there a reason that a superuser would not be allowed to bypass this? I only ask because it did not have a check previously, so I figure there might have been a good reason for it? Btw., why rename have_createrole_privilege()? Well, actually it wasn't necessarily a rename. It was a removal of that function altogether, as all it did was simply return the result of has_createrole_privilege. That seemed rather redundant and unnecessary, IMO. Also, your patch has spaces between tabs. Check for whitespace errors with git. Yikes. -Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
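For illustration, the consistent shape being asked for in these helpers might look like the following sketch (hypothetical signature; the real PostgreSQL functions take a role OID and consult the catalogs, and per the docs CATUPDATE is deliberately not bypassed by superuser):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical consistent shape for a has_*_privilege() helper:
 * the superuser check short-circuits first, then the specific role
 * attribute (e.g. rolreplication) is consulted. Parameters stand in
 * for the catalog lookups the real code would perform.
 */
static bool has_replication_privilege(bool is_superuser, bool rolreplication)
{
    if (is_superuser)
        return true;            /* superuser bypasses the attribute check */
    return rolreplication;
}
```

The inconsistency in the patch under review is exactly that some helpers include the short-circuit and others don't.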
Re: [BUGS] [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
On 2014-11-04 13:51:23 -0500, Tom Lane wrote: Bernd Helmle maili...@oopsware.de writes: --On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've checked git master and 9.x and all show the same behaviour. I came up with the patch below, which is against current git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification committed by other backends. Well, I have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. Not sure. The OP's point is that in a SELECT, you do get unsurprising results, because a SELECT will acquire its execution snapshot after it's gotten AccessShareLock on the table. Arguably COPY should behave likewise. Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably already acts like he wants, so why isn't plain COPY equivalent to that? Even a plain SELECT essentially acts that way, if I recall correctly, if you use REPEATABLE READ+ and force a snapshot to be acquired beforehand. It's imo not very surprising. All ALTER TABLE rewrites just disregard visibility of existing tuples. Only CLUSTER/VACUUM FULL jump through the hoops needed to keep all the necessary tuples + ctid chains around. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
Michael Paquier wrote: On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Although I doubt the necessity of the flexibility given the current testing framework, I don't have a strong objection to it. Nevertheless, it would perhaps be appreciated if you put a notice in the README or somewhere. Hm, well... Fine, I added it in this updated series. Did this go anywhere? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [JDBC] [HACKERS] Pipelining executions to postgresql server
Kevin Wooten kd...@me.com wrote: On Nov 4, 2014, at 12:55 AM, Craig Ringer cr...@2ndquadrant.com wrote: I have to say I like some aspects of how pgjdbc-ng is being done - in particular the use of Maven and being built around non-blocking I/O. OTOH, the use of Google Guava I find pretty inexplicable in a JDBC driver, and since it hard-depends on JDK 7 I don't understand why Netty was used instead of the async / non-blocking features of the core JVM. That may simply be my inexperience with writing non-blocking socket I/O code on Java though. Java 6 has been EOL since 2011 and Java 7 is EOL in less than 6 months. I think that depending on Java 7, a version that soon should not even be used in production (without paying for extra support), can hardly be too hard a requirement. It confuses me as to why you consider using stable, well implemented, well tested and well cared for libraries as inexplicable. Just because we are building a “driver” means we have to write every line of code ourselves? No thanks. Embedding parts of other projects into the code base during the build with renamed packages is nowadays common practice in Java projects: Spring does it, Elasticsearch embeds the whole of Netty and more, and even the JRE embeds for example Xerces and ASM. It might not be the optimal solution, but it is still definitely better than writing everything from scratch or copy-pasting code from other projects. If pgjdbc-ng provides both a thin Maven version (with external versioned dependencies) and a fat jar with the external versions repackaged inside, users can choose either the old way (just drop a JDBC jar into the project) or the new way, with their chosen build tool automatically managing the dependencies. -Mikko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 10/20/14 4:51 PM, Peter Eisentraut wrote: On 10/20/14 2:59 PM, Tom Lane wrote: What do we want to do about this? I think a minimum expectation would be for pg_basebackup to notice and complain when it's trying to create an unworkably long symlink entry, but it would be far better if we found a way to cope instead. Isn't it the backend that should error out before sending truncated file names? src/port/tar.c: /* Name 100 */ sprintf(&h[0], "%.99s", filename); Here are patches to address that. First, it reports errors when attempting to create a tar header that would truncate file or symlink names. Second, it works around the problem in the tests by creating a symlink from the short-name tempdir that we had set up for the Unix-socket directory case. The first patch can be backpatched to 9.3. The tar code before that is different and would need manual adjustments. If someone has a too-long tablespace path, I think they can work around that after this patch by creating a shorter symlink and updating the pg_tblspc symlinks to point there. From 16d5eb9514c89391e6c2361212363a7ac2f16e0b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut peter_e@gmx.net Date: Tue, 4 Nov 2014 15:19:18 -0500 Subject: [PATCH 1/2] Error when creating names too long for tar format The tar format (at least the version we are using) does not support file names or symlink targets longer than 99 bytes. Until now, the tar creation code would silently truncate any names that are too long. (Its original application was pg_dump, where this never happens.) This creates problems when running base backups over the replication protocol. The most important problem is when a tablespace path is longer than 99 bytes, which will result in a truncated tablespace path being backed up. Less importantly, the basebackup protocol also promises to back up any other files it happens to find in the data directory, which would also lead to file name truncation if someone put a file with a long name in there.
Now both of these cases result in an error during the backup. --- src/backend/replication/basebackup.c | 21 ++++++++++++++++++++- src/include/pgtar.h | 10 +++++++++- src/port/tar.c | 10 +++++++++- 3 files changed, 38 insertions(+), 3 deletions(-) diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index fbcecbb..5835725 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1234,11 +1234,30 @@ _tarWriteHeader(const char *filename, const char *linktarget, struct stat * statbuf) { char h[512]; + enum tarError rc; - tarCreateHeader(h, filename, linktarget, statbuf->st_size, + rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size, statbuf->st_mode, statbuf->st_uid, statbuf->st_gid, statbuf->st_mtime); + switch (rc) + { + case TAR_OK: + break; + case TAR_NAME_TOO_LONG: + ereport(ERROR, + (errmsg("file name too long for tar format: \"%s\"", + filename))); + break; + case TAR_SYMLINK_TOO_LONG: + ereport(ERROR, + (errmsg("symbolic link target too long for tar format: file name \"%s\", target \"%s\"", + filename, linktarget))); + break; + default: + elog(ERROR, "unrecognized tar error: %d", rc); + } + pq_putmessage('d', h, 512); } diff --git a/src/include/pgtar.h b/src/include/pgtar.h index 2dd6f6b..6b6c1e1 100644 --- a/src/include/pgtar.h +++ b/src/include/pgtar.h @@ -11,5 +11,13 @@ * *- */ -extern void tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime); + +enum tarError +{ + TAR_OK = 0, + TAR_NAME_TOO_LONG, + TAR_SYMLINK_TOO_LONG +}; + +extern enum tarError tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime); extern int tarChecksum(char *header); diff --git a/src/port/tar.c b/src/port/tar.c index 09fd6c1..74168e8 100644 --- a/src/port/tar.c +++ b/src/port/tar.c @@ -49,10 +49,16 @@ tarChecksum(char *header) * must always have space for 512 characters, which is a requirement by *
the tar format. */ -void +enum tarError tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime) { + if (strlen(filename) > 99) + return TAR_NAME_TOO_LONG; + + if (linktarget && strlen(linktarget) > 99) + return TAR_SYMLINK_TOO_LONG; + /* * Note: most of the fields in a tar header are not supposed to be * null-terminated. We use sprintf, which will write a null after the @@ -141,4 +147,6 @@ tarCreateHeader(char *h, const char *filename, const char *linktarget, * 6 digits, a space, and a null, which is legal per POSIX. */ sprintf(&h[148], "%06o ", tarChecksum(h)); + + return TAR_OK; } -- 2.1.3 From 25f6daded3a4be8182af2d185f4826708f7a1b76 Mon
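The core of the patch's check can be condensed into a standalone sketch (the function name is mine; the real tarCreateHeader() goes on to fill in the rest of the 512-byte ustar header): the ustar name and linkname fields are 100 bytes wide, and since the code writes them with sprintf, which appends a NUL, only 99 bytes are usable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum tarError
{
    TAR_OK = 0,
    TAR_NAME_TOO_LONG,
    TAR_SYMLINK_TOO_LONG
};

/*
 * Reject names that would be silently truncated when stored in the
 * 100-byte ustar name/linkname fields (99 usable bytes + NUL).
 * linktarget may be NULL for regular files.
 */
static enum tarError check_tar_names(const char *filename,
                                     const char *linktarget)
{
    if (strlen(filename) > 99)
        return TAR_NAME_TOO_LONG;
    if (linktarget && strlen(linktarget) > 99)
        return TAR_SYMLINK_TOO_LONG;
    return TAR_OK;
}
```

A 100-byte tablespace path, for example, now fails up front instead of producing a corrupt base backup.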
Re: [HACKERS] Sequence Access Method WIP
On 04/11/14 13:11, Heikki Linnakangas wrote: On 10/13/2014 01:01 PM, Petr Jelinek wrote: Hi, I rewrote the patch with a different API along the lines of what was discussed. Thanks, that's better. It would be good to see an alternative seqam implementing this API, to see how it really works. The local one is too much of a dummy to expose any possible issues. Yeah, I don't know what that would be; it's hard for me to find something that would serve purely demonstration purposes. I guess I could port BDR sequences to this if it would help (once we have a bit more solid agreement that the proposed API at least theoretically seems ok, so that I don't have to rewrite it 10 times, if at all possible). ... Only the alloc and reloptions methods are required (and implemented by the local AM). The caching, xlog writing, updating the page, etc. are handled by the backend; the AM does not see the tuple at all. I decided not to pass even the struct around and just pass the relevant options, because I think if we want to abstract the storage properly then the AM should not care about what pg_sequence looks like at all, even if it means that the sequence_alloc parameter list is a bit long. Hmm. The division of labour between the seqam and commands/sequence.c still feels a bit funny. sequence.c keeps track of how many values have been WAL-logged, and thus usable immediately, but we still call sequence_alloc even when using up those already WAL-logged values. If you think of using this for something like a centralized sequence server in a replication cluster, you certainly don't want to make a call to the remote server for every value - you'll want to cache them. With the local seqam, there are two levels of caching. Each backend caches some values (per the CACHE value option in CREATE SEQUENCE). In addition to that, the server WAL-logs 32 values at a time. If you have a remote seqam, it would most likely add a third cache, but it would interact in strange ways with the second cache.
Considering a non-local seqam, the locking is also a bit strange. The server keeps the sequence page locked throughout nextval(). But if the actual state of the sequence is maintained elsewhere, there's no need to serialize the calls to the remote allocator, i.e. the sequence_alloc() calls. I'm not exactly sure what to do about that. One option is to completely move the maintenance of the current value, i.e. sequence.last_value, to the seqam. That makes sense from an abstraction point of view. For example with a remote server managing the sequence, storing the current value in the local catalog table makes no sense as it's always going to be out-of-date. The local seqam would store it as part of the am-private data. However, you would need to move the responsibility of locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to provide an API that the seqam can call to do that. Perhaps just let the seqam call heap_inplace_update on the sequence relation. My idea of how this works is: sequence_next handles the allocation and the level-2 caching (the WAL-logged cache) via amdata if it supports it, or returns a single value if it doesn't - then the WAL will always just record the one value and there will basically be no level-2 cache, since it is sequence_next that controls how much will be WAL-logged; what the backend asks for is just a suggestion. The third-level caching, as you say, should be solved by the sequence_request_update and sequence_update callbacks - that will enable the sequence AM to handle this kind of caching asynchronously without blocking sequence_next unnecessarily. That way it's possible to implement many different strategies, from no cache at all to three levels of caching, with and without blocking of calls to sequence_next.
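As a concrete (and purely hypothetical - the API is still under discussion) illustration of the allocation-callback shape described above, the AM could return both the next value and the size of the reservation it wants WAL-logged, so that the AM, not the backend, controls the second-level cache:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative result of a sequence AM allocation call: the first value
 * to hand out plus how many consecutive values were reserved (i.e. how
 * much the backend may consume before calling back into the AM).
 * Names and fields are mine, not the proposed API.
 */
typedef struct SeqAllocResult
{
    int64_t value;              /* first value to hand to the caller */
    int64_t nvalues;            /* size of the reserved/WAL-logged batch */
} SeqAllocResult;

/* A "local"-style allocator that reserves log_cnt values per call. */
static SeqAllocResult local_seq_alloc(int64_t last_value, int64_t increment,
                                      int64_t log_cnt)
{
    SeqAllocResult r;
    r.value = last_value + increment;
    r.nvalues = log_cnt;
    return r;
}
```

A remote AM could instead return a batch fetched asynchronously from a central server, which is where the sequence_request_update/sequence_update callbacks would come in.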
For the amdata handling (which is the AM's private data variable) the API assumes that (Datum) 0 is NULL; this seems to work well for reloptions, so it should work here also, and it simplifies things a little compared to passing pointers to pointers around and making sure everything is allocated, etc. Sadly, the fact that amdata is not fixed-size and can be NULL made the page updates of the sequence relation quite a bit more complex than they used to be. It would be nice if the seqam could define exactly the columns it needs, with any datatypes. There would be a set of common attributes: sequence_name, start_value, cache_value, increment_by, max_value, min_value, is_cycled. The local seqam would add last_value, log_cnt and is_called to that. A remote seqam that calls out to some other server might store the remote server's hostname, etc. There could be a seqam function that returns a TupleDesc with the required columns, for example. Wouldn't that somewhat bloat the catalog if we had a new catalog table for each sequence AM? It also does not really solve the issue of amdata being dynamically sized. I think just a dynamic field where the AM stores whatever it wants is enough (i.e. the current solution), it's
Re: [HACKERS] tracking commit timestamps
On 04/11/14 09:05, Andres Freund wrote: On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote: Well, Michael has a point that the extradata is pretty much useless currently, perhaps it would help to add the interface to set extradata? Only accessible via C and useless aren't the same thing. But sure, add it. I actually meant a nicer C API - one that will make it possible to say "for this transaction, use this extradata" (or "for all transactions from now on done by this backend, use this extradata"), instead of the current API where you have to overwrite what RecordCommit already wrote. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
Michael Paquier wrote: On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Although I doubt the necessity of the flexibility given the current testing framework, I don't have a strong objection to it. Nevertheless, it would perhaps be appreciated if you put a notice in the README or somewhere. Hm, well... Fine, I added it in this updated series. FWIW I gave this a trial run and found I needed some tweaks to test.sh and the Makefile in order to make it work on VPATH; mainly replace ./ with `dirname $0` in test.sh in a couple of places, and something similar in the Makefile. Also you have $PG_ROOT_DIR somewhere, which doesn't work. Also you have the Makefile checking for -DBUFFER_CAPTURE exactly, but for some reason I used -DBUFFER_CAPTURE=1, which wasn't well received by your $(filter) stuff. Instead of checking CFLAGS it might make more sense to expose it as a read-only GUC and grep `postmaster -C buffer_capture` or similar. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 04/11/14 09:19, Michael Paquier wrote: On Sat, Nov 1, 2014 at 10:00 PM, Michael Paquier michael.paqu...@gmail.com mailto:michael.paqu...@gmail.com wrote: More comments: Here are also more comments about the code that I found while focusing on committs.c: 1) When the GUC is not enabled, and when the transaction ID provided is not in a correct range, a dummy value based on InvalidTransactionId (?!) is returned, like that: + /* Return empty if module not enabled */ + if (!commit_ts_enabled) + { + if (ts) + *ts = InvalidTransactionId; + if (data) + *data = (CommitExtraData) 0; + return; + } This leads to some incorrect results: =# select pg_get_transaction_committime('1'); pg_get_transaction_committime --- 2000-01-01 09:00:00+09 (1 row) Oh, I had this on my TODO and somehow forgot about it, I am leaning towards throwing an error when calling any committs get function while the tracking is disabled. Or for example: =# SELECT txid_current(); txid_current -- 1006 (1 row) =# select pg_get_transaction_committime('1006'); pg_get_transaction_committime --- 2014-11-04 14:54:37.589381+09 (1 row) =# select pg_get_transaction_committime('1007'); pg_get_transaction_committime --- 2000-01-01 09:00:00+09 (1 row) =# SELECT txid_current(); txid_current -- 1007 (1 row) And at other times error is not that helpful for user: =# select pg_get_transaction_committime('1'); ERROR: XX000: could not access status of transaction 1 DETAIL: Could not read from file pg_committs/ at offset 114688: Undefined error: 0. LOCATION: SlruReportIOError, slru.c:896 I think that you should simply return an error in TransactionIdGetCommitTsData when xid is not in the range supported. That will be more transparent for the user. Agreed. 5) Reading the code, TransactionTreeSetCommitTimestamp is always called with do_xlog = false, making that actually no timestamps are WAL'd... Hence WriteSetTimestampXlogRec is just dead code with the latest version of the patch. 
IMO, this should be always WAL-logged when track_commit_timestamp is on. As Andres explained, this is always WAL-logged when called by the current caller, so we don't want it to be double-logged; that's why do_xlog = false. But when an extension needs to call it, it will most likely need do_xlog = true. 8) The redo and xlog routines of committs should be out in a dedicated file, like committsxlog.c or similar. The other rmgrs do so, so let's be consistent. Most actually don't AFAICS. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
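The behaviour Petr leans towards earlier in this message - raising an error instead of returning a dummy epoch timestamp when tracking is disabled or the xid is out of range - can be sketched as a plain guard (names and the return-code style are illustrative only; the real code would use ereport(ERROR, ...) and read the commit-timestamp SLRU):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { TS_OK, TS_ERR_DISABLED, TS_ERR_RANGE } TsStatus;

/*
 * Hypothetical guard for a commit-timestamp lookup: refuse outright when
 * the module is disabled, and refuse xids outside the tracked window
 * [oldest_xid, next_xid), instead of handing back a dummy value.
 */
static TsStatus check_commit_ts_lookup(bool track_enabled, uint32_t xid,
                                       uint32_t oldest_xid, uint32_t next_xid)
{
    if (!track_enabled)
        return TS_ERR_DISABLED; /* instead of returning the epoch */
    if (xid < oldest_xid || xid >= next_xid)
        return TS_ERR_RANGE;    /* instead of a bogus timestamp or SLRU I/O error */
    return TS_OK;               /* real code would now read the SLRU page */
}
```

This would turn both the misleading "2000-01-01 09:00:00+09" result and the raw SLRU read error shown above into clean, user-facing errors.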
Re: [HACKERS] tracking commit timestamps
On 11/3/14 5:17 PM, Petr Jelinek wrote: Please don't name anything committs. That looks like a misspelling of something. There is nothing wrong with pg_get_transaction_commit_timestamp() If you want to reduce the length, lose the get. I am fine with that, I only wonder if your definition of anything only concerns the SQL interfaces or also the internals. I'd be fine with commit_ts for internals, but not committs. One day, you'll need a function or data structure that works with multiple of these, and then you'll really be in naming trouble. ;-) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 04/11/14 09:25, Michael Paquier wrote: On Tue, Nov 4, 2014 at 5:05 PM, Andres Freund and...@2ndquadrant.com mailto:and...@2ndquadrant.com wrote: On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote: Well, Michael has a point that the extradata is pretty much useless currently, perhaps it would help to add the interface to set extradata? Only accessible via C and useless aren't the same thing. But sure, add it. I'm still on a -1 for that. You mentioned that there is perhaps no reason to delay a decision on this matter, but IMO there is no reason to rush either in doing something we may regret. And I am not the only one on this thread expressing concern about this extra data thingy. If this extra data field is going to be used to identify from which node a commit comes from, then it is another feature than what is written on the subject of this thread. In this case let's discuss it in the thread dedicated to replication identifiers, or come up with an extra patch once the feature for commit timestamps is done. The issue as I see it is that both of those features are closely related, and just one without the other has very limited use. What I learned from working on UDR is that for conflict handling, I was actually missing the extradata more than the timestamp itself - in other words I have an extension for 9.4 where I have use for this API already, so the argument about dead code or forks or whatever does not really hold. The other problem is that if we add extradata later we will either break upgradeability or will have to write essentially the same code again, which will store just the extradata instead of the timestamp; I don't really like either of those options to be honest. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
On 11/4/14 3:21 PM, Alvaro Herrera wrote: FWIW I gave this a trial run and found I needed some tweaks to test.sh and the Makefile in order to make it work on VPATH; mainly replace ./ with `dirname $0` in a couple test.sh in a couple of places, and something similar in the Makefile. Also you have $PG_ROOT_DIR somewhere which doesn't work. I also saw some bashisms in the script. Maybe the time for shell-based test scripts has passed? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 04/11/14 22:20, Peter Eisentraut wrote: On 11/3/14 5:17 PM, Petr Jelinek wrote: Please don't name anything committs. That looks like a misspelling of something. There is nothing wrong with pg_get_transaction_commit_timestamp() If you want to reduce the length, lose the get. I am fine with that, I only wonder if your definition of anything only concerns the SQL interfaces or also the internals. I'd be fine with commit_ts for internals, but not committs. One day, you'll need a function or data structure that works with multiple of these, and then you'll really be in naming trouble. ;-) Hmm, we use CommitTs in interfaces that use CamelCase naming, so I guess commit_ts is indeed a natural expansion of that. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivelog completion command
On 11/2/14 8:26 AM, Magnus Hagander wrote: The idea is to have pg_receivexlog fire off an external command at the end of each segment - for example a command to gzip the file, or to archive it off into a Magic Cloud (TM) or something like that. A simple facility to allow gzipping after the file is complete might be OK, but the cloud use case is probably too abstract to be useful. I'd rather write my own consumer for that, or go back to archive_command, which has the queuing logic built in already. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [GENERAL] Re: [BUGS] [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
Andres Freund and...@2ndquadrant.com writes: On 2014-11-04 13:51:23 -0500, Tom Lane wrote: Not sure. The OP's point is that in a SELECT, you do get unsurprising results, because a SELECT will acquire its execution snapshot after it's gotten AccessShareLock on the table. Arguably COPY should behave likewise. Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably already acts like he wants, so why isn't plain COPY equivalent to that? Even a plain SELECT essentially acts that way, if I recall correctly, if you use REPEATABLE READ+ and force a snapshot to be acquired beforehand. It's imo not very surprising. "It doesn't fail in a non-default isolation mode" is hardly much of an argument for this being okay in READ COMMITTED. All ALTER TABLE rewrites just disregard visibility of existing tuples. Only CLUSTER/VACUUM FULL jump through the hoops needed to keep all the necessary tuples + ctid chains around. Yeah, and I think that it's entirely reasonable for rewriting ALTER TABLEs to update the xmin of the rewritten tuples; after all, the output data could be arbitrarily different from what the previous transactions put into the table. But that is not the question here. If the COPY blocks until the ALTER completes --- as it must --- why is its execution snapshot not taken *after* the lock is acquired? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [GENERAL] Re: [BUGS] [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
--On 4. November 2014 17:18:14 -0500 Tom Lane t...@sss.pgh.pa.us wrote: Yeah, and I think that it's entirely reasonable for rewriting ALTER TABLEs to update the xmin of the rewritten tuples; after all, the output data could be arbitrarily different from what the previous transactions put into the table. But that is not the question here. If the COPY blocks until the ALTER completes --- as it must --- why is its execution snapshot not taken *after* the lock is acquired? COPY waits in DoCopy() but its snapshot gets acquired in PortalRunUtility() earlier. SELECT has its lock already during the transform/analyse phase and its snapshot is taken much later. Looks like we need something that analyses a utility statement to get its lock before executing. -- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
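Bernd's ordering observation reduces to a simple timeline argument (a toy sketch, not the executor's actual code paths): whether the reader sees the rewritten tuples depends only on whether its snapshot was taken before or after the rewriting transaction committed - and COPY currently takes it in PortalRunUtility(), before DoCopy() blocks on the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy timeline model: an ALTER TABLE rewrite commits at time
 * rewrite_commit_time, restamping all tuples with that commit's XID.
 * A reader's snapshot only sees tuples committed strictly before the
 * snapshot was taken.
 */
static bool reader_sees_rewritten_tuples(uint32_t snapshot_time,
                                         uint32_t rewrite_commit_time)
{
    return rewrite_commit_time < snapshot_time;
}
```

With the rewrite committing at time 20: a COPY whose snapshot was taken at time 10 (before waiting on the lock) gets an empty result, while a reader that takes its snapshot at time 30 (after the lock is granted, hence after the commit) sees the table, which is the SELECT behaviour Tom describes.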
Re: [HACKERS] [BUGS] ltree::text not immutable?
I wrote: An alternative that just occurred to me is to put the no-volatile-I/O-functions check into CREATE TYPE, but make it just WARNING not ERROR. That would be nearly as good as an ERROR in terms of nudging people who'd accidentally omitted a volatility marking from their custom types. But we could leave chkpass as-is and wait to see if anyone complains "hey, why am I getting this warning?". If we don't hear anyone complaining, maybe that means we can get away with changing the type's behavior in 9.6 or later. Attached is a complete patch along these lines. As I suggested earlier, this just makes the relevant changes in ltree--1.0.sql and pg_trgm--1.1.sql without bumping their extension version numbers, since it doesn't seem important enough to justify a version bump. I propose that we could back-patch the immutability additions in ltree and pg_trgm, since they won't hurt anything and might make life a little easier for future adopters of those modules. The WARNING additions should only go into HEAD though. regards, tom lane diff --git a/contrib/ltree/ltree--1.0.sql b/contrib/ltree/ltree--1.0.sql index 5a2f375..7d55fc6 100644 *** a/contrib/ltree/ltree--1.0.sql --- b/contrib/ltree/ltree--1.0.sql *** *** 6,17 CREATE FUNCTION ltree_in(cstring) RETURNS ltree AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE FUNCTION ltree_out(ltree) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE TYPE ltree ( INTERNALLENGTH = -1, --- 6,17 CREATE FUNCTION ltree_in(cstring) RETURNS ltree AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE FUNCTION ltree_out(ltree) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE TYPE ltree ( INTERNALLENGTH = -1, *** CREATE OPERATOR CLASS ltree_ops *** 303,314 CREATE FUNCTION lquery_in(cstring) RETURNS lquery AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE FUNCTION lquery_out(lquery) RETURNS cstring AS 'MODULE_PATHNAME' !
LANGUAGE C STRICT; CREATE TYPE lquery ( INTERNALLENGTH = -1, --- 303,314 CREATE FUNCTION lquery_in(cstring) RETURNS lquery AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE FUNCTION lquery_out(lquery) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE TYPE lquery ( INTERNALLENGTH = -1, *** CREATE OPERATOR ^? ( *** 414,425 CREATE FUNCTION ltxtq_in(cstring) RETURNS ltxtquery AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE FUNCTION ltxtq_out(ltxtquery) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE TYPE ltxtquery ( INTERNALLENGTH = -1, --- 414,425 CREATE FUNCTION ltxtq_in(cstring) RETURNS ltxtquery AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE FUNCTION ltxtq_out(ltxtquery) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE TYPE ltxtquery ( INTERNALLENGTH = -1, *** CREATE OPERATOR ^@ ( *** 481,492 CREATE FUNCTION ltree_gist_in(cstring) RETURNS ltree_gist AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE FUNCTION ltree_gist_out(ltree_gist) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE TYPE ltree_gist ( internallength = -1, --- 481,492 CREATE FUNCTION ltree_gist_in(cstring) RETURNS ltree_gist AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE FUNCTION ltree_gist_out(ltree_gist) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE TYPE ltree_gist ( internallength = -1, diff --git a/contrib/pg_trgm/pg_trgm--1.1.sql b/contrib/pg_trgm/pg_trgm--1.1.sql index 1fff7af..34b37e4 100644 *** a/contrib/pg_trgm/pg_trgm--1.1.sql --- b/contrib/pg_trgm/pg_trgm--1.1.sql *** CREATE OPERATOR - ( *** 53,64 CREATE FUNCTION gtrgm_in(cstring) RETURNS gtrgm AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE FUNCTION gtrgm_out(gtrgm) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE TYPE gtrgm ( INTERNALLENGTH = -1, --- 53,64 CREATE FUNCTION gtrgm_in(cstring) RETURNS gtrgm AS 'MODULE_PATHNAME' ! 
LANGUAGE C STRICT IMMUTABLE; CREATE FUNCTION gtrgm_out(gtrgm) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE TYPE gtrgm ( INTERNALLENGTH = -1, diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 55a6881..cbb65f8 100644 *** a/src/backend/commands/typecmds.c --- b/src/backend/commands/typecmds.c *** DefineType(List *names, List *parameters *** 547,552 --- 547,598 NameListToString(analyzeName)); #endif + /* + * Print warnings if any of the type's I/O functions are marked volatile. + * There is a general assumption that I/O functions are stable or + * immutable; this allows us for example to mark record_in/record_out + * stable
[HACKERS] to_char_at_timezone()?
Hi,

9.4 FINALLY added the UTC offset formatting pattern to to_char(). However, it falls a bit short in the sense that the offset is always computed according to the effective TimeZone value. This has a few issues as far as I can tell:

1) It's not truly controlled by the query which produces the timestamptz values, e.g. in the case of functions
2) Having to SET LOCAL before a query is quite ugly
3) It supports only a single TimeZone value per query

So I got into thinking whether it would make sense to provide a new function, say, to_char_at_timezone(), to solve this problem. For example:

local:marko=#* select now();
              now
-------------------------------
 2014-11-05 00:43:47.954662+01
(1 row)

local:marko=#* select to_char_at_timezone(now(), 'YYYY-MM-DD HH24:MI:SSOF', 'Etc/UTC');
  to_char_at_timezone
------------------------
 2014-11-04 23:43:47+00
(1 row)

local:marko=#* select to_char_at_timezone(now(), 'YYYY-MM-DD HH24:MI:SSOF', 'America/Los_Angeles');
  to_char_at_timezone
------------------------
 2014-11-04 15:43:47-08
(1 row)

Any thoughts? The patch is quite trivial.

.marko
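The semantics being proposed — rendering one absolute instant at several explicitly named zones, without touching any session-level TimeZone setting — can be illustrated outside PostgreSQL. A rough Python analogy (to_char_at_timezone is only a proposal here, so this is not its implementation):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One absolute instant, matching the email's example (23:43:47 UTC).
instant = datetime(2014, 11, 4, 23, 43, 47, tzinfo=timezone.utc)

# Render it at per-call zones; no session-wide setting is involved.
for zone in ("Etc/UTC", "America/Los_Angeles"):
    print(instant.astimezone(ZoneInfo(zone)).strftime("%Y-%m-%d %H:%M:%S%z"))
# prints 2014-11-04 23:43:47+0000
#        2014-11-04 15:43:47-0800
```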
Re: [HACKERS] tracking commit timestamps
On Tue, Nov 4, 2014 at 10:01 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: I'm still on a -1 for that. You mentioned that there is perhaps no reason to delay a decision on this matter, but IMO there is no reason to rush either in doing something we may regret. And I am not the only one on this thread expressing concern about this extra data thingy. If this extra data field is going to be used to identify which node a commit came from, then it is a different feature from what is written in the subject of this thread. In that case let's discuss it in the thread dedicated to replication identifiers, or come up with an extra patch once the feature for commit timestamps is done. Introducing the extra data field in a later patch would mean an on-disk representation change, i.e. pg_upgrade trouble. Then why 4 bytes specifically for the extra field? Why not 8 or 16? -- Michael
Re: [HACKERS] to_char_at_timezone()?
Marko Tiikkaja ma...@joh.to writes: So I got into thinking whether it would make sense to provide a new function, say, to_char_at_timezone() to solve this problem. For example: ... Any thoughts? The patch is quite trivial. I'm not convinced that it's all that trivial. Is the input timestamp or timestamptz, and what's the semantics of that exactly (ie what timezone rotation happens)? One's first instinct is often wrong in this area. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] to_char_at_timezone()?
On 11/5/14, 12:59 AM, Tom Lane wrote: Marko Tiikkaja ma...@joh.to writes: So I got into thinking whether it would make sense to provide a new function, say, to_char_at_timezone() to solve this problem. For example: ... Any thoughts? The patch is quite trivial. I'm not convinced that it's all that trivial. Is the input timestamp or timestamptz, and what's the semantics of that exactly (ie what timezone rotation happens)? One's first instinct is often wrong in this area. In my example, the input is a timestamptz, and the output is converted to the target time zone the same way timestamptz_out() does, except based on the input timezone instead of TimeZone. Not sure whether it would make sense to do this for timestamp, or whether there's even a clear intuitive behaviour there. .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 04/11/14 09:07, Craig Ringer wrote: On 11/04/2014 07:31 AM, Álvaro Hernández Tortosa wrote: Thank you for your comment, Tom. However I think this behavior, as seen from a user perspective, it's not the expected one. That may be the case, but I think it's the SQL-standard behaviour, so we can't really mess with it. The spec requires SET TRANSACTION ISOLATION, and you can't implement that if you take a snapshot at BEGIN. It's true that the standard mandates SET TRANSACTION rather than setting the isolation level with the BEGIN statement, and in any case you can raise/lower the isolation level with SET regardless of what the session or the begin command said. However, is it really a problem taking a snapshot at BEGIN time --only if the tx is started with BEGIN ... (REPEATABLE READ | SERIALIZABLE)? AFAIK, and I may be missing some internal details here, the worst that can happen is that you took one extra, unnecessary snapshot. I don't see that as a huge problem. The standard (92) says that transaction is initiated when a transaction-initiating SQL-statement is executed. To be fair, that sounds to me more of a SELECT rather than a BEGIN, but I may be wrong. If it is still the intended behavior, I think it should be clearly documented as such, and a recommendation similar to issue a 'SELECT 1' right after BEGIN to freeze the data before any own query or similar comment should be added. Again, as I said in my email, the documentation clearly says that only sees data committed before the transaction began. And this is clearly not the real behavior. It's more of a difference in when the transaction begins. Arguably, BEGIN says I intend to begin a new transaction with the next query rather than immediately begin executing a new transaction. This concept could be clearer in the docs. 
If this is really how it should behave, I'd +1000 to make it clearer in the docs, and to explicitly suggest that the user perform a query discarding the results early after BEGIN if they want the state frozen and time may elapse between BEGIN and the real queries to be executed (like doing a SELECT 1). Sure, there are, that was the link I pointed out, but I found no explicit mention of the issue that I'm raising here. I'm sure it's documented *somewhere*, in that I remember reading about this detail in the docs, but I can't find _where_ in the docs. It doesn't seem to be in: http://www.postgresql.org/docs/current/static/transaction-iso.html where I'd expect. Yepp, there's no mention there. In any case, we simply cannot take the snapshot at BEGIN time, because it's permitted to: SET TRANSACTION ISOLATION LEVEL READ COMMITTED; in a DB that has default serializable isolation or has a SET SESSION CHARACTERISTICS isolation mode of serializable. Note that SET TRANSACTION is SQL-standard. As I said, AFAIK it shouldn't matter a lot to take the snapshot at BEGIN. The worst that can happen is that you end up in read committed and you need to take more snapshots, one per query. AFAIK deferring the snapshot that's consistent with other RDBMSes that use snapshots, too. I tried Oracle and SQL Server. SQL Server seems to behave as PostgreSQL, but just because it locks the table if accessed in a serializable transaction, so it definitely waits until select to lock it. However, Oracle behaved as I expected: data is frozen at BEGIN time. I haven't tested others. The docs of that command allude to, but doesn't explicitly state, the behaviour you mention. http://www.postgresql.org/docs/current/static/sql-set-transaction.html Should we then improve the docs to state this more clearly? Any objection to do this?
Regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] tracking commit timestamps
On 2014-11-05 08:57:07 +0900, Michael Paquier wrote: On Tue, Nov 4, 2014 at 10:01 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: I'm still on a -1 for that. You mentioned that there is perhaps no reason to delay a decision on this matter, but IMO there is no reason to rush either in doing something we may regret. And I am not the only one on this thread expressing concern about this extra data thingy. If this extra data field is going to be used to identify which node a commit came from, then it is a different feature from what is written in the subject of this thread. In that case let's discuss it in the thread dedicated to replication identifiers, or come up with an extra patch once the feature for commit timestamps is done. Introducing the extra data field in a later patch would mean an on-disk representation change, i.e. pg_upgrade trouble. Then why 4 bytes specifically for the extra field? Why not 8 or 16? It's sufficiently long that you can build infrastructure for storing more transaction metadata on top. I could live with making it 8 bytes, but I don't see a clear advantage. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
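To put rough numbers on the 4-vs-8-byte question: assuming a per-transaction record holding an 8-byte commit timestamp plus the extra field (a deliberate simplification of the actual SLRU layout), the widths compare as follows.

```python
import struct

# Hypothetical packed per-transaction records: an 8-byte timestamp plus
# a 4-byte vs. an 8-byte extra field ('=' disables padding).
rec_4byte_extra = struct.calcsize("=qi")   # 12 bytes
rec_8byte_extra = struct.calcsize("=qq")   # 16 bytes

mb_per_million = lambda rec: rec * 1_000_000 / (1024 * 1024)
print(f"{mb_per_million(rec_4byte_extra):.1f} MB vs "
      f"{mb_per_million(rec_8byte_extra):.1f} MB per million xacts")
# prints 11.4 MB vs 15.3 MB per million xacts
```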
[HACKERS] Preferring MemSet or memset?
Hi all, MemSet, defined in src/include/c.h, is an internal macro that is faster than the system memset for zeroing small word-aligned structures. Both are being used here and there with no real preference. An example of that is that in many code paths we have nulls and values arrays used to build tuples, which are sometimes initialized with memset, other times with MemSet. Wouldn't we gain a bit of performance by switching the initializations of nulls and values currently done with memset over to MemSet? Opinions? Regards, -- Michael
Re: [HACKERS] Preferring MemSet or memset?
Michael Paquier michael.paqu...@gmail.com writes: MemSet is an internal macro faster than the system memset for zeroing small word-aligned structures defined in src/include/c.h. Both are being used here and there with no real preference. An example of that is that in many code paths we have for example nulls and values used to build tuples, that are sometimes initialized with memset, other times with MemSet. Wouldn't we gain a bit of performance by switching to MemSet the initializations of nulls and values currently done with memset? MemSet is *not* a win unless the target area is word-sized+word-aligned, so it'd be a pretty unlikely thing for it to win on zeroing bool arrays. In principle it could win for zeroing Datum arrays, but I don't think I want to read code that is zeroing a Datum array with MemSet and the adjacent bool array with memset; that's just weird, and it's unlikely that we get enough win out of it to justify the notational inconsistency. I keep expecting that we'll find that on modern platforms memset is a winner across the board. There is no reason that gcc couldn't optimize memset calls into code as good as or better than MemSet, and I have the impression that the compiler guys have actually put some effort into optimizing it lately. I'm not sure if anyone's recently redone the tests that showed MemSet was worthwhile ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Convert query plan to sql query
Hello, I would like to transform the query plan (the output of the planner, debug_print_plan) into an SQL query. I know that there are pieces of the query plan that might be machine dependent (in Var nodes, for example). So I wanted to have your suggestions or thoughts before I put effort into it. Basically, if I have: query1 -> parser -> rewriter -> planner, the process would be: query_plan -> planner -> parser -> query2. query1 and query2 are not necessarily the same due to rewriting, stats, etc. Thanks!
Re: [HACKERS] Add CREATE support to event triggers
On Fri, Oct 31, 2014 at 11:27 AM, Michael Paquier michael.paqu...@gmail.com wrote: So, what I think is missing is really a friendly interface to manipulate JsonbContainers directly, and I think that we are not far from it with something like this set, roughly:
- Initialization of an empty container
- Set of APIs to directly push a value to a container (boolean, array, null, string, numeric or other jsonb object)
- Initialization of JsonbValue objects
Here are more thoughts along those lines, looking at the current state of patch 4, which introduces the infrastructure of the whole feature. This would make in-memory manipulation of jsonb containers possible without relying on a third-party set of APIs like what this patch is doing with ObjTree to deparse the DDL parse trees.
1) Set of constructor functions for JsonbValue: null, bool, string, array, JSONB object for nested values. Note that keys can be represented as Jsonb string objects.
2) Lookup functions for values in a JsonbContainer. Patch 4 is introducing that with find_string_in_jsonbcontainer and find_bool_in_jsonbcontainer. We may as well extend it to be able to look for another Jsonb object, for nested searches for example.
3) Functions to push a JsonbValue within a container, using a key and a value. This is where most of the work would be necessary, for bool, null, string, Jsonb object and numeric.
This infrastructure would allow in-memory manipulation of jsonb containers. Containers can then easily be converted back to strings, and used for value lookups by key. Regards, -- Michael
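As a shape-only analogy for the three API groups above (constructors, lookups, push), here is what the workflow looks like on a plain Python dict. The function names are invented and mirror only the outline of the proposed C interface, not any real jsonb API:

```python
# Invented names mirroring the proposed API groups; not real jsonb calls.
def jsonb_container():
    """Group 1: construct an empty container."""
    return {}

def jsonb_push(container, key, value):
    """Group 3: push a key/value pair (value may itself be a container)."""
    container[key] = value
    return container

def jsonb_find(container, key):
    """Group 2: look up a value (possibly a nested container) by key."""
    return container.get(key)

obj = jsonb_container()
jsonb_push(obj, "name", "pg_trgm")                                       # string value
jsonb_push(obj, "options", jsonb_push(jsonb_container(), "gin", True))   # nested object
print(jsonb_find(jsonb_find(obj, "options"), "gin"))  # True
```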
Re: [HACKERS] WAL replay bugs
Thanks for the tests. On Wed, Nov 5, 2014 at 5:21 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Although I doubt necessity of the flexibility seeing the current testing framework, I don't have so strong objection about that. Nevertheless, perhaps you are appreciated to put a notice on.. README or somewhere. Hm, well... Fine, I added it in this updated series. FWIW I gave this a trial run and found I needed some tweaks to test.sh and the Makefile in order to make it work on VPATH; mainly replace ./ with `dirname $0` in a couple test.sh in a couple of places, and something similar in the Makefile. Also you have $PG_ROOT_DIR somewhere which doesn't work. Ah thanks, forgot that. Also you have the Makefile checking for -DBUFFER_CAPTURE exactly but for some reason I used -DBUFFER_CAPTURE=1 which wasn't well received by your $(filter) stuff. Instead of checking CFLAGS it might make more sense to expose it as a read-only GUC and grep `postmaster -C buffer_capture` or similar. Yes that's a good idea. Now, do we really want this feature in-core? That's somewhat a duplicate of what is mentioned here: http://www.postgresql.org/message-id/CAB7nPqQMq=4ejak317mxz4has0i+1rslbqu29zx18jwlb2j...@mail.gmail.com Of course both things do not have the same coverage, as the former is for buildfarm and dev while the latter is dedicated to production systems, but could be used for development as well. The patch sent there is a bit outdated, but a potential implementation gets simpler with XLogReadBufferForRedo able to return flags about each block state during redo. I am still planning to come back to it for this cycle, though I stopped for now, waiting for the WAL format patches to finish shaping the APIs this feature would rely on. Regards, -- Michael
Re: [HACKERS] WAL replay bugs
On Wed, Nov 5, 2014 at 6:29 AM, Peter Eisentraut pete...@gmx.net wrote: On 11/4/14 3:21 PM, Alvaro Herrera wrote: FWIW I gave this a trial run and found I needed some tweaks to test.sh and the Makefile in order to make it work on VPATH; mainly replace ./ with `dirname $0` in a couple test.sh in a couple of places, and something similar in the Makefile. Also you have $PG_ROOT_DIR somewhere which doesn't work. I also saw some bashisms in the script. Maybe the time for shell-based test scripts has passed? Except pg_upgrade, are there other tests using bash? -- Michael
Re: [HACKERS] Convert query plan to sql query
Maybe you want to check how it's done in Postgres-XC. Postgres-XC works on plans created by PostgreSQL and reverse-engineers queries (for the parts of the plans which are shippable). The notion of shippability may not be of interest to you, but the code to reverse-engineer most of the plan nodes is there in Postgres-XC. On Wed, Nov 5, 2014 at 8:47 AM, mariem mariem.benfad...@gmail.com wrote: Hello, I would like to transform the query plan (output of the planner, debug_print_plan) into an sql query. I know that there are pieces of the query plan that might be machine dependent (in var for example). So I wanted to have your suggestions or thoughts before I put efforts into it. Basically, if I have: query1 -> parser -> rewriter -> planner, the process would be: query_plan -> planner -> parser -> query2. query1 and query2 are not necessarily the same due to rewrite, stats.. Thanks! -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On Tue, Nov 04, 2014 at 08:30:21AM +, Laurenz Albe wrote: David Fetter wrote: On Tue, Nov 04, 2014 at 07:51:06AM +0900, Tatsuo Ishii wrote: Just out of curiosity, why is Oracle's NUMBER (I assume you are talking about this) so fast? I suspect that what happens is that NUMBER is stored as a native type (int2, int4, int8, int16) that depends on its size and then cast to the next upward thing as needed, taking any performance hits at that point. The documentation hints (38 decimal places) at a 128-bit internal representation as the maximum. I don't know what happens when you get past what 128 bits can represent. No, Oracle stores NUMBERs as variable length field (up to 22 bytes), where the first byte encodes the sign and the comma position and the remaining bytes encode the digits, each byte representing two digits in base-100 notation (see Oracle Metalink note 1007641.6). Thanks for clearing that up, and sorry for spreading misinformed guesses. So it's not so different from PostgreSQL. No idea why their arithmetic should be faster. I have an idea, but this time, I think it's right. They have at least one team of people whose job it is to make sure that it is fast. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 11/3/14, 2:36 PM, Alvaro Herrera wrote: Jim Nasby wrote: On 11/1/14, 8:41 AM, Petr Jelinek wrote: Well this is not BDR specific thing, the idea is that with logical replication, commit timestamp is not enough for conflict handling, you also need to have additional info in order to identify some types of conflicts conflicts (local update vs remote update for example). So the extradata field was meant as something that could be used to add the additional info to the xid. Related to this... is there any way to deal with 2 transactions that commit in the same microsecond? It seems silly to try and handle that for every commit since it should be quite rare, but perhaps we could store the LSN as extradata if we detect a conflict? Well, two things. One, LSN is 8 bytes and extradata (at least in this patch when I last saw it) is only 4 bytes. But secondly and more important is that detecting a conflict is done in node B *after* node A has recorded the transaction's commit time; there is no way to know at commit time that there is going to be a conflict caused by that transaction in the future. (If there was a way to tell, you could just as well not commit the transaction in the first place.) I'm worried about 2 commits in the same microsecond on the same system, not on 2 different systems. Or, put another way, if we're going to expose this I think it should also provide a guaranteed unique commit ordering for a single cluster. Presumably, this shouldn't be that hard since we do know the exact order in which things committed. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
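The "guaranteed unique commit ordering" idea can be sketched quickly: microsecond timestamps alone can collide, but sorting by (timestamp, LSN) — or any strictly increasing per-cluster sequence as the second key — yields a total order. The LSN values below are made up for illustration:

```python
# Two commits in the same microsecond; the LSN (made-up values)
# breaks the tie, giving a unique commit ordering for the cluster.
commits = [
    {"xid": 101, "commit_ts_us": 1_415_142_227_954_662, "lsn": 0x16B3D88},
    {"xid": 102, "commit_ts_us": 1_415_142_227_954_662, "lsn": 0x16B3C10},
]

# The timestamp key alone cannot distinguish the two transactions;
# the (timestamp, LSN) pair can.
by_ts_lsn = sorted(commits, key=lambda c: (c["commit_ts_us"], c["lsn"]))
print([c["xid"] for c in by_ts_lsn])  # [102, 101]
```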
Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations
On Wed, Oct 22, 2014 at 11:32:41AM -0400, Tom Lane wrote: I don't have a strong opinion about which of the above things to do ... what's your preference? I think it's better for the future if we make a clean cut. Yes, the option keeps compatibility a little bit higher, but that doesn't matter much as the code won't be used by the server anyway. Besides, it may eventually change and then we have to make sure to remember ecpg's copy again. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at gmail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] WAL format and API changes (9.5)
On Tue, Nov 4, 2014 at 10:03 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 10/30/2014 06:02 PM, Andres Freund wrote: On 2014-10-29 10:24:20 +0200, Heikki Linnakangas wrote: On 10/06/2014 02:29 PM, Andres Freund wrote: On 2014-10-06 14:19:39 +0300, Heikki Linnakangas wrote: Barring objections, I'll commit this, and then continue benchmarking the second patch with the WAL format and API changes. I'd like to have a look at it beforehand. Ping? Here's a rebased patch. I'd like to proceed with this. Doing so. Here's a new version of this refactoring patch. It fixes all the concrete issues you pointed out. I've not yet really looked, but on a quick readthrough XLogInsertRecData() staying in xlog.c doesn't make me happy... Ok.. Can you elaborate? To me the split between xloginsert.c doing some of the record assembly, and xlog.c doing the lower level part of the assembly is just wrong. I moved the checks for bootstrap mode and xl_len == 0, from the current XLogInsert to the new XLogInsert() function. I don't imagine that to be enough to address your feelings about the split between xlog.c and xloginsert.c, but it makes it a tiny bit clearer IMHO. I don't know what else to do about it, as it feels very sensible to me as it is now. So unless you object more loudly, I'm going to just go ahead with this and commit, probably tomorrow. A few observations while reading the latest patch:

1.
+XLogRecPtr
+XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
 {
..
+	/* info's high bits are reserved for use by me */
+	if (info & XLR_INFO_MASK)
+		elog(PANIC, "invalid xlog info mask %02X", info);
..
 }

Earlier, before this check, we used to check XLogInsertAllowed(), which is moved to XLogInsertRecord(); isn't it better to keep that check at the beginning of the function XLogInsert()?

2.
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
{
..
	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
	{
		/*
		 * Oops, some buffer now needs to be backed up that the caller
		 * didn't back up.  Start over.
		 */
		WALInsertLockRelease();
		END_CRIT_SECTION();
		return InvalidXLogRecPtr;
	}
..
}

IIUC, there can be 4 states for doPageWrites w.r.t. when the record is getting assembled (XLogRecordAssemble) and just before the actual insert (XLogInsertRecord). I think the behaviour for the state when doPageWrites is true during XLogRecordAssemble and false during XLogInsertRecord (just before the actual insert) is different as compared to HEAD. In the patch, it will not retry if doPageWrites is false when we are about to insert even though fpw_lsn <= RedoRecPtr, whereas in HEAD it will detect this during assembly of the record and retry; isn't this a problem?

3. There are a couple of places where *XLogInsert* is used in wal.sgml, and it seems to me some of those need changes w.r.t. this patch.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
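A toy model (invented names, not PostgreSQL code) of the assemble/insert retry contract discussed above: the caller keeps reassembling the record whenever the insert step reports that its full-page-write decisions are stale, mirroring the InvalidXLogRecPtr return in the patch.

```python
INVALID_LSN = 0

def insert_record(fpw_lsn, redo_ptr, do_page_writes):
    # Mirrors the check quoted above: a buffer whose newest full-page
    # image predates the redo pointer must be re-imaged, so reject.
    if fpw_lsn != INVALID_LSN and fpw_lsn <= redo_ptr and do_page_writes:
        return None                 # "start over"
    return 1234                     # pretend end-LSN of the inserted record

def xlog_insert(assemble, redo_ptr, do_page_writes):
    while True:
        fpw_lsn = assemble()        # stands in for XLogRecordAssemble
        lsn = insert_record(fpw_lsn, redo_ptr, do_page_writes)
        if lsn is not None:
            return lsn

# First assembly skipped a needed full-page image (fpw_lsn 100 <= redo
# pointer 150); the second assembly includes it, reports no fpw_lsn,
# and the insert succeeds.
attempts = iter([100, INVALID_LSN])
print(xlog_insert(lambda: next(attempts), redo_ptr=150, do_page_writes=True))  # 1234
```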
Re: [HACKERS] Convert query plan to sql query
mariem mariem.benfad...@gmail.com wrote: Hello, I would like to transform the query plan (output of the planner, debug_print_plan) into an sql query. I don't think SQL can express the information the plan contains. For example, join methods (hash, nest loop, merge). -- Antonin Houska Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
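To make both the idea and Antonin's objection concrete, here is a toy, purely illustrative deparser over an invented plan-shaped tree. Real plan trees carry far more state (Vars, RTEs, qual expressions), and none of these node shapes exist in PostgreSQL; they are made up for the example.

```python
# Toy "deparse a plan back to SQL" sketch; node shapes are invented.
def deparse(node):
    kind = node["kind"]
    if kind == "SeqScan":
        return node["relation"]
    if kind == "Join":
        # Note: the join *method* (hash, merge, nestloop) has no SQL
        # spelling, which is one reason the conversion is lossy.
        left = deparse(node["outer"])
        right = deparse(node["inner"])
        return f"{left} JOIN {right} ON {node['cond']}"
    raise ValueError(f"unhandled node: {kind}")

plan = {
    "kind": "Join",
    "cond": "a.id = b.a_id",
    "outer": {"kind": "SeqScan", "relation": "a"},
    "inner": {"kind": "SeqScan", "relation": "b"},
}
sql = "SELECT * FROM " + deparse(plan)
print(sql)  # SELECT * FROM a JOIN b ON a.id = b.a_id
```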