Re: [HACKERS] matview incremental maintenance
On 2013-06-17 16:41 +02:00, Kevin Grittner wrote: Since there seems to be interest in discussing incremental maintenance of materialized views *now*, I'm starting this thread to try to avoid polluting unrelated threads with the discussion. I don't intend to spend a lot of time on it until the CF in progress completes, but at that point the work will start in earnest. So I'll say where I'm at, and welcome anyone who has time to spare outside of the CF to comment or contribute ideas. The paper at the core of the discussion can be found by searching for "maintaining views incrementally gupta mumick subrahmanian" -- it's on both the ACM and CiteSeerX websites. Of course, one i.e. Ashish Gupta, Inderpal Singh Mumick, and V. S. Subrahmanian. 1993. Maintaining views incrementally. In Proceedings of the 1993 ACM SIGMOD international conference on Management of data (SIGMOD '93), Peter Buneman and Sushil Jajodia (Eds.). ACM, New York, NY, USA, 157-166. DOI=10.1145/170035.170066 http://doi.acm.org/10.1145/170035.170066 just in case a direct reference might come in handy :-) ... All the ebst, Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hard to Use WAS: Hard limit on WAL space
On 2013-06-15 08:44 CEST, Brendan Jurd wrote: On 15 June 2013 16:18, Craig Ringer ... wrote: On 06/15/2013 02:08 PM, Brendan Jurd wrote: On 15 June 2013 14:43, Craig Ringer ... wrote: The #1 question I see on Stack Overflow has to be confusion about pg_hba.conf, mostly from people who have no idea it exists, don't understand how to configure it, etc. The totally non-obvious name of the file probably has something to do with that. It should be called 'auth.conf'. Not convinced; since it only controls one facet of auth - it doesn't define users, passwords, grants, etc ... When somebody is setting up postgres for the first time, and they list the contents of the config directory, you want them to have some idea as they may not have read up to section 19.1 The pg_hba.conf File inside chapter 19 Client Authentication of part III. Server Administration :-?, which states (as of 9.2.4): """ Client authentication is controlled by a configuration file, which traditionally is named pg_hba.conf and is stored in the database cluster's data directory. (HBA stands for host-based authentication.) A default pg_hba.conf file is installed when the data directory is initialized by initdb. It is possible to place the authentication configuration file elsewhere, however; see the hba_file configuration parameter. ... """ ;-) thanks to hyperlinks this is quite close to the start, but I was surprised to not find it by skimming the text and following the hyperlinks but by knowing the filename instead and entering it ("pg_hba.conf") into the Search Documentation text field on the top right corner of http://www.postgresql.org/docs/9.2/interactive/index.html. Maybe we could find a better place of the whatever-then-name inside the part of the docs even the "TL;DR" mood people might read? A paragraph or two spiced up with some catchy StackOverflow-inspired terms people with a need to configure this authentication aspect might have expected could also be expected in INSTALL like docs or directly observable on the hyperlinked way from part I. Tutorial chapter 1 Getting Started section 1.1 Installation all down to chapter 15. Installation from Source Code. But of course only, if this is "wanted behavior". If I read the section 1.1 Installation (again 9.2.4) I have the impression, that it more transports the message in our case, that "you are the site admin, deal with it, read the docs", or don't I read it right? (I am a non-native English reader) what each of the files is for. If they see something called 'auth.conf', they'll get the right general idea. An understanding of the nuances (like that it doesn't control user accounts) will come once they open up the file -- which they may well do, because it is called 'auth.conf', and 'auth' is a thing you want to configure. that may well be, I do not know, how people that prefer reading folder and filenames over manuals written for them grok text, as I read the docs, promised ;-) If they see something called 'pg_hba.conf', they may very reasonably assume that it is some internal/advanced stuff that they don't need to worry about just yet, because what the heck is a 'pg_hba'? The 'pg' is unnecessary and the 'hba' is an internal jargon term that we've ill-advisedly allowed to leak out into the filename. at around 1995 when I started using Postgres95 it sure took some time to find that pg_hba.conf file, but I then perceived it to be very well documented, and also felt a bit guilty, as it's name occured in the INSTALL file cf. ftp://ftp-archives.postgresql.org/pub/source/v7.2/postgresql-7.2.tar.gz and the INSTALL file. Therein "burried" inside Step 1 of "If You Are Upgrading" ... If you really feel that 'auth.conf' is too imprecise, maybe something like 'conn-auth.conf' would be more your style. I think you guys did and still do a fantastic job with PostgreSQL and eps. it's documentation, but in this case I doubt, that any renaming of config files will really have an impact on usability in the shady area of "TL;DR" - at least for the next twenty years or so - as it still holds, that from a false start (eg. not reading documentation written) anything may follow. But as usability is a practical concern I (as a user) would be +0 on renaming it if people not finding it bearing the old name, but then editing it is really wanted behavior. All the best, Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On 2013-06-11 19:45 CEST, Greg Smith wrote: On 6/11/13 12:22 PM, Merlin Moncure wrote: Personally I think this patch should go in regardless -- the concerns made IMNSHO are specious. That's nice, but we have this process for validating whether features go in or not that relies on review instead of opinions. ;-) that's why I played with the test_fallocate.c, as it was easy to do and I understood, the author (of the patch) wanted to trigger some reviews ... I do not (yet) know anything about the core codes, so I leave this to the hackers. My review result was, that with newer gcc's you should specify an open mode flag as third argument of the fopen call, as only with the test tool nothing important found. Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO ... RETURNING
On 2013-06-11 19:01 CEST, Hannu Krosing wrote: On 06/11/2013 05:27 PM, Merlin Moncure wrote: On Tue, Jun 11, 2013 at 9:45 AM, Stephen Frost ... wrote: * Merlin Moncure ... wrote: I agree with all your comments pretty much down the line. Need top level CALL that supports parameterization and multiple sets that utilizes background worker (we have example spi worker that gives some hints about how pl/pgsql could be made to work). Because it's top level (can't even be inlined to CTE), we can access behaviors that are not possible in current pl/pgsql, for example setting transaction isolation in advance of snapshot and changing database connection mid-procedure. And this still has next-to-nothing to do with the specific proposal that was put forward. It's a complete feature but completely relevant to the discussion -- the behaviors have a lot of overlap and CALL is in the standard whereas the ad hoc feature DO isn't. Could you point to the ISO/ANSI SQL CALL definition ? On the publicly available standards page of ISO: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html searching in it for SQL shows link of "ISO/IEC 9075-1:2008": http://standards.iso.org/ittf/PubliclyAvailableStandards/c045498_ISO_IEC_9075-1_2008.zip when you follow and go through: http://standards.iso.org/ittf/licence.html i.e. accept the granted license, you may receive a personal single copy non-distributable PDF version of ISO/IEC 9075-1:2008, 3rd Edition, Information technology -- Database languages -- SQL -- Part 1: Framework (SQL/Framework), of COmmittee JTC1/SC32 There at least in section 5.3.4 you find the pointer, that among others, the terms CALL and RETURN are specified in ISO9075-2. So that "points", but to follow to the end ... ;-) but at least it is clear from this source, that CALL seems to be a statement in SQL to invoke a procedure or whatever name juggling suits. ... HTH, Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On 2013-11.06 17:28, Jon Nelson wrote: There hasn't been much activity here recently. I'm curious, then, if there are questions that I can answer. It may be useful to summarize some things here: - the purpose of the patch is to use posix_fallocate when creating new WAL files, because it's (usually) much quicker - using posix_fallocate is *one* system call versus 2048 calls to write(2) - additionally, using posix_fallocate /guarantees/ that the filesystem has space for the WAL file (by spec) - reportedly (difficult to test or prove), using posix_fallocate *may* reduce file fragmentation - the (limited) testing I've done bears this out: the more new WAL file creation there is, the more the improvement. Once the number of WAL files reaches a constant point, there does not appear to be either a positive or a negative performance impact. This is as expected. - a test program (C) was also written and used which creates, allocates, and then writes to files as fast as possible. This test program also shows the expected performance benefits. - the performance benefits range from a few percent up to about 15 percent tried the test program of the patch at least a bit. Retrieved it from: http://www.postgresql.org/message-id/attachment/29088/test_fallocate.c on an oldish linux box (Kernel 2.6.32, x86_64) following $ gcc -o test_fallocate test_fallocate.c a short $ ./test_fallocate foo 1 1 yields: without posix_fallocate: 1 open/close iterations, 1 rewrite in 26.1993s with posix_fallocate: 1 open/close iterations, 1 rewrite in 13.3299s on another box (Kernel 3.2.0, x86_64) same procedure yields: without posix_fallocate: 1 open/close iterations, 1 rewrite in 19.1972s with posix_fallocate: 1 open/close iterations, 1 rewrite in 9.9280s Note, when trying gcc -O2 test_fallocate.c fails to compile with: In file included from /usr/include/fcntl.h:252:0, from test_fallocate.c:3: In function ‘open’, inlined from ‘main’ at test_fallocate.c:68:16: /usr/include/x86_64-linux-gnu/bits/fcntl2.h:51:24: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT in second argument needs 3 arguments Concerns: - some were concerned that the spec makes no claims about posix_fallocate being able to guarantee that the space allocated has zeroes in it. This was discussed here and on the Linux Kernel mailing list, wherein the expected behavior is that it does provide zeroes - most systems don't allocate a great many new WAL files, so the performance benefit is small - Benefits: - new WAL file allocate is much quicker, more efficient (fewer system calls) - the patch is (reportedly - I'm not a good judge here!) quite small HTH, Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
On 2013-06-11 15:23 CEST, Hannu Krosing wrote: On 06/11/2013 03:08 PM, Stefan Drees wrote: ... What about this: =# SELECT '{"measure":"seconds", "measure":42}'::json; json -- {"measure":42} I presume people being used to store metadata in "preceding" json object members with duplicate names, would want to decide in the client requesting the data what to do with the metadata information and at what point to "drop", wouldn't they :-?) Seems like blatant misuse of JSON format :) I assume that as JSON is _serialisation_ format, it should represent a data structure, not processing instructions. I can see no possible JavaScript structure which could produce duplicate key when serialised. ahem, JSON is a notation that allows toplevel an object or an array. If it is an object, this consists of pairs called (name, value). Here value can be any object, array, number, string or the literals null, false or true. The name must be a string. That's it :-) no key **and** also no ordering on these "name"s ;-) and as the RFC does not care, where the data came from or how it was represented before it became "JSON text" (the top-level element of a JSON document) how should the parser know ... but delta notaion, commenting, or "streaming" needs created many applications that deliver multibags and trust on some ordering conventions in their dataexchanging relations. And I don't think that any standard JSON reader supports this either. Oh yes. Convention is merely: Keep all ("Streaming") or the last (whatever the last may mean, must be carefully ensured in the interchange relation). All would like these two scenarios, but the RFC as is does not prevent an early-out (like INSERT OR IGNORE) :-)) Of you want to store any JavaScript snippets in database use text. JSON is language agnostic. I use more JSON from python, php than from js, but others do so differently ... Or perhaps pl/v8 :) Do you mean the "V8 Engine Javascript Procedural Language add-on for PostgreSQL" (http://code.google.com/p/plv8js/), I guess so. I did not want to hijack the thread, as this centered more around escaping where and what in which context (DB vs. client encoding). As the freshly created IETF json working group revamps the JSON RFC on its way to the standards track, there are currently also discussions on what to do with unicode surrogate pairs. See eg. this thread http://www.ietf.org/mail-archive/web/json/current/msg00675.html starting a summarizing effort. Just in case it helps making the fresh JSON feature of PostgreSQL bright, shining and future proof :-) Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON and unicode surrogate pairs
On 2013-06-11 12:53 CEST, Hannu Krosing wrote: On 06/11/2013 10:47 AM, Andres Freund wrote: On 2013-06-10 13:01:29 -0400, Andrew Dunstan wrote: It's legal, is it not, to just write the equivalent Unicode character in the JSON string and not use the escapes? If so I would think that that would be the most common usage. If someone's writing an escape, they probably had a reason for doing it that way, and might not appreciate our overriding their decision. We never store the converted values in the JSON object, nor do we return them from functions that return JSON. But many of the functions and operators that process the JSON have variants that return text instead of JSON, and in those cases, when the value returned is a JSON string, we do the following to it: I have just realized that the problem is actually quite a lot bigger than that. We also use this value for field name comparison. So, let us suppose that we have a LATIN1 database and a piece of JSON with a field name containing the Euro sign ("\u20ac"), a character that is not in LATIN1. Making that processable so it doesn't blow up would be mighty tricky and error prone. The non-orthogonality I suggested as a solution upthread is, by contrast, very small and easy to manage, and not terribly hard to explain - see attached. I think this all shows pretty clearly that it was a mistake allowing json data in the database that we cannot entirely display with the database's encoding. All the proposed ugly workarounds are only necessary because we don't throw an error when originally validating the json. Even in an utf-8 database you can get errors due to \u unescaping (at attribute access time, *NOT* at json_in() time) due to invalidate surrogate pairs. I think this goes countrary to normal postgres approach of validating data as strict as necessary. And I think we are going to regret not fixing this while there are still relatively few users out there. Exactly - * allow in only valid JSON. * Validate all utf8 strings for valid unicode. * have one canonic way of outputting unicode - utf8 for utf8 databases, escaped for all other encodings * If you need to store anything else, use text. Requiring preserving "original text" in json data field is Not Good! I fully expect '{"a":1, "a":none, "a":true, "a":"b"}'::json to come out as '{"a":b"}' ahem, do you mean instead to give (none -> null and missing '"' inserted in "answer"): =# SELECT '{"a":1, "a":null, "a":true, "a":"b"}'::json; json -- {"a":"b"} or only when "stored" in database and subsequently retrieved? The "original text" in this case was perfectly valid JSON text. (I know that currently this is noty true and will happen only once I read in the json value in client) Isn't this a good situation and doesn't this also depend on the storage representation in the client? What about this: =# SELECT '{"measure":"seconds", "measure":42}'::json; json -- {"measure":42} I presume people being used to store metadata in "preceding" json object members with duplicate names, would want to decide in the client requesting the data what to do with the metadata information and at what point to "drop", wouldn't they :-?) For anything else - don't use json, use any text type If you really need a simple text-validated-as-valid-json-input datatype then add this. Call it jsontext or somesuch :) All the best, Stefan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers