Re: [HACKERS] matview incremental maintenance

2013-06-17 Thread Stefan Drees

On 2013-06-17 16:41 +02:00, Kevin Grittner wrote:

Since there seems to be interest in discussing incremental
maintenance of materialized views *now*, I'm starting this thread
to try to avoid polluting unrelated threads with the discussion.  I
don't intend to spend a lot of time on it until the CF in progress
completes, but at that point the work will start in earnest.  So
I'll say where I'm at, and welcome anyone who has time to spare
outside of the CF to comment or contribute ideas.

The paper at the core of the discussion can be found by searching
for "maintaining views incrementally gupta mumick subrahmanian" --
it's on both the ACM and CiteSeerX websites.  Of course, one


i.e.
Ashish Gupta, Inderpal Singh Mumick, and V. S. Subrahmanian. 1993. 
Maintaining views incrementally. In Proceedings of the 1993 ACM SIGMOD 
international conference on Management of data (SIGMOD '93), Peter 
Buneman and Sushil Jajodia (Eds.). ACM, New York, NY, USA, 157-166. 
DOI=10.1145/170035.170066 http://doi.acm.org/10.1145/170035.170066


just in case a direct reference might come in handy :-)


...


All the ebst,
Stefan.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hard to Use WAS: Hard limit on WAL space

2013-06-15 Thread Stefan Drees

On 2013-06-15 08:44 CEST, Brendan Jurd wrote:

On 15 June 2013 16:18, Craig Ringer ... wrote:

On 06/15/2013 02:08 PM, Brendan Jurd wrote:

On 15 June 2013 14:43, Craig Ringer ... wrote:

The #1 question I see on Stack Overflow has to be confusion about
pg_hba.conf, mostly from people who have no idea it exists, don't understand
how to configure it, etc.

The totally non-obvious name of the file probably has something to do
with that.  It should be called 'auth.conf'.

Not convinced; since it only controls one facet of auth - it doesn't
define users, passwords, grants, etc ...


When somebody is setting up postgres for the first time, and they list
the contents of the config directory, you want them to have some idea


as they may not have read up to section 19.1 The pg_hba.conf File inside 
chapter 19 Client Authentication of part III. Server Administration :-?, 
which states (as of 9.2.4):

"""
Client authentication is controlled by a configuration file, which 
traditionally is named pg_hba.conf and is stored in the database 
cluster's data directory. (HBA stands for host-based authentication.) A 
default pg_hba.conf file is installed when the data directory is 
initialized by initdb. It is possible to place the authentication 
configuration file elsewhere, however; see the hba_file configuration 
parameter. ...

"""

;-) thanks to hyperlinks this is quite close to the start, but I was 
surprised to not find it by skimming the text and following the 
hyperlinks but by knowing the filename instead and entering it 
("pg_hba.conf") into the Search Documentation text field on the top 
right corner of http://www.postgresql.org/docs/9.2/interactive/index.html.


Maybe we could find a better place of the whatever-then-name inside the 
part of the docs even the "TL;DR" mood people might read? A paragraph or 
two spiced up with some catchy StackOverflow-inspired terms people with 
a need to configure this authentication aspect might have expected could 
also be expected in INSTALL like docs or directly observable on the 
hyperlinked way from part I. Tutorial chapter 1 Getting Started section 
1.1 Installation all down to chapter 15. Installation from Source Code. 
But of course only, if this is "wanted behavior".


If I read the section 1.1 Installation (again 9.2.4) I have the 
impression, that it more transports the message in our case, that "you 
are the site admin, deal with it, read the docs", or don't I read it 
right? (I am a non-native English reader)



what each of the files is for.  If they see something called
'auth.conf', they'll get the right general idea.  An understanding of
the nuances (like that it doesn't control user accounts) will come
once they open up the file -- which they may well do, because it is
called 'auth.conf', and 'auth' is a thing you want to configure.


that may well be, I do not know, how people that prefer reading folder 
and filenames over manuals written for them grok text, as I read the 
docs, promised ;-)



If they see something called 'pg_hba.conf', they may very reasonably
assume that it is some internal/advanced stuff that they don't need to
worry about just yet, because what the heck is a 'pg_hba'?  The 'pg'
is unnecessary and the 'hba' is an internal jargon term that we've
ill-advisedly allowed to leak out into the filename.


at around 1995 when I started using Postgres95 it sure took some time to 
find that pg_hba.conf file, but I then perceived it to be very well 
documented, and also felt a bit guilty, as it's name occured in the 
INSTALL file cf. 
ftp://ftp-archives.postgresql.org/pub/source/v7.2/postgresql-7.2.tar.gz 
and the INSTALL file. Therein "burried" inside Step 1 of "If You Are 
Upgrading" ...



If you really feel that 'auth.conf' is too imprecise, maybe something
like 'conn-auth.conf' would be more your style.


I think you guys did and still do a fantastic job with PostgreSQL and 
eps. it's documentation, but in this case I doubt, that any renaming of 
config files will really have an impact on usability in the shady area 
of "TL;DR" - at least for the next twenty years or so - as it still 
holds, that from a false start (eg. not reading documentation written) 
anything may follow.


But as usability is a practical concern I (as a user) would be +0 on 
renaming it if people not finding it bearing the old name, but then 
editing it is really wanted behavior.


All the best,
Stefan.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-06-11 Thread Stefan Drees

On 2013-06-11 19:45 CEST, Greg Smith wrote:

On 6/11/13 12:22 PM, Merlin Moncure wrote:


Personally I think this patch should go in regardless -- the concerns
made IMNSHO are specious.


That's nice, but we have this process for validating whether features go
in or not that relies on review instead of opinions.

;-) that's why I played with the test_fallocate.c, as it was easy to do 
and I understood, the author (of the patch) wanted to trigger some 
reviews ... I do not (yet) know anything about the core codes, so I 
leave this to the hackers. My review result was, that with newer gcc's 
you should specify an open mode flag as third argument of the fopen 
call, as only with the test tool nothing important found.


Stefan.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DO ... RETURNING

2013-06-11 Thread Stefan Drees

On 2013-06-11 19:01 CEST, Hannu Krosing wrote:

On 06/11/2013 05:27 PM, Merlin Moncure wrote:

On Tue, Jun 11, 2013 at 9:45 AM, Stephen Frost ... wrote:

* Merlin Moncure ... wrote:

I agree with all your comments pretty much down the line.  Need top
level CALL that supports parameterization and multiple sets that
utilizes background worker (we have example spi worker that gives some
hints about how pl/pgsql could be made to work).  Because it's top
level (can't even be inlined to CTE), we can access behaviors that are
not possible in current pl/pgsql, for example setting transaction
isolation in advance of snapshot and changing database connection
mid-procedure.

And this still has next-to-nothing to do with the specific proposal that
was put forward.


It's a complete feature but completely relevant to the discussion --
the behaviors have a lot of overlap and CALL is in the standard
whereas the ad hoc feature DO isn't.

Could you point to the ISO/ANSI SQL CALL definition ?


On the publicly available standards page of ISO:

http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html

searching in it for SQL shows link of "ISO/IEC 9075-1:2008":

http://standards.iso.org/ittf/PubliclyAvailableStandards/c045498_ISO_IEC_9075-1_2008.zip

when you follow and go through:

http://standards.iso.org/ittf/licence.html

i.e. accept the granted license, you may receive a personal single copy 
non-distributable PDF version of


ISO/IEC 9075-1:2008, 3rd Edition, Information technology -- Database 
languages -- SQL -- Part 1: Framework (SQL/Framework), of COmmittee 
JTC1/SC32


There at least in section 5.3.4 you find the pointer, that among others, 
the terms CALL and RETURN are specified in ISO9075-2.


So that "points", but to follow to the end ... ;-)

but at least it is clear from this source, that CALL seems to be a 
statement in SQL to invoke a procedure or whatever name juggling suits.



...


HTH,
Stefan.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-06-11 Thread Stefan Drees

On 2013-11.06 17:28, Jon Nelson wrote:

There hasn't been much activity here recently.  I'm curious, then, if
there are questions that I can answer.
It may be useful to summarize some things here:

- the purpose of the patch is to use posix_fallocate when creating new
WAL files, because it's (usually) much quicker
- using posix_fallocate is *one* system call versus 2048 calls to write(2)
- additionally, using posix_fallocate /guarantees/ that the filesystem
has space for the WAL file (by spec)
- reportedly (difficult to test or prove), using posix_fallocate *may*
reduce file fragmentation
- the (limited) testing I've done bears this out: the more new WAL
file creation there is, the more the improvement. Once the number of
WAL files reaches a constant point, there does not appear to be either
a positive or a negative performance impact. This is as expected.
- a test program (C) was also written and used which creates,
allocates, and then writes to files as fast as possible. This test
program also shows the expected performance benefits.
- the performance benefits range from a few percent up to about 15 percent


tried the test program of the patch at least a bit.

Retrieved it from:
http://www.postgresql.org/message-id/attachment/29088/test_fallocate.c

on an oldish linux box (Kernel 2.6.32, x86_64) following
$ gcc -o test_fallocate test_fallocate.c
a short
$ ./test_fallocate foo 1 1
yields:
without posix_fallocate: 1 open/close iterations, 1 rewrite in 26.1993s
with posix_fallocate: 1 open/close iterations, 1 rewrite in 13.3299s

on another box (Kernel 3.2.0, x86_64) same procedure yields:
without posix_fallocate: 1 open/close iterations, 1 rewrite in 19.1972s
with posix_fallocate: 1 open/close iterations, 1 rewrite in 9.9280s

Note, when trying gcc -O2 test_fallocate.c fails to compile with:

In file included from /usr/include/fcntl.h:252:0,
 from test_fallocate.c:3:
In function ‘open’,
inlined from ‘main’ at test_fallocate.c:68:16:
/usr/include/x86_64-linux-gnu/bits/fcntl2.h:51:24: error: call to 
‘__open_missing_mode’ declared with attribute error: open with O_CREAT 
in second argument needs 3 arguments




Concerns:
- some were concerned that the spec makes no claims about
posix_fallocate being able to guarantee that the space allocated has
zeroes in it. This was discussed here and on the Linux Kernel mailing
list, wherein the expected behavior is that it does provide zeroes
- most systems don't allocate a great many new WAL files, so the
performance benefit is small
- 

Benefits:
- new WAL file allocate is much quicker, more efficient (fewer system calls)
- the patch is (reportedly - I'm not a good judge here!) quite small


HTH,
Stefan.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Stefan Drees

On 2013-06-11 15:23 CEST, Hannu Krosing wrote:

On 06/11/2013 03:08 PM, Stefan Drees wrote:

...

What about this:
=# SELECT '{"measure":"seconds", "measure":42}'::json;
  json
--
  {"measure":42}

I presume people being used to store metadata in "preceding" json
object members with duplicate names, would want to decide in the
client requesting the data what to do with the metadata information
and at what point to "drop", wouldn't they :-?)

Seems like blatant misuse of JSON format :)

I assume that as JSON is _serialisation_ format, it should represent a
data structure, not processing instructions.

I can see no possible JavaScript structure which could produce duplicate
key when serialised.


ahem, JSON is a notation that allows toplevel an object or an array.
If it is an object, this consists of pairs called (name, value).
Here value can be any object, array, number, string or the literals 
null, false or true.
The name must be a string. That's it :-) no key **and** also no ordering 
on these "name"s ;-) and as the RFC does not care, where the data came 
from or how it was represented before it became "JSON text" (the 
top-level element of a JSON document) how should the parser know
 ... but delta notaion, commenting, or "streaming" needs created many 
applications that deliver multibags and trust on some ordering 
conventions in their dataexchanging relations.



And I don't think that any standard JSON reader supports this either.


Oh yes. Convention is merely: Keep all ("Streaming") or the last 
(whatever the last may mean, must be carefully ensured in the 
interchange relation).
All would like these two scenarios, but the RFC as is does not prevent 
an early-out (like INSERT OR IGNORE) :-))



Of you want to store any JavaScript snippets in database use text.


JSON is language agnostic. I use more JSON from python, php than from 
js, but others do so differently ...



Or perhaps pl/v8 :)



Do you mean the "V8 Engine Javascript Procedural Language add-on for 
PostgreSQL" (http://code.google.com/p/plv8js/), I guess so.


I did not want to hijack the thread, as this centered more around 
escaping where and what in which context (DB vs. client encoding).


As the freshly created IETF json working group revamps the JSON RFC on 
its way to the standards track, there are currently also discussions on 
what to do with unicode surrogate pairs. See eg. this thread 
http://www.ietf.org/mail-archive/web/json/current/msg00675.html starting 
a summarizing effort.


Just in case it helps making the fresh JSON feature of PostgreSQL 
bright, shining and future proof :-)


Stefan.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Stefan Drees

On 2013-06-11 12:53 CEST, Hannu Krosing wrote:

On 06/11/2013 10:47 AM, Andres Freund wrote:

On 2013-06-10 13:01:29 -0400, Andrew Dunstan wrote:

It's legal, is it not, to just write the equivalent Unicode character in
the JSON string and not use the escapes?  If so I would think that that
would be the most common usage.  If someone's writing an escape, they
probably had a reason for doing it that way, and might not appreciate
our overriding their decision.

We never store the converted values in the JSON object, nor do we return
them from functions that return JSON. But many of the functions and
operators that process the JSON have variants that return text instead of
JSON, and in those cases, when the value returned is a JSON string, we do
the following to it:

I have just realized that the problem is actually quite a lot bigger than
that. We also use this value for field name comparison. So, let us suppose
that we have a LATIN1 database and a piece of JSON with a field name
containing the Euro sign ("\u20ac"), a character that is not in LATIN1.
Making that processable so it doesn't blow up would be mighty tricky and
error prone. The non-orthogonality I suggested as a solution upthread is, by
contrast, very small and easy to manage, and not terribly hard to explain -
see attached.

I think this all shows pretty clearly that it was a mistake allowing
json data in the database that we cannot entirely display with the
database's encoding. All the proposed ugly workarounds are only
necessary because we don't throw an error when originally validating the
json.
Even in an utf-8 database you can get errors due to \u unescaping (at
attribute access time, *NOT* at json_in() time) due to invalidate
surrogate pairs.

I think this goes countrary to normal postgres approach of validating
data as strict as necessary. And I think we are going to regret not
fixing this while there are still relatively few users out there.

Exactly -

  * allow in only valid JSON.
  * Validate all utf8 strings for valid unicode.
  * have one canonic way of outputting unicode - utf8 for utf8 databases,
escaped for all other encodings
  * If you need to store anything else, use text.

Requiring preserving "original text" in json data field is Not Good!

I fully expect '{"a":1, "a":none, "a":true, "a":"b"}'::json to come out
as '{"a":b"}'


ahem, do you mean instead to give (none -> null and missing '"' inserted 
in "answer"):

=# SELECT '{"a":1, "a":null, "a":true, "a":"b"}'::json;
 json
--
 {"a":"b"}

or only when "stored" in database and subsequently retrieved? The 
"original text" in this case was perfectly valid JSON text.



(I know that currently this is noty true and will happen only once I
read in the json value in client)


Isn't this a good situation and doesn't this also depend on the storage 
representation in the client?


What about this:
=# SELECT '{"measure":"seconds", "measure":42}'::json;
 json
--
 {"measure":42}

I presume people being used to store metadata in "preceding" json object 
members with duplicate names, would want to decide in the client 
requesting the data what to do with the metadata information and at what 
point to "drop", wouldn't they :-?)



For anything else - don't use json, use any text type

If you really need a simple text-validated-as-valid-json-input datatype
then add this.

Call it jsontext or somesuch :)



All the best,
Stefan.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers