Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Shulgin, Oleksandr
> Yes, but I think the plugin is the right place to do it. What is more,
this won't actually prevent you completely from producing non-ECMAScript
compliant JSON, since json or jsonb values containing offending numerics
won't be caught, AIUI.

Ah, that's a good catch indeed.
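(For illustration, a hedged example of the kind of value that slips through: jsonb keeps arbitrary-precision numerics, while an ECMAScript consumer parses numbers as IEEE-754 doubles, so a numeric already embedded in a json/jsonb datum can silently lose precision on the client even though the plugin never sees it as a standalone numeric.)

    -- 2^53 + 1 cannot be represented exactly as a double; a JavaScript
    -- client parsing this output would round it to 9007199254740992.
    SELECT '{"big": 9007199254740993}'::jsonb AS val;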

> But a fairly simple to write function that reparsed and fixed the JSON
inside the decoder would work.

Need to rethink this, but reparsing was never my favorite option here. :-)

--
Alex


Re: [HACKERS] TABLESAMPLE doesn't actually satisfy the SQL spec, does it?

2015-07-13 Thread Simon Riggs
On 12 July 2015 at 18:50, Tom Lane  wrote:

> Robert Haas  writes:
> > On Sun, Jul 12, 2015 at 12:02 PM, Tom Lane  wrote:
> >> As best I can tell (evidence below), the SQL standard requires that if a
> >> single query reads a table with a TABLESAMPLE clause multiple times
> (say,
> >> because it's on the inside of a nestloop), then the exact same set of
> >> sampled rows are returned each time.
>
> > Hmm, I tend to agree that it would be good if it behaved that way.
> > Otherwise, it seems like the behavior could be quite surprising.
>
> Yeah.  As a concrete example, consider
>
> select * from t1, t2 tablesample ... where t1.x = t2.x
>
> and suppose that there are multiple occurrences of x = 10 in both tables.
> As things stand, if the join is done as a nestloop then a particular t2
> row with x = 10 might appear in the output joined with some of the t1 rows
> with x = 10 but not with others.  On the other hand, the results of a hash
> join would not be inconsistent in that way, since t2 would be read only
> once.
>

Hmm, a non-key join to a sampled table. What would the meaning of such a
query be? The table would need to be big enough to experience updates and also
be under current update activity. BERNOULLI isn't likely to have many users
because it is so slow. So overall, such a query is not useful and, as such,
unlikely.

The mechanism of sampling was discussed heavily before and there wasn't an
approach that met all of the goals: IIRC we would need to test visibility
twice on each tuple to get around these problems. Given that users of
TABLESAMPLE have already explicitly stated their preference for speed over
accuracy, minor tweaks to handle corner cases don't seem warranted.

If you have a simple, better way I would not object. Forgive me, I haven't
yet understood your proposal about the sampling rule above.

-- 
Simon Riggs   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Noah Misch
On Mon, Jul 13, 2015 at 06:19:49PM -0400, Andrew Dunstan wrote:
> >>>On 7/13/2015 5:36 PM, Andrew Dunstan wrote:
> hstore_plpython.o: In function `hstore_to_plpython':
> /home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:
> undefined reference to `PLyUnicode_FromStringAndSize'

> The functions are in fact apparently built - the names are present in the
> object file and the DLL.

Per "objdump -x src/pl/plpython/plpython3.dll | less -pOrdinal/Name", those
symbols aren't exported.  The Cygwin toolchain appears to export every
function until you mark one with __declspec (dllexport), after which it
exports just the ones so marked.  Since plpython3.dll has an explicit
__declspec on PyInit_plpy(), that's the sole function exported.  Adding
-Wl,--export-all-symbols to the PL/Python link lets the build complete; I have
not researched whether it is a proper fix.




Re: [HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread Michael Paquier
On Tue, Jul 14, 2015 at 11:20 AM, David Guimaraes  wrote:
> Yeah bingo

Hm. While there is a magic-code header for the custom format, looking at
the code I am not seeing any trace of a similar marker at the end of the
dump file (_CloseArchive in pg_backup_custom.c), and I don't recall either
that there is an estimate of the size of the dump in the header. If those
files were stored close to each other, one idea may be to look for the next
header present, or, I am afraid, to attempt to roughly estimate the size
they would have. In any case, applying reverse-engineering methods seems
like the most reliable way to reconstitute an archive that could be used
by pg_restore, but perhaps others have other ideas.
-- 
Michael




Re: [HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread David Guimaraes
Yeah bingo
On 13/07/2015 22:11, "Michael Paquier"  wrote:

> On Tue, Jul 14, 2015 at 10:58 AM, David Guimaraes 
> wrote:
> > The backups were deleted. I need them to use pg_restore.
>
> So what you mean is that you are looking at your disk at FS level to
> find traces of those deleted backups by analyzing their binary
> format... Am I missing something?
> --
> Michael
>


Re: [HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread Michael Paquier
On Tue, Jul 14, 2015 at 10:58 AM, David Guimaraes  wrote:
> The backups were deleted. I need them to use pg_restore.

So what you mean is that you are looking at your disk at FS level to
find traces of those deleted backups by analyzing their binary
format... Am I missing something?
-- 
Michael




Re: [HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread David Guimaraes
The backups were deleted. I need them to use pg_restore.
On 13/07/2015 21:18, "Michael Paquier"  wrote:

> On Tue, Jul 14, 2015 at 9:28 AM, David Guimaraes 
> wrote:
> > So I decided to try to understand the file format generated by
> > pg_dump. Analyzing the pg_dump/pg_restore source code, I concluded a few
> > things:
> >
> > The header of the file always starts with "PGDMP", followed by the pg_dump
> > version number used, followed by the int size, offset, etc., followed by
> > the TOCs.
> >
> > My question is: how do I know where the file ends? Is there any signature
> > that I can use, or would I have to analyze the whole file?
>
> Why are you trying to reinvent the wheel? pg_restore is not available?
> --
> Michael
>


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Fujii Masao
On Tue, Jul 14, 2015 at 9:00 AM, Michael Paquier
 wrote:
> On Mon, Jul 13, 2015 at 10:34 PM, Fujii Masao wrote:
>> On Fri, Jul 10, 2015 at 10:06 PM, Beena Emerson wrote:
>>> On Tue, Jul 7, 2015 at 2:19 PM, Michael Paquier wrote:
>>>
 Something like pg_syncinfo/ coupled with a LW lock, we already do
 something similar for replication slots with pg_replslot/.
>>>
>>> I was trying to figure out how the JSON metadata can be used.
>>> It would have to be set using a given set of functions.
>>
>> So we can use only such a set of functions to configure synch rep?
>> I don't like that idea, because it prevents us from configuring it
>> while the server is not running.
>
> If you store a json blob in a set of files of PGDATA you could update
> them manually there as well. That's perhaps re-inventing the wheel
> with what is available with GUCs though.

Why don't we just use a GUC? If the quorum setting is not that complicated
in real scenarios, a GUC seems enough for that.
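(Purely as an illustration of the GUC-based direction: a hypothetical sketch, with an invented value syntax meaning "wait for 2 of these 3 standbys"; this is not something the current server accepts.)

    -- hypothetical quorum syntax for the existing GUC
    ALTER SYSTEM SET synchronous_standby_names = '2 (node_a, node_b, node_c)';
    SELECT pg_reload_conf();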

Regards,

-- 
Fujii Masao




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 07:55 PM, Michael Paquier wrote:

On Tue, Jul 14, 2015 at 8:49 AM, Andrew Dunstan  wrote:

On 07/13/2015 07:33 PM, Tom Lane wrote:

Andrew Dunstan  writes:

Not AFAICT. Here is the contrib build:

Hm ... what does -DUSE_DL_IMPORT do?

NFI - I tried building with that but it made no difference.

Is your python3 build a 32b version?


No, this is all 64 bit.

cheers

andrew




Re: [HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread Michael Paquier
On Tue, Jul 14, 2015 at 9:28 AM, David Guimaraes  wrote:
> So I decided to try to understand the file format generated by
> pg_dump. Analyzing the pg_dump/pg_restore source code, I concluded a few
> things:
>
> The header of the file always starts with "PGDMP", followed by the pg_dump
> version number used, followed by the int size, offset, etc., followed by the
> TOCs.
>
> My question is: how do I know where the file ends? Is there any signature
> that I can use, or would I have to analyze the whole file?

Why are you trying to reinvent the wheel? pg_restore is not available?
-- 
Michael




[HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread David Guimaraes
Hello. I need some help.

I have the following situation: my client deleted a number of old backups,
made by pg_dump with the custom-format flag, from a disk drive. I could not
find any program to recover backup files made by pg_dump in this
custom/binary form, so I decided to try to understand the file format
generated by pg_dump. Analyzing the pg_dump/pg_restore source code, I
concluded a few things:

The header of the file always starts with "PGDMP", followed by the pg_dump
version number used, followed by the int size, offset, etc., followed by the
TOCs.

My question is: how do I know where the file ends? Is there any signature
that I can use, or would I have to analyze the whole file?

Thank you.

-- 
David


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Michael Paquier
On Mon, Jul 13, 2015 at 10:34 PM, Fujii Masao wrote:
> On Fri, Jul 10, 2015 at 10:06 PM, Beena Emerson wrote:
>> On Tue, Jul 7, 2015 at 2:19 PM, Michael Paquier wrote:
>>
>>> Something like pg_syncinfo/ coupled with a LW lock, we already do
>>> something similar for replication slots with pg_replslot/.
>>
>> I was trying to figure out how the JSON metadata can be used.
>> It would have to be set using a given set of functions.
>
> So we can use only such a set of functions to configure synch rep?
> I don't like that idea, because it prevents us from configuring it
> while the server is not running.

If you store a json blob in a set of files of PGDATA you could update
them manually there as well. That's perhaps re-inventing the wheel
with what is available with GUCs though.

>> Is JSON being preferred because it would be ALTER SYSTEM friendly and in a
>> format already known to users?
>
> At least currently ALTER SYSTEM cannot accept JSON data
> (e.g., the return value of a JSON function like json_build_object())
> as the setting value. So I'm not sure how friendly ALTER SYSTEM
> and the JSON format really are. If you want to argue that, you probably
> need to improve ALTER SYSTEM so that JSON can be specified.
>
>> In a real-life scenario, at most how many groups and nesting would be
>> expected?
>
> I don't think that many groups and nestings are common.

Yeah, in most common configurations people are not going to have more
than 3 groups with only one level of nodes.
-- 
Michael




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Michael Paquier
On Tue, Jul 14, 2015 at 8:49 AM, Andrew Dunstan  wrote:
>
> On 07/13/2015 07:33 PM, Tom Lane wrote:
>>
>> Andrew Dunstan  writes:
>>>
>>> Not AFAICT. Here is the contrib build:
>>
>> Hm ... what does -DUSE_DL_IMPORT do?
>
> NFI - I tried building with that but it made no difference.

Is your python3 build a 32b version?
-- 
Michael




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 07:33 PM, Tom Lane wrote:

Andrew Dunstan  writes:

Not AFAICT. Here is the contrib build:

Hm ... what does -DUSE_DL_IMPORT do?




NFI - I tried building with that but it made no difference.

cheers

andrew





Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Tom Lane
Andrew Dunstan  writes:
> Not AFAICT. Here is the contrib build:

Hm ... what does -DUSE_DL_IMPORT do?

regards, tom lane




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Simon Riggs
On 13 July 2015 at 17:00, Tom Lane  wrote:

> I wrote:
> > TBH, I think the right thing to do at this point is to revert the entire
> > patch and send it back for ground-up rework.  I think the high-level
> > design is wrong in many ways and I have about zero confidence in most
> > of the code details as well.
>
> > I'll send a separate message about high-level issues,
>
> And here's that.  I do *not* claim that this is a complete list of design
> issues with the patch, it's just things I happened to notice in the amount
> of time I've spent so far (which is already way more than I wanted to
> spend on TABLESAMPLE right now).
>

Interesting that a time-based sample does indeed yield useful results. Good
to know.

That is exactly what this patch provides: a time-based sample, with reduced
confidence in the accuracy of the result. And other things too.


> I'm not sure that we need an extensible sampling-method capability at all,
> let alone that an API for that needs to be the primary focus of a patch.
> Certainly the offered examples of extension modules aren't inspiring:
> tsm_system_time is broken by design (more about that below) and nobody
> would miss tsm_system_rows if it weren't there.


Time-based sampling isn't broken by design. It works exactly according to
the design.

Time-based sampling has been specifically requested by data visualization
developers who are trying to work out how to display anything useful from a
database beyond a certain size. The feature implements a mechanism similar to
the one already deployed in BlinkDB, demonstrated at VLDB 2012. I regard it as
an advanced feature worthy of deployment within PostgreSQL, based upon user
request.

Row-based sampling offers the capability to retrieve a sample of a fixed
size however big the data set is. I shouldn't need to point out that this is
already in use within the ANALYZE command. Since it implements a technique
already in use within PostgreSQL, I see no reason not to expose it to
users. As I'm sure you're aware, there is rigorous math backing up the use
of fixed-size sampling, with recent developments from my research
colleagues at Manchester University that further emphasise the usefulness
of this feature for us.
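(For reference, a hedged illustration of the two contrib sampling methods being discussed, tsm_system_rows and tsm_system_time; the table name is made up.)

    CREATE EXTENSION tsm_system_rows;
    CREATE EXTENSION tsm_system_time;

    -- a sample of roughly 100 rows, however large the table is
    SELECT * FROM big_events TABLESAMPLE SYSTEM_ROWS(100);

    -- whatever can be sampled in about 1000 milliseconds
    SELECT * FROM big_events TABLESAMPLE SYSTEM_TIME(1000);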


> What is far more
> important is to get sampling capability into indexscans, and designing
> an API ahead of time seems like mostly a distraction from that.
>

This has come up as a blocker on TABLESAMPLE before. I've got no evidence
to show anyone cares about that. I can't imagine a use for it myself; it
was not omitted from this patch because it's difficult, it was omitted
because it's just useless. If anyone ever cares, they can add it in a later
release.


> I'd think seriously about tossing the entire executor-stage part of the
> patch, creating a stateless (history-independent) sampling filter that
> just accepts or rejects TIDs, and sticking calls to that into all the
> table scan node types.  Once you had something like that working well
> it might be worth considering whether to try to expose an API to
> generalize it.  But even then it's not clear that we really need any
> more options than true-Bernoulli and block-level sampling.
>

See above.


> The IBM paper I linked to in the other thread mentions that their
> optimizer will sometimes choose to do Bernoulli sampling even if SYSTEM
> was requested.


That sounds broken to me. This patch gives the user what the user asks for.
BERNOULLI or SYSTEM.

If you particularly like the idea of mixing the two sampling methods, you
can do so *because* the sampling-method API is extensible for the user. So
Mr. DB2 can get it his way if he likes, and he can call it SYSNOULLI if he
likes; others can get it the way they like also. No argument required.

It was very, very obvious that whatever sampling code anybody dreamt up
would be objected to. Clearly, we need a way to allow people to implement
whatever technique they wish. Stratified sampling, modified sampling, new
techniques. Whatever.


> Probably that happens when it decides to do a simple
> indexscan, because then there's no advantage to trying to cluster the
> sampled rows.  But in the case of a bitmap scan, you could very easily do
> either true Bernoulli or block-level sampling simply by adjusting the
> rules about which bits you keep or clear in the bitmap (given that you
> apply the filter between collection of the index bitmap and accessing the
> heap, which seems natural).  The only case where a special scan type
> really seems to be needed is if you want block-level sampling, the query
> would otherwise use a seqscan, *and* the sampling percentage is pretty low
> --- if you'd be reading every second or third block anyway, you're likely
> better off with a plain seqscan so that the kernel sees sequential access
> and does prefetching.  The current API doesn't seem to make it possible to
> switch between seqscan and read-only-selected-blocks based on the sampling
> percentage, but I think that coul

Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 05:18 PM, Tom Lane wrote:

Andrew Dunstan  writes:

On 07/13/2015 11:53 AM, Marco Atzeri wrote:

On 7/13/2015 5:36 PM, Andrew Dunstan wrote:

hstore_plpython.o: In function `hstore_to_plpython':
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:
undefined reference to `PLyUnicode_FromStringAndSize'

No this doesn't seem to be the problem. For some reason it's apparently
not finding the symbol in plpython3.dll, where it should definitely
exist, unless I'm completely off base. So I'm rather confused.

Could hstore_plpython and plpython somehow have been built with different
ideas about PY_MAJOR_VERSION?  PLyUnicode_FromStringAndSize is
conditionally compiled, and the reference to it from hstore_plpython
depends on a conditionally-defined macro, and this error would make plenty
of sense if those conditions somehow diverged.  So I'd look for instance
at whether identical -I paths were used in both parts of the build.




Not AFAICT. Here is the contrib build:

   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2 -I../../src/pl/plpython
   -I/usr/include/python3.4m -I../../contrib/hstore -I. -I.
   -I../../src/include -I/usr/include/libxml2   -c -o hstore_plpython.o
   hstore_plpython.c
   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2   -shared -o
   hstore_plpython3.dll  hstore_plpython.o  -L../../src/port
   -L../../src/common -Wl,--allow-multiple-definition
   -Wl,--enable-auto-import -L/usr/lib  -L/usr/local/lib
   -Wl,--as-needed   -L../../src/backend -lpostgres -L../hstore
   -lhstore -L../../src/pl/plpython -lplpython3
   -L/usr/lib/python3.4/config-3.4m -lpython3.4m -lpgcommon -lpgport
   -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt

and here is the plpython build:

   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2  -I. -I.
   -I/usr/include/python3.4m -I../../../src/include
   -I/usr/include/libxml2  -DUSE_DL_IMPORT  -c -o plpy_util.o plpy_util.c
   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2   -shared -o
   plpython3.dll  plpy_cursorobject.o plpy_elog.o plpy_exec.o
   plpy_main.o plpy_planobject.o plpy_plpymodule.o plpy_procedure.o
   plpy_resultobject.o plpy_spi.o plpy_subxactobject.o plpy_typeio.o
   plpy_util.o  -L../../../src/port -L../../../src/common
   -Wl,--allow-multiple-definition -Wl,--enable-auto-import -L/usr/lib 
   -L/usr/local/lib -Wl,--as-needed -L/usr/lib/python3.4/config-3.4m
   -lpython3.4m -lintl -ldl -L../../../src/backend -lpostgres
   -lpgcommon -lpgport -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt


The functions are in fact apparently built - the names are present in 
the object file and the DLL.


cheers

andrew



regards, tom lane







Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Tom Lane
Andrew Dunstan  writes:
> On 07/13/2015 11:53 AM, Marco Atzeri wrote:
>> On 7/13/2015 5:36 PM, Andrew Dunstan wrote:
>>> hstore_plpython.o: In function `hstore_to_plpython':
>>> /home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:
>>> undefined reference to `PLyUnicode_FromStringAndSize'

> No this doesn't seem to be the problem. For some reason it's apparently 
> not finding the symbol in plpython3.dll, where it should definitely 
> exist, unless I'm completely off base. So I'm rather confused.

Could hstore_plpython and plpython somehow have been built with different
ideas about PY_MAJOR_VERSION?  PLyUnicode_FromStringAndSize is
conditionally compiled, and the reference to it from hstore_plpython
depends on a conditionally-defined macro, and this error would make plenty
of sense if those conditions somehow diverged.  So I'd look for instance
at whether identical -I paths were used in both parts of the build.

regards, tom lane




Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-13 Thread Andres Freund
On 2015-07-13 15:50:44 -0500, Jim Nasby wrote:
> Another possibility is some kind of a page-level indicator of what binary
> format is in use on a given page. For checksums maybe a single bit would
> suffice (indicating that you should verify the page checksum). Another use
> case is using this to finally ditch all the old VACUUM FULL code in
> HeapTupleSatisfies*().

That's a bad idea, because that bit then wouldn't be protected by the
checksum.




Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread Tom Lane
Jim Nasby  writes:
> On 7/13/15 3:39 PM, dinesh kumar wrote:
>> Ah, that was my misinterpretation. Let me work on it, and I will send a new
>> patch with a wrapper SQL function for ereport.

> You might want to present a plan for that; it's not as trivial as it 
> sounds due to how ereport works. In particular, I'd want to see (at 
> minimum) the same functionality that plpgsql's RAISE command now 
> provides (errdetail, errhint, etc).

The real question is why the existing functionality in plpgsql isn't
sufficient.  Somebody who wants a "log from SQL" function can easily
write a simple plpgsql function that does exactly what they want,
with no more or fewer bells-n-whistles than they need.  If we try
to create a SQL function that does all that, it's likely to be a mess
to use, even with named arguments.
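(As a hedged sketch of that point, a minimal plpgsql wrapper might look like the following; the function name and arguments are made up for this example, and it is of course still subject to log_min_messages like any other ereport.)

    CREATE OR REPLACE FUNCTION write_to_log(msg text,
                                            detail text DEFAULT NULL,
                                            hint text DEFAULT NULL)
    RETURNS void
    LANGUAGE plpgsql AS $$
    BEGIN
        -- LOG level reaches the server log under default settings
        RAISE LOG '%', msg
            USING DETAIL = coalesce(detail, ''),
                  HINT   = coalesce(hint, '');
    END;
    $$;

    -- usage: SELECT write_to_log('nightly ETL finished', hint => 'batch 42');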

I'm not necessarily against the basic idea, but I think inventing
something that actually offers an increment in usability compared
to the existing alternative is going to be harder than it sounds.

regards, tom lane




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 11:53 AM, Marco Atzeri wrote:

On 7/13/2015 5:36 PM, Andrew Dunstan wrote:


On 07/12/2015 05:06 PM, Tom Lane wrote:

Andrew Dunstan  writes:

On 07/04/2015 11:02 AM, Tom Lane wrote:

It's not apparent to me how that works at all.

BTW, the .a files being linked to above are not like Unix .a static
archives - they are import library files, which I think are only used at
link time, not run time. The path to the DLLs isn't being hardcoded.

Ah, I see.  So then what Marco is suggesting is copying a coding pattern
that does work.

Unless there is further argument I propose to commit something very like
Marco's suggestion for hstore_plperl, hstore_plpython and ltree_plpython

No objection here.




OK, I tried the attached patch.

but when trying to build with python 3 I get this (no such problems with
python2, which builds and tests fine):

make -C hstore_plpython all
make[1]: Entering directory
'/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython'
ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -O2 -I../../src/pl/plpython
-I/usr/include/python3.4m -I../../contrib/hstore -I. -I.
-I../../src/include -I/usr/include/libxml2   -c -o hstore_plpython.o
hstore_plpython.c
ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -O2   -shared -o
hstore_plpython3.dll  hstore_plpython.o  -L../../src/port
-L../../src/common -Wl,--allow-multiple-definition
-Wl,--enable-auto-import -L/usr/lib  -L/usr/local/lib
-Wl,--as-needed   -L../../src/backend -lpostgres -L../hstore
-lhstore -L../../src/pl/plpython -lplpython3
-L/usr/lib/python3.4/config-3.4m -lpython3.4m -lpgcommon -lpgport
-lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt
hstore_plpython.o: In function `hstore_to_plpython':

/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:
undefined reference to `PLyUnicode_FromStringAndSize'


[cut]

I'd like to get that fixed before applying anything. Marco, any ideas?

To build with python 3, set the environment like this:
PYTHON=/usr/bin/python3 - this can be done in the config file.


I am only building with python2, but on Cygwin
there is an additional "intl" library for the python3 binding

$ python2-config --libs
-lpython2.7 -ldl

$ python3-config --libs
-lpython3.4m -lintl -ldl



No this doesn't seem to be the problem. For some reason it's apparently 
not finding the symbol in plpython3.dll, where it should definitely 
exist, unless I'm completely off base. So I'm rather confused.


cheers

andrew




Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-13 Thread David Christensen

> On Jul 13, 2015, at 3:50 PM, Jim Nasby  wrote:
> 
> On 7/13/15 3:26 PM, David Christensen wrote:
>> * Incremental Checksums
>> 
>> PostgreSQL users should have a way of upgrading their cluster to use data 
>> checksums without having to do a costly pg_dump/pg_restore; in particular, 
>> checksums should be able to be enabled/disabled at will, with the database 
>> enforcing the logic of whether the pages considered for a given database are 
>> valid.
>> 
>> Approaches considered for this include adding flags to pg_upgrade to set up 
>> the new cluster to use checksums where it did not before (or optionally 
>> turning them off).  This approach is a nice tool to have, but we also need 
>> to be able to support this process in a manner which keeps the database 
>> online while it goes through the initial checksum process.
> 
> It would be really nice if this could be extended to handle different page 
> formats as well, something that keeps rearing its head. Perhaps that could 
> be done with the cycle idea you've described.

I had had this thought too, but the main issue I saw was that new page 
formats were not guaranteed to take up the same space/storage, so there was 
an inherent limitation on the ability to restructure things *arbitrarily*; 
that being said, there may be a use-case for the types of modifications that 
this approach *would* be able to handle.

> Another possibility is some kind of a page-level indicator of what binary 
> format is in use on a given page. For checksums maybe a single bit would 
> suffice (indicating that you should verify the page checksum). Another use 
> case is using this to finally ditch all the old VACUUM FULL code in 
> HeapTupleSatisfies*().

There's already a page version field, no?  I assume that would be sufficient 
for the page format indicator.  I'm not sure about using a flag for verifying 
the checksum, as setting it already modifies the page that is to be 
checksummed, and we want to avoid having to rewrite a bunch of pages 
unnecessarily, no?  And you'd presumably need to clear that state again, which 
would be an additional write.  This was the issue that the checksum cycle was 
meant to handle, since we store that information in the system catalogs and 
the types of modifications here would be idempotent.

David
--
David Christensen
PostgreSQL Team Manager
End Point Corporation
da...@endpoint.com
785-727-1171









Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread dinesh kumar
On Mon, Jul 13, 2015 at 1:56 PM, Jim Nasby  wrote:

> On 7/13/15 3:39 PM, dinesh kumar wrote:
>
>> I don't really see the point of what you're describing here. Just do
>> something like RAISE WARNING which should normally be high enough to
>> make it into the logs. Or use a pl language that will let you write
>> your own logfile on the server (ie: plperlu).
>>
>> True. Using plperlu, can we bypass our log_* settings? I wasn't sure
>> whether that is the case.
>>
>
> plperlu can do anything the server can do. Including fun things like
> appending to any file the server can write to or executing things like `rm
> -rf pg_xlog`.


Thanks much, Jim.


>
>
>> I didn't mean that we need to have this feature just because we have it on
>> other RDBMSes. I don't see a reason why we shouldn't have this in our
>> PG too.
>>
>> I see a similar item in our development list
>> <
>> http://www.postgresql.org/message-id/53a8e96e.9060...@2ndquadrant.com>.
>>
>>
>> That's not at all what that item is talking about. It's talking
>> about exposing ereport as a SQL function, without altering the rest
>> of our logging behavior.
>>
>>
>> Ah, that was my misinterpretation. Let me work on it, and I will send a new
>> patch with a wrapper SQL function for ereport.
>>
>
> You might want to present a plan for that; it's not as trivial as it
> sounds due to how ereport works. In particular, I'd want to see (at
> minimum) the same functionality that plpgsql's RAISE command now provides
> (errdetail, errhint, etc).
>
>
Sure. Let me prepare a prototype for it, and I will share it with you before
implementing.


Best Regards,
Dinesh
manojadinesh.blogspot.com


> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread Jim Nasby

On 7/13/15 3:39 PM, dinesh kumar wrote:

I don't really see the point of what you're describing here. Just do
something like RAISE WARNING which should normally be high enough to
make it into the logs. Or use a pl language that will let you write
your own logfile on the server (ie: plperlu).

True. Using plperlu, can we bypass our log_* settings? I wasn't sure
whether that is the case.


plperlu can do anything the server can do. Including fun things like 
appending to any file the server can write to or executing things like 
`rm -rf pg_xlog`.



I didn't mean that we need to have this feature just because we have it on
other RDBMSes. I don't see a reason why we shouldn't have this in our
PG too.

I see a similar item in our development list
.


That's not at all what that item is talking about. It's talking
about exposing ereport as a SQL function, without altering the rest
of our logging behavior.


Ah, that was my misinterpretation. Let me work on it, and I will send a new
patch with a wrapper SQL function for ereport.


You might want to present a plan for that; it's not as trivial as it 
sounds due to how ereport works. In particular, I'd want to see (at 
minimum) the same functionality that plpgsql's RAISE command now 
provides (errdetail, errhint, etc).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-13 Thread Jim Nasby

On 7/13/15 3:26 PM, David Christensen wrote:

* Incremental Checksums

PostgreSQL users should have a way of upgrading their cluster to use data 
checksums without having to do a costly pg_dump/pg_restore; in particular, 
checksums should be able to be enabled/disabled at will, with the database 
enforcing the logic of whether the pages considered for a given database are 
valid.

Approaches considered for this include adding flags to pg_upgrade to set up 
the new cluster to use checksums where it did not before (or optionally 
turning them off).  This approach is a nice tool to have, but we also need to 
be able to support this process in a manner which keeps the database online 
while it goes through the initial checksum process.


It would be really nice if this could be extended to handle different 
page formats as well, something that keeps rearing its head. Perhaps 
that could be done with the cycle idea you've described.


Another possibility is some kind of a page-level indicator of what 
binary format is in use on a given page. For checksums maybe a single 
bit would suffice (indicating that you should verify the page checksum). 
Another use case is using this to finally ditch all the old VACUUM FULL 
code in HeapTupleSatisfies*().

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread dinesh kumar
On Mon, Jul 13, 2015 at 1:29 PM, Jim Nasby  wrote:

> On 7/13/15 12:39 PM, dinesh kumar wrote:
>
>> > As of now, we don't have an SQL function to write custom/application
>> > messages to the log destination. We have the "RAISE" clause, which is
>> > controlled by the log_* parameters. If we had an SQL function which works
>> > irrespective of the log settings, that would be good for many log parsers.
>> > What I mean is, from a DBA point of view, if we route all our native OS
>> > stats to log files in a proper format, then our log-reporting tools can
>> > give the most effective reports. Also, applications can log their own
>> > messages to the postgres log files, which can be monitored by DBAs too.
>>
>> What's the actual use case for this feature other than it would be
>> good to have it?
>>
>>
>> That's a good question, Michael.
>>
>> When I was working as a DBA for a different RDBMS, developers used to
>> write serious app errors, followed by instructions, into some sort of
>> log and trace files, since DBAs didn't have permission to check the app
>> logs, which were owned by the Ops team.
>>
>> In my old case, the application had serious OOM issues and was crashing
>> frequently after deployment. The behavior wasn't consistent on the app
>> side, so they used to send a copy of all app metrics to trace files, and
>> we monitored what was happening in the DB during the spikes on the app
>> servers.
>>
>
> Spewing a bunch of stuff into the postgres log doesn't seem like an
> improvement here.
>
>
Agreed Jim.


> I don't really see the point of what you're describing here. Just do
> something like RAISE WARNING which should normally be high enough to make
> it into the logs. Or use a pl language that will let you write your own
> logfile on the server (ie: plperlu).
>
True. Using plperlu, can we bypass our log_* settings? I wasn't sure
whether that is the case.


>> I didn't mean that we need to have this feature just because we have it on
>> other RDBMSes. I don't see a reason why we shouldn't have this in our PG too.
>>
>> I see a similar item in our development list
>> .
>>
>
> That's not at all what that item is talking about. It's talking about
> exposing ereport as a SQL function, without altering the rest of our
> logging behavior.
>

Ah, that was my misinterpretation. Let me work on it, and I will send a new
patch with a wrapper SQL function for ereport.


Thanks again.

Regards,
Dinesh
manojadinesh.blogspot.com

-- 
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-13 Thread Robert Haas
On Tue, Jul 7, 2015 at 6:28 AM, Kyotaro HORIGUCHI
 wrote:
> Please forgive me for resending this message due to some too-sad
> misspellings.
>
> # "Waiting for heavy weight locks" is somewhat confusing to spell..
>
> ===
> Hello,
>
> At Tue, 7 Jul 2015 16:27:38 +0900, Fujii Masao  wrote 
> in 
>> Each backend reports its event when trying to take a lock. But
>> the reported event is never reset until next event is reported.
>> Is this OK? This means that the wait_event column keeps showing
>> the *last* event while a backend is in idle state, for example.
>> So, shouldn't we reset the reported event or report another one
>> when releasing the lock?
>
> It seems so but pg_stat_activity.waiting would indicate whether
> the event is lasting. However, .waiting reflects only the status
> of heavy-weight locks. It would be quite misleading.
>
> I think that pg_stat_activity.wait_event should be linked to
> .waiting; then .wait_event should be restricted to heavy-weight
> locks if the meaning of .waiting cannot be changed.

Yeah, that's clearly no good.  It only makes sense to have wait_event
show the most recent event if waiting tells you whether the wait is
still ongoing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-13 Thread Robert Haas
On Mon, Jul 6, 2015 at 10:48 AM, Fujii Masao  wrote:
> According to his patch, the wait events that he was thinking to add were:
>
> + typedef enum PgCondition
> + {
> +     PGCOND_UNUSED            = 0,        /* unused */
> +
> +     /* 1 - CPU */
> +     PGCOND_CPU               = 1,        /* generic cpu operations */
> +     /* 11000 - CPU:PARSE */
> +     PGCOND_CPU_PARSE         = 11000,    /* pg_parse_query */
> +     PGCOND_CPU_PARSE_ANALYZE = 11100,    /* parse_analyze */
> +     /* 12000 - CPU:REWRITE */
> +     PGCOND_CPU_REWRITE       = 12000,    /* pg_rewrite_query */
> +     /* 13000 - CPU:PLAN */
> +     PGCOND_CPU_PLAN          = 13000,    /* pg_plan_query */
> +     /* 14000 - CPU:EXECUTE */
> +     PGCOND_CPU_EXECUTE       = 14000,    /* PortalRun or PortalRunMulti */
[ etc. ]

Sorry to say it, but I think this design is a mess.

Suppose we are executing a query, and during the query we execute the
ILIKE operator, and within that we try to acquire a buffer content
lock (say, to detoast some data).  So at the outermost level our state
is PGCOND_CPU_EXECUTE, and then within that we are in state
PGCOND_CPU_ILIKE, and then within that we are in state
PGCOND_LWLOCK_PAGE.  When we exit each of the inner states, we've got
to restore the proper outer state, or time will be mis-attributed.
Error handling has got to pop all of the items off the stack that were
added since the PG_TRY() block started, and then push on a new state
for error handling, which gets popped when the PG_TRY block finishes.
Another problem is that some of these things are incredibly specific
(like "running the ILIKE operator") and others are extremely general
(like "executing the query").  Why does ILIKE get a code but
+(int4,int4) does not?  We need some less-arbitrary way of assigning
codes than what's shown here.

Now, that's not to say there are no good ideas here. For example,
pg_stat_activity could expose a byte of state indicating which phase
of query processing is current: parse / parse analysis / rewrite /
plan / execute / none.  I think that'd be a fine thing to do, and I
support doing it, although maybe not in the same patch as my original
proposal.  On the flip side, I don't support trying to expose
information on the level of "which C function are we currently
executing?" because I think there's going to be absolutely no
reasonable way to make that sufficiently low-overhead, and also
because I don't see any way to make it less than nightmarishly onerous
from the point of view of code maintenance.  We could expose some
functions but not others, but that seems like a mess; I think unless
and until we have a better solution, the right answer to "I need to
know which C function is running in each backend" is "that's what perf
is for".

In any case, I think the main point is that Itagaki-san's proposal is
not really a proposal for wait events.  It is a proposal to expose
some state about "what is the backend doing right now?" which might be
waiting or something else.  I believe those things should be
separated into several separate pieces of state.  It's entirely
reasonable to want to know whether we are in the parse phase or plan
phase separately from knowing whether we are waiting on an lwlock or
not, because then you could (for example) judge what percentage of
your lock waits are coming from parsing vs. what fraction are coming
from planning, which somebody might care about.  Or you might care
about ONLY the phase of query processing and not about wait events at
all, and then you can ignore one column and look at the other.  With
the above proposal, those things all get munged together in a way that
I think is bound to be awkward.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread Jim Nasby

On 7/13/15 12:39 PM, dinesh kumar wrote:

> As of now, we don't have an SQL function to write custom/application
> messages to the log destination. We have the "RAISE" clause, which is
> controlled by the log_* parameters. If we had an SQL function which works
> irrespective of the log settings, that would be good for many log parsers.
> What I mean is, from a DBA point of view, if we route all our native OS
> stats to log files in a proper format, then our log-reporting tools can
> give the most effective reports. Also, applications can log their own
> messages to the postgres log files, which can be monitored by DBAs too.

What's the actual use case for this feature other than it would be
good to have it?


That's a good question, Michael.

When I was working as a DBA for a different RDBMS, developers used to
write serious app errors, followed by instructions, into some sort of
log and trace files, since DBAs didn't have permission to check the app
logs, which were owned by the Ops team.

In my old case, the application had serious OOM issues and was crashing
frequently after deployment. The behavior wasn't consistent on the app
side, so they used to send a copy of all app metrics to trace files, and
we monitored what was happening in the DB during the spikes on the app
servers.


Spewing a bunch of stuff into the postgres log doesn't seem like an 
improvement here.


I don't really see the point of what you're describing here. Just do 
something like RAISE WARNING which should normally be high enough to 
make it into the logs. Or use a pl language that will let you write your 
own logfile on the server (ie: plperlu).



I didn't mean that we need to have this feature just because we have it on
other RDBMSes. I don't see a reason why we shouldn't have this in our PG too.

I see a similar item in our development list
.


That's not at all what that item is talking about. It's talking about 
exposing ereport as a SQL function, without altering the rest of our 
logging behavior.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] [DESIGN] Incremental checksums

2015-07-13 Thread David Christensen
pgsql-hackers,

So I’ve put some time into a design for the incremental checksum feature and 
wanted to get some feedback from the group:

* Incremental Checksums

PostgreSQL users should have a way of upgrading their cluster to use data 
checksums without having to do a costly pg_dump/pg_restore; in particular, 
checksums should be able to be enabled/disabled at will, with the database 
enforcing the logic of whether the pages considered for a given database are 
valid.

Approaches considered for this include adding flags to pg_upgrade to set up 
the new cluster to use checksums where it did not before (or optionally 
turning them off).  This approach is a nice tool to have, but we also need to 
be able to support this process in a manner which keeps the database online 
while it goes through the initial checksum process.

In order to support the idea of incremental checksums, this design adds the 
following things:

** pg_control:

Keep "data_checksum_version", but have it indicate *only* the algorithm version 
for checksums. i.e., it's no longer used for the data_checksum enabled/disabled 
state.

Add "data_checksum_state", an enum with multiple states: "disabled", 
"enabling", "enforcing" (perhaps "revalidating" too; something to indicate that 
we are reprocessing a database that purports to have been completely 
checksummed already)

An explanation of the states, as well as the checksum behavior for each, follows.

- disabled => not in a checksum cycle; no read validation, no checksums 
written.  This is the current behavior for Postgres *without* checksums.

- enabling => in a checksum cycle; no read validation, write checksums.  Any 
page that gets written to disk will be a valid checksum.  This is required when 
transitioning a cluster which has never had checksums, as the page reads would 
normally fail since they are uninitialized.

- enforcing => not in a checksum cycle; read validation, write checksums.  This 
is the current behavior of Postgres *with* checksums.

 (caveat: I'm not certain the following state is needed (and the current 
version of this patch doesn't have it)):

- revalidating => in a checksum cycle; read validation, write checksums.  The 
difference between this and "enabling" is that we care if page reads fail, 
since by definition they should have been validly checksummed, as we should 
verify this.

Add "data_checksum_cycle", a counter that gets incremented with every checksum 
cycle change.  This is used as a flag to verify when new checksum actions take 
place, for instance if we wanted to upgrade/change the checksum algorithm, or 
if we just want to support periodic checksum validation.

This variable will be compared against new values in the system tables to keep 
track of which relations still need to be checksummed in the cluster.

** pg_database:

Add a field "datlastchecksum" which will be the last checksum cycle which has 
completed for all relations in that database.

** pg_class:

Add a field "rellastchecksum" which stores the last successful checksum cycle 
for each relation.

** The checksum bgworker:

Something needs to proactively checksum any relations which need to be 
validated, and this something is known as the checksum bgworker.  The checksum 
bgworker will operate similarly to the autovacuum daemons, and in fact in this 
initial pass we'll hook into the autovacuum launcher, due to similarities in 
catalog-reading functionality as well as to balance it against other 
maintenance activity.

If autovacuum does not need to do any vacuuming work, it will check whether the 
cluster has requested a checksum cycle by checking whether the state is 
"enabling" (or "revalidating").  If so, it will look for any database which 
needs a checksum update: it checks the current value of the data_checksum_cycle 
counter and looks for any databases with "datlastchecksum < data_checksum_cycle".

When all databases have "datlastchecksum" == data_checksum_cycle, we initiate 
checksumming of any global cluster heap files.  When the global cluster tables' 
heap files have been checksummed, we consider the checksum cycle complete, 
change pg_control's "data_checksum_state" to "enforcing", and consider things 
fully up-to-date.

If it finds a database needing work, it iterates through that database's 
relations looking for "rellastchecksum < data_checksum_cycle".  If it finds 
none (i.e., every record has rellastchecksum == data_checksum_cycle) then it 
marks the containing database as up-to-date by updating "datlastchecksum = 
data_checksum_cycle".

For any relation that it finds in the database which is not yet checksummed, it 
starts an actual worker to handle the checksum process for that table.  Since 
the state of the cluster is already either "enabling" or "revalidating", any 
block writes will get checksums added automatically, so the only thing the 
bgworker needs to do is load each block in the relation and explicitly mark it 
as dirty (unless that's not required for FlushBuf

Re: [HACKERS] A little RLS oversight?

2015-07-13 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Sun, Jul 12, 2015 at 5:59 PM, Yaroslav wrote:
> > I can still see all statistics for 'test' in pg_stats under unprivileged
> > user.
> 
> Indeed, this looks like an oversight of RLS. Even if a policy is
> defined to prevent a user from seeing the rows of other users, it is
> still possible to get some information through this view.
> I am adding an open item regarding that for 9.5.

We need to be careful to avoid the slippery slope of trying to prevent
all covert channels, which has been extensively discussed previously.  I
tend to agree with this specific case of, if you have RLS configured on
the table then we probably shouldn't allow normal users to see the stats
on the table, but I don't have a problem with the usage of those stats
for generating plans, which users could see the results of via EXPLAIN.
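(A hedged illustration of the leak being discussed; table, policy and role names are made up.)

    CREATE TABLE accounts (owner name, balance numeric);
    ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
    CREATE POLICY own_rows ON accounts USING (owner = current_user);
    GRANT SELECT ON accounts TO alice;
    ANALYZE accounts;

    -- connected as alice: the policy hides other users' rows ...
    SELECT * FROM accounts;
    -- ... yet pg_stats can still expose sampled values from those rows
    SELECT attname, most_common_vals
    FROM pg_stats
    WHERE tablename = 'accounts';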

> > I'd prefer statistics on RLS-enabled tables to be simply hidden completely
> > for unprivileged users.
> 
> This looks like something simple enough to do.
> @Stephen: perhaps you have some thoughts on the matter? Currently
> pg_stats breaks its promise to only show information about the rows the
> current user can read.

I agree that it should be reasonably simple to do and, provided that's
the case, I'm fine with doing it once I get back (currently out until
the 27th).

Thanks!

Stephen




Re: [HACKERS] pg_upgrade + Extensions

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 01:12 PM, Smitha Pamujula wrote:
Yes. I have checked that the extension didn't exist in any of the 
databases. I used \dx to see whether json_build was listed, and I didn't 
see it in any of them. Is that sufficient to check its existence? I am 
about to do another test in a few minutes on a different machine. I will 
capture before/after shots.





Please don't top-post on the PostgreSQL lists - see 



In theory it should be enough if it was installed in the standard way. 
But a more thorough procedure would be to run this command:


   select count(*)
   from pg_proc
   where prosrc ~ 'json_build';

Here's a one-liner that will check every database for you:

   for db in `psql  -t -c 'select datname from pg_database where
   datallowconn'` ; do C=`psql -t -c "select count(*) from pg_proc
   where prosrc ~ 'json_build'" $db`; echo $db $C; done
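(As a hedged follow-up: once a database reports a non-zero count, the offending objects have to be removed there before re-running pg_upgrade, along these lines, adjusted to how json_build was actually installed.)

   DROP EXTENSION IF EXISTS json_build CASCADE;
   -- or, if the functions were created outside an extension, review and
   -- drop them individually:
   --   SELECT proname FROM pg_proc WHERE prosrc ~ 'json_build';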

cheers

andrew




Re: [HACKERS] Default Roles (was: Additional role attributes)

2015-07-13 Thread Stephen Frost
Fujii,

* Fujii Masao (masao.fu...@gmail.com) wrote:
> The documentation of the functions to which corresponding default roles
> are added by this patch needs to be updated. For example, the description
> of pg_xlog_replay_pause() says "Pauses recovery immediately (restricted
> to superusers).". I think that the description needs to mention
> the corresponding default role "pg_replay". Otherwise, it's difficult for
> users to see which default role is related to the function they want to use.
> Or probably we can add a table explaining all the relationships between
> default roles and the corresponding operations. That would be useful.

Certainly, totally agree that we need to make it clear in the function
descriptions also.

> Why do we allow users to change the attributes of default roles?
> For example, ALTER ROLE default_role or GRANT ... TO default_role.
> Those changes are not dumped by pg_dumpall. So if users change
> the attributes for some reason but they disappear via pg_dumpall,
> maybe the system goes into an unexpected state.

Good point.  I'm fine with simply disallowing that completely; does
anyone want to argue that we should allow superusers to ALTER or GRANT
to these roles?  I have a hard time seeing the need for that and it
could make things quite ugly.

> I think that it's better to allow the roles with pg_monitor to
> execute pgstattuple functions. They are usually used for monitoring.
> Thought?

Possibly, but I'd need to look at them more closely than I have time to
right now.  Can you provide a use-case?  That would certainly help.
Also, we are mostly focused on things which are currently superuser-only
capabilities, if you don't need to be superuser today then the
monitoring system could be granted access using the normal mechanisms.
Actual monitoring systems won't log in directly as "pg_monitor" anyway;
they'll log in as "nagios" or similar, which has been GRANT'd
"pg_monitor" and could certainly be GRANT'd other rights also.

Thanks!

Stephen




Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread dinesh kumar
On Mon, Jul 13, 2015 at 1:11 AM, Michael Paquier 
wrote:

> On Mon, Jul 13, 2015 at 4:54 PM, dinesh kumar 
> wrote:
> > Would like to discuss the feature below here.
> >
> > Feature:
> > Having an SQL function to write messages to the log destination.
> >
> > Justification:
> > As of now, we don't have an SQL function to write custom/application
> > messages to the log destination. We have the "RAISE" clause, which is
> > controlled by log_ parameters. If we had an SQL function which works
> > irrespective of log settings, that would be good for many log parsers.
> > What I mean is, from a DBA point of view, if we route all our native OS
> > stats to log files in a proper format, then our log reporting tools can
> > give the most effective reports. Also, applications can log their own
> > messages to the postgres log files, which can be monitored by DBAs too.
>
> What's the actual use case for this feature other than it would be
> good to have it?


That's a good question, Michael.

When I was working as a DBA on a different RDBMS, developers used to write
serious application errors, followed by instructions, into a set of log and
trace files, since DBAs didn't have permission to check the application
logs, which were owned by the Ops team.

In my old case, the application had serious OOM issues and was crashing
frequently after deployment. The behavior wasn't consistent on the app
side, so they sent a copy of all application metrics to trace files, and we
monitored what was happening in the DB during the spikes on the app servers.

I didn't mean that we need this feature just because other RDBMSes have it;
I simply don't see a reason why we shouldn't have it in PG too.

I see a similar item in our development list.

Let me know if I missed anything here.
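
For what it's worth, a rough sketch of how such a call might look from SQL
(the function name and signature are placeholders for whatever the patch
ends up with, not a committed API):

   -- hypothetical: write an application message straight to the server log,
   -- independently of client_min_messages / log_min_messages
   SELECT pg_report_log('WARNING', 'app server heap usage at 92%, spike started');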

Best Regards,
Dinesh
manojadinesh.blogspot.com

> A log message is here to give information about the
> state of something that happens in the backend, but in the case of this
> function the event that happens is the content of the function itself.
> It also adds a new log level for something that has a unique usage,
> which looks like overkill to me. Btw, you could do something more
> advanced with simply an extension if you really want to play with this
> area... But I am dubious about what kind of applications would use
> that.
> --
> Michael
>


Re: [HACKERS] pg_upgrade + Extensions

2015-07-13 Thread Smitha Pamujula
Yes. I have checked that the extension didn't exist in any of the
databases. I used \dx to see whether json_build was listed, and I didn't
see it anywhere. Is that sufficient to check its existence? I am about to do
another test in a few minutes on a different machine. I will capture
before/after shots

Thanks

On Fri, Jul 10, 2015 at 4:26 PM, Andrew Dunstan  wrote:

>
>
> On Fri, Jul 10, 2015 at 5:05 PM, David E. Wheeler 
> wrote:
>
>> On Jul 10, 2015, at 11:32 AM, Smitha Pamujula <
>> smitha.pamuj...@iovation.com> wrote:
>>
>>
>> > Your installation references loadable libraries that are missing from
>> the
>> > new installation.  You can add these libraries to the new installation,
>> > or remove the functions using them from the old installation.  A list of
>> > problem libraries is in the file:
>> > loadable_libraries.txt
>> >
>> > Failure, exiting
>> > [postgres@pdxdvrptsrd04 ~]$ cat loadable_libraries.txt
>> > Could not load library "json_build"
>> > ERROR:  could not access file "json_build": No such file or directory
>>
>> So you drop the json_build extension before upgrading, but pg_upgrade
>> still complains that it’s missing? That seems odd.
>>
>>
>>
>
> Are you sure the extension was uninstalled from every database in the
> cluster? This seems likely to occur when you forgot to uninstall it from
> some database (e.g. template1)
>
> cheers
>
> andrew
>
>


-- 
Smitha Pamujula
Database Administrator // The Watch Woman

Direct: 503.943.6764
Mobile: 503.290.6214 // Twitter: iovation
www.iovation.com


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Simon Riggs
On 11 July 2015 at 21:28, Tom Lane  wrote:


> What are we going to do about this?
>

I will address the points you raise, one by one.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] intarray planning/execution problem with multiple operators

2015-07-13 Thread Tom Lane
Jeff Janes  writes:
> I've found an interesting performance problem in the intarray extension
> module.  It doesn't seem to be version dependent, I've verified it in 9.4.4
> and 9.6devel.

Seems like this isn't specifically an issue for intarray, but rather one
with the core GIN code not being smart about the combination of selective
and unselective index conditions.  In particular, it seems like the smart
thing for GIN to do with this example is just ignore the @@ condition
altogether so far as the index search goes, and mark all the results as
needing recheck so that the @@ operator gets applied as a filter.
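
For context, the problematic query shape looks roughly like this (table,
index and values are my own illustration, not Jeff's original test case):

   CREATE EXTENSION intarray;
   CREATE TABLE docs (tags int[]);
   CREATE INDEX docs_tags_idx ON docs USING gin (tags gin__int_ops);

   -- a highly selective && qual combined with an unselective @@ qual:
   -- ideally GIN would search the index for the && condition only and
   -- recheck @@ against the heap tuples
   SELECT * FROM docs
   WHERE tags && '{12345}'::int[]
     AND tags @@ '!0'::query_int;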

You could also complain about the core planner not considering leaving
some potentially-indexable quals out of the actual index condition,
but TBH I don't see that that would be a useful use of planner cycles.
In almost every case it would mean that if there are K quals potentially
usable with a given index, we'd cost out 2^K-1 index paths and immediately
reject all but the use-em-all alternative.  That's a lot of cycles to
spend to handle a corner case.  It's always been assumed to be the index
AM's problem to optimize use of the index quals it's handed, and I don't
see why that shouldn't be true here.

> The proposed selectivity estimate improvement patch for @@ does not fix
> this (and evaluating that patch was how I found this issue.)

Nah, it wouldn't: the cost estimates are correct, in the sense that
gincostestimate realizes that GIN will be horribly stupid about this.

regards, tom lane




Re: [HACKERS] [BUGS] BUG #13126: table constraint loses its comment

2015-07-13 Thread Heikki Linnakangas

On 07/08/2015 08:12 AM, Michael Paquier wrote:

On Tue, Jul 7, 2015 at 6:24 PM, Petr Jelinek  wrote:

On 2015-07-04 13:45, Michael Paquier wrote:


On Fri, Jul 3, 2015 at 11:59 PM, Petr Jelinek wrote:


Well, for indexes you don't really need to add the new AT command, as
IndexStmt has char *idxcomment, which it will automatically use as the
comment if not NULL. While I am not a huge fan of idxcomment, it doesn't
seem easy to remove in the future, and it's what transformTableLikeClause
uses, so it might be good to be consistent with that.



Oh, right, I completely missed your point and this field in IndexStmt.
Yes, it is better to be consistent in this case and to use it. It also
makes the code easier to follow.
Btw, regarding the new AT routines, I am getting fond of them; they make
it easier to follow which command is added where in the command
queues.

Updated patch attached.



Cool, I am happy with the patch now. Marking as ready for committer.


Thanks for the review.


I don't much like splitting the code across multiple helper functions; 
it makes the overall logic more difficult to follow, although I agree 
that the indentation has made the code pretty hard to read as it is. I'm 
planning to commit the attached two patches. The first one is just 
reformatting changes to ATExecAlterColumnType(), turning the switch-case 
statements into if-else blocks. If-else doesn't cause so much 
indentation, and allows defining variables local to the "cases" more 
naturally, so it alleviates the indentation problem somewhat. The second 
patch is the actual bug fix.


There was one bug in this patch: the COMMENT statement that was 
constructed didn't schema-qualify the relation, so if the ALTERed table 
was not in search_path, the operation would fail with a "relation not 
found" error (or add the comment to the wrong object). Fixed that.


I plan to commit the attached patches later today or tomorrow. But how 
do you feel about back-patching this? It should be possible to 
backpatch, although at a quick test it seems that there have been small 
changes to the affected code in many versions, so it would require some 
work. Also, in back-branches we'd need to put the new AT_ReAddComment 
type to the end of the list, like we've done when we added 
AT_ReAddConstraint in 9.3. I'm inclined to backpatch this to 9.5 only, 
even though this is clearly a bug fix, on the grounds that it's not a 
very serious bug and there's always some risk in backpatching.


- Heikki

>From c4865eb873a9cafb7e247cc69b7030243b74f3be Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Mon, 13 Jul 2015 19:22:31 +0300
Subject: [PATCH 1/2] Reformat code in ATPostAlterTypeParse.

The code in ATPostAlterTypeParse was very deeply indented, mostly because
there were two nested switch-case statements, which add a lot of
indentation. Use if-else blocks instead, to make the code less indented
and more readable.

This is in preparation for the next patch, which makes some actual changes to
the function. These cosmetic parts have been separated to make it easier
to see the real changes in the other patch.
---
 src/backend/commands/tablecmds.c | 104 +++
 1 file changed, 51 insertions(+), 53 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d394713..e7b23f1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8645,69 +8645,67 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 		Node	   *stm = (Node *) lfirst(list_item);
 		AlteredTableInfo *tab;
 
-		switch (nodeTag(stm))
+		tab = ATGetQueueEntry(wqueue, rel);
+
+		if (IsA(stm, IndexStmt))
+		{
+			IndexStmt  *stmt = (IndexStmt *) stm;
+			AlterTableCmd *newcmd;
+
+			if (!rewrite)
+TryReuseIndex(oldId, stmt);
+
+			newcmd = makeNode(AlterTableCmd);
+			newcmd->subtype = AT_ReAddIndex;
+			newcmd->def = (Node *) stmt;
+			tab->subcmds[AT_PASS_OLD_INDEX] =
+lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd);
+		}
+		else if (IsA(stm, AlterTableStmt))
 		{
-			case T_IndexStmt:
+			AlterTableStmt *stmt = (AlterTableStmt *) stm;
+			ListCell   *lcmd;
+
+			foreach(lcmd, stmt->cmds)
+			{
+AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd);
+
+if (cmd->subtype == AT_AddIndex)
 {
-	IndexStmt  *stmt = (IndexStmt *) stm;
-	AlterTableCmd *newcmd;
+	Assert(IsA(cmd->def, IndexStmt));
 
 	if (!rewrite)
-		TryReuseIndex(oldId, stmt);
+		TryReuseIndex(get_constraint_index(oldId),
+	  (IndexStmt *) cmd->def);
 
-	tab = ATGetQueueEntry(wqueue, rel);
-	newcmd = makeNode(AlterTableCmd);
-	newcmd->subtype = AT_ReAddIndex;
-	newcmd->def = (Node *) stmt;
+	cmd->subtype = AT_ReAddIndex;
 	tab->subcmds[AT_PASS_OLD_INDEX] =
-		lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd);
-	break;
+		lappend(tab->subcmds[AT_PASS_OLD_INDEX], cmd);
 }
-			case T_AlterTableStmt:
+else if (cmd->subtype == AT_Add

Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Tom Lane
I wrote:
> TBH, I think the right thing to do at this point is to revert the entire
> patch and send it back for ground-up rework.  I think the high-level
> design is wrong in many ways and I have about zero confidence in most
> of the code details as well.

> I'll send a separate message about high-level issues,

And here's that.  I do *not* claim that this is a complete list of design
issues with the patch, it's just things I happened to notice in the amount
of time I've spent so far (which is already way more than I wanted to
spend on TABLESAMPLE right now).


I'm not sure that we need an extensible sampling-method capability at all,
let alone that an API for that needs to be the primary focus of a patch.
Certainly the offered examples of extension modules aren't inspiring:
tsm_system_time is broken by design (more about that below) and nobody
would miss tsm_system_rows if it weren't there.  What is far more
important is to get sampling capability into indexscans, and designing
an API ahead of time seems like mostly a distraction from that.

I'd think seriously about tossing the entire executor-stage part of the
patch, creating a stateless (history-independent) sampling filter that
just accepts or rejects TIDs, and sticking calls to that into all the
table scan node types.  Once you had something like that working well
it might be worth considering whether to try to expose an API to
generalize it.  But even then it's not clear that we really need any
more options than true-Bernoulli and block-level sampling.

The IBM paper I linked to in the other thread mentions that their
optimizer will sometimes choose to do Bernoulli sampling even if SYSTEM
was requested.  Probably that happens when it decides to do a simple
indexscan, because then there's no advantage to trying to cluster the
sampled rows.  But in the case of a bitmap scan, you could very easily do
either true Bernoulli or block-level sampling simply by adjusting the
rules about which bits you keep or clear in the bitmap (given that you
apply the filter between collection of the index bitmap and accessing the
heap, which seems natural).  The only case where a special scan type
really seems to be needed is if you want block-level sampling, the query
would otherwise use a seqscan, *and* the sampling percentage is pretty low
--- if you'd be reading every second or third block anyway, you're likely
better off with a plain seqscan so that the kernel sees sequential access
and does prefetching.  The current API doesn't seem to make it possible to
switch between seqscan and read-only-selected-blocks based on the sampling
percentage, but I think that could be an interesting optimization.
(Another bet that's been missed is having the samplescan logic request
prefetching when it is doing selective block reads.  The current API can't
support that AFAICS, since there's no expectation that nextblock calls
could be done asynchronously from nexttuple calls.)

Another issue with the API as designed is the examinetuple() callback.
Allowing sample methods to see invisible tuples is quite possibly the
worst idea in the whole patch.  They can *not* do anything with such
tuples, or they'll totally break reproducibility: if the tuple is
invisible to your scan, it might well be or soon become invisible to
everybody, whereupon it would be subject to garbage collection at the
drop of a hat.  So if an invisible tuple affects the sample method's
behavior at all, repeated scans in the same query would not produce
identical results, which as mentioned before is required both by spec
and for minimally sane query behavior.  Moreover, examining the contents
of the tuple is unsafe (it might contain pointers to TOAST values that
no longer exist); and even if it were safe, what's the point?  Sampling
that pays attention to the data is the very definition of biased.  So
if we do re-introduce an API like the current one, I'd definitely lose
this bit and only allow sample methods to consider visible tuples.

On the point of reproducibility: the tsm_system_time method is utterly
incapable of producing identical results across repeated scans, because
there is no reason to believe it would get exactly as far in the same
amount of time each time.  This might be all right across queries if
the method could refuse to work with REPEATABLE clauses (but there's
no provision for such a restriction in the current API).  But it's not
acceptable within a query.  Another problem with tsm_system_time is that
its cost/rowcount estimation is based on nothing more than wishful
thinking, and can never be based on anything more than wishful thinking,
because planner cost units are not time-based.  Adding a comment that
says we'll nonetheless pretend they are milliseconds isn't a solution.
So that sampling method really has to go away and never come back,
whatever we might ultimately salvage from this patch otherwise.
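
To make the reproducibility expectation concrete: with a sane sampling
method, repeated scans with the same seed have to yield the same sample
(table name made up):

   -- both queries must return exactly the same set of rows;
   -- a time-based sampling method cannot guarantee that
   SELECT ctid FROM big_table TABLESAMPLE SYSTEM (1) REPEATABLE (42);
   SELECT ctid FROM big_table TABLESAMPLE SYSTEM (1) REPEATABLE (42);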

(I'm not exactly convinced that the system or tsm_system_rows methods
are adequately rep

Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Marco Atzeri

On 7/13/2015 5:36 PM, Andrew Dunstan wrote:


On 07/12/2015 05:06 PM, Tom Lane wrote:

Andrew Dunstan  writes:

On 07/04/2015 11:02 AM, Tom Lane wrote:

It's not apparent to me how that works at all.

BTW, the .a files being linked to above are not like Unix .a static
archives - they are import library files, which I think they are only
used at link time, not run time. The path to the DLLs isn't being
hardcoded.

Ah, I see.  So then what Marco is suggesting is copying a coding pattern
that does work.


Unless there is further argument I propose to commit something very like
Marco's suggestion for hstore_plperl, hstore_plpython and ltree_plpython

No objection here.




OK, I tried the attached patch.

but when trying to build with python 3 I get this (no such problems with
python2, which builds and tests fine):

make -C hstore_plpython all
make[1]: Entering directory
'/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython'
ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -O2 -I../../src/pl/plpython
-I/usr/include/python3.4m -I../../contrib/hstore -I. -I.
-I../../src/include -I/usr/include/libxml2   -c -o hstore_plpython.o
hstore_plpython.c
ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -O2   -shared -o
hstore_plpython3.dll  hstore_plpython.o  -L../../src/port
-L../../src/common -Wl,--allow-multiple-definition
-Wl,--enable-auto-import -L/usr/lib  -L/usr/local/lib
-Wl,--as-needed   -L../../src/backend -lpostgres -L../hstore
-lhstore -L../../src/pl/plpython -lplpython3
-L/usr/lib/python3.4/config-3.4m -lpython3.4m -lpgcommon -lpgport
-lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt
hstore_plpython.o: In function `hstore_to_plpython':

/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:

undefined reference to `PLyUnicode_FromStringAndSize'


[cut]

I'd like to get that fixed before applying anything. Marco, any ideas?

To build with python 3, set the environment like this:
PYTHON=/usr/bin/python3 - this can be done in the config file.


I am only building with python2, but on cygwin
there is an additional "intl" library for the python3 binding:

$ python2-config --libs
-lpython2.7 -ldl

$ python3-config --libs
-lpython3.4m -lintl -ldl


cheers

andrew


Regards
Marco





Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Andrew Dunstan


On 07/12/2015 05:06 PM, Tom Lane wrote:

Andrew Dunstan  writes:

On 07/04/2015 11:02 AM, Tom Lane wrote:

It's not apparent to me how that works at all.

BTW, the .a files being linked to above are not like Unix .a static
archives - they are import library files, which I think they are only
used at link time, not run time. The path to the DLLs isn't being hardcoded.

Ah, I see.  So then what Marco is suggesting is copying a coding pattern
that does work.


Unless there is further argument I propose to commit something very like
Marco's suggestion for hstore_plperl, hstore_plpython and ltree_plpython

No objection here.




OK, I tried the attached patch.

but when trying to build with python 3 I get this (no such problems with 
python2, which builds and tests fine):


   make -C hstore_plpython all
   make[1]: Entering directory
   '/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython'
   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2 -I../../src/pl/plpython
   -I/usr/include/python3.4m -I../../contrib/hstore -I. -I.
   -I../../src/include -I/usr/include/libxml2   -c -o hstore_plpython.o
   hstore_plpython.c
   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2   -shared -o
   hstore_plpython3.dll  hstore_plpython.o  -L../../src/port
   -L../../src/common -Wl,--allow-multiple-definition
   -Wl,--enable-auto-import -L/usr/lib  -L/usr/local/lib
   -Wl,--as-needed   -L../../src/backend -lpostgres -L../hstore
   -lhstore -L../../src/pl/plpython -lplpython3
   -L/usr/lib/python3.4/config-3.4m -lpython3.4m -lpgcommon -lpgport
   -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt
   hstore_plpython.o: In function `hstore_to_plpython':
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:
   undefined reference to `PLyUnicode_FromStringAndSize'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:(.text+0x99):
   relocation truncated to fit: R_X86_64_PC32 against undefined symbol
   `PLyUnicode_FromStringAndSize'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:28:
   undefined reference to `PLyUnicode_FromStringAndSize'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:28:(.text+0xf1):
   relocation truncated to fit: R_X86_64_PC32 against undefined symbol
   `PLyUnicode_FromStringAndSize'
   hstore_plpython.o: In function `plpython_to_hstore':
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:96:
   undefined reference to `PLyObject_AsString'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:96:(.text+0x2cc):
   relocation truncated to fit: R_X86_64_PC32 against undefined symbol
   `PLyObject_AsString'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:84:
   undefined reference to `PLyObject_AsString'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:84:(.text+0x321):
   relocation truncated to fit: R_X86_64_PC32 against undefined symbol
   `PLyObject_AsString'
   collect2: error: ld returned 1 exit status
   ../../src/Makefile.shlib:358: recipe for target
   'hstore_plpython3.dll' failed
   make[1]: *** [hstore_plpython3.dll] Error 1
   make[1]: Leaving directory
   '/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython'
   Makefile:92: recipe for target 'all-hstore_plpython-recurse' failed
   make: *** [all-hstore_plpython-recurse] Error 2

I'd like to get that fixed before applying anything. Marco, any ideas?

To build with python 3, set the environment like this: 
PYTHON=/usr/bin/python3 - this can be done in the config file.



cheers

andrew

diff --git a/contrib/hstore_plperl/Makefile b/contrib/hstore_plperl/Makefile
index 19a8ab4..603ef52 100644
--- a/contrib/hstore_plperl/Makefile
+++ b/contrib/hstore_plperl/Makefile
@@ -30,6 +30,10 @@ override CPPFLAGS += -DPLPERL_HAVE_UID_GID -Wno-comment
 SHLIB_LINK += ../hstore/libhstore.a $(wildcard ../../src/pl/plperl/libperl*.a)
 endif
 
+ifeq ($(PORTNAME), cygwin)
+SHLIB_LINK += -L../hstore -l hstore $(perl_embed_ldflags)
+endif
+
 # As with plperl we need to make sure that the CORE directory is included
 # last, probably because it sometimes contains some header files with names
 # that clash with some of ours, or with some that we include, notably on
diff --git a/contrib/hstore_plpython/Makefile b/contrib/hstore_plpython/Makefile
index 6ee434b..dfff0fd 100644
--- a/contrib/hstore_plpython/Makefile
+++ b/contrib/hstore_plpython/Makefile
@@ -28,6 +28,12 @@ ifeq ($(PORTNAME), win32)
 SHLIB_LINK += ../hstore/libhstore.a $(wildcard ../../src/pl/plpython/libpython*

Re: [HACKERS] dead assignment src/bin/scripts/print.c line 421

2015-07-13 Thread Heikki Linnakangas

On 07/13/2015 04:56 PM, Laurent Laborde wrote:

Friendly greetings !

In file src/bin/scripts/print.c, line 421:
need_recordsep = false;
is then set to true on line 424.

Now I'm pretty sure it's a meaningless "bug" without any consequence (the
commit that introduced it is 15 years old).

There are a lot of (apparently) dead assignments here and there, but some
assignments could be used for debugging purposes, so ... why not. But this one?


The code in question looks like this:


for (f = footers; f; f = f->next)
{
if (need_recordsep)
{
print_separator(cont->opt->recordSep, fout);
need_recordsep = false;
}
fputs(f->data, fout);
need_recordsep = true;
}


Hmm. It does kind of make sense. Right after printing the separator, you 
don't need to print a separator because you just printed one.  But as 
soon as you print the field, you need a separator again. It would be 
quite understandable without that dead assignment too, and that's 
probably how I would've written it in the first place. But since that's 
how it is and has been for 15 years, I'm inclined to just leave it so.


- Heikki




Re: [HACKERS] dead assignment src/bin/scripts/print.c line 421

2015-07-13 Thread Laurent Laborde
Should have been sent to the bugs ML sorry :-/

On Mon, Jul 13, 2015 at 3:56 PM, Laurent Laborde 
wrote:

> Friendly greetings !
>
> In file src/bin/scripts/print.c, line 421:
> need_recordsep = false;
> is then set to true on line 424.
>
> Now I'm pretty sure it's a meaningless "bug" without any consequence (the
> commit that introduced it is 15 years old).
>
> There are a lot of (apparently) dead assignments here and there, but some
> assignments could be used for debugging purposes, so ... why not. But this one?
>
> --
> Laurent "ker2x" Laborde
> DBA gandi.net \o/
>



-- 
Laurent "ker2x" Laborde


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Andres Freund
On 2015-07-13 23:48:02 +0900, Sawada Masahiko wrote:
> But please imagine the case where the old cluster has a table which is very
> large and read-only, and vacuum freeze has been done on it.
> In this case, the all-frozen bits for such a table will not be set in the
> new cluster unless we run vacuum freeze again.
> The all-frozen information for such a table is lost.

So what? That's the situation today… Yes, it'll trigger an
anti-wraparound vacuum at some later point, after which the map bits
will be set.




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Simon Riggs
On 13 July 2015 at 15:48, Sawada Masahiko  wrote:

> On Mon, Jul 13, 2015 at 9:22 PM, Andres Freund  wrote:
> > On 2015-07-13 21:03:07 +0900, Sawada Masahiko wrote:
> >> Even if we implement a rewriting tool for the vm in pg_upgrade, it will
> >> take as much time as a revacuum, because it needs to scan the whole table.
> >
> > Why would it? Sure, you can only set allvisible and not the frozen bit,
> > but that's fine. That way the cost for freezing can be paid over time.
> >
> > If we require terabytes of data to be scanned, including possibly
> > rewriting large portions due to freezing, before index only scans work
> > and most vacuums act in a partial manner the migration to 9.6 will be a
> > major pain for our users.
>
> Ah, if we set all the bits as not all-frozen, we don't need to scan the
> whole table, only the vm.
> And I agree with this.
>
> But please imagine the case where the old cluster has a table which is very
> large and read-only, and vacuum freeze has been done on it.
> In this case, the all-frozen bits for such a table will not be set in the
> new cluster unless we run vacuum freeze again.
> The all-frozen information for such a table is lost.
>

The contents of the VM fork are essential to retain after an upgrade because
they are used for Index Only Scans. If we destroy that information it could
send SQL response times to unacceptable levels after upgrade.

It takes time to scan the VM and create the new VFM, but the time taken is
proportional to the size of the VM, which seems like it will be acceptable.

Example calcs:
An 8TB PostgreSQL installation would need us to scan 128MB of VM into about
256MB of VFM. Probably the fsyncs will occupy the most time.
In comparison, we would need to scan all 8TB to rebuild the VMs, which will
take much longer (and fsyncs will still be needed).
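
For anyone who wants to check that arithmetic, a quick way to reproduce it
in SQL (assuming the default 8 kB block size, 1 bit per heap block in the VM
and 2 bits per block in the proposed VFM):

   SELECT pg_size_pretty(nblocks / 8) AS vm_size,   -- 1 bit per heap block
          pg_size_pretty(nblocks / 4) AS vfm_size   -- 2 bits per heap block
   FROM (SELECT 8 * 1024::bigint * 1024 * 1024 * 1024 / 8192 AS nblocks) AS s;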

Since we don't record freeze map information now it is acceptable to begin
after upgrade with all freeze info set to zero.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 10:46 AM, Ryan Pedela wrote:
On Mon, Jul 13, 2015 at 1:30 AM, Shulgin, Oleksandr 
<oleksandr.shul...@zalando.de> wrote:



To reiterate: for my problem, that is escaping numerics that can
potentially overflow[1] under ECMAScript standard, I want to be
able to override the code that outputs the numeric converted to
string.  There is no way in current implementation to do that *at
all*, short of copying all the code involved in producing JSON
output and changing it at certain points.  One could try
re-parsing JSON instead, but that doesn't actually solve the
issue, because type information is lost forever at that point.


I had the exact same problem with Node.js and client-side Javascript. 
That is why I wrote json-bignum [1] for Node.js. There is a bower 
version [2] as well. The only caveat is that it is slower than the 
native JSON functions, but I am happy to receive PRs to improve 
performance.


1. https://github.com/datalanche/json-bignum
2. https://libraries.io/bower/json-bignum

As far as large numbers in JSON go, I think Postgres is doing the right 
thing and should not be changed. It is Javascript that is stupid here, 
and I don't think it is wise to add something to core just because one 
client does stupid things with large numbers. In addition, ES7 is 
introducing "value types", which will hopefully solve the large-number 
problem in Javascript.


The random whitespace issue is valid in my opinion and should be fixed.






OK, I think we're getting a consensus here. It's good to know the JS 
world is acquiring some sanity in this area.


Let's just fix the whitespace and be done, without all the callback 
stuff. That should be a much smaller patch.


cheers

andrew





Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 05:41 AM, Shulgin, Oleksandr wrote:
On Mon, Jul 13, 2015 at 9:44 AM, Pavel Stehule 
<pavel.steh...@gmail.com> wrote:



To reiterate: for my problem, that is escaping numerics that
can potentially overflow[1] under ECMAScript standard, I want
to be able to override the code that outputs the numeric
converted to string.  There is no way in current
implementation to do that *at all*, short of copying all the
code involved in producing JSON output and changing it at
certain points.  One could try re-parsing JSON instead, but
that doesn't actually solve the issue, because type
information is lost forever at that point.

The whitespace unification was a mere side-effect of the
original effort on this patch.


The dynamic type change is something I would not do in the
database, really :)

If you are afraid of overflow, then convert the numeric to a string
immediately - in this case, the client has to support both
variants - so an immediate cast should not be a problem.


Yeah, but how would you do that in the context of a logical replication 
decoding plugin?  I've tried a number of tricks for that, including, 
but not limited to, registering phony types to wrap the numeric type and 
replacing the OID of numeric with this custom type OID in the TupleDesc, 
but then again one has to register that as a known record type, etc.


Anyway, this check on the maximum number should be implemented in our
JSON(b) output functions (as a warning?).


Not really, since this is a problem of the ECMAScript standard, not the JSON 
spec.  For example, Python's module for handling JSON doesn't suffer 
from this overflow problem.


The thing is, we cannot know which clients are going to consume the 
stream of decoded events, and if it's some implementation of 
JavaScript, it can suffer silent data corruption if we don't guard 
against that in the logical decoding plugin.


Hope that makes it clear. :-)




Yes, but I think the plugin is the right place to do it. What is more, 
this won't actually prevent you completely from producing non-ECMAScript 
compliant JSON, since json or jsonb values containing offending numerics 
won't be caught, AIUI. But a fairly simple to write function that 
reparsed and fixed the JSON inside the decoder would work.


cheers

andrew








Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Mon, Jul 13, 2015 at 9:22 PM, Andres Freund  wrote:
> On 2015-07-13 21:03:07 +0900, Sawada Masahiko wrote:
>> Even if we implement a rewriting tool for the vm in pg_upgrade, it will
>> take as much time as a revacuum, because it needs to scan the whole table.
>
> Why would it? Sure, you can only set allvisible and not the frozen bit,
> but that's fine. That way the cost for freezing can be paid over time.
>
> If we require terabytes of data to be scanned, including possibly
> rewriting large portions due to freezing, before index only scans work
> and most vacuums act in a partial manner the migration to 9.6 will be a
> major pain for our users.

Ah, if we set all the bits as not all-frozen, we don't need to scan the
whole table, only the vm.
And I agree with this.

But please imagine the case where the old cluster has a table which is very
large and read-only, and vacuum freeze has been done on it.
In this case, the all-frozen bits for such a table will not be set in the
new cluster unless we run vacuum freeze again.
The all-frozen information for such a table is lost.

Regards,

--
Masahiko Sawada




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Ryan Pedela
On Mon, Jul 13, 2015 at 1:30 AM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:
>
>
> To reiterate: for my problem, that is escaping numerics that can
> potentially overflow[1] under ECMAScript standard, I want to be able to
> override the code that outputs the numeric converted to string.  There is
> no way in current implementation to do that *at all*, short of copying all
> the code involved in producing JSON output and changing it at certain
> points.  One could try re-parsing JSON instead, but that doesn't actually
> solve the issue, because type information is lost forever at that point.
>

I had the exact same problem with Node.js and client-side Javascript. That
is why I wrote json-bignum [1] for Node.js. There is a bower version [2] as
well. The only caveat is that it is slower than the native JSON functions,
but I am happy to receive PRs to improve performance.

1. https://github.com/datalanche/json-bignum
2. https://libraries.io/bower/json-bignum

As far as large numbers in JSON go, I think Postgres is doing the right thing
and should not be changed. It is Javascript that is stupid here, and I
don't think it is wise to add something to core just because one client
does stupid things with large numbers. In addition, ES7 is introducing
"value types", which will hopefully solve the large-number problem in
Javascript.
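
For anyone unfamiliar with the issue: an ECMAScript Number is an IEEE-754
double, so integers above 2^53 silently lose precision when the emitted JSON
is run through JSON.parse(). A quick SQL illustration (the literal is just an
example value):

   -- numeric keeps every digit; a JavaScript client parsing this value as a
   -- Number would silently round it to 9007199254740992
   SELECT '9007199254740993'::numeric AS exact_in_postgres,
          to_json('9007199254740993'::numeric) AS as_emitted_in_json;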

The random whitespace issue is valid in my opinion and should be fixed.

Thanks,
Ryan Pedela


Re: [HACKERS] Default Roles (was: Additional role attributes)

2015-07-13 Thread Fujii Masao
On Wed, May 13, 2015 at 12:07 PM, Stephen Frost  wrote:
> All,
>
> This patch gets smaller and smaller.
>
> Upon reflection I realized that, with default roles, it's entirely
> unnecssary to change how the permission checks happen today- we can
> simply add checks to them to be looking at role membership also.  That's
> removed the last of my concerns regarding any API breakage for existing
> use-cases and has greatly simplified things overall.
>
> This does change the XLOG functions to require pg_monitor, as discussed
> on the other thread where it was pointed out by Heikki that the XLOG
> location information could be used to extract sensitive information
> based on what happens during compression.  Adding docs explaining that
> is on my to-do list for tomorrow.
>
> * Stephen Frost (sfr...@snowman.net) wrote:
>> Andres suggested that we drop the REPLICATION role attribute and just
>> use membership in pg_replication instead.  That's turned out quite
>> fantastically as we can now handle upgrades without breaking anything-
>> CREATE ROLE and ALTER ROLE still accept the attribute but simply grant
>> pg_replication to the role instead, and postinit.c has been changed to
>> check role membership similar to other pg_hba role membership checks
>> when a replication connection comes in.  Hat's off to Andres for his
>> suggestion.
>
> It's also unnecessary to change how the REPLICATION role attribute
> functions today.  This patch does add the pg_replication role, but it's
> only allowed to execute the various pg_logical and friends functions and
> not to actually connect as a REPLICATION user.  Connecting as a
> REPLICATION user allows you to stream the entire contents of the
> backend, after all, so it makes sense to have that be independent.
>
> I added another default role which allows the user to view
> pg_show_file_settings, as that seemed useful to me.  The diffstat for
> that being something like 4 additions without docs and maybe 10 with.
> More documentation would probably be good though and I'll look at adding
> to it.
>
> Most of the rest of what I've done has simply been reverting back to
> what we had.  The patch is certainly far easier to verify by reading
> through it now, as the changes are right next to each other, and the
> regression output changes are much smaller.
>
> Thoughts?  Comments?  Suggestions?

The documentation of the functions for which this patch adds corresponding
default roles needs to be updated. For example, the description
of pg_xlog_replay_pause() says "Pauses recovery immediately (restricted
to superusers).". I think that the description needs to mention
the corresponding default role "pg_replay". Otherwise, it's difficult for
users to see which default role is related to the function they want to use.
Or perhaps we can add a table explaining all the relationships between
default roles and the corresponding operations; that would be useful.

Why do we allow users to change the attributes of default roles?
For example, ALTER ROLE default_role or GRANT ... TO default_role.
Those changes are not dumped by pg_dumpall. So if users change
the attributes for some reason but those changes disappear via pg_dumpall,
the system could end up in an unexpected state.

I think that it would be better to allow roles with pg_monitor to
execute the pgstattuple functions. They are usually used for monitoring.
Thoughts?

Regards,

-- 
Fujii Masao




[HACKERS] dead assignment src/bin/scripts/print.c line 421

2015-07-13 Thread Laurent Laborde
Friendly greetings !

In file src/bin/scripts/print.c, line 421:
need_recordsep = false;
is then set to true on line 424.

Now I'm pretty sure it's a meaningless "bug" without any consequence (the
commit that introduced it is 15 years old).

There are a lot of (apparently) dead assignments here and there, but some
assignments could be used for debugging purposes, so ... why not. But this one?

-- 
Laurent "ker2x" Laborde
DBA gandi.net \o/


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Tom Lane
Michael Paquier  writes:
> Regarding the fact that those two contrib modules can be part of a
> -contrib package and could be installed, nuking those two extensions
> from the tree and preventing the creating of custom tablesample
> methods looks like a correct course of things to do for 9.5.

TBH, I think the right thing to do at this point is to revert the entire
patch and send it back for ground-up rework.  I think the high-level
design is wrong in many ways and I have about zero confidence in most
of the code details as well.

I'll send a separate message about high-level issues, but as far as code
details go, I started to do some detailed code review last night and only
got through contrib/tsm_system_rows/tsm_system_rows.c before deciding it
was hopeless.  Let's have a look at my notes:

 * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group

Obsolete copyright date.

 * IDENTIFICATION
 *contrib/tsm_system_rows_rowlimit/tsm_system_rows.c

Wrong filename.  (For the moment, I'll refrain from any value judgements
about the overall adequacy or quality of the comments in this patch, and
just point out obvious errors that should have been caught in review.)

typedef struct
{
SamplerRandomState randstate;
uint32  seed;   /* random seed */
BlockNumber nblocks;/* number of block in relation */
int32   ntuples;/* number of tuples to return */
int32   donetuples; /* tuples already returned */
OffsetNumber lt;/* last tuple returned from current block */
BlockNumber step;   /* step size */
BlockNumber lb; /* last block visited */
BlockNumber doneblocks; /* number of already returned blocks */
} SystemSamplerData;

This same typedef name is defined in three different places in the patch
(tablesample/system.c, tsm_system_rows.c, tsm_system_time.c).  While that
might not amount to a bug, it's sure a recipe for confusion, especially
since the struct definitions are all different.

Datum
tsm_system_rows_init(PG_FUNCTION_ARGS)
{
TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
uint32  seed = PG_GETARG_UINT32(1);
int32   ntuples = PG_ARGISNULL(2) ? -1 : PG_GETARG_INT32(2);

This is rather curious coding.  Why should this function check only
one of its arguments for nullness?  If it needs to defend against
any of them being null, it really needs to check all.  But in point of
fact, this function is declared STRICT, which means it's a violation of
the function call protocol if the core code ever passes a null to it,
and so this test ought to be dead code.

A similar pattern of ARGISNULL checks in declared-strict functions exists
in all the tablesample modules, not just this one, showing that this is an
overall design error not just a thinko here.  My inclination would be to
make the core code enforce non-nullness of all tablesample arguments so as
to make it unnecessary to check strictness of the tsm*init functions, but
in any case it is Not Okay to just pass nulls willy-nilly to strict C
functions.

Also, I find this coding pretty sloppy even without the strictness angle,
because the net effect of this way of dealing with nulls is that an
argument-must-not-be-null complaint is reported as "out of range",
which is both confusing and the wrong ERRCODE.

if (ntuples < 1)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("invalid sample size"),
 errhint("Sample size must be positive integer value.")));

I don't find this to be good error message style.  The secondary comment
is not a "hint", it's an ironclad statement of what you did wrong, so if
we wanted to phrase it like this it should be an errdetail not errhint.
But the whole thing is overly cute anyway because there is no reason at
all not to just say what we mean in the primary error message, eg
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("sample size must be greater than zero")));

While we're on the subject, what's the reason for disallowing a sample
size of zero?  Seems like a reasonable edge case.

/* All blocks have been read, we're done */
if (sampler->doneblocks > sampler->nblocks ||
sampler->donetuples >= sampler->ntuples)
PG_RETURN_UINT32(InvalidBlockNumber);

Okay, I lied, I *am* going to complain about this comment.  Comments that
do not accurately describe the code they're attached to are worse than
useless.

/*
 * Cleanup method.
 */
Datum
tsm_system_rows_end(PG_FUNCTION_ARGS)
{
TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);

pfree(tsdesc->tsmdata);

PG_RETURN_VOID();
}

This cleanup method is a waste of code space.  There is no need to
pfree individual allocations at query execution end.

limitnode = linitial(args);
limitnode = estimate_expression_value(root, limitnode);

if (

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Fujii Masao
On Fri, Jul 10, 2015 at 10:06 PM, Beena Emerson  wrote:
> Hello,
>
> Tue, Jul 7, 2015 at 02:56 AM, Josh Berkus wrote:
>> pro-JSON:
>>
>> * standard syntax which is recognizable to sysadmins and devops.
>> * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make
>> additions/deletions from the synch rep config.
>> * can add group labels (see below)
>
> Adding group labels do have a lot of values but as Amit has pointed out,
> with little modification, they can be included in GUC as well.

Or you can extend the custom GUC mechanism so that we can
specify the groups by using them, for example,

quorum_commit.mygroup1 = 'london, nyc'
quorum_commit.mygroup2 = 'tokyo, pune'
synchronous_standby_names = '1(mygroup1), 1(mygroup2)'

> On Tue, Jul 7, 2015 at 2:19 PM, Michael Paquier wrote:
>
>> Something like pg_syncinfo/ coupled with a LW lock, we already do
>> something similar for replication slots with pg_replslot/.
>
> I was trying to figure out how the JSON metadata can be used.
> It would have to be set using a given set of functions.

So we could use only such a set of functions to configure synch rep?
I don't like that idea, because it prevents us from configuring it
while the server is not running.

> Is JSON being preferred because it would be ALTER SYSTEM friendly and in a
> format already known to users?

At least currently, ALTER SYSTEM cannot accept JSON data
(e.g., the return value of a JSON function like json_build_object())
as the setting value. So I'm not sure how friendly ALTER SYSTEM
and the JSON format really are. If you want to argue that, you probably
need to improve ALTER SYSTEM so that JSON can be specified.

> In a real-life scenario, at most how many groups and nesting would be
> expected?

I don't think that many groups and nestings are common.

Regards,

-- 
Fujii Masao




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-13 Thread Fujii Masao
On Mon, Jul 13, 2015 at 9:19 PM, Ildus Kurbangaliev
 wrote:
> On 07/13/2015 01:36 PM, Amit Kapila wrote:
>
> Another problem with pg_stat_activity is that we cannot see all processes
> there (the checkpointer, for example). So we need a separate view for
> monitoring purposes anyway.

+1

When there are many walsender processes running,
maybe I'd like to see their wait events.

Regards,

-- 
Fujii Masao




Re: [HACKERS] WIP: Enhanced ALTER OPERATOR

2015-07-13 Thread Uriy Zhuravlev
 Hello hackers.
 
Attached is a new version of the patch:
* port the syntax from NULL to a true NONE
* fix documentation (thanks Heikki)
* rebase to master
 
Thanks.

-- 
Uriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/alter_operator.sgml b/doc/src/sgml/ref/alter_operator.sgml
index bdb2d02..237e4f1 100644
--- a/doc/src/sgml/ref/alter_operator.sgml
+++ b/doc/src/sgml/ref/alter_operator.sgml
@@ -26,6 +26,11 @@ ALTER OPERATOR name ( { left_typename ( { left_type | NONE } , { right_type | NONE } )
SET SCHEMA new_schema
+
+ALTER OPERATOR name ( { left_type | NONE } , { right_type | NONE } )
+SET ( {  RESTRICT = { res_proc | NONE }
+   | JOIN = { join_proc | NONE }
+ } [, ... ] )
 
  
 
@@ -34,8 +39,7 @@ ALTER OPERATOR name ( { left_type
ALTER OPERATOR changes the definition of
-   an operator.  The only currently available functionality is to change the
-   owner of the operator.
+   an operator.
   
 
   
@@ -98,6 +102,25 @@ ALTER OPERATOR name ( { left_type
 

+
+   
+ res_proc
+ 
+   
+ The restriction selectivity estimator function for this operator; write NONE to remove existing selectivity estimator.
+   
+  
+   
+
+   
+ join_proc
+ 
+   
+ The join selectivity estimator function for this operator; write NONE to remove existing selectivity estimator.
+   
+ 
+   
+
   
  
 
@@ -109,6 +132,13 @@ ALTER OPERATOR name ( { left_type
 ALTER OPERATOR @@ (text, text) OWNER TO joe;
 
+
+  
+Change the restriction and join selectivity estimator function of a custom operator a && b for type int[]:
+
+ALTER OPERATOR && (_int4, _int4) SET (RESTRICT = _int_contsel, JOIN = _int_contjoinsel);
+
+
  
 
  
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 072f530..4c5c9c6 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -28,8 +28,10 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "miscadmin.h"
 #include "parser/parse_oper.h"
+#include "parser/parse_func.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -53,7 +55,7 @@ static Oid OperatorShellMake(const char *operatorName,
   Oid leftTypeId,
   Oid rightTypeId);
 
-static void OperatorUpd(Oid baseId, Oid commId, Oid negId);
+static void ShellOperatorUpd(Oid baseId, Oid commId, Oid negId);
 
 static Oid get_other_operator(List *otherOp,
    Oid otherLeftTypeId, Oid otherRightTypeId,
@@ -563,7 +565,7 @@ OperatorCreate(const char *operatorName,
 		commutatorId = operatorObjectId;
 
 	if (OidIsValid(commutatorId) || OidIsValid(negatorId))
-		OperatorUpd(operatorObjectId, commutatorId, negatorId);
+		ShellOperatorUpd(operatorObjectId, commutatorId, negatorId);
 
 	return address;
 }
@@ -633,7 +635,7 @@ get_other_operator(List *otherOp, Oid otherLeftTypeId, Oid otherRightTypeId,
 }
 
 /*
- * OperatorUpd
+ * ShellOperatorUpd
  *
  *	For a given operator, look up its negator and commutator operators.
  *	If they are defined, but their negator and commutator fields
@@ -642,7 +644,7 @@ get_other_operator(List *otherOp, Oid otherLeftTypeId, Oid otherRightTypeId,
  *	which are the negator or commutator of each other.
  */
 static void
-OperatorUpd(Oid baseId, Oid commId, Oid negId)
+ShellOperatorUpd(Oid baseId, Oid commId, Oid negId)
 {
 	int			i;
 	Relation	pg_operator_desc;
@@ -864,3 +866,154 @@ makeOperatorDependencies(HeapTuple tuple)
 
 	return myself;
 }
+
+/*
+ * Operator update aka ALTER OPERATOR for RESTRICT, JOIN
+ */
+void OperatorUpd(Oid classId,
+ Oid baseId,
+ List *operator_params)
+{
+	int			i;
+	ListCell	*pl;
+	Relation	catalog;
+	HeapTuple	tup;
+	Oid 		operator_param_id = 0;
+	bool		nulls[Natts_pg_operator];
+	bool		replaces[Natts_pg_operator];
+	Datum		values[Natts_pg_operator];
+
+	for (i = 0; i < Natts_pg_operator; ++i)
+	{
+		values[i] = (Datum) 0;
+		replaces[i] = false;
+		nulls[i] = false;
+	}
+
+	catalog = heap_open(classId, RowExclusiveLock);
+	tup = SearchSysCacheCopy1(OPEROID, ObjectIdGetDatum(baseId));
+
+	if (HeapTupleIsValid(tup))
+	{
+		/*
+		 * loop over the definition list and extract the information we need.
+		 */
+		foreach(pl, operator_params)
+		{
+			DefElem*defel = (DefElem *) lfirst(pl);
+			List	   *param;
+			int			param_type;
+
+			if (defel->arg == NULL)
+param = NULL;
+			else
+param = defGetQualifiedName(defel);
+
+			if (pg_strcasecmp(defel->defname, "restrict") == 0)
+param_type = Anum_pg_operator_oprrest;
+			else if (pg_strcasecmp(defel->defname, "join") == 0)
+param_type = Anum_pg_operator_oprjoin;
+			else
+			{
+ereport(WARNING,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("operator attribute \"%s\" not recognized",
+defel->defname)));
+continue;
+			}
+
+			/* Resets if written NONE */
+		

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Michael Paquier
On Mon, Jul 13, 2015 at 9:22 PM, Sawada Masahiko  wrote:
> I might be missing something, but will these functions generate WAL?
> If they do, we will face the situation where we need to wait
> forever, as Fujii-san pointed out.

No, those functions are here to manipulate the metadata defining the
quorum/priority set. We definitely do not want something that
generates WAL.
-- 
Michael




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Michael Paquier
On Mon, Jul 13, 2015 at 9:03 PM, Sawada Masahiko  wrote:
> On Mon, Jul 13, 2015 at 7:46 PM, Amit Kapila  wrote:
>> On Mon, Jul 13, 2015 at 3:39 PM, Sawada Masahiko 
>> wrote:
>>>
>>> On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs  wrote:
>>> > On 6 July 2015 at 17:28, Simon Riggs  wrote:
>>> >
>>> >> I think we need something for pg_upgrade to rewrite existing VMs.
>>> >> Otherwise a large read only database would suddenly require a massive
>>> >> revacuum after upgrade, which seems bad. That can wait for now until we
>>> >> all
>>> >> agree this patch is sound.
>>> >
>>> >
>>> > Since we need to rewrite the "vm" map, I think we should call the new
>>> > map
>>> > "vfm"
>>> >
>>> > That way we will be able to easily check whether the rewrite has been
>>> > conducted on all relations.
>>> >
>>> > Since the maps are just bits there is no other way to tell that a map
>>> > has
>>> > been rewritten
>>>
>>> To avoid a revacuum after upgrade, you meant that we need to rewrite
>>> each bit of the vm to the corresponding bits of the vfm, if it's from
>>> a version that doesn't support the vfm (i.e., 9.5 or earlier), right?
>>> If so, we will need to scan the whole table, which is expensive as
>>> well.
>>> Clearing the vm and doing a revacuum would be nicer than doing it
>>> during the upgrade, I think.
>>>
>>
>> How will you ensure to have revacuum for all the tables after
>> upgrading?
>
> We use script file which are generated by pg_upgrade.

I haven't followed this thread closely, but I am sure you recall that
vacuumdb has a parallel mode.
-- 
Michael




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Andres Freund
On 2015-07-13 21:03:07 +0900, Sawada Masahiko wrote:
> Even if we implement a rewriting tool for the vm in pg_upgrade, it will
> take as much time as a revacuum, because it needs to scan the whole table.

Why would it? Sure, you can only set allvisible and not the frozen bit,
but that's fine. That way the cost for freezing can be paid over time.

If we require terabytes of data to be scanned, including possibly
rewriting large portions due to freezing, before index only scans work
and most vacuums act in a partial manner the migration to 9.6 will be a
major pain for our users.




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Sawada Masahiko
On Fri, Jul 10, 2015 at 10:06 PM, Beena Emerson  wrote:
> Hello,
>
> Tue, Jul 7, 2015 at 02:56 AM, Josh Berkus wrote:
>> pro-JSON:
>>
>> * standard syntax which is recognizable to sysadmins and devops.
>> * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make
>> additions/deletions from the synch rep config.
>> * can add group labels (see below)
>
> Adding group labels do have a lot of values but as Amit has pointed out,
> with little modification, they can be included in GUC as well. It will not
> make it any more complex.
>
> On Tue, Jul 7, 2015 at 2:19 PM, Michael Paquier wrote:
>
>> Something like pg_syncinfo/ coupled with a LW lock, we already do
>> something similar for replication slots with pg_replslot/.
>
> I was trying to figure out how the JSON metadata can be used.
> It would have to be set using a given set of functions. Right?
> I am sorry this question is very basic.
>
> The functions could be something like:
> 1. pg_add_synch_set(set_name NAME, quorum INT, is_priority bool, set_members
> VARIADIC)
>
> This will be used to add a sync set. The set_members can be individual
> elements of another set name. The parameter is_priority is used to decide
> whether the set is priority (true) set or quorum (false). This function call
> will  create a folder pg_syncinfo/groups/$NAME and store the json blob?
>
> The root group would be automatically set by finding the group which is not
> included in other groups? or can be set by another function?
>
> 2. pg_modify_sync_set(set_name NAME, quorum INT, is_priority bool,
> set_members VARIADIC)
>
> This will update the pg_syncinfo/groups/$NAME to store the new values.
>
> 3. pg_drop_synch_set(set_name NAME)
>
> This will update the pg_syncinfo/groups/$NAME folder. Also all the groups
> which included this would be updated?
>
> 4. pg_show_synch_set()
>
> this will display the current sync setting in json format.
>
> Am I missing something?
>
> Is JSON being preferred because it would be ALTER SYSTEM friendly and in a
> format already known to users?
>
> In a real-life scenario, at most how many groups and nesting would be
> expected?
>

I might be missing something, but will these functions generate WAL?
If they do, we will face the situation where we need to wait
forever, as Fujii-san pointed out.


Regards,

--
Masahiko Sawada




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-13 Thread Ildus Kurbangaliev

On 07/13/2015 01:36 PM, Amit Kapila wrote:

I have already proposed something very similar in this thread [1]
(where instead of class, I have used wait_event_type) to which
Robert doesn't agree, so here I think before writing code, it seems
prudent to get an agreement about what kind of User-Interface
would satisfy the requirement and will be extendible for future as
well.  I think it will be better if you can highlight some points about
what kind of user-interface is better (extendible) and the reasons for
same.

[1] (Refer option-3) - 
http://www.postgresql.org/message-id/caa4ek1j6cg_jym00nrwt4n8r78zn4ljoqy_zu1xrzxfq+me...@mail.gmail.com


The idea of splitting this into classes and events does not conflict with
your current implementation. It is not a problem to show only one value
in pg_stat_activity and the more detailed two parameters in other places.
The main reason is that a DBA will want to see grouped information about a
class (for example, the wait time of the whole `Storage` class).

As for the user interface, it depends on what we want to be monitored. In
our patch we have profiling and history. In profiling we show class,
event, wait_time and count. In history we save all parameters of a wait.

Another problem with pg_stat_activity is that we cannot see all processes
there (the checkpointer, for example). So we need a separate view for
monitoring purposes anyway.
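
To illustrate the class/event split in code, here is a hedged sketch (the
enum values and the encoding are invented for illustration, not the patch's
actual representation): both values can still travel through one 32-bit
integer that a backend publishes and that monitoring views decode, so a
per-class summary only needs the high half.

#include <stdint.h>
#include <stdio.h>

/* Invented wait classes, for illustration only. */
typedef enum WaitClass
{
    WAIT_CLASS_NONE = 0,
    WAIT_CLASS_LWLOCK,
    WAIT_CLASS_LOCK,
    WAIT_CLASS_STORAGE,
    WAIT_CLASS_NETWORK
} WaitClass;

static uint32_t wait_encode(WaitClass wclass, uint16_t event)
{
    return ((uint32_t) wclass << 16) | event;
}

static WaitClass wait_class(uint32_t w) { return (WaitClass) (w >> 16); }
static uint16_t  wait_event(uint32_t w) { return (uint16_t) (w & 0xFFFF); }

int main(void)
{
    /* e.g. "waiting on ProcArrayLock", with 4 as a made-up event id */
    uint32_t w = wait_encode(WAIT_CLASS_LWLOCK, 4);

    printf("class=%u event=%u\n",
           (unsigned) wait_class(w), (unsigned) wait_event(w));
    return 0;
}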


--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Mon, Jul 13, 2015 at 7:46 PM, Amit Kapila  wrote:
> On Mon, Jul 13, 2015 at 3:39 PM, Sawada Masahiko 
> wrote:
>>
>> On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs  wrote:
>> > On 6 July 2015 at 17:28, Simon Riggs  wrote:
>> >
>> >> I think we need something for pg_upgrade to rewrite existing VMs.
>> >> Otherwise a large read only database would suddenly require a massive
>> >> revacuum after upgrade, which seems bad. That can wait for now until we
>> >> all
>> >> agree this patch is sound.
>> >
>> >
>> > Since we need to rewrite the "vm" map, I think we should call the new
>> > map
>> > "vfm"
>> >
>> > That way we will be able to easily check whether the rewrite has been
>> > conducted on all relations.
>> >
>> > Since the maps are just bits there is no other way to tell that a map
>> > has
>> > been rewritten
>>
>> To avoid a revacuum after upgrade, you mean that we need to rewrite
>> each bit of the vm to the corresponding bits of the vfm, if it's from
>> a version that does not support the vfm (i.e., 9.5 or earlier), right?
>> If so, we will need to scan the whole table, which is expensive as
>> well.
>> Clearing the vm and doing a revacuum would be nicer than doing it
>> during the upgrade, I think.
>>
>
> How will you ensure that all the tables are revacuumed after
> upgrading?

We use the script file which is generated by pg_upgrade.

>  Until vacuum is done on the tables that
> had a vm before the upgrade, any queries on those tables can
> become slower.

Even if we implement a vm rewriting tool in pg_upgrade, it will
take as much time as a revacuum because it needs to scan the whole table.
I meant that we rewrite the vm using the existing facility (i.e., VACUUM
FREEZE), instead of implementing a new rewriting tool for the vm.

Regards,

--
Masahiko Sawada




Re: [HACKERS] intarray planning/exeuction problem with multiple operators

2015-07-13 Thread Uriy Zhuravlev
On Sunday 12 July 2015 23:12:41 Jeff Janes wrote:
> I've found an interesting performance problem in the intarray extension
> module.  It doesn't seem to be version dependent, I've verified it in 9.4.4
> and 9.6devel.

Hello.
Have a look at my patch. Maybe it solves this problem.
https://commitfest.postgresql.org/5/253/

-- 
Uriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Amit Kapila
On Mon, Jul 13, 2015 at 3:39 PM, Sawada Masahiko 
wrote:
>
> On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs  wrote:
> > On 6 July 2015 at 17:28, Simon Riggs  wrote:
> >
> >> I think we need something for pg_upgrade to rewrite existing VMs.
> >> Otherwise a large read only database would suddenly require a massive
> >> revacuum after upgrade, which seems bad. That can wait for now until
we all
> >> agree this patch is sound.
> >
> >
> > Since we need to rewrite the "vm" map, I think we should call the new
map
> > "vfm"
> >
> > That way we will be able to easily check whether the rewrite has been
> > conducted on all relations.
> >
> > Since the maps are just bits there is no other way to tell that a map
has
> > been rewritten
>
> To avoid a revacuum after upgrade, you mean that we need to rewrite
> each bit of the vm to the corresponding bits of the vfm, if it's from
> a version that does not support the vfm (i.e., 9.5 or earlier), right?
> If so, we will need to scan the whole table, which is expensive as
> well.
> Clearing the vm and doing a revacuum would be nicer than doing it
> during the upgrade, I think.
>

How will you ensure that all the tables are revacuumed after
upgrading?  Until vacuum is done on the tables that
had a vm before the upgrade, any queries on those tables can
become slower.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-13 Thread Amit Kapila
On Mon, Jul 13, 2015 at 3:26 PM, Ildus Kurbangaliev <
i.kurbangal...@postgrespro.ru> wrote:

>
> On 07/12/2015 06:53 AM, Amit Kapila wrote:
>
>  For having duration, I think you need to use gettimeofday or some
> similar call to calculate the wait time, now it will be okay for the
> cases where wait time is longer, however it could be problematic for
> the cases if the waits are very small (which could probably be the
> case for LWLocks)
>
> gettimeofday is already used in our patch and it gives enough accuracy (in
> microseconds), especially when an lwlock becomes a problem. We also tested
> our implementation and it gives an overhead of less than 1% (
> http://www.postgresql.org/message-id/559d4729.9080...@postgrespro.ru,
> testing part).
>

I think that test is quite generic; we should test more combinations
(like using the -M prepared option, as that can stress the LWLock machinery
somewhat more) and other types of tests which can stress the parts
of the code where gettimeofday() is used in the patch.


> We need help here with testing on other platforms. I used gettimeofday
> because of builtin module "instr_time.h" that already gives cross-platform
> tested functions for measuring, but I'm planning to make similar
> implementation for monotonic functions based on clock_gettime for more
> accuracy.
>
>
> > 2) Accumulate per backend statistics about each wait event type: number
> of occurrences and total duration. With this statistics user can identify
> system bottlenecks again without sampling.
> >
> > Number #2 will be provided as a separate patch.
> > Number #1 require different concurrency model. ldus will extract it from
> "waits monitoring" patch shortly.
> >
>
> Sure, I think those should be evaluated as separate patches,
> and I can look into those patches and see if something more
> can be exposed as part of this patch which we can be reused in
> those patches.
>
> If you agree I'll do some modifications to your patch, so we can later
> extend it with our other modifications. The main issue is that one variable
> for all types is not enough. For flexibility in the future we need at least
> two - class and event, for example class=LWLock, event=ProcArrayLock, or
> class=Storage, and event=READ.
>


I have already proposed something very similar in this thread [1]
(where instead of class, I have used wait_event_type) to which
Robert doesn't agree, so here I think before writing code, it seems
prudent to get an agreement about what kind of User-Interface
would satisfy the requirement and will be extendible for future as
well.  I think it will be better if you can highlight some points about
what kind of user-interface is better (extendible) and the reasons for
same.

[1] (Refer option-3) -
http://www.postgresql.org/message-id/caa4ek1j6cg_jym00nrwt4n8r78zn4ljoqy_zu1xrzxfq+me...@mail.gmail.com


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs  wrote:
> On 6 July 2015 at 17:28, Simon Riggs  wrote:
>
>> I think we need something for pg_upgrade to rewrite existing VMs.
>> Otherwise a large read only database would suddenly require a massive
>> revacuum after upgrade, which seems bad. That can wait for now until we all
>> agree this patch is sound.
>
>
> Since we need to rewrite the "vm" map, I think we should call the new map
> "vfm"
>
> That way we will be able to easily check whether the rewrite has been
> conducted on all relations.
>
> Since the maps are just bits there is no other way to tell that a map has
> been rewritten

To avoid a revacuum after upgrade, you mean that we need to rewrite
each bit of the vm to the corresponding bits of the vfm, if it's from
a version that does not support the vfm (i.e., 9.5 or earlier), right?
If so, we will need to scan the whole table, which is expensive as well.
Clearing the vm and doing a revacuum would be nicer than doing it during
the upgrade, I think.

Regards,

--
Masahiko Sawada




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-13 Thread Ildus Kurbangaliev


On 07/12/2015 06:53 AM, Amit Kapila wrote:
On Fri, Jul 10, 2015 at 10:03 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:

>
> On Fri, Jun 26, 2015 at 6:39 AM, Robert Haas wrote:

>>
>> On Thu, Jun 25, 2015 at 9:23 AM, Peter Eisentraut wrote:

>> > On 6/22/15 1:37 PM, Robert Haas wrote:
>> >> Currently, the only time we report a process as waiting is when it is
>> >> waiting for a heavyweight lock.  I'd like to make that somewhat more
>> >> fine-grained, by reporting the type of heavyweight lock it's awaiting
>> >> (relation, relation extension, transaction, etc.).  Also, I'd like to
>> >> report when we're waiting for a lwlock, and report either the specific
>> >> fixed lwlock for which we are waiting, or else the type of lock (lock
>> >> manager lock, buffer content lock, etc.) for locks of which there is
>> >> more than one.  I'm less sure about this next part, but I think we
>> >> might also want to report ourselves as waiting when we are doing an OS
>> >> read or an OS write, because it's pretty common for people to think
>> >> that a PostgreSQL bug is to blame when in fact it's the operating
>> >> system that isn't servicing our I/O requests very quickly.
>> >
>> > Could that also cover waiting on network?
>>
>> Possibly.  My approach requires that the number of wait states be kept
>> relatively small, ideally fitting in a single byte. And it also
>> requires that we insert pgstat_report_waiting() calls around the thing
>> that is notionally blocking.  So, if there are a small number of
>> places in the code where we do network I/O, we could stick those calls
>> around those places, and this would work just fine. But if a foreign
>> data wrapper, or any other piece of code, does network I/O - or any
>> other blocking operation - without calling pgstat_report_waiting(), we
>> just won't know about it.
>
>
> Idea of fitting wait information into single byte and avoid both
> locking and atomic operations is attractive.
> But how long we can go with it?
> Could DBA make some conclusion by single querying of
> pg_stat_activity or double querying?

>

It could be helpful in situations, where the session is stuck on a
particular lock or when you see most of the backends are showing
the wait on same LWLock.

> In order to make a conclusion about system load one have to run
> daemon or background worker which is continuously sampling current
> wait events.
> Sampling current wait event with high rate also gives some overhead
> to the system as well as locking or atomic operations.

>

The idea of sampling sounds good, but I think if it adds performance
penality on the system, then we should look into the ways to avoid
it in hot-paths.

> Checking if backend is stuck isn't easy as well. If you don't expose
> how long last wait event continues it's hard to distinguish getting
> stuck on particular lock and high concurrency on that lock type.

>
> I can propose following:
>
> 1) Expose more information about current lock to user. For instance,
> having duration of current wait event, user can determine if backend
> is getting stuck on particular event without sampling.

>

For having duration, I think you need to use gettimeofday or some
similar call to calculate the wait time, now it will be okay for the
cases where wait time is longer, however it could be problematic for
the cases if the waits are very small (which could probably be the
case for LWLocks)

gettimeofday is already used in our patch and it gives enough accuracy (in
microseconds), especially when an lwlock becomes a problem. We also tested
our implementation and it gives an overhead of less than 1%
(http://www.postgresql.org/message-id/559d4729.9080...@postgrespro.ru,
testing part). We need help here with testing on other platforms. I used
gettimeofday because the built-in header "instr_time.h" already gives
cross-platform tested functions for measuring, but I'm planning to make a
similar implementation based on the monotonic clock_gettime for more
accuracy.
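
A minimal sketch of that monotonic-clock variant, using plain POSIX
clock_gettime() rather than the backend's instr_time.h macros (names are
illustrative only):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t now_usec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000 + (uint64_t) (ts.tv_nsec / 1000);
}

int main(void)
{
    uint64_t start = now_usec();

    /* ... the wait being instrumented would happen here ... */

    printf("wait lasted %llu us\n", (unsigned long long) (now_usec() - start));
    return 0;
}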


> 2) Accumulate per backend statistics about each wait event type:
> number of occurrences and total duration. With this statistics user
> can identify system bottlenecks again without sampling.

>
> Number #2 will be provided as a separate patch.
> Number #1 require different concurrency model. ldus will extract it 
from "waits monitoring" patch shortly.

>

Sure, I think those should be evaluated as separate patches,
and I can look into those patches and see if something more
can be exposed as part of this patch which we can be reused in
those patches.

If you agree I'll do some modifications to your patch, so we can later
extend it with our other modifications. The main issue is that one variable
for all types is not enough. For flexibility in the future we need at
least two - class and event, for example class=LWLock,
event=ProcArrayLock, or class=Storage, and event=READ. With this
modification it is not so big

Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Shulgin, Oleksandr
On Mon, Jul 13, 2015 at 9:44 AM, Pavel Stehule 
wrote:

>
> To reiterate: for my problem, that is escaping numerics that can
>> potentially overflow[1] under ECMAScript standard, I want to be able to
>> override the code that outputs the numeric converted to string.  There is
>> no way in current implementation to do that *at all*, short of copying all
>> the code involved in producing JSON output and changing it at certain
>> points.  One could try re-parsing JSON instead, but that doesn't actually
>> solve the issue, because type information is lost forever at that point.
>>
>> The whitespace unification was a mere side-effect of the original effort
>> on this patch.
>>
>
> Dynamic type change is something I would not do in the database, really
> :)
>
> If you are afraid of overflow, then convert the numeric to a string
> immediately - in this case, the client has to support both variants - so
> an immediate cast should not be a problem.
>

Yeah, but how would you do that in the context of a logical replication
decoding plugin?  I've tried a number of tricks for that, including, but
not limited to, registering phony types to wrap the numeric type and
replacing the OID of numeric with this custom type OID in the TupleDesc,
but then again one has to register that as a known record type, etc.

> Anyway, this check on the max number should be implemented in our JSON(b)
> out functions (as a warning?).
>

Not really, since this is a problem of the ECMAScript standard, not the
JSON spec.  For example, Python's module for handling JSON doesn't suffer
from this overflow problem.

The thing is, we cannot know which clients are going to consume the stream
of decoded events, and if it's some implementation of JavaScript, it can
suffer silent data corruption if we don't guard against that in the logical
decoding plugin.
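
To make the risk concrete, this tiny standalone C program (illustrative
only) shows the same loss an ECMAScript client would see, since its numbers
are IEEE-754 doubles:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int64_t big = (INT64_C(1) << 53) + 1;   /* 9007199254740993 */
    double  as_js_number = (double) big;    /* what a JavaScript client holds */

    printf("original value:     %" PRId64 "\n", big);
    printf("after double round: %" PRId64 "\n", (int64_t) as_js_number);
    /* Prints 9007199254740992: the trailing 1 is silently lost, which is
     * why emitting such numerics as JSON strings can be safer. */
    return 0;
}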

Hope that makes it clear. :-)

--
Alex


Re: [HACKERS] multivariate statistics / patch v7

2015-07-13 Thread Kyotaro HORIGUCHI
Hi, thanks for the detailed explanation. I misunderstood the
code (more honestly speaking, I didn't look at it very closely at the
time). I have now taken a closer look.


At Wed, 08 Jul 2015 03:03:16 +0200, Tomas Vondra  
wrote in <559c76d4.2030...@2ndquadrant.com>
> FWIW this was a stupid bug in update_match_bitmap_histogram(), which
> initially handled only AND clauses, and thus assumed the "match" of a
> bucket can only decrease. But for OR clauses this is exactly the
> opposite (we assume no buckets match and add buckets matching at least
> one of the clauses).
> 
> With this fixed, the estimates look like this:
> 

> IMHO pretty accurate estimates - no issue with OR clauses.

OK, I understood the difference between what I thought and what
you say. The code is actually conscious of OR clauses, but it looks
somewhat confused.

Currently, choosing mv stats in clauselist_selectivity can be
outlined as follows:

1. find_stats finds candidate mv stats containing *all*
   attributes that appear in the whole clause list, regardless of
   and/or exprs, by walking the whole clause tree.

   Perhaps this is a measure for early bailout.

2.1. Within every disjunction element, collect mv-related
   attributes while checking whether all the leaf nodes (binop or
   ifnull) are compatible, by (eventually) walking the whole
   clause tree.

2.2. Check if all the collected attributes are contained in the
   mv-stats columns.

3. Finally, clauseset_mv_selectivity_histogram() (and others).

   This function applies every ExprOp to every attribute in
   every histogram bucket and (tries to) apply the boolean
   operation to the resulting bitmaps.

I have some comments on the implementation, and I am also trying to
find solutions for them.


1. The flow above appears to do very similar things repeatedly.

2. I believe what the current code does can be simplified.

3. As you mentioned in the comments, some additional infrastructure
   is needed.

After all, I think what we should do next is as follows,
as a first step.

- Add a means to choose the selectivity estimator(?) other than via
  the oprrest of the ExprOp's operator. (You already missed neqsel.)

  I suppose one solution for this is adding an oprmvstats column
  taking 'm', 'h' and 'f' and their combinations. Or, for
  convenience, it could be a fixed-length string like this.

  oprname | oprmvstats
  =   | 'mhf'
  <>  | 'mhf'
  <   | 'mh-'
  >   | 'mh-'
  >=  | 'mh-'
  <=  | 'mh-'

  This would make the code in clause_is_mv_compatible like this.

  > oprmvstats = get_mvstatsset(expr->opno); /* bitwise representation */
  > if (oprmvstats & types)
  > {
  >*attnums = bms_add_member(*attnums, var->varattno);
  >return true;
  > }
  > return false;

- The current design just manages to work, but it is too complicated
  and has little affinity with the existing estimation framework. I
  previously proposed separating the stats-finding phase from the
  calculation phase, but after looking closely at the patch I would
  now like to propose a phase that transforms RestrictInfo (and finds
  mvstats) and a phase that runs the transformed RestrictInfo.

  I think transforming RestrictInfo makes the situation better. Since
  it needs different information, maybe it is better to have a new
  struct, say, RestrictInfoForEstimate (boo!). Then provide
  mvstatssel() to be used with the new struct. A rough sketch of the
  code would look like below.

  clauselist_selectivity()
  {
...
    RestrictInfoForEstimate *esclause =
  transformClauseListForEstimation(root, clauses, varRelid);
...

    return clause_selectivity(esclause);
  }

  clause_selectivity(RestrictInfoForEstimate *esclause)
  {
if (IsA(clause, RestrictInfo))...
if (IsA(clause, RestrictInfoForEstimate))
{
   RestrictInfoForEstimate *ecl = (RestrictInfoForEstimate*) clause;
   if (ecl->selfunc)
   {
  sx = ecl->selfunc(root, ecl);
   }
}
if (IsA(clause, Var))...
  }

  
  transformClauseListForEstimation(...)
  {
...

relid = collect_mvstats_info(root, clause, &attlist);
if (!relid) return;
if (get_mvstats_hook)
 mvstats = (*get_mvstats_hoook) (root, relid, attset);
else
 mvstats = find_mv_stats(root, relid, attset))
  }
  ...

> I've pushed this to github [1] but I need to do some additional
> fixes. I also had to remove some optimizations while fixing this, and
> will have to reimplement those.
> 
> That's not to say that the handling of OR-clauses is perfectly
> correct. After looking at clauselist_selectivity_or(), I believe it's
> a bit broken and will need a bunch of fixes, as explained in the
> FIXMEs I pushed to github.
> 
> [1] https://github.com/tvondra/postgres/tree/mvstats

I am not sure whether it is doable or not, and I suppose you're
unwilling to change the big picture, so I will consider the idea
and show you the result if it turns out to be possible and
promising.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Michael Paquier
On Mon, Jul 13, 2015 at 7:36 AM, Tom Lane  wrote:
> I wrote:
>> The two contrib modules this patch added are nowhere near fit for public
>> consumption.  They cannot clean up after themselves when dropped:
>> ...
>> Raw inserts into system catalogs just
>> aren't a sane thing to do in extensions.
>
> I had some thoughts about how we might fix that, without going to the
> rather tedious lengths of creating a complete set of DDL infrastructure
> for CREATE/DROP TABLESAMPLE METHOD.
>
> First off, the extension API designed for the tablesample patch is
> evidently modeled on the index AM API, which was not a particularly good
> precedent --- it's not a coincidence that index AMs can't be added or
> dropped on-the-fly.  Modeling a server internal API as a set of
> SQL-visible functions is problematic when the call signatures of those
> functions can't be accurately described by SQL datatypes, and it's rather
> pointless and inefficient when none of the functions in question is meant
> to be SQL-callable.  It's even less attractive if you don't think you've
> got a completely stable API spec, because adding, dropping, or changing
> signature of any one of the API functions then involves a pile of
> easy-to-mess-up catalog changes on top of the actually useful work.
> Not to mention then having to think about backwards compatibility of
> your CREATE command's arguments.
>
> We have a far better model to follow already, namely the foreign data
> wrapper API.  In that, there's a single SQL-visible function that returns
> a dummy datatype indicating that it's an FDW handler, and when called,
> it hands back a struct containing pointers to all the other functions
> that the particular wrapper needs to supply (and, if necessary, the struct
> could have non-function-pointer fields containing other info the core
> system might need to know about the handler).  We could similarly invent a
> pseudotype "tablesample_handler" and represent each tablesample method by
> a single SQL function returning tablesample_handler.  Any future changes
> in the API for tablesample handlers would then appear as changes in the C
> definition of the struct returned by the handler, which requires no
> SQL-visible thrashing, hence creates no headaches for pg_upgrade, and it
> is pretty easy to design it so that failure to update an external module
> that implements the API results in C compiler errors and/or sane fallback
> behavior.

That's elegant.
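
For readers following along, here is a hedged sketch of that handler
convention; the struct and callback names are invented for illustration and
are not the actual tablesample or FDW API:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/*
 * Invented callback table; the real struct would carry whatever the
 * executor needs, and can change shape without any catalog churn.
 */
typedef struct TableSampleRoutine
{
    void (*BeginSample) (void *state);
    bool (*NextTuple) (void *state);
    void (*EndSample) (void *state);
} TableSampleRoutine;

static void sketch_begin(void *state) { /* seed the sampler */ }
static bool sketch_next(void *state)  { return false; /* no tuple */ }
static void sketch_end(void *state)   { /* release resources */ }

PG_FUNCTION_INFO_V1(bernoulli_handler);

Datum
bernoulli_handler(PG_FUNCTION_ARGS)
{
    TableSampleRoutine *routine = palloc0(sizeof(TableSampleRoutine));

    routine->BeginSample = sketch_begin;
    routine->NextTuple = sketch_next;
    routine->EndSample = sketch_end;

    PG_RETURN_POINTER(routine);
}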

> Once we've done that, I think we don't even need a special catalog
> representing tablesample methods.  Given "FROM foo TABLESAMPLE
> bernoulli(...)", the parser could just look for a function bernoulli()
> returning tablesample_handler, and it's done.  The querytree would have an
> ordinary function dependency on that function, which would be sufficient
> to handle DROP dependency behaviors properly.  (On reflection, maybe
> better if it's "bernoulli(internal) returns tablesample_handler",
> so as to guarantee that such a function doesn't create a conflict with
> any user-defined function of the same name.)
>
> Thoughts?

Given that those two contrib modules can be part of a
-contrib package and could be installed, nuking those two extensions
from the tree and preventing the creation of custom tablesample
methods looks like the correct course of action for 9.5.

When looking at this patch I was also surprised that the BERNOULLI
method can only be applied to a physical relation and is not able to
fire on a materialized set of tuples, say the result of a WITH clause
for example.

> PS: now that I've written this rant, I wonder why we don't redesign the
> index AM API along the same lines.  It probably doesn't matter much at
> the moment, but if we ever get serious about supporting index AM
> extensions, I think we ought to consider doing that.

Any API redesign looks to be clearly 9.6 work IMO at this point.
-- 
Michael




Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread Michael Paquier
On Mon, Jul 13, 2015 at 4:54 PM, dinesh kumar  wrote:
> I would like to discuss the below feature here.
>
> Feature:
> Have an SQL function to write messages to the log destination.
>
> Justification:
> As of now, we don't have an SQL function to write custom/application
> messages to the log destination. We have the "RAISE" clause, which is
> controlled by the log_ parameters. If we had an SQL function which works
> irrespective of the log settings, that would be good for many log parsers.
> What I mean is, from a DBA point of view, if we route all our native OS
> stats to log files in a proper format, then we can have our log reporting
> tools give the most effective reports. Also, applications can log their own
> messages to the postgres log files, which can be monitored by DBAs too.

What's the actual use case for this feature other than that it would be
good to have? A log message is there to give information about the
state of something that happens in the backend, but in the case of this
function the event that happens is the content of the function itself.
It also adds a new log level for something that has a unique usage,
which looks like overkill to me. Btw, you could do something more
advanced with simply an extension if you really want to play with this
area... But I am dubious about what kind of applications would use
that.
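
For what it's worth, the extension route could be as small as this sketch
(illustrative only; note that a plain LOG-level message still obeys
log_min_messages, unlike the new level the patch proposes):

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(report_log);

/* Write the argument to the server log, hiding the calling statement. */
Datum
report_log(PG_FUNCTION_ARGS)
{
    char *msg = text_to_cstring(PG_GETARG_TEXT_PP(0));

    ereport(LOG,
            (errmsg("%s", msg),
             errhidestmt(true)));

    PG_RETURN_VOID();
}
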
-- 
Michael




[HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread dinesh kumar
Hi All,

Greetings for the day.

I would like to discuss the below feature here.

Feature:
Have an SQL function to write messages to the log destination.

Justification:
As of now, we don't have an SQL function to write custom/application
messages to the log destination. We have the "RAISE" clause, which is
controlled by the log_ parameters. If we had an SQL function which works
irrespective of the log settings, that would be good for many log parsers.
What I mean is, from a DBA point of view, if we route all our native OS
stats to log files in a proper format, then we can have our log reporting
tools give the most effective reports. Also, applications can log their own
messages to the postgres log files, which can be monitored by DBAs too.

Implementation:
I implemented a new function "pg_report_log" which takes one argument of
type text and returns void. I used the "LOG" prefix for all the reporting
messages. I wasn't sure whether to go with a new prefix for this, since
these are normal LOG messages. Let me know if I am wrong here.

Here is the attached patch.

Regards,
Dinesh
manojadinesh.blogspot.com
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 76f77cb..b2fc4cd 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17850,6 +17850,15 @@ postgres=# SELECT * FROM 
pg_xlogfile_name_offset(pg_stop_backup());
 Return information about a file.

   
+  
+   
+pg_report_log(message 
text])
+   
+   void
+   
+Write message into log file.
+   
+  
  
 

@@ -17918,6 +17927,18 @@ SELECT (pg_stat_file('filename')).modification;
 

 
+   
+pg_report_log
+   
+   
+pg_report_log is useful to write custom messages
+into current log destination and returns void.
+Typical usages include:
+
+SELECT pg_report_log('Message');
+
+   
+
   
 
   
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index c0495d9..6c54f3a 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -76,6 +76,23 @@ current_query(PG_FUNCTION_ARGS)
 }
 
 /*
+ * pg_report_log()
+ *
+ * Printing custom log messages in log file.
+ */
+
+Datum
+pg_report_log(PG_FUNCTION_ARGS)
+{
+
+   ereport(MESSAGE,
+   (errmsg("%s", text_to_cstring(PG_GETARG_TEXT_P(0))),
+   errhidestmt(true)));
+
+   PG_RETURN_VOID();
+}
+
+/*
  * Send a signal to another backend.
  *
  * The signal is delivered if the user is either a superuser or the same
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 088c714..2e8f547 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -302,7 +302,7 @@ errstart(int elevel, const char *filename, int lineno,
elevel == INFO);
}
 
-   /* Skip processing effort if non-error message will not be output */
+   /* Skip processing effort if non-error,custom message will not be 
output */
if (elevel < ERROR && !output_to_server && !output_to_client)
return false;
 
@@ -2062,6 +2062,7 @@ write_eventlog(int level, const char *line, int len)
case DEBUG3:
case DEBUG2:
case DEBUG1:
+   case MESSAGE:
case LOG:
case COMMERROR:
case INFO:
@@ -2917,6 +2918,7 @@ send_message_to_server_log(ErrorData *edata)
case DEBUG1:
syslog_level = LOG_DEBUG;
break;
+   case MESSAGE:
case LOG:
case COMMERROR:
case INFO:
@@ -3547,6 +3549,7 @@ error_severity(int elevel)
case DEBUG5:
prefix = _("DEBUG");
break;
+   case MESSAGE:
case LOG:
case COMMERROR:
prefix = _("LOG");
@@ -3666,6 +3669,9 @@ is_log_level_output(int elevel, int log_min_level)
/* Neither is LOG */
else if (elevel >= log_min_level)
return true;
+   /* If elevel is MESSAGE, then ignore log settings */
+   else if (elevel == MESSAGE)
+   return true;
 
return false;
 }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 6fd1278..62c619a 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5344,6 +5344,11 @@ DESCR("tsm_bernoulli_reset(internal)");
 DATA(insert OID = 3346 (  tsm_bernoulli_cost   PGNSP PGUID 12 1 0 0 0 
f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ 
_null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
 DESCR("tsm_bernoulli_cost(internal)");
 
+/* Logging function */
+
+DATA(insert OID = 6015 (  pg_report_logPGNSP PGUID 12 1 0 0 0 
f f f f t f v 1 0 2278 "1043" _

Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Pavel Stehule
2015-07-13 9:30 GMT+02:00 Shulgin, Oleksandr :

> On Sun, Jul 12, 2015 at 8:39 PM, Pavel Stehule 
> wrote:
>
>> The thing is - it's not only about whitespace, otherwise I would probably
>>> not bother with the generic interface. For my original problem, there is
>>> simply no way to do this correctly in an extension w/o copying over all of
>>> the logic from json.c, which I have to do right now, would rather not.
>>>
>> I am sorry - we are talking about JSON, not about any styled document. I
>> disagree, so it has not be implemented as extension - the backport of JSON
>> support is a extension.
>>
>
> Hm... I'm having a hard time making sense of that statement, sorry.
>
> To reiterate: for my problem, that is escaping numerics that can
> potentially overflow[1] under ECMAScript standard, I want to be able to
> override the code that outputs the numeric converted to string.  There is
> no way in current implementation to do that *at all*, short of copying all
> the code involved in producing JSON output and changing it at certain
> points.  One could try re-parsing JSON instead, but that doesn't actually
> solve the issue, because type information is lost forever at that point.
>
> The whitespace unification was a mere side-effect of the original effort
> on this patch.
>

Dynamic type change is something I would not do in the database, really
:)

If you are afraid of overflow, then convert the numeric to a string
immediately - in this case, the client has to support both variants - so
an immediate cast should not be a problem.

Anyway, this check on the max number should be implemented in our JSON(b)
out functions (as a warning?).

Regards

Pavel


>
> --
> Best regards,
> Alex
>
> [1]
> http://stackoverflow.com/questions/307179/what-is-javascripts-highest-integer-value-that-a-number-can-go-to-without-losin
>
>


Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Shulgin, Oleksandr
On Sun, Jul 12, 2015 at 8:39 PM, Pavel Stehule 
wrote:

> The thing is - it's not only about whitespace, otherwise I would probably
>> not bother with the generic interface. For my original problem, there is
>> simply no way to do this correctly in an extension w/o copying over all of
>> the logic from json.c, which I have to do right now, would rather not.
>>
> I am sorry - we are talking about JSON, not about any styled document. I
> disagree, so it has not be implemented as extension - the backport of JSON
> support is a extension.
>

Hm... I'm having a hard time making sense of that statement, sorry.

To reiterate: for my problem, that is escaping numerics that can
potentially overflow[1] under ECMAScript standard, I want to be able to
override the code that outputs the numeric converted to string.  There is
no way in current implementation to do that *at all*, short of copying all
the code involved in producing JSON output and changing it at certain
points.  One could try re-parsing JSON instead, but that doesn't actually
solve the issue, because type information is lost forever at that point.

The whitespace unification was a mere side-effect of the original effort on
this patch.
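
As an aside, the kind of override being described can be sketched in
isolation like this (a hedged, standalone illustration that works on the
value's decimal text; the 15-digit cutoff and the function names are
assumptions, not anything in json.c):

#include <stdbool.h>
#include <stdio.h>

/*
 * Quote integers whose text form has more than 15 digits, since they may
 * not survive an IEEE-754 double round-trip on an ECMAScript client.
 */
static bool needs_quoting(const char *num)
{
    int digits = 0;
    const char *p = (*num == '-') ? num + 1 : num;

    for (; *p >= '0' && *p <= '9'; p++)
        digits++;
    return digits > 15;
}

static void append_json_number(const char *num)
{
    if (needs_quoting(num))
        printf("\"%s\"", num);
    else
        printf("%s", num);
}

int main(void)
{
    append_json_number("42");                  /* emits 42 */
    putchar('\n');
    append_json_number("9007199254740993");    /* emits "9007199254740993" */
    putchar('\n');
    return 0;
}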

--
Best regards,
Alex

[1]
http://stackoverflow.com/questions/307179/what-is-javascripts-highest-integer-value-that-a-number-can-go-to-without-losin


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-13 Thread Haribabu Kommi
On Mon, Jul 13, 2015 at 12:06 PM, Haribabu Kommi
 wrote:
> On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi  
> wrote:
>>
>> I will do some performance tests and send you the results.
>
> Here are the performance results tested on my machine.
>
>
>  Head  vm patchvm+prefetch 
> patch
>
> First vacuum120sec<1sec <1sec
> second vacuum180 sec   180 sec30 sec
>
> I did some modifications in the code to skip the vacuum truncation by
> the first vacuum command.
> This way I collected the second vacuum time taken performance.
>
> I just combined your vm and prefetch patch into a single patch
> vm+prefetch patch without a GUC.
> I kept the prefetch as 32 and did the performance test. I chosen
> prefetch based on the current
> buffer access strategy, which is 32 for vacuum presently instead of an
> user option.
> Here I attached the modified patch with both vm+prefetch logic.
>
> I will do some tests on a machine with SSD and let you know the
> results. Based on these results,
> we can decide whether we need a GUC or not? based on the impact of
> prefetch on ssd machines.

Following are the performance readings on a machine with SSD.
I increased the pgbench scale factor to 1000 in the test instead of 500
to show better performance numbers.

                  Head       vm patch   vm+prefetch patch

First vacuum      6.24 sec   2.91 sec   2.91 sec
Second vacuum     6.66 sec   6.66 sec   7.19 sec

There is a small performance impact on SSD with prefetch.
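
For reference, the shape of the prefetching being benchmarked can be
sketched like this (structure and names are assumptions, not the actual
patch; it relies only on the existing PrefetchBuffer() call):

#include "postgres.h"
#include "storage/bufmgr.h"

#define TRUNCATE_PREFETCH_DISTANCE 32   /* mirrors vacuum's buffer ring size */

/*
 * While the truncation scan walks backward from the end of the relation,
 * ask the kernel to start reading the next batch of blocks behind it.
 */
static void
prefetch_tail_blocks(Relation rel, BlockNumber blkno)
{
    BlockNumber n;

    for (n = 1; n <= TRUNCATE_PREFETCH_DISTANCE && n <= blkno; n++)
        PrefetchBuffer(rel, MAIN_FORKNUM, blkno - n);
}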

Regards,
Hari Babu
Fujitsu Australia

