Re: [HACKERS] Convert pltcl from strings to objects

2016-02-27 Thread Jim Nasby

On 2/25/16 9:30 AM, Alvaro Herrera wrote:

Jim Nasby wrote:


Here we have another case. prodesc is a global thing, and it is shared
between different operations. The problem was that there is no particular
owner, and we have to wait until the last operation that deals with it
has finished. It looks like a perfect job for reference counting.


I've just tried to wrap my head around what's going on with prodesc and
failed... specifically, I don't understand this claim in the comment:

* Add the proc description block to the hashtable.  Note we do not
* attempt to free any previously existing prodesc block.  !!This is
* annoying, but necessary since there could be active calls using
* the old prodesc.!!

What else could be referencing it? I realize it's stored in pltcl_proc_htab,
but AFAICT that's backend-local. So I don't understand what else could be
referencing it.


Try to open a cursor that uses the function, fetch a few tuples from it;
then change the function and fetch more rows from the cursor.  I suppose
the open cursor could contain a reference to the function's prodesc.

Refcounting the prodesc would let it live until the cursor's closed,
then free it.


Hadn't thought about cursors; I suspect you're right about that. I 
wonder if other PLs would handle that correctly.
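
For concreteness, the scenario would look something like this (a sketch
with made-up names; tcl_double is hypothetical):

    CREATE FUNCTION tcl_double(int) RETURNS int LANGUAGE pltcl AS $$
        return [expr $1 * 2]
    $$;

    BEGIN;
    DECLARE c CURSOR FOR SELECT tcl_double(i) FROM generate_series(1, 1000) i;
    FETCH 10 FROM c;   -- the cursor's executor state references the prodesc

    -- redefining the function replaces the entry in pltcl_proc_htab...
    CREATE OR REPLACE FUNCTION tcl_double(int) RETURNS int LANGUAGE pltcl AS $$
        return [expr $1 * 3]
    $$;

    FETCH 10 FROM c;   -- ...but the old prodesc must still be usable here
    COMMIT;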


I'm also not sure how the reference would get decremented... via 
ResourceOwner somehow?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sanity checking for ./configure options?

2016-02-27 Thread Jim Nasby

On 2/26/16 9:34 AM, Ivan Kartyshov wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Tested. I think it's rather important to make the cleanup work on that project.


Did you mean to mark all those items as tested, failed?

On another note, the other use case for allowing 1-1024 is if you run 
with listen_addresses = ''.
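
As a rough sketch of that scenario (the settings take effect at the next
restart):

    -- Unix-socket-only server: with no TCP listener there's no
    -- privileged-port issue, and the port number only determines the
    -- socket file name (.s.PGSQL.<port>).
    ALTER SYSTEM SET listen_addresses = '';
    ALTER SYSTEM SET port = 543;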

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sanity checking for ./configure options?

2016-02-27 Thread Jim Nasby

On 2/26/16 9:29 PM, Peter Eisentraut wrote:

To make this really robust, you might need to do pattern matching on the
value.


Yeah, and I don't see any reasonable way to do that... we don't require 
sed or the like, do we?


I'll look at the other things you mentioned.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Improve error handling in pltcl

2016-02-28 Thread Jim Nasby

Per discussion in [1], this patch improves error reporting in pltcl.

pltcl_error_objects.patch applies on top of the pltcl_objects_2.patch 
referenced in [2].


pltcl_error_master.patch applies against current master.

[1] 
http://www.postgresql.org/message-id/20160223150401.2173d...@wagner.wagner.home

[2] http://www.postgresql.org/message-id/56cce7d2.9090...@bluetreble.com
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml
index d2175d5..d5c576d 100644
--- a/doc/src/sgml/pltcl.sgml
+++ b/doc/src/sgml/pltcl.sgml
@@ -775,6 +775,127 @@ CREATE EVENT TRIGGER tcl_a_snitch ON ddl_command_start EXECUTE PROCEDURE tclsnitch();
 

 
+   
+Error Handling in PL/Tcl
+
+
+ error handling
+ in PL/Tcl
+
+
+
+ All Tcl errors that are allowed to propagate back to the top level of the
+ interpreter, that is, errors not caught within the stored procedure
+ using the Tcl catch command, will raise a database
+ error.
+
+
+ Tcl code within or called from the stored procedure can choose to
+ raise a database error by invoking the elog
+ command provided by PL/Tcl or by generating an error using the Tcl
+ error command and not catching it with Tcl's
+ catch command.
+
+
+ Database errors that occur from the PL/Tcl stored procedure's
+ use of spi_exec, spi_prepare,
+ and spi_execp are also catchable by Tcl's
+ catch command.
+
+
+ Tcl provides an errorCode variable that can
+ represent additional information about the error in a form that
+ is easy for programs to interpret.  The contents are in Tcl list
+ format and the first word identifies the subsystem or
+ library responsible for the error and beyond that the contents are left
+ to the individual code or library.  For example if Tcl's
+ open command is asked to open a file that doesn't
+ exist, errorCode
+ might contain POSIX ENOENT {no such file or directory}
+ where the third element may vary by locale but the first and second
+ will not.
+
+
+ When spi_exec, spi_prepare
+ or spi_execp cause a database error to be raised,
+ that database error propagates back to Tcl as a Tcl error.
+ In this case errorCode is set to a list
+ where the first element is POSTGRES followed by a
+ copious decoding of the Postgres error structure.  Since fields in the
+ structure may or may not be present depending on the nature of the
+ error, how the function was invoked, etc, PL/Tcl has adopted the 
+ convention that subsequent elements of the errorCode
+ list are key-value pairs where the first value is the name of the
+ field and the second is its value.
+
+
+ Fields that may be present include message,
+ detail, detail_log,
+ hint, domain,
+ context_domain, context,
+ schema, table,
+ column, datatype,
+ constraint, cursor_position,
+ internalquery, internal_position,
+ filename, lineno and
+ funcname.
+
+
+ You might find it useful to load the results into an array. Code
+ for doing that might look like
+
+if {[lindex $errorCode 0] == "POSTGRES"} {
+array set errorRow [lrange $errorCode 1 end]
+}
+
+
+
+ In the example below we cause an error by attempting to
+ SELECT from a table that doesn't exist.
+
+select tcl_eval('spi_exec "select * from foo;"');
+
+
+
+ERROR:  relation "foo" does not exist
+
+
+
+
+ Now we examine the error code.  (The double-colons explicitly
+ reference errorCode as a global variable.)
+
+select tcl_eval('join $::errorCode "\n"');
+
+
+
+   tcl_eval
+---
+ POSTGRES +
+ message  +
+ relation "foo" does not exist+
+ domain   +
+ postgres-9.6 +
+ context_domain   +
+ postgres-9.6 +
+ cursorpos+
+ 0+
+ internalquery+
+ select * from foo;   +
+ internalpos  +
+ 15   +
+ filename +
+ parse_relation.c +
+ lineno   +
+ 1159 +
+ funcname +
+ parserOpenTable
+(1 row)
+
+
+
+   
+

Modules and the unknown Command

diff --git a/src/pl/tcl/expected/pltcl_setup.out b/src/pl/tcl/expected/pltcl_setup.out
index 4183c14..0a9f9f4 100644
--- a/src/pl/tcl/expected/pltcl_setup.out
+++ b/src/pl/tcl/expected/pltcl_setup.out
@@ -542,3 +542,44 @@ NOTICE:  tclsnitch: ddl_command_start DROP TABLE
 NOTICE:  tclsnitch: ddl

Re: [HACKERS] Convert pltcl from strings to objects

2016-02-29 Thread Jim Nasby

On 2/29/16 9:57 AM, Tom Lane wrote:

plpgsql already has a similar mechanism (see PLpgSQL_function.use_count)
which you could probably copy.  But I'd advise that this is a separate
matter to be addressed in a separate patch; it has little to do with the
nominal subject matter of this patch.


Ahh, thanks for pointing that out.

Completely agree on it being a separate patch. Flight Aware is a big 
pltcl user as well as a contributor to the TCL community, so there's 
several more patches in the works. This would be one of them.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve error handling in pltcl

2016-02-29 Thread Jim Nasby

On 2/28/16 5:50 PM, Jim Nasby wrote:

Per discussion in [1], this patch improves error reporting in pltcl.


I forgot to mention that this work is sponsored by Flight Aware 
(http://flightaware.com).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-02-29 Thread Jim Nasby

On 2/29/16 7:27 PM, Abhijit Menon-Sen wrote:

1. This adds the 'x'/DEPENDENCY_AUTO_EXTENSION type.
2. This adds an 'ALTER FUNCTION … ADD DEPENDENT FUNCTION …' command.

I split up the two because we may want the new dependency type without
going to the trouble of adding a new command. Maybe extension authors
should just insert an 'x' row into pg_depend directly?


I don't see why this would be limited to just functions. I could 
certainly see an extension that creates ease-of-use views that depend on 
the extension, or tables that have triggers that use extension functions. 
Am I missing something?



I was inclined to implement it using ALTER FUNCTION, but AlterFunction()
is focused on altering the pg_proc entry for a function, so the new code
didn't fit. Ultimately, ExecAlterExtensionContentsStmt() was the closest
match, so that's where I did it.


Maybe the better way to handle this would be through ALTER EXTENSION?

Given the audience for this, I think it'd probably be OK to just provide 
a function that does this, instead of DDL. I'd be concerned about asking 
users to do raw inserts though. pg_depends isn't the easiest thing to 
grok so I suspect there'd be a lot of problems with that, resulting in 
more raw DML to try and fix things, resulting in pg_depend getting 
completely screwed up...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-03-01 Thread Jim Nasby

On 2/29/16 10:33 PM, Abhijit Menon-Sen wrote:

>Given the audience for this, I think it'd probably be OK to just
>provide a function that does this, instead of DDL.

That seems like a promising idea. Can you suggest some possible usage?


pg_extension_dependency( regextension, any )

where "any" would be all the other reg* types. That should be a lot less 
work to code up than messing with the grammar.
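
Hypothetical usage, assuming a regextension type (or just the extension
OID) for the first argument:

    -- mark a user-created view and function as auto-dependent on the
    -- extension, so they're dropped along with it
    SELECT pg_extension_dependency('postgis'::regextension,
                                   'my_view'::regclass);
    SELECT pg_extension_dependency('postgis'::regextension,
                                   'my_func(int)'::regprocedure);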

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-01 Thread Jim Nasby

On 2/28/16 4:02 PM, Andres Freund wrote:

So, where to go from here?  I'm acutely aware that we're hard up against
>the final 9.6 commitfest, and that we discourage major patches arriving
>so late in a devel cycle.  But I simply couldn't get this done any faster.
>I don't really want to hold it over for the 9.7 devel cycle.  It's been
>enough trouble maintaining this patch in the face of conflicting commits
>over the last year or so (it's probably still got bugs related to parallel
>query...), and there definitely are conflicting patches in the upcoming
>'fest.  And the lack of this infrastructure is blocking progress on FDWs
>and some other things.
>
>So I'd really like to get this into 9.6.  I'm happy to put it into the
>March commitfest if someone will volunteer to review it.

Hard. This is likely to cause/trigger a number of bugs, and we don't
have much time to let this mature. It's a change that we're unlikely to
be able to back-out if we discover that it wasn't the right thing to
integrate shortly before the release.  On the other hand, this is a
major architectural step forward; one that unblocks a number of nice
features. There's also an argument to be made that integrating this now
is beneficial, because it'll cause less churn for patches being
developed while 9.6 is stabilizing.


Perhaps the best way to handle this would be to commit it to a branch 
sooner rather than later. If things work out, that branch can become the 
official beta. If not, in can become the basis for 9.7.


If nothing else it means that Tom isn't the only one stuck trying to 
maintain this. Even if the branch is nothing but a means to generating a 
patch for 9.7, having it in place makes it a lot easier for other 
developers that need to to code against it.


While I'm promoting heresy... I imagine that this patch doesn't require 
a catversion bump. Perhaps it would be worth doing a short-cycle major 
release just to get this in. That might sound insane but since one of 
the biggest obstacles to upgrading remains dealing with the on-disk 
format, I don't think users would freak out about it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PROPOSAL: Fast temporary tables

2016-03-01 Thread Jim Nasby

On 3/1/16 10:05 AM, Atri Sharma wrote:

Fair point, that means inventing a whole new OID generation structure..


Generation is just the tip of the iceberg. You still need the equivalent 
to foreign keys (ie: pg_depend). While you would never have a permanent 
object depend on a temp object, the reverse certainly needs to be supported.


If I were attempting to solve this at a SQL level, I'd be thinking about 
using table inheritance such that the permanent objects are stored in a 
permanent parent. New backends would create UNLOGGED children off of 
that parent. There would be a pid column that was always NULL in the 
parent, but populated in children. That means children could use their 
own local form of an OID. When a backend terminates you'd just truncate 
all its tables.
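
A rough SQL-level sketch of that idea (all names hypothetical):

    -- permanent parent; pid is always NULL here
    CREATE TABLE temp_objects (
        pid     int,     -- backend pid, set in children
        objid   bigint,  -- backend-local stand-in for an OID
        name    text
    );

    -- each backend creates an UNLOGGED child for its own entries
    CREATE UNLOGGED TABLE temp_objects_pid12345 (
        CHECK (pid = 12345)
    ) INHERITS (temp_objects);

    -- on backend termination, just:
    TRUNCATE temp_objects_pid12345;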


Actually translating that into relcache and everything else would be a 
serious amount of work.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish autovacuum informations

2016-03-01 Thread Jim Nasby

On 3/1/16 8:37 AM, Julien Rouhaud wrote:

>
>We understood (IMHO is an interesting idea) but as Michael said hooks is
>for a general purpose. So can you demonstrate other use cases for this
>new hooks?
>

I can think of several uses.  First, since the hook will always be
called, an extension will see all the activity a worker is doing, whereas
exposing a private structure will always give some kind of sampling.  Then,


I think that's pretty key. If you wanted to create an extension that 
logs vacuums (which would be great, since current state of the art is 
logs + pgBadger), you'd want to gather your data about what the vacuum 
did as the vacuum was ending.


I can certainly see cases where you don't care about that and just want 
what's in shared memory, but that would only be useful for monitoring 
what's happening real-time, not for knowing what final results are.


BTW, I think as much of this as possible should also work for regular 
vacuums.



you can have other information that wouldn't be available just by
exposing private structure.  For instance knowing a VACUUM isn't
performed by the worker (either because another worker is already
working on it or because it isn't needed anymore). IIRC there was a
discussion about concurrency issue in this case. We can also know if the
maintenance was cancelled due to lock not obtained fast enough.
Finally, as long as the hooks aren't used, they don't have any overhead.
I agree that all this is for monitoring purposes.

I'm not sure what are the fancy things that Michael had in mind with
exposing the private structure.  Michael, was it something like having
the ability to change some of these data through an extension?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve error handling in pltcl

2016-03-01 Thread Jim Nasby

On 2/29/16 10:01 PM, Tom Lane wrote:

Jim Nasby  writes:

On 2/28/16 5:50 PM, Jim Nasby wrote:

Per discussion in [1], this patch improves error reporting in pltcl.



I forgot to mention that this work is sponsored by Flight Aware
(http://flightaware.com).


Huh ... I use that site.  There's PG and pltcl code behind it?
Cool!


Heh, I didn't realize you were a TCL fan.

They've been heavy PG users from the start. Eventually PG had trouble 
keeping up with the in-flight tracking, so they created Speed Tables [1]. 
And Karl (one of the founders) is a well-known TCL contributor [2].


When it comes to sheer geek factor though, I think the 
multilateration[3] stuff they're doing with their ADS-B network is a 
really cool data application. It's basically a form of "reverse GPS" for 
tracking aircraft.


[1] https://github.com/flightaware/speedtables
[2] http://wiki.tcl.tk/83
[3] http://flightaware.com/adsb/mlat/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish autovacuum informations

2016-03-01 Thread Jim Nasby

On 3/1/16 3:02 PM, Julien Rouhaud wrote:

You mean for database wide vacuum?


I mean manual vacuum. Some hooks and stats would apply only to autovac 
obviously (and it'd be nice to get visibility into the scheduling 
decisions both daemons are making). But as much as possible things 
should be done in vacuum.c/vacuumlazy.c so it works for manual vacuums 
as well.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

2016-03-02 Thread Jim Nasby

On 3/2/16 3:52 PM, Pavel Stehule wrote:

Right, and it's arguably dubious that that doesn't already work.
Unfortunately, these % things are just random plpgsql parser hacks, not
real types.  Maybe this should be done in the main PostgreSQL parser
with parameter hooks, if we wanted this feature to be available outside
plpgsql as well.


I am not a fan of propagating this feature outside PL/pgSQL - it creates
a possible new dependency between database objects, and the cost is higher
than the benefits.


I fail to see how it'd be a dependency. I'd expect it to look up the 
type when you run the command, just like plpgsql does. I think it'd be 
useful to have.


That said, I think that should be a completely separate patch and 
discussion. Let's at least get it into plpgsql first.


As for the array of element/element of array feature; I agree it would 
be nice, but we're pretty late in the game for that, and I don't see why 
that couldn't be added later.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-02 Thread Jim Nasby

On 3/2/16 4:21 PM, Peter Geoghegan wrote:

I think you should commit this. The chances of anyone other than you
and Masahiko recalling that you developed this tool in 3 years is
essentially nil. I think that the cost of committing a developer-level
debugging tool like this is very low. Modules like pg_freespacemap
currently already have no chance of being of use to ordinary users.
All you need to do is restrict the functions to throw an error when
called by non-superusers, out of caution.

It's a problem that modules like pg_stat_statements and
pg_freespacemap are currently lumped together in the documentation,
but we all know that.


+1.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Convert pltcl from strings to objects

2016-03-02 Thread Jim Nasby

On 3/2/16 12:32 PM, Tom Lane wrote:

Jim Nasby  writes:

[ pltcl_objects_2.patch ]


I've pushed this with some minor fixes, as well as the followup work
mentioned in this thread.


Awesome, thanks!

I've asked Karl's opinion on increasing the minimum TCL version, but I 
suspect that won't be an issue.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Convert pltcl from strings to objects

2016-03-02 Thread Jim Nasby

On 3/1/16 5:06 PM, Tom Lane wrote:

If we don't do that, I'm at least going to put in a similar #error for
Tcl 8.0; but I really think we ought to just say 8.4 is the minimum.


Just confirmed that should be completely reasonable. I'll take a look at 
it in a few days if you don't beat me to it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish autovacuum informations

2016-03-02 Thread Jim Nasby

On 3/2/16 10:48 AM, Julien Rouhaud wrote:

Good point, I don't see a lot of information available with these hooks
that native system statistics couldn't offer. To have the same amount
of information, I think we'd need a pg_stat_autovacuum view that shows a
realtime insight of the workers, and also add some aggregated counters
to PgStat_StatTabEntry. I wonder if adding counters to
PgStat_StatTabEntry would be accepted though.


I would also really like to see a means of logging (auto)vacuum activity 
in the database itself. We figured out how to do that with 
pg_stat_statements, which was a lot harder... it seems kinda silly not 
to offer that for vacuum. Hooks plus shared memory data should allow for 
that (the only tricky bit is the hook would need to start and then 
commit a transaction, but that doesn't seem onerous).


I think the shared memory structures should be done as well. Having that 
real-time info is also valuable.


I don't see too much point in adding stuff to the stats system for this.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-02 Thread Jim Nasby

On 3/2/16 5:41 PM, Tom Lane wrote:

Jim Nasby  writes:

On 3/2/16 4:21 PM, Peter Geoghegan wrote:

I think you should commit this. The chances of anyone other than you
and Masahiko recalling that you developed this tool in 3 years is
essentially nil. I think that the cost of committing a developer-level
debugging tool like this is very low. Modules like pg_freespacemap
currently already have no chance of being of use to ordinary users.
All you need to do is restrict the functions to throw an error when
called by non-superusers, out of caution.

It's a problem that modules like pg_stat_statements and
pg_freespacemap are currently lumped together in the documentation,
but we all know that.



+1.


Would it make any sense to stick it under src/test/modules/ instead of
contrib/ ?  That would help make it clear that it's a debugging tool
and not something we expect end users to use.


I haven't looked at it in detail; is there something inherently 
dangerous about it?


When I'm forced to wear a DBA hat, I'd really love to be able to find 
out what the VM status for a large table is. If it's in contrib they'll know 
the tool is there; if it's under src then there's about 0 chance of 
that. I'd think SU-only and any appropriate warnings would be enough 
heads-up for DBAs to be careful with it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb array-style subscription

2016-03-02 Thread Jim Nasby

On 3/2/16 6:24 PM, Tom Lane wrote:

If the patch were proposing a similar amount of new infrastructure to
support some datatype-extensible concept of subscripting, I'd be much
happier about it.


+1


I believe there's been some handwaving in the past about extensible
approaches to subscripting, though I haven't got time to troll the
archives for it right now.


I'd be able to make use of that in my ndarray data type. It would also 
be nice to be able to add things like matrix types, sparse arrays, and 
variable size arrays (ie: list of lists), and subscripting is how you'd 
want to interface with all of those.


Presumably the point type is handled specially today, so that should be 
taken care of too.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Jim Nasby

On 3/10/16 8:36 PM, Robert Haas wrote:

1. We make it true only for heavyweight lock waits, and false for
other kinds of waits.  That's pretty strange.
2. We make it true for all kinds of waits that we now know how to
report.  That still breaks compatibility.


I would absolutely vote for 2 here. You could even argue that it's a bug 
fix, since those were waits we technically should have been indicating.


The only way I can see #2 breaking anything is if you're using 
waiting=true to determine whether you look at pg_locks and your code 
will blow up if you get no rows back, but that seems like a pretty 
limited use case to me (Hello, LEFT JOIN).
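
In other words, a defensively written monitoring query keeps working
either way; a sketch:

    -- a waiting backend still shows up even when it holds no pg_locks
    -- row, e.g. when it's waiting on something other than a heavyweight
    -- lock
    SELECT a.pid, a.query, l.locktype, l.granted
      FROM pg_stat_activity a
      LEFT JOIN pg_locks l ON l.pid = a.pid AND NOT l.granted
     WHERE a.waiting;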


Dropping the column entirely though would break tons of things.

Another random thought... changes like this would probably be easier to 
handle if we provided backwards compatibility extensions that created 
views that mimicked the catalog for a specific Postgres version.
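
For this particular change, such an extension might do something like
this (a sketch; column list abbreviated):

    CREATE SCHEMA compat95;
    CREATE VIEW compat95.pg_stat_activity AS
        SELECT datid, datname, pid, usename, state, query,
               (wait_event_type IS NOT NULL) AS waiting
          FROM pg_catalog.pg_stat_activity;
    -- older tooling then opts in with:
    -- SET search_path = compat95, pg_catalog, public;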

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Jim Nasby

On 3/11/16 3:31 PM, Peter Geoghegan wrote:

Can we come up with names that more clearly identify the difference
>between those two functions? I mean,_parent_  does not make it
>particularly obvious that the second function acquires exclusive lock
>and performs more thorough checks.

Dunno about that. Its defining characteristic is that it checks child
pages against their parent IMV. Things are not often defined in terms
of their locking requirements.


First, thanks for your work on this. I've wanted it in the past.

I agree the name isn't very clear. Perhaps _recurse?

I also agree that the module name isn't very clear. If this is meant to 
be the start of a generic consistency checker, let's call it that. 
Otherwise, it should be marked as being specific to btrees, because 
presumably we might eventually want similar tools for GIN, etc. (FWIW 
I'd vote for a general consistency checker).


I know the vacuum race condition would be very rare, but I don't think 
it can be ignored. Last thing you want out of a consistency checker is 
false negatives/positives. I do think it would be reasonable to just 
wholesale block against concurrent vacuums, but I don't think there's 
any reasonable way to do that.


I would prefer the ability to do something other than raising an error 
when corruption is found, so that you could find all corruption in an 
index. Obviously could log to a different level. Another option would be 
SRFs that return info about all the corruption found, but that's 
probably overkill.


It'd be nice if you had the option to obey vacuum_cost_delay when 
running this, but that's clearly just a nice-to-have (or maybe just obey 
it all the time, since it defaults to 0).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Jim Nasby

On 3/11/16 5:14 PM, Petr Jelinek wrote:

I don't really understand this part about concurrent DDL.  If there
were concurrent DDL going on, presumably other backends would be
blocked on the relation lock, not the relation extension lock - and it
doesn't seem likely that you'd often have a huge pile-up of inserters
waiting on concurrent DDL.  But I guess it could happen.



Yeah, I was thinking about the latter part, and as I said it's a very rare
case, but I did see something similar a couple of times in the wild. It's
not objection against committing this patch though, in fact I think it
can be committed as is.


FWIW, this is definitely a real possibility in any shop that has very 
high downtime costs and high transaction rates.


I also think some kind of clamp is a good idea. It's not that uncommon 
to run max_connections significantly higher than 100, so the extension 
could be way larger than 16MB. In those cases this patch could actually 
make things far worse as everyone backs up waiting on the OS to extend 
many MB when all you actually needed were a couple dozen more pages.


BTW, how was *20 arrived at? ISTM that if you have a lot of concurrent 
demand for extension that means you're running lots of small DML 
operations, not really big ones. I'd think that would make *1 more 
appropriate.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Jim Nasby

On 3/11/16 6:17 PM, Peter Geoghegan wrote:

Not sure about the cost delay thing. Delays are disabled by default
for manually issued VACUUM, so have doubts that that's useful.


Right, but you still have the option to enable them if you don't want to 
swamp your IO system. That's why CIC obeys it too. If I was running a 
consistency check on a production system I'd certainly want the option 
to throttle it. Without that option, I don't see running this on 
production systems as being an option. If that's not a goal then fine, 
but if it is a goal I think it needs to be there.


Isn't it just a few extra lines of code to support it?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is there a way around function search_path killing SQL function inlining? - and backup / restore issue

2016-03-11 Thread Jim Nasby

On 3/10/16 3:29 PM, Regina Obe wrote:

Take for example, I have tiger geocoder which relies on fuzzystrmatch.  I have 
no idea where someone installs fuzzystrmatch so I can't schema qualify those 
calls.  I use that dependent function to use to build an index on tables.


This is something I've thought about as well, and I think the real 
problem is search_path just isn't the right way to handle this. I think 
there needs to be some way to definitively reference something that's 
part of an extension; a method that doesn't depend on whatever schema 
the extension happens to be installed in.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-12 Thread Jim Nasby

On 3/10/16 7:48 AM, Robert Haas wrote:

I think the problem is that you can't show the name of a non-global
SQL object (such as a relation) unless the object is in the current
database.  Many of the views in the first group are database-local
views, while things like pg_locks span all databases.  We can show the
datid/relid always, but if we have a relname column it will have to be
NULL unless the datid is our database.


I would prefer that if the object is in another database we at least 
display the OID. That way, if you're logging this info you can go back 
later and figure out what was going on.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve error handling in pltcl

2016-03-13 Thread Jim Nasby

On 3/3/16 8:51 AM, Pavel Stehule wrote:

Hi

I am testing the behavior, and some results look strange


Thanks for the review!


postgres=# \sf foo
CREATE OR REPLACE FUNCTION public.foo()
  RETURNS void
  LANGUAGE plpgsql
AS $function$
begin
   raise exception sqlstate 'ZZ666' using message='hello, world',
detail='hello, my world', hint = 'dont afraid';
end
$function$

postgres=# select tcl_eval('spi_exec "select foo();"');
ERROR:  38000: hello, world
CONTEXT:  hello, world   <<<<==???

The message was duplicated in the context. Probably it is out of scope of
this patch, but it isn't consistent with the other PLs.


 while executing
"spi_exec "select foo();""
 ("eval" body line 1)
 invoked from within
"eval $1"
 (procedure "__PLTcl_proc_16864" line 3)
 invoked from within
"__PLTcl_proc_16864 {spi_exec "select foo();"}"
in PL/Tcl function "tcl_eval"
LOCATION:  throw_tcl_error, pltcl.c:1217
Time: 1.178 ms


Both problems actually exists in HEAD. The issue is this line in 
throw_tcl_error:


econtext = utf_u2e(Tcl_GetVar(interp, "errorInfo", TCL_GLOBAL_ONLY));

Offhand I don't see any great way to improve that behavior, and in any 
case it seems out of scope for this patch. As a workaround I'm just 
forcing psql error VERBOSITY to terse for now.
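
For reference, that amounts to:

    -- suppress the CONTEXT/LOCATION lines in error output
    \set VERBOSITY terse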



postgres=# select tcl_eval('join $::errorCode "\n"');
 tcl_eval
═
  POSTGRES   ↵
  message↵
  hello, world   ↵
  detail ↵
  hello, my world↵
  hint   ↵
  dont afraid↵
  domain ↵
  plpgsql-9.6↵
  context_domain ↵
  postgres-9.6   ↵
  context↵
  PL/pgSQL function foo() line 3 at RAISE↵
  SQL statement "select foo();"  ↵
  cursor_position↵
  0  ↵
  filename   ↵
  pl_exec.c  ↵
  lineno ↵
  3165   ↵
  funcname   ↵
  exec_stmt_raise
(1 row)

The SQLSTATE is missing.


Great catch. Fixed.


Why is a list object used instead of a dictionary? Tcl supports them:
https://www.tcl.tk/man/tcl8.5/tutorial/Tcl23a.html


Because errorCode unfortunately is an array and not a dict. It doesn't 
really seem worth messing with it in the eval since this is just a 
sanity check...


New patch attached. It also removes some other unstable output from the 
regression test.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml
index d2175d5..d5c576d 100644
--- a/doc/src/sgml/pltcl.sgml
+++ b/doc/src/sgml/pltcl.sgml
@@ -775,6 +775,127 @@ CREATE EVENT TRIGGER tcl_a_snitch ON ddl_command_start EXECUTE PROCEDURE tclsnitch();
 

 
+   
+Error Handling in PL/Tcl
+
+
+ error handling
+ in PL/Tcl
+
+
+
+ All Tcl errors that are allowed to propagate back to the top level of the
+ interpreter, that is, errors not caught within the stored procedure
+ using the Tcl catch command, will raise a database
+ error.
+
+
+ Tcl code within or called from the stored procedure can choose to
+ raise a database error by invoking the elog
+ command provided by PL/Tcl or by generating an error using the Tcl
+ error command and not catching it with Tcl's
+ catch command.
+
+
+ Database errors that occur from the PL/Tcl stored procedure's
+ use of spi_exec, spi_prepare,
+ and spi_execp are also catchable by Tcl's
+ catch command.
+
+
+ Tcl provides an errorCode variable that can
+ represent additional information about the error in a form that
+ is easy for programs to interpret.  The contents are in Tcl list
+ format and the first word identifies the subsystem or
+ library responsible for the error and beyond that the contents are left
+ to the individual code or library.  For example if Tcl's
+ open command is asked to open a file that doesn't
+ exist, errorCode
+ might contain POSIX ENOENT {no such file or directory}
+ where the third element may vary by locale but the first and second
+ will not.
+
+
+ When spi_exec, spi_prepare
+ or spi_execp cause a database error to be raised,
+ that database error propagates back to Tcl as a Tcl error

Re: [HACKERS] Sanity checking for ./configure options?

2016-03-13 Thread Jim Nasby

On 2/26/16 9:29 PM, Peter Eisentraut wrote:

Your code and comments suggest that you can specify the port to
configure by setting PGPORT, but that is not the case.

test == is not portable (bashism).

Error messages should have consistent capitalization.

Indentation in configure is two spaces.


>As the comment states, it doesn't catch things like --with-pgport=1a in
>configure, but the compile error you get with that isn't too hard to
>figure out, so I think it's OK.

Passing a non-integer as argument will produce an error message like
(depending on shell)

./configure: line 3107: test: 11a: integer expression expected

but will not actually abort configure.

It would work more robustly if you did something like this

elif test "$default_port" -ge "1" -a "$default_port" -le "65535"; then
   :
else
   AC_MSG_ERROR([port must be between 1 and 65535])
fi

but that still leaks the shell's error message.

There is also the risk of someone specifying a number with a leading
zero, which C would interpret as octal but the shell would not.


All issues should now be addressed.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/configure b/configure
index b3f3abe..e7bddba 100755
--- a/configure
+++ b/configure
@@ -3099,6 +3099,16 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
+# It's worth testing for this because it creates a very confusing error
+if test "$default_port" = ""; then
+  as_fn_error $? "invalid empty string supplied with --with-pgport" "$LINENO" 5
+elif test "`echo $default_port | sed -e 's/[0-9]//g'`" != ''; then
+  as_fn_error $? "invalid port specification; must be a number" "$LINENO" 5
+elif test "`echo $default_port | sed -e 's/^0//g'`" != "$default_port"; then
+  as_fn_error $? "illegal leading 0 specified with --with-pgport" "$LINENO" 5
+elif test "$default_port" -lt "1" -o "$default_port" -gt "65535"; then
+  as_fn_error $? "port must be between 1 and 65535" "$LINENO" 5
+fi
 
 #
 # '-rpath'-like feature can be disabled
diff --git a/configure.in b/configure.in
index 0bd90d7..db6e2a0 100644
--- a/configure.in
+++ b/configure.in
@@ -164,6 +164,16 @@ but it's convenient if your clients have the right default compiled in.
 AC_DEFINE_UNQUOTED(DEF_PGPORT_STR, "${default_port}",
[Define to the default TCP port number as a string constant.])
 AC_SUBST(default_port)
+# It's worth testing for this because it creates a very confusing error
+if test "$default_port" = ""; then
+  AC_MSG_ERROR([invalid empty string supplied with --with-pgport])
+elif test "`echo $default_port | sed -e 's/[[0-9]]//g'`" != ''; then
+  AC_MSG_ERROR([invalid port specification; must be a number])
+elif test "`echo $default_port | sed -e 's/^0//g'`" != "$default_port"; then
+  AC_MSG_ERROR([illegal leading 0 specified with --with-pgport])
+elif test "$default_port" -lt "1" -o "$default_port" -gt "65535"; then
+  AC_MSG_ERROR([port must be between 1 and 65535])
+fi
 
 #
 # '-rpath'-like feature can be disabled

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql metaqueries with \gexec

2016-03-13 Thread Jim Nasby

On 2/22/16 1:01 PM, Corey Huinker wrote:

In the mean time, update patch attached.


Really attached this time.


I'm getting a warning from this patch:

common.c:947:8: warning: variable 'success' is used uninitialized 
whenever 'if' condition is true [-Wsometimes-uninitialized]

if (pset.gexec_flag)
^~~
common.c:995:9: note: uninitialized use occurs here
return success;
   ^~~
common.c:947:4: note: remove the 'if' if its condition is always false
if (pset.gexec_flag)
^~~~
common.c:937:15: note: initialize the variable 'success' to silence this 
warning

bool        success;
   ^
= '\0'
1 warning generated.

(note that I'm using CC='ccache clang -Qunused-arguments 
-fcolor-diagnostics')



for (r = 0; r < nrows; r++)
{
for (c = 0; c < ncolumns; c++)
{

etc...

Normally we don't use gratuitous {'s, and I don't think it's helping 
anything in this case. But I'll let whoever commits this decide.



diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5f27120..0f87f29 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1280,8 +1280,8 @@ psql_completion(const char *text, int start, int end)
"\\dm", "\\dn", "\\do", "\\dO", "\\dp", "\\drds", "\\ds", 
"\\dS",
"\\dt", "\\dT", "\\dv", "\\du", "\\dx", "\\dy",
"\\e", "\\echo", "\\ef", "\\encoding", "\\ev",
-   "\\f", "\\g", "\\gset", "\\h", "\\help", "\\H", "\\i", "\\ir", 
"\\l",
-   "\\lo_import", "\\lo_export", "\\lo_list", "\\lo_unlink",
+   "\\f", "\\g", "\\gexec", "\\gset", "\\h", "\\help", "\\H", "\\i", 
"\\ir",
+   "\\l", "\\lo_import", "\\lo_export", "\\lo_list", "\\lo_unlink",


FWIW, it's generally better to leave that kind of re-wrapping to the 
next pgindent run.


I added tests for ON_ERROR_STOP. New patch attached.

The patch still needs to document this feature in the psql docs (and 
maybe the manpage? not sure how that's generated...)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 9750a5b..5ca769f 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -849,6 +849,13 @@ exec_command(const char *cmd,
status = PSQL_CMD_SEND;
}
 
+   /* \gexec -- send query and treat every result cell as a query to be 
executed */
+   else if (strcmp(cmd, "gexec") == 0)
+   {
+   pset.gexec_flag = true;
+   status = PSQL_CMD_SEND;
+   }
+
/* \gset [prefix] -- send query and store result into variables */
else if (strcmp(cmd, "gset") == 0)
{
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 2cb2e9b..54b7790 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -710,6 +710,46 @@ StoreQueryTuple(const PGresult *result)
return success;
 }
 
+/*
+ * ExecQueryTuples: assuming query result is OK, execute every query
+ * result as its own statement
+ *
+ * Returns true if successful, false otherwise.
+ */
+static bool
+ExecQueryTuples(const PGresult *result)
+{
+   boolsuccess = true;
+   int nrows = PQntuples(result);
+   int ncolumns = PQnfields(result);
+   int r, c;
+
+   for (r = 0; r < nrows; r++)
+   {
+   for (c = 0; c < ncolumns; c++)
+   {
+   if (! PQgetisnull(result, r, c))
+   {
+   if ( ! SendQuery(PQgetvalue(result, r, c)) )
+   {
+   if (pset.on_error_stop)
+   {
+   return false;
+   }
+   else
+   {
+   success = false;
+

Re: [HACKERS] psql metaqueries with \gexec

2016-03-13 Thread Jim Nasby
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Still needs documentation.

The new status of this patch is: Waiting on Author

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-13 Thread Jim Nasby

On 3/11/16 6:45 PM, Peter Geoghegan wrote:

I'll add that if people like the interface you propose. (Overloading
the VACUUM cost delay functions to cause a delay for amcheck
functions, too).


I thought that had already been overloaded by CIC, but I'm not finding 
reference to it... ANALYZE does use it though, so the ship has already 
sorta sailed.


I'm actually a bit surprised cost delay isn't used anywhere else. As 
more background operations are added I suspect users will want it at 
some point.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-03-13 Thread Jim Nasby

On 3/13/16 12:48 AM, Pavel Stehule wrote:

crosstabview is really a visualization tool. **But right now, there is no
other tool available from the terminal.** So this can be a significant help
to all people who would like to use this functionality.


Not just the terminal either. Offhand I'm not aware of *any* fairly 
simple tool that provides crosstab. There's a bunch of 
complicated/expensive BI tools that do, but unless you've gone through 
the trouble of getting one of those set up you're currently pretty stuck.


Ultimately I'd really like some way to remove/reduce the restriction of 
result set definitions needing to be determined at plan time. That would 
open the door for server-side crosstab/pivot as well as a host of other 
things (such as dynamically turning a hstore/json/xml field into a 
recordset).
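
The tablefunc extension's crosstab() illustrates the restriction: it
works, but only because the caller spells out the result columns at plan
time (table and column names here are made up):

    CREATE EXTENSION tablefunc;

    SELECT *
      FROM crosstab(
             'SELECT student, subject, score FROM grades ORDER BY 1, 2')
        AS ct(student text, math int, science int);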

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-13 Thread Jim Nasby

On 3/11/16 9:57 PM, Petr Jelinek wrote:

I also think some kind of clamp is a good idea. It's not that
uncommon to run max_connections significantly higher than 100, so
the extension could be way larger than 16MB. In those cases this
patch could actually make things far worse as everyone backs up
waiting on the OS to extend many MB when all you actually needed
were a couple dozen more pages.


I agree, We can have some max limit on number of extra pages, What other
thinks ?



Well, that's what I meant with clamping originally. I don't know what is
a good value though.


Well, 16MB is 2K pages, which is what you'd get if 100 connections were 
all blocked and we're doing 20 pages per waiter. That seems like a 
really extreme scenario, so maybe 4MB is a good compromise. That's 
unlikely to be hit in most cases, unlikely to put a ton of stress on IO, 
even with magnetic media (assuming the whole 4MB is queued to write in 
one shot...). 4MB would still reduce the number of locks by 500x.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-13 Thread Jim Nasby

On 3/13/16 6:30 PM, Peter Geoghegan wrote:

On Sat, Mar 12, 2016 at 5:21 PM, Jeff Janes  wrote:

Would the wiki be a good place for such tips?  Not as formal as the
documentation, and more centralized (and editable) than a collection
of blog posts.


That general direction makes sense, but I'm not sure if the Wiki is
something that this will work for. I fear that it could become
something like the TODO list page: a page that contains theoretically
accurate information, but isn't very helpful. The TODO list needs to
be heavily pruned, but that seems like something that will never
happen.

A centralized location for performance tips will probably only work
well if there are still high standards that are actively enforced.
There still needs to be tight editorial control.


I think there's ways to significantly restrict who can edit a page, so 
this could probably still be done via the wiki. IMO we should also be 
encouraging users to test various tips and provide feedback, so maybe a 
wiki page with a big fat request at the top asking users to submit any 
feedback about the page to -performance.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish autovacuum informations

2016-03-18 Thread Jim Nasby

On 3/3/16 3:54 AM, Kyotaro HORIGUCHI wrote:

I wonder why there haven't been discussions so far on what kind
of information we want by this feature. For example I'd be happy
to see the time of last autovacuum trial and the cause if it has
been skipped for every table. Such information would (maybe)
naturally be shown in pg_stat_*_tables.

=
=# select relid, last_completed_autovacuum, last_completed_autovac_status, 
last_autovacuum_trial, last_autovacuum_result from pg_stat_user_tables;
-[ RECORD 1 ]-+--
relid | 16390
last_completed_autovacuum | 2016-03-01 01:25:00.349074+09
last_completed_autovac_status | Completed in 4 seconds. Scanned 434 pages, 
skipped 23 pages
last_autovacuum_trial | 2016-03-03 17:33:04.004322+09
last_autovac_trial_status | Canceled by PID 2355. Processed 144/553 pages.
-[ RECORD 2 ]--+--
...
last_autovacuum_trial | 2016-03-03 07:25:00.349074+09
last_autovac_trial_status | Completed in 4 seconds. Scanned 434 pages, 
skipped 23 pages
-[ RECORD 3 ]--+--
...
last_autovacuum_trial | 2016-03-03 17:59:12.324454+09
last_autovac_trial_status | Processing by PID 42334, 564 / 32526 pages done.
-[ RECORD 4 ]--+--
...
last_autovacuum_trial | 2016-03-03 17:59:12.324454+09
last_autovac_trial_status | Skipped by dead-tuple threshold.
=


I kinda like where you're going here, but I certainly don't think the 
stats system is the way to do it. Stats bloat is already a problem on 
bigger systems. More importantly, I don't think having just the last 
result is very useful. If you've got a vacuum problem, you want to see 
history, especially history of the vacuum runs themselves.


The good news is that vacuum is a very low-frequency operation, so it 
has none of the concerns that the generic stats system does. I think it 
would be reasonable to provide event triggers that fire on every 
launcher loop, after a worker has built its "TODO list", and after 
every (auto)vacuum.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

2016-03-19 Thread Jim Nasby

On 3/3/16 4:51 AM, Pavel Stehule wrote:

CREATE TABLE a(a int);
CREATE TABLE b(a a.a%TYPE)

And then people expect a living relationship between table a and table
b. So when I ALTER a.a, then b.a should be changed. What if I drop
a.a, or drop a?

So this is the reason why I don't want this feature on the SQL side.


I don't buy that. plpgsql doesn't work that way, so why would this? 
*especially* with the %TYPE decorator.


Now, if the syntax was

CREATE TABLE b(a a.a)

then I would expect b.a to be a foreign key reference to a.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Minor typos in optimizer/README

2016-03-19 Thread Jim Nasby
Studying the pathification commit, I noticed a few typos in $SUBJECT. 
Patch attached.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 7ecf8c8..9529346 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -900,8 +900,8 @@ above, plus a relids set, which allows there to be more than one upperrel
 of the same kind.  We use NULL for the relids if there's no need for more
 than one upperrel of the same kind.  Currently, in fact, the relids set
 is vestigial because it's always NULL, but that's expected to change in
-future.  For example, in planning set operations, we might need the relids
-to denote which subset of the leaf SELECTs has been combined in a
+the future.  For example, in planning set operations, we might need the
+relids to denote which subset of the leaf SELECTs has been combined in a
 particular group of Paths that are competing with each other.
 
 The result of subquery_planner() is always returned as a set of Paths
@@ -971,5 +971,5 @@ One of the keys to making parallel query effective is to run as much of
 the query in parallel as possible.  Therefore, we expect it to generally
 be desirable to postpone the Gather stage until as near to the top of the
 plan as possible.  Expanding the range of cases in which more work can be
-pushed below the Gather (and costly them accurately) is likely to keep us
+pushed below the Gather (and costing them accurately) is likely to keep us
 busy for a long time to come.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

2016-03-19 Thread Jim Nasby

On 3/15/16 7:17 PM, Tom Lane wrote:

In short, I think we should reject this implementation and instead try
to implement the type operators we want in the core grammar's Typename
production, from which plpgsql will pick it up automatically.


+1.

Something else that's been discussed is allowing [] referencing to be 
more modular. Offhand I don't see how that would impact this new type 
referencing stuff, but maybe someone else sees an issue.


BTW, it might also be useful to allow {} to work as a reference method.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fd.c doesn't remove files on a crash-restart

2016-03-19 Thread Jim Nasby

On 3/16/16 2:16 PM, Tom Lane wrote:

Robert Haas  writes:

On Wed, Mar 16, 2016 at 2:05 PM, Tom Lane  wrote:

Possible compromise: remove files only in non-Assert builds?



That sorta seems like tying two things together that aren't obviously
related.  I think building with --enable-cassert is supposed to enable
debugging cross-checks, not change behavior.


Well, it's supposed to enable debugging, and I would classify not
destroying evidence as being debugging support.


Another option: keep stuff around for a single restart. I don't think 
this would be that hard by having a file that's a list of files to 
remove on the next restart. On restart, remove everything in that file 
(and the file itself). If there's anything left, create a new file 
that's the list of what's left.


The other nice thing about having this list is it would tell the DBA 
exactly what files were left after the crash vs what's new.


Actually, I guess another option would be to have a separate directory 
to move all these files into. On restart, nuke the directory if it 
exists, then move stuff in there if necessary.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] typmod is always -1

2016-03-20 Thread Jim Nasby

On 3/17/16 7:40 PM, Tom Lane wrote:

Chapman Flack  writes:

It seems that a typmod can only be used to restrict the set of possible
values of the unmodified type (as clearly seen in the language "length
conversion cast"), since certainly a typmod allowing { string | length < N }
is doing nothing but enforcing a subset of { string }. Each element of
the subset is still a valid element of the whole set (naturally, boring)
*and has to be represented the same way* (interesting): the representation
mustn't do clever things that you would need to know the typmod in order to
interpret, because most uses of a value are without access to the typmod.


You do need to be able to interpret values of the type without having
separate access to the typmod, but I don't think it follows that it's as
restrictive as you say.  One easy way around that is to store the typmod
in the value.

Practical uses might include compressing the data in different ways
depending on typmod.  I'm drawing a blank on other compelling examples
though I'm sure there are some.  Have you looked at PostGIS?  I'm pretty
sure some of their types make use of typmod in nontrivial ways.


If you want a non-trivial use of typmod, take a look at the (work in 
progress) variant type I created[1]. It allows you pass names of 
"registered variants" in via typmod. The idea behind that is to restrict 
what types you can actually store in a particular variant field (though 
you can also disallow a registered variant from being used in a table 
definition).


I did run into some cases where Postgres ignored typmod, so I special 
case the default typmod (-1) to a registered variant that's disabled.


[1] 
https://github.com/BlueTreble/variant/blob/master/doc/variant.md#variant-modifier

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] oldest xmin is far in the past

2016-03-20 Thread Jim Nasby

On 3/19/16 11:32 AM, Tomas Vondra wrote:

Hi,

On 03/19/2016 06:29 AM, John Snow wrote:

There is no any long transaction neither prepared transaction.


Can you show us pg_stat_activity? Particularly the xmin values for
backends attached to the two databases mentioned in the log (1 and 12451).

FWIW the second OID is a bit weird - the first OID assigned to normal
objects is defined as 16384, so I wonder how you managed to create a
database with an OID like that?


On my 9.4, template1 has oid 1.

BTW, John mentioned Slony; if this is on one of the replicas then it's 
certainly understandable that all the tables have ages that are almost 
identical. That happens because the initial COPY of each table takes 
place in a single transaction, and the only other activity that's 
generating XIDs is the normal replay process. Depending on your 
settings, I'd expect that you're only generating a couple XIDs/minute, 
so even if it took 10 days to do the initial copy you'd still only have 
a span of ~30k transactions. That means autovac will suddenly want to 
freeze the whole database in one shot. It's a good idea to run a manual 
vacuum freeze after the initial copy is done to prevent this.
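In other words, once the initial sync finishes, something like:

-- freeze (and analyze) everything the initial COPY created, before
-- autovacuum decides it has to freeze the whole database in one shot
VACUUM (FREEZE, ANALYZE);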


To answer one of your other questions, it looks like all the ages are 
~500M XIDs, which means you've got another ~1B to go before this becomes 
a serious concern.
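
You can keep an eye on that with:

SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;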



* freeze_min_age
* vacuum_freeze_min_age
* autovacuum_freeze_max_age (we already know this one)

What values are set for those?


Better yet, can you just run this query?

SELECT name, setting, unit, source
  FROM pg_settings
  WHERE name ~ 'freeze|vacuum' OR source !~ 'default|override'
;

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-20 Thread Jim Nasby

On 3/17/16 9:01 AM, Robert Haas wrote:

I think that
there are an awful lot of cases where extension authors haven't been
able to quite do what they want to do without core changes because
they couldn't get control in quite the right place; or they could do
it but they had to cut-and-paste a lot of code.


FWIW, I've certainly run into this at least once, maybe twice. The case 
I can think of offhand is doing function resolution with variant. I 
don't remember the details anymore, but my recollection is that to get 
what I needed I would have needed to copy huge swaths of the rewrite code.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve error handling in pltcl

2016-03-20 Thread Jim Nasby

On 3/17/16 5:46 PM, Tom Lane wrote:

Pavel Stehule  writes:

I'll mark this patch as ready for commiters.


I started to look at this patch.  It seems to me that the format of the
errorCode output is not terribly well designed.

...

Maybe there's another way.  I've not used Tcl in anger since around
the turn of the century, so it's entirely likely that I'm missing
something.  But the proposed doc addition isn't showing me any really
easy way to work with this data format, and I think that that's either
a design fail or a docs fail, not something I should just accept as
the best we can do.


I asked Karl about this (since he's active in the TCL community and 
works with TCL every day), and his response was essentially:


Tcl is all about flat lists of key value pairs.

array set myArray $list

sucks a flat list of key-value pairs into an array and vice versa

set list [array get myArray]

creates one. This is normal Tcl stuff.

Getting the errorCode into an array is as easy as

array set errorData [lrange $errorCode 1 end]

Then you can do

$errorData(detail), $errorData(message), etc.

In fact, keyed lists in TclX, which are the inspiration for the approach 
of lists of alternating key-value pairs, did it the way he suggested, and 
that's fallen by the wayside in favor of flat lists.


There has been a formal proposal to add a -stride to lsearch to make 
lsearch efficient at searching the same flat lists of key-value pairs 
and I expect to see it in Tcl 8.7 or sooner.



The doc example also makes me think that more effort should get expended
on converting internalquery/internalpos to just be query/cursorpos.
It seems unlikely to me that a Tcl function could ever see a case
where the latter fields are useful directly.


Is there docs or an example on how to handle that? I looked at the 
plpython stuff and I'm still really unclear on what exactly an 
internalquery is as opposed to regular context info? 
PLy_spi_exception_set simply exposes the raw internalquery and internalpos.



Also, I'm curious as to why you think "domain" or "context_domain"
is of any value to expose here.  Tcl code is not going to have any
access to the NLS infrastructure (if that's even been compiled) to
do anything with those values.


I'm not really sure what it's hurting to expose that, but I'll remove it.


And I believe it may be a security violation for this code to expose
"detail_log".  The entire point of that field is it goes to the
postmaster log and NOT anywhere where unprivileged clients can see it.


Removed.


Nitpickier stuff:

* Docs example could use work: it should show how to do something
useful *in Tcl code*, like maybe checking whether an error had a
particular SQLSTATE.  The example with dumping the whole list at the
psql command line is basically useless, not to mention that it
relies on a nonexistent "tcl_eval" function.  (And I don't care


Will work on an improved example.


for the regression test case creating such a function ... isn't
that a fine SQL-injection hole?)


It would be if it took external input, but it's not, and it saves 
creating 2 separate functions (which you want to do to make sure the 
global errorCode is being set, and not a local copy).



* I believe pltcl_construct_errorCode needs to do E2U encoding
conversion for anything that could contain non-ASCII data, which is
most of the non-fixed strings.


Done.


* Useless-looking hunk at pltcl.c:1610


Removed.


* I think the unstable data you're griping about is the Tcl function's
OID, not the PID.  (I wonder whether we should make an effort to hide
that in errorInfo.  But if so it's a matter for a separate patch.)


It's possible that someone would want to know the name of the 
constructed TCL function (and yeah, I think it's the OID not PID).



I'll set this patch back to Waiting On Author.  I believe it's well
within reach of getting committed in this fest, but it needs more
work.


Interim patch attached (need to work on the docs).
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml
index d2175d5..d5c576d 100644
--- a/doc/src/sgml/pltcl.sgml
+++ b/doc/src/sgml/pltcl.sgml
@@ -775,6 +775,127 @@ CREATE EVENT TRIGGER tcl_a_snitch ON ddl_command_start 
EXECUTE PROCEDURE tclsnit
 

 
+   
+Error Handling in PL/Tcl
+
+
+ error handling
+ in PL/Tcl
+
+
+
+ All Tcl errors that are allowed to propagate back to the top level of the
+ interpreter, that is, errors not caught within the stored procedure
+ using the Tcl catch command, will raise a database
+ error.
+
+
+ Tcl code within or called from the stored procedure can choose to
+ raise a

Re: [HACKERS] Relax requirement for INTO with SELECT in pl/pgsql

2016-03-21 Thread Jim Nasby

On 3/21/16 5:03 PM, Merlin Moncure wrote:

in Oracle, you'd simply do:
LogIt('I did something');


It would be *great* if we could support that in plpgsql.


I'm not sure what Oracle does for SELECT statements without INTO/BULK
UPDATE.  I'm not really inclined to care -- I'm really curious to see
an argument where usage of PERFORM actually helps in some meaningful
way.  Notably, SELECT without INTO is accepted syntax, but fails only
after running the query.  I think that's pretty much stupid but it's
fair to say I'm not inventing syntax, only disabling the error.


I don't think it buys much at all.

While we're on the subject, it'd be great if variable := SELECT ... 
worked too.
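
Today the closest you can get is wrapping it in parens so it's a scalar 
subquery (sketch, with made-up names):

-- works today: scalar subquery inside an expression
v := (SELECT count(*) FROM t);
-- doesn't work today, but IMHO should:
v := SELECT count(*) FROM t;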



I'm not sure what other databases do is relevant.   They use other
procedure languages than pl//sql (the biggest players are pl/psm and
t-sql) which have a different set of rules in terms of passing
variables in and out of queries.


+1
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Jim Nasby

On 3/21/16 11:57 AM, Teodor Sigaev wrote:

A and B are points of intersection of lines. So, box PBCA is a bounding
box for points contained in the 3rd quadrant (see labeling above). For
example, the point labeled X is not a descendant of the child node with
centroid C because it must be in the branch of the 1st quad of the parent
node. So, each node (except the root) will have a limitation in its
quadrant. To transfer that limitation the traversalValue is used.


Isn't this basically the same thing that the cube contrib module does? 
(Which has the added benefit of kNN-capable operators).


If that's true then ISTM it'd be better to work on pulling cube's 
features into box?


If it's not true, I'm still wondering if there's enough commonality here 
that we should pull cube into core...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-22 Thread Jim Nasby

On 3/21/16 7:41 PM, Stas Kelvich wrote:

While people tend to use machine learning and regression models more and 
more, it is interesting to have some general n-dim indexing with kNN,
but I think it is a different problem and should be solved in a different way.


I think one of the issues here is it's not very clear to someone without 
a good amount of ML knowledge how these things relate. I hear "box' and 
'cube' and think it's just a 2D vs 3D issue, and intarray isn't even on 
the radar.


Maybe what's needed are actual vector and matrix types?

In any case, if you've got a good reason why box and cube should stay 
separate then further discussion should happen in another thread.


BTW, if you haven't seen it, take a look at http://madlib.apache.org/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-22 Thread Jim Nasby

On 3/22/16 9:36 AM, Amit Kapila wrote:

 > > Note, that we are doing it only when a transaction has less than
equal to
 > > 64 sub transactions.
 >
 > So?
 >

They should fall on one page, unless they are heavily interleaved as you
pointed out.  I think that whether subtransactions are present or not, this
patch won't help for bigger transactions.


FWIW, the use case that comes to mind here is the "upsert" example in 
the docs. AFAIK that's going to create a subtransaction every time it's 
called, regardless of whether it performs actual DML. I've used that in 
places that would probably have moderately high concurrency, and I 
suspect I'm not alone in that.
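
For reference, the pattern I mean is roughly the merge_db() example from 
the plpgsql docs (assuming a table db(a int PRIMARY KEY, b text)); every 
pass through the inner block starts a subtransaction, whether or not the 
INSERT actually runs:

CREATE FUNCTION merge_db(k int, data text) RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    LOOP
        UPDATE db SET b = data WHERE a = k;
        IF found THEN
            RETURN;
        END IF;
        BEGIN  -- this block is the subtransaction
            INSERT INTO db(a, b) VALUES (k, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- someone inserted concurrently; loop and try the UPDATE again
        END;
    END LOOP;
END
$$;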


That said, it wouldn't surprise me if plpgsql overhead swamps an effect 
this patch has, so perhaps it's a moot point.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve error handling in pltcl

2016-03-22 Thread Jim Nasby

On 3/20/16 8:42 PM, Jim Nasby wrote:

The doc example also makes me think that more effort should get expended
on converting internalquery/internalpos to just be query/cursorpos.
It seems unlikely to me that a Tcl function could ever see a case
where the latter fields are useful directly.


Is there docs or an example on how to handle that? I looked at the
plpython stuff and I'm still really unclear on what exactly an
internalquery is as opposed to regular context info?
PLy_spi_exception_set simply exposes the raw internalquery and internalpos.


Anyone any pointers on this? I'm not sure I can finish the docs without 
knowing what we want to do here.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show dropped users' backends in pg_stat_activity

2016-03-22 Thread Jim Nasby

On 3/22/16 10:35 PM, Kyotaro HORIGUCHI wrote:

Even if we maintained some interlock for a backend's login role identity,
I hardly think it would be practical to e.g. lock during transient SET
ROLE or security-definer-function-call operations.  So it's not like we
can let the permissions system assume that a role OID being inquired about
always matches a live entry in pg_authid.

Even if blocking DROPs is not perfect for all cases,
unconditionally allowing to DROP a role still doesn't seem proper
behavior, especially for replication roles. And session logins
seem to me to have enough reason to be treated differently than
disguising as another role using SET ROLE or sec-definer.


There's probably a way this could be handled, since DROP ROLE is 
presumably a very uncommon operation. Perhaps something as simple as 
keeping a single OID in shared memory for the role about to be dropped. 
That would serialize role drops, but I doubt that matters.



The attached patch blocks DROP ROLE for roles that own active
sessions, and on the other hand prevents a session from being
activated if the login role is concurrently dropped.


I think this is fine for now, but... what happens if you drop a role 
that's in use on a streaming replica? Does replay stall or do we just 
ignore it?


There should probably be some doc changes to go with the patch too, no?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relax requirement for INTO with SELECT in pl/pgsql

2016-03-23 Thread Jim Nasby

On 3/22/16 8:37 AM, Merlin Moncure wrote:

I am afraid of useless and forgotten calls of functions. But the risk is
the same as with PERFORM, so this is only half valid. The PERFORM statement
holds special semantics, and it is interesting.

I see your point here, but the cost of doing that far outweighs the
risks.  And I don't think the arbitrary standard of forcing the user to
identify whether the query should return data is a good way of
identifying dead code.


Not to mention that there's tons of other ways to introduce unintended 
inefficiencies. Off the top of my head, declaring variables that are 
never referenced and have no assignment is a big one.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-23 Thread Jim Nasby

On 3/22/16 7:28 AM, Michael Paquier wrote:

On Mon, Mar 21, 2016 at 7:55 AM, Jim Nasby  wrote:

On 3/17/16 9:01 AM, Robert Haas wrote:


I think that
there are an awful lot of cases where extension authors haven't been
able to quite do what they want to do without core changes because
they couldn't get control in quite the right place; or they could do
it but they had to cut-and-paste a lot of code.


FWIW, I've certainly run into this at least once, maybe twice. The case I
can think of offhand is doing function resolution with variant. I don't
remember the details anymore, but my recollection is that to get what I
needed I would have needed to copy huge swaths of the rewrite code.


Amen, I have been doing that a couple of days ago with some elog stuff.


Any ideas on ways to address this? Adding more hooks in random places 
every time we stumble across something doesn't seem like a good method.


One thing I've wondered about is making it easier to find specific 
constructs in a parsed query so that you can make specific 
modifications. I recall looking at that once and finding a roadblock 
(maybe a bunch of functions were static?)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-24 Thread Jim Nasby

On 3/24/16 10:21 AM, Alexander Korotkov wrote:

1) It's a great feature many users dream about.


Doesn't matter if it starts eating their data...


2) Patch is not very big.
3) Patch doesn't introduce significant infrastructural changes.  It just
changes some well-isolated places.


It doesn't really matter how big the patch is, it's a question of "What 
did the patch fail to consider?". With something as complicated as the 
btree code, there's ample opportunities for missing things. (And FWIW, 
I'd argue that a 51kB patch is certainly not small, and a patch that is 
doing things in critical sections isn't terribly isolated).


I do think this will be a great addition, but it's just too late to be 
adding this to 9.6.


(BTW, I'm getting bounces from a.lebe...@postgrespro.ru, as well as 
postmaster@. I emailed i...@postgrespro.ru about this but never heard back.)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alter or rename enum value

2016-03-24 Thread Jim Nasby

On 3/24/16 2:00 PM, Matthias Kurz wrote:

ALTER TYPE bogon DROP VALUE 'cat'; -- not implemented in 9.5 but should
work in future
ROLLBACK;


Dropping a value is significantly harder because that value could be in use.

I'm certain there's a really good reason adding new values isn't allowed 
inside of a transaction. It's probably documented in the code.
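
The restriction itself is easy to demonstrate:

BEGIN;
ALTER TYPE bogon ADD VALUE 'dog';
ERROR:  ALTER TYPE ... ADD cannot run inside a transaction block
ROLLBACK;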


To answer your question about "what goes into a release", there's really 
no process for that. What goes into a release is what someone was 
interested enough in to get community approval for the idea, write the 
patch, and shepherd the patch through the review process. So if you want 
these features added, you need to either: do it yourself, convince 
someone else to do it for free, or pay someone to do it for you.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] avg,first,last,median in one query

2016-03-24 Thread Jim Nasby

On 3/24/16 9:00 AM, Konstantin Knizhnik wrote:

But unfortunately it is not possible to calculate the median in such a way
because percentile_disc is not compatible with OVER:


I don't know if you could use cume_dist()[1] to do this, but even if you 
can't it probably wouldn't be hard to modify it to do what you need.


[1] http://www.postgresql.org/docs/9.5/static/functions-window.html
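
Untested, but something along these lines should give you a 
percentile_disc-style median via a window function:

SELECT x
  FROM (SELECT x, cume_dist() OVER (ORDER BY x) AS cd FROM t) s
 WHERE cd >= 0.5
 ORDER BY cd
 LIMIT 1;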
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alter or rename enum value

2016-03-25 Thread Jim Nasby

On 3/24/16 10:27 PM, Tom Lane wrote:

It's conceivable that we could do something like adding an "isdead"
column to pg_enum and making enum_in reject new values that're marked
isdead.  But I can't see that we'd ever be able to support true
removal of an enum value at reasonable cost.  And I'm not really sure
where the use-case argument is for working hard on it.


I wonder if we could handle this by allowing foreign keys on enum 
columns back to pg_enum. Presumably that means we'd have to treat 
pg_enum as a regular table and not a catalog table. Due to locking 
concerns I don't think we'd want to put the FKs in place by default either.


I've certainly heard people avoiding ENUMs because of their limitations, 
so it'd be nice if there was a way to lift them.
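
FWIW, the FK idea would essentially give you what people emulate today 
with a plain lookup table (illustrative names):

CREATE TABLE color(color text PRIMARY KEY);
CREATE TABLE item(
    item_id int PRIMARY KEY,
    -- deleting an in-use row from color fails, which is exactly the
    -- protection that DROP VALUE on an enum would need
    color   text NOT NULL REFERENCES color
);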

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alter or rename enum value

2016-03-25 Thread Jim Nasby

On 3/25/16 2:22 PM, Gavin Flower wrote:


I've certainly heard people avoiding ENUMs because of their
limitations, so it'd be nice if there was a way to lift them.

Well, I use Enums extensively in Java.

However, I totally avoid using ENUMs in pg, due to their inflexibility!


Possibly related to this, for a long time I've also wanted a way to 
better integrate FKs, probably via some kind of a pseudotype or maybe a 
special operator. The idea being that instead of manually specifying 
joins, you could treat a FK field in a table as a pointer and do things 
like:


CREATE TABLE invoice(customer int NOT NULL REFERENCES(customer));

SELECT invoice.*, customer->first_name, customer->last_name, ...
  FROM invoice;

If we had that capability, there would be less need for ENUMs.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve error handling in pltcl

2016-03-25 Thread Jim Nasby

On 3/25/16 3:11 PM, Tom Lane wrote:

Jim Nasby  writes:
the data, we're making it unnecessarily hard.  All we need is one more
field in there, and you can simplify that to


Ahh, nice.


I think actually it's a simple point: there won't ever be a case where
cursorpos is set here, because that's only used for top-level SQL syntax
errors.  Anything we are catching would be an internal-query error, so
we might as well not confuse PL/Tcl users with the distinction but just
report internalquery/internalpos as the statement and cursor position.


PLy_spi_exception_set simply exposes the raw internalquery and internalpos.


Right, because that's all that could be useful.


Ahh, ok, finally I get it.

It would be nice if the comments for ErrorData were clearer...


it strikes me that this is not coding style we want to encourage.
We should borrow the infrastructure plpgsql has for converting
SQLSTATEs into condition names, so that that can be more like


Yeah, Karl and I were just talking about that as we were finishing up 
the docs changes (ironically, as you were committing this...).


I ended up with a more realistic example that also demonstrates that you 
can refer to errorCode in a separate function if desired. That patch 
attached for posterity.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml
index d2175d5..4cf32df 100644
--- a/doc/src/sgml/pltcl.sgml
+++ b/doc/src/sgml/pltcl.sgml
@@ -775,6 +775,169 @@ CREATE EVENT TRIGGER tcl_a_snitch ON ddl_command_start 
EXECUTE PROCEDURE tclsnit
 

 
+   
+Error Handling in PL/Tcl
+
+
+ error handling
+ in PL/Tcl
+
+
+
+ All Tcl errors that occur within a stored procedure and are not caught
+ using Tcl's catch or try
+ functions will raise a database error.
+
+
+ Tcl code can raise a database error by invoking the
+ elog command provided by PL/Tcl or by generating an
+ error using the Tcl error command and not catching it
+ with Tcl's catch command.
+
+
+ Database errors that occur from the PL/Tcl stored procedure's
+ use of spi_exec, spi_prepare,
+ and spi_execp are also catchable by Tcl's
+ catch command.
+
+
+ Tcl provides an errorCode variable that can represent
+ additional information about the error in a form that is easy for programs
+ to interpret.  The contents are in Tcl list format. The first word
+ identifies the subsystem or library responsible for the error. The
+ remaining contents are up to the individual code or library.  For example
+ if Tcl's open command is asked to open a file that
+ doesn't exist, errorCode might contain POSIX
+ ENOENT {no such file or directory} where the third element may
+ vary by locale but the first and second will not.
+
+
+ When spi_exec, spi_prepare
+ or spi_execp cause a database error to be raised,
+ that database error propagates back to Tcl as a Tcl error.  In this case
+ errorCode is set to a list where the first element is
+ POSTGRES followed by details of the Postgres error.
+ Since fields in the structure may or may not be present depending on the
+ nature of the error, how the function was invoked, etc, PL/Tcl has adopted
+ the convention that subsequent elements of the
+ errorCode list are key-value pairs where the first
+ value is the name of the field and the second is its value.
+
+
+ Fields that may be present include SQLSTATE,
+ message,
+ detail,
+ hint,
+ context,
+ schema,
+ table,
+ column,
+ datatype,
+ constraint,
+ cursor_position,
+ internalquery,
+ internal_position,
+ filename,
+ lineno and
+ funcname.
+
+
+ You might find it useful to load the results into an array. Code
+ for doing that might look like
+
+if {[lindex $errorCode 0] == "POSTGRES"} {
+    array set errorRow [lrange $errorCode 1 end]
+}
+
+
+
+ This example shows how to trap a specific SQL error.
+
+CREATE TABLE account(user_name varchar(10) NOT NULL PRIMARY KEY);
+CREATE OR REPLACE FUNCTION public.create_user(user_name text)
+ RETURNS void LANGUAGE pltcl AS $function$
+set prep [ spi_prepare "INSERT INTO account(user_name) VALUES(\$1)" [ list text ] ]
+if [ catch {
+    spi_execp $prep [ list $1 ]
+} msg ] {
+    if {[lindex $::errorCode 0] == "POSTGRES"} {
+        array set errorData [lrange $::errorCode 1 end]
+        if { $errorData(SQLSTATE) == "23505" && $errorData(constraint) == "account_pkey" } {
+            return -code error "user '$1' already exists"
+        }
+    }
+    throw $::errorCode $msg
+}
+$function$;
+

[HACKERS] SQL access to access method details

2016-03-28 Thread Jim Nasby
While working on a tool to capture catalog/stats info and looking at 
cross version compatibility, I noticed that the pg_am changes removed 
SQL access to a bunch of AM info. [1] indicates that's part of the 
purpose of the patch; are we sure no tools are using this info? Unlike 
previous catalog compatibility breaks, this one completely removes that 
information, so if someone was using it they're now completely hosed.


[1] http://www.postgresql.org/message-id/55fec1ab.8010...@2ndquadrant.com
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] backup tools ought to ensure created backups are durable

2016-03-28 Thread Jim Nasby

On 3/28/16 11:03 AM, Magnus Hagander wrote:


That should work yeah. And given that we already use that check in other
places, it seems it should be perfectly safe. And as long as we only do
a WARNING and not abort if the fsync fails, we should be OK if people
intentionally store their backups on an fs that doesn't speak fsync (if
that exists), in which case I don't really think we even need a switch
to turn it off.


I'd even go so far as spitting out a warning any time we can't fsync 
(maybe that's what you're suggesting?)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alter or rename enum value

2016-03-28 Thread Jim Nasby

On 3/28/16 4:42 AM, Emre Hasegeli wrote:

Now, we are using a
function to replace an enum type on all tables with another one, but
we are not at all happy with this solution.  It requires all objects
which were using the enum to be dropped and recreated, and it rewrites
the tables, so it greatly increases the migration time and effort.


FWIW, there are ways to avoid some of that pain by having a trigger 
maintain the new column on INSERT/UPDATE and then slowly touching all 
the old rows where the new column is NULL.
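
A minimal sketch of that approach (names made up; note the new column must 
be nullable so adding it doesn't rewrite the table):

ALTER TABLE orders ADD COLUMN status_new order_status_v2;

CREATE FUNCTION orders_status_sync() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    NEW.status_new := NEW.status::text::order_status_v2;
    RETURN NEW;
END
$$;

CREATE TRIGGER orders_status_sync BEFORE INSERT OR UPDATE ON orders
    FOR EACH ROW EXECUTE PROCEDURE orders_status_sync();

-- backfill existing rows (ideally in batches, to limit locking and bloat)
UPDATE orders SET status_new = status::text::order_status_v2
 WHERE status_new IS NULL;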


Obviously would be much better if we could just do this with ENUMs...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Weird behavior during CREATE EXTENSION

2016-01-12 Thread Jim Nasby

This behavior had me quite baffled...


~@decina.local/29760# create extension "trunklet-format" CASCADE;
NOTICE:  installing required extension "trunklet"
NOTICE:  installing required extension "variant"
CREATE EXTENSION
~@decina.local/29760# create extension "pgxntool-test";
ERROR:  syntax error at or near "-"
LINE 1: create extension "pgxntool-test";
^
~@decina.local/29760# select * from pg_available_extensions where name ~'-';
      name       | default_version | installed_version |                     comment
-----------------+-----------------+-------------------+--------------------------------------------------
 pgxntool-test   | 0.1.0           |                   |
 trunklet-format | 0.1.1           | 0.1.1             | A format()-based template language for trunklet
(2 rows)


Eventually, I realized the problem was the first line of the extension 
file itself:


CREATE FUNCTION pgxntool-test(

wrapping the name in double quotes fixed the issue. (The reason that still 
doesn't line up with the ^ above is because the ^ is accounting for "LINE 1: ".)


This makes debugging extensions quite tedious. Part of the explanation 
is in the comment for execute_sql_string():



/*
 * Execute given SQL string.
 *
 * filename is used only to report errors.
 *
 * Note: it's tempting to just use SPI to execute the string, but that does
 * not work very well.  The really serious problem is that SPI will parse,
 * analyze, and plan the whole string before executing any of it; of course
 * this fails if there are any plannable statements referring to objects
 * created earlier in the script.  A lesser annoyance is that SPI insists
 * on printing the whole string as errcontext in case of any error, and that
 * could be very long.
 */


I can think of 4 ways to fix this:

1) Have psql parse the script into separate commands for us.
2) Pull enough of psql's parsing into the backend code to be able to do #1
3) Add *file* line numbers to the output of pg_parse_query()
4) Have ereport spit out the context you'd normally get from SPI if it 
sees that it was called from somewhere underneath execute_sql_string().


My preference would actually be #1, because that would make it easy for 
any tool that wanted to to get access to that. Jon Erdman actually 
looked into a similar alternative in the past and it was only a few 
lines of code. Basically, when the "parse file" option is chosen don't 
even attempt to connect to a database, just parse things, execute \ 
commands and print the results instead of sending them via libpq. That 
wouldn't work directly here because we want to split commands apart, but 
it wouldn't be hard to have psql spit out a special command separator 
line and then look for that. psql would have to ignore \quit in this 
mode though, but I think that's fine.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-13 Thread Jim Nasby

On 1/12/16 11:25 AM, Catalin Iacob wrote:

The differentiation between Error and SPIError is wrong, because there isn't
any difference in reality.

They're similar but not really the same thing. raise Error and
plpy.error are both ways to call ereport(ERROR, ...) while SPIError is
raised when coming back after calling into Postgres to execute SQL
that itself raises an error. Now indeed, that executed SQL raised an
error itself via another ereport(ERROR, ...) somewhere but that's a
different thing.


Why should they be different? An error is an error. You either want to 
trap a specific type of error or you don't. Having two completely 
different ways to do the same thing is just confusing.


IMHO the Error and Fatal classes should never have been committed, 
especially since they're undocumented. It's not the responsibility of 
this patch to remove them, but it certainly shouldn't dig the hole deeper.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Weird behavior during CREATE EXTENSION

2016-01-13 Thread Jim Nasby

On 1/12/16 5:00 PM, Tom Lane wrote:

There's certainly room to improve error reporting for extension scripts,
but I think you are heading into a dead end in the name of saving a little
time coding.  I'd suggest looking into an errcontext callback, instead.

Also, there's some technology in CREATE FUNCTION that deals with the fact
that we may be calling the parser on a string different from the original
user command, which might be worth borrowing here --- at least part of
the confusion is that it's evidently reporting a cursor position relative
to the extension script as though it applied to the CREATE EXTENSION.


Are you talking about plpgsql_compile_error_callback()? It looks like it 
does its magic by relying on the plpgsql parser to keep track of where 
it's at.


ISTM part of the goal here should be to show what the actual command was 
that failed (ie: the command in the extension file). I'm guessing the 
way to do that would be to have pg_parse_query() keep the original 
statement in the parse nodes?


I guess if this was easy it would already have been fixed...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting read on SCM upending software and hardware architecture

2016-01-18 Thread Jim Nasby

On 1/18/16 2:47 PM, Peter Geoghegan wrote:

On Mon, Jan 18, 2016 at 12:31 PM, Robert Haas  wrote:

People keep predicting the death of spinning media, but I think
it's not happening to anywhere near as fast as that people think.
Yes, I'm writing this on a laptop with an SSD, and my personal laptop
also has an SSD, but their immediate predecessors did not, and these
are fairly expensive laptops.  And most customers I talk to are still
using spinning disks.  Meanwhile, main memory is getting so large that
even pretty significant databases can be entirely RAM-cached.  So I
tend to think that this is a lot less exciting than people who are not
me seem to think.


I tend to agree that the case for SSDs as a revolutionary technology
has been significantly overstated. This recent article makes some
interesting points:

http://www.zdnet.com/article/what-we-learned-about-ssds-in-2015/

I think it's much more true that main memory scaling (in particular,
main memory capacity) has had a huge impact, but that trend appears to
now be stalling.


My original article doesn't talk about SSDs; it's talking about 
non-volatile memory architectures (quoted extract below). Fusion IO is 
an example of this, and if NVDIMMs become available we'll see even 
faster non-volatile performance.


To me, the most interesting point the article makes is that systems now 
need much better support for multiple classes of NV storage. I agree 
with your point that spinning rust is here to stay for a long time, 
simply because it's cheap as heck. So systems need to become much better 
at moving data between different layers of NV storage so that you're 
getting the biggest bang for the buck. That will remain critical as long 
as SCM's remain 25x more expensive than rust.


Quote from article:



Flash-based storage devices are not new: SAS and SATA SSDs have been 
available for at least the past decade, and have brought flash memory 
into computers in the same form factor as spinning disks. SCMs reflect a 
maturing of these flash devices into a new, first-class I/O device: SCMs 
move flash off the slow SAS and SATA buses historically used by disks, 
and onto the significantly faster PCIe bus used by more 
performance-sensitive devices such as network interfaces and GPUs. 
Further, emerging SCMs, such as non-volatile DIMMs (NVDIMMs), interface 
with the CPU as if they were DRAM and offer even higher levels of 
performance for non-volatile storage.


Today's PCIe-based SCMs represent an astounding three-order-of-magnitude 
performance change relative to spinning disks (~100K I/O operations per 
second versus ~100). For computer scientists, it is rare that the 
performance assumptions that we make about an underlying hardware 
component change by 1,000x or more. This change is punctuated by the 
fact that the performance and capacity of non-volatile memories continue 
to outstrip CPUs in year-on-year performance improvements, closing and 
potentially even inverting the I/O gap.


The performance of SCMs means that systems must no longer "hide" them 
via caching and data reduction in order to achieve high throughput. 
Unfortunately, however, this increased performance comes at a high 
price: SCMs cost 25x as much as traditional spinning disks ($1.50/GB 
versus $0.06/GB), with enterprise-class PCIe flash devices costing 
between three and five thousand dollars each. This means that the cost 
of the non-volatile storage can easily outweigh that of the CPUs, DRAM, 
and the rest of the server system that they are installed in. The 
implication of this shift is significant: non-volatile memory is in the 
process of replacing the CPU as the economic center of the datacenter.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Rethinking TRANSFORM FOR TYPE ...

2016-01-19 Thread Jim Nasby
I'm using the TRANSFORM feature to implement a new data type for python 
(ndarrays from numpy). I'm constantly getting tripped up by forgetting 
to add TRANSFORM FOR TYPE. Worse, the requirement to explicitly state the 
transform means I can't use a polymorphic type.


In the case of adding a new transform for an existing type, current 
behavior makes sense; you'll break all existing functions using the type 
if you just swap the representation out under them. Further, if you are 
pulling in some new extension that uses the same language and type, that 
function will be expecting the old representation, not the new one.


For the case of creating a new data type, I think explicitly requiring 
the TRANSFORM clause makes no sense. It's a bunch of extra work for 
users that adds no benefit.


A simple way to fix this would be to allow simply marking a transform as 
being DEFAULT. If a transform is marked as DEFAULT then it would 
automatically get used.


Perhaps a better way would be allowing for multiple transforms for each 
language and type. That way users aren't stuck with a single 
preconceived notion of how to represent a type. The immediate use I see 
for that is it would allow a transform to be created in something other 
than C, as long as the language you want to use can handle a raw C 
string. That desire might sound silly to a lot of -hackers, but given 
the amount of pain I went through figuring out how to properly marshal 
an ndarray back and forth in C, I sure as hell wish I could have done it 
in python! Since plpythonu understands bytea, I don't see any reason I 
couldn't have.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-22 Thread Jim Nasby

On 1/21/16 4:57 PM, Pavel Stehule wrote:

It is not correct - outside PL/Python you get an Error (a PostgreSQL error
does not have any classes), and the raising class (Error or SPIError) isn't
important. Inside PL/Python you will get SPIError or its successors (based
on SQLcode).


Right. The closest thing we have to error classes is SQLSTATE. If 
someone found a clever way to setup an exception inheritance tree[1] on 
that then maybe different exceptions would make sense. Short of that, I 
don't see it.


[1] There's a hierarchy to the SQL state codes, based on the first 2 
characters. So if there was...


class connection_exception(spi_exception):
    def __init__(self):
        super().__init__('Connection Exception')

class connection_does_not_exist(connection_exception):
    def __init__(self):
        super().__init__('Connection Does Not Exist')

...

to map to the small set of errors below, maybe that would make sense. 
Obviously that would need to be auto-generated. It seems more trouble 
than it's worth though.



Section: Class 08 - Connection Exception

08000    E    ERRCODE_CONNECTION_EXCEPTION                                connection_exception
08003    E    ERRCODE_CONNECTION_DOES_NOT_EXIST                           connection_does_not_exist
08006    E    ERRCODE_CONNECTION_FAILURE                                  connection_failure
08001    E    ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION         sqlclient_unable_to_establish_sqlconnection
08004    E    ERRCODE_SQLSERVER_REJECTED_ESTABLISHMENT_OF_SQLCONNECTION   sqlserver_rejected_establishment_of_sqlconnection
08007    E    ERRCODE_TRANSACTION_RESOLUTION_UNKNOWN                      transaction_resolution_unknown
08P01    E    ERRCODE_PROTOCOL_VIOLATION                                  protocol_violation



--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Releasing in September

2016-01-22 Thread Jim Nasby

On 1/21/16 2:29 AM, Amit Kapila wrote:

I also think there should be some way to give credit to the CFM. If it is
difficult to do anything related to money, then we can enforce that if the
CFM submits any patches for the next CF, those should be prioritised.


Personally, I don't see why we have our scarcest resource doing what is 
essentially a project management task, especially when at least one 
commercial company has offered to donate paid staff time.


I don't think the last CF of 9.6 is the time to experiment, but I think 
we should try using a PM for the first CF of 9.7.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Releasing in September

2016-01-22 Thread Jim Nasby

On 1/20/16 11:40 AM, Tom Lane wrote:

Yeah.  It's certainly unfair if someone's patch doesn't get reviewed,
but there are only 24 hours in a day, and we have a limited pool of
reviewer and committer manpower.  I think we just have to say that
sometimes life is unfair.


I think that's a great way to ensure we shrink the pool of reviewers 
when someone works on a patch and then it goes nowhere. I find it rather 
difficult to get feedback on ideas before I spend the time to code 
something; it's got to be even worse for someone the community doesn't 
know. So if we're going to do this, I think there must be a mechanism 
for a patch idea/design to be approved.


I think we also need to be careful about -hackers being the only place 
feature desirability is measured. There's an entire world of users out 
there that aren't even on -general. If some feature doesn't really 
interest -hackers but there's 50 users that want it and someone willing 
to work on it, ISTM we should make efforts to get it committed.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Releasing in September

2016-01-22 Thread Jim Nasby

On 1/20/16 11:49 PM, Tom Lane wrote:

Michael Paquier  writes:

On Thu, Jan 21, 2016 at 2:30 PM, Peter Geoghegan  wrote:

What benefit does porting sqlsmith for inclusion in core have? I can
only think of costs, including those that you mentioned.



We have automatic buildfarm coverage on many platforms. Perhaps we
could live without that with a buildfarm module though.


I do not think we should necessarily try to include every testing tool
in the core distribution.  What is important is that they be readily
available: easy to find, easy to use, documented, portable.  "Same
license as the PG core code" is not on that list.

An immediately relevant example is that the buildfarm server and client
code aren't in the core distribution, and AFAIR no one has suggested
that they need to be.


Right. What I think would be far more useful is making it easier to 
explicitly test things (better tools + design for test), and something 
akin to buildfarm that will run automated testing on submitted patches.


Put another way: it's stupid that we even ask reviewers to waste time 
running make check. That can be automated. Ideally reviewers shouldn't 
be doing any testing, because the tests that are part of the patch 
should answer every question they would have, but I don't see that 
happening until we have a separate automation-only target that we don't 
care how long it takes to run.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] count_nulls(VARIADIC "any")

2016-01-22 Thread Jim Nasby

On 1/21/16 1:48 PM, Pavel Stehule wrote:

the form of the regression tests is not a hugely significant issue. Jim's
design is a little more transparent, Marko's is maybe a little more
practical. Both make sense in my opinion, and neither has a
significant advantage over the other.


any possible agreement on how these tests should be designed?

simple patch, simple regress tests, so there is no reason for a long wait.


I don't really see how individual tests are more practical (you can 
still cut and paste a table...), but since there's no strong consensus 
either way I'd say it's up to you as author.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Releasing in September

2016-01-22 Thread Jim Nasby

On 1/20/16 4:29 PM, Bruce Momjian wrote:

On Wed, Jan 20, 2016 at 09:12:07AM -0800, Joshua Drake wrote:

I just don't buy the Ubuntu release model for our database.  Ubuntu is
trying to balance hot features vs stability, while we are really focused
on stability, similar to Debian.


I understand but I think we are missing out on an opportunity here.
Notice that the shorter release cycle for STS will actually make
some things easier. Including:

  * Increased test base (just like Fedora/Ubuntu)
  * Increased early adopter testing (that is what early adopting is
really about for us anyway)
  * Decreased concerns about upgrades and ability to extend upgrade status.


I can see LTS working for plugin change, but not server binary changes.


s/LTS/STS/?

In any case, I think JD is onto something here. As someone that focuses 
more on user experience than "deep core" code, I already find yearly 
releases to be quite inconvenient. It's hard to find the motivation to 
make a minor improvement in something (especially knowing how hard it 
will be to get the patch approved) when it won't see the light of day 
for a year, and realistically I won't be able to use it with any 
clients that are in production for 2-3 years.


Given the high level of extensibility that we have, maybe it would be 
good to logically segregate stuff into things that are deeply embedded 
in the "core" code (ie: on-disk format) from things that are much easier 
to change when necessary (like add-on functions or PLs). Things like new 
JSON operators could be released much more rapidly that way.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Releasing in September

2016-01-29 Thread Jim Nasby

On 1/22/16 12:14 PM, Andres Freund wrote:

On 2016-01-22 08:40:28 -0600, Jim Nasby wrote:

Ideally reviewers shouldn't be doing any testing, because the tests
that are part of the patch should answer every question they would
have, but I don't see that happening until we have a separate
automation-only target that we don't care how long it takes to run.


I think that's completely wrong.

Yes, more tests are good, and we need a place for longer running
tests. But assuming that every patch author will create a testsuite that
covers every angle is just about akin to assuming every submitter will
deliver perfect, bug free code. And we know how well that turns out.

I think actively trying to break a feature, and postgres in general, is
one of the most important tasks of reviewers and testers. And with that
I don't mean trying to run "make check". Look e.g. at the tests Jeff
Janes has performed, what the recent plug tests of Tomas Vondra brought
to light, or at what the full page write checker tool of Heikki's
showed.


IIRC Jeff's tests are scripted and obviously the page write checker is 
as well. I don't recall the exact methodology Tomas was using but I 
suspect it could also be scripted if you had a way to pull the plug via 
software (via a power management unit or maybe kill -9 of a VM). All of 
that is stuff that can and should be automated. Presumably it won't ever 
be part of the Makefile tests, but that's fine. Heck, the test scripts 
could stay in completely separate repos.


Where the code lives isn't the issue; it's getting stuff like this 
automated so humans can go back to doing things that can't be automated.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Re-Writes

2016-02-01 Thread Jim Nasby

On 1/31/16 3:26 PM, Jan Wieck wrote:

On 01/27/2016 08:30 AM, Amit Kapila wrote:

operation.  Now why OS couldn't find the corresponding block in
memory is that, while closing the WAL file, we use
POSIX_FADV_DONTNEED if wal_level is less than 'archive' which
lead to this problem.  So with this experiment, the conclusion is that
though we can avoid re-write of WAL data by doing exact writes, but
it could lead to significant reduction in TPS.


POSIX_FADV_DONTNEED isn't the only way how those blocks would vanish
from OS buffers. If I am not mistaken we recycle WAL segments in a round
robin fashion. In a properly configured system, where the reason for a
checkpoint is usually "time" rather than "xlog", a recycled WAL file
written to had been closed and not touched for about a complete
checkpoint_timeout or longer. You must have a really big amount of spare
RAM in the machine to still find those blocks in memory. Basically we
are talking about the active portion of your database, shared buffers,
the sum of all process local memory and the complete pg_xlog directory
content fitting into RAM.


But that's only going to matter when the segment is newly recycled. My 
impression from Amit's email is that the OS was repeatedly reading even 
in the same segment?


Either way, I would think it wouldn't be hard to work around this by 
spewing out a bunch of zeros to the OS in advance of where we actually 
need to write, preventing the need for reading back from disk.
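
Something along these lines, perhaps (a minimal sketch only; the helper 
name and where it would be called from are made up, and I haven't tested 
it against the actual xlog code):

#include <string.h>
#include <unistd.h>

/*
 * Zero-fill a range of a recycled WAL segment ahead of the real writes,
 * so that later partial-block writes can be satisfied from the OS cache
 * instead of forcing the stale blocks to be read back from disk.
 */
static int
prezero_range(int fd, off_t offset, size_t len)
{
    char        zeros[8192];

    memset(zeros, 0, sizeof(zeros));
    while (len > 0)
    {
        size_t      chunk = len < sizeof(zeros) ? len : sizeof(zeros);
        ssize_t     written = pwrite(fd, zeros, chunk, offset);

        if (written < 0)
            return -1;          /* caller checks errno */
        offset += written;
        len -= (size_t) written;
    }
    return 0;
}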


Amit, did you do performance testing with archiving enabled and a no-op 
archive_command?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-01 Thread Jim Nasby
On 1/31/16 7:38 PM, Kouhei Kaigai wrote:
> I'm investigating an SSD-to-GPU direct feature on top of the
> custom-scan interface. It intends to load a bunch of data blocks on
> NVMe-SSD to GPU RAM using P2P DMA, prior to loading the data onto
> CPU/RAM, to preprocess the data to be filtered out. It only makes
> sense if the target blocks are not loaded into CPU/RAM yet, because
> the SSD device is essentially slower than RAM. So, I'd like to have
> a reliable way to check the latest status of the shared buffers, to
> know whether a particular block is already loaded or not.

That completely ignores the OS cache though... wouldn't that be a major
issue?

To answer your direct question, I'm no expert, but I haven't seen any
functions that do exactly what you want. You'd have to pull relevant
bits from ReadBuffer_*. Or maybe a better method would just be to call
BufTableLookup() without any locks and if you get a result > -1 just
call the relevant ReadBuffer function. Sometimes you'll end up calling
ReadBuffer even though the buffer isn't in shared buffers, but I would
think that would be a rare occurrence.
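
Roughly what I'm picturing, as an untested sketch (this is 
backend-internal code, and without the buffer mapping partition lock 
the answer is only ever a hint):

#include "postgres.h"
#include "storage/buf_internals.h"

/*
 * Unlocked peek at the buffer mapping table: returns true if the block
 * currently appears to be in shared_buffers.  Since no buffer mapping
 * partition lock is taken, the buffer can be loaded or evicted the
 * moment after we look, so treat this purely as a hint.
 */
static bool
block_probably_cached(RelFileNode rnode, ForkNumber forkNum,
                      BlockNumber blockNum)
{
    BufferTag   tag;
    uint32      hash;

    INIT_BUFFERTAG(tag, rnode, forkNum, blockNum);
    hash = BufTableHashCode(&tag);

    return BufTableLookup(&tag, hash) >= 0;
}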
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] statistics for shared catalogs not updated when autovacuum is off

2016-02-01 Thread Jim Nasby

On 1/30/16 5:05 PM, Peter Eisentraut wrote:

When autovacuum is off, the statistics in pg_stat_sys_tables for shared
catalogs (e.g., pg_authid, pg_database) never update.  So seq_scan
doesn't update when you read the table, last_analyze doesn't update when
you run analyze, etc.

But when you shut down the server and restart it with autovacuum on, the


What about with autovacuum still off?


updated statistics magically appear right away.  So seq_scan is updated
with the number of reads you did before the shutdown, last_analyze
updates with the time of the analyze you did before the shutdown, etc.
So the data is saved, just not propagated to the view properly.

I can reproduce this back to 9.3, but not 9.2.  Any ideas?


ISTR that there's some code in the autovac launcher that ensures certain 
stats have been loaded from the file into memory; I'm guessing that the 
functions implementing the shared catalog views need something similar.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trigger.sgml

2016-02-01 Thread Jim Nasby

On 1/28/16 8:02 PM, Tatsuo Ishii wrote:

I am working as a volunteer to translate the docs to Japanese. I have 
been having a hard time parsing the following sentence in 
doc/src/sgml/trigger.sgml.


The possibility of surprising outcomes should be considered when there
are both BEFORE INSERT and
BEFORE UPDATE row-level triggers that
both affect a row being inserted/updated (this can still be
problematic if the modifications are more or less equivalent if
they're not also idempotent).


Especially I don't understand this part:

   (this can still be problematic if the modifications are more or less
   equivalent if they're not also idempotent).

It would be great if someone could enlighten me.


I believe the idea here is that thanks to UPSERT you can now get very 
strange behavior if you have BEFORE triggers that aren't idempotent. IE:


CREATE TABLE test(
  a int PRIMARY KEY
);

BEFORE INSERT a = a - 1
BEFORE UPDATE a = a + 1

INSERT (1) -- Results in 0
INSERT (2) -- Results in 1

Now if you try to UPSERT (1), the before insert will give you a=0, which 
conflicts. So then you end up with an UPDATE, which gives you a=1 again. 
Things are even worse when you try to UPSERT (2), because the insert 
conflicts but then you try to update a row that doesn't exist (a=2).


Obviously this is a ridiculous example, but hopefully it shows the problem.
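
If anyone wants to see it happen in a real session, here's the 
shorthand spelled out as actual SQL (just a sketch; the function and 
trigger names are arbitrary):

CREATE TABLE test(a int PRIMARY KEY);

CREATE FUNCTION test_bi() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN NEW.a := NEW.a - 1; RETURN NEW; END $$;

CREATE FUNCTION test_bu() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN NEW.a := NEW.a + 1; RETURN NEW; END $$;

CREATE TRIGGER test_bi BEFORE INSERT ON test
  FOR EACH ROW EXECUTE PROCEDURE test_bi();
CREATE TRIGGER test_bu BEFORE UPDATE ON test
  FOR EACH ROW EXECUTE PROCEDURE test_bu();

INSERT INTO test VALUES (1);            -- stored as 0

-- The BEFORE INSERT trigger turns 1 into 0 again, which conflicts with
-- the existing row, so the UPDATE path fires instead... and the BEFORE
-- UPDATE trigger turns the row back into 1.
INSERT INTO test VALUES (1)
  ON CONFLICT (a) DO UPDATE SET a = EXCLUDED.a;

SELECT * FROM test;                     -- one row: a = 1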
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Add links to commit fests to patch summary page

2016-02-01 Thread Jim Nasby
It would be nice if the patch summary page (ie, [1]) had links to the 
patch's entry in each relevant CF. The specific need I see is if you 
look up a patch in the current CF and it's been moved to the next CF, 
you have to manually go to that CF and search for the patch.


[1] https://commitfest.postgresql.org/9/353/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Access method extendability

2016-02-01 Thread Jim Nasby
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:not tested

There are currently no docs or unit tests. I suspect this patch is still WIP?

create-am.5.patch:
General notes:
Needs catversion bump.

IndexQualInfo and GenericCosts have been moved to src/include/utils/selfuncs.h.

METHOD becomes an unreserved keyword.

generic-xlog.5.patch:
generic_xlog.c: At least needs a bunch of explanatory comments, if not info in 
a README. Since I don't really understand what the design here is, my review is 
just cursory.

static memoryMove() - seems like an overly-generic function name to me...

writeCopyFlagment(), writeMoveFlagment(): s/Fl/Fr/?

bloom-control.5:
README:
+ CREATE INDEX bloomidx ON tbloom(i1,i2,i3) 
+WITH (length=5, col1=2, col2=2, col3=4);
+ 
+ Here, we create bloom index with signature length 80 bits and attributes
+ i1, i2  mapped to 2 bits, attribute i3 - to 4 bits.

It's not clear to me where 80 bits is coming from...
bloom.h:
+ #define BITBYTE   (8)
ISTR seeing this defined somewhere in the Postgres headers; dunno if it's worth 
using that definition instead.

Testing:
I ran the SELECT INTO from the README, as well as CREATE INDEX bloomidx. I then 
ran

insert into tbloom select
(generate_series(1,1000)*random())::int as i1,
(generate_series(1,1000)*random())::int as i2,
(generate_series(1,1000)*random())::int as i3,
(generate_series(1,1000)*random())::int as i4,
(generate_series(1,1000)*random())::int as i5,
(generate_series(1,1000)*random())::int as i6,
(generate_series(1,1000)*random())::int as i7,
(generate_series(1,1000)*random())::int as i8,
(generate_series(1,1000)*random())::int as i9,
(generate_series(1,1000)*random())::int as i10,
(generate_series(1,1000)*random())::int as i11,
(generate_series(1,1000)*random())::int as i12,
(generate_series(1,1000)*random())::int as i13
 from generate_series(1,999);

and kill -9'd the backend. After restart I did VACUUM VERBOSE without issue. I 
ran the INSERT INTO, kill -9 and VACUUM VERBOSE sequence again. This time I got 
an assert:

TRAP: FailedAssertion("!(((bool) (((const void*)((ItemPointer) left) != 
((void*)0)) && (((ItemPointer) left)->ip_posid != 0", File: "vacuumlazy.c", 
Line: 1823)

That line is

lblk = ItemPointerGetBlockNumber((ItemPointer) left);

which does

AssertMacro(ItemPointerIsValid(pointer)), \

which is the assert that's failing.

I've squirreled this install away for now, in case you can't repro this failure.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-02-01 Thread Jim Nasby

On 2/1/16 4:59 PM, Alvaro Herrera wrote:

Masahiko Sawada wrote:


Attached updated version patch.
Please review it.


In pg_upgrade, the "page convert" functionality is there to abstract
rewrites of pages being copied; your patch is circumventing it and
AFAICS it makes the interface more complicated for no good reason.  I
think the real way to do that is to write your rewriteVisibilityMap as a
pageConverter routine.  That should reduce some duplication there.


IIRC this is about the third problem that's been found with pg_upgrade 
in this patch. That concerns me given the potential for disaster if 
freeze bits are set incorrectly.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising the checkpoint_timeout limit

2016-02-01 Thread Jim Nasby

On 2/1/16 6:13 PM, Andres Freund wrote:

I'm not sure what'd actually be a good upper limit. I'd be inclined to
even go to as high as a week or so. A lot of our settings have
upper/lower limits that aren't a good idea in general.


The only reason I can see for the 1 hour limit is to try and prevent 
footguns. I think that's a valid goal, but there should be a way to 
override it. And if we don't want that kind of protection then I'd say 
just yank the upper limit.



I'm also wondering if it'd not make sense to raise the default timeout
to 15min or so. The upper ceiling for that really is recovery time, and
that has really shrunk rather drastically due to faster cpus and
architectural improvements in postgres (bgwriter, separate
checkpointer/bgwriter, restartpoints, ...).


It would be interesting if someone had a large-ish 9.4 or 9.5 install 
that they could test recovery timing on. My suspicion is that as long as 
FPWs are on that you'd generally end up limited by how fast you could 
read WAL unless you exceeded the FS cache. (I'm assuming a BBU and that 
the FS and controller will do a nice job of ordering writes optimally so 
that you'll get performance similar to reads when it's time to fsync.)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] statistics for shared catalogs not updated when autovacuum is off

2016-02-01 Thread Jim Nasby

On 2/1/16 7:20 PM, Peter Eisentraut wrote:

That's probably right.  Even with autovacuum on, the statistics for
shared catalogs do not appear as updated right away.  That is, if you
run VACUUM and then look at pg_stat_sys_tables right away, you will see
the stats for shared catalogs to be slightly out of date until the
minutely autovacuum check causes them to update.

So the problem exists in general, but the autovacuum launcher papers
over it every minute.


I suspect the issue is in backend_read_statsfile(). Presumably the if 
just needs a call to AutoVacuumingActive() added:



/*
 * Autovacuum launcher wants stats about all databases, but a shallow read
 * is sufficient.
 */
if (IsAutoVacuumLauncherProcess())
    pgStatDBHash = pgstat_read_statsfiles(InvalidOid, false, false);
else
    pgStatDBHash = pgstat_read_statsfiles(MyDatabaseId, false, true);
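
The most literal reading of my suggestion would be something like this 
(completely untested, and the shallow all-databases read is probably 
wrong for regular backends, which still need their own database's table 
entries; it's just to show where the check would go):

if (IsAutoVacuumLauncherProcess() || !AutoVacuumingActive())
    pgStatDBHash = pgstat_read_statsfiles(InvalidOid, false, false);
else
    pgStatDBHash = pgstat_read_statsfiles(MyDatabaseId, false, true);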


The interesting thing is that we always start the launcher one time, to 
protect against wraparound, but apparently that path doesn't call 
anything that calls backend_read_statsfile() (which is static).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add links to commit fests to patch summary page

2016-02-01 Thread Jim Nasby

On 2/1/16 6:15 PM, Alvaro Herrera wrote:

Jim Nasby wrote:

It would be nice if the patch summary page (ie, [1]) had links to the
relevant entry in that CF. The specific need I see is if you look up a patch
in the current CF and it's been moved to the next CF you have to manually go
to that CF and search for the patch.


Agreed, I could use that.  In the "status" row, each commitfest entry
(the "2015-11" text) could be a link to that patch in that commitfest.


Yeah, what I was thinking.


(You can actually construct the URL easily just by changing the
commitfest ID, which is the first number in the URL; for example 2016-01
is /8/).


*waits for someone to comment on how surrogate keys are bad*

;P
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add links to commit fests to patch summary page

2016-02-02 Thread Jim Nasby

On 2/2/16 6:35 AM, Alvaro Herrera wrote:

what I want is the link to go to
*that patch's*  page in the other commitfest.  That's also what I
think Jim wants.


+1
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Auditing

2016-02-02 Thread Jim Nasby

On 2/2/16 10:34 AM, Joshua D. Drake wrote:

Auditing is a pretty security/enterprisey-related thing that could do
with the "officially considered to be of PostgreSQL-project standard
and ready for production" rubber-stamp that tends to go along with most
end-user/admin-oriented stuff shipped in the tarball.


Which is exactly why I think .Org needs an official "Extensions" project
which would completely eliminate these arguments. A project team
explicitly for vetting extensions.


Yeah, it's disappointing that PGXN doesn't seem to have really taken 
off. I'm sure a big part of that is the need for even SQL extensions to 
have server access, but I suspect part of it is because it's a separate 
project.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Auditing

2016-02-02 Thread Jim Nasby

On 2/2/16 5:00 AM, Simon Riggs wrote:

Since you've written the email here, I'd ask that you join our community
and use your knowledge and passion to make things happen.


+1. Kudos for speaking up in the first place, but it's clear that right 
now the biggest thing holding Postgres back is lack of reviewers, 
followed by lack of developers. If your company put even 10% of what it 
would pay for Oracle or MSSQL licensing back into the community (either 
via direct employee involvement or by funding development) then auditing 
could have moved a lot faster than it did.


It would also help if the community better publicized ways that 
companies could give back.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Integer overflow in timestamp_part()

2016-02-02 Thread Jim Nasby

On 2/2/16 6:39 PM, Tom Lane wrote:

I'm inclined to think that a good solution would be to create an
artificial restriction to not accept years beyond, say, 100000 AD.
That would leave us with a lot of daylight to not have to worry
about corner-case overflows in timestamp arithmetic.  I'm not sure
though where we'd need to enforce such a restriction; certainly in
timestamp[tz]_in, but where else?


Probably some of the casts (I'd think at least timestamp->timestamptz). 
Maybe timestamp[tz]_recv. Most of the time*pl* functions. :/

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Auditing

2016-02-02 Thread Jim Nasby

On 2/2/16 7:25 PM, Curtis Ruck wrote:

I'm open to testing and evaluating to see if it meets our compliance
requirements, but I am nowhere close to being a C developer, or having
C developers that could actually provide a meaningful review.  One issue
along this thread already pops up, concerning the client_min_messages,
and how other patches in the pipeline for 9.6 would be required to
enable the auditing to meet compliance requirements.


There's other ways you can help besides reviewing. Providing real-world 
use cases helps. Even better is maintaining things on the wiki that 
assist with moving things forward (use cases, discussion/decision 
highlights, really anything that helps move the discussion).



It just seems after reading the mailing list history, that there is a
lack of interest by people with commit authority, even though there is a
decent interest in it from the community, and honestly, no one really
likes auditing, but it's one of those damned-if-you-do (in performance)
and damned-if-you-don't (in legal) things.


Yeah, no one that's volunteering time (as opposed to being paid to work 
on PG) is going to pick up something as unsexy and painful to deal with 
as auditing.



Additionally Robert, given your professional status, you are by no means
an unbiased contributor in this discussion.  Your stance on this matter
shows that you don't necessarily want the open source solution to
succeed in the commercial/compliance required space.  Instead of arguing


I'm sorry, but that's just ridiculous, and I completely agree with 
Robert's initial sentiment: there needs to be a damn good reason for the 
community to pick one specific implementation of something when there 
are competing solutions.



blankly against inclusion, can you at least provide actionable
feedback that, if met, would allow patches of this magnitude in?


It works just like any other patch does: the community has to come to a 
*consensus* that not only is the feature desired and well designed, but 
that the implementation is high quality. I haven't followed the auditing 
discussions closely, but it seems that there are still questions around 
how the feature should work.



I'm personally fine with fiscally rewarding organizations that assist my
customer in succeeding, but its hard to convince my customer to fund
open source, even though they wouldn't be able to do 75% of what they do
without it.  Based on past experience this is the same problem most open
source organizations face, especially when they don't have the marketing engine
that the large commercial players have.


I really don't understand that, given what most of the alternative 
solutions cost. If they balk at putting money towards developing 
Postgres they really need to get a quote for running the same amount of 
MSSQL (let alone Oracle, which is even more expensive).


I do think the community could do a better job of at least encouraging 
companies to fund development. Unfortunately there's always going to be 
some amount of friction here though, because of the question of how to 
allocate funds to the different companies that are involved. Another 
problem is no commercial company can actually guarantee anything will 
make it into community Postgres, and it's very difficult to even 
estimate the amount of effort (read as: what to charge) for getting a 
feature committed.


Commercial development is certainly possible though. 2nd Quadrant was 
able to raise a good amount of money to fund the development of hot 
standby. IIRC that was before sites like kickstarter existed too, so it 
would probably be even easier to do today.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-02-02 Thread Jim Nasby

On 2/2/16 4:52 PM, Alvaro Herrera wrote:

Robert Haas wrote:


The eventual committer is likely to be much happier with this patch if
you guys have achieved consensus among yourselves on the best
approach.

(Disclaimer: The eventual committer won't be me.  I'm not a Python
guy.  But we try to proceed by consensus rather than committer-dictat
around here, when we can.  Obviously the committer has the final say
at some level, but it's better if that power doesn't need to be
exercised too often.)


Actually I imagine that if there's no agreement between author and first
reviewer, there might not *be* a committer in the first place.  Perhaps
try to get someone else to think about it and make a decision.  It is
possible that some other committer is able to decide by themselves but I
wouldn't count on it.


+1.

FWIW, I'd think it's better to not break backwards compatibility, but 
I'm also far from a python expert. It might well be worth adding a 
plpython GUC to control the behavior so that there's a migration path 
forward, or maybe do something like the 'import __future__' that python 
is doing to ease migration to python 3.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add links to commit fests to patch summary page

2016-02-02 Thread Jim Nasby

On 2/2/16 9:46 AM, Magnus Hagander wrote:

I'm not entirely sure what I'd use that for myself, but that's trivial
to implement. Thus, done and published.


FWIW, my use case was actually to go from 
https://commitfest.postgresql.org/8/353/ to 
https://commitfest.postgresql.org/9/353/, which I needed to do in order 
to submit the review.


So, perhaps another enhancement would be for commenting/reviewing to 
always work regardless of what commitfest ID is in the URL. Perhaps Edit 
too. I think the Status dropdown needs to stay specific to the correct 
CF though.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Jim Nasby

On 2/3/16 2:30 PM, Robert Haas wrote:

On Sun, Jan 31, 2016 at 8:33 AM, Vik Fearing  wrote:

Attached is a rebased and revised version of my
idle_in_transaction_session_timeout patch from last year.

This version does not suffer the problems the old one did where it would
jump out of SSL code thanks to Andres' patch in commit
4f85fde8eb860f263384fffdca660e16e77c7f76.

The basic idea is if a session remains idle in a transaction for longer
than the configured time, that connection will be dropped thus releasing
the connection slot and any locks that may have been held by the broken
client.

Added to the March commitfest.


+1 for doing something like this.  Great idea!


Wouldn't it be more sensible to just roll the transaction back and not 
disconnect?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Audit Extension

2016-02-03 Thread Jim Nasby

On 2/3/16 10:36 AM, Robert Haas wrote:

People who are interested in audit are also understandably leery of
downloading code from an untrusted source.  Both PGXN and GitHub are The
Wild West as far as conservative auditors are concerned.

I hate to be rude here, but that's not my problem.  You can put it on
your corporate web site and let people download it from there.  I'm
sure that auditors are familiar with the idea of downloading software
from for-profit companies.  Do they really not use any software from
Microsoft or Apple, for example?  If the problem is that they will
trust the PostgreSQL open source project but not YOUR company, then I
respectfully suggest that you need to establish the necessary
credibility, not try to piggyback on someone else's.


Luckily pgaudit is its own group on Github 
(https://github.com/pgaudit), so it doesn't even have to be controlled 
by a single company. If others care about auditing I would hope that 
they'd contribute code there and eventually become a formal member of 
the pgaudit project.


As for PGXN being an untrusted source, that's something it's in the 
project's best interest to address somehow, perhaps by having 
formally audited extensions. Amazon already has to do this to some 
degree before an extension can be allowed in RDS, and so does Heroku, so 
maybe that would be a starting point.


I think a big reason Postgres got to where it is today is because of 
its superior extensibility, and I think continuing to encourage that 
with formal support for things like PGXN is important.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising the checkpoint_timeout limit

2016-02-03 Thread Jim Nasby

On 2/2/16 10:10 PM, Robert Haas wrote:

Now, you could also set such configuration settings in
a situation where it will not work out well.  But that is true of most
configuration settings.


Yeah, if we're going to start playing parent then I think the first 
thing to do is remove the fsync GUC. The AWS team has done testing that 
shows it to be worthless from a performance standpoint now that we have 
synchronous commit, and it's an extremely large foot-bazooka to have 
laying around.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Jim Nasby

On 2/3/16 4:05 PM, David Steele wrote:

On 2/3/16 4:25 PM, Tom Lane wrote:

Robert Haas  writes:

On Wed, Feb 3, 2016 at 3:41 PM, Jim Nasby  wrote:

Wouldn't it be more sensible to just roll the transaction back and not
disconnect?


I'm not sure how messy this would be in practice.  But if we think that
killing the whole session is not desirable but something we're doing for
expediency, then it would be worth looking into that approach.


I think killing the session is a perfectly sensible thing to do in this
case.  Everything meaningful that was done in the session will be rolled
back - no need to waste resources keeping the connection open.


Except you end up losing stuff like every GUC you've set, existing temp 
tables, etc. For an application that presumably doesn't matter, but for 
a user connection it would be a PITA.


I wouldn't put a bunch of effort into it though. Dropping the connection 
is certainly better than nothing.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Batch update of indexes

2016-02-03 Thread Jim Nasby

On 1/21/16 11:47 AM, Konstantin Knizhnik wrote:

BTW, could you explain, what is the reason to copy data into the
pending list and then copy it again while flushing pending list into
the index? Why not read this data directly from the table? I feel that
I've missed something important here.


No, I do not think that inserted data should be placed in the pending list
and then copied to the main table.
It should be stored directly in the main table and "pending list" is
just some fast, transient index.


That sounds similar to what we would need to support referencing OLD and 
NEW in per-statement triggers: a good way to find everything that was 
changed in a statement.


Or if you will, s/statement/transaction/.

Having that is probably a prerequisite for doing incremental refresh of 
materialized views.


My suspicion is that it would be useful to pre-order the new data before 
trying to apply it to the indexes.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Sanity checking for ./configure options?

2016-02-03 Thread Jim Nasby
I just discovered that ./configure will happily accept '--with-pgport=' 
(I was actually doing =$PGPORT, and didn't realize $PGPORT was empty). 
What you end up with is a compile error in guc.c, with no idea why it's 
broken. Any reason not to have configure or at least make puke if pgport 
isn't valid?
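
Even just rejecting an empty value would have saved me here. A minimal 
sketch of what configure.in could do (no attempt at a full numeric or 
range check):

if test x"$default_port" = x ; then
  AC_MSG_ERROR([--with-pgport requires a non-empty numeric argument])
fi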

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?

2016-02-04 Thread Jim Nasby

On 2/4/16 5:09 PM, David G. Johnston wrote:


What the 2nd para in the documentation is saying is something different:
it is talking about reading all the pg_xlog files (in reverse order),
which is not pg_control, and see what checkpoint records are there, then
figure out which one to use.


Yes, I inferred something that obviously isn't true - that the system
doesn't go hunting for a valid checkpoint to begin recovery from.  While
it does not do so in the case of a corrupted pg_control file I further
assumed it never did.  That would be because the documentation doesn't
make the point of stating that two checkpoint positions exist and that
PostgreSQL will try the second one if the first one proves unusable.
Given the topic of this thread that omission makes the documentation
out-of-date.  Maybe it's covered elsewhere but since this section
addresses locating a starting point I would expect any such description
to be here as well.


Yeah, I think we should fix the docs. Especially since I imagine that if 
you're reading that part of the docs you're probably having a really bad 
day, and bad info won't help you...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?

2016-02-04 Thread Jim Nasby

On 2/4/16 3:37 PM, Andres Freund wrote:

On 2016-02-03 09:28:24 -0500, Robert Haas wrote:

Would we still have some way of forcing the older checkpoint record to
be used if somebody wants to try to do that?


I think currently the best way to force an arbitrary checkpoint to be
used is creating a "custom" backup label. Not that nice.  Not sure if we
need something nice here, I don't really see a frequent need for this.

We could add another option to pg_resetxlog alternatively :/


I guess you'd have to scan through WAL files by hand to find the next 
oldest checkpoint?
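
pg_xlogdump should at least mechanize the hunt, assuming the older 
segments haven't been recycled yet; something like this (the segment 
name is just an example):

pg_xlogdump --rmgr=XLOG pg_xlog/000000010000000000000003 | grep -i checkpoint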


I'm guessing that if this is happening in the field there's a decent 
chance people aren't noticing it, so maybe the best thing for now is to 
turn off the automatic behavior but still have a relatively easy way to 
re-enable it. In case this is more common than we think...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-02-04 Thread Jim Nasby

On 2/4/16 3:13 AM, Catalin Iacob wrote:

Thanks for the overview. Very helpful.


I find existing behaviour for 2, 3 and 4 unlike other Python APIs I've
seen, surprising and not very useful. If I want to log a tuple I can
construct and pass a single tuple, I don't see why plpy.info() needs
to construct it for me. And for the documented, single argument call
nothing changes.


Agreed, that usage is wonky.


The question to Bruce (and others) is: is it ok to change to the new
behaviour illustrated and change meaning for usages like 2, 3 and 4?


If any users have a bunch of code that depends on the old behavior, 
they're going to be rather irritated if we break it. If we want to 
depricate it then I think we need a GUC that allows you to get the old 
behavior back.



If we don't want that, the solution Pavel and I see is introducing a
parallel API named plpy.raise_info or plpy.rich_info or something like
that with the new behaviour and leave the existing functions
unchanged. Another option is some compatibility GUC but I don't think
it's worth the trouble and confusion.


If we're going to provide an alternative API, I'd just do 
plpy.raise(LEVEL, ...).


At this point, my vote would be:

Add a plpython.ereport_mode GUC that has 3 settings: current 
(deprecated) behavior, allow ONLY 1 argument, new behavior. The reason 
for the 1 argument option is it makes it much easier to find code that's 
still using the old behavior. I think it's also worth having 
plpy.raise(LEVEL, ...) as an alternative.


If folks feel that's overkill then I'd vote to leave the existing 
behavior alone and just add plpy.raise(LEVEL, ...).
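
To make that concrete, the sort of call I have in mind (purely 
hypothetical; neither plpy.raise nor the level constants exist today):

plpy.raise(plpy.NOTICE, 'some message',
           detail='longer explanation of what happened',
           hint='what the user might do about it')

That is, the level is always explicit and everything past the message 
is keyword-only, so there's no guessing based on argument count.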

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-04 Thread Jim Nasby
On 2/4/16 12:30 AM, Kouhei Kaigai wrote:
>> 2. A feature to suspend i/o write-out towards particular blocks
>> that are registered by another concurrent backend, until they are
>> unregistered (usually, at the end of P2P DMA).
>> ==> to be discussed.

I think there's still a race condition here though...

A: finds buffer not in shared buffers

B: reads buffer in
B: modifies buffer
B: starts writing buffer to OS

A: makes the call to block write-out, but the write is already in
   progress; thinks writes are now blocked
A: reads corrupted block

Much hilarity ensues.

Or maybe you were just glossing over that part for brevity.

...

> I tried to design a draft of an enhancement to realize the above i/o
> write-out suspend/resume, in as non-invasive a way as possible.
> 
> ASSUMPTION: I intend to implement this feature as part of an extension,
> because these i/o suspend/resume checks are pure overhead for the core
> features unless an extension utilizes them.
> 
> Three functions shall be added:
> 
> extern intGetStorageMgrNumbers(void);
> extern f_smgr GetStorageMgrHandlers(int smgr_which);
> extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);
> 
> As the names suggest, GetStorageMgrNumbers() returns the number of
> storage managers currently installed. It always returns 1 right now.
> GetStorageMgrHandlers() returns the currently configured f_smgr table
> for the supplied smgr_which. It allows extensions to know the current
> configuration of the storage manager, even if another extension has
> already modified it.
> SetStorageMgrHandlers() assigns the supplied 'smgr_handlers' in place
> of the current one.
> If an extension wants to intercept 'smgr_write', it replaces
> 'smgr_write' with its own function and then calls the original
> function, likely mdwrite, from the alternative function.
> 
> In this case, call chain shall be:
> 
>FlushBuffer, and others...
> +-- smgrwrite(...)
>  +-- (extension's own function)
>   +-- mdwrite

ISTR someone (Robert Haas?) complaining that this method of hooks is
cumbersome to use and can be fragile if multiple hooks are being
installed. So maybe we don't want to extend its usage...

I'm also not sure whether this is better done with an smgr hook or a
hook into shared buffer handling...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Batch update of indexes

2016-02-04 Thread Jim Nasby

On 2/4/16 1:37 AM, konstantin knizhnik wrote:

My suspicion is that it would be useful to pre-order the new data before 
trying to apply it to the indexes.

Sorry, but ALTER INDEX is expected to work for all indexes, not only B-Tree, 
and for some of them sorting may not be possible...
But for B-Tree, presorting the inserted data should certainly increase performance.
I will think about it.


I wasn't talking about ALTER INDEX.

My theory is that if you're doing a large DML operation it might be more 
efficient to update an index as a single bulk operation, instead of 
doing it for each tuple.


If you want to do that, then you need an efficient method for finding 
everything that a DML statement changed. That's the exact same thing we 
need to support statement-level triggers being able to reference NEW and 
OLD. It's probably also what we need to support incrementally updated matviews.


If we had such a capability then we could add options to the AM 
infrastructure to allow indexes to support doing bulk maintenance as 
well as per-tuple maintenance (or even support only bulk maintenance...)


I don't think any of that has anything to do with ALTER INDEX.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

