Re: [HACKERS] proposal: schema variables

2017-10-31 Thread Serge Rielau
" Although the syntax of CREATE TEMPORARY TABLE resembles that of the SQL 
standard, the effect is not the same. In the standard, temporary tables are 
defined just once and automatically exist (starting with empty contents) in 
every session that needs them. PostgreSQL instead requires each session to 
issue its own CREATE TEMPORARY TABLE command for each temporary table to be 
used. This allows different sessions to use the same temporary table name for 
different purposes, whereas the standard's approach constrains all instances of 
a given temporary table name to have the same table structure.”
Yeah, that’s a DECLAREd table in my book. No wonder we didn’t link up.
Cheers Serge

Re: [HACKERS] proposal: schema variables

2017-10-31 Thread Serge Rielau
Pavel,

> I can imagine, so the DECLARE command will be introduced as a shortcut for
> CREATE TEMP VARIABLE, but at this moment I would not open this topic. I am
> afraid of bikeshedding and I hope CREATE TEMP VAR is enough.

Language is important because language stays. Your choice of syntax will
outlive your code and possibly yourself.

My 2 cents Serge

Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Serge Rielau
On Fri, Oct 27, 2017 at 2:42 PM, Peter Geoghegan  wrote:
On Fri, Oct 27, 2017 at 2:13 PM, srielau  wrote:
> While the standard may not require a unique index for the ON clause I have
> never seen a MERGE statement that did not have this property. So IMHO this
> is a reasonable restriction.

The Oracle docs on MERGE say nothing about unique indexes or
constraints. They don't even mention them in passing. They do say
"This statement is a convenient way to combine multiple operations. It
lets you avoid multiple INSERT, UPDATE, and DELETE DML statements."

SQL Server's MERGE docs do mention unique indexes, but only in
passing, saying something about unique violations, and that unique
violations *cannot* be suppressed in MERGE, even though that's
possible with other DML statements (with something called
IGNORE_DUP_KEY).

What other systems *do* have this restriction? I've never seen one that did. 
Not clear what you are leading up to here. When I did MERGE in DB2 there was
also no such limitation: "Each row in the target can only be operated on once.
A row in the target can only be identified as MATCHED with one row in the
result table of the table-reference."

What there was, however, was a significant amount of code I had to write and
test to enforce that second sentence. IIRC it involved, in the absence of a
proof that the join could not expand, adding a row_number() over() AS rn to the
target leg of the join and then a row_number() over(partition by rn) > 1 THEN
RAISE_ERROR() check to catch violators. Maybe in PG there is a trivial way to
detect an expanding join and block it at runtime.
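
A sketch of the detection half of that guard in plain SQL (table and column
names made up; DB2's RAISE_ERROR() has no direct PG equivalent, so this merely
surfaces the offending rows):

WITH tgt AS (
  -- tag each target row with a unique number before the join
  SELECT t.*, row_number() OVER () AS rn
  FROM target t
), joined AS (
  SELECT tgt.rn, src.*,
         count(*) OVER (PARTITION BY tgt.rn) AS matches
  FROM tgt JOIN source src ON tgt.key = src.key
)
-- any row surviving this filter means one target row matched more than
-- one source row, i.e. the join expanded
SELECT * FROM joined WHERE matches > 1;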
So the whole point I’m trying to make is that I haven’t seen the need for the 
extra work I had to do once the feature appeared in the wild.
Cheers Serge

Re: [HACKERS] generated columns

2017-09-12 Thread Serge Rielau

> On Sep 12, 2017, at 12:35 PM, Jaime Casanova <jaime.casan...@2ndquadrant.com> 
> wrote:
> 
> also it is interesting that in triggers, both before and after, the
> column is null. That seems reasonable in a before trigger but not
> in an after trigger.

Why is a NULL reasonable for before triggers?
If I create a table with a column with a default and I omit that column on
INSERT, is the column value also NULL in the before trigger? (I hope not)
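
For what it's worth, a quick way to check with a plain column default
(hypothetical names; in stock PostgreSQL the default is applied before the
BEFORE trigger fires, so the trigger sees 42, not NULL):

CREATE TABLE t (a int, b int DEFAULT 42);

CREATE FUNCTION show_b() RETURNS trigger AS $$
BEGIN
  RAISE NOTICE 'b in BEFORE trigger: %', NEW.b;  -- prints 42
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_before BEFORE INSERT ON t
  FOR EACH ROW EXECUTE PROCEDURE show_b();

INSERT INTO t (a) VALUES (1);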

BTW, the original idea behind generated columns was to materialize them.
Reason being to avoid expensive computations of frequently used expressions 
(and to support indexing in the absence of indexes with expressions)
 
You may find the following amusing:
https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/expression_generated_columns?lang=en
 
<https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/expression_generated_columns?lang=en>

Cheers
Serge Rielau
salesforce.com




Re: [HACKERS] Silent bug in transformIndexConstraint

2017-08-23 Thread Serge Rielau
Never mind. I take that back. The problem is not in community code.

Cheers
Serge




[HACKERS] Silent bug in transformIndexConstraint

2017-08-23 Thread Serge Rielau
In parse_utilcmd.c:transformIndexConstraint() resides the following piece of
code:

    /*
     * For UNIQUE and PRIMARY KEY, we just have a list of column names.
     *
     * Make sure referenced keys exist.  If we are making a PRIMARY KEY index,
     * also make sure they are NOT NULL, if possible.  (Although we could leave
     * it to DefineIndex to mark the columns NOT NULL, it's more efficient to
     * get it right the first time.)
     */
    foreach(lc, constraint->keys)
    {
        char       *key = strVal(lfirst(lc));

The strVal() is wrong since lfirst(lc) returns an IndexElem * and not a
Value *, so we should be doing:

    char       *key = ((IndexElem *) lfirst(lc))->name;

The existing code only works by luck because Value.val.str happens to sit at
the same offset as IndexElem.name.

Re: [HACKERS] CTE inlining

2017-05-09 Thread Serge Rielau
On Tue, May 9, 2017 at 12:22 PM, David G. Johnston wrote:
On Tue, May 9, 2017 at 12:15 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
On 5/5/17 08:43, David Rowley wrote:
> How about we get the ball rolling on this in v10 and pull that part
> out of the docs. If anything that'll buy us a bit more wiggle room to
> change this in v11.
>
> I've attached a proposed patch.

If we just tell them that the thing they might have relied on might go
away, without a replacement to suggest, then we're just confusing and
scaring them, no?

> We'd end up suggesting our OFFSET 0 hack as true protection. If they know
> for a fact that their use of a CTE for its barrier properties is not
> supported, they are also more likely to document intentional usage with
> something like "-- CHANGE THIS ONCE VERSION 11 IS RELEASED!!! --", which
> would make finding the call sites that need to add the new "MATERIALIZED"
> keyword much easier.

How about adding MATERIALIZED now (in 10) as a noise word? Give people a
release to switch over before pulling the rug.

Cheers Serge
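
PS: For anyone unfamiliar, the OFFSET 0 hack referenced above fences a subquery
without a CTE (a sketch, names made up):

SELECT *
FROM (SELECT * FROM big_table WHERE expensive_pred(x) OFFSET 0) AS fenced
WHERE cheap_pred(y);

The planner will not flatten a subquery that carries LIMIT/OFFSET, so the
subquery behaves as an optimization fence much like a CTE does today.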

Re: [HACKERS] CTE inlining

2017-05-04 Thread Serge Rielau
In my past life when I was faced with such debates I argued that the number
of customers we are hoping to attract in the future is much bigger than the
number we risk offending. That doesn't mean I wanted to piss everyone off,
just that I didn't want to be held hostage by history.

Cheers Serge
PS: On the opposing side was typically the mainframe crowd, hint hint.

Re: [HACKERS] CTE inlining

2017-05-04 Thread Serge Rielau
> I haven't been keeping close tabs either, but surely we still have to have
> the optimization fence in (at least) all these cases:
>
> * CTE contains INSERT/UPDATE/DELETE
> * CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
>   locked might change)
> * CTE contains volatile functions
>
> I'm willing to write off cases where, e.g., a function should have been
> marked volatile and was not. That's user error and there are plenty
> of hazards of that kind already. But if the optimizer has reason
> to know that discarding the fence might change any query side-effects,
> it mustn't.

Yes! +100

Cheers Serge

Re: [HACKERS] CTE inlining

2017-05-04 Thread Serge Rielau

> On May 4, 2017, at 3:02 AM, Gavin Flower <gavinflo...@archidevsys.co.nz> 
> wrote:
> 
> On 30/04/17 16:28, Tom Lane wrote:
>> Craig Ringer <craig.rin...@2ndquadrant.com> writes:
>>> - as you noted, it is hard to decide when it's worth inlining vs
>>> materializing for CTE terms referenced more than once.
>> [ raised eyebrow... ]  Please explain why the answer isn't trivially
>> "never".
>> 
>> There's already a pretty large hill to climb here in the way of
>> breaking peoples' expectations about CTEs being optimization
>> fences.  Breaking the documented semantics about CTEs being
>> single-evaluation seems to me to be an absolute non-starter.
>> 
>>  regards, tom lane
>> 
>> 
> Could not each CTE be only evaluated once, but restricted (as far as is 
> practicable) to the rows actually needed by the body of the SELECT?
> 
Tom,

Are you worried about semantics or performance?
With proper detection of mutating functions and snapshot isolation I do not see
how a user could detect a "lack of" single evaluation.
As for performance, what are we talking about? An uncorrelated inner of a
nested loop join?

Anyway, it seems to me that there are multiple properties at play here which
are quite orthogonal:

1. Full materialization / slow materialization / pipelining.
   I cannot come up with any scenario where full materialization would be
   beneficial from a performance point of view (which speaks to Gavin's view).
   I can see it from a semantic point of view when order of execution may
   matter (for example with embedded DML and triggers present).
   As soon as semantics are at play, having syntax is absolutely the right
   thing: +1 for MATERIALIZE.
2. Pushing predicates (or other operators) into the CTE.
   All this can ever do is reduce the number of rows being looked at.
   As long as the optimizer is careful not to do what it isn't supposed to do
   in a nested query (push past a mutating function) I don't see the harm.
3. Replicating the CTE to push distinct operators from different consumers.
   Again, this can only be done if there are no mutators or non-deterministic
   operators.

To summarize: a big +1 to preserve the existing behavior with MATERIALIZE and
to set CTEs free by default, with the onus on the optimizer to prove semantic
equivalence.

Cheers
Serge Rielau
Salesforce.com



Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau
On Tue, Apr 25, 2017 at 3:48 PM, Doug Doole wrote:

>> It's not always that simple, at least in postgres, unless you disregard
>> search_path. Consider e.g. cases like
>>
>> CREATE SCHEMA a;
>> CREATE SCHEMA b;
>> CREATE TABLE a.foobar(somecol int);
>> SET search_path = 'b,a';
>> SELECT * FROM foobar;
>> CREATE TABLE b.foobar(anothercol int);
>> SELECT * FROM foobar; -- may not be cached plan from before!
>>
>> it sounds - my memory of DB2 is very faint, and I never used it much -
>> like similar issues could arise in DB2 too?
>
> DB2 does handle this case. Unfortunately I don't know the details of how it
> worked though.
A naive option would be to invalidate anything that depends on table or view
*.FOOBAR. You could probably make it a bit smarter by also requiring that
schema A appear in the path.

While this specific scenario does not arise in DB2, since it uses CURRENT
SCHEMA only for tables (much to my dislike), your example holds for functions
and types, which are resolved by path. For encapsulated SQL (in views,
functions) conservative semantics are enforced by including the timestamp. For
dynamic SQL the problem you describe does exist though, and I think it is
handled in the way Doug describes.

However, as noted by Doug, the topic of plan invalidation is really orthogonal
to normalizing the queries. All it does is provide more opportunities to run
into any pre-existing bugs.

Cheers Serge

PS: I'm just starting to look at the plan invalidation code in PG because we
are dealing with potentially tens of thousands of cached SQL statements. So
these complete wipe-outs or walks of every plan in the cache don't scale.
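
To make the normalization side concrete, a sketch of what literal substitution
buys (illustrative table and column names):

-- Two syntactically different statements...
SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';
SELECT * FROM orders WHERE customer_id = 7  AND status = 'closed';

-- ...normalize to a single cache entry and can share one plan:
SELECT * FROM orders WHERE customer_id = $1 AND status = $2;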

Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau

> On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik  
> wrote:
>> 
>> SELECT 'hello'::CHAR(10) || 'World', 5 + 6;
>> 
>> You can substitute 'hello', 'World', 5, and 6. But not 10.
> 
> I am substituting only string literals. So the query above will be 
> transformed to 
> 
> SELECT $1::CHAR(10) || $2, 5 + 6;
> 
> What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Cheers
Serge



Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau
On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:

> On 25.04.2017 19:12, Serge Rielau wrote:
>> On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik
>> <k.knizh...@postgrespro.ru> wrote:
>>> Another problem is caused by using integer literals in context where
>>> parameters can not be used, for example "order by 1".
>> You will also need to deal with modifiers in types such as VARCHAR(10).
>> Not sure if there are specific functions which can only deal with
>> literals (?) as well.
> Sorry, I do not completely understand how the presence of type modifiers
> can affect string literals used in a query.
> Can you provide me some example?

SELECT 'hello'::CHAR(10) || 'World', 5 + 6;

You can substitute 'hello', 'World', 5, and 6. But not 10.
Also some OLAP syntax like "rows preceding".

It pretty much boils down to whether you can do some shallow parsing rather
than expending the effort to build the parse tree.

Cheers Serge

Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Serge Rielau

> On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> 
> wrote:
> Another problem is caused by using integer literals in context where
> parameters can not be used, for example "order by 1".
You will also need to deal with modifiers in types such as VARCHAR(10). Not 
sure if there are specific functions which can only deal with literals (?) as 
well.
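
An annotated example of which literals are fair game (a sketch; a real
implementation needs more cases than these):

SELECT 'hello'::CHAR(10)   -- 'hello' can become $1; 10 is a typmod and must stay
FROM t
WHERE a = 5                -- 5 can become $2
ORDER BY 1;                -- 1 is a positional reference and must stay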

Doug Doole did this work in DB2 LUW and he may be able to point to more places 
to watch out for semantically.

Generally, in my experience, this feature is very valuable when dealing with 
(poorly designed) web apps that just glue together strings.
Protecting it under a GUC would allow to only do the work if it’s deemed likely 
to help.
Another rule I find useful is to abort any efforts to substitute literals if 
any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the remaining 
literals in on purpose.

A follow-up feature would be to formalize different flavors of peeking,
i.e. can you produce a generic plan, but still recruit the initial set of bind
values/substituted literals to do costing?

Cheers
Serge Rielau
Salesforce.com <http://salesforce.com/>

PS: FWIW, I like this feature.

Re: [HACKERS] Fast Default WIP patch for discussion

2017-04-05 Thread Serge Rielau
Andres,
Yes, I still want to push this in. However I have not had time to get back to 
it. I’m embarrassed to say that I don’t even know where the comments that were 
issued occurred.
Cheers Serge

On Wed, Apr 5, 2017 at 4:47 PM, Andres Freund <and...@anarazel.de> wrote:
Hi Serge,

On 2016-10-28 08:28:11 -0700, Serge Rielau wrote:
> Time for me to dig into that then.

Are you planning to update your POC at some point? This'd be a very
welcome improvement.

Regards,

Andres

Re: [HACKERS] Packages: Again

2017-02-03 Thread Serge Rielau

> Still I am a little bit afraid about nesting - Postgres allows function
> overloading with a specific mechanism of selecting the called function.
> Sometimes it is problematic now, and this structure is flat.
>
> I like the idea of a closer relation between function and schema. This
> means implicit setting of SEARCH_PATH to the function's schema. It is a
> simple but powerful idea.
>
> CREATE FUNCTION mod.func()
> AS $$ $$ MODULE VISIBILITY
>
> can be dynamically executed like
>
> CREATE FUNCTION mod.fun()
> AS $$
>   SET SEARCH_PATH TO "mod,$SEARCH_PATH";
>   ..
> $$;
Ah, yes.
It is my understanding that PG treats functions more like macros:
the search_path for queries inside the function is not fixed to the one in
effect when the function was defined.
This does have advantages in some cases, but it hurts in this case.
What you are describing is syntax to force that in some form or other.
This is actually not hard to do at all.
PG already records the search path in the function cache (I think) and plan
source cache (I know), just not in pg_proc.
If PG supported this functionality it would improve the mapping.
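
For comparison, PostgreSQL can already pin a function's search_path at
definition time via the per-function GUC clause, which approximates what is
sketched above (a minimal example):

-- The SET clause applies for the duration of every call,
-- regardless of the caller's search_path.
CREATE FUNCTION mod.func() RETURNS int
LANGUAGE sql
SET search_path = mod, public
AS $$ SELECT 1 $$;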

> I understand well that missing nested structures is a big complication when
> you do a port from an environment where this functionality is used. But
> nesting usually means a more complex solution - and these days I don't
> believe it is necessary. PLpgSQL doesn't allow nested functions - it is
> strongly reduced against the original PL/SQL - and it is visible only when
> you do a migration from Oracle.

I'm not talking about nested PLpgSQL function definitions. That is indeed
rather advanced, and while I have seen it (and it's supported in DB2 for that
reason) I would not consider it high priority.
Multiple packages in different schemas are common, however, because
applications use schemas for versioning. That's why flattening the package
into a schema, as supported today, does not work in these cases.

Cheers
Serge



Re: [HACKERS] Packages: Again

2017-02-03 Thread Serge Rielau
> DB2 proposes using schemas instead of packages
>
> https://www.ibm.com/developerworks/data/library/techarticle/dm-0711zubiri/

That article by Adriana is 6 years old and was actually written while we
implemented MODULEs for DB2 9.7. So yes, when you don't have modules, schemata
are the way to go, in the same way as when all you have is a hammer everything
is a nail. We considered MODULEs an absolute must to get functional equivalency
to Oracle PL/SQL packages. Also, they wouldn't take up so much space in the
standard if they were deemed to provide no function...

> Now I am working with an Oracle application - and I try to understand Oracle
> developers - an often-seen pattern is using the "Oracle schema" as a database
> - and then packages make sense. But there is no mapping "Oracle schema" =
> "PostgreSQL schema" - and packages are a redundant concept in Postgres (and
> in all DBs where schemas are like namespaces - MSSQL, DB2, MySQL).

I have never heard the claim that a database in Oracle matches a schema in
other DBMSs. In my experience Oracle is well in line on the schema front, with
the exception of the one-to-one relationship between schema and user.
The database-is-really-a-schema mapping is something we (at DB2) traditionally
associated with Sybase and SQL Server migrations, where we saw plenty of small
databases with cross-database queries.

Having said all that, I think schemata are quite powerful in Postgres, not
least because of the clean usage of search_path for all object resolution and
schema being independent of user. They get us a fair way. The main gap remains
the inability to do any sort of nesting - to have two "package-like things"
with the same name. I'm not going to repeat myself on that one and bore
everyone. My thinking on modules is somewhat reflected here:
https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/module?lang=en

Cheers Serge

Re: [HACKERS] Packages: Again

2017-01-13 Thread Serge Rielau
On Fri, Jan 13, 2017 at 4:24 PM, Peter Geoghegan <p...@heroku.com> wrote:
On Fri, Jan 13, 2017 at 3:44 PM, Serge Rielau <se...@rielau.com> wrote:
> And sometimes the community DOES go its own way rather than implementing the 
> standard. For example by rejecting the MERGE statement in favor of another 
> syntax and semantic.

> That's total nonsense.
>
> MERGE isn't UPSERT...

Peter, you are misreading what I wrote. I did not allege that PostgreSQL did
the wrong thing. And you are essentially confirming that there was debate and
that MERGE was deemed to be not what was wanted. So PG, with reason, went with
something not in the standard.
That is precisely my point!

Packages, modules, and nested schemata are all distinct concepts. And the
argument that nested schemata are a no-go because of the standard is invalid
for the same reason that discarding any option other than MERGE, because MERGE
is the only thing in the standard, would have been invalid.

But what irks me in this debate is that any reasoned and detailed argumentation
of the value of the principle itself is shut down with un-reasoned and
un-detailed one-liners. "I'm not convinced" is not an argument. Counterpoints
require content - something starting with "because ...".

If the community does not believe that there is value in a more refined
grouping of objects than a schema, the discussion is DOA. If there is consensus
that there is value, one can debate the best semantics and language covering
it.
Cheers Serge

Re: [HACKERS] Packages: Again

2017-01-13 Thread Serge Rielau
On Fri, Jan 13, 2017 at 2:46 PM, Kevin Grittner <kgri...@gmail.com> wrote:
On Fri, Jan 13, 2017 at 12:35 PM, Serge Rielau <se...@rielau.com> wrote:

> Yes my proposal to nest schemata is “radical” and this community
> is not falling into that camp.
> But there is nothing holy about database.schema.object.attribute

> It is mandated by the U.S. and international SQL standard documents.

Compliance with the standard does not prohibit extensions to the standard.
That is, in fact, how the standard is progressed.
The SQL/PSM standard introduced another "dot": database.schema.module.object.
And sometimes the community DOES go its own way rather than implementing the
standard - for example by rejecting the MERGE statement in favor of another
syntax and semantics.

Cheers Serge

Re: [HACKERS] Packages: Again

2017-01-13 Thread Serge Rielau
On Fri, Jan 13, 2017 at 12:45 PM, Pavel Stehule wrote:

> show a patch and show the advantages against schemas, please.

I have tried to describe the advantage. If the community doesn't agree, that's
fine. I do not see how expending the effort of back-porting a patch (and
getting clearance for that from my employer) would enhance my argument.

Cheers Serge

Re: [HACKERS] Packages: Again

2017-01-13 Thread Serge Rielau

> On Jan 13, 2017, at 11:11 AM, Pavel Stehule  wrote:
> 
> With Postgres we should think much more about the other PLs - there is not
> only PL/pgSQL. So anything we create should be available for any PL. Our
> PLpgSQL is based on a totally different technology design - so some benefits
> of sharing compiled code across databases have little value in Postgres.

Let me stress one last point:
MODULEs are 100% orthogonal to PLpgSQL as implemented by SFDC, and also
orthogonal to SQL PL as implemented by DB2.
Modules can (and, for us, do) contain C functions, for example.
Similarly, when the community provides server-side session variables I have no
doubt they will integrate with MODULEs with very little work.

It's a DDL and name resolution game, predominantly.

Cheers
Serge



Re: [HACKERS] Packages: Again

2017-01-13 Thread Serge Rielau

> On Jan 13, 2017, at 10:23 AM, Pavel Stehule  wrote:
> 
> I do not have a clean feeling about this - I am pretty sure I am afraid of
> schizophrenia between MODULES and SCHEMAS. Nested schemas increase search
> complexity and break the database.schema.object logic.

Yes, my proposal to nest schemata is "radical" and this community is not
falling into that camp.
But there is nothing holy about database.schema.object.attribute.

> Currently almost everything in PostgreSQL PL design is primitive, but that
> means it is pretty simple too.

We have > 30,000 functions with a total of millions of lines of code.

You are describing a self-fulfilling prophecy here.
As long as the community codebase only caters to its existing users you will
not see a change in the usage pattern of the base.

> It is hard to see the advantages of this proposal.

At least I tried :-)

Serge







Re: [HACKERS] Packages: Again

2017-01-13 Thread Serge Rielau
> 
> * A design that can fit in with PostgreSQL
> * Solid benefits beyond "makes life easier for Oracle users" to
> justify each feature/change
> * Funding/time to make it happen
> 
> So far, I haven't seen anyone with one of those, let alone all three.
OK, I'll bite...

* In SFDC's extension of PostgreSQL we nest namespaces.
  This was done before my time here, but it's very stable. It's easy to keep
  merged and not that much code.
  To make the special semantics of these nested namespaces evident, however,
  we leaned on the SQL/PSM standard and call them MODULEs.
  Unlike the standard, our MODULEs share the namespace (no pun intended) with
  regular schemata, which seems practical and limits confusion when
  referencing a module without schema qualification.

  We did extend upon the standard with ALTER MODULE .. ADD [FUNCTION | TYPE |
  ...] syntax (see the sketch at the end of this message).
  Just like few users create a new schema with tables in one statement, no one
  actually creates a module with content in one statement (unless, as in
  Oracle, they have to).
  This was done before my time as well, but parallels what we implemented in
  DB2, for the reasons described earlier in this thread.
  You want to be able to modify members of a module separately.

  Starting with a blank slate, I do wonder whether simply allowing nesting of
  namespaces would be sufficient to achieve the vast majority of the goal.
  I.e. CREATE SCHEMA .
  The rest... follows trivially :-)

* Benefits:
  a) The files on my computer are organized in directories that have more
     than one level of nesting.
     I simply can't imagine having thousands or tens of thousands of objects
     lying around with only one coarse way of subdividing them.
     This is compounded by the desire to version: I want to use the same names
     for objects across multiple concurrently present versions of the schema.
     If I consume the schema for the version, the entire schema for a version
     becomes a flat jumble.
  b) Access control.
     By putting things that belong together actually together in an explicit
     way, I can achieve scoping without having to resort to permissions.
     I can simply postulate that all objects in a module are private unless
     they are published.
     Access control happens at the module level.
     This is no different than library management on your OS:
     you don't chmod the individual entry points!
  c) Scoping.
     Similar to the above, but more related to search path.
     Within a module I can be assured that any unqualified references will
     first resolve within the module.
     No mucking with the search path by anyone will cause me to execute the
     wrong function, resolve to the wrong type, etc.

  Simply put: as long as we agree that users want to implement substantial
  server-side logic, the conclusion that standard programming abstractions
  such as classes and member functions are a boon seems obvious.

  Note that I have been careful not to tie modules too strongly to specific
  types. Conceptually I see nothing wrong with a module containing a table,
  view, etc.
  It's just a bit more "far out" since there is AFAIK no precedent.

* IFF our existing efforts (fast defaults and executor runtime improvements)
  to work with the community are successful, I would happily lobby to at
  least port our module code to the community codebase. We can take it from
  there.
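
The ALTER MODULE syntax mentioned above, roughly in the spirit of what DB2 9.7
provides (a sketch only - none of this is PostgreSQL syntax, and the names are
made up):

CREATE MODULE billing;

-- Members are added one at a time rather than in one giant CREATE:
ALTER MODULE billing ADD FUNCTION tax(amount DECIMAL(10,2))
  RETURNS DECIMAL(10,2) ... ;

-- Members are private by default; publishing one makes it callable
-- from outside the module:
ALTER MODULE billing PUBLISH FUNCTION net_price(amount DECIMAL(10,2))
  RETURNS DECIMAL(10,2) ... ;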

Cheers
Serge Rielau
Salesforce.com


 

Re: [HACKERS] missing optimization - column <> column

2016-12-05 Thread Serge Rielau
Actually there are lots of things that can be done with this sort of theorem
proving, and NULL is a plenty good answer for a filter, just not for a check
constraint. Amongst them: INSERT through UNION ALL for symmetric views, which
can be handy for FDW-partitioned tables.

One such implementation can be found here:
https://www.google.com/patents/US6728952 (apparently expired)
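
To spell out why a <> a can fold to false in a filter but not in a CHECK
constraint:

SELECT NULL <> NULL;             -- yields NULL, not false

-- In a WHERE clause NULL filters exactly like false:
SELECT * FROM foo WHERE a <> a;  -- never returns rows, NULL or not

-- In a CHECK constraint NULL counts as "passes", so folding to false
-- would wrongly start rejecting rows with a NULL a:
CREATE TABLE bar (a int CHECK (a <> a));
INSERT INTO bar VALUES (NULL);   -- accepted today: the CHECK yields NULL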


Cheers
Serge

Salesforce.com

 
> On Dec 5, 2016, at 7:28 AM, Pavel Stehule wrote:
>
> 2016-12-05 16:24 GMT+01:00 Tom Lane:
>> Pavel Stehule writes:
>>> I found some crazy queries in one customer application. These queries are
>>> stupid, but it was a surprise for me that there is no simple optimization
>>>
>>> create table foo(a int);
>>> insert into foo select generate_series(1,10);
>>> analyze foo;
>>> explain select * from foo where a <> a;
>>>
>>> It does a full scan of foo, although it should be replaced by false at
>>> planner time.
>>>
>>> The same issue with the expression a = a .. it can be replaced by true
>>
>> Wrong; those expressions yield NULL for NULL input.  You could perhaps
>> optimize them slightly into some form of is-null test, but it hardly
>> seems worth the planner cycles to check for.
>
> understand
>
>> If you write something like "1 <> 1", it will be folded.
>
> it works, but a <> a not
>
> Regards
>
> Pavel



Re: [HACKERS] Fast Default WIP patch for discussion

2016-10-28 Thread Serge Rielau

> On Oct 28, 2016, at 5:46 AM, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Fri, Oct 21, 2016 at 7:15 PM, Serge Rielau <se...@rielau.com> wrote:
>> Some key design points requiring discussion:
>> 1. Storage of the “exist” (working name) default
>>   Right now the patch stores the default value in its binary form as it
>> would be in the tuple into a BYTEA.
>>   It would be feasible to store the pretty printed literal, however this
>> requires calling the io functions when a
>>   tuple descriptor is built.
> 
> A pretty-printed literal is a terrible idea because input functions
> need not be immutable (e.g. timestamptz).  
I did not know that! Interesting.

> I would have expected a
> pg_node_tree column containing a CONST node, but your bytea thing
> might be OK.
I was debating following the example given by the current DEFAULT,
but figured it's overkill given that no user will ever have to look at it.
And in the end the pretty-printed CONST node is no more readable.

> 
>> 3. Delayed vs. early expansion of the tuples.
>>To avoid having to decide when to copy tuple descriptors with or without
>> constraints and defaults
>>I have opted to expand the tuple at the core entry points.
>>How do I know I have them all? An omission means wrong results!
> 
> Early expansion seems right.  "How do I know I have them all?" - and
> not only all of the current ones but all future ones - seems like a
> core sticking point for this patch.
Yes, there is a certain amount of inherent, hard-to-control risk towards the
future that we must be willing to accept.

> 
>> 4. attisnull()
>>    This routine is used in many places, but to give correct results it must
>> now be accompanied
>>by the tuple descriptor. This becomes moderately messy and it’s not
>> always clear where to get that.
>>Interestingly most usages are related to catalog lookups.
>>Assuming we have no intention to support fast default for catalog tables
>> we could keep using the
>>existing attisnull() api for catalog lookups and use a new version (name
>> tbd) for user tables.
> 
> attisnull is not a thing.  There's heap_attisnull and slot_attisnull.
My apologies.
Obviously slot_attisnull() is a non-issue since slots have tuple descriptors.
It's heap_attisnull() I am struggling with. It's rather popular.
Yet, in the large majority of cases exist defaults do not apply, so digging up
the descriptor is rather wasteful.

>> 5. My head hurts looking at the PK/FK code - it’s not always clear which
>> tuple descriptor belongs
>>to which tuple
> 
> I suggest ibuprofen and a stiff upper lip.
Sound advice.

>> 6. Performance of the expansion code.
>>The current code needs to walk all defaults and then start padding by
>> filling in values.
>>    But the outcome is always the same. We will produce the same payload and
>> the same null map.
>>It would be feasible to cache an “all defaults tuple”, remember the
>> offsets (and VARWIDTH, HASNULL)
>>for each attribute and then simply splice the short and default tuples
>> together.
>>This ought to be faster, but the meta data to cache is not insignificant
>> and the expansion code is messy enough
>>without this already.
> 
> You could experiment to figure out if it makes any difference.
I think my first experiment will be to measure the performance impact of 
“expressed” vs. "non expressed" defaults 
with the current design. 
If that is considered excessive I will explore the more complex alternative.

> 
>> 7. Grooming
>>Obviously we can remove all exist defaults for a table from pg_attribute
>> whenever the table is rewrittem.
>>That’s easy.
> 
> But might cause the table to expand a lot, which would suck.
Well, at this point I must point back to the original goal, which is O(1) ADD
COLUMN performance.
The footprint of a rewritten table is no worse after this patch.
The reduced footprint of a table due to a rewrite avoided on ADD COLUMN is a
boon one must not rely upon.
The existing tuple layout would support an enhancement where trailing defaults
may be avoided.
But I believe we would be better served with a more general approach to table
compression.

> 
>>    But could we/should we keep track of the short tuples and either
>> eliminate them or drop exist defaults once they
>>become obsolete because there is no tuple short enough for them to
>> matter.
> 
> I wouldn't mess with it.
*phew*

> 
>> 8. Do we need to worry about toasted defaults?
> 
> Presumably.
Time for me to dig into that then.

Cheers
Serge RIelau
salesforce.com



Re: [HACKERS] Fast Default WIP patch for discussion

2016-10-26 Thread Serge Rielau
Euler,

Thanks, I was previously told I should post a WIP patch here.
There are too many open issues to be near committing.
Anyway, I have created a patch. 
https://commitfest.postgresql.org/11/843/#
Since this is my first time I do have a couple of questions:
There are entries for a git and a wiki link.
Should I push the patch to some branch, if so which repository?

Thanks
Serge

The form does ask for a git link rather 
> On Oct 26, 2016, at 8:51 AM, Euler Taveira <eu...@timbira.com.br> wrote:
> 
> On 26-10-2016 12:43, Serge Rielau wrote:
>> Posting to this group on a Friday evening was obviously a Bad Idea(tm). :-)
>> 
> Serge, add your patch to the next commitfest [1] so we don't forget to
> review it.
> 
> 
> [1] https://commitfest.postgresql.org/11/
> 
> 
> -- 
>   Euler Taveira   Timbira - http://www.timbira.com.br/
>   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento





Re: [HACKERS] Fast Default WIP patch for discussion

2016-10-26 Thread Serge Rielau
Hackers,

Posting to this group on a Friday evening was obviously a Bad Idea(tm). :-)

Let me clarify that I’m at this point not looking for any detailed review.
Rather I’m hoping to drive towards design decisions on the below.
So any opining would be much appreciated.
> On Oct 21, 2016, at 4:15 PM, Serge Rielau <se...@rielau.com> wrote:
> ...
>  
> Some key design points requiring discussion:
> 1. Storage of the “exist” (working name) default
>Right now the patch stores the default value in its binary form as it 
> would be in the tuple into a BYTEA.
>It would be feasible to store the pretty printed literal, however this 
> requires calling the io functions when a 
>tuple descriptor is built.
> 2. The exist default is cached alongside the “current” default in the tuple 
> descriptor’s constraint structure.
> Seems most natural too me, but debatable.
> 3. Delayed vs. early expansion of the tuples.
> To avoid having to decide when to copy tuple descriptors with or without 
> constraints and defaults
> I have opted to expand the tuple at the core entry points.
> How do I know I have them all? An omission means wrong results!
> 4. attisnull()
> This routine is used in many places, but to give correct results it must
> now be accompanied
> by the tuple descriptor. This becomes moderately messy and it’s not 
> always clear where to get that.
> Interestingly most usages are related to catalog lookups.
> Assuming we have no intention to support fast default for catalog tables 
> we could keep using the
> existing attisnull() api for catalog lookups and use a new version (name 
> tbd) for user tables.
> 5. My head hurts looking at the PK/FK code - it’s not always clear which 
> tuple descriptor belongs 
> to which tuple
> 6. Performance of the expansion code.
> The current code needs to walk all defaults and then start padding by 
> filling in values.
> But the outcome is always the same. We will produce the same payload and
> the same null map.
> It would be feasible to cache an “all defaults tuple”, remember the 
> offsets (and VARWIDTH, HASNULL) 
> for each attribute and then simply splice the short and default tuples 
> together.
> This ought to be faster, but the meta data to cache is not insignificant 
> and the expansion code is messy enough
> without this already.
> 7. Grooming
> Obviously we can remove all exist defaults for a table from pg_attribute 
> whenever the table is rewrittem.
> That’s easy.
> But could we/should we keep track of the short tuples and either 
> eliminate them or drop exist defaults once they 
> become obsolete because there is no tuple short enough for them to 
> matter. 
> 8. Do we need to worry about toasted defaults?
> 
Thanks
Serge Rielau
salesforce.com





[HACKERS] Fast Default WIP patch for discussion

2016-10-21 Thread Serge Rielau
As promised and requested, find attached a work-in-progress patch for fast
defaults. This is my first patch, I hope I used the right format.....
The premise of the feature is to avoid a table rewrite when adding a column
with a default. This is done by remembering the default value in pg_attribute
instead of updating all the rows.
When a tuple is read from disk which is shorter than the tuple descriptor
mandates, the default is "plugged in" either when filling in the datum/null
array of a virtual tuple, or by "expanding" the tuple to a complete heap or
minimal tuple with all attributes.

Example:

CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
INSERT INTO t VALUES (1);
=> t: (1)

ALTER TABLE t ADD COLUMN c1 INT DEFAULT 5;
=> 1. Stores the default expression in pg_attrdef.adbin (no news)
   2. a. Stores the default value (computed at time of ALTER TABLE) in
         pg_attribute.att_existdef
      b. Sets pg_attribute.att_hasexistdef to true

SELECT * FROM t;
=> Builds a virtual tuple using the get*Attr() API (no news), but since the
   tuple has fewer attributes than the table signature, fills in the extra
   attributes from pg_attribute.att_existdef if att_hasexistdef is true
=> (1) becomes (1, 5)

INSERT INTO t VALUES (2);
=> Fills in the DEFAULT for t.c1 from pg_attrdef.adbin as usual: (2, 5)
=> t: (1), (2, 5)

ALTER TABLE t ALTER COLUMN c1 SET DEFAULT -1;
=> 1. Drops the row from pg_attrdef for c1 (no news)
   2. Adds a row to pg_attrdef for c1 DEFAULT -1 (no news)
   3. Leaves pg_attribute.att_existdef alone!!

INSERT INTO t VALUES (3);
=> Fills in the DEFAULT for t.c1 from pg_attrdef.adbin as usual: (3, -1)
=> t: (1), (2, 5), (3, -1)

SELECT * FROM t;
=> (1) becomes (1, 5); result: (1, 5), (2, 5), (3, -1)

ALTER TABLE t ALTER COLUMN c1 DROP DEFAULT;
=> 1. Drops the row from pg_attrdef for c1 (no news)
   2. Leaves pg_attribute.att_existdef alone!!

INSERT INTO t VALUES (4);
=> Fills in the default DEFAULT: (4, NULL)
=> t: (1), (2, 5), (3, -1), (4, NULL)

SELECT * FROM t;
=> (1, 5), (2, 5), (3, -1), (4, NULL)

You can find a new (incomplete) test file fast_default.sql in regress/sql.

Some key design points requiring discussion:
1. Storage of the "exist" (working name) default.
   Right now the patch stores the default value in its binary form, as it
   would be in the tuple, into a BYTEA.
   It would be feasible to store the pretty-printed literal, however this
   requires calling the I/O functions when a tuple descriptor is built.
2. The exist default is cached alongside the "current" default in the tuple
   descriptor's constraint structure.
   Seems most natural to me, but debatable.
3. Delayed vs. early expansion of the tuples.
   To avoid having to decide when to copy tuple descriptors with or without
   constraints and defaults, I have opted to expand the tuple at the core
   entry points.
   How do I know I have them all? An omission means wrong results!
4. attisnull()
   This routine is used in many places, but to give correct results it must
   now be accompanied by the tuple descriptor. This becomes moderately messy
   and it's not always clear where to get that.
   Interestingly most usages are related to catalog lookups.
   Assuming we have no intention to support fast default for catalog tables,
   we could keep using the existing attisnull() API for catalog lookups and
   use a new version (name tbd) for user tables.
5. My head hurts looking at the PK/FK code - it's not always clear which
   tuple descriptor belongs to which tuple.
6. Performance of the expansion code.
   The current code needs to walk all defaults and then start padding by
   filling in values.
   But the outcome is always the same. We will produce the same payload and
   the same null map.
   It would be feasible to cache an "all defaults tuple", remember the offsets
   (and VARWIDTH, HASNULL) for each attribute, and then simply splice the
   short and default tuples together.
   This ought to be faster, but the metadata to cache is not insignificant and
   the expansion code is messy enough without this already.
7. Grooming.
   Obviously we can remove all exist defaults for a table from pg_attribute
   whenever the table is rewritten.
   That's easy.
   But could we/should we keep track of the short tuples and either eliminate
   them or drop exist defaults once they become obsolete because there is no
   tuple short enough for them to matter?
8. Do we need to worry about toasted defaults?

Cheers
Serge Rielau
Salesforce.com

fast_default.patch
Description: Binary data


Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-15 Thread Serge Rielau
This feature was added in DB2 years ago. AFAIK it was not very successful.
Regular compression techniques proved to serve a broader purpose and save
more space.
http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0056482.html

On Sat, Oct 15, 2016 at 09:10, Jim Nasby  wrote:
On 10/9/16 11:02 PM, Corey Huinker wrote:
>
> There's actually another use case here that's potentially extremely
> valuable for warehousing and other "big data": compact
> representation of a default value.
>
>
> I too would benefit from tables having either a default value in the
> event of a NOT-NULL column being flagged null, or a flat-out constant.
>
> This would be a big win in partitioned tables where the partition can
> only hold one value of the partitioning column.

I hadn't thought of that use case... with rowcounts in the billions
becoming pretty common even the cost of a 4 byte enum starts to add up.

> I guess a constant would be a pg_type where the sole value is encoded,
> and the column itself is stored like an empty string.

Not empty string; the storage would look like NULL does today; the
difference being that we'd know that attribute wasn't NULL-able so if
it's marked as being "NULL" it actually means it has the default value.
Though obviously this would only work if the default was a Const, and
you wouldn't be able to change the default without ensuring no rows in
the table were using this trick. But I suspect there's still plenty of
scenarios where the advantage is worth it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Serge Rielau

> On Oct 6, 2016, at 9:20 AM, Tom Lane  wrote:
> 
> Vitaly Burovoy  writes:
>> But what I discover for myself is that we have pg_attrdef separately
>> from the pg_attribute. Why?
> 
> The core reason for that is that the default expression needs to be
> a separate object from the column for purposes of dependency analysis.
> For example, if you have a column whose default is "foo()", then the
> default expression depends on the function foo(), but the column should
> not: if you drop the function, only the default expression ought to
> be dropped, not the column.
> 
> Because of this, the default expression needs to have its own OID
> (to be stored in pg_depend) and it's convenient to store it in a
> separate catalog so that the classoid can identify it as being a
> default expression rather than some other kind of object.
Good to know.
> 
> If we were going to allow these missing_values or creation_defaults
> or whatever they're called to be general expressions, then they would need
> to have their own OIDs for dependency purposes.  That would lead me to
> think that the best representation is to put them in their own rows in
> pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish
> regular defaults from these things.  Or maybe they even need their own
> catalog, depending on whether you think dependency analysis would want
> to distinguish them from regular defaults using just the classed.
> 
> Now, as I just pointed out in another mail, realistically we're probably
> going to restrict the feature to simple constants, which'd mean they will
> depend only on the column's type and can never need any dependencies of
> their own.  So we could take the shortcut of just storing them in a new
> column in pg_attribute.  But maybe that's shortsighted and we'll
> eventually wish we'd done them as full-fledged separate objects.
> 
> But on the third hand ... once one of these is in place, how could you
> drop it separately from the column?  That would amount to a change in the
> column's stored data, which is not what one would expect from dropping
> a separate object.  So maybe it's senseless to think that these things
> could ever be distinct objects.  But that definitely leads to the
> conclusion that they're constants and nothing else.
I cannot follow this reasoning.
Let’s look past what PG does today:
For each row (whether that’s necessary or not) we evaluate the expression, 
compute the value and 
store it in the rewritten table.
We do not record dependencies on the “pedigree” of the value.
It happened to originate from the DEFAULT expression provided with the ADD 
COLUMN,
but that is not remembered anywhere.
All we remember is the value - in each row.

So the only change that is proposed here - when it comes right down to it - is
to remember the value once only (IFF it is provably the same for each row) and
thus avoid the need to rewrite the table.
So I see no reason to impose any restriction other than "the evaluated value
is provably the same for every row".
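
Under that rule, for example (a sketch; stock PostgreSQL of this era rewrites
the table in all three cases):

ALTER TABLE t ADD COLUMN a int DEFAULT 5;              -- constant: qualifies
ALTER TABLE t ADD COLUMN b timestamptz DEFAULT now();  -- stable: one value,
                                                       -- evaluated at ALTER time
ALTER TABLE t ADD COLUMN c float8 DEFAULT random();    -- volatile: differs per
                                                       -- row, must rewrite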

Regarding the location of storage:
I did start off using pg_attrdef, but ran into some snags.
My approach was to add the value as an extra column (rather than an extra
row).
That caused trouble since a SET DEFAULT operation is decomposed into a DROP
and a SET, and preserving the value across such operations did not come
naturally.

If we were to use extra rows instead, that issue would be solved, assuming we
add a "default kind" sort of column.
It would dictate the storage format though, which may be considered overkill
for a constant.

Cheers
Serge



Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Serge Rielau

> On Oct 6, 2016, at 9:01 AM, Tom Lane  wrote:
> 
> BTW, it also occurs to me that there are going to be good implementation
> reasons for restricting it to be a hard constant, not any sort of
> expression.  We are likely to need to be able to insert the value in
> low-level code where general expression evaluation is impractical.
> 
Yes, the padding must happen primarily in the getAttr() routines.
Clearly we do not want to evaluate an expression there.
But what speaks against evaluating the expression before we store it?
After all, we seem to all agree that this only works if the expression
computes to the same constant all the time.

If we do not want to store an "untyped" datum straight in pg_attribute as a
BYTEA (my current approach), we could store the pretty-printed version of the
constant and evaluate that when we build the tuple descriptor.
This happens when we load the relation into the relcache.

Anyway, I'm jumping ahead and it's perhaps best to let the code speak for
itself once I have the WIP patch ready, so we have something concrete to
discuss.

Cheers
Serge






Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Serge Rielau

> On Oct 6, 2016, at 5:25 AM, Vitaly Burovoy <vitaly.buro...@gmail.com> wrote:
> 
> On 10/6/16, Simon Riggs <si...@2ndquadrant.com> wrote:
>> On 6 October 2016 at 04:43, Serge Rielau <se...@rielau.com> wrote:
>>>>> Or should I compose some sort of a design document?
>> 
>> Having read this thread, I'm a little unclear as to what you're
>> writing now, though there's definitely good ideas here.
>> 
>> I think it would be beneficial to write up a single coherent
>> description of this, including behaviour and a small sketch of
>> implementation, just so everyone knows what this is. No design doc,
>> but a summary.
> 
> At the moment I think it can also be a good idea to post the current
> patch as a Proposal or a WIP to get initial feedback.
I can do that - accompanied by a posting-sized overview.

> 
> Yes, it works for stable "now()" but does not work for volatile
> functions like "random()", "uuid_generate_v4()" or default for serial
> columns. The only possible way I can see is to check an expression has
> only "T_Const"s, static and stable functions. In such case the
> expression can be evaluated and the result be saved as a value for
> absented attributes of a tuple. In the other case save NULL there and
> rewrite the table.
Agreed. I think DEFAULT as-is does the job nicely function-wise.
One can always decompose the ADD COLUMN into two steps within the same
transaction if the initial column value for pre-existing rows does not match
the default for new or updated rows.
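
For instance (illustrative; today the first ALTER still triggers a rewrite):

BEGIN;
ALTER TABLE t ADD COLUMN c1 int DEFAULT 0;    -- backfills pre-existing rows with 0
ALTER TABLE t ALTER COLUMN c1 SET DEFAULT 1;  -- rows inserted from now on get 1
COMMIT;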
 
ALTER TABLE just needs a performance boost for large tables where that's
reasonably possible.

>> Which makes me think we should call this missing_value or absent_value
>> so it's clear that it is not a "default"; it is the value we use for
>> rows that do not have any value stored for them.
>
> It is definitely a default for a user, it is not a regular default
> internally.
> I'm not a native speaker; "absent_value" can be mixed up with a NULL.
> As for me the best phrase is "pre-add-column-default", but it is
> impossible to use it as a column name. :-(
> It is still an open question.
I like Tom’s “creation default”. Another one could be “initial default”.
But that, too, can be misread.

Cheers
Serge Rielau
Salesforce.com



Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau

> On Oct 5, 2016, at 5:52 PM, Vitaly Burovoy <vitaly.buro...@gmail.com> wrote:
> 
> On 10/5/16, Serge Rielau <se...@rielau.com> wrote:
>> I want to point out as a minor "extension" that there is no need for the
>> default to be immutable. It is merely required that the default is
>> evaluated at the time of ADD COLUMN and that we then remember the actual
>> value for the exist default, rather than the parsed expression as we do
>> for the "current" default.
> 
> I don't think it will be accepted.
And I wouldn’t expect it to. I had a misunderstanding on what PG did.
Clearly the enhancement must be semantically neutral and be limited to the 
cases where that can be asserted.
So my patch will detect that situation and fall back to the original behavior 
as needed.
> Your patch must be based on a just "master" branch.
> If you haven't seen wiki pages [1], [2] and [3], it is the time to do
> it (and related ones).
> 
>> Or should I compose some sort of a design document?
> 
> Since Tom Lane, Andres Freund and other people agreed "it does work",
> you may post a patch to a new thread and write a short (but clean
> enough) description with a link to the current thread. Examples can be
> seen by links from the CF[4].
Thanks for the guidance.
It will take a bit of time to port to community code and complete QA.
I shall return….
 
Serge Rielau
Salesforce.com





Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau
On Wed, Oct 5, 2016 at 4:19 PM, Vitaly Burovoy <vitaly.buro...@gmail.com> wrote:
On 10/5/16, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
>> Need a better name for the concept, since evidently this name isn't
>> conveying the idea.
>
> Maybe "creation default" would work better? Point being it's the
> default value at the time of column creation.
> Hmm... Personally, for me the original topic name is good enough.

I think the issue is with the term "exist default" rather than the
feature/topic name (?).

> But what I discover for myself is that we have pg_attrdef separately
> from the pg_attribute. Why?
> Is it time to join them? For not-present defaults it would be only
> one bit per row (if we avoid "adsrc" as is recommended), but for a
> separate table it is 11 columns with two indexes now...

In terms of footprint we may be able to remove pg_attrdef. I would consider
that orthogonal to the proposed feature though. The internal representation of
defaults in the tuple descriptor still needs to be a map of sorts.

To comment on Pantelis' SQL Server reference: other vendors such as Oracle and
DB2 also support this feature.
The listed restriction made me loop back to Vitaly's original serial example
(ALTER TABLE t ADD COLUMN c2 serial;) and rethink Tom's strict restriction to
constants.
In PG the proposed feature would also have to be limited to immutable(?)
default expressions to comply with existing behavior, which matches SQL
Server's.
My current patch does not restrict that and thus falsely "fills in" the same
value for all rows.

Cheers Serge Rielau Salesforce.com

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau
On Wed, Oct 5, 2016 at 3:58 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
Andres Freund <and...@anarazel.de> writes:
> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
>>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>>> column in a system table (pg_attribute) as default column values of
>>> the "pre-alter" era. It solves changing of the default expression of
>>> the same column later.

> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.

I think it does work, as long as the "exists default" is immutable.
(For safety, personally, I'd restrict it to be a verbatim constant.)
The point is that you apply that when you are reading a row that has
so few columns that it must predate the original ALTER TABLE ADD COLUMN.
Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
insertions that happen after them, but they don't affect the
interpretation of old rows. And of course all rows inserted after the
ADD COLUMN contain explicit values of the column, so their meaning is
unaffected in any case.
You do need two defaults associated with a column to make this work.
The "exists default" never changes after the column is added. But
in principle, the "exists default" just replaces the NULL value that
we implicitly insert now in such cases. Explained so much better than I could 
do it :-)
I want to point out as a minor “extension” that there is no need for the 
default to be immutable. It is merely required that the default is evaluate at 
time of ADD COLUMN and then we remember the actual value for the exist default, 
rather than the parsed expression as we do for the “current” default. Need a 
better name for the concept, since evidently this name isn't
conveying the idea. By all means. Got anything in mind?
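
A hedged SQL illustration of that evaluate-at-ADD-COLUMN-time idea (assuming 
some table t with an integer pk; values invented):

  ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT now();
  -- exist default: the *value* now() produced at ALTER time,
  --   returned for every pre-existing row
  INSERT INTO t(pk) VALUES (99);
  -- current default: the parsed *expression* now(), re-evaluated for
  --   this insert, so row 99 gets its own insertion timestamp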
Cheers Serge Rielau

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau
On Wed, Oct 5, 2016 at 3:23 PM, Vitaly Burovoy <vitaly.buro...@gmail.com> wrote:
On 10/5/16, Andres Freund <and...@anarazel.de> wrote:
> On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>> Dear Hackers,
>> I’m working on a patch that expands PG’s ability to add columns to a table
>> without a table rewrite (i.e. at O(1) cost) from the
>> nullable-without-default to a more general case.
> If I understand this proposal correctly, altering a column default will
> still have to trigger a rewrite unless there's a previous default?
No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later. Correct and good guess on pg_attribute. That’s where 
it’s living in my proposal.
Cheers Serge

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau
On Wed, Oct 5, 2016 at 2:45 PM, Vitaly Burovoy <vitaly.buro...@gmail.com> wrote:
On 10/5/16, Serge Rielau <se...@rielau.com> wrote:
> Dear Hackers,
>
> I’m working on a patch that expands PG’s ability to add columns to a table
> without a table rewrite (i.e. at O(1) cost) from the
> nullable-without-default to a more general case. E.g.
...
> Is there an interest in principle in the community for this functionality?

Wow! I think it would be great! It also avoids the huge vacuuming that follows
rewriting the table(s).
Just pay attention to corner cases like indexes, statistics, and speed.

Yes, yes, and still analyzing speed.
But I'd like to see solution for more important cases like:
CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial;
SELECT * FROM t ORDER BY pk;
INSERT INTO t(pk) VALUES (4);
SELECT * FROM t ORDER BY pk;

By "solution" I think you mean a semantic change from what it is doing today, 
which is:
* 'now' is fixed to ALTER TABLE time for all pre-existing rows
* serial will fill in the same value for all pre-existing rows
Having different semantics for those would require a rewrite and probably 
different syntax in some form.
This is what my patch does on our PG derivative today:

postgres=# CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
CREATE TABLE
postgres=# INSERT INTO t VALUES (1), (2), (3);
INSERT 0 3
postgres=# ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
ALTER TABLE
postgres=# SELECT * FROM t ORDER BY pk;
 pk |              c1
----+-------------------------------
  1 | 2016-10-05 21:47:58.919194+00
  2 | 2016-10-05 21:47:58.919194+00
  3 | 2016-10-05 21:47:58.919194+00
(3 rows)

postgres=# ALTER TABLE t ADD COLUMN c2 serial;
ALTER TABLE
postgres=# SELECT * FROM t ORDER BY pk;
 pk |              c1               | c2
----+-------------------------------+----
  1 | 2016-10-05 21:47:58.919194+00 |  1
  2 | 2016-10-05 21:47:58.919194+00 |  1
  3 | 2016-10-05 21:47:58.919194+00 |  1
(3 rows)

postgres=# INSERT INTO t(pk) VALUES (4);
INSERT 0 1
postgres=# SELECT * FROM t ORDER BY pk;
 pk |              c1               | c2
----+-------------------------------+----
  1 | 2016-10-05 21:47:58.919194+00 |  1
  2 | 2016-10-05 21:47:58.919194+00 |  1
  3 | 2016-10-05 21:47:58.919194+00 |  1
  4 | 2016-10-05 21:47:58.919194+00 |  2
(4 rows)

P.S.: I really think it is a good idea; just some research is necessary to 
cover the corner cases...

Thanks. This would be my first contribution. I take it I would post a patch 
based on a recent PG 9.6 master for review? Or should I compose some sort of 
a design document?
Cheers Serge Rielau Salesforce.com

[HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau
Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a table 
without a table rewrite (i.e. at O(1) cost) from the nullable-without-default 
case to a more general one. E.g.:

CREATE TABLE T (pk INT NOT NULL PRIMARY KEY);
INSERT INTO T VALUES (1), (2), (3);
ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5;
INSERT INTO T VALUES (4, DEFAULT);
ALTER TABLE T ALTER COLUMN c1 SET DEFAULT 6;
INSERT INTO T VALUES (5, DEFAULT);
SELECT * FROM T ORDER BY pk;
=> (1, 5), (2, 5), (3, 5), (4, 5), (5, 6)

Rows 1-3 have never been updated, yet they know that their value of c1 is 5.
The requirement is driven by large tables for which add column takes too much 
time and/or produces too large a transaction for comfort.
In simplified terms:
* A second “exist default” is computed and stored in the catalogs at the time 
of AT ADD COLUMN.
* The exist default is cached in the tuple descriptor (e.g. in attrdef).
* When one of the getAttr- or copytuple-related routines is invoked, the exist 
default is filled in instead of simple NULL padding whenever the tuple is 
shorter than the requested attribute number (see the sketch below).
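
A minimal C sketch of that fill-in step; the exist-default accessors are 
invented for illustration and are not the actual patch:

  /* Hypothetically invoked when attnum exceeds the number of attributes
   * physically stored in the tuple, i.e. the row predates ADD COLUMN. */
  static Datum
  get_exist_default(TupleDesc tupdesc, int attnum, bool *isnull)
  {
      Form_pg_attribute att = tupdesc->attrs[attnum - 1];

      if (att_has_exist_default(att))     /* invented accessor */
      {
          *isnull = false;
          return att_exist_default(att);  /* value frozen at ADD COLUMN time */
      }

      /* no exist default recorded: fall back to today's NULL padding */
      *isnull = true;
      return (Datum) 0;
  }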

Is there an interest in principle in the community for this functionality?
Cheers Serge Rielau Salesforce.com

Re: [HACKERS] autonomous transactions

2016-09-03 Thread Serge Rielau

> On Sep 3, 2016, at 5:25 AM, Greg Stark <st...@mit.edu> wrote:
> 
> On Sat, Sep 3, 2016 at 12:09 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> So doing autonomous transactions inside a single backend doesn't gain
>> you very much, yet it is an enormously invasive patch to do it that
>> way, not least because it requires you to rewrite locking and
>> deadlocks to make them work correctly when proc is not 1:1 with xid.
>> And as Serge points out it introduces other restrictions that we know
>> about now, perhaps more as well.
> 
> Well using a separate process also requires rewriting locking and
> deadlock detection since a reasonable user might expect that second
> process to have access to data locked in their current transaction. The
> plus side is that we're already facing that issue with parallel query
> so at least it's something that only has to be solved once instead of
> a new problem.
I can’t speak for reasonable users (or persons, in fact ;-)
But… previous implementations of ATs fully expect them to deadlock against 
their parents and not to see uncommitted changes.
So if one’s goal is merely to match competitors’ behavior, that part is a 
non-issue.
I do not recall the single-backend approach having been particularly invasive.
We managed to do the 9.4 -> 9.5 merge with little trouble despite it.

IMHO, solving the problem of passing variables to and from an AT is required 
for the viability of the feature.
How else would the AT know what it’s supposed to do?
Starting an AT within a DDL transaction seems a much narrower use case to me.

Interestingly, despite being supported in PL/SQL on nested BEGIN…END blocks, 
ATs nearly exclusively cover the entire function or trigger in practice.
This usage property can be used to narrow the scope of variable passing to 
function parameters.

Cheers
Serge Rielau
salesforce.com

  

 



Re: [HACKERS] autonomous transactions

2016-08-31 Thread Serge Rielau

> On Aug 31, 2016, at 6:46 AM, Greg Stark  wrote:
> 
> Using a background worker mean that the autonomous transaction can't
> access any state from the process memory. Parameters in plpgsql are a
> symptom of this but I suspect there will be others. What happens if a
> statement timeout occurs during an autonomous transaction? What
> happens if you use a pl language in the autonomous transaction and if
> it tries to use non-transactional information such as prepared
> statements?
> 
+1 on this.
The proposed solution loosely matches what was done in DB2 9.7, and it runs 
into the same complexity. Passing local or session-level variables back and 
forth became a source of grief.

At SFDC PG we have taken a different tack:
1. Gather up all the transaction state that is scattered across global 
variables into a struct.
2. Back up/restore that transaction state when an autonomous transaction is 
invoked.

This allows full access to all non-transactional state.

The downside is that full access also includes uncommitted DDL (shared 
relcache).
So we had to restrict DDL in the parent transaction prior to the spawning of 
the child.

If there is interest in exploring this kind of solution as an alternative I can 
elaborate.
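
To make the shape concrete, a toy C sketch; every name here is invented for 
illustration, and real transaction state is of course far richer than one 
struct:

  #include <string.h>

  /* 1. Formerly-scattered transaction-scoped globals, gathered up. */
  typedef struct XactState
  {
      int xid;            /* current transaction id */
      int nestingLevel;   /* subtransaction depth */
      /* ... and so on for every transaction-scoped global ... */
  } XactState;

  static XactState CurrentXact;   /* one global instead of many */

  /* 2. Back up the parent's state, run the child, restore the parent. */
  static void
  RunAutonomous(void (*body)(void))
  {
      XactState saved = CurrentXact;                 /* back up parent */

      memset(&CurrentXact, 0, sizeof(CurrentXact));  /* child starts fresh */
      body();                                        /* child runs and commits */

      CurrentXact = saved;                           /* restore parent */
  }

Session-level (non-transactional) state is simply never saved or cleared, 
which is what gives the child full access to it.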

Cheers
Serge
 





Re: [HACKERS] [GENERAL] C++ port of Postgres

2016-08-17 Thread Serge Rielau

> On Aug 16, 2016, at 10:16 PM, Craig Ringer  wrote:
> 
> On 17 August 2016 at 09:49, Andres Freund wrote:
>  
> 
> You need to include the files surrounded by extern "C" { }.
> 
> I'd really like to adopt the convention used by many libraries etc of doing 
> this automatically - detecting a c++ compiler in the preprocessor and 
> wrapping in "extern "C"" .
> 
> Having the codebase c++-clean enough to compile with a c++ compiler seems to 
> be the easiest way to maintain that, but means more "extern "C"" droppings in 
> the .c files, not just the headers. Still, pretty ignoreable.
> 
Big +1 here,
Just having community code compilable with a C++ compiler out of the box would 
go a long way.
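
For reference, the conventional wrapper being suggested is simply:

  #ifdef __cplusplus
  extern "C" {
  #endif

  /* ... header declarations ... */

  #ifdef __cplusplus
  }
  #endif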

Beyond that, on my end I have been working with PG now for a year and a half 
and here is a quick list of what I sorely miss from my C++ days:
* Overloading of functions (same as in SQL) keeps naming clean
* Named parameters (same as SQL) keeps code readable
* Adding new function parameters with defaults so I don’t need to pass in NULL, 
0, … at 20 places (again supported in SQL)
* Member functions greatly help organize code
* simple inheritance (as emulated today in node types)

At my old employer we used C++ for the DBMS to varying degrees in different 
components.
The degree was agreed upon in coding standards, so we could pick what we liked 
about C++ and blacklist what we didn’t.
E.g., C-style exception handling was prohibited, and default memory management 
(new) was prohibited.
Instead, new() was overloaded and hooked into the DBMS memory manager.
I see no reason why this couldn’t be done in PG.
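
For instance, a hedged sketch of such a hook: MemoryContextAlloc, 
CurrentMemoryContext, and pfree are PostgreSQL's real C allocator APIs, but 
routing operator new through them like this is purely illustrative:

  extern "C" {
  #include "postgres.h"
  }
  #include <new>

  void *operator new(std::size_t size)
  {
      /* Allocate from the current memory context instead of the C++
       * heap, so objects follow the same lifetime rules as palloc'd
       * memory. */
      return MemoryContextAlloc(CurrentMemoryContext, size);
  }

  void operator delete(void *ptr) noexcept
  {
      if (ptr)
          pfree(ptr);
  }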

I can’t comment on compiling on a Raspberry Pi, but I know that my former DBMS 
code compiled and ran on Windows, Linux, AIX, Sun, HP, and Mac.

But again, just having the community code compile so proprietary (for now) 
enhancements could be written in C++ would be huge.


Cheers
Serge
  

Re: [HACKERS] NewYork Bombing: SQL server bomb proof!!

2001-09-17 Thread Serge Rielau

This is probably the worst post I have seen in a newsgroup ever.
Using this tragedy to promote a product is disgusting.
You are not doing the product you are promoting a favor with this.

I will not comment on the technical content of this post.

Serge


