Re: [HACKERS] Built-in support for a memory consumption ulimit?

2014-06-18 Thread Amit Kapila
On Wed, Jun 18, 2014 at 10:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Amit Kapila amit.kapil...@gmail.com writes:
  Won't it be possible if we convert malloc calls in backend code to
  go through a wrapper?  We already have some precedents of the same,
  like guc_malloc and pg_malloc.

 We do not have control over mallocs done by third-party code
 (think pl/perl for example).

Yeah, mallocs done by third-party code would be difficult to track.
One possibility could be to expose a built-in memory allocator
function, though that would require changes in any third-party code
that wants to use this new feature.  If that's not viable, then we
need to think about OS-specific calls like the one you suggested
above (sbrk(0)), though that solution would also need a portable
API for Windows.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] include_dir catch-22

2014-06-18 Thread Craig Ringer
There is, IMO, a significant oversight with the include_dir feature.

If a distributor wants to enable it by default at initdb time, they
can't just turn it on in postgresql.conf.sample, because initdb will die
when the postgres backend refuses to start over the missing configdir.

Yet the configdir cannot exist in advance, because initdb refuses to run
on a non-empty data directory.

IMO we should just treat the configdir as implicitly if_exists; otherwise,
there needs to be a separate if_exists option.

I'll gladly submit a patch to fix this for 9.4, I just want an opinion
on which way to go first.

(Running initdb and then 'sed'-ing the config file afterwards is not an answer.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


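For reference, the two existing directives already differ in how they treat
a missing target; a sketch of the relevant postgresql.conf lines (the
directory and file names here are examples only):

    include_dir 'conf.d'            # startup fails if conf.d is missing
    include_if_exists 'extra.conf'  # logged and skipped when missing

The proposal above is to give include_dir the same skip-when-missing
behavior, either implicitly or via an if_exists variant.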


[HACKERS] Re: [REVIEW] Re: Re: BUG #9578: Undocumented behaviour for temp tables created inside query language (SQL) functions

2014-06-18 Thread Haribabu Kommi
On Tue, Jun 17, 2014 at 12:30 AM, Abhijit Menon-Sen a...@2ndquadrant.com 
wrote:
 (Cc: to pgsql-bugs dropped.)

 At 2014-03-17 18:24:55 +1100, kommi.harib...@gmail.com wrote:

 *** a/doc/src/sgml/xfunc.sgml
 --- b/doc/src/sgml/xfunc.sgml
 ***
 *** 153,159  SELECT clean_emp();
 --- 153,186 
  (<literal>\</literal>) (assuming escape string syntax) in the body of
  the function (see <xref linkend="sql-syntax-strings">).
 </para>
 +
 +<sect2 id="xfunc-sql-function-parsing-mechanism">
 + <title>Parsing mechanism of a function</title>

 +<indexterm>
 + <primary>function</primary>
 + <secondary>parsing mechanism</secondary>
 +</indexterm>

 I suggest "Catalog changes within functions" instead of the above title.

 + <para>
 +  The body of an SQL function is parsed as if it were a single multi-part
 +  statement and thus uses a constant snapshot of the system catalogs for
 +  every sub-statement therein. Commands that alter the catalog will
 +  likely not work as expected.
 + </para>
 +
 + <para>
 +  For example: Issuing CREATE TEMP TABLE within an SQL function will add
 +  the table to the catalog but subsequent statements in the same function
 +  will not see those additions and thus the temporary table will be
 +  invisible to them.
 + </para>
 +
 + <para>
 +  Thus it is generally advised that <application>PL/pgSQL</> be used,
 +  instead of <acronym>SQL</acronym>, when any catalog visibilities are
 +  required in the same function.
 + </para>
 +</sect2>

 I don't think that much text is warranted. I suggest something like the
 following condensed wording:

 <para>
  The body of an SQL function is parsed as if it were a single
  multi-part statement, using a constant snapshot of the system
  catalogs. The effect of any commands that alter the catalogs
  (e.g. CREATE TEMP TABLE) will therefore not be visible to
  subsequent commands in the function body.
 </para>

 <para>
  The recommended workaround is to use <application>PL/PgSQL</>.
 </para>

 Does that seem sensible to you?

Looks good, thanks for the review.
Updated patch attached.

Regards,
Hari Babu
Fujitsu Australia


sql_functions_parsing_doc_v2.patch
Description: Binary data
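For readers following the thread, a minimal session demonstrating the
behavior being documented (a sketch: check_function_bodies is turned off
so creation-time validation doesn't reject the body first, and the exact
error text may vary):

    SET check_function_bodies = off;

    CREATE FUNCTION make_temp() RETURNS integer AS $$
        CREATE TEMP TABLE t (x integer);
        INSERT INTO t VALUES (1);
        SELECT x FROM t;
    $$ LANGUAGE sql;

    SELECT make_temp();
    -- ERROR:  relation "t" does not exist
    -- The whole body was parsed under one catalog snapshot taken before
    -- CREATE TEMP TABLE took effect, so later statements cannot see t.

    -- The recommended workaround: PL/pgSQL plans each statement at
    -- execution time.  (Run once per session; t persists until exit.)
    CREATE FUNCTION make_temp_ok() RETURNS integer AS $$
    BEGIN
        CREATE TEMP TABLE t (x integer);
        INSERT INTO t VALUES (1);
        RETURN (SELECT x FROM t);
    END
    $$ LANGUAGE plpgsql;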



Re: [HACKERS] include_dir catch-22

2014-06-18 Thread Craig Ringer
On 06/18/2014 03:30 PM, Craig Ringer wrote:
 There is, IMO, a significant oversight with the include_dir feature.
 
 If a distributor wants to enable it by default at initdb time, they
 can't just turn it on in postgresql.conf.sample, because initdb will die
 when the postgres backend refuses to start over the missing configdir.
 
 Yet the configdir cannot exist in advance, because initdb refuses to run
 on a non-empty data directory.
 
 IMO we should just treat the configdir as implicitly if_exists; otherwise,
 there needs to be a separate if_exists option.
 
 I'll gladly submit a patch to fix this for 9.4, I just want an opinion
 on which way to go first.

Oh, and while it is possible with include_if_exists, you get spammy
initdb output like:

creating template1 database in testdb/base/1 ... LOG:  skipping missing
configuration file /home/ec2-user/testdb/some.conf
 2014-06-18 04:05:15.194 EDT LOG:  skipping missing configuration file
/home/ec2-user/testdb/some.conf
ok
initializing pg_authid ... LOG:  skipping missing configuration file
/home/ec2-user/testdb/some.conf
 2014-06-18 04:05:15.894 EDT LOG:  skipping missing configuration file
/home/ec2-user/testdb/some.conf
ok
initializing dependencies ... LOG:  skipping missing configuration file
/home/ec2-user/testdb/some.conf
 2014-06-18 04:05:15.927 EDT LOG:  skipping missing configuration file
/home/ec2-user/testdb/some.conf
ok





-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Selectivity estimation for inet operators

2014-06-18 Thread Emre Hasegeli
I wanted to check the patch one last time and found a bug affecting
the MCV vs MCV part of the join selectivity. Fixed version attached.

Emre Hasegeli e...@hasegeli.com:
 Comparing the lists with each other slows down the function when
 statistics are set to higher values. To avoid this problem I only use
 log(n) values of the lists. It is the first log(n) value for MCV,
 evenly separated values for histograms. In my tests, this optimization
 does not affect the planning time when statistics = 100, but does
 affect the accuracy of the estimation. I can send the version without
 this optimization, if the slowdown with larger statistics targets is not
 a problem that should be solved in the selectivity estimation function.

Also, I changed this from log(n) to sqrt(n). It seems much better
now.

I tried to explain the reasons for processing only some of the values
with more comments. I hope it is understandable.


inet-selfuncs-v5.patch
Description: Binary data



Re: [HACKERS] Minmax indexes

2014-06-18 Thread Heikki Linnakangas

On 06/17/2014 09:16 PM, Andres Freund wrote:

On 2014-06-17 12:14:00 -0400, Robert Haas wrote:

On Tue, Jun 17, 2014 at 12:04 PM, Andres Freund and...@2ndquadrant.com wrote:

Well, I'm not the guy who does things with geometric data, but I don't
want to ignore the significant percentage of our users who are.  As
you must surely know, the GIST implementations for geometric data
types store bounding boxes on internal pages, and that seems to be
useful to people.  What is your reason for thinking that it would be
any less useful in this context?


For me minmax indexes are helpful because they allow generating *small*
'coarse' indexes over large volumes of data. From my pov that's possible
because they don't contain item pointers for every contained row.
That'll imo work well if there are consecutive rows in the table that
can be summarized into one min/max range. That's quite likely to happen
for common applications of a number of scalar datatypes. But the
likelihood of placing sufficiently many rows with very similar bounding
boxes close together seems much lower in practice. And I think
that's generally likely for operations which can't be well represented
as btree opclasses - the substructure this implies inside a Datum will
make correlation between consecutive rows less likely.


Well, I don't know: suppose you're loading geospatial data showing the
location of every building in some country.  It might easily be the
case that the data is or can be loaded in an order that provides
pretty good spatial locality, leading to tight bounding boxes over
physically consecutive data ranges.


Well, it might be doable to correlate them along one axis, but along
both?  That's more complicated... And even alongside one axis you
already get into problems if your geometries are irregularly sized.


Sure, there are cases where it would be useless. But it's easy to 
imagine scenarios where it would work well, where points are loaded in 
clusters and points that are close to each other also end up physically 
close to each other.



A single large polygon will completely destroy indexability for anything
stored physically close by, because suddenly the minmax range will be
huge... So you'll need to cleverly sort for that as well.


That's an inherent risk with minmax indexes: insert a few rows to the 
wrong locations in the heap, and the selectivity of the index degrades 
rapidly.


The main problem with using it for geometric types is that you can't 
easily CLUSTER the table to make the minmax index effective again. But 
there are ways around that.



But I'm not trying to say that we absolutely have to support that kind
of thing; what I am trying to say is that there should be a README or
a mailing list post or some such that says: We thought about how
generic to make this.  We considered A, B, and C.  We rejected C as
too narrow, and A because if we made it that general it would have
greatly enlarged the disk footprint for the following reasons.
Therefore we selected B.


Isn't 'simpler implementation' a valid reason that's already been
discussed onlist? Obviously simpler implementation doesn't trump
everything, but it's one valid reason...
Note that I have zap to do with the design of this feature. I work for
the same company as Alvaro, but that's pretty much it...


Without some analysis (e.g implementing it and comparing), I don't buy 
that it makes the implementation simpler to restrict it in this way. 
Maybe it does, but often it's actually simpler to solve the general case.


- Heikki




Re: [HACKERS] Re: [REVIEW] Re: Re: BUG #9578: Undocumented behaviour for temp tables created inside query language (SQL) functions

2014-06-18 Thread Abhijit Menon-Sen
Thanks, I've marked this ready for committer.

-- Abhijit




[HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict

2014-06-18 Thread Craig Ringer
Hi all

I posted about a possible packaging issue with RHEL 6 PGDG packages for
9.4beta on pgsql-yum-pkg, but things are pretty quiet over there (a
prior mail asking about what happened with moving to git hasn't had a
response).

Given that time is a concern with 9.4, I thought I'd raise it here.

http://www.postgresql.org/message-id/53a10ef9.9060...@2ndquadrant.com

Hopefully it's just PEBKAC.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] datistemplate of pg_database does not behave as per description in documentation

2014-06-18 Thread MauMau

From: Rajeev rastogi rajeev.rast...@huawei.com

Please find the attached patch with only documentation change.


I marked this as ready for committer.  The patch is good because it matches 
the description in the following page:


http://www.postgresql.org/docs/devel/static/manage-ag-templatedbs.html

Regards
MauMau





Re: [HACKERS] Minmax indexes

2014-06-18 Thread Andres Freund
On 2014-06-18 12:18:26 +0300, Heikki Linnakangas wrote:
 On 06/17/2014 09:16 PM, Andres Freund wrote:
 Well, it might be doable to correlate them along one axis, but along
 both?  That's more complicated... And even alongside one axis you
 already get into problems if your geometries are irregularly sized.
 
 Sure, there are cases where it would be useless. But it's easy to imagine
 scenarios where it would work well, where points are loaded in clusters and
 points that are close to each other also end up physically close to each
 other.

 A single large polygon will completely destroy indexability for anything
 stored physically close by, because suddenly the minmax range will be
 huge... So you'll need to cleverly sort for that as well.
 
 That's an inherent risk with minmax indexes: insert a few rows to the
 wrong locations in the heap, and the selectivity of the index degrades
 rapidly.

Sure. But it's fairly normal to have natural clusteredness in many
columns (surrogate keys, date-series types of data). Even if you insert
geometric types in geographic clusters you'll get worse results because
some bounding boxes will be big and such.

And:
 The main problem with using it for geometric types is that you can't easily
 CLUSTER the table to make the minmax index effective again. But there are
 ways around that.

Which are? Sure, you can try stuff like recreating the table, sorting
rows with bounding-box areas above a threshold first, and then going on
to sort by the top left corner of the bounding box. But that'll be
neither builtin, nor convenient, nor perfect. In contrast, a normal
CLUSTER for types with a btree opclass will yield the perfect order.

 Isn't 'simpler implementation' a valid reason that's already been
 discussed onlist? Obviously simpler implementation doesn't trump
 everything, but it's one valid reason...
 Note that I have zap to do with the design of this feature. I work for
 the same company as Alvaro, but that's pretty much it...
 
 Without some analysis (e.g implementing it and comparing), I don't buy that
 it makes the implementation simpler to restrict it in this way. Maybe it
 does, but often it's actually simpler to solve the general case.

So to implement a feature one now has to implement the most generic
variant as a prototype first? Really?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Minmax indexes

2014-06-18 Thread Andres Freund
On 2014-06-17 16:48:07 -0700, Greg Stark wrote:
 On Tue, Jun 17, 2014 at 11:16 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Well, it might be doable to correlate them along one axis, but along
  both?  That's more complicated... And even alongside one axis you
  already get into problems if your geometries are irregularly sized.
  Asingle large polygon will completely destroy indexability for anything
  stored physically close by because suddently the minmax range will be
  huge... So you'll need to cleverly sort for that as well.
 
 I think there's a misunderstanding here, possibly mine. My
 understanding is that a min/max index will always be exactly the same
 size for a given size table. It stores the minimum and maximum value
 of the key for each page. Then you can do a bitmap scan by comparing
 the search key with each page's minimum and maximum to see if that
 page needs to be included in the scan. The failure mode is not that
 the index is large but that a page that has an outlier will be
 included in every scan as a false positive incurring an extra iop.

I just rechecked, and no, it doesn't, by default, store a range for each
page. It's MINMAX_DEFAULT_PAGES_PER_RANGE=128 pages by
default... Haven't checked what's the lowest it can be set to.


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


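At the SQL level, the feature under discussion looks roughly like this (a
sketch: the index method name follows the patch, and exposing
pages_per_range as a storage option is an assumption based on the constant
Andres mentions):

    CREATE TABLE events (ts timestamptz, payload text);
    -- rows appended in ts order keep each range's min/max summary tight
    CREATE INDEX events_ts_minmax ON events USING minmax (ts)
        WITH (pages_per_range = 128);  -- one summary per 128 heap pages

    EXPLAIN SELECT count(*) FROM events
     WHERE ts >= '2014-06-01' AND ts < '2014-06-18';
    -- expected: a bitmap scan that skips every page range whose
    -- [min,max] summary cannot contain matching rows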


Re: [HACKERS] Removing dependency to wsock32.lib when compiling code on WIndows

2014-06-18 Thread MauMau

From: Michael Paquier michael.paqu...@gmail.com

When doing some work on Windows, I noticed that the MSVC specs in
src/tools/msvc use wsock32.lib, which is, as far as I understand, an
old, obsolete version of the Windows socket library. Wouldn't it make
sense to update the specs to build only with ws2_32.lib, as in the
patch attached?


Yes, that makes sense, because wsock32.dll is an obsolete WinSock 1.1 
library, while ws2_32.dll is a successor WinSock 2.0 library which is fully 
compatible with the old one.


Doing grep -lir wsock32 . shows the following files.  Could you modify and 
test these files as well for code cleanup?


./configure
./configure.in
./contrib/pgcrypto/Makefile
./src/interfaces/libpq/Makefile
./src/interfaces/libpq/win32.c
./src/interfaces/libpq/win32.mak
./src/test/thread/README
./src/tools/msvc/Mkvcbuild.pm


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btreecheck extension

2014-06-18 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
 On Tue, Jun 17, 2014 at 5:11 PM, Robert Haas robertmh...@gmail.com wrote:
  Now, we could.  We could come up with an extensible syntax, like this:
 
  CHECK relation [ USING { checktype [ '(' arg [, ...] ')' ] } [, ...] ];
 
 That's what I had in mind. Using the same trick that you came up with
 for EXPLAIN, to avoid grammar bloat and let the am figure out for
 itself what to name the various check types, with a generic default
 check.

I'm fine with having these start out as external tools which are doing
checks, but I've been specifically asked about (and have desired myself
from time-to-time...) an in-core capability to check index/heap/etc
validity.  Folks coming from certain other RDBMS's find it amazing that
we don't have any support for that when what they really want is a
background worker which is just automatically going around doing these
checks.

Now, perhaps we could have the background worker without the syntax for
running these by hand, but I don't particularly like that idea.  Being
able to run these checks by hand is extremely useful and I'd much prefer
to be able to force that than to have some mechanism where I have to
submit a request for a check to another process through a queue or
something along those lines.

Thanks,

Stephen




Re: [HACKERS] Minmax indexes

2014-06-18 Thread Heikki Linnakangas

On 06/18/2014 01:46 PM, Andres Freund wrote:

On 2014-06-18 12:18:26 +0300, Heikki Linnakangas wrote:

The main problem with using it for geometric types is that you can't easily
CLUSTER the table to make the minmax index effective again. But there are
ways around that.


Which are? Sure, you can try stuff like recreating the table, sorting
rows with bounding-box areas above a threshold first, and then going on
to sort by the top left corner of the bounding box.


Right, something like that. Or cluster using some other column that 
correlates with the geometry, like a zip code.



But that'll be neither builtin, nor convenient, nor perfect. In
contrast, a normal CLUSTER for types with a btree opclass will yield
the perfect order.


Sure.

BTW, CLUSTERing by a geometric type would be useful anyway, even without 
minmax indexes.



Isn't 'simpler implementation' a valid reason that's already been
discussed onlist? Obviously simpler implementation doesn't trump
everything, but it's one valid reason...
Note that I have zap to do with the design of this feature. I work for
the same company as Alvaro, but that's pretty much it...


Without some analysis (e.g implementing it and comparing), I don't buy that
it makes the implementation simpler to restrict it in this way. Maybe it
does, but often it's actually simpler to solve the general case.


So to implement a feature one now has to implement the most generic
variant as a prototype first? Really?


Implementing something is a good way to demonstrate what it would look
like. But no, I don't insist on implementing every possible design
whenever a new feature is proposed.


I liked Greg's sketch of what the opclass support functions would be. It 
doesn't seem significantly more complicated than what's in the patch now.


- Heikki


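A concrete version of the workaround Heikki mentions, clustering on a
correlated btree-indexable column (table and column names hypothetical):

    CREATE INDEX buildings_zip_idx ON buildings (zip_code);
    CLUSTER buildings USING buildings_zip_idx;
    -- The heap is physically reordered by zip code; buildings sharing a
    -- zip code are geographically close, so the per-range bounding-box
    -- summaries over the geometry column stay reasonably tight.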


Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Stephen Frost
* Matheus de Oliveira (matioli.math...@gmail.com) wrote:
 Hi Hackers,

Having read only the subject- +1 from me on the idea.  Maybe +1000.

 I was facing a situation where we wanted to set temp_tablespaces to a
 tablespace on an ephemeral disk (yes, an AWS ephemeral disk), and I know
 many users have faced the same situation. Although it seems safe to create
 a tablespace on ephemeral disks if you use it to store only temporary files
 (either created by queries or temp tables), PostgreSQL does not have such
 information, and can't safely prevent a careless user from creating a
 non-temporary relation on this tablespace.

Right.

 So, what you guys think about letting PG know somehow that a tablespace is
 temporary?

PG would need to enforce that it's only used for temporary objects as
well, of course..  Or at least, that was my thinking on this.

 I have taken some small time to make a PoC just to see if this is doable.
 And so I added a new syntax like:
 
 CREATE TABLESPACE spcname [TEMP | TEMPORARY] LOCATION ...
 
 So, if TEMP or TEMPORARY is present, I mark a new column in pg_tablespace
 as true. On every table creation or move to a new tablespace, I just
 check this, and fail if the tablespace is temporary but the
 relpersistence says the table is not.

Not sure about that specific syntax (don't we have SET options now?) but
I do like the general idea.

 The other part is, every time some query or relation is asked to be stored
 on this tablespace, I create (on demand) the PG_TEMP_FILES_DIR inside it
 (again, only if it is temporary).
 
 The attached patch (also on my GitHub, [1]) shows the PoC. For now,
 I'm not worried about the code quality; there is still a lot of work to be
 done there (like ALTER TABLESPACE, better testing, using the relcache,
 etc.), and it is my first hacking on PG (so I'm a newbie). But I'd like to
 hear from you guys whether such a feature is desirable and whether I should
 start working on it for real, and also some thoughts about better ways of
 implementing it.

Yeah, +1 here and it should go into an appropriate commitfest to get a
proper review.  This would be *really* nice to have.

Thanks,

Stephen




Re: [HACKERS] comparison operators

2014-06-18 Thread Stephen Frost
* Andrew Dunstan (and...@dunslane.net) wrote:
 I think I'd rather just say "for many data types" or something along
 those lines, rather than imply that there is some obvious rule that
 users should be able to intuit.

Perhaps with a link to where the information about which exist is
available..?  Or a query to get the list?

In general, I agree with you.

Thanks,

Stephen




Re: [HACKERS] Minmax indexes

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 2:16 PM, Andres Freund and...@2ndquadrant.com wrote:
 But I'm not trying to say that we absolutely have to support that kind
 of thing; what I am trying to say is that there should be a README or
 a mailing list post or some such that says: We thought about how
 generic to make this.  We considered A, B, and C.  We rejected C as
 too narrow, and A because if we made it that general it would have
 greatly enlarged the disk footprint for the following reasons.
 Therefore we selected B.

 Isn't 'simpler implementation' a valid reason that's already been
 discussed onlist? Obviously simpler implementation doesn't trump
 everything, but it's one valid reason...
 Note that I have zap to do with the design of this feature. I work for
 the same company as Alvaro, but that's pretty much it...

It really *hasn't* been discussed on-list.  See these emails,
discussing the same ideas, from 8 months ago:

http://www.postgresql.org/message-id/5249b2d3.6030...@vmware.com
http://www.postgresql.org/message-id/ca+tgmoyscbw-uc8lqv96szignxsuzayqbfdqmk-fgu22hdk...@mail.gmail.com

Now, Alvaro did not respond to those emails, nor did anyone involved
in the development of the feature.  There may be an argument that
implementing that would be too complicated, but Heikki said he didn't
think it would be, and nobody's made a concrete argument as to why
he's wrong (and Heikki knows a lot about indexing).

 Basically, I think Heikki asked a good
 question - which was "could we abstract this more?" - and I can't
 recall seeing a clear answer explaining why we could or couldn't and
 what the trade-offs would be.

 'could we abstract more' imo is a pretty bad design guideline. It's 'is
 there benefit in abstracting more'. Otherwise you end up with way too
 complicated systems.

On the flip side, if you don't abstract enough, you end up being able
to cover only a small set of the relevant use cases, or else you end
up with a bunch of poorly-coordinated tools to cover slightly
different use cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] datistemplate of pg_database does not behave as per description in documentation

2014-06-18 Thread Fujii Masao
On Wed, Jun 18, 2014 at 7:47 PM, MauMau maumau...@gmail.com wrote:
 From: Rajeev rastogi rajeev.rast...@huawei.com

 Please find the attached patch with only documentation change.


 I marked this as ready for committer.  The patch is good because it matches
 the description in the following page:

 http://www.postgresql.org/docs/devel/static/manage-ag-templatedbs.html

ISTM that this patch has already been committed by Bruce. Please see
commit 72590b3a69baaf24d1090a2c2ceb9181be34043e.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-06-18 Thread Rahila Syed
Hello,

I have a few preliminary comments about your patch
Thank you for review comments.

the patch creates src/common/lz4/.travis.yml, which it shouldn't.
Agree. I will remove it.

Shouldn't this use palloc?
palloc() is disallowed in critical sections and we are already in CS while
executing this code. So we use malloc(). It's OK since the memory is
allocated just once per session and it stays till the end.

At the very minimum, I would move the if (!compressed_pages_allocated)
block outside the for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++) loop.

Yes, the code for allocating memory is executed just once for each run
of the program, so it can be taken out of the for loop. But as the
condition
 if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF &&
!compressed_pages_allocated)  evaluates to true
for just the first iteration, I am not sure the change will be a
significant improvement from a performance point of view, except that
it will save a few condition checks.

and
add some comments. I think we could live with that
I will add comments.

If we were going to keep multiple compression algorithms around, I'd be
inclined to create a pg_compress(…, compression_algorithm) function to
hide these return-value differences from the callers, and a
pg_decompress() function that does error checking.

+1 for abstracting out the differences in the return values and arguments
and provide a common interface for all compression algorithms.

if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY)
{
    if (pg_snappy_compress(page, BLCKSZ, buf) == EIO)
        return NULL;
}
else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4)
{
    if (pg_LZ4_compress(page, BLCKSZ, buf) == 0)
        return NULL;
}
else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_PGLZ)
{
    if (pglz_compress(page, BLCKSZ, (PGLZ_Header *) buf,
                      PGLZ_strategy_default) != 0)
        return NULL;
}
else
    elog(ERROR, "Wrong value for compress_backup_block GUC");

/*
 * …comment about insisting on saving at least two bytes…
 */

if (VARSIZE(buf) >= orig_len - 2)
    return NULL;

*len = VARHDRSIZE + VARSIZE(buf);

return buf;
I guess it doesn't matter *too* much if the intention is to have all
these compression algorithms only during development/testing and pick
just one in the end. But the above is considerably easier to read in
the meanwhile.

The above version is better as it avoids the goto statement.

I don't mind the suggestion elsewhere in this thread to use
full_page_compression = y (as a setting alongside
torn_page_protection = x).

This change of GUC is in the ToDo for this patch.


Thank you,

Rahila


On Tue, Jun 17, 2014 at 5:17 PM, Abhijit Menon-Sen a...@2ndquadrant.com
wrote:

 At 2014-06-13 20:07:29 +0530, rahilasye...@gmail.com wrote:
 
  Patch named Support-for-lz4-and-snappy adds support for LZ4 and Snappy
  in PostgreSQL.

 I haven't looked at this in any detail yet, but I note that the patch
 creates src/common/lz4/.travis.yml, which it shouldn't.

 I have a few preliminary comments about your patch.

  @@ -84,6 +87,7 @@ bool    XLogArchiveMode = false;
   char       *XLogArchiveCommand = NULL;
   bool       EnableHotStandby = false;
   bool       fullPageWrites = true;
  +int        compress_backup_block = false;

 I think compress_backup_block should be initialised to
 BACKUP_BLOCK_COMPRESSION_OFF. (But see below.)

  + for (j = 0; j < XLR_MAX_BKP_BLOCKS; j++)
  +     compressed_pages[j] = (char *) malloc(buffer_size);

 Shouldn't this use palloc?

  + * Create a compressed version of a backup block
  + *
  + * If successful, return a compressed result and set 'len' to its length.
  + * Otherwise (ie, compressed result is actually bigger than original),
  + * return NULL.
  + */
  +static char *
  +CompressBackupBlock(char *page, uint32 orig_len, char *dest, uint32 *len)
  +{

 First, the calling convention is a bit strange. I understand that you're
 pre-allocating compressed_pages[] so as to avoid repeated allocations;
 and that you're doing it outside CompressBackupBlock so as to avoid
 passing in the index i. But the result is a little weird.

 At the very minimum, I would move the if (!compressed_pages_allocated)
 block outside the for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++) loop, and
 add some comments. I think we could live with that.

 But I'm not at all fond of the code in this function either. I'd write
 it like this:

 struct varlena *buf = (struct varlena *) dest;

 if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY)
 {
     if (pg_snappy_compress(page, BLCKSZ, buf) == EIO)
         return NULL;
 }
 else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4)
 {
     if (pg_LZ4_compress(page, BLCKSZ, buf) == 0)
         return NULL;
 }
 else if 

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-06-18 Thread Andres Freund
On 2014-06-18 18:10:34 +0530, Rahila Syed wrote:
 Hello ,
 
 I have a few preliminary comments about your patch
 Thank you for review comments.
 
 the patch creates src/common/lz4/.travis.yml, which it shouldn't.
 Agree. I will remove it.
 
 Shouldn't this use palloc?
 palloc() is disallowed in critical sections and we are already in CS while
 executing this code. So we use malloc(). It's OK since the memory is
 allocated just once per session and it stays till the end.

malloc() isn't allowed either. You'll need to make sure all memory is
allocated beforehand.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-06-18 Thread Abhijit Menon-Sen
At 2014-06-18 18:10:34 +0530, rahilasye...@gmail.com wrote:

 palloc() is disallowed in critical sections and we are already in CS
 while executing this code. So we use malloc().

Are these allocations actually inside a critical section? It seems to me
that the critical section starts further down, but perhaps I am missing
something.

Second, as Andres says, you shouldn't malloc() inside a critical section
either; and anyway, certainly not without checking the return value.

 I am not sure if the change will be a significant improvement from
 performance point of view except it will save few condition checks.

Moving that allocation out of the outer for loop it's currently in has
*nothing* to do with performance; it's about making the code easier to
read.

-- Abhijit




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-06-18 Thread Abhijit Menon-Sen
At 2014-06-18 18:25:34 +0530, a...@2ndquadrant.com wrote:

 Are these allocations actually inside a critical section? It seems to me
 that the critical section starts further down, but perhaps I am missing
 something.

OK, I was missing that XLogInsert() itself can be called from inside a
critical section. So the allocation has to be moved somewhere else
altogether.

-- Abhijit




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-06-18 Thread Pavan Deolasee
On Wed, Jun 18, 2014 at 6:25 PM, Abhijit Menon-Sen a...@2ndquadrant.com
wrote:

 At 2014-06-18 18:10:34 +0530, rahilasye...@gmail.com wrote:
 
  palloc() is disallowed in critical sections and we are already in CS
  while executing this code. So we use malloc().

 Are these allocations actually inside a critical section? It seems to me
 that the critical section starts further down, but perhaps I am missing
 something.


ISTM XLogInsert() itself is called from other critical sections. See
heapam.c for example.


 Second, as Andres says, you shouldn't malloc() inside a critical section
 either; and anyway, certainly not without checking the return value.


I was actually surprised to see Andres's comment. But he is right. OOM
inside a CS will result in a PANIC. I wonder if we can, or if we really
do, enforce that though. The code within #ifdef WAL_DEBUG in the same
function is surely doing a palloc(). That will be caught since there is
an assert inside palloc(). Maybe nobody has tried building with
WAL_DEBUG since that assert was added.

Maybe Rahila can move that code to InitXLogAccess or, even better, check
the malloc() return value and proceed without compression. There is code
in snappy.c which will need similar handling, if we decide to finally add
that to core.

 I am not sure if the change will be a significant improvement from a
  performance point of view except it will save a few condition checks.

 Moving that allocation out of the outer for loop it's currently in is
 *nothing* to do with performance, but about making the code easier to
 read.


+1.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Proposal for CSN based snapshots

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 9:00 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 06/18/2014 12:41 AM, Robert Haas wrote:
 On Mon, Jun 16, 2014 at 12:58 AM, Craig Ringer cr...@2ndquadrant.com wrote:
  On 05/30/2014 11:14 PM, Heikki Linnakangas wrote:
  Yeah. To recap, the failure mode is that if the master crashes and
  restarts, the transaction becomes visible in the master even though it
  was never replicated.
 
  Wouldn't another pg_clog bit for the transaction be able to sort that out?
 How?

 A flag to indicate that the tx is locally committed but hasn't been
 confirmed by a streaming synchronous replica, so it must not become
 visible until the replica confirms it or SR is disabled.

 Then scan pg_clog on start / replica connect and ask the replica to
 confirm local commit for those tx's.

 No?

No.  Otherwise, one of those bits could get changed after a backend
takes a snapshot and before it finishes using it - so that the
transaction snapshot is in effect changing underneath it.  You could
avoid that by memorizing the contents of CLOG when taking a snapshot,
but that would defeat the whole purpose of CSN-based snapshots, which
is to make the snapshot small and fixed-size.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WAL replay bugs

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 5:40 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Jun 18, 2014 at 1:40 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jun 2, 2014 at 8:55 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 I'm not sure if this is reasonably possible, but one thing that would
 make this tool a whole lot easier to use would be if you could make
 all the magic happen in a single server.  For example, suppose you had
 a background process that somehow got access to the pre and post
 images for every buffer change, and the associated WAL record, and
 tried applying the WAL record to the pre-image to see whether it got
 the corresponding post-image.  Then you could run 'make check' or so
 and afterwards do something like psql -c 'SELECT * FROM
 wal_replay_problems()' and hopefully get no rows back.
 So your point is to have a 3rd independent server in the process that
 would compare images taken from a master and its standby? Seems to
 complicate the machinery.

No, I was trying to get it down from 2 servers to 1, not 2 servers up to 3.

 Don't get me wrong, having this tool at all sounds great.  But I think
 to really get the full benefit out of it we need to be able to run it
 in the buildfarm, so that if people break stuff it gets noticed
 quickly.
 The patch I sent includes a regression test suite that makes the tests
 easy to run: it's only a matter of actually running make check in the
 contrib directory containing the binary able to compare buffer captures
 between a master and a standby.

Cool!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] comparison operators

2014-06-18 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Andrew Dunstan (and...@dunslane.net) wrote:
  I think I'd rather just say "for many data types" or something along
 those lines, rather than imply that there is some obvious rule that
 users should be able to intuit.

 Perhaps with a link to where the informaiton about which exist is
 available..?  Or a query to get the list?

Queries for this sort of thing are covered in the chapter about index
opclasses.  The basic query would be like

select opcintype::regtype from pg_opclass where opcmethod = 403 and opcdefault;

but I'm not sure if this is completely useful; it's not obvious for
example that the text opclass is also used for varchar.  Another
point is that some of the operators aren't named in the conventional
way.

regards, tom lane


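Tom's varchar caveat is easy to see concretely (403 being the btree
access method's OID, as in the query above):

    select opcname from pg_opclass
     where opcmethod = 403 and opcdefault
       and opcintype = 'varchar'::regtype;
    -- returns no rows: varchar has no default btree opclass of its own;
    -- it is binary-coercible to text, so text's default opclass is used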


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 9:45 PM, Brightwell, Adam
adam.brightw...@crunchydatasolutions.com wrote:
 I absolutely appreciate all of the feedback that has been provided.  It has
 been educational.  To your point above, I started putting together a wiki
 page, as Stephen has spoken to, that is meant to capture these concerns and
 considerations as well as to capture ideas around solutions.

 https://wiki.postgresql.org/wiki/Row_Security_Considerations

 This page is obviously not complete, but I think it is a good start.
 Hopefully this document will help to continue the conversation and assist in
 addressing all the concerns that have been brought to the table.  As well, I
 hope that this document serves to demonstrate our intent and that we *are*
 taking these concerns seriously.  I assure you that as one of the
 individuals who is working towards the acceptance of this feature/patch, I
 am very much concerned about meeting the expected standards of quality and
 security.

Cool, thanks for weighing in.  I think that page is a good start.  An
item that I think should be added there is the potential overlap
between security_barrier views and row-level security.  How can we
reuse code (and SQL syntax?) for existing features like WITH CHECK
OPTION instead of writing new code (and inventing new syntax) for very
similar concepts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


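For context, a minimal sketch of the existing 9.4 feature Robert refers
to (table and column names hypothetical):

    CREATE VIEW my_accounts AS
        SELECT * FROM accounts WHERE owner = current_user
        WITH CHECK OPTION;
    -- Reads through the view are filtered by the WHERE clause, and
    -- inserts/updates through the view are rejected unless the new row
    -- stays visible through it: conceptually close to what row-level
    -- security must enforce on writes.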


Re: [HACKERS] Proposal for CSN based snapshots

2014-06-18 Thread Craig Ringer
On 06/18/2014 09:12 PM, Robert Haas wrote:
 No.  Otherwise, one of those bits could get changed after a backend
 takes a snapshot and before it finishes using it - so that the
 transaction snapshot is in effect changing underneath it.  You could
 avoid that by memorizing the contents of CLOG when taking a snapshot,
 but that would defeat the whole purpose of CSN-based snapshots, which
 is to make the snapshot small and fixed-size.

Ah.

Thank you. I appreciate the explanation.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Cube distance patch?

2014-06-18 Thread hubert depesz lubaczewski
Hi,

In September 2013, there was a patch sent by Stas Kelvich (
http://www.postgresql.org/message-id/9e07e159-e405-41e2-9889-a04f534fc...@gmail.com)
that adds indexable kNN searches to the cube contrib module.

What is needed so that it could get committed?

Regards,

depesz


Re: [HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict

2014-06-18 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 I posted about a possible packaging issue with RHEL 6 PGDG packages for
 9.4beta on pgsql-yum-pkg, but things are pretty quiet over there (a
 prior mail asking about what happened with moving to git hasn't had a
 response).
 http://www.postgresql.org/message-id/53a10ef9.9060...@2ndquadrant.com

Why in the world is PGDG installing something named libevent at all?

regards, tom lane




Re: [HACKERS] datistemplate of pg_database does not behave as per description in documentation

2014-06-18 Thread MauMau

From: Fujii Masao masao.fu...@gmail.com

On Wed, Jun 18, 2014 at 7:47 PM, MauMau maumau...@gmail.com wrote:
I marked this as ready for committer.  The patch is good because it
matches the description in the following page:

http://www.postgresql.org/docs/devel/static/manage-ag-templatedbs.html


ISTM that this patch has already been committed by Bruce. Please see
commit 72590b3a69baaf24d1090a2c2ceb9181be34043e.


Oh, the devel doc certainly reflects the change:

http://www.postgresql.org/docs/devel/static/catalog-pg-database.html

I marked this as committed.

I hope Magnus-san's enhancements to the CommitFest app will enable the 
CommitFest entry to be automatically updated at git commit.


Regards
MauMau






Re: [HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict

2014-06-18 Thread Andrew Dunstan


On 06/18/2014 09:34 AM, Tom Lane wrote:

Craig Ringer cr...@2ndquadrant.com writes:

I posted about a possible packaging issue with RHEL 6 PGDG packages for
9.4beta on pgsql-yum-pkg, but things are pretty quiet over there (a
prior mail asking about what happened with moving to git hasn't had a
response).
http://www.postgresql.org/message-id/53a10ef9.9060...@2ndquadrant.com

Why in the world is PGDG installing something named libevent at all?





The only thing I can think of offhand that would require it is pgbouncer.

cheers

andrew




Re: [HACKERS] Cube distance patch?

2014-06-18 Thread Abhijit Menon-Sen
At 2014-06-18 15:33:37 +0200, dep...@gmail.com wrote:

 Hi,
 
  In September 2013, there was a patch sent by Stas Kelvich (
  http://www.postgresql.org/message-id/9e07e159-e405-41e2-9889-a04f534fc...@gmail.com)
  that adds indexable kNN searches to the cube contrib module.
 
 What is needed so that it could get committed?

I suggest adding it to the 2014-08 CF so that it gets attention in the
next round of reviews.

I see some correspondence about it from Sergey Konoplev and Alexander
Korotkov (added to Cc:) in March suggesting that it'll be resubmitted
with changes for 9.5:

http://www.postgresql.org/message-id/CAL_0b1uaGVBnsqx-G6G2Q+N6eGppG89ry6FF0n=yel+yq9a...@mail.gmail.com

Perhaps Alexander can comment if there's an updated version of the
patch in the works.

-- Abhijit




Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Fri, Jun 13, 2014 at 3:11 AM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:
  Yeah, I was thinking something like this could work, but I would go
  further. Suppose you had separate GRANTable privileges for direct
  access to individual tables, bypassing RLS, e.g.
 
GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name

 So, is this one new privilege (DIRECT) or four separate new privileges
 that are variants of the existing privileges (DIRECT SELECT, DIRECT
 INSERT, DIRECT UPDATE, DIRECT DELETE)?

 I had taken it to be a single privilege, but you're right, it could be
 done for each of those..  I really don't think we have the bits for more
 than one case here though (if that) without a fair bit of additional
 rework.  I'm not against that rework (and called for it wayyy back when
 I proposed the TRUNCATE privilege, as I recall) but that's a whole
 different challenge and no small bit of work..

Technically, there are 4 bits left, and that's what we need for
separate privileges.  We last consumed bits in 2008 (for TRUNCATE) and
2006 (for GRANT ON DATABASE), so even if we used all of the remaining
bits it might be another 5 years before anyone has to do that
refactoring.  But even if the refactoring needs to be done now for
some reason, it's only June, and the last CommitFest doesn't start
until February 15th.  I think we're being way too quick to jump to
talking about what can and can't be done in time for 9.5.  Let's start
by figuring out how we'd really like it to work and then, if it's too
ambitious, we can scale it back.

My main concern about using only one bit is that someone might want to
allow a user to bypass RLS on SELECT while still enforcing it for
data-modifying operations.  That seems like a plausible use case to
me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


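Under the split-privilege variant Robert describes, usage might look like
this (entirely hypothetical syntax; none of it exists yet):

    GRANT SELECT ON accounts TO analyst;         -- reads filtered by RLS
    GRANT DIRECT SELECT ON accounts TO auditor;  -- reads bypassing RLS
    GRANT UPDATE ON accounts TO auditor;         -- writes still RLS-checked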


Re: [HACKERS] Clarification of FDW API Documentation

2014-06-18 Thread Bernd Helmle



--On 13. Juni 2014 13:46:38 -0400 Tom Lane t...@sss.pgh.pa.us wrote:


Imagine if `BeginForeignScan` set up a remote cursor and
`IterateForeignScan` just fetched _one tuple at a time_ (unlike the
current behavior where they are fetched in batches). The tuple would be
passed to `ExecForeignDelete` (as is required), but the remote cursor
would remain pointing at that tuple. Couldn't `ExecForeignDelete` just
call `DELETE FROM table WHERE CURRENT OF cursor` to then delete that
tuple?


No.  This is not guaranteed (or even likely) to work in join cases: the
tuple to be updated/deleted might no longer be the current one of the
scan. You *must* arrange for the scan to return enough information to
uniquely identify the tuple later, and that generally means adding some
resjunk columns.


Yeah, this is exactly the trap I ran into while implementing the
informix_fdw driver. It used an updatable cursor to implement the modify
actions, as you proposed first. Consider a query like


UPDATE remote SET f1 = t.id FROM local t WHERE t.id = f1

The planner might choose a hash join where the hash table is built by 
forwarding the cursor via the foreign scan. You'll end up with the cursor 
positioned at the end and you have no way to get it back in sync when the 
modify action is actually called.


--
Thanks

Bernd




Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Fabrízio de Royes Mello
On Wed, Jun 18, 2014 at 9:00 AM, Stephen Frost sfr...@snowman.net wrote:

  I have taken some small time to make a PoC just to see if this is doable.
  And so I added a new syntax like:
 
  CREATE TABLESPACE spcname [TEMP | TEMPORARY] LOCATION ...
 
  So, if TEMP or TEMPORARY is present, I mark a new column in pg_tablespace
  as true. On every table creation or move to a new tablespace, I just
  check this, and fail if the tablespace is temporary but the
  relpersistence says the table is not.

 Not sure about that specific syntax (don't we have SET options now?) but
 I do like the general idea.


Maybe something like this:

CREATE TABLESPACE spcname LOCATION '/foo/bar' WITH (only_temp_relations =
true);

Keep in mind that if you use ALTER TABLESPACE spcname SET (...), you must
guarantee that only temp objects are stored in the target tablespace, and
if a regular object exists you must throw an exception.

Makes sense?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
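Putting the syntax ideas in this thread together, usage might look like
this (hypothetical: neither form is committed, and only_temp_relations is
just the option name suggested above):

    CREATE TABLESPACE scratch LOCATION '/mnt/ephemeral/pgtmp'
        WITH (only_temp_relations = true);

    SET temp_tablespaces = 'scratch';
    CREATE TEMP TABLE session_work (id int);             -- allowed
    CREATE TABLE permanent (id int) TABLESPACE scratch;  -- should fail:
    -- the tablespace accepts only temporary relations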


Re: [HACKERS] comparison operators

2014-06-18 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  * Andrew Dunstan (and...@dunslane.net) wrote:
   I think I'd rather just say "for many data types" or something along
  those lines, rather than imply that there is some obvious rule that
  users should be able to intuit.
 
  Perhaps with a link to where the information about which exist is
  available..?  Or a query to get the list?
 
 Queries for this sort of thing are covered in the chapter about index
 opclasses.  The basic query would be like

Right, a link to there from this would be useful, imv.

 select opcintype::regtype from pg_opclass where opcmethod = 403 and 
 opcdefault;
 
 but I'm not sure if this is completely useful; it's not obvious for
 example that the text opclass is also used for varchar.  Another
 point is that some of the operators aren't named in the conventional
 way.

Good point.  Hopefully a link over to the index-opclasses.html would be
helpful to users exploring these questions.

Thanks,

Stephen




Re: [HACKERS] comparison operators

2014-06-18 Thread David G Johnston
Andrew Dunstan wrote
 On 06/17/2014 07:25 PM, Andres Freund wrote:
 On 2014-06-17 19:22:07 -0400, Tom Lane wrote:
 Andrew Dunstan <andrew@...> writes:
 I went to have a look at documenting the jsonb comparison operators,
 and
 found that the docs on comparison operators contain this:
  "Comparison operators are available for all relevant data types."
 They neglect to specify further, however. This doesn't seem very
 satisfactory. How is a user to know which are relevant? I know they are
 not available for xml and json, but are for jsonb. Just talking about
 all relevant types seems rather hand-wavy.
 Well, there are 38 default btree opclasses in the standard system ATM.
 Are we worried enough about this to list them all explicitly?  Given the
 lack of complaints to date, I'm not.
 
 I think I'd rather just say "for many data types" or something along
 those lines, rather than imply that there is some obvious rule that 
 users should be able to intuit.

Ideal world for me: we'd list the data types that do not provide comparison
operators (or not a full set) by default with links to the section in the
documentation where the reasoning for said omission is explained and/or
affirmed.

My other reaction is that referring to data types at all in this section is
unnecessary - other than maybe to state (which it does not currently) that
both sides of the comparison must be of the same (or binary equivalent, like
text/varchar) type or there must exist an implicit cast for one of the
operands.  Much of that knowledge is implied and well understood though, as
is the fact that operators are closely associated with data types.  IOW - I
would be fine with removing "Comparison operators are available for all
relevant data types" and not replacing it with anything.  Though "for many
data types" is my preferred equivalent phrase for the same reasons Andrew
noted.

David J.







Re: [HACKERS] comparison operators

2014-06-18 Thread Tom Lane
David G Johnston david.g.johns...@gmail.com writes:
 Andrew Dunstan wrote
 I think I'd rather just say "for many data types" or something along
 those lines, rather than imply that there is some obvious rule that 
 users should be able to intuit.

 Ideal world for me: we'd list the data types that do not provide comparison
 operators (or not a full set) by default with links to the section in the
 documentation where the reasoning for said omission is explained and/or
 affirmed.

I was just wondering whether that wouldn't be a shorter list.
It's not hard to get the base types that don't have btree opclasses:

select typname from pg_type where not exists
  (select 1 from pg_opclass where opcmethod = 403 and opcdefault
   and opcintype = pg_type.oid)
  and typtype = 'b' and not (typelem != 0 and typlen = -1) order by 1;
typname
---
 aclitem
 box
 cid
 cidr
 circle
 gtsvector
 json
 line
 lseg
 path
 pg_node_tree
 point
 polygon
 refcursor
 regclass
 regconfig
 regdictionary
 regoper
 regoperator
 regproc
 regprocedure
 regtype
 smgr
 txid_snapshot
 unknown
 varchar
 xid
 xml
(28 rows)

although this is misleading because some of these are binary-coercible to
indexable types, which means that the indexable type's opclass works for
them.

Eliminating those, we get

select typname from pg_type where not exists
  (select 1 from pg_opclass where opcmethod = 403 and opcdefault
   and binary_coercible(pg_type.oid, opcintype))
  and typtype = 'b' and not (typelem != 0 and typlen = -1)
order by 1;
typname
---
 aclitem        haven't bothered, no obvious sort order anyway
 box            no linear sort order
 cid            haven't bothered
 circle         no linear sort order
 gtsvector      internal type, wouldn't be useful
 json
 line           no linear sort order
 lseg           no linear sort order
 path           no linear sort order
 point          no linear sort order
 polygon        no linear sort order
 refcursor      haven't bothered
 smgr           useless legacy type
 txid_snapshot  no linear sort order
 unknown        there are no operations for 'unknown'
 xid            no linear sort order (yes, really)
 xml
(17 rows)

So really we're pretty close to being able to say there are comparison
operators for every built-in type for which it's sensible.
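
(binary_coercible() above isn't a built-in function, by the way; it's a
helper along the lines of the one defined in the opr_sanity regression
test.  A much-simplified sketch, checking pg_cast for a registered
binary-compatible cast, could be:

create function binary_coercible(src oid, dst oid) returns bool
language sql stable as $$
  select src = dst or exists
    (select 1 from pg_cast
     where castsource = src and casttarget = dst and castmethod = 'b');
$$;

The real helper also has to account for arrays and domains.)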

regards, tom lane




Re: [HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict

2014-06-18 Thread Devrim Gündüz

Hi,

On Wed, 2014-06-18 at 09:41 -0400, Andrew Dunstan wrote:
 The only thing I can think of offhand that would require it is
 pgbouncer.

Yeah. Recent pgbouncer versions require libevent 2.0+, and RHEL 6 has
1.4.

Regards,
-- 
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR





Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost sfr...@snowman.net wrote:
  I had taken it to be a single privilege, but you're right, it could be
  done for each of those..  I really don't think we have the bits for more
  than one case here though (if that) without a fair bit of additional
  rework.  I'm not against that rework (and called for it wayyy back when
  I proposed the TRUNCATE privilege, as I recall) but that's a whole
  different challenge and no small bit of work..
 
 Technically, there are 4 bits left, and that's what we need for
 separate privileges.  

I'd really hate to chew them all up..

 We last consumed bits in 2008 (for TRUNCATE) and
 2006 (for GRANT ON DATABASE), so even if we used all of the remaining
 bits it might be another 5 years before anyone has to do that
 refactoring.  

Perhaps, or we might come up with some new whiz-bang permission to add
next year. :/

 But even if the refactoring needs to be done now for
 some reason, it's only June, and the last CommitFest doesn't start
 until February 15th.  I think we're being way too quick to jump to
 talking about what can and can't be done in time for 9.5.  Let's start
 by figuring out how we'd really like it to work and then, if it's too
 ambitious, we can scale it back.

Alright- perhaps we can discuss what kind of refactoring would be needed
for such a change then, to get a better idea as to the scope of the
change and the level of effort required.

My thoughts on how to address this were to segregate the ACL bits by
object type.  That is to say, the AclMode stored for databases might
only use bits 0-2 (create/connect/temporary), while tables would use
bits 0-7 (insert/select/update/delete/references/trigger).  This would
allow us to more easily add more rights at the database and/or
tablespace level too.
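
To sketch the idea (these names and values are made up for illustration;
today all ACL_* bits share a single numbering):

/* hypothetical per-object-type AclMode layouts */
#define ACL_DB_CREATE       (1 << 0)    /* databases */
#define ACL_DB_CONNECT      (1 << 1)
#define ACL_DB_TEMPORARY    (1 << 2)

#define ACL_REL_INSERT      (1 << 0)    /* tables: low bits reused */
#define ACL_REL_SELECT      (1 << 1)
#define ACL_REL_UPDATE      (1 << 2)
#define ACL_REL_DELETE      (1 << 3)
#define ACL_REL_REFERENCES  (1 << 4)
#define ACL_REL_TRIGGER     (1 << 5)

The object type would then determine how an AclMode's bits are
interpreted, freeing up bit space for each catalog separately.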

 My main concern about using only one bit is that someone might want to
 allow a user to bypass RLS on SELECT while still enforcing it for
 data-modifying operations.  That seems like a plausible use case to
 me.

I absolutely agree that it's a real use-case and one which we should
support, just trying to avoid biting off more than can be done between
now and February.  Still, if we get things hammered out and more-or-less
agreement on the way forward, getting the code written may move quickly.

Thanks,

Stephen




Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 10:25 PM, Stephen Frost sfr...@snowman.net wrote:
 Yeah, if we have to ask an external security module a question for
 each row, there's little hope of any real optimization.  However, I
 think there will be a significant number of cases where people will
 want filtering clauses that can be realized by doing an index scan
 instead of a sequential scan, and if we end up forcing a sequential
 scan anyway, the feature will be useless to those people.

 I agree that we want to support that, if we can do so reasonably.  What
 I was trying to get at is simply this- don't we provide that already
 with the leakproof attribute and functions?  If we don't have enough
 there to allow index scans then we should be looking to add more, I'm
 thinking.

So the reason why we got onto this particular topic was because of the
issue of multiple security policies for a single table.  Of course,
multiple security policies can always be merged into a single
more-complex policy, but the resulting policy may be so complex that
the query-planner is no longer capable of doing a good job optimizing
it.  I won't mention here exactly what a certain large commercial
database vendor has implemented here; suffice it to say, however, that
their design avoids this pitfall, and ours currently does not.

 I agree on this point, but I'm still hopeful that we'll be able to get a
 good feature into 9.5.  There are quite a few resources available for
 the 'just programming' part, so the long pole in the tent here is
 absolutely hashing out what we want and how it should function.

Agreed.

 I'd be happy to host or participate in a conference call or similar if
 that would be useful to move this along- or we can continue to
 communicate via email.  There's a bit of a lull in conferences to which
 I'm going to right now, so in person is unlikely, unless folks want to
 get together somewhere on the east coast (I'd be happy to travel to
 Philly, Pittsburgh, NYC, etc, if it'd help..).

For me, email is easiest; but there are other options, too.

  What solution did you come up with for this case, which performed well
  and was also secure..?

 I put the logic in the client.  :-(

 Well, that's not helpful here. ;)

Sure.  The reason I brought it up is to say - hey, look, I had this
come up in the real world.  What would it take to be able to do
actually do it in the database server?  And the answer is - something
that will handle multiple security policies cleanly.

 But I'm not sure; that
 feels like it's giving something up that might be important.  And I
 think that the kinds of syntax we're discussing won't support leaving
 that out of the initial version and adding it later, so if we commit
 to this syntax, we're stuck with that behavior.  To avoid that, we'd
 need something like this:

 ALTER TABLE tab ADD POLICY polname WHERE quals;
 GRANT SELECT (polname) ON TABLE tab TO role;

 Right, if we were to support multiple policies on a given table then we
 would have to support adding and removing them individually, as well as
 specify when they are to be applied- and what if that when overlaps?
 Do we apply both and only a row which passed them all gets sent to the
 user?  Essentially we'd be defining the RLS policies to be AND'd
 together, right?  Would we want to support both AND-based and OR-based,
 and allow users to pick what set of conditionals they want applied to
 their various overlapping RLS policies?

AND is not a sensible policy; it would need to be OR.  If you grant
someone access to two different subsets of the rows in a table, it
stands to reason that they will expect to have access to all of the
rows that are in at least one of those subsets.  If you give someone
your car key and your house key, that means they can operate your car
or enter your house; it does not mean that they can operate your car
but only when it's inside your garage.

Alternatively, we could:

- Require the user to specify in some way which of the available
policies they want applied, and then apply only that one.
or
- Decide that such scenarios constitute misconfiguration. Throw an
error and make the table owner or other relevant local authority fix
it.

 Sounds all rather painful and much better done programatically by the
 user in a language which is suited to that task- eg: pl/pgsql, perl, C,
 or something besides our ALTER syntax + catalog representation.

I think exactly the opposite, for the query planning reasons
previously stated.  I think the policies will quickly get so
complicated that they're no longer optimizable.  Here's a simple
example:

- Policy 1 allows the user to access rows for which complexfunc() returns true.
- Policy 2 allows the user to access rows for which a = 1.

Most users have access only through policy 2, but some have access
through policy 1.  Users who have access through policy 1 will always
get a sequential scan, but users who have access through policy 2 have
an excellent chance of getting an index 
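
To make that concrete, a hypothetical sketch using the ADD POLICY syntax
floated earlier in the thread (illustration only, not implemented
syntax):

ALTER TABLE tab ADD POLICY pol_complex WHERE complexfunc();
ALTER TABLE tab ADD POLICY pol_simple WHERE a = 1;

-- effective qual for a user granted only pol_simple; indexable:
SELECT * FROM tab WHERE a = 1;
-- effective qual for a user granted both, OR-ed together; seqscan:
SELECT * FROM tab WHERE a = 1 OR complexfunc();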

Re: [HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict

2014-06-18 Thread Joshua D. Drake


On 06/18/2014 06:34 AM, Tom Lane wrote:

Craig Ringer cr...@2ndquadrant.com writes:

I posted about a possible packaging issue with RHEL 6 PGDG packages for
9.4beta on pgsql-yum-pkg, but things are pretty quiet over there (a
prior mail asking about what happened with moving to git hasn't had a
response).
http://www.postgresql.org/message-id/53a10ef9.9060...@2ndquadrant.com


Why in the world is PGDG installing something named libevent at all?



pg_bouncer.

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans.




Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 10:40 AM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost sfr...@snowman.net wrote:
  I had taken it to be a single privilege, but you're right, it could be
  done for each of those..  I really don't think we have the bits for more
  than one case here though (if that) without a fair bit of additional
  rework.  I'm not against that rework (and called for it wayyy back when
  I proposed the TRUNCATE privilege, as I recall) but that's a whole
  different challenge and no small bit of work..

 Technically, there are 4 bits left, and that's what we need for
 separate privileges.

 I'd really hate to chew them all up..

Usually it's the patch author who WANTS to chew up all the available
bit space and OTHER people who say no.  :-)

 We last consumed bits in 2008 (for TRUNCATE) and
 2006 (for GRANT ON DATABASE), so even if we used all of the remaining
 bits it might be another 5 years before anyone has to do that
 refactoring.

 Perhaps, or we might come up with some new whiz-bang permission to add
 next year. :/

Well, people proposed separate permissions for things like VACUUM and
ANALYZE around the time TRUNCATE was added, and those were rejected on
the grounds that they didn't add enough value to justify wasting bits
on them.  I think we should see whether there's a workable system such
that marginal permissions (like TRUNCATE) that won't be checked often
don't have to consume bits.

 But even if the refactoring needs to be done now for
 some reason, it's only June, and the last CommitFest doesn't start
 until February 15th.  I think we're being way too quick to jump to
 talking about what can and can't be done in time for 9.5.  Let's start
 by figuring out how we'd really like it to work and then, if it's too
 ambitious, we can scale it back.

 Alright- perhaps we can discuss what kind of refactoring would be needed
 for such a change then, to get a better idea as to the scope of the
 change and the level of effort required.

 My thoughts on how to address this were to segregate the ACL bits by
 object type.  That is to say, the AclMode stored for databases might
 only use bits 0-2 (create/connect/temporary), while tables would use
 bits 0-7 (insert/select/update/delete/references/trigger).  This would
 allow us to more easily add more rights at the database and/or
 tablespace level too.

Yeah, that's another idea.  But it really deserves its own thread.
I'm still not convinced we have to do this at all to meet this need,
but that should be argued back and forth on that other thread.

 My main concern about using only one bit is that someone might want to
 allow a user to bypass RLS on SELECT while still enforcing it for
 data-modifying operations.  That seems like a plausible use case to
 me.

 I absolutely agree that it's a real use-case and one which we should
 support, just trying to avoid biting off more than can be done between
 now and February.  Still, if we get things hammered out and more-or-less
 agreement on the way forward, getting the code written may move quickly.

Nifty.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Atomics hardware support table supported architectures

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 1:55 PM, Andres Freund and...@2ndquadrant.com wrote:
 But the concern is more whether 1 byte can actually be written
 without also writing neighbouring values. I.e. there's hardware out
 there that'll implement a 1byte store as reading 4 bytes, changing one
 of the bytes in a register, and then write the 4 bytes out again. Which
 would mean that a neighbouring write will possibly cause a wrong value
 to be written...

Ah, OK.  I've heard about that before, but had forgotten.

 What happens is that gcc will do a syscall triggering the kernel to turn
 off scheduling; perform the math and store the result; turn scheduling on
 again. That way there cannot be a other operation influencing the
 calculation/store. Imagine if you have hardware that, internally, only
 does stores in 4 byte units. Even if it's a single CPU machine, which
 most of those are, the kernel could schedule a separate process after
 the first 4bytes have been written. Oops. The kernel has ways to prevent
 that, userspace doesn't...

Interesting.  "Turn off scheduling" sounds like a pretty dangerous syscall.

  Does somebody want other columns in there?

 I think the main question at the developer meeting was how far we want
 to go with supporting primitives like atomic add, atomic and, atomic
 or, etc.  So I think we should add columns for those.

 Well, once CAS is available, atomic add etc is all trivially
 implementable - without further hardware support. It might be more
 efficient to use the native instruction (e.g. xadd can be much better
 than a cmpxchg loop because there's no retries), but that's just
 optimization that won't matter unless you have a fair bit of
 concurrency.

 There's currently fallbacks like:
 #ifndef PG_HAS_ATOMIC_FETCH_ADD_U32
 #define PG_HAS_ATOMIC_FETCH_ADD_U32
 STATIC_IF_INLINE uint32
 pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 add_)
 {
     uint32 old;
     while (true)
     {
         old = pg_atomic_read_u32_impl(ptr);
         if (pg_atomic_compare_exchange_u32_impl(ptr, old, old + add_))
             break;
     }
     return old;
 }

I understand, but the performance characteristics are quite different.
My understanding from the developer meeting was that we'd be OK with
having, say, three levels of support for atomic ops: all ops
supported, only TAS, none.  Or maybe four: all ops, CAS + TAS, only
TAS, none.  But I think there was resistance (in which I participate)
to the idea of, say, having platform 1 with "add" but not "and" and
"or", platform 2 with "and" and "or" but not "add", platform 3 with
both, platform 4 with neither, etc.  Then it becomes too hard for
developers to predict whether something that is a win on their
platform will be a loss on some other platform.
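
(To illustrate, the remaining fetch-and-modify ops can be sketched the
same way on top of CAS, e.g. a hypothetical fetch-or fallback following
the naming convention of the fallback quoted above:

#ifndef PG_HAS_ATOMIC_FETCH_OR_U32
#define PG_HAS_ATOMIC_FETCH_OR_U32
STATIC_IF_INLINE uint32
pg_atomic_fetch_or_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 or_)
{
    uint32 old;
    while (true)
    {
        old = pg_atomic_read_u32_impl(ptr);
        if (pg_atomic_compare_exchange_u32_impl(ptr, old, old | or_))
            break;
    }
    return old;
}
#endif

The question is whether such a loop's performance is acceptable compared
to a native instruction on every supported platform.)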

  3) sparcv8: Last released model 1997.

 I seem to recall hearing about this in a customer situation relatively
 recently, so there may be a few of these still kicking around out
 there.

 Really? As I'd written in a reply solaris 10 (released 2005) dropped
 support for it. Dropping support for a platform that's been desupported
 10 years ago by its manufacturer doesn't sound bad imo...

We definitely have at least one customer using Solaris 9.  I don't
know their architecture for certain, but they did recently install a
new version of PostgreSQL.

  4) i386: Support dropped from windows 98 (yes, really), linux, openbsd
 (yes, really), netbsd (yes, really). No code changes needed.

 Wow, OK.  In that case, yeah, let's dump it.  But let's make sure we
 adequately document that someplace in the code comments, along with
 the reasons, because not everyone may realize how dead it is.

 I'm generally wondering how to better document the supported os/platform
 combinations. E.g. it's not apparent that we only support certain
 platforms on a rather limited set of compilers...

 Maybe a table with columns like: platform, platform version,
 supported-OSs, supported-compilers?

Sounds worth at try.

  6) armv-v5

 I think this is also a bit less dead than the other ones; Red Hat's
 shows Bugzilla shows people filing bugs for platform-specific problems
 as recently as January of 2013:

 https://bugzilla.redhat.com/show_bug.cgi?id=892378

 Closed as WONTFIX :P.

 Joking aside, I think there are still usecases for arm-v5 - but it's
 embedded stuff without a real OS and such. Nothing you'd install PG
 on. There's distributions that are dropping ARMv6 support already... My
 biggest problem is that it's not even documented whether v5 has atomic
 4byte stores - while it's documented for v6.

I think in doubtful cases we might as well keep the support in.  If
you've got the fallback to non-atomics, keeping the other code around
doesn't hurt much, and might make it easier for someone who is
interested in one of those platforms.  It's fine and good to kill
things that are totally dead, but I think it's better for a user of
some obscure platform to find that it 

Re: [HACKERS] Quantify small changes to predicate evaluation

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 10:23 AM, Dennis Butterstein soullinu...@web.de wrote:
 Hi Marti, thank you for your quick reply. I tried the proposed tweaks and
 see some differences in the measurements. It seems as if the overall
 query performance dropped a little, which I think the disabled turbo
 boost mode is responsible for (all measurements are single-query only).
 I think that kind of behaviour is somewhat expected.

 Run1: 26.559s
 Run2: 28.280s

 Unfortunately the variance between the runs seems to remain high. In fact
 I have exclusive access to the machine, and I made sure not to run into
 any I/O-related problems regarding buffer caches. Are there any other
 stumbling blocks I'm missing at the moment? Maybe I have to rethink my
 (end-to-end) measurement strategy. In your experience, how small can
 measurement variance get on Postgres? Thank you very much. Kind regards,
 Dennis

I find that it's possible to get smaller variations than what you're
experiencing on read-only workloads without any special tuning, but
variation on workloads that write data is much higher, similar to what
you're seeing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] buildfarm client release 4.13

2014-06-18 Thread Andrew Dunstan

I have released version 4.13 of the PostgreSQL Buildfarm client.

This can be downloaded from 
http://www.pgbuildfarm.org/downloads/releases/build-farm-4_13.tgz


Changes in this release (from the git log):

   fcc182b Don't run TestCollateLinuxUTF8 on unsupported branches.
   273af50 Don't run FileTextArrayFDW tests if not wanted.
   9944a4a Don't remove the ccache directory on failure, unless
   configured to.
   73e4187 Make ccache and vpath builds play nicely together.
   9ff8c22 Work around path idiocy in msysGit.
   0948ac7 revert naming change for log files
   ca68525 Exclude ecpg/tests from find_typedefs code.


If you are using ccache, please note that there are adjustments to the 
recommended use pattern. The sample config file no longer suggests that 
the ccache directory have the branch name at the end. It is now 
recommended that you use a single cache for all branches for a 
particular member. To do this remove /$branch from the relevant line 
in your config file, if you have it, and remove those directories in the 
cache root. Your first run on each branch will rebuild some or all of 
the cache. My unified cache on crake is now at 615 MB, rather than the
multiples of GB it had been previously.
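
For example, a config entry like the following (the exact key name in
your config file may differ; this is just to illustrate the change):

    ccachedir => "/home/bf/buildroot/ccache/$branch",

would become:

    ccachedir => "/home/bf/buildroot/ccache",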


It is recommended that this release be deployed by all users fairly 
quickly because of the fix to log file names, which had been discarding 
some quite important log files.


cheers

andrew








Re: [HACKERS] 9.5 CF1

2014-06-18 Thread Magnus Hagander
On Tue, Jun 17, 2014 at 6:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Robert Haas wrote:
  On Mon, Jun 16, 2014 at 2:36 AM, Abhijit Menon-Sen a...@2ndquadrant.com
 wrote:
  P.S. If you tag your reviews with [REVIEW] in the Subject, it'll be
  easier to keep track of them.

  I and, I believe, various other people hate that style, because at
  least in Gmail, it breaks the threading.  It is much easier to find
  things if they are all posted on one thread.

  Yes, please don't do that.  A simple, normal reply to the message that
  submits the patch is much better from my point of view as a subsequent
  reviewer and committer.

 Worth noting also is that Magnus is working on a new version of the
 commitfest app that will be able to automatically keep track of threads
 about patches --- so long as they *are* threads according to our mailing
 list archives.  I'm not sure if the archives recognize replies with a
 changed Subject: as being the same thread or not.


The archives code does threading based on the headers (in-reply-to and
references, in priority order). It completely ignores the subject when it
comes to the threading.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Christoph Berg
Hi,

now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't
it make sense to get rid of the analyze_new_cluster.sh file which
pg_upgrade writes? The net content is a single line which could as
well be printed by pg_upgrade itself. Instead of a lengthy
explanation of how to invoke that manually, there should be a short note
and a pointer to some manual section. I think the chances of people
reading that would even be increased.
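
(For reference, that single line is essentially:

    vacuumdb --all --analyze-in-stages

modulo the binary path and connection options pg_upgrade fills in.)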

Similarly, I don't really see the usefulness of delete_old_cluster.sh
as a file, when the "rm -rf" could just be presented on the console for
the admin to execute by cut-and-paste.

Christoph
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE




Re: [HACKERS] BUG #10680 - ldapbindpasswd leaks to postgresql log

2014-06-18 Thread Magnus Hagander
On Wed, Jun 18, 2014 at 4:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Steven Siebert smsi...@gmail.com writes:
  Attached is a proposed patch for BUG #10680.

  It's a simple fix to the problem of the ldapbindpasswd leaking in
  clear text to the postgresql log.  The patch simply removes the raw
  pg_hba.conf line from the log message, but retains the log line number
  to assist admins in troubleshooting.

 You haven't exactly explained why this is a problem.  The proposed patch
 would impede diagnosing of many other problems, so it's not going to get
 committed without a thoroughly compelling rationale.


Yes, properly logging that was intentional, in commit
7f49a67f954db3e92fd96963169fb8302959576e.


Hint: "I don't store my postmaster log securely" is not compelling.
 We've been over that ground before; there are far too many reasons
 why access to the postmaster log is a potential security hazard
 to justify concluding that this particular one is worse.


Yeah, and the password is already in cleartext in a file next to it.

If we actually feel the need to get rid of it, we should do a better job.
Such as actively blanking it out with something else. Since we know the
password (we parsed it out), it shouldn't be impossible to actually blank
out *just the password*, without ruining all the other diagnostics usage of
it.
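
A rough sketch of what I have in mind (hypothetical and untested; it
assumes the parsed password is still at hand when logging):

static char *
blank_hba_password(const char *rawline, const char *password)
{
    char   *copy = pstrdup(rawline);
    char   *p;

    if (password == NULL || *password == '\0')
        return copy;

    /* overwrite every occurrence of the password with asterisks */
    for (p = strstr(copy, password); p != NULL;
         p = strstr(p + strlen(password), password))
        memset(p, '*', strlen(password));

    return copy;
}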


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Atomics hardware support table supported architectures

2014-06-18 Thread Andres Freund
On 2014-06-18 11:15:15 -0400, Robert Haas wrote:
 On Tue, Jun 17, 2014 at 1:55 PM, Andres Freund and...@2ndquadrant.com wrote:
  What happens is that gcc will do a syscall triggering the kernel to turn
  off scheduling; perform the math and store the result; turn scheduling on
  again. That way there cannot be a other operation influencing the
  calculation/store. Imagine if you have hardware that, internally, only
  does stores in 4 byte units. Even if it's a single CPU machine, which
  most of those are, the kernel could schedule a separate process after
  the first 4bytes have been written. Oops. The kernel has ways to prevent
  that, userspace doesn't...
 
  Interesting.  "Turn off scheduling" sounds like a pretty dangerous syscall.

The kernel does that internally, just during cmpxchg. It'll reenable
interrupts before returning. There's hundreds of places inside at least
linux doing that internally...

Should you be interested in the details:
https://www.kernel.org/doc/Documentation/arm/kernel_user_helpers.txt

   Does somebody want other columns in there?
 
  I think the main question at the developer meeting was how far we want
  to go with supporting primitives like atomic add, atomic and, atomic
  or, etc.  So I think we should add columns for those.
 
  Well, once CAS is available, atomic add etc is all trivially
  implementable - without further hardware support. It might be more
  efficient to use the native instruction (e.g. xadd can be much better
  than a cmpxchg loop because there's no retries), but that's just
  optimization that won't matter unless you have a fair bit of
  concurrency.
 
  There's currently fallbacks like:
  #ifndef PG_HAS_ATOMIC_FETCH_ADD_U32
  #define PG_HAS_ATOMIC_FETCH_ADD_U32
  STATIC_IF_INLINE uint32
  pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 add_)
  {
      uint32 old;
      while (true)
      {
          old = pg_atomic_read_u32_impl(ptr);
          if (pg_atomic_compare_exchange_u32_impl(ptr, old, old + add_))
              break;
      }
      return old;
  }
 
 I understand, but the performance characteristics are quite different.

There might be cases where that's true, but in the majority of cases
where the variable isn't highly contended it's just about the same. In
many cases the microcode will implement atomic ops using ll/sc
operations internally anyway.

 My understanding from the developer meeting was that we'd be OK with
 having, say, three levels of support for atomic ops: all ops
 supported, only TAS, none.  Or maybe four: all ops, CAS + TAS, only
 TAS, none.  But I think there was resistance (in which I participate)
 to the idea of, say, having platform 1 with "add" but not "and" and
 "or", platform 2 with "and" and "or" but not "add", platform 3 with
 both, platform 4 with neither, etc.  Then it becomes too hard for
 developers to predict whether something that is a win on their
 platform will be a loss on some other platform.

I don't think developers really have to care. In nearly all the cases
the above cmpxchg loop will loop precisely once. Usually the cacheline
will stay in the exclusive state on that process, and that's it. In most
cases what it'll be replacing will be something protected by a spinlock
anyways - and the above isn't any worse than that.

Note that we're far from the only one falling back to cmpxchg for all
these ops - that's pretty much a standard fallback.

I'm fine with always implementing everything but tas, cas, add on top of
cas for now. I want or/and/sub/... to be conveniently available to C
code, but they don't need to be based on hardware ops. Having open-coded
versions of the above in several places sounds like a horrible idea to
me. Both because we might want to optimize it at some point and because
it's horrible readability-wise.

   3) sparcv8: Last released model 1997.
 
  I seem to recall hearing about this in a customer situation relatively
  recently, so there may be a few of these still kicking around out
  there.
 
  Really? As I'd written in a reply solaris 10 (released 2005) dropped
  support for it. Dropping support for a platform that's been desupported
  10 years ago by its manufacturer doesn't sound bad imo...
 
 We definitely have at least one customer using Solaris 9.  I don't
 know their architecture for certain, but they did recently install a
 new version of PostgreSQL.

But Solaris 9 doesn't imply sparcv8. Ultrasparc (first sparcv9) support
was added in solaris 2.5... That's a *long* time ago.

If my googling turned up the right information, you need to use special
-xarch flags for sun studio to even compile binaries that are pure v8
(v8plus is fine btw) since sun studio 9... Same for gcc apparently
(-mno-v8plus), although I don't know how far back.

The reason I'd like to kill it is that there's basically zero chance of
ever testing it, and we'd need to write a configure check that not only
detects the suncc intrinsics but also runs 

Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Stephen Frost
* Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote:
 On Wed, Jun 18, 2014 at 9:00 AM, Stephen Frost sfr...@snowman.net wrote:
  Not sure about that specific syntax (don't we have SET options now?) but
  I do like the general idea.
 
 Maybe something like that:
 
 CREATE TABLESPACE spcname LOCATION '/foo/bar' WITH (only_temp_relations =
 true);

Yeah, that's more along the lines of what I was thinking.

 Keep in mind that you must take care, if you use ALTER TABLESPACE spcname
 SET (...), to guarantee that only temp objects are stored in the target
 tablespace, and to throw an exception if a regular object exists.
 
 Makes sense?

Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
ERROR that lists out all of the non-temporary objects which are found
(and lists any other databases which have objects in those
tablespaces..).  That would allow administrators who have existing
notionally temporary-only tablespaces to go clean things up to make them
actually temporary-only.
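
For example, the per-database portion of such a check could presumably
look something like this (names assumed; it would have to be run in
every database):

select c.oid::regclass
from pg_class c
join pg_tablespace t on t.oid = c.reltablespace
where t.spcname = 'temp_spc'
  and c.relpersistence <> 't';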

Thanks,

Stephen




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 12:50 PM, Andres Freund and...@2ndquadrant.com wrote:
  I can clearly understand the utility of being able to reset the system
  ID to a new, randomly-generated system ID - but giving the user the
  ability to set a particular value of their own choosing seems like a
  pretty sharp tool.  What is the use case for that?

 I've previously hacked this up ad hoc during data recovery when I needed
 to make another cluster similar enough that I could replay WAL.

 Another usecase is to mark a database as independent from its
 origin. Imagine a database that gets sharded across several
 servers. It's not uncommon to do that by initially basebackup'ing the
 database to several nodes and then use them separately from
 thereon. It's quite useful to actually mark them as being
 distinct. Especially as several of them right now would end up with the
 same timeline id...

Sure, but that only requires being able to reset the ID randomly, not
to a particular value.

 But it seems to me that we might need to have a process discussion
 here, because, while I'm all in favor of incremental feature proposals
 that build towards a larger goal, it currently appears that the larger
 goal toward which you are building is not something that's been
 publicly discussed and debated on this list.  And I really think we
 need to have that conversation.  Obviously, individual patches will
 still need to be debated, but I feel like 2ndQuadrant is trying to
 construct a castle without showing the community the floor plan.  I
 believe that there is relatively broad agreement that we would all
 like a castle, but different people may have legitimately different
 ideas about how it should be constructed.  If the work arrives as a
 series of disconnected pieces (user-specified system ID, event
 triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has
 to take it on faith that those pieces are going to eventually fit
 together in a way that we'll all be happy with.  In some cases, that's
 fine, because the feature is useful on its own merits whether it ends
 up being part of the castle or not.


 Uh. Right now this patch has been written because it's needed for an
 out-of-core replication solution. That's what BDR is at this point. The
 patch is unobtrusive, has other use cases than just our internal one, and
 doesn't make pg_resetxlog even more dangerous than it already is. I
 don't see much problem with considering it on its own cost/benefit?

Well, I think it *does* make pg_resetxlog more dangerous; see previous
discussion of pg_computemaxlsn.

 So this seems to be a concern that's relatively independent of this
 patch. Am I seeing that right?

Partially; not completely.

 I actually don't think any of the discussions I was involved in had the
 externally visible version of replication identifiers limited to 16bits?
 If you are referring to my patch, 16bits was just the width of the
 *internal* name that should basically never be looked at. User visible
 replication identifiers are always identified by an arbitrary string -
 whose format is determined by the user of the replication identifier
 facility. *BDR* currently stores the system identifer, the database id
 and a name in there - but that's nothing core needs to concern itself
 with.

I don't think you're going to be able to avoid users needing to know
about those IDs.  The configuration table is going to have to be the
same on all nodes, and how are you going to get that set up without
those IDs being user-visible?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread David Fetter
On Tue, Jun 17, 2014 at 04:07:55PM -0400, Robert Haas wrote:
 On Sat, Jun 14, 2014 at 7:56 PM, Kevin Grittner kgri...@ymail.com wrote:
  I looked at the standard, and initially tried to implement the
  standard syntax for this; however, it appeared that the reasons
  given for not using standard syntax for the row variables also
  apply to the transition relations (the term used by the standard).
  There isn't an obvious way to tie that in to all the PLs we
  support.  It could be done, but it seems like it would intolerably
  ugly, and more fragile than what we have done so far.
 
 I'm not too familiar with this area.  Can you describe what the
 standard syntax for the row variables is, as opposed to our syntax?
 Also, what's the standard syntax for the the transition relations?

The good:
- Generating the tuplestores.  Yay!

The bad:
- Generating them exactly and only for AFTER triggers
- Requiring that either both tuplestores be generated or neither.
  There are real use cases described below where only one would be relevant.
- Generating the tuplestores unconditionally.

The ugly:
- Attaching tuplestore generation to tables rather than callers (triggers, 
DML, etc.)

The SQL standard says:

<trigger definition> ::=
    CREATE TRIGGER <trigger name> <trigger action time> <trigger event>
    ON <table name> [ REFERENCING <transition table or variable list> ]
    <triggered action>

<trigger action time> ::=
    BEFORE
  | AFTER
  | INSTEAD OF

<trigger event> ::=
    INSERT
  | DELETE
  | UPDATE [ OF <trigger column list> ]

<trigger column list> ::=
    <column name list>

<triggered action> ::=
    [ FOR EACH { ROW | STATEMENT } ]
    [ <triggered when clause> ]
    <triggered SQL statement>

<triggered when clause> ::=
    WHEN <left paren> <search condition> <right paren>

<triggered SQL statement> ::=
    <SQL procedure statement>
  | BEGIN ATOMIC { <SQL procedure statement> <semicolon> }... END

<transition table or variable list> ::=
    <transition table or variable>...

<transition table or variable> ::=
    OLD [ ROW ] [ AS ] <old transition variable name>
  | NEW [ ROW ] [ AS ] <new transition variable name>
  | OLD TABLE [ AS ] <old transition table name>
  | NEW TABLE [ AS ] <new transition table name>

<old transition table name> ::=
    <transition table name>

<new transition table name> ::=
    <transition table name>

<transition table name> ::=
    <identifier>

<old transition variable name> ::=
    <correlation name>

<new transition variable name> ::=
    <correlation name>
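
As a concrete illustration of the REFERENCING clause, a statement-level
trigger using transition tables might be declared like this (table,
trigger, and function names are made up; EXECUTE PROCEDURE is the
PostgreSQL spelling of the triggered action):

CREATE TRIGGER accounts_audit
    AFTER UPDATE ON accounts
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE audit_account_changes();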

Sorry that was a little verbose, but what it does do is give us what we need at
trigger definition time.  I'd say it's pilot error if a trigger
definition says "make these tuplestores" and the trigger body then
does nothing with them, which goes to Robert's point below re:
unconditional overhead.

  Some things which I *did* follow from the standard: these new
  relations are only allowed within AFTER triggers, but are available
  in both AFTER STATEMENT and AFTER ROW triggers.  That is, an AFTER
  UPDATE ... FOR EACH ROW trigger could use both the OLD and NEW row
  variables as well as the delta relations (under whatever names we
  pick).  That probably won't be used very often, but I can imagine
  some cases where it might be useful.  I expect that these will
  normally be used in FOR EACH STATEMENT triggers.
 
 I'm concerned about the performance implications of capturing the
 delta relations unconditionally.

Along that same line, we don't always need to capture both the before
tuplestores and the after ones.  Two examples of this come to mind:

- BEFORE STATEMENT triggers accessing rows, where there is no after part to 
use, and
- DML (RETURNING BEFORE, e.g.) which only touches one of them.  This
  applies both to extant use cases of RETURNING and to planned ones.

I'm sure if I can think of two, there are more.

In summary, I'd like to propose that the tuplestores be generated
separately in general and attached to callers. We can optimize this by
not generating redundant tuplestores.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] replication identifier format

2014-06-18 Thread Andres Freund
On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
  I actually don't think any of the discussions I was involved in had the
  externally visible version of replication identifiers limited to 16bits?
  If you are referring to my patch, 16bits was just the width of the
  *internal* name that should basically never be looked at. User visible
  replication identifiers are always identified by an arbitrary string -
  whose format is determined by the user of the replication identifier
  facility. *BDR* currently stores the system identifer, the database id
  and a name in there - but that's nothing core needs to concern itself
  with.
 
 I don't think you're going to be able to avoid users needing to know
 about those IDs.  The configuration table is going to have to be the
 same on all nodes, and how are you going to get that set up without
 those IDs being user-visible?

Why? Users and other systems only ever see the external ID. Everything
leaving the system is converted to the external form. The short id is
basically only used in shared memory and in WAL records. For both,
using longer strings would be problematic.

In the patch I have the user can actually see them as they're stored in
pg_replication_identifier, but there should never be a need for that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Tom Lane
Christoph Berg christoph.b...@credativ.de writes:
 now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't
 it make sense to get rid of the analyze_new_cluster.sh file which
 pg_upgrade writes? The net content is a single line which could as
 well be printed by pg_upgrade itself. Instead of an lengthy
 explanation how to invoke that manually, there should be a short note
 and a pointer to some manual section. I think the chances of people
 reading that would even be increased.

 Similary, I don't really see the usefulness of delete_old_cluster.sh
 as a file, when rm -rf could just be presented on the console for
 the admin to execute by cut-and-paste.

There are contexts where pg_upgrade is executed by some wrapper script
and the user doesn't normally see its output directly.  This is the
case in the Red Hat packaging (unless Honza changed it since I left ;-))
and I think Debian might be similar.

I generally don't like the amount of cruft pg_upgrade leaves lying
around, so I'd be glad to see these script files go away if possible;
but we need to think about how this will play when there's a wrapper
script between pg_upgrade and the human user.

In the Red Hat wrapper script, the pg_upgrade output is dumped into a
log file, which the user can look at if he wants, but I'd bet the
average user doesn't read it --- that was certainly the expectation.
Of course, said user probably never notices the separate shell
scripts either, so maybe it's a wash.

Another angle is that some folks might have tried to automate things
even more, with a wrapper script that starts up the new postmaster
and runs analyze_new_cluster.sh all by itself.  I guess they could
make the wrapper do vacuumdb --all --analyze-in-stages directly,
though, so maybe that's not a fatal objection either.

regards, tom lane




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Andres Freund
On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
 On Tue, Jun 17, 2014 at 12:50 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
   I can clearly understand the utility of being able to reset the system
   ID to a new, randomly-generated system ID - but giving the user the
   ability to set a particular value of their own choosing seems like a
   pretty sharp tool.  What is the use case for that?
 
  I've previously hacked this up ad hoc during data recovery when I needed
  to make another cluster similar enough that I could replay WAL.
 
  Another usecase is to mark a database as independent from its
  origin. Imagine a database that gets sharded across several
  servers. It's not uncommon to do that by initially basebackup'ing the
  database to several nodes and then use them separately from
  thereon. It's quite useful to actually mark them as being
  distinct. Especially as several of them right now would end up with the
  same timeline id...
 
 Sure, but that only requires being able to reset the ID randomly, not
 to a particular value.

I can definitely see a point in a version of the option that generates
the id randomly. But the use case one up actually does require setting
it to a s specific value...

  Uh. Right now this patch has been written because it's needed for an
  out-of-core replication solution. That's what BDR is at this point. The
  patch is unobtrusive, has other use cases than just our internal one, and
  doesn't make pg_resetxlog even more dangerous than it already is. I
  don't see much problem with considering it on its own cost/benefit?
 
 Well, I think it *does* make pg_resetxlog more dangerous; see previous
 discussion of pg_computemaxlsn.

Wasn't the thing around pg_computemaxlsn that there's actually no case
where it could be used safely? And that experienced people suggested to
use it in an unsafe fashion?
I don't see how the proposed ability makes it more dangerous. It
*already* has the ability to reset the timelineid. That's the case where
users are much more likely to think about resetting it because that's
much more plausible than taking an unrelated cluster and resetting its
sysid, timeline and LSN.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Andres Freund
On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
 Another angle is that some folks might have tried to automate things
 even more, with a wrapper script that starts up the new postmaster
 and runs analyze_new_cluster.sh all by itself.  I guess they could
 make the wrapper do vacuumdb --all --analyze-in-stages directly,
 though, so maybe that's not a fatal objection either.

Wouldn't that be quite counterproductive? The reason we don't normally
do that and why --analyze-in-stages exists is that the cluster should be
started up as fast as possible. Restarting it after ANALYZE went through
would be defeating that purpose, no?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
 ERROR that lists out all of the non-temporary objects which are found
 (and lists any other databases which have objects in those
 tablespaces..).  That would allow administrators who have existing
 notionally temporary-only tablespaces to go clean things up to make them
 actually temporary-only.

That seems just about impossible from a concurrency standpoint
(ie, what if someone is creating a new table in the tablespace
concurrently with your check?  Possibly in a different database?)

I would certainly suggest that the first version of the patch not
undertake to allow this property to be ALTERed; the cost-benefit
ratio isn't there IMO.

regards, tom lane




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Andres Freund
On 2014-06-18 18:54:05 +0200, Andres Freund wrote:
 On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
  Sure, but that only requires being able to reset the ID randomly, not
  to a particular value.
 
 I can definitely see a point in a version of the option that generates
 the id randomly.

That's actually included in the patch btw (thanks Abhijit)...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
 Another angle is that some folks might have tried to automate things
 even more, with a wrapper script that starts up the new postmaster
 and runs analyze_new_cluster.sh all by itself.  I guess they could
 make the wrapper do vacuumdb --all --analyze-in-stages directly,
 though, so maybe that's not a fatal objection either.

 Wouldn't that be quite counterproductive? The reason we don't normally
 do that and why --analyze-in-stages exists is that the cluster should be
 started up as fast as possible. Restarting it after ANALYZE went through
 would be defeating that purpose, no?

How so?  Once you've started the postmaster, you're open for business,
no?

regards, tom lane




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 12:54 PM, Andres Freund and...@2ndquadrant.com wrote:
 Well, I think it *does* make pg_resetxlog more dangerous; see previous
 discussion of pg_computemaxlsn.

 Wasn't the thing around pg_computemaxlsn that there's actually no case
 where it could be used safely? And that experienced people suggested to
  use it in an unsafe fashion?

There is a use case - to determine whether WAL has been replayed from
the future relative to the WAL stream and control file you have on
disk.  But the rest is true enough.

 I don't see how the proposed ability makes it more dangerous. It
 *already* has the ability to reset the timelineid. That's the case where
 users are much more likely to think about resetting it because that's
 much more plausible than taking a unrelated cluster and resetting its
 sysid, timeline and LSN.

All right, well, I've said my piece.  I don't have anything to add to
that that isn't sheer repetition.  My vote is to hold off on this
until we've talked about replication identifiers and other related
topics in more depth.  But if that position doesn't garner majority
support ... so be it!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Andres Freund
On 2014-06-18 13:24:14 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
  Another angle is that some folks might have tried to automate things
  even more, with a wrapper script that starts up the new postmaster
  and runs analyze_new_cluster.sh all by itself.  I guess they could
  make the wrapper do vacuumdb --all --analyze-in-stages directly,
  though, so maybe that's not a fatal objection either.
 
  Wouldn't that be quite counterproductive? The reason we don't normally
  do that and why --analyze-in-stages exists is that the cluster should be
  started up as fast as possible. Restarting it after ANALYZE went through
  would be defeating that purpose, no?
 
 How so?  Once you've started the postmaster, you're open for business,
 no?

Wasn't there lots of talk about making the server inaccessible while
pg_upgrade is doing its thing? Also, many people are desperately unhappy
if postgres has to be restarted (to return to being OS-managed) after
their application has already connected.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Petr Jelinek

On 18/06/14 19:26, Robert Haas wrote:

On Wed, Jun 18, 2014 at 12:54 PM, Andres Freund and...@2ndquadrant.com wrote:

I don't see how the proposed ability makes it more dangerous. It
*already* has the ability to reset the timelineid. That's the case where
users are much more likely to think about resetting it because that's
much more plausible than taking an unrelated cluster and resetting its
sysid, timeline and LSN.


All right, well, I've said my piece.  I don't have anything to add to
that that isn't sheer repetition.  My vote is to hold off on this
until we've talked about replication identifiers and other related
topics in more depth.  But if that position doesn't garner majority
support ... so be it!



I am not sure I get what this has to do with replication 
identifiers. The patch has several use-cases; one of them is that 
you can know the future system id before you set it, which is useful for 
automating some things...
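
(FWIW, the identifier currently in pg_control can already be read with
pg_controldata, e.g.:

    pg_controldata $PGDATA | grep 'Database system identifier'

so a script can check the value both before and after setting it.)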


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Andres Freund
On 2014-06-18 13:26:37 -0400, Robert Haas wrote:
 My vote is to hold off on this until we've talked about replication
 identifiers and other related topics in more depth.

I really don't understand this. We're *NOT* proposing this patch as an
underhanded way of preempting the discussion of whether/how replication
identifiers are going to be used. We're proposing it because we
currently have a need for the facility and this will reduce the number
of patches we have to keep around after 9.5. And more importantly
because there's several other use cases than our internal one for it.

To settle one more point: I am not planning to propose BDR's generation
of replication identifier names for integration. It works well enough
for BDR but I think we can come up with something better for core. If I
had my current knowledge two years back I'd not have chosen the current
scheme.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Josh Berkus
On 06/13/2014 05:31 PM, Petr Jelinek wrote:
 Hello,
 
 attached is a simple patch which makes it possible to change the system
 identifier of the cluster in pg_control. This is useful for
 individualization of the instance that is started on top of data
 directory produced by pg_basebackup - something that's helpful for
 logical replication setup where you need to easily identify each node
 (it's used by Bidirectional Replication for example).

I'm unclear on why we would overload pg_resetxlog for this.  Wouldn't it
be a better design to have an independent function,
pg_set_system_identifier?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Alvaro Herrera
Josh Berkus wrote:
 On 06/13/2014 05:31 PM, Petr Jelinek wrote:
  Hello,
  
  attached is a simple patch which makes it possible to change the system
  identifier of the cluster in pg_control. This is useful for
  individualization of the instance that is started on top of data
  directory produced by pg_basebackup - something that's helpful for
  logical replication setup where you need to easily identify each node
  (it's used by Bidirectional Replication for example).
 
 I'm unclear on why we would overload pg_resetxlog for this.  Wouldn't it
 be a better design to have an independent function,
 pg_set_system_identifier?

We have overloaded pg_resetxlog for all pg_control-tweaking needs.  I
don't see any reason to do differently here.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Andres Freund
On 2014-06-18 10:44:56 -0700, Josh Berkus wrote:
 On 06/13/2014 05:31 PM, Petr Jelinek wrote:
  Hello,
  
  attached is a simple patch which makes it possible to change the system
  identifier of the cluster in pg_control. This is useful for
  individualization of the instance that is started on top of data
  directory produced by pg_basebackup - something that's helpful for
  logical replication setup where you need to easily identify each node
  (it's used by Bidirectional Replication for example).
 
 I'm unclear on why we would overload pg_resetxlog for this.  Wouldn't it
 be a better design to have an independent function,
 pg_set_system_identifier?

You mean an independent binary? Because it's not possible to change this
at runtime.

If so, it's because pg_resetxlog already has the option to change many
related things (e.g. the timeline id). And it'd require another copy of
several hundred lines of code. It's all stored in the control file/checkpoints.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Abhijit Menon-Sen
At 2014-06-18 10:44:56 -0700, j...@agliodbs.com wrote:

 I'm unclear on why we would overload pg_resetxlog for this. 

Because pg_resetxlog already does something very similar, so the patch
is small. If it were independent, it would have to copy quite some code
from pg_resetxlog.

 Wouldn't it be a better design to have an independent function,
 pg_set_system_identifier?

A *function*? Why?

-- Abhijit




Re: [HACKERS] How to change the pgsql source code and build it??

2014-06-18 Thread Shreesha
Well, the initdb issue looks to be resolved.  Actually, after making the
changes suggested by Kyotaro Horiguchi, I copied only the initdb binary to
my platform and didn't copy the rest. Hence, the dependencies were still
unresolved and I was getting the error. However, now the database server
has started and is up and running.

But, When I try to connect the client to the server, I am getting the
following error:

/switch/pgsql/bin # ./psql
FATAL:  database "root" does not exist
psql: FATAL:  database "root" does not exist

Upon browsing a couple of links, I learned that in order to get through
this issue, we should log in as the actual postgres user so that the
client can connect to the default database. However, in my case I don't
know why a default database named 'root' wasn't created, or why the
server rejects the client when it tries to connect to the default
database.

Can anyone shed some light on
1) when the default database gets created?
2) how this database name is decided? Or, what is the default database
name?
3) Are there any other places in the database server code where this
check is applied?

Upon looking at the error I got, I believe the code is searching for a
database whose name matches the user name.  If anyone can give some input
on the code, it would be helpful!
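
From a quick look, psql defaults the database name to the connecting user
name when none is given (hence the search for "root"), and initdb creates
only the template0, template1, and postgres databases.  The server-side
check appears to happen during backend startup; a simplified sketch of
that logic (paraphrased from src/backend/utils/init/postinit.c, not
verbatim source):

/* InitPostgres(), roughly: look up the requested database by name */
HeapTuple	tuple = GetDatabaseTuple(dbname);

if (!HeapTupleIsValid(tuple))
	ereport(FATAL,
			(errcode(ERRCODE_UNDEFINED_DATABASE),
			 errmsg("database \"%s\" does not exist", dbname)));

So the options seem to be: connect as a user whose name matches an
existing database, pass an explicit database name (./psql -d postgres),
or create a database named after the user with createdb.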

Thanks,
Shreesha



On Mon, Jun 16, 2014 at 6:58 PM, Craig Ringer cr...@2ndquadrant.com wrote:

 On 06/17/2014 02:02 AM, Shreesha wrote:
  But I believe I am looking for the exact opposite of it. In
  other words, a possible workaround for a root user to execute certain
  executable(s) as an unprivileged user.
 

 So you want the su or sudo commands?

 --
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




-- 
~Shreesha.


Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-06-18 13:24:14 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
 Another angle is that some folks might have tried to automate things
 even more, with a wrapper script that starts up the new postmaster
 and runs analyze_new_cluster.sh all by itself.  I guess they could
 make the wrapper do vacuumdb --all --analyze-in-stages directly,
 though, so maybe that's not a fatal objection either.

 Wouldn't that be quite counterproductive? The reason we don't normally
 do that and why --analyze-in-stages exists is that the cluster should be
 started up as fast as possible. Restarting it after ANALYZE went through
 would be defeating that purpose, no?

 How so?  Once you've started the postmaster, you're open for business,
 no?

 Wasn't there lots of talk about making the server inaccessible while
 pg_upgrade is doing its thing? Also, many people are desperately unhappy
 if postgres has to be restarted (to return to being OS managed) after
 their application has already connected.

I think we're not on the same page.  My point is that someone might want
to automate the whole sequence: stop old postmaster, run pg_upgrade, start
the updated postmaster normally (hence it *is* open for business), kick
off the analyze runs.  If you're concerned about minimal downtime you
would not want to be waiting around for the admin to issue a perfectly
predictable series of commands.

regards, tom lane




Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Andres Freund
On 2014-06-18 13:54:16 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-06-18 13:24:14 -0400, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
  On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
  Another angle is that some folks might have tried to automate things
  even more, with a wrapper script that starts up the new postmaster
  and runs analyze_new_cluster.sh all by itself.  I guess they could
  make the wrapper do vacuumdb --all --analyze-in-stages directly,
  though, so maybe that's not a fatal objection either.
 
  Wouldn't that be quite counterproductive? The reason we don't normally
  do that and why --analyze-in-stages exists is that the cluster should be
  started up as fast as possible. Restarting it after ANALYZE went through
  would be defeating that purpose, no?
 
  How so?  Once you've started the postmaster, you're open for business,
  no?
 
  Wasn't there lots of talk about making the server inaccessible while
  pg_upgrade is doing its thing? Also, many people are desperately unhappy
  if postgres has to be restarted (to return to being OS managed) after
  their application has already connected.
 
 I think we're not on the same page.  My point is that someone might want
 to automate the whole sequence: stop old postmaster, run pg_upgrade, start
 the updated postmaster normally (hence it *is* open for business), kick
 off the analyze runs.  If you're concerned about minimal downtime you
 would not want to be waiting around for the admin to issue a perfectly
 predictable series of commands.

Oh, yea. Definitely. I think that's what I've seen happen in pretty much
*all* usages of pg_upgrade. I somehow misread that you wanted to add
that into pg_upgrade. Not really sure how, sorry.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Josh Berkus
On 06/18/2014 10:48 AM, Abhijit Menon-Sen wrote:
 At 2014-06-18 10:44:56 -0700, j...@agliodbs.com wrote:

 I'm unclear on why we would overload pg_resetxlog for this. 
 
 Because pg_resetxlog already does something very similar, so the patch
 is small. If it were independent, it would have to copy quite some code
 from pg_resetxlog.

Aha.  In that case, it seems like it's time to rename pg_resetxlog, if
it does a bunch of things that aren't resetting the xlog.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Andres Freund
On 2014-06-18 10:59:59 -0700, Josh Berkus wrote:
 On 06/18/2014 10:48 AM, Abhijit Menon-Sen wrote:
  At 2014-06-18 10:44:56 -0700, j...@agliodbs.com wrote:
 
  I'm unclear on why we would overload pg_resetxlog for this. 
  
  Because pg_resetxlog already does something very similar, so the patch
  is small. If it were independent, it would have to copy quite some code
  from pg_resetxlog.
 
 Aha.  In that case, it seems like it's time to rename pg_resetxlog, if
 it does a bunch of things that aren't resetting the xlog.

Well, all those actually do write to the xlog (to write a new
checkpoint, containing the updated control file). Since pg_resetxlog has
done all this pretty much since forever renaming it now seems to be a
big hassle for users for pretty much no benefit? This isn't a tool the
average user should ever touch.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Fabrízio de Royes Mello
On Wed, Jun 18, 2014 at 1:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Stephen Frost sfr...@snowman.net writes:
  Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
  ERROR that lists out all of the non-temporary objects which are found
  (and lists any other databases which have objects in those
  tablespaces..).  That would allow administrators who have existing
  notionally temporary-only tablespaces to go clean things up to make them
  actually temporary-only.

 That seems just about impossible from a concurrency standpoint
 (ie, what if someone is creating a new table in the tablespace
 concurrently with your check?  Possibly in a different database?)


You are correct, I completely forgot that CREATE TABLESPACE is not
transaction safe.


 I would certainly suggest that the first version of the patch not
 undertake to allow this property to be ALTERed; the cost-benefit
 ratio isn't there IMO.


+1

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Josh Berkus
On 06/18/2014 11:03 AM, Andres Freund wrote:
 Well, all those actually do write to the xlog (to write a new
 checkpoint, containing the updated control file). Since pg_resetxlog has
 done all this pretty much since forever renaming it now seems to be a
 big hassle for users for pretty much no benefit? This isn't a tool the
 average user should ever touch.

If we're using it to create a unique system ID which can be used to
orchestrate replication and clustering systems, a lot more people are
going to be touching it than ever did before -- and not just for BDR.

Or are you saying that we have to destroy the data by resetting the xlog
before we can change the system identifier?  If so, this feature is less
than completely useful ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Jun 17, 2014 at 10:25 PM, Stephen Frost sfr...@snowman.net wrote:
  I agree that we want to support that, if we can do so reasonably.  What
  I was trying to get at is simply this- don't we provide that already
  with the leakproof attribute and functions?  If we don't have enough
  there to allow index scans then we should be looking to add more, I'm
  thinking.
 
 So the reason why we got onto this particular topic was because of the
 issue of multiple security policies for a single table.  Of course,
 multiple security policies can always be merged into a single
 more-complex policy, but the resulting policy may be so complex that
 the query-planner is no longer capable of doing a good job optimizing
 it.  

Yeah, I could see that happening with some use-cases.

  ALTER TABLE tab ADD POLICY polname WHERE quals;
  GRANT SELECT (polname) ON TABLE tab TO role;
 
  Right, if we were to support multiple policies on a given table then we
  would have to support adding and removing them individually, as well as
  specifying when they are to be applied- and what if that "when" overlaps?
  Do we apply both and only a row which passed them all gets sent to the
  user?  Essentially we'd be defining the RLS policies to be AND'd
  together, right?  Would we want to support both AND-based and OR-based,
  and allow users to pick what set of conditionals they want applied to
  their various overlapping RLS policies?
 
 AND is not a sensible policy; it would need to be OR.  If you grant
 someone access to two different subsets of the rows in a table, it
 stands to reason that they will expect to have access to all of the
 rows that are in at least one of those subsets.  

I think I can buy off on this.  What that also means is that any
'short-circuiting' that we try to do here would be based on "stop once
we get back a 'true'".  This could seriously change how we actually
implement RLS, though: doing it all through query rewrites, making
this work with multiple security policies which are OR'd together, and
yet keeping the optimization, qual push-down, and index-based plans, is
looking pretty daunting.

I'm also of the opinion that this isn't strictly necessary for the
initial RLS offering in PG- there's a clear way we could migrate
existing users to a multi-policy system from a single-policy system.
Sure, to get the performance and optimization benefits that we'd
presumably have in the multi-policy case they'd need to re-work their
RLS configuration, but for users who care, they'll likely be very happy
to do so to gain those benefits.

Perhaps the question here is- if we implement RLS one way for the single
case and then change the implementation all around for the multi case,
will we end up breaking the single case?  Or destroying the performance
for it?  I can't see either of those cases being allowed- if and when we
support multi, we must still support single and the whole point of multi
would be to allow more performant implementations and that solution will
require the single case to be at least as performant as what we're
proposing to do today, I believe.

Or are you thinking that we would never support calling user-defined
functions in any RLS scheme because we want to be able to do that
optimization?  I don't see that being acceptable from a feature
standpoint.

 Alternatively, we could:
 
 - Require the user to specify in some way which of the available
 policies they want applied, and then apply only that one.

I'd want to at least see a way to apply an ordering to the policies
being applied, or have PG work out which one is cheapest and try that
one first.

 - Decide that such scenarios constitute misconfiguration. Throw an
 error and make the table owner or other relevant local authority fix
 it.

Having them all be OR'd together feels simpler and easier to work with
than trying to provide the user with all the knobs necessary to select
which subset of users they want the policy applied to when (user X from
IP range a.b.c.d/24 at time 1500).  We could probably make it work with
exclusion constraints, range types, etc, and perhaps it'd be a reason to
bring btree_gist into core (which I'm all for) and make it work with
catalog tables, but... just 'yuck' all around, for my part.

  Sounds all rather painful and much better done programatically by the
  user in a language which is suited to that task- eg: pl/pgsql, perl, C,
  or something besides our ALTER syntax + catalog representation.
 
 I think exactly the opposite, for the query planning reasons
 previously stated.  I think the policies will quickly get so
 complicated that they're no longer optimizable.  Here's a simple
 example:
 
 - Policy 1 allows the user to access rows for which complexfunc() returns 
 true.
 - Policy 2 allows the user to access rows for which a = 1.
 
 Most users have access only through policy 2, but some have access
 through policy 1.  Users who have access through policy 1 

Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
  ERROR that lists out all of the non-temporary objects which are found
  (and lists any other databases which have objects in those
  tablespaces..).  That would allow administrators who have existing
  notionally temporary-only tablespaces to go clean things up to make them
  actually temporary-only.
 
 That seems just about impossible from a concurrency standpoint
 (ie, what if someone is creating a new table in the tablespace
 concurrently with your check?  Possibly in a different database?)

Yeah, that's definitely an annoying complexity.

 I would certainly suggest that the first version of the patch not
 undertake to allow this property to be ALTERed; the cost-benefit
 ratio isn't there IMO.

I suppose scheduling downtime to do the check manually across all
databases, then dropping and recreating the tablespace, would work.  As
someone who's working with a couple of these cases, it'd be awfully nice
if there was a way PG would handle it for me.

Thanks,

Stephen




Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 8:46 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jun 17, 2014 at 07:12:16PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Tue, Jun 17, 2014 at 03:55:02PM -0400, Alvaro Herrera wrote:
  Can't you compare it to the historic default value?  I mean, add an
  assumption that people thus far has never tweaked it.

  Well, if they did tweak it, then they would be unable to use pg_upgrade
  because it would complain about a mismatch if they actually matched the
  old and new servers.

 What about comparing to the symbolic value LOBLKSIZE?  This would make
 pg_upgrade assume that the old installation had been tweaked the same
 as in its own build.  This ends up being the same as what you said,
 ie, effectively no comparison ... but it might be less complicated to
 code/understand.

 OK, assume the compiled-in default is the value for an old cluster that
 has no value --- yeah, I could do that.

I'm not really sure why this is better than Bruce's original proposal, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:

 I thought the reason why this hasn't been implemented before
 now is that sending an ErrorResponse to the client will result
 in a loss of protocol sync.

 Hmm ... you are right that this isn't as simple as an
 ereport(ERROR), but I'm not sure it's impossible.  We could for
 instance put the backend into skip-till-Sync state so that it
 effectively ignored the next command message.  Causing that to
 happen might be impracticably messy, though.

 Another thing we could maybe do is AbortCurrentTransaction() and
 send the client a NoticeResponse saying "hey, expect all of your
 future commands to fail with complaints about the transaction
 being aborted."

Wow, until I read through the thread on this patch and the old
thread that Robert linked to I had forgotten the attempt by Andres
to deal with this three and a half years ago.  That patch died
because of how complicated it was to do the right thing if the
connection was left open.

Personally, where I have seen a use for this, treating it as FATAL
would be better anyway; but I was OK with treating it as an ERROR if
that didn't sink the patch.  I guess that puts me in the same camp
as Josh and Robert.

Vik has submitted v1 and v2 of this patch; the only difference
between them is that v1 makes a timeout FATAL and v2 makes it an
ERROR (with appropriate corresponding changes to code comments,
documentation, and message text).  It's not hard to show the
problems with an ERROR under v2, confirming that the simple
approach of just changing the ereport severity is not enough:

test=# set idle_in_transaction_timeout = '3s';
SET
test=# begin;
BEGIN
test=# select 1;
ERROR:  current transaction is aborted due to idle-in-transaction timeout
test=# commit;
ERROR:  current transaction is aborted, commands ignored until end of 
transaction block
test=# commit;
ROLLBACK

The first thing is that I don't think a delay between the BEGIN and
the SELECT should cause a timeout to trigger, but more importantly
there should not be two ERROR responses to one SELECT statement.

I'm inclined to abandon the ERROR approach as not worth the effort
and fragility, and focus on v1 of the patch.  If we can't get to
consensus on that, I think that this patch should be flagged
"Returned with Feedback", noting that any follow-up version
requires some way to deal with the issues raised regarding multiple
ERROR messages.

Abridged for space, hopefully without losing the main points of the
authors so far:

Josh Berkus:
 My $0.018: terminating the connection is the
 preferred behavior.

Tom Lane:
 FWIW, I think aborting the transaction is probably better,
 especially if the patch is designed to do nothing to
 already-aborted transactions.  If the client is still there, it
 will see the abort as a failure in its next query, which is less
 likely to confuse it completely than a connection loss.  (I
 think, anyway.)

Álvaro Herrera:
 I think if we want this at all it should abort the xact, not drop
 the connection.

Andrew Dunstan:
 [quotes Tom's argument]
 Yes, I had the same thought.

David G Johnston:
 Default to dropping the connection but give the
 users & administrators the capability to decide for themselves?

Robert Haas:
 Assuming that the state of play hasn't changed in some way I'm
 unaware of since 2010, I think the best argument for FATAL here
 is that it's what we can implement.  I happen to think it's
 better anyway, because the cases I've seen where this would
 actually be useful involve badly-written applications that are
 not under the same administrative control as the database and
 supposedly have built-in connection poolers, except sometimes
 they seem to forget about some of the connections they themselves
 opened.  The DBAs can't make the app developers fix the app; they
 just have to try to survive.  Aborting the transaction is a step
 in the right direction but since the guy at the other end of the
 connection is actually or in effect dead, that just leaves you
 with an eternally idle connection.

Tom Lane (reprise):
 I'm not sure whether cancel-transaction behavior is enough better
 than cancel-session to warrant extra effort here.

Andres Freund:
 [quotes Tom's latest (above)]
 I don't think idle_in_transaction_timeout is worth this, but if
 we could implement it we could finally have recovery conflicts
 against idle in txn sessions not use FATAL...

Kevin Grittner:
 Personally, where I have seen a use for this, treating it as
 FATAL would be better anyway

A few ideas were put forward on how a much more complicated patch
could allow transaction rollback with an ERROR work acceptably, but
I think that would probably need to be a new patch in a later CF,
so that is omitted here.

So, can we agree to use FATAL to terminate the connection?
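
For concreteness, a minimal sketch of the FATAL variant on top of the
existing timeout machinery (utils/timeout.h); the GUC, handler, and flag
names here are hypothetical and are not taken from Vik's patch:

static int	IdleInTransactionTimeout;	/* GUC, milliseconds; 0 disables */
static TimeoutId idle_xact_timeout_id;
static volatile sig_atomic_t idle_xact_timeout_pending = false;

static void
IdleXactTimeoutHandler(void)
{
	/* just set a flag; the ereport happens at the next interrupt
	 * check, not inside the timeout handler itself */
	idle_xact_timeout_pending = true;
	InterruptPending = true;
}

/* once, at backend startup */
idle_xact_timeout_id = RegisterTimeout(USER_TIMEOUT, IdleXactTimeoutHandler);

/* whenever the backend goes idle inside a transaction block */
if (IdleInTransactionTimeout > 0)
	enable_timeout_after(idle_xact_timeout_id, IdleInTransactionTimeout);

/* and in ProcessInterrupts() */
if (idle_xact_timeout_pending)
	ereport(FATAL,
			(errcode(ERRCODE_ADMIN_SHUTDOWN),
			 errmsg("terminating connection due to idle-in-transaction timeout")));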

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway

On 06/10/2014 02:57 AM, MauMau wrote:
 From: Amit Kapila amit.kapil...@gmail.com
 Is there a need to free the memory context in PG_CATCH()? In the error
 path the memory of this temporary context will get freed anyway, since
 deleting the transaction context will also delete this temporary context,
 which will definitely be in its hierarchy.  I think such handling can be
 useful in case we use PG_CATCH() to suppress the error.
 
 I thought the same, but I also felt that I should make an effort
 to release resources as soon as possible, considering the memory
 context auto deletion as a last resort.  However, looking at other
 places where PG_CATCH() is used, the memory context is not deleted.
 So, I removed the modification from the PG_CATCH() block.  Thanks.

I think the context deletion was missed in the first place because
storeRow() is the wrong place to create the context. Rather than
creating the context in storeRow() and deleting it two levels up in
materializeQueryResult(), I think it should be created and deleted in
the interim layer, storeQueryResult(). Patch along those lines attached.

Any objections to me back-patching it this way?

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index a81853f..46c4a51 100644
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
*** storeQueryResult(storeInfo *sinfo, PGcon
*** 1076,1081 
--- 1076,1089 
  	if (!PQsetSingleRowMode(conn))		/* shouldn't fail */
  		elog(ERROR, "failed to set single-row mode for dblink query");
  
+ 	/* Create short-lived memory context for data conversions */
+ 	if (!sinfo->tmpcontext)
+ 		sinfo->tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+   "dblink temporary context",
+   ALLOCSET_DEFAULT_MINSIZE,
+   ALLOCSET_DEFAULT_INITSIZE,
+   ALLOCSET_DEFAULT_MAXSIZE);
+ 
  	for (;;)
  	{
  		CHECK_FOR_INTERRUPTS();
*** storeQueryResult(storeInfo *sinfo, PGcon
*** 1119,1124 
--- 1127,1136 
  	/* clean up GUC settings, if we changed any */
  	restoreLocalGucs(nestlevel);
  
+ 	/* clean up data conversion short-lived memory context */
+ 	if (sinfo->tmpcontext != NULL)
+ 		MemoryContextDelete(sinfo->tmpcontext);
+ 
  	/* return last_res */
  	res = sinfo->last_res;
  	sinfo->last_res = NULL;
*** storeRow(storeInfo *sinfo, PGresult *res
*** 1204,1218 
  		if (sinfo->cstrs)
  			pfree(sinfo->cstrs);
  		sinfo->cstrs = (char **) palloc(nfields * sizeof(char *));
- 
- 		/* Create short-lived memory context for data conversions */
- 		if (!sinfo->tmpcontext)
- 			sinfo->tmpcontext =
- AllocSetContextCreate(CurrentMemoryContext,
- 	  "dblink temporary context",
- 	  ALLOCSET_DEFAULT_MINSIZE,
- 	  ALLOCSET_DEFAULT_INITSIZE,
- 	  ALLOCSET_DEFAULT_MAXSIZE);
  	}
  
  	/* Should have a single-row result if we get here */
--- 1216,1221 



Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 17, 2014 at 8:46 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jun 17, 2014 at 07:12:16PM -0400, Tom Lane wrote:
 What about comparing to the symbolic value LOBLKSIZE?  This would make
 pg_upgrade assume that the old installation had been tweaked the same
 as in its own build.  This ends up being the same as what you said,
 ie, effectively no comparison ... but it might be less complicated to
 code/understand.

 OK, assume the compiled-in default is the value for an old cluster that
 has no value --- yeah, I could do that.

 I'm not really sure why this is better than Bruce's original proposal, though.

The net behavior would be the same, but I thought it might be easier to
code by thinking of it this way.  Or maybe it wouldn't --- it's just a
suggestion.

regards, tom lane




Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
Joe Conway m...@joeconway.com writes:
 I think the context deletion was missed in the first place because
 storeRow() is the wrong place to create the context. Rather than
 creating the context in storeRow() and deleting it two levels up in
 materializeQueryResult(), I think it should be created and deleted in
 the interim layer, storeQueryResult(). Patch along those lines attached.

Since the storeInfo struct is longer-lived than storeQueryResult(),
it'd probably be better if the cleanup looked like

+   if (sinfo->tmpcontext != NULL)
+   MemoryContextDelete(sinfo->tmpcontext);
+   sinfo->tmpcontext = NULL;

I find myself a bit suspicious of this whole thing though.  If it's
necessary to explicitly clean up the tmpcontext, why not also the
sinfo-cstrs allocation?  And what about the tupdesc, attinmeta,
etc created further up in that if (first) block?  I'd have expected
that all this stuff gets allocated in a context that's short-lived
enough that we don't really need to clean it up explicitly.

regards, tom lane




Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway

On 06/18/2014 12:09 PM, Tom Lane wrote:
 Joe Conway m...@joeconway.com writes:
 I think the context deletion was missed in the first place
 because storeRow() is the wrong place to create the context.
 Rather than creating the context in storeRow() and deleting it
 two levels up in materializeQueryResult(), I think it should be
 created and deleted in the interim layer, storeQueryResult().
 Patch along those lines attached.
 
 Since the storeInfo struct is longer-lived than
 storeQueryResult(), it'd probably be better if the cleanup looked
 like
 
 +   if (sinfo->tmpcontext != NULL)
 +   MemoryContextDelete(sinfo->tmpcontext);
 +   sinfo->tmpcontext = NULL;


good point

 I find myself a bit suspicious of this whole thing though.  If
 it's necessary to explicitly clean up the tmpcontext, why not also
 the sinfo->cstrs allocation?  And what about the tupdesc,
 attinmeta, etc created further up in that if (first) block?  I'd
 have expected that all this stuff gets allocated in a context
 that's short-lived enough that we don't really need to clean it up
 explicitly.

Yeah, I thought about that too. My testing showed a small amount of
remaining leakage -- like 20 MB on 100,000 iterations -- I wasn't sure
that it was worthwhile to worry about. The memory context leak was
much larger.

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 17, 2014 at 8:46 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jun 17, 2014 at 07:12:16PM -0400, Tom Lane wrote:
 What about comparing to the symbolic value LOBLKSIZE?  This would make
 pg_upgrade assume that the old installation had been tweaked the same
 as in its own build.  This ends up being the same as what you said,
 ie, effectively no comparison ... but it might be less complicated to
 code/understand.

 OK, assume the compiled-in default is the value for an old cluster that
 has no value --- yeah, I could do that.

 I'm not really sure why this is better than Bruce's original proposal, 
 though.

 The net behavior would be the same, but I thought it might be easier to
 code by thinking of it this way.  Or maybe it wouldn't --- it's just a
 suggestion.

Well, the difference is that if we just don't check it, there can
never be an error.  Basically, it's the user's job to DTRT.  If we
check it against some semi-arbitrary value, we'll catch the case where
the old cluster was modified with a custom setting and the new one was
not - but couldn't we also get false positives under obscure
circumstances?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Josh Berkus
On 06/18/2014 11:50 AM, Kevin Grittner wrote:
 The first thing is that I don't think a delay between the BEGIN and
 the SELECT should cause a timeout to trigger, but more importantly
 there should not be two ERROR responses to one SELECT statement.

I do think a delay between BEGIN and SELECT should trigger the timeout.
 There are plenty of badly-written applications which auto-begin, that
is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or
not there's any additional work to do.  This is a major source of IIT
and the timeout should not ignore it.

 I'm inclined to abandon the ERROR approach as not worth the effort
 and fragility, and focus on v1 of the patch.  If we can't get to
 consensus on that, I think that this patch should be flagged
 Returned with Feedback, noting that any follow-up version
 requires some way to deal with the issues raised regarding multiple
 ERROR messages.

+1

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jun 18, 2014 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The net behavior would be the same, but I thought it might be easier to
 code by thinking of it this way.  Or maybe it wouldn't --- it's just a
 suggestion.

 Well, the difference is that if we just don't check it, there can
 never be an error.  Basically, it's the user's job to DTRT.  If we
 check it against some semi-arbitrary value, we'll catch the case where
 the old cluster was modified with a custom setting and the new one was
 not - but couldn't we also get false positives under obscure
 circumstances?

Huh?  What we'd be checking is the LOBLKSIZE compiled into pg_upgrade
versus that stored into pg_control by the new postmaster.  If those
are different, then pg_upgrade didn't come from the same build as the
new postmaster, which is already a pretty hazardous situation (especially
if the user is fooling with low-level stuff like this).
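
In code form, the check could look roughly like this in pg_upgrade's
control-data handling; the field names below are illustrative only, since
an old cluster's pg_control has no LOBLKSIZE field at all:

/* sketch only; "found_loblksize"/"loblksize" are made-up field names */
if (!oldctrl->found_loblksize)
	oldctrl->loblksize = LOBLKSIZE;	/* assume the compiled-in default */

if (oldctrl->loblksize != newctrl->loblksize)
	pg_fatal("old and new large-object chunk sizes do not match\n");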

regards, tom lane




Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
  There are plenty of badly-written applications which auto-begin, that
 is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or
 not there's any additional work to do.  This is a major source of IIT
 and the timeout should not ignore it.

Nonsense.  We explicitly don't do anything useful until the first actual
command arrives, precisely to avoid that problem.

It might be that we should slap such apps' wrists anyway, but given
that we've gone to the trouble of working around the behavior at the
system structural level, I'd be inclined to say not.  What you'd be
doing is preventing people who have to deal with such apps from using
the timeout in any useful fashion.

regards, tom lane




Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 3:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jun 18, 2014 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The net behavior would be the same, but I thought it might be easier to
 code by thinking of it this way.  Or maybe it wouldn't --- it's just a
 suggestion.

 Well, the difference is that if we just don't check it, there can
 never be an error.  Basically, it's the user's job to DTRT.  If we
 check it against some semi-arbitrary value, we'll catch the case where
 the old cluster was modified with a custom setting and the new one was
 not - but couldn't we also get false positives under obscure
 circumstances?

 Huh?  What we'd be checking is the LOBLKSIZE compiled into pg_upgrade
 versus that stored into pg_control by the new postmaster.  If those
 are different, then pg_upgrade didn't come from the same build as the
 new postmaster, which is already a pretty hazardous situation (especially
 if the user is fooling with low-level stuff like this).

OK, I agree that checking that wouldn't hurt anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Jeff Janes
On Wed, Jun 18, 2014 at 10:58 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-18 13:54:16 -0400, Tom Lane wrote:

 I think we're not on the same page.  My point is that someone might want
 to automate the whole sequence: stop old postmaster, run pg_upgrade, start
 the updated postmaster normally (hence it *is* open for business), kick
 off the analyze runs.  If you're concerned about minimal downtime you
 would not want to be waiting around for the admin to issue a perfectly
 predictable series of commands.

 Oh, yea. Definitely. I think that's what I've seen happen in pretty much
 *all* usages of pg_upgrade.

I think it is a popular way to do it not because it is a particularly
good way, but because the better alternatives are not readily
available.

If your database needs statistics badly enough that you want to do a
coarse pre-pass with default_statistics_target=1, why would you want
that pass to be done on an open database?  Surely you don't want 100
open connections all doing giant seq scans (which should be single-row
lookups, but without stats they are not) competing with the analyze.

Having a database which is open to queries, but with execution plans so
deranged that they never actually finish, is not truly open, and the
attempts to service those futile queries just delay the true opening
even further.

If you really need a multi pass ANALYZE, you probably need the first
pass to be before the database opens because otherwise the open will
be a disaster, and the 2nd pass to be after the database opens but
before your bulk queries (mass deletes, EOM reports, etc.) kick in.
Having both passes be on the same side of the opening seems unlikely
to do much good for most use cases.  Fortunately it probably doesn't
do much harm to most people, either, simply because most databases are
not terribly sensitive to the issue.

Cheers,

Jeff




Re: [HACKERS] [COMMITTERS] pgsql: Reduce the number of semaphores used under --disable-spinlocks.

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 3:32 PM, Andres Freund and...@2ndquadrant.com wrote:
 Hi,

 On 2014-01-08 23:58:16 +, Robert Haas wrote:
 Reduce the number of semaphores used under --disable-spinlocks.

 Instead of allocating a semaphore from the operating system for every
 spinlock, allocate a fixed number of semaphores (by default, 1024)
 from the operating system and multiplex all the spinlocks that get
 created onto them.  This could self-deadlock if a process attempted
 to acquire more than one spinlock at a time, but since processes
 aren't supposed to execute anything other than short stretches of
 straight-line code while holding a spinlock, that shouldn't happen.

 One motivation for this change is that, with the introduction of
 dynamic shared memory, it may be desirable to create spinlocks that
 last for less than the lifetime of the server.  Without this change,
 attempting to use such facilities under --disable-spinlocks would
 quickly exhaust any supply of available semaphores.  Quite apart
 from that, it's desirable to contain the quantity of semaphores
 needed to run the server simply on convenience grounds, since using
 too many may make it harder to get PostgreSQL running on a new
 platform, which is mostly the point of --disable-spinlocks in the
 first place.

 I'm looking at the way you did this in the context of the atomics
 patch. Won't:
 s_init_lock_sema(volatile slock_t *lock)
 {
 static int  counter = 0;

 *lock = (++counter) % NUM_SPINLOCK_SEMAPHORES;
 }

 lead to bad results if spinlocks are initialized after startup?

Why?

 Essentially mapping new spinlocks to the same semaphore?

Yeah, but so what?  If we're mapping a bajillion spinlocks to the same
semaphore already, what's a few more?

 That's a
 restriction I can live with, especially as this is only for super old
 platforms. But it might be worth mentioning somewhere?

Dunno.  What restriction?

 I've essentially reimplemented that kind of logic in the atomics
 patch. Looking to get rid of the duplication... There I used something
 like
 slot = ((uintptr_t) addr_of_atomic >> (sizeof(void*) + 5)) % NUM_LOCKS
 but I think your method is actually better because it allows
 spinlocks/atomics to be placed in dsm segments mapped at different
 locations in individual processes.

Right.

 My current plan to get rid of the duplication is to simply embed the
 spinlock inside the atomic variable instead of having a separate array
 of spinlocks protecting atomic variables.

Doesn't sound crazy at first glance.
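
A sketch of what that embedding could look like for the semaphore-based
fallback; the struct and function names are guesses, not the actual
atomics patch:

typedef struct pg_atomic_uint32
{
	slock_t		sema;			/* protects value where no HW atomics exist */
	volatile uint32 value;
} pg_atomic_uint32;

static uint32
pg_atomic_fetch_add_u32(volatile pg_atomic_uint32 *ptr, int32 add_)
{
	uint32		old;

	SpinLockAcquire(&ptr->sema);
	old = ptr->value;
	ptr->value += add_;
	SpinLockRelease(&ptr->sema);

	return old;
}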

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Alvaro Herrera
Stephen Frost wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
  Stephen Frost sfr...@snowman.net writes:
   Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
   ERROR that lists out all of the non-temporary objects which are found
   (and lists any other databases which have objects in those
   tablespaces..).  That would allow administrators who have existing
   notionally temporary-only tablespaces to go clean things up to make them
   actually temporary-only.

  I would certainly suggest that the first version of the patch not
  undertake to allow this property to be ALTERed; the cost-benefit
  ratio isn't there IMO.
 
 I suppose scheduling downtime to do the check manually across all
 databases, then dropping and recreating the tablespace, would work.  As
 someone who's working with a couple of these cases, it'd be awfully nice
 if there was a way PG would handle it for me.

I wonder if some form of NOT VALID marking could be useful here.  Of
course, this is not a constraint.  But a mechanism of a similar spirit
seems apropos.  It seems reasonable to leave such a thing for future
improvement.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Josh Berkus
On 06/18/2014 12:32 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
  There are plenty of badly-written applications which auto-begin, that
 is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or
 not there's any additional work to do.  This is a major source of IIT
 and the timeout should not ignore it.
 
 Nonsense.  We explicitly don't do anything useful until the first actual
 command arrives, precisely to avoid that problem.

Oh, we don't allocate a snapshot?  If not, then no objection here.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] Re: [COMMITTERS] pgsql: Reduce the number of semaphores used under --disable-spinlocks.

2014-06-18 Thread Andres Freund
On 2014-06-18 15:52:49 -0400, Robert Haas wrote:
 On Wed, Jun 18, 2014 at 3:32 PM, Andres Freund and...@2ndquadrant.com wrote:
  Hi,
 
  On 2014-01-08 23:58:16 +, Robert Haas wrote:
  Reduce the number of semaphores used under --disable-spinlocks.
 
  Instead of allocating a semaphore from the operating system for every
  spinlock, allocate a fixed number of semaphores (by default, 1024)
  from the operating system and multiplex all the spinlocks that get
  created onto them.  This could self-deadlock if a process attempted
  to acquire more than one spinlock at a time, but since processes
  aren't supposed to execute anything other than short stretches of
  straight-line code while holding a spinlock, that shouldn't happen.
 
  One motivation for this change is that, with the introduction of
  dynamic shared memory, it may be desirable to create spinlocks that
  last for less than the lifetime of the server.  Without this change,
  attempting to use such facilities under --disable-spinlocks would
  quickly exhaust any supply of available semaphores.  Quite apart
  from that, it's desirable to contain the quantity of semaphores
  needed to run the server simply on convenience grounds, since using
  too many may make it harder to get PostgreSQL running on a new
  platform, which is mostly the point of --disable-spinlocks in the
  first place.
 
  I'm looking at the way you did this in the context of the atomics
  patch. Won't:
  s_init_lock_sema(volatile slock_t *lock)
  {
  static int  counter = 0;
 
  *lock = (++counter) % NUM_SPINLOCK_SEMAPHORES;
  }
 
  lead to bad results if spinlocks are initialized after startup?
 
 Why?

Because every further process will start with a copy of the postmaster's
counter or with 0 (EXEC_BACKEND)?

  Essentially mapping new spinlocks to the same semaphore?
 
 Yeah, but so what?  If we're mapping a bajillion spinlocks to the same
 semaphore already, what's a few more?

Well, imagine something like parallel query creating new segments,
including a spinlock (possibly via a lwlock) at runtime. If there were
several backends processing such queries, they'd all map to the same
semaphore.

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
Joe Conway m...@joeconway.com writes:
 On 06/18/2014 12:09 PM, Tom Lane wrote:
 I find myself a bit suspicious of this whole thing though.  If
 it's necessary to explicitly clean up the tmpcontext, why not also
 the sinfo->cstrs allocation?  And what about the tupdesc,
 attinmeta, etc created further up in that if (first) block?  I'd
 have expected that all this stuff gets allocated in a context
 that's short-lived enough that we don't really need to clean it up
 explicitly.

 Yeah, I thought about that too. My testing showed a small amount of
 remaining leakage -- like 20 MB on 100,000 iterations -- I wasn't sure
 that it was worthwhile to worry about. The memory context leak was
 much larger.

[ Thinks for awhile... ]  Ah.  The memory context is a child of
the econtext's ecxt_per_tuple_memory, which is supposed to be
short-lived, but we only do MemoryContextReset() on that after
each tuple, not MemoryContextResetAndDeleteChildren().  I recall
there's been debate about changing that, but it's not something
we can risk changing in back branches, for sure.  So I concur
we need an explicit context delete here.
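
In a compressed sketch, using only the stock memory-context API, the
lifecycle looks like this:

MemoryContext per_tuple = econtext->ecxt_per_tuple_memory;
MemoryContext tmp = AllocSetContextCreate(per_tuple,
										  "dblink temporary context",
										  ALLOCSET_DEFAULT_MINSIZE,
										  ALLOCSET_DEFAULT_INITSIZE,
										  ALLOCSET_DEFAULT_MAXSIZE);

MemoryContextReset(per_tuple);	/* frees per_tuple's own allocations, but
								 * in the back branches the child "tmp"
								 * survives the reset */
MemoryContextDelete(tmp);		/* hence the explicit delete is needed */

(MemoryContextResetAndDeleteChildren() would remove the child as well,
but that is not what runs after each tuple.)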

I do see growth in the per-query context as well.  I'm not sure
where that's coming from, but we probably should try to find out.
A couple hundred bytes per iteration is going to add up, even if it's
not as fast as 8K per iteration.  I'm not sure it's dblink's fault,
because I don't see anything in dblink.c that is allocating anything in
the per-query context, except for the returned tuplestores, which
somebody else should clean up.

regards, tom lane




Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Fabrízio de Royes Mello
On Wed, Jun 18, 2014 at 4:53 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Stephen Frost wrote:
  * Tom Lane (t...@sss.pgh.pa.us) wrote:
   Stephen Frost sfr...@snowman.net writes:
Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
ERROR that lists out all of the non-temporary objects which are
found
(and lists any other databases which have objects in those
tablespaces..).  That would allow administrators who have existing
notionally temporary-only tablespaces to go clean things up to make
them
actually temporary-only.

   I would certainly suggest that the first version of the patch not
   undertake to allow this property to be ALTERed; the cost-benefit
   ratio isn't there IMO.
 
  I suppose scheduling downtime to do the check manually across all
  databases, then drop and recreate the tablespace, would work.  As
  someone who's working with a couple of these cases, it'd be awful nice
  if there was a way PG would handle it for me.

 I wonder if some form of NOT VALID marking could be useful here.  Of
 course, this is not a constraint.  But a mechanism of a similar spirit
 seems apropos.  It seems reasonable to leave such a thing for future
 improvement.


+1

Then, to summarize, Matheus must:
* use an option, instead of changing the syntax and catalog, to indicate that a
tablespace is used to store temp objects
* throw an exception if we try to ALTER the option only_temp_relations
* add regression tests
* add documentation

And, of course, register it in the next open commitfest [1] to get detailed
feedback and review.

Regards,

[1] https://commitfest.postgresql.org/

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] [COMMITTERS] pgsql: Reduce the number of semaphores used under --disable-spinlocks.

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 3:56 PM, Andres Freund and...@2ndquadrant.com wrote:
  I'm looking at the way you did this in the context of the atomics
  patch. Won't:
  s_init_lock_sema(volatile slock_t *lock)
  {
  static int  counter = 0;
 
  *lock = (++counter) % NUM_SPINLOCK_SEMAPHORES;
  }
 
  lead to bad results if spinlocks are initialized after startup?

 Why?

 Because every further process will start with a copy of the postmaster's
 counter or with 0 (EXEC_BACKEND)?

Oh, true.  Maybe we should randomize that.

  Essentially mapping new spinlocks to the same semaphore?

 Yeah, but so what?  If we're mapping a bajillion spinlocks to the same
 semaphore already, what's a few more?

 Well, imagine something like parallel query creating new segments,
 including a spinlock (possibly via a lwlock) at runtime. If there were
 several backends processing such queries, they'd all map to the same
 semaphore.

Yeah.
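
Something like the following, perhaps (a hedged sketch of the
randomization idea, assuming a process-local PRNG is acceptable here;
illustrative only, not a committed fix):

    void
    s_init_lock_sema(volatile slock_t *lock)
    {
        /*
         * Pick the semaphore slot at random instead of from a static
         * counter, so that processes started after postmaster startup
         * don't all map new spinlocks onto the same semaphore.
         */
        *lock = (slock_t) (random() % NUM_SPINLOCK_SEMAPHORES);
    }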

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 3:53 PM, Josh Berkus j...@agliodbs.com wrote:
 On 06/18/2014 12:32 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
  There are plenty of badly-written applications which auto-begin, that
 is, they issue a BEGIN; immediately after every COMMIT; whether or
 not there's any additional work to do.  This is a major source of IIT
 and the timeout should not ignore it.

 Nonsense.  We explicitly don't do anything useful until the first actual
 command arrives, precisely to avoid that problem.

 Oh, we don't allocate a snapshot?  If not, then no objection here.

The only problem I see is that it makes the semantics kind of weird
and confusing.  "Kill connections that are idle in transaction for too
long" is a pretty clear spec; "kill connections that are idle in
transaction except if they haven't executed any commands yet, because
we think you don't care about that case" is not quite as clear, not
really what the GUC name says, maybe not what everybody wants, and
maybe masterminding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 12:13 PM, Andres Freund and...@2ndquadrant.com wrote:
 There might be cases where that's true, but in the majority of cases
 where the variable isn't highly contended it's just about the same. In
 many cases the microcode will implement atomic ops using ll/sc
 operations internally anyway.

But if the variable isn't highly contended, then why are we messing
around with atomic ops in the first place?

 I'm fine with always implementing everything but tas, cas, and add on
 top of cas for now. I want or/and/sub/... to be conveniently available
 to C code, but they don't need to be based on hardware ops. Having
 open-coded versions of the above in several places sounds like a
 horrible idea to me, both because we might want to optimize it at some
 point and because it's horrible readability-wise.

OK, so if we're only going to have TAS, CAS, and fetch-and-add as
primitives (which sounds sensible to me) and implement the rest in
terms of those for now, then the table on the wiki only needs one more
column: information about support for fetch-and-add.
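
To make "implement the rest in terms of those" concrete, something like
this CAS loop would do for fetch-and-or (a sketch only; the GCC __sync
builtin just keeps the example self-contained, and a real version would
live behind the atomics abstraction):

    static inline uint32
    atomic_fetch_or_u32(volatile uint32 *ptr, uint32 or_)
    {
        uint32      old;

        /* retry until nobody else changed *ptr between read and swap */
        do
        {
            old = *ptr;
        } while (!__sync_bool_compare_and_swap(ptr, old, old | or_));

        return old;
    }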

 More than that, I actually really hate
 things that don't have a configure option, like WAL_DEBUG, because you
 have to change a checked-in file, which shows up as a diff, and if
 you're not careful you check it in, and if you are careful it still
 gets blown away every time you git reset --hard, which I do a lot.  I
 think the fact that both Heikki and I on separate occasions have made
 commits enabling WAL_DEBUG shows pretty clearly the weaknesses of that
 method of doing business.

 Why don't you pass it to configure or add it in Makefile.custom? That's
 what I do.

Yeah, I should probably do that instead.  But I still think the point
that two different committers have managed to flip that flag by
accident is telling.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-18 Thread Andres Freund
On 2014-06-18 17:04:49 -0400, Robert Haas wrote:
 On Wed, Jun 18, 2014 at 12:13 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  There might be cases where that's true, but in the majority of cases
  where the variable isn't highly contended it's just about the same. In
  many cases the microcode will implement atomic ops using ll/sc
  operations internally anyway.
 
 But if the variable isn't highly contended, then why are we messing
 around with atomic ops in the first place?

a) Quite often the strange atomic op is only needed to allow a more
   performance-critical code path to use atomic ops (e.g. setting flags
   atomically is only required to make atomic pins work).
b) A spinlock-protected region is likely to take longer than a single
   atomic op because it'll often touch more cachelines and such.  For
   cmpxchg loops there are pretty much no intermediate instructions in
   between during which the cacheline could be stolen by another core.
c) The overall amount of bus traffic is often noticeably lower with an
   atomic op because the average total number of locked instructions is
   lower (unlocking often requires another atomic op or barrier).
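
To make (b) and (c) concrete, compare the two shapes (a hedged sketch;
SharedCounter is a made-up struct, and the __sync builtin merely stands
in for a native atomic add):

    #include "postgres.h"
    #include "storage/spin.h"

    typedef struct SharedCounter
    {
        slock_t     mutex;      /* protects counter in the spinlock case */
        uint32      counter;
    } SharedCounter;

    static void
    bump_with_spinlock(SharedCounter *shared)
    {
        SpinLockAcquire(&shared->mutex);   /* locked op, extra cacheline */
        shared->counter++;                 /* window where lines can bounce */
        SpinLockRelease(&shared->mutex);   /* another barrier/store */
    }

    static void
    bump_with_atomic(SharedCounter *shared)
    {
        /* one locked instruction on one cacheline, no window in between */
        __sync_fetch_and_add(&shared->counter, 1);
    }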


  Why don't you pass it to configure or add it in Makefile.custom? That's
  what I do.
 
 Yeah, I should probably do that instead.  But I still think the point
 that two different committers have managed to flip that flag by
 accident is telling.

Not arguing against having the configure flag here (already decided),
just wondering whether your life could be made easier :)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




  1   2   >