Re: [HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?

2010-12-12 Thread Fujii Masao
On Mon, Dec 13, 2010 at 4:28 PM, Heikki Linnakangas wrote:
> On 13.12.2010 08:44, Fujii Masao wrote:
>>
>> pg_archivecleanup removes unnecessary WAL files from the archive, but not
>> from the pg_xlog directory. So, after failover, those WAL files might
>> still exist in pg_xlog and be archived again later.
>
> A file that has already been archived successfully should not be archived
> again. The server keeps track of which files it has already archived with
> the .ready/.done files.

This seems to require

* the archiver to save the last archived WAL file name in shmem
* walsender to send it to walreceiver
* walreceiver to create a .done file when that name arrives
* bgwriter not to remove WAL files that don't have a .done file on the standby

Right? One good side effect of this is that we can prevent WAL files from
being removed from the standby before the master archives them.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] hstores in pl/python

2010-12-12 Thread Jan Urbański
It would be cool to be able to transparently use hstores as Python
dictionaries and vice versa. It would be easy enough with hstore as a
core type, but with hstore as an addon it's not that easy.

There was talk about including hstore in core; is there still a chance of
that happening in 9.1? I'd like to include hstore<->dict handling, but
with hstore out of core the only half-sane way I see is:
 * hack PL/Python's makefile to add -Icontrib/hstore (yuck!)
 * create an extension module for Python that knows how to handle
hstores that would live next to plpython.so
 * install it in $libdir on make install
 * when PL/Python receives or is asked to create an hstore, load the
extension module and use it to parse the value (ugly, probably slow)
 * the module would also have to make sure hstore.so is loaded in the
database, which in itself is not pretty, as it would refer to
hstore_in/out symbols

I wrote a module that can be used with current PL/Python to simplify
hstore handling (https://github.com/wulczer/pyhstore), but it suffers
from most of the aforementioned problems, and on top of that you get
hstore->text->dict instead of just hstore->dict, which sucks.
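
For illustration, here is what transparent handling could look like
from the user's side (a hypothetical sketch of the desired behaviour,
not of the pyhstore API; the function name is invented):

    CREATE FUNCTION add_default_colour(h hstore) RETURNS hstore AS $$
        # with transparent conversion, h would arrive as a Python dict
        h.setdefault('colour', 'red')
        return h  # and a returned dict would become an hstore again
    $$ LANGUAGE plpythonu;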

Cheers,
Jan



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Simon Riggs
On Sun, 2010-12-12 at 19:07 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > ...  On the
> > other hand, there's clearly also a use case for this behavior.  If a
> > bulk load of prevalidated data forces an expensive revalidation of
> > constraints that are already known to hold, there's a real chance the
> > DBA will be backed into a corner where he simply has no choice but to
> > not use foreign keys, even though he might really want to validate the
> > foreign-key relationships on a going-forward basis.
> 
> There may well be a case to be made for doing this on grounds of
> practical usefulness.  I'm just voicing extreme skepticism that it can
> be supported by reference to the standard.
> 
> Personally I'd prefer to see us look into whether we couldn't arrange
> for low-impact establishment of a verified FK relationship, analogous to
> CREATE INDEX CONCURRENTLY.  We don't let people just arbitrarily claim
> that a uniqueness condition exists, and ISTM that if we can handle that
> case we probably ought to be able to handle FK checking similarly.

I think we should do *both* things. Sometimes you already know the check
will pass, sometimes you don't - in particular, when reloading data from
another source where you knew the checks passed. Enforcing re-checking
in that case reduces data availability.
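
As a sketch of how the bulk-load case might look with the syntax from
the subject line (illustrative only; table and column names invented):

    -- data already validated on the source system
    ALTER TABLE orders
      ADD FOREIGN KEY (customer_id) REFERENCES customers (id)
      NOT ENFORCED;  -- skip the initial validation scan of existing rows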

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread tv
> On Sun, Dec 12, 2010 at 9:16 PM, Tomas Vondra  wrote:
>> Dne 13.12.2010 03:00, Robert Haas napsal(a):
>>> Well, the question is what data you are actually storing.  It's
>>> appealing to store a measure of the extent to which a constraint on
>>> column X constrains column Y, because you'd only need to store
>>> O(ncolumns^2) values, which would be reasonably compact and would
>>> potentially handle the zip code problem - a classic "hard case" rather
>>> neatly.  But that wouldn't be sufficient to use the above equation,
>>> because there A and B need to be things like "column X has value x",
>>> and it's not going to be practical to store a complete set of MCVs for
>>> column X for each possible value that could appear in column Y.
>>
>> O(ncolumns^2) values? You mean collecting such stats for each possible
>> pair of columns? Well, I meant something different.
>>
>> The proposed solution is based on contingency tables, built for selected
>> groups of columns (not for each possible group). And the contingency
>> table gives you the ability to estimate the probabilities needed to
>> compute the selectivity. Or am I missing something?
>
> Well, I'm not real familiar with contingency tables, but it seems like
> you could end up needing to store a huge amount of data to get any
> benefit out of it, in some cases.  For example, in the United States,
> there are over 40,000 postal codes, and some even larger number of
> city names, and doesn't the number of entries go as O(m*n)?  Now maybe
> this is useful enough anyway that we should Just Do It, but it'd be a
> lot cooler if we could find a way to give the planner a meaningful
> clue out of some more compact representation.

Yes, storing a complete contingency table is not feasible in such cases.
My original proposal actually did not address this particular issue
(cities and ZIP codes), as it was based on simplified contingency tables
(with bins corresponding to current histograms, not to individual values),
so the number of values to store would grow much more slowly.

On the other hand, this generalization really makes it unusable in some
cases, and the issue we're discussing here (cities and ZIP codes) is one
of them. I think in such cases we could build a contingency table for the
MCVs and then use it to estimate the conditional probabilities we need,
but I expect that to be very tricky.
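
As a rough illustration of the data involved (hypothetical table and
column names), a full contingency table on two columns is essentially
what

    SELECT city, zip_code, count(*)
      FROM addresses
     GROUP BY city, zip_code;

produces, while the MCV-based variant would keep only the rows for the
most common (city, zip_code) combinations.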

Thanks for the comments.

Tomas




Re: [HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?

2010-12-12 Thread Heikki Linnakangas

On 13.12.2010 08:44, Fujii Masao wrote:

pg_archivecleanup removes unnecessary WAL files from the archive, but not
from the pg_xlog directory. So, after failover, those WAL files might
still exist in pg_xlog and be archived again later.


A file that has already been archived successfully should not be 
archived again. The server keeps track of which files it has already 
archived with the .ready/.done files.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] proposal: auxiliary functions for record type

2010-12-12 Thread Pavel Stehule
2010/12/12 Florian Pflug :
> On Dec12, 2010, at 00:19 , Pavel Stehule wrote:
>> I prefer a table based
>> solution, because I don't need a one "unnest", but other preferences
>> are valid too.
> That's fine with me.
>
>> I'm dissatisfied with your design of an explicit target type
>> via an unused value.  I think we don't have the infrastructure for it now
>> - from my view it is better to use a common type, which is text now. It's
>> nothing new - plpgsql uses it too.
> Sorry, I can't follow you here. Where does plpgsql use text as "common" type?
>
>> I see one well design of explicit target type based on polymorphic
>> types that respect a PostgreSQL fmgr practice:
>>
>> We have to allow polymorphic functions without polymorphic
>> parameters. These functions should be designed to return the value in
>> "unknown" type format when the function has no outer information.
> I don't think "unknown" is the right type for that. As far as I know,
> "unknown" is still a textual type, used to have some type to assign to string
> literals during parsing when no better type can be inferred.
>
>> This information can be passed in the function context. When the function
>> context isn't null, the function has to read the target type and
>> return the value in that type. Who can fill the function context? That is
>> a task for the executor. And when a CAST contains just a function call, we
>> can check whether the function is polymorphic, and if it is, set the
>> function context to the target type; then we don't need to call a
>> conversion function, because the polymorphic function must return data in
>> the correct format.
> The main difficulty is that currently types are assigned in a bottom-up 
> fashion as far as I know. To make functions with a polymorphic return value, 
> but without polymorphic arguments work, you need to assign the return type in 
> a top-down fashion (It depends on where to value *goes*, not where it *comes 
> from*). That seems like a rather huge change and has the potential to 
> complicate quite a few other parts, most notably function lookup/resolution.
>
> Plus, the general case where type information must bubble up more than one 
> level seems pretty much intractable, as it'd require a full-blown type 
> inference algorithm like ML or Haskell. Not a place where we want to go, I 
> believe.
>
> The restricted case, on the other hand, brings very little benefit compared 
> to the dummy-parameter approach. Yeah, "()::type" may 
> look a bit cleaner than "(NULL::type)", but that's about
> it. It's only assignments in pl/pgsql which really benefit, since you'd be
> able to leave out the type completely, writing simply "v_value := 
> ()". Does that really warrant the effort that'd be 
> involved?

There is a second, possibly simpler, approach. We can use a
specialised statement with its own parser/executor node. Then the
implementation should be really simple.

syntax:

EXTRACT_VALUE(expr1 FROM expr2 AS typename) ... RETURNS typename

expr1 ... result must be convertible to text .. the field name
expr2 ... result must be a composite type

disadvantage - EXTRACT_VALUE must be a keyword
advantage - simple implementation, available for all environments, readable

var := EXTRACT_VALUE('f1' FROM myrec AS int);

note: the name for this statement isn't important now; it could be EXTRACT_FIELD, ...

comments, ideas?

Regards

Pavel Stehule



Re: [HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?

2010-12-12 Thread Fujii Masao
On Mon, Dec 13, 2010 at 3:44 PM, Fujii Masao  wrote:
> pg_archivecleanup removes unnecessary WAL files from the archive, but not
> from the pg_xlog directory. So, after failover, those WAL files might
> still exist in pg_xlog and be archived again later. Re-archiving of
> unnecessary WAL files seems odd to me. To avoid this problem, how about
> changing pg_archivecleanup so that it also removes WAL files in pg_xlog,
> or creates .done files in archive_status when removing them from the
> archive?

Well, we can avoid this problem by specifying the pg_xlog directory instead
of the archive in recovery_end_command:

recovery_end_command = 'pg_archivecleanup pg_xlog %r'

Though this sounds like something of a hack..

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?

2010-12-12 Thread Fujii Masao
Hi,

pg_archivecleanup removes unnecessary WAL files from the archive, but not
from the pg_xlog directory. So, after failover, those WAL files might
still exist in pg_xlog and be archived again later. Re-archiving of
unnecessary WAL files seems odd to me. To avoid this problem, how about
changing pg_archivecleanup so that it also removes WAL files in pg_xlog,
or creates .done files in archive_status when removing them from the
archive?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] SQL/MED - file_fdw

2010-12-12 Thread Itagaki Takahiro
On Sat, Dec 11, 2010 at 05:30, Andrew Dunstan  wrote:
> On 12/04/2010 11:11 PM, Itagaki Takahiro wrote:
>> One exports the copy functions from the core, and another
>> implements file_fdw using the infrastructure.
>
> Who is actually going to do this split?

I'm working on it :-)  I extracted these functions from copy.c:

- CopyState BeginCopyFrom(Relation rel, const char *filename,
                          List *attnamelist, List *options);
- void EndCopyFrom(CopyState cstate);
- bool NextCopyFrom(CopyState cstate,
                    Datum *values, bool *nulls, Oid *oid);

There was a Reset() in file_fdw, but it is not contained in the
patch. It will be added again if required, but I wonder whether we might
need not only reset but also mark/restore of a position in the file.
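
For context, the SQL-level usage this is aiming at would look roughly
like the following (names invented, and the SQL/MED syntax was still
being settled at the time):

    CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;
    CREATE FOREIGN TABLE words (word text)
        SERVER file_server
        OPTIONS (filename '/tmp/words.csv', format 'csv');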

-- 
Itagaki Takahiro


copy_export-20101213.diff
Description: Binary data



Re: [HACKERS] unlogged tables

2010-12-12 Thread Rob Wultsch
On Sun, Dec 12, 2010 at 7:33 PM, Robert Haas  wrote:
> On Sun, Dec 12, 2010 at 9:31 PM, Rob Wultsch  wrote:
>> On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain wrote:
>>> 2010/12/8 Kineticode Billing :
 On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:

> Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.

 EVANESCENT.
>>>
>>> UNSAFE ?
>>>
>> 
>> MyISAM
>> 
>
> Heh.  But that would be corrupt-on-crash, not truncate-on-crash, no?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Yep. The truncate-on-shutdown MySQL options are MEMORY and PBXT (using
the memory-resident option).


I like TRANSIENT but wonder if MEMORY might be more easily understood by users.
-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Rob Wultsch
On Sun, Dec 12, 2010 at 7:24 PM, Andrew Dunstan  wrote:
> In fact it's possible now to disable FK enforcement, by disabling the
> triggers. It's definitely a footgun though. Just the other day I was asked
> how data violating the constraint could have got into the table, and caused
> some surprise by demonstrating how easy this was to produce.

Ugh. I have read the entire pg manual and I did not recall that
footgun.  At least in MySQL disabling fk's is explicit. There is
something to be said for being able to tell the database: "Hey, hold
my beer and watch this, it might be stupid but it is what we are going
to do". The database telling its user that is a much larger issue
(and yes, MySQL is generally worse). The user at least gets to talk to
the db through SQL; the database only really gets to talk to the user
through errors and the manual.

The fact that fk checks are implemented by the trigger system somehow
seems "surprising".

-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags

2010-12-12 Thread Glen Knowles
On Sun, Dec 12, 2010 at 8:16 AM, Tom Lane  wrote:

>> What are the values of _S_IREAD and _S_IWRITE, anyway?  I'm still
>> wondering how come the previous coding with hardwired constants
>> behaved correctly.
>
> Still curious about this.

FWIW, _S_IREAD and _S_IWRITE are defined by Visual Studio C++ 2008 in
sys/stat.h as 0x0100 and 0x0080 respectively.
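
(For reference: 0x0100 is octal 0400 and 0x0080 is octal 0200, i.e. the
same values as the Unix S_IRUSR and S_IWUSR permission bits, which would
explain why the previous hardwired octal constants behaved correctly.)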

Glen


[HACKERS] pg_is_in_recovery=1

2010-12-12 Thread aaliya zarrin
Hi All,

When does pg_is_in_recovery change to zero (a status change)?

At the time when recovery stops?
If a switchover has to be done, then after receiving the signal and telling
postgres to run the startup process (got_SIGHUP = true),
do we have to stop replication, or wait for the recovery to stop?
Can I change pg_is_in_recovery = 0 myself?
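
For reference, pg_is_in_recovery() is a function rather than a table
column; a minimal check looks like this:

    SELECT pg_is_in_recovery();  -- true while in recovery, false once promoted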

-- 
Thanks & Regards,

Aaliya Zarrin
(+91)-9160665888


Re: [HACKERS] SQL/MED - core functionality

2010-12-12 Thread Shigeru HANADA
On Sun, 12 Dec 2010 23:47:53 +0200
Peter Eisentraut  wrote:
> On ons, 2010-12-01 at 12:30 +0900, Hitoshi Harada wrote:
> > From a user's view, this is very long way to see a simplest foreign
> > table. I know it is based on the standard, but I really want a
> > shortcut. Especially, I don't understand why CREATE USER MAPPING FOR
> > current_user SERVER  is needed for default use case. If you
> > forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an
> > error. User mapping is useful if the local user and remote user should
> > be mapped but I imagine in most cases they are the same.
> > postgresql_fdw can tell the remote user by conninfo string, in
> > addition.
> 
> I reviewed the standard about this, and a lot of things are
> implementation-defined.  I think user mappings could be made optional.

Simple FDWs such as File FDW might not have a concept of "user" on
the remote side.  In such cases, it would be enough to control access
privileges per local user with GRANT/REVOKE SELECT statements.

> > This is another topic, but it would be useful if CREATE FOREIGN TABLE
> > can omit column definitions since fdw usually knows what should be
> > there in the definitions. I some times mistyped the column names
> > between remote and local and resulted in fail on execution.
> 
> Also, according to the standard, the column list in CREATE FOREIGN TABLE
> is optional (if you can get it in some automatic way, of course).

To allow omitting column definitions for that purpose, a way to create
zero-column tables would have to be provided.  New syntax which allows
FDWs to determine the column definitions would be necessary.

ex)
-- Create foo from the remote table foo on the server bar
CREATE FOREIGN TABLE foo SERVER bar;
-- Create zero-column table foo
CREATE FOREIGN TABLE foo () SERVER bar;

To support this feature, another hook function needs to be added to the
FDW API.  ISTM that this feature should be considered together with the
IMPORT SCHEMA statement.

Regards,
--
Shigeru Hanada





[HACKERS] rest of works for security providers in v9.1

2010-12-12 Thread KaiGai Kohei
I'd like to hear opinions on what facilities should be developed
in the current v9.1 development cycle.

We have integrated some of the facilities to support a starter
version of SE-PostgreSQL. They allow hooking controls onto DML
permission checks and assigning security labels to clients and
database objects, either by hand or automatically.

On the other hand, hooks for DDL permission checks are still
future work. I believe object_access_hook can be applied to
various kinds of DDL permission checks, but we cannot put all
these hooks in place at once, because of the patch scale.

So, I plan to integrate only the following four facilities in
the last commit-fest of v9.1, although this still does not cover
DDL access comprehensively.


* Expand object_access_hook to deliver arguments

Some DDL hooks will need to deliver several arguments in
addition to the OID of the object being modified - for example,
a flag showing whether a deletion is cascaded or not.
So, the prototype of object_access_hook needs to be revised.

My idea is to add two arguments: an integer for the number of
arguments and an array for the additional information. Macros
will then wrap up invocations of this hook to keep the code
simple.


* Permission checks on object-prep-creation

The previous discussion did not settle whether two hooks are
needed here, or one.
I think the idea of dividing the creation hooks into two phases
by role will eventually reduce the burden of code management.

We now have OAT_POST_CREATE hooks basically just after the
registration of dependencies. That is a simple enough basis, and
quite a natural place to assign new security labels.
However, several cases would become exceptions to that basis if
we tried to check permissions in the post-creation hooks as well,
in addition to doing default labeling.

For example, heap_create_with_catalog() is called from five
places, but only two need permission checks: DefineRelation()
and OpenIntoRel().
In a few cases it is not obvious whether we need permission checks
for a given invocation, such as the code path from make_new_heap().
It defines a new pg_class entry, but an external module cannot
determine from the catalog whether it is invoked on a code
path that needs permission checks or not.

So, I want the OAT_CREATE hooks placed just after the existing
permission checks, for the purpose of access control rather than
default labeling.


* Permission checks on object-deletion

The existing code puts permission checks for object deletion in
command handlers like RemoveRelations(), which then invoke
functions in dependency.c to drop the specified and dependent
objects (if necessary).
I think it is straightforward to put object-deletion hooks
next to the existing permission checks, but at that point it is
not possible to check the cascaded objects that will also be removed.
So it seems to me that findDependentObjects() should be exposed to
external modules, to inform them what objects will be dropped.
Unlike the old SE-PostgreSQL implementation, I don't consider it
a good idea to put the hook within dependency.c, because we would
need to inform dependency.c whether a deletion is by hand or
something internal (such as cleanup of temporary objects).


* Permission checks on misc easily implementable commands

Apart from development priority, it seems to me that we can
hook controls into the following commands quite easily. The idea
is to put in hooks that we can implement with little impact on
the existing code (see the sketch after this list):
 - GRANT/REVOKE
 - COMMENT ON
 - SECURITY LABEL
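
For instance, the commands in question look like this (an illustrative
sketch; the object names and the label string are invented):

    GRANT SELECT ON customer TO web_user;
    COMMENT ON TABLE customer IS 'master customer list';
    SECURITY LABEL ON TABLE customer
        IS 'system_u:object_r:sepgsql_table_t:s0';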

Thanks,
-- 
KaiGai Kohei 



Re: [HACKERS] Instrument checkpoint sync calls

2010-12-12 Thread Robert Haas
On Sun, Dec 5, 2010 at 4:23 PM, Greg Smith  wrote:
> Jeff Janes wrote:
>>
>> I've attached a tiny patch to apply over yours, to deal with this and
>> with the case where no files are synced.
>>
>
> Thanks for that.  That obvious error eluded me because in most of the patch
> update testing I was doing (on ext3), the longest sync was always about the
> same length as the total sync time.
>
> Attached patch (in correct diff form this time!) collects up all changes.
>  That includes elimination of a potential race condition if someone changes
> log_checkpoints while a long sync phase is executing.  I don't know whether
> that can happen, and it obviously won't give accurate stats going back to
> the beginning of the checkpoint in that case, but it tries to defend against
> producing complete garbage if that value changes out from under it.
>
> This is the first version of this patch I feel fairly good about; no open
> concerns left on my side.  Jeff, since you're now the de-facto credited
> reviewer of this one by virtue of suggesting bug fixes, could you take this
> update out for a spin too?

I took a look at this and it looks generally good, but I'm wondering
why md.c is converting the results from an exact value to a floating
point, only to have xlog.c turn around and convert back to an integer.
 I think it could just return milliseconds directly, or if you're
worried about a checkpoint that takes more than 24 days to complete,
seconds and microseconds.  Or am I missing something?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Nathan Boley
>> The proposed solution is based on contingency tables, built for selected
>> groups of columns (not for each possible group). And the contingency
>> table gives you the ability to estimate the probabilities needed to
>> compute the selectivity. Or am I missing something?
>
> Well, I'm not real familiar with contingency tables, but it seems like
> you could end up needing to store a huge amount of data to get any
> benefit out of it, in some cases.  For example, in the United States,
> there are over 40,000 postal codes, and some even larger number of
> city names, and doesn't the number of entries go as O(m*n)?

Not to mention that the model parameters will be impossible to estimate well.

(I've only scanned the thread, so sorry if I'm repeating something
that's already been said.)

My intuition is that storing the covariance structure will be
unworkable both technically and statistically for all but the simplest
of cases. That being said, I don't think the problem is a lack of ways
to parameterize the covariance structure (there are several papers on
multi-dim histogram estimators, at least one of which talks about
databases explicitly, not to mention approaches like CART [1]), but a
complete lack of infrastructure to do anything with the estimates. So
keep working on it ;-) (but try to make the framework general enough
to allow better estimators).

I wonder if a good first step might be to focus on the AND and OR
operators, since they seem like the most common cases and union and
intersection commute (although it's important to remember that the
estimate variances do NOT). That is, what about estimating the
selectivity of the condition

WHERE X=A and Y=B

by

f(A,B) = x_1*(selectivity(X = A) + selectivity( Y = B )) +
x_2*selectivity(X = A)*selectivity( Y = B ) + x_3

where x_{1,2,3} are parameters to be estimated from the data.

Another quick note: I think that storing the full contingency table is
wasteful, since the marginals are already stored in the single-column
statistics. Look at copulas [2] (FWIW I think that Josh Tolley was
looking at this a couple of years back).

Best,
Nathan

[1] http://en.wikipedia.org/wiki/Classification_and_regression_tree
[2] http://en.wikipedia.org/wiki/Copula_%28statistics%29



Re: [HACKERS] PS display and standby query conflict

2010-12-12 Thread Fujii Masao
On Sat, Dec 11, 2010 at 11:28 PM, Simon Riggs  wrote:
>> This problem happens because ResolveRecoveryConflictWithVirtualXIDs
>> resets PS display for each read-only transactions that recovery
>> waits for. Why do we need to reset that each time even though
>> the conflict has not been resolved yet? The attached patch
>> suppresses such a needless reset. Comments?
>
> The reset occurs at most each 500ms, so not much problem there.
>
> But if it annoys you, it seems OK to change it. Don't see a reason to 
> backpatch though?

I think it's worth backpatching, to prevent users who carefully observe
the occurrence of query conflicts while testing 9.0 from getting
confused.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Kevin Grittner
Peter Geoghegan  wrote:
 
> If we followed this behaviour, we wouldn't "let people just
> arbitrarily claim" that a referential condition exists - rather,
> we'd let them assert that it /ought/ to exist, and that it will be
> maintained going forward, and give them the option of verifying
> that assertion at a later time, after which it actually exists.
 
What you outline would be quite valuable to our shop.  Under the
law, the "custodians of the data" are the elected clerks of circuit
court, and under state law and rules of the state supreme court we
can't "clean up" even the most glaring apparent data problems
without the OK of the elected official or his or her designee.  We
have a very complex schema (although no more complex than necessary
to model the reality of the data) with hundreds of foreign key
relationships.
 
For various reasons (conversions from old systems, etc.), these
relationships don't hold on all tables in all county databases.  It
would be desirable to have foreign key definitions define the
intended relationships anyway, and very useful for them to prevent
further data degradation.  For those situations where we get a
business analyst to work with clerk of court staff to clean up
orphaned rows, it would be very slick to be able to run some
statement (like CHECK DATA) to see if the cleanup is complete and
successful and to flag that the constraint is now enforced.
 
So +1 on what Peter outlined as current DB2 features in this regard.
 
-Kevin






Re: [HACKERS] BufFreelistLock

2010-12-12 Thread Jim Nasby
On Dec 10, 2010, at 10:49 AM, Tom Lane wrote:
> Alvaro Herrera  writes:
>> Excerpts from Jeff Janes's message of vie dic 10 12:24:34 -0300 2010:
>>> As far as I can tell, bgwriter never adds things to the freelist.
>>> That is only done at start up, and when a relation or a database is
>>> dropped.  The clock sweep does the vast majority of the work.
> 
>> AFAIU bgwriter runs the clock sweep most of the time (BgBufferSync).
> 
> I think bgwriter just tries to write out dirty buffers so they'll be
> clean when the clock sweep reaches them.  It doesn't try to move them to
> the freelist.

Yeah, it calls SyncOneBuffer which does nothing for the clock sweep.

> There might be some advantage in having it move buffers
> to a freelist that's just protected by a simple spinlock (or at least,
> a lock different from the one that protects the clock sweep).  The
> idea would be that most of the time, backends just need to lock the
> freelist for long enough to take a buffer off it, and don't run clock
> sweep at all.

Yeah, the clock sweep code is very intensive compared to pulling a buffer from 
the freelist, yet AFAICT nothing will run the clock sweep except backends. 
Unless I'm missing something, the free list is practically useless because 
buffers are only put there by InvalidateBuffer, which is only called by 
DropRelFileNodeBuffers and DropDatabaseBuffers. So we make backends queue up 
behind the freelist lock with very low odds of getting a buffer, then we
make them queue up for the clock sweep lock and make them actually run the 
clock sweep.

BTW, when we moved from 96G to 192G servers I tried increasing shared buffers 
from 8G to 28G and performance went down enough to be noticeable (we don't have 
any good benchmarks, so I can't really quantify the degradation). Going back to
8G brought performance back up, so it seems like it was the change in shared 
buffers that caused the issue (the larger servers also have 24 cores vs 16). My 
immediate thought was that we needed more lock partitions, but I haven't had 
the chance to see if that helps. ISTM the issue could just as well be due to 
clock sweep suddenly taking over 3x longer than before.

We're working on getting a performance test environment setup, so hopefully in 
a month or two we'd be able to actually run some testing on this.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] libpq changes for synchronous replication

2010-12-12 Thread Fujii Masao
On Sat, Dec 11, 2010 at 11:37 PM, Robert Haas  wrote:
> Committed with just a few changes to the documentation.

Thanks a lot!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] unlogged tables

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 9:31 PM, Rob Wultsch  wrote:
> On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain wrote:
>> 2010/12/8 Kineticode Billing :
>>> On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
>>>
 Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
>>>
>>> EVANESCENT.
>>
>> UNSAFE ?
>>
> 
> MyISAM
> 

Heh.  But that would be corrupt-on-crash, not truncate-on-crash, no?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] unlogged tables

2010-12-12 Thread Rob Wultsch
On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain wrote:
> 2010/12/8 Kineticode Billing :
>> On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
>>
>>> Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
>>
>> EVANESCENT.
>
> UNSAFE ?
>

MyISAM




-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 9:16 PM, Tomas Vondra  wrote:
> Dne 13.12.2010 03:00, Robert Haas napsal(a):
>> Well, the question is what data you are actually storing.  It's
>> appealing to store a measure of the extent to which a constraint on
>> column X constrains column Y, because you'd only need to store
>> O(ncolumns^2) values, which would be reasonably compact and would
>> potentially handle the zip code problem - a classic "hard case" rather
>> neatly.  But that wouldn't be sufficient to use the above equation,
>> because there A and B need to be things like "column X has value x",
>> and it's not going to be practical to store a complete set of MCVs for
>> column X for each possible value that could appear in column Y.
>
> O(ncolumns^2) values? You mean collecting such stats for each possible
> pair of columns? Well, I meant something different.
>
> The proposed solution is based on contingency tables, built for selected
> groups of columns (not for each possible group). And the contingency
> table gives you the ability to estimate the probabilities needed to
> compute the selectivity. Or am I missing something?

Well, I'm not real familiar with contingency tables, but it seems like
you could end up needing to store a huge amount of data to get any
benefit out of it, in some cases.  For example, in the United States,
there are over 40,000 postal codes, and some even larger number of
city names, and doesn't the number of entries go as O(m*n)?  Now maybe
this is useful enough anyway that we should Just Do It, but it'd be a
lot cooler if we could find a way to give the planner a meaningful
clue out of some more compact representation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Peter Geoghegan
I wouldn't like to comment on whether or not Simon has correctly
interpreted the words of the SQL standards committee, because
standards committees sometimes word things in an intentionally
ambiguous way to placate different interests, and because it seems
fairly inconsequential in this case. IMHO this is a useful feature
that should be pursued.

There is another precedent that no one mentioned - DB2. From their docs:

"You can add a foreign key with the NOT ENFORCED option to create an
informational referential constraint. This action does not leave the
table space in CHECK-pending status, and you do not need to execute
CHECK DATA."

I understand that DB2's informational referential constraints won't
ever be enforced (they just show intent, which is useful to their
planner), so this isn't really the same thing. However, DB2 apparently
doesn't initially enforce referential integrity when an FK is created
on a table with existing data, without any special syntax on the
CREATE:

"DB2 does not validate the data when you add the foreign key. Instead,
if the table is populated [...] the table space that contains the table
is placed in CHECK-pending status, just as if it had been loaded with
ENFORCE NO. In this case, you need to execute the CHECK DATA utility
to clear the CHECK-pending status."

If I am not mistaken, this is almost exactly the behaviour described
by Simon, because referential integrity is, presumably, enforced after
the FK is created, but before the CHECK DATA utility is optionally run
to ensure that we actually have referential integrity at a later time.
I believe that Simon's proposal is essentially sound. I don't know why
CHECK DATA operates at the tablespace granularity rather than the FK
granularity - IANADB2U.

If we followed this behaviour, we wouldn't "let people just
arbitrarily claim" that a referential condition exists - rather, we'd
let them assert that it /ought/ to exist, and that it will be
maintained going forward, and give them the option of verifying that
assertion at a later time, after which it actually exists.
Unfortunately, this refinement of Simon's proposal would probably
entail adding an additional column to pg_constraint.

-- 
Regards,
Peter Geoghegan



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Andrew Dunstan



On 12/12/2010 08:27 PM, Rob Wultsch wrote:


MySQL does in fact have this feature and it is used by mysqldump. This
feature is very useful.



The trouble is that FKs have more than one use. In particular, they
have a documentary use that's relied on by tools that analyze databases,
as well as by tools like htsql. They also have a role as an enforced
constraint.


In fact it's possible now to disable FK enforcement, by disabling the 
triggers. It's definitely a footgun though. Just the other day I was 
asked how data violating the constraint could have got into the table, 
and caused some surprise by demonstrating how easy this was to produce.
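
For example, this is all it takes (a minimal sketch with an invented
table name; DISABLE TRIGGER ALL needs superuser precisely because it
also disables the system triggers that implement the FK):

    ALTER TABLE orders DISABLE TRIGGER ALL;
    -- rows violating the constraint can now be inserted without error
    ALTER TABLE orders ENABLE TRIGGER ALL;
    -- re-enabling does not re-check the rows that got in meanwhile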


So what would actually be an advance in my view would be a mechanism 
that allowed explicit disabling of a constraint but ensured that it was 
not violated when re-enabling it.


cheers

andrew



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
Dne 13.12.2010 03:00, Robert Haas napsal(a):
> Well, the question is what data you are actually storing.  It's
> appealing to store a measure of the extent to which a constraint on
> column X constrains column Y, because you'd only need to store
> O(ncolumns^2) values, which would be reasonably compact and would
> potentially handle the zip code problem - a classic "hard case" rather
> neatly.  But that wouldn't be sufficient to use the above equation,
> because there A and B need to be things like "column X has value x",
> and it's not going to be practical to store a complete set of MCVs for
> column X for each possible value that could appear in column Y.

O(ncolumns^2) values? You mean collecting such stats for each possible
pair of columns? Well, I meant something different.

The proposed solution is based on contingency tables, built for selected
groups of columns (not for each possible group). And the contingency
table gives you the ability to estimate the probabilities needed to
compute the selectivity. Or am I missing something?

regards
Tomas



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
> P(A|B) = P(A and B) / P(B).

Well, until this point we've discussed failure cases involving 'AND'
conditions. What about 'OR' conditions? I think the current optimizer
computes the selectivity as 's1+s2 - s1*s2' (at least that's what I
found in backend/optimizer/path/clausesel.c:630).

Sometimes that may return nearly 2x the actual selectivity, but in
general it's a reasonable estimate. Are there any severe failure cases
that produce much worse estimates?
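
For a concrete case (my own numbers, not from the thread): take two
perfectly correlated clauses, each with selectivity 0.5. The true
selectivity of the OR is 0.5, but the formula gives

   s1 + s2 - s1*s2 = 0.5 + 0.5 - 0.25 = 0.75

i.e. a 1.5x overestimate, and the factor approaches 2x as s1 = s2
approach 0.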

regards
Tomas



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 8:46 PM, Tomas Vondra  wrote:
> Dne 13.12.2010 01:05, Robert Haas napsal(a):
>> This is a good idea, but I guess the question is what you do next.  If
>> you know that the "applicability" is 100%, you can disregard the
>> restriction clause on the implied column.  And if it has no
>> implicatory power, then you just do what we do now.  But what if it
>> has some intermediate degree of implicability?
>
> Well, I think you've missed the e-mail from Florian Pflug - he actually
> pointed out that the 'implicativeness' Heikki mentioned is called
> conditional probability. And conditional probability can be used to
> express the "AND" probability we are looking for (the selectivity).
>
> For two columns, this is actually pretty straightforward - as Florian
> wrote, the equation is
>
>   P(A and B) = P(A|B) * P(B) = P(B|A) * P(A)

Well, the question is what data you are actually storing.  It's
appealing to store a measure of the extent to which a constraint on
column X constrains column Y, because you'd only need to store
O(ncolumns^2) values, which would be reasonably compact and would
potentially handle the zip code problem - a classic "hard case" rather
neatly.  But that wouldn't be sufficient to use the above equation,
because there A and B need to be things like "column X has value x",
and it's not going to be practical to store a complete set of MCVs for
column X for each possible value that could appear in column Y.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
Dne 13.12.2010 01:05, Robert Haas napsal(a):
> This is a good idea, but I guess the question is what you do next.  If
> you know that the "applicability" is 100%, you can disregard the
> restriction clause on the implied column.  And if it has no
> implicatory power, then you just do what we do now.  But what if it
> has some intermediate degree of implicability?

Well, I think you've missed the e-mail from Florian Pflug - he actually
pointed out that the 'implicativeness' Heikki mentioned is called
conditional probability. And conditional probability can be used to
express the "AND" probability we are looking for (the selectivity).

For two columns, this is actually pretty straightforward - as Florian
wrote, the equation is

   P(A and B) = P(A|B) * P(B) = P(B|A) * P(A)

where P(B) may be estimated from the current histogram, and P(A|B) may
be estimated from the contingency table (see the previous mails). And
"P(A and B)" is actually the value we're looking for.

Anyway, there really is no "intermediate" degree of applicability - it
just gives you the right estimate.

And AFAIR this is easily extensible to more than two columns, as

  P(A and B and C) = P(A and (B and C)) = P(A|(B and C)) * P(B and C)

so it's basically a recursion.
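
Applied to the ZIP code example as a worked illustration: since a ZIP
code determines the city, P(city = 'A' | zip = 'B') = 1, and therefore

   P(city = 'A' and zip = 'B') = P(city = 'A' | zip = 'B') * P(zip = 'B')
                               = P(zip = 'B')

which is exactly the estimate Heikki argued for - the city clause adds
no extra selectivity.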

Well, I hope my statements are really correct - it's been a few years
since I gained my degree in statistics ;-)

regards
Tomas



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Rob Wultsch
On Sun, Dec 12, 2010 at 4:49 PM, Robert Haas  wrote:
> On Sun, Dec 12, 2010 at 6:20 PM, Tom Lane  wrote:
>> Simon Riggs  writes:
>>> On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
 Huh?  It allows you to postpone the check until commit.  That's far from
 not enforcing it.
>>
>>> This clearly implies that un-enforced constraints are not checked at
>>> commit.
>>
>> [ shrug... ]  I can't argue with you about what may or may not be in an
>> unpublished draft of an unratified version of the standard, since I
>> don't have a copy.  But allow me to harbor doubts that they really
>> intend to allow someone to force a constraint to be considered valid
>> without any verification.  This proposal strikes me as something mysql
>> would do, not the standards committee.  (In particular, can a constraint
>> go from not-enforced to enforced state without getting checked at that
>> time?)
>>
>> Even if you're reading the draft correctly, and the wording makes it
>> into a released standard, the implementation you propose would break our
>> code.  The incremental FK checks are designed on the assumption that the
>> constraint condition held before; they aren't likely to behave very
>> sanely if the data is bad.  I'd want to see a whole lot more analysis of
>> the resulting behavior before even considering an idea like this.
>
> Wow, you've managed to bash Simon, MySQL, and the SQL standards
> committee all in one email.
>
> I'm not going to argue that careful analysis isn't needed before doing
> something like this - and, in particular, if we ever get inner-join
> removal, which I'm still hoping to do at some point, a foreign key
> that isn't actually guaranteed to be valid might result in queries
> returning different answers depending on whether or not a join is
> removed.  I guess we'd have to define that as the user's problem for
> alleging a foreign-key relationship that doesn't truly exist.  On the
> other hand, there's clearly also a use case for this behavior.  If a
> bulk load of prevalidated data forces an expensive revalidation of
> constraints that are already known to hold, there's a real chance the
> DBA will be backed into a corner where he simply has no choice but to
> not use foreign keys, even though he might really want to validate the
> foreign-key relationships on a going-forward basis.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

MySQL does in fact have this feature and it is used by mysqldump. This
feature is very useful.

-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] Problem with pg_upgrade (8.4 -> 9.0) due to ALTER DATABASE SET ROLE

2010-12-12 Thread Florian Pflug
On Dec13, 2010, at 00:16 , Robert Haas wrote:
> And in fact it strikes me that we might not have much choice about how
> to fix this.  I think we are not going to retroactively change the
> behavior of ALTER DATABASE .. SET ROLE in a released version, but yet
> we do, I think, want to make pg_upgrade work.


A simple fix is to teach pg_upgrade to issue "RESET SESSION AUTHORIZATION"
immediately after connecting to a database. I don't see any downside to this
currently - it seems that the only case where this wouldn't be a no-op is if
someone set ROLE to something else either per-database, per-user, or both.
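
To illustrate (database and role names invented):

    -- the per-database setting that trips up pg_upgrade:
    ALTER DATABASE mydb SET ROLE limited_role;

    -- proposed first statement after connecting, which undoes any such
    -- SET ROLE and restores the authenticated identity:
    RESET SESSION AUTHORIZATION;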

Actually, I'd like to provide an option for pg_dump and pg_restore to do that 
too (not by default, though). If people think this is a good idea, I could come 
up with a patch.

best regards,
Florian Pflug




Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 7:07 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> ...  On the
>> other hand, there's clearly also a use case for this behavior.  If a
>> bulk load of prevalidated data forces an expensive revalidation of
>> constraints that are already known to hold, there's a real chance the
>> DBA will be backed into a corner where he simply has no choice but to
>> not use foreign keys, even though he might really want to validate the
>> foreign-key relationships on a going-forward basis.
>
> There may well be a case to be made for doing this on grounds of
> practical usefulness.  I'm just voicing extreme skepticism that it can
> be supported by reference to the standard.

Dunno, I haven't read it either.  But it does seem like the natural
interpretation of "NOT ENFORCED".

> Personally I'd prefer to see us look into whether we couldn't arrange
> for low-impact establishment of a verified FK relationship, analogous to
> CREATE INDEX CONCURRENTLY.  We don't let people just arbitrarily claim
> that a uniqueness condition exists, and ISTM that if we can handle that
> case we probably ought to be able to handle FK checking similarly.

That'd be useful, too, but I don't think it would remove the use case
for skipping the check altogether.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Tom Lane
Robert Haas  writes:
> ...  On the
> other hand, there's clearly also a use case for this behavior.  If a
> bulk load of prevalidated data forces an expensive revalidation of
> constraints that are already known to hold, there's a real chance the
> DBA will be backed into a corner where he simply has no choice but to
> not use foreign keys, even though he might really want to validate the
> foreign-key relationships on a going-forward basis.

There may well be a case to be made for doing this on grounds of
practical usefulness.  I'm just voicing extreme skepticism that it can
be supported by reference to the standard.

Personally I'd prefer to see us look into whether we couldn't arrange
for low-impact establishment of a verified FK relationship, analogous to
CREATE INDEX CONCURRENTLY.  We don't let people just arbitrarily claim
that a uniqueness condition exists, and ISTM that if we can handle that
case we probably ought to be able to handle FK checking similarly.

regards, tom lane



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 9:43 AM, Heikki Linnakangas wrote:
> The way I think of that problem is that once you know the postcode, knowing
> the city name doesn't add any information. The postcode implies the city
> name. So the selectivity for "postcode = ? AND city = ?" should be the
> selectivity of "postcode = ?" alone. The measurement we need is
> "implicativeness": How strongly does column A imply a certain value for
> column B. Perhaps that could be measured by counting the number of distinct
> values of column B for each value of column A, or something like that. I
> don't know what the statisticians call that property, or if there's some
> existing theory on how to measure that from a sample.

This is a good idea, but I guess the question is what you do next.  If
you know that the "applicability" is 100%, you can disregard the
restriction clause on the implied column.  And if it has no
implicatory power, then you just do what we do now.  But what if it
has some intermediate degree of implicability?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 6:20 PM, Tom Lane  wrote:
> Simon Riggs  writes:
>> On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
>>> Huh?  It allows you to postpone the check until commit.  That's far from
>>> not enforcing it.
>
>> This clearly implies that un-enforced constraints are not checked at
>> commit.
>
> [ shrug... ]  I can't argue with you about what may or may not be in an
> unpublished draft of an unratified version of the standard, since I
> don't have a copy.  But allow me to harbor doubts that they really
> intend to allow someone to force a constraint to be considered valid
> without any verification.  This proposal strikes me as something mysql
> would do, not the standards committee.  (In particular, can a constraint
> go from not-enforced to enforced state without getting checked at that
> time?)
>
> Even if you're reading the draft correctly, and the wording makes it
> into a released standard, the implementation you propose would break our
> code.  The incremental FK checks are designed on the assumption that the
> constraint condition held before; they aren't likely to behave very
> sanely if the data is bad.  I'd want to see a whole lot more analysis of
> the resulting behavior before even considering an idea like this.

Wow, you've managed to bash Simon, MySQL, and the SQL standards
committee all in one email.

I'm not going to argue that careful analysis isn't needed before doing
something like this - and, in particular, if we ever get inner-join
removal, which I'm still hoping to do at some point, a foreign key
that isn't actually guaranteed to be valid might result in queries
returning different answers depending on whether or not a join is
removed.  I guess we'd have to define that as the user's problem for
alleging a foreign-key relationship that doesn't truly exist.  On the
other hand, there's clearly also a use case for this behavior.  If a
bulk load of prevalidated data forces an expensive revalidation of
constraints that are already known to hold, there's a real chance the
DBA will be backed into a corner where he simply has no choice but to
not use foreign keys, even though he might really want to validate the
foreign-key relationships on a going-forward basis.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Tom Lane
Simon Riggs  writes:
> On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
>> Huh?  It allows you to postpone the check until commit.  That's far from
>> not enforcing it.

> This clearly implies that un-enforced constraints are not checked at
> commit.

[ shrug... ]  I can't argue with you about what may or may not be in an
unpublished draft of an unratified version of the standard, since I
don't have a copy.  But allow me to harbor doubts that they really
intend to allow someone to force a constraint to be considered valid
without any verification.  This proposal strikes me as something mysql
would do, not the standards committee.  (In particular, can a constraint
go from not-enforced to enforced state without getting checked at that
time?)

Even if you're reading the draft correctly, and the wording makes it
into a released standard, the implementation you propose would break our
code.  The incremental FK checks are designed on the assumption that the
constraint condition held before; they aren't likely to behave very
sanely if the data is bad.  I'd want to see a whole lot more analysis of
the resulting behavior before even considering an idea like this.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with pg_upgrade (8.4 -> 9.0) due to ALTER DATABASE SET ROLE

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 11:01 AM, Tom Lane  wrote:
> This is about like arguing that pg_dump and pg_upgrade should still work
> after you've done "delete from pg_proc;".  Superusers are assumed to
> know what they're doing and not break fundamental operations.

No, it isn't like that at all.  You've made that argument in the past,
and it carries no water with me at all.  There's no help for the fact
that direct modification of the system catalog contents can
fundamentally break things, but DDL commands should not.  I'm willing
to reserve judgment on whether ALTER DATABASE .. SET ROLE should be
disallowed, or whether it should be made to not break things, but
blaming the DBA for shooting himself with the loaded foot-gun we
thoughtfully provided is unreasonable.

And in fact it strikes me that we might not have much choice about how
to fix this.  I think we are not going to retroactively change the
behavior of ALTER DATABASE .. SET ROLE in a released version, and yet
we do, I think, want to make pg_upgrade work.
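
(One untested thought: settings supplied at connection time take
precedence over per-database defaults, so pg_upgrade could perhaps
neutralize the setting itself by connecting with the equivalent of

    PGOPTIONS='-c role=none' psql somedb

rather than requiring the DBA to remove the ALTER first.)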

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] create tablespace fails silently, or succeeds improperly

2010-12-12 Thread Tom Lane
Peter Eisentraut  writes:
> On mån, 2010-10-18 at 15:50 -0400, Tom Lane wrote:
>> Basically, I'm thinking that given CREATE TABLESPACE LOCATION '/foo/bar'
>> the creation and properties of /foo/bar/PG_9.0_201004261 ought to be
>> handled *exactly* the way that the -D target directory of initdb is.
>> We have more than ten years experience behind the assertion that we're
>> dealing with that case in a good way.  We should transfer that
>> behavior over to tablespace directories rather than inventing
>> something that works a shade differently.

> I'm still struggling with the above argument.  In one case you are
> applying a behavior to the argument given to initdb, in the other case
> you are applying the behavior to a subdirectory of the argument given to
> CREATE TABLESPACE.  I'm not saying the solution is necessarily wrong,
> but it doesn't seem that this will make things easier or more
> consistent.

Well, it is only an argument by analogy, but the proposal does fix the
IMO-clear misbehavior complained of way back at the start of this
thread.
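
To make that concrete: with the 9.0-style layout,

    CREATE TABLESPACE fast LOCATION '/mnt/ssd';

creates and owns /mnt/ssd/PG_9.0_201004261 itself, so pointing a
tablespace directly at a mount point becomes workable.  (The path here
is invented; the version suffix is the one cited above.)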

> An idle thought: How about creating a version-subdirectory also in the
> PGDATA path.  The point about mountpoint annoyance applies here just as
> well.  And it could also make the directory juggling during in-place
> upgrade more normalized and robust.

I can't get excited about it.  That would break every existing tool that
looks into PGDATA, for a fairly marginal simplification during version
upgrades.  To give just one example of the pain we'd be letting
ourselves in for, pg_ctl would now become extremely version-specific.
You couldn't even get away with using the wrong copy of pg_ctl during a
reinstall after a catversion bump during development.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Simon Riggs
On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
> Simon Riggs  writes:
> > The new SQL Standard (SQL:2011) contains this:
> > "Table constraints are either enforced or not enforced. Domain
> > constraints and assertions are always enforced.", 4.17.2
> 
> > The SQL Standard allows you to turn the checking on and off for CHECK
> > constraints, UNIQUE constraints and FOREIGN KEYS.
> 
> Huh?  It allows you to postpone the check until commit.  That's far from
> not enforcing it.

"When a  is executed, all enforced constraints are
effectively checked and, if any enforced
constraint is not satisfied, then an exception condition is raised and
the SQL-transaction is terminated by an
implicit ."

This clearly implies that un-enforced constraints are not checked at
commit.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Tom Lane
Simon Riggs  writes:
> The new SQL Standard (SQL:2011) contains this:
> "Table constraints are either enforced or not enforced. Domain
> constraints and assertions are always enforced.", 4.17.2

> The SQL Standard allows you to turn the checking on and off for CHECK
> constraints, UNIQUE constraints and FOREIGN KEYS.

Huh?  It allows you to postpone the check until commit.  That's far from
not enforcing it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Simon Riggs

The new SQL Standard (SQL:2011) contains this:
"Table constraints are either enforced or not enforced. Domain
constraints and assertions are always enforced.", 4.17.2

The SQL Standard allows you to turn the checking on and off for CHECK
constraints, UNIQUE constraints and FOREIGN KEYS.

Which of those make sense for us, if any? The ability to create FKs
without checking all the data has been frequently requested to me over
many years. OTOH, I can't really see any point in turning on/off all of
the other aspects mentioned by the SQL Standard, especially indexes.
It's lots of work and seems likely to end with poorer data quality. And
the obvious thing is if you don't want a CHECK constraint, just drop
it...

My proposal is that we add a short and simple clause NOT ENFORCED onto
the ADD constraint syntax. So we have

ALTER TABLE foo
ADD FOREIGN KEY ... NOT ENFORCED;

The "enforced" state is not persisted - once added the FK is checked
every time. So there is no additional column on pg_constraint.

The benefit here is that we implement a capability that allows skipping
very long running SQL statements when required, and doesn't require too
much code. It has been discussed before on hackers, but that was before
it was part of the SQL Standard. Oracle has had this for years and it is
a popular feature. We can expect other RDBMSes to implement this feature
now that it is part of the standard.
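
For illustration, loading prevalidated data might then look like this
(table and constraint names invented):

    -- skip the initial full-table validation; the FK is still
    -- checked for every row modified from here on
    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id) NOT ENFORCED;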

If you want more than my good-bits-only proposal, it really isn't going
to happen for 9.1, and seems pretty pointless anyway.

Very short hack to implement this attached for discussion. No tests, not
even a compile - just showing how quick a patch this can be.

Thoughts? Alternative syntax?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 784feae..ecadcbd 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -42,7 +42,7 @@ ALTER TABLE name
 ALTER [ COLUMN ] column SET ( attribute_option = value [, ... ] )
 ALTER [ COLUMN ] column RESET ( attribute_option [, ... ] )
 ALTER [ COLUMN ] column SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
-ADD table_constraint
+ADD table_constraint [ ENFORCED | NOT ENFORCED ]
 DROP CONSTRAINT [ IF EXISTS ]  constraint_name [ RESTRICT | CASCADE ]
 DISABLE TRIGGER [ trigger_name | ALL | USER ]
 ENABLE TRIGGER [ trigger_name | ALL | USER ]
@@ -220,11 +220,13 @@ ALTER TABLE name

 

-ADD table_constraint
+ADD table_constraint [ ENFORCED | NOT ENFORCED ]
 
  
   This form adds a new constraint to a table using the same syntax as
-  .
+  . Newly added constraints can be defined
+  as NOT ENFORCED, rather than the default setting
+  ENFORCED.
  
 

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 937992b..3cacad0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -304,13 +304,13 @@ static void ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 			   IndexStmt *stmt, bool is_rebuild, LOCKMODE lockmode);
 static void ATExecAddConstraint(List **wqueue,
 	AlteredTableInfo *tab, Relation rel,
-	Constraint *newConstraint, bool recurse, LOCKMODE lockmode);
+	Constraint *newConstraint, bool recurse, LOCKMODE lockmode, bool enforced);
 static void ATAddCheckConstraint(List **wqueue,
 	 AlteredTableInfo *tab, Relation rel,
 	 Constraint *constr,
 	 bool recurse, bool recursing, LOCKMODE lockmode);
 static void ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
-		  Constraint *fkconstraint, LOCKMODE lockmode);
+		  Constraint *fkconstraint, LOCKMODE lockmode, bool enforced);
 static void ATExecDropConstraint(Relation rel, const char *constrName,
 	 DropBehavior behavior,
 	 bool recurse, bool recursing,
@@ -2970,11 +2970,11 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			break;
 		case AT_AddConstraint:	/* ADD CONSTRAINT */
 			ATExecAddConstraint(wqueue, tab, rel, (Constraint *) cmd->def,
-false, lockmode);
+false, lockmode, cmd->enforced);
 			break;
 		case AT_AddConstraintRecurse:	/* ADD CONSTRAINT with recursion */
 			ATExecAddConstraint(wqueue, tab, rel, (Constraint *) cmd->def,
-true, lockmode);
+true, lockmode, cmd->enforced);
 			break;
 		case AT_DropConstraint:	/* DROP CONSTRAINT */
 			ATExecDropConstraint(rel, cmd->name, cmd->behavior,
@@ -4914,7 +4914,7 @@ ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
  */
 static void
 ATExecAddConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
-	Constraint *newConstraint, bool recurse, LOCKMODE lockmode)
+	Constraint *newConstraint, bool recurse, LOCKMODE lockmode, bool enforced)
 {
 	Assert(IsA(newConstraint, Constraint));
 
@@ -4958,7 +4958,7 @@ ATExecAddConstrain

Re: [HACKERS] Per-column collation

2010-12-12 Thread Peter Eisentraut
On mån, 2010-12-06 at 21:26 +0200, Peter Eisentraut wrote:
> 
> > * contrib/citext raises an encoding error when COLLATE is specified
> > even if it is the collation as same as the database default.
> > We might need some special treatment for C locale.
> > =# SHOW lc_collate;  ==> C
> > =# SELECT ('A'::citext) = ('a'::citext);  ==> false
> > =# SELECT ('A'::citext) = ('a'::citext) COLLATE "C";
> > ERROR:  invalid multibyte character for locale
> > HINT:  The server's LC_CTYPE locale is probably incompatible with
> the
> > database encoding.
> 
> OK, I can reproduce that.  That's fallout from the lc_ctype_is_c()
> optimization that I removed, as explained in another email.  I'll have
> to think about that again.

This is fixed in the 20101213 patch I'm about to send out.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation

2010-12-12 Thread Peter Eisentraut
On tis, 2010-12-07 at 11:46 +0900, Itagaki Takahiro wrote:
> On Sun, Dec 5, 2010 at 01:04, Peter Eisentraut  wrote:
> > Here is an updated patch to address the issues discussed during this
> > commitfest.
> 
> I found another issue in the patch; ILIKE in WHERE clause doesn't work.
> It was surprising because LIKE in WHERE clause and ILIKE in SELECT list
> work as expected.
>  - SELECT * FROM pg_class WHERE relname LIKE 'pg%'
>  - SELECT relname ILIKE 'pg%' FROM pg_class;
> 
> 
> postgres=# SELECT name, setting FROM pg_settings
>  WHERE name IN ('lc_ctype', 'lc_collate', 'server_encoding');
>   name   | setting
> -+-
>  lc_collate  | C
>  lc_ctype| C
>  server_encoding | UTF8
> (3 rows)
> 
> postgres=# SELECT * FROM pg_class WHERE relname ILIKE 'pg%';
> ERROR:  no collation was derived

This is fixed in the 20101213 patch I'm about to send out.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] create tablespace fails silently, or succeeds improperly

2010-12-12 Thread Peter Eisentraut
On mån, 2010-10-18 at 15:50 -0400, Tom Lane wrote:
> Yeah.  We have gotten complaints in the past from people who tried to
> specify a mount point as a tablespace, and it failed because of
> lost+found or the mount dir being root-owned.  We've told them to make
> a subdirectory, but that always seemed like a workaround.  With the
> new layout there's no longer any strong reason to prevent this case
> from working.
> 
> Basically, I'm thinking that given CREATE TABLESPACE LOCATION
> '/foo/bar'
> the creation and properties of /foo/bar/PG_9.0_201004261 ought to be
> handled *exactly* the way that the -D target directory of initdb is.
> We have more than ten years experience behind the assertion that we're
> dealing with that case in a good way.  We should transfer that
> behavior over to tablespace directories rather than inventing
> something that works a shade differently.

I'm still struggling with the above argument.  In one case you are
applying a behavior to the argument given to initdb, in the other case
you are applying the behavior to a subdirectory of the argument given to
CREATE TABLESPACE.  I'm not saying the solution is necessarily wrong,
but it doesn't seem that this will make things easier or more
consistent.

An idle thought: How about creating a version-subdirectory also in the
PGDATA path.  The point about mountpoint annoyance applies here just as
well.  And it could also make the directory juggling during in-place
upgrade more normalized and robust.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wildcard search support for pg_trgm

2010-12-12 Thread Alexander Korotkov
On Mon, Dec 13, 2010 at 12:14 AM, Dimitri Fontaine
wrote:

> How different (and better) is it from wildspeed?
>

The general advantage is the possibility of using both wildcard search and
trigram similarity search with the same index.
I expect that a GIN trigram index is slightly less space demanding, but
slightly slower on search than wildspeed. Also, I expect a GiST trigram index
to be slower on search, but faster on updates. I haven't checked these
assumptions in detail, though.
I lack test datasets for sufficient testing, and I would like to ask the
community to help me with that. Testing on dictionaries is good, but obviously
not enough.
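
For anyone who wants to test it, usage should look roughly like this
(table and pattern invented; this assumes the patch adds LIKE support to
the existing gin_trgm_ops opclass):

    CREATE INDEX words_trgm_idx ON words USING gin (word gin_trgm_ops);
    SELECT * FROM words WHERE word LIKE '%ostgre%';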


With best regards,
Alexander Korotkov.


Re: [HACKERS] Extensions, patch v17

2010-12-12 Thread David E. Wheeler
On Dec 12, 2010, at 12:50 PM, Dimitri Fontaine wrote:

> The only item with still some work to be done on it is the regression
> tests support: we're not aiming for full coverage, as I understand it,
> and installing contribs goes a long way towards testing extensions. Do
> we want more? If so, please detail what exactly.

At least those things not exercised by the contrib modules.

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - core functionality

2010-12-12 Thread Peter Eisentraut
On ons, 2010-12-01 at 12:30 +0900, Hitoshi Harada wrote:
> I've tried SQL/MED with postgresql_fdw last night, and found myself
> confusing the long setup procedure. A simplest way to use it AFAIK is:
> 
> 1.CREATE FOREIGN DATA WRAPPER ... (or run install sql script)
> 2.CREATE SERVER ... FOREIGN DATA WRAPPER ...
> 3.CREATE USER MAPPING FOR ...
> 4.CREATE FOREIGN TABLE ( ... )
> 
> From a user's view, this is a very long way to get to the simplest foreign
> table. I know it is based on the standard, but I really want a
> shortcut. Especially, I don't understand why CREATE USER MAPPING FOR
> current_user SERVER  is needed for default use case. If you
> forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an
> error. User mapping is useful if the local user and remote user should
> be mapped but I imagine in most cases they are the same.
> postgresql_fdw can tell the remote user by conninfo string, in
> addition.

I reviewed the standard about this, and a lot of things are
implementation-defined.  I think user mappings could be made optional.
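
(For reference, with the proposed patch the minimal chain looks roughly
like this - all names and options below are invented:

    CREATE FOREIGN DATA WRAPPER pgsql_fdw
        VALIDATOR postgresql_fdw_validator;
    CREATE SERVER remote_pg FOREIGN DATA WRAPPER pgsql_fdw
        OPTIONS (host 'remote.example.com', dbname 'appdb');
    CREATE USER MAPPING FOR CURRENT_USER SERVER remote_pg
        OPTIONS (user 'app', password 'secret');
    CREATE FOREIGN TABLE items (id int, name text) SERVER remote_pg;
)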

> This is another topic, but it would be useful if CREATE FOREIGN TABLE
> could omit column definitions, since the fdw usually knows what should be
> there in the definitions. I sometimes mistyped the column names
> between remote and local, which resulted in failures at execution.

Also, according to the standard, the column list in CREATE FOREIGN TABLE
is optional (if you can get it in some automatic way, of course).



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wildcard search support for pg_trgm

2010-12-12 Thread Dimitri Fontaine
Alexander Korotkov  writes:
> Here is first version of patch, which enable index support of wildcard
> search in pg_trgm contrib module.

How different (and better) is it from wildspeed?

  http://www.sai.msu.su/~megera/wiki/wildspeed

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unlogged tables

2010-12-12 Thread Dimitri Fontaine
Cédric Villemain  writes:
> 2010/12/8 Kineticode Billing :
>> On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
>>
>>> Other possibilities include TRANSIENT, EPHEMERAL, TENUOUS.

I kind of like TRANSIENT, but that's only because it's a property I've
been working with in some other systems 

  http://www.erlang.org/doc/design_principles/sup_princ.html

Restart = permanent | transient | temporary

Restart defines when a terminated child process should be restarted.

A permanent child process is always restarted.
A temporary child process is never restarted.
A transient child process is restarted only if it terminates abnormally, 
i.e. with another exit reason than normal.
 
>> EVANESCENT.
>
> UNSAFE ?

What about NOT PERSISTENT ? Then we would have two flavours of them,
that's NOT PERSISTENT ON RESTART TRUNCATE or ON RESTART FLUSH, I guess?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.

2010-12-12 Thread Tom Lane
Magnus Hagander  writes:
> On Sun, Dec 12, 2010 at 19:54, Tom Lane  wrote:
>> It's a fair question.  We could clean up some of these messy ifdefs
>> if we dropped support for that combination.  I assume that an MSVC-built
>> libpq.dll would still work for Borland users, no?

> The dynamic one, yes. Static linked one, no.

> IIRC I suggest desupporting it every now and then and get voted down
> ;) And quite often we have someone showing up around the x.y.2 release
> to clean it up so it works again...

Yeah, a look in the commit logs shows that happening about once a year.
It'd sure be nice if we had a less haphazard process for it though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.

2010-12-12 Thread Magnus Hagander
On Sun, Dec 12, 2010 at 19:54, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> On 12/12/2010 01:43 PM, Tom Lane wrote:
>>> (Hm, I wonder whether BCC defines them either...)
>
>> Is anyone building the client stuff with BCC any more? I don't recall
>> having heard of anyone doing so for quite some years.
>
> It's a fair question.  We could clean up some of these messy ifdefs
> if we dropped support for that combination.  I assume that an MSVC-built
> libpq.dll would still work for Borland users, no?

The dynamic one, yes. Static linked one, no.

IIRC I suggest desupporting it every now and then and get voted down
;) And quite often we have someone showing up around the x.y.2 release
to clean it up so it works again...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.

2010-12-12 Thread Tom Lane
Andrew Dunstan  writes:
> On 12/12/2010 01:43 PM, Tom Lane wrote:
>> (Hm, I wonder whether BCC defines them either...)

> Is anyone building the client stuff with BCC any more? I don't recall 
> having heard of anyone doing so for quite some years.

It's a fair question.  We could clean up some of these messy ifdefs
if we dropped support for that combination.  I assume that an MSVC-built
libpq.dll would still work for Borland users, no?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.

2010-12-12 Thread Andrew Dunstan



On 12/12/2010 01:43 PM, Tom Lane wrote:

Make S_IRGRP etc available in mingw builds as well as MSVC.

(Hm, I wonder whether BCC defines them either...)




Is anyone building the client stuff with BCC any more? I don't recall 
having heard of anyone doing so for quite some years.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags

2010-12-12 Thread Tom Lane
Andrew Dunstan  writes:
> On 12/12/2010 11:16 AM, Tom Lane wrote:
>> I put in #define's for these, and it seems to have fixed the MSVC
>> buildfarm members, but cygwin is still broken.  How come ... doesn't
>> that port use port/win32.h?

> ITYM Mingw. And yes, it does use port/win32.h; narwhal's log says:
> config.status: linking src/include/port/win32.h to 
> src/include/pg_config_os.h

Oh, I guess the point is that WIN32_ONLY_COMPILER doesn't get defined,
and that block of file-permission-bit #defines is nested inside
#ifdef WIN32_ONLY_COMPILER.

So apparently the issue is that the mingw headers provide some but not
all of those symbols.  Which have they got, and how are they defined?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags

2010-12-12 Thread Andrew Dunstan



On 12/12/2010 11:16 AM, Tom Lane wrote:


I put in #define's for these, and it seems to have fixed the MSVC
buildfarm members, but cygwin is still broken.  How come ... doesn't
that port use port/win32.h?



ITYM Mingw. And yes, it does use port/win32.h; narwhal's log says:

   config.status: linking src/include/port/win32.h to src/include/pg_config_os.h


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with pg_upgrade (8.4 -> 9.0) due to ALTER DATABASE SET ROLE

2010-12-12 Thread Florian Pflug
On Dec12, 2010, at 17:01 , Tom Lane wrote:
> Florian Pflug  writes:
>> pg_upgrade aborted during the step "Adding support functions to new cluster" 
>> with "ERROR:  permission denied for language c" error. Unfortunately, the 
>> log didn't include the name of the database where the error occurred, so it 
>> took me a while to figure out that the culprit was an "ALTER DATABASE SET
>> ROLE = <non-superuser role>" I had done on one of my databases, which
>> effectively prevented pg_upgrade from connecting with superuser privileges.
> 
> That seems like a pretty stupid thing to have done; it would prevent
> *any* connection to that database with superuser privileges, no?

I had two developers working with that database who regularly modify the 
schema, often creating new objects (it's a development machine). They both were 
annoyed that if one of them created a table, he'd be the owner and some 
operations on that table would be restricted to him and superusers. The "ALTER 
DATABASE SET ROLE" fixes that nicely for me. When I needed to work as a 
superuser with that database, I simply did "SET ROLE <superuser role>" to
restore my superuser powers.

Nowadays, I could probably do the "SET ROLE" just for some specific combination 
of user and database. That option, however, wasn't there at the time I did the 
"ALTER DATABASE SET ROLE".

>> While one could argue that this behaviour is perfectly consistent, I believe 
>> most users will expect pg_upgrade (and to a lesser extent also pg_dump and 
>> pg_restore) to be unaffected by such settings.
> 
> This is about like arguing that pg_dump and pg_upgrade should still work
> after you've done "delete from pg_proc;".  Superusers are assumed to
> know what they're doing and not break fundamental operations.

Sure. If you believe in proof by exaggeration, which I don't.

The way I see it, how is a DBA supposed to know that setting a per-database
ROLE is a bad idea, while per-database settings for other GUCs are fine?
For example, what about
  synchronous_commit=off
  vacuum_freeze_min_age
  datestyle
  sql_inheritance
  standard_conforming_strings
  array_nulls
  default_with_oids
  ...

Without checking the code, all of these have about the same chance of breaking 
pg_upgrade. But then, by your line of reasoning, "ALTER DATABASE SET ROLE" 
shouldn't have been invented in the first place. Which may even be true, but
it's too late for that. So the next best thing, IMHO, is to give superusers a 
way to avoid the hazard it poses.
 
> I'm thinking that if there's anything we should forbid here, it's the
> ALTER ... SET itself.  In particular, some experimentation suggests that
> a non-superuser database owner can do it:
> 
> regression=# create user joe;
> CREATE ROLE
> regression=# create database joe with owner joe;
> CREATE DATABASE
> regression=# \c joe joe
> You are now connected to database "joe" as user "joe".
> joe=> alter database joe set role joe;
> ALTER DATABASE
> 
> which seems to me at least a bad idea and arguably a security hazard.

I'm sorry, I don't see that security hazard there. Care to explain?

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] function attributes

2010-12-12 Thread Tom Lane
Andrew Dunstan  writes:
> On 12/12/2010 10:43 AM, Tom Lane wrote:
>>> At the moment the behaviour is triggered by a custom setting
>>> (plperl.pass_binary_bytea), but this isn't really satisfactory.

>> I do not want to go there.

> But the real issue is that we have no way of specifying properties for a 
> function at creation time other than those provided for in the grammar. 
> We've already made a couple of fairly ugly hacks to do stuff like this 
> in plpgsql. Is that really the road we want to go down? Is it less messy 
> than providing some catalog support for language specific function 
> properties, where they might be visible outside the function source?

There might be an argument in the abstract for that, but I can't see
expending the work until we have a more pressing concrete requirement
than this one.  I don't believe that defining this particular behavior
as a function property is a good long-term solution, because it seems
practically certain that everybody will want to migrate to the new
behavior.  A GUC works well for that, because you can flip over the
default once you reach the point of having converted or marked all your
functions.  A function property doesn't work at all, unless it's just a
means of locally overriding the GUC ... and the SET clause exists for
that already.

I could be talked into function properties given a few examples of
properties that could be expected to remain in use for a long time
(like volatile/immutable for instance).  But this example is no sale.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
Dne 12.12.2010 17:33, Florian Pflug napsal(a):
> On Dec12, 2010, at 15:43 , Heikki Linnakangas wrote:
>> The way I think of that problem is that once you know the postcode, knowing 
>> the city name doesn't add any information. The postcode implies the city 
>> name. So the selectivity for "postcode = ? AND city = ?" should be the 
>> selectivity of "postcode = ?" alone. The measurement we need is 
>> "implicativeness": How strongly does column A imply a certain value for 
>> column B. Perhaps that could be measured by counting the number of distinct 
>> values of column B for each value of column A, or something like that. I 
>> don't know what the statisticians call that property, or if there's some 
>> existing theory on how to measure that from a sample.
> 
> The statistical term for this is "conditional probability", written P(A|B), 
> meaning the probability of A under the assumption or knowledge of B. The 
> basic tool for working with conditional probabilities is bayes' theorem which 
> states that
> 
> P(A|B) = P(A and B) / P(B).
> 
> Currently, we assume that P(A|B) = P(A), meaning the probability (or 
> selectivity as we call it) of an event (like a=3) does not change under 
> additional assumptions like b=4. Bayes' theorem thus becomes
> 
> P(A) = P(A and B) / P(B)  <=>  P(A and B) = P(A) * P(B)
> 
> which is how we currently compute the selectivity of a clause such as "WHERE 
> a=3 AND b=4".
> 
> I believe that measuring this by counting the number of distinct values of 
> column B for each A is basically the right idea. Maybe we could count the 
> number of distinct values of "b" for every one of the most common values of 
> "a", and compare that to the overall number of distinct values of "b"...

Good point!

Well, I was thinking about this too - generally this means creating a
contingency table with the MCV as bins. Then you can compute these
interesting probabilities P(A and B). (OK, now I definitely look like
some contingency table weirdo, who tries to solve everything with a
contingency table. OMG!)

The question is - what are we going to do when the values in the query
are not in the MCV list? Is there some heuristic to estimate the
probability from MCV, or something like that? Could we use some
"average" probability or what?

Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] function attributes

2010-12-12 Thread Andrew Dunstan



On 12/12/2010 10:43 AM, Tom Lane wrote:



At the moment the behaviour is triggered by a custom setting
(plperl.pass_binary_bytea), but this isn't really satisfactory. We could
turn it on  permanently, but that would break a lot of legacy code. What
we really need is a way of marking a function with some attributes. Of
course, we could put it in the program text like plpgsql's
#variable_conflict, but that's really rather ugly. The grammar already
has an attribute mechanism for functions, and ISTM we just need to
extend that a bit to allow setting of function attributes reasonably
flexibly, much as we can now specify format options on EXPLAIN or we'll
soon be able to specify options for foreign tables.

I do not want to go there.  What you're proposing will soon turn into a
real mess, with arbitrary language-specific junk tagged onto pg_proc
entries.  And what's worse, it'll be mixed with non-language-specific
junk, because of the existing legacy WITH entries.



Arguably we should deprecate those legacy entries. One, isCachable, is 
stated in the docs to be obsolete, and has been for many releases 
now. The other, isStrict, is a non-preferred way of specifying that the 
function is strict.


But the real issue is that we have no way of specifying properties for a 
function at creation time other than those provided for in the grammar. 
We've already made a couple of fairly ugly hacks to do stuff like this 
in plpgsql. Is that really the road we want to go down? Is it less messy 
than providing some catalog support for language specific function 
properties, where they might be visible outside the function source?


In the present case, Robert's suggestion of using "create function ... 
set plperl.pass_binary_bytea = true" seems to work well enough, although 
I haven't tried very hard yet to break it.
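
For the archives, that spelling is roughly the following (the function
body is only an illustration; plperl.pass_binary_bytea is the custom
setting from my branch):

    CREATE FUNCTION bytea_len(bytea) RETURNS int
        LANGUAGE plperl
        SET plperl.pass_binary_bytea = true
        AS 'my ($b) = @_; return length($b);';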


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Florian Pflug
On Dec12, 2010, at 15:43 , Heikki Linnakangas wrote:
> The way I think of that problem is that once you know the postcode, knowing 
> the city name doesn't add any information. The postcode implies the city 
> name. So the selectivity for "postcode = ? AND city = ?" should be the 
> selectivity of "postcode = ?" alone. The measurement we need is 
> "implicativeness": How strongly does column A imply a certain value for 
> column B. Perhaps that could be measured by counting the number of distinct 
> values of column B for each value of column A, or something like that. I 
> don't know what the statisticians call that property, or if there's some 
> existing theory on how to measure that from a sample.

The statistical term for this is "conditional probability", written P(A|B), 
meaning the probability of A under the assumption or knowledge of B. The basic 
tool for working with conditional probabilities is bayes' theorem which states 
that

P(A|B) = P(A and B) / P(B).

Currently, we assume that P(A|B) = P(A), meaning the probability (or 
selectivity as we call it) of an event (like a=3) does not change under 
additional assumptions like b=4. Bayes' theorem thus becomes

P(A) = P(A and B) / P(B)  <=>  P(A and B) = P(A) * P(B)

which is how we currently compute the selectivity of a clause such as "WHERE 
a=3 AND b=4".

I believe that measuring this by counting the number of distinct values of 
column B for each A is basically the right idea. Maybe we could count the 
number of distinct values of "b" for every one of the most common values of 
"a", and compare that to the overall number of distinct values of "b"...

A (very) quick search on scholar.google.com for "estimate conditional 
probability" didn't turn up anything useful, but it's hard to believe that 
there isn't at least some literature on the subject.

best regards,
Florian Pflug
 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] function attributes

2010-12-12 Thread Andrew Dunstan



On 12/12/2010 10:43 AM, Tom Lane wrote:


Tim Bunce seemed to think that this particular problem might be solvable
in a completely transparent way, by having byteas convert into Perl
objects that have a hook for producing a backwards-compatible text
translation.  Have you looked into that idea?


No. If you're referring to this sentence, which was referring to arrays, 
not to byteas:



It's possible a blessed ref with string overloading would avoid
backwards compatibility issues.



then it won't work (or at least it would be far more complex than what 
I've done, and I can't see how it would work) in the case of a bytea, 
since a bytea becomes a scalar, not a ref, and you can only bless refs.



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
Dne 12.12.2010 15:43, Heikki Linnakangas napsal(a):
>> The classic failure case has always been: postcodes and city names.
>> Strongly correlated, but in a way that the computer can't easily see.
> 
> Yeah, and that's actually analogous to the example I used in my
> presentation.
> 
> The way I think of that problem is that once you know the postcode,
> knowing the city name doesn't add any information. The postcode implies
> the city name. So the selectivity for "postcode = ? AND city = ?" should
> be the selectivity of "postcode = ?" alone. The measurement we need is
> "implicativeness": How strongly does column A imply a certain value for
> column B. Perhaps that could be measured by counting the number of
> distinct values of column B for each value of column A, or something
> like that. I don't know what the statisticians call that property, or if
> there's some existing theory on how to measure that from a sample.

Yes, those issues are a righteous punishment for breaking BCNF rules ;-)

I'm not sure it's solvable using the contingency tables, as it requires
knowledge about dependencies between individual values (working with
cells is not enough, although it might improve the estimates).

Well, maybe we could collect these stats (number of cities for a given
ZIP code and number of ZIP codes for a given city). Collecting good
stats about this is a bit tricky, but possible. What about collecting
this for the MCVs from both columns?

Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags

2010-12-12 Thread Tom Lane
I wrote:
> Magnus Hagander  writes:
>> On Sat, Dec 11, 2010 at 18:46, Tom Lane  wrote:
>>> I think we can just #define the other cases as zeroes.  I'm not sure why
>>> you think that's an issue for open --- the privileges don't exist.

>> Hmm. I was/am worried about any case that specifies *just* one of the
>> permissions that don't exist. That'll leave it at zero, whereas the
>> correct one might be the user-only version of whatever (read/write)
>> was given.

> If we didn't specify the "user" read or write privilege, we shouldn't
> get it.

I put in #define's for these, and it seems to have fixed the MSVC
buildfarm members, but cygwin is still broken.  How come ... doesn't
that port use port/win32.h?

> What are the values of _S_IREAD and _S_IWRITE, anyway?  I'm still
> wondering how come the previous coding with hardwired constants
> behaved correctly.

Still curious about this.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with pg_upgrade (8.4 -> 9.0) due to ALTER DATABASE SET ROLE

2010-12-12 Thread Tom Lane
Florian Pflug  writes:
> pg_upgrade aborted during the step "Adding support functions to new cluster" 
> with "ERROR:  permission denied for language c" error. Unfortunately, the log 
> didn't include the name of the database where the error occurred, so it took 
> me a while to figure out that the culprit was an "ALTER DATABASE SET ROLE =
> <non-superuser role>" I had done on one of my databases, which effectively
> prevented pg_upgrade from connecting with superuser privileges.

That seems like a pretty stupid thing to have done; it would prevent
*any* connection to that database with superuser privileges, no?

> While one could argue that this behaviour is perfectly consistent, I believe 
> most users will expect pg_upgrade (and to a lesser extent also pg_dump and 
> pg_restore) to be unaffected by such settings.

This is about like arguing that pg_dump and pg_upgrade should still work
after you've done "delete from pg_proc;".  Superusers are assumed to
know what they're doing and not break fundamental operations.

I'm thinking that if there's anything we should forbid here, it's the
ALTER ... SET itself.  In particular, some experimentation suggests that
a non-superuser database owner can do it:

regression=# create user joe;
CREATE ROLE
regression=# create database joe with owner joe;
CREATE DATABASE
regression=# \c joe joe
You are now connected to database "joe" as user "joe".
joe=> alter database joe set role joe;
ALTER DATABASE

which seems to me at least a bad idea and arguably a security hazard.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
Hi!

Dne 12.12.2010 15:17, Martijn van Oosterhout napsal(a):
>> Lets talk about one special case - I'll explain how the proposed
>> solution works, and then I'll explain how to make it more general, what
>> improvements are possible, what issues are there. Anyway this is by no
>> means a perfect or complete solution - it's just a starting point.
> 
> It looks like you handled most of the issues. Just a few points:
> 
> - This is obviously applicable to more than just integers, probably
>   anything with a b-tree operator class. What you've coded seems to rely
>   on calculations on the values. Have you thought about how it could
>   work for, for example, strings?

Yes, I know, I just forgot to address this in my previous e-mail. The
contingency tables have a really nice feature - they are based on
splitting the sets into groups (~ bins of the histograms for each
column). And this can be done whenever you can sort the values; you really
don't need any calculations. So it should work with strings.

And another thing I somehow forgot is handling the case when there is no
histogram, just MCV. That's mostly the same - each of the values might
be a separate group, or the values might be grouped to form fewer groups,
etc.

> The classic failure case has always been: postcodes and city names.
> Strongly correlated, but in a way that the computer can't easily see.
> Not that I suggest you fix this, but it's food for thought. Though
> strictly speaking this is a different kind of correlation than what
> you're looking at.

Hmmm, I see. I think the proposal does not fix this particular case,
although it might improve the situation a little bit (limit the error
between expected and observed number of rows).

The problem is that once we get to the cell level of the contingency
table, there is no additional (more detailed) information. So we're
stuck with the multiplication estimate, or something like that.

I was thinking about it actually, and I think we could collect some more
info - a correlation coefficient for each bin, or something like that.
But that was not part of my proposal, and I'm not sure how to do that.

>> 2) I really don't think we should collect stats for all combinations of
>>columns of a table - I do like the Oracle-like approach where a DBA
>>has to enable cross-column stats using an ALTER TABLE (for a
>>particular list of columns).
>>
>>The only exception might be columns from a multi-column index. It
>>might be quite efficient I guess?
> 
> In the past it has been suggested to only do it for multi-column
> indexes, but these days I find in some situations I prefer to
> make individual indexes and let the bitmap scan code combine them. So
> perhaps it would be best to let it be configured by the DBA.

Yes, I prefer individual indexes too.

The idea behind collecting cross-column stats for multi-column indexes
was that maybe we could 'append' this to the current functionality
(building the index or something like that) so that it does not
introduce significant performance problems.

>> 3) There are independence tests for contingency tables (e.g. Pearson's
>>Chi-squared test), so that it's easy to find out whether the columns
>>are independent. In that case we can just throw away these stats and
>>use the simple estimation.
>>
>>http://mathworld.wolfram.com/Chi-SquaredTest.html
> 
> I think this would be good to include, if possible. 
> 
> Actually, I wonder if the existing stats collection code could be
> altered to attempt to calculate the correlation between columns as part
> of its other work.

I guess that would be rather expensive - to compute correlation you need
two passes, and you need to do that for each pair of columns. So I'd be
surprised if it is possible (and effective).

Another thing is that you can compute correlation only for numeric
columns, so it's not possible to do that for city/ZIP code mentioned
above. More precisely - it's possible to do that (if you map strings to
numbers somehow), but I doubt you'll get useful results as the
assignment is rather random.

Well, you could ask the governments to assign the ZIP codes to cities in
strictly alphabetical order, but I guess they'll say no.

>> 4) Or we could store just those cells where expected and observed values
>>differ significantly (may help if most of the values are independent,
>>but there's a small glitch somewhere).
> 
> Compressing that grid would be useful, given that for many dimensions
> most of the grid will be not interesting. In fact, storing the 20
> largest values may be enough. Worth an experiment.

Not exactly just the largest values - rather values that are
significantly different from the expected values. Generally there are
two interesting cases

expected << observed - The optimizer may choose index scan, although
   the seq scan would be better.

expected >> observed - The optimizer may choose seq scan, although
   the index scan would be better.

Re: [HACKERS] function attributes

2010-12-12 Thread Tom Lane
Andrew Dunstan  writes:
> Yesterday I did a bit of work on allowing bytea values to be passed into 
> and out of plperl in binary format, effectively removing the need to 
> escape and de-escape them. (The work can be seen on he plperlargs branch 
> of my development repo at 
> ).

> At the moment the behaviour is triggered by a custom setting 
> (plperl.pass_binary_bytea), but this isn't really satisfactory. We could 
> turn it on  permanently, but that would break a lot of legacy code. What 
> we really need is a way of marking a function with some attributes. Of 
> course, we could put it in the program text like plpgsql's 
> #variable_conflict, but that's really rather ugly. The grammar already 
> has an attribute mechanism for functions, and ISTM we just need to 
> extend that a bit to allow setting of function attributes reasonably 
> flexibly, much as we can now specify format options on EXPLAIN or we'll 
> soon be able to specify options for foreign tables.

I do not want to go there.  What you're proposing will soon turn into a
real mess, with arbitrary language-specific junk tagged onto pg_proc
entries.  And what's worse, it'll be mixed with non-language-specific
junk, because of the existing legacy WITH entries.

Tim Bunce seemed to think that this particular problem might be solvable
in a completely transparent way, by having byteas convert into Perl
objects that have a hook for producing a backwards-compatible text
translation.  Have you looked into that idea?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Heikki Linnakangas

On 12.12.2010 15:17, Martijn van Oosterhout wrote:

On Sun, Dec 12, 2010 at 03:58:49AM +0100, Tomas Vondra wrote:
Very cool that you're working on this.


+1


Lets talk about one special case - I'll explain how the proposed
solution works, and then I'll explain how to make it more general, what
improvements are possible, what issues are there. Anyway this is by no
means a perfect or complete solution - it's just a starting point.


It looks like you handled most of the issues. Just a few points:

- This is obviously applicable to more than just integers, probably
   anything with a b-tree operator class. What you've coded seems to rely
   on calculations on the values. Have you thought about how it could
   work for, for example, strings?

The classic failure case has always been: postcodes and city names.
Strongly correlated, but in a way that the computer can't easily see.


Yeah, and that's actually analogous to the example I used in my 
presentation.


The way I think of that problem is that once you know the postcode, 
knowing the city name doesn't add any information. The postcode implies 
the city name. So the selectivity for "postcode = ? AND city = ?" should 
be the selectivity of "postcode = ?" alone. The measurement we need is 
"implicativeness": How strongly does column A imply a certain value for 
column B. Perhaps that could be measured by counting the number of 
distinct values of column B for each value of column A, or something 
like that. I don't know what the statisticians call that property, or if 
there's some existing theory on how to measure that from a sample.


That's assuming the combination has any matches. It's possible that the 
user chooses a postcode and city combination that doesn't exist, but 
that's no different from a user doing "city = 'fsdfsdfsd'" on a single 
column, returning no matches. We should assume that the combination 
makes sense.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, patch v16

2010-12-12 Thread Oleg Bartunov

Hi there,

it's clear we need versions; probably major.minor would be enough. The problem
I see is how to keep the .so in sync with the .sql. Should we store the .sql in
the database?

Also, we need permissions for extensions, since we have open/closed
extensions.



Oleg

On Sat, 11 Dec 2010, David E. Wheeler wrote:


On Dec 11, 2010, at 1:09 PM, David Fetter wrote:


Why is it in the makefile at all?  If the makefile does need to know it,
why don't we have it scrape the number out of the control file?  Or even
more to the point, since when do we need version numbers in extensions?


We *absolutely* need version numbers in extensions.  People will want
to have a certain version, or a certain minimum version, etc., etc.,
etc., just as they do for any other software.

Seriously, are you OK?


One of the biggest mistakes in the creation of CPAN was allowing modules 
without versions. It makes figuring out what to upgrade extremely difficult. 
Learning from that, PGXN requires version numbers for all extensions.

Best,

David





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Martijn van Oosterhout
On Sun, Dec 12, 2010 at 03:58:49AM +0100, Tomas Vondra wrote:
> Hi everyone,
> 
> one of the ssesion I've attended on PgDay last week was Heikki's session
> about statistics in PostgreSQL. One of the issues he mentioned (and one
> I regularly run into) is the absence of cross-column stats. When the
> columns are not independent, this usually results in poor estimates (and
> then in suboptimal plans).

Very cool that you're working on this.

> Lets talk about one special case - I'll explain how the proposed
> solution works, and then I'll explain how to make it more general, what
> improvements are possible, what issues are there. Anyway this is by no
> means a perfect or complete solution - it's just a starting point.

It looks like you handled most of the issues. Just a few points:

- This is obviously applicable to more than just integers, probably
  anything with a b-tree operator class. What you've coded seems to rely
  on calculations on the values. Have you thought about how it could
  work for, for example, strings?

The classic failure case has always been: postcodes and city names.
Strongly correlated, but in a way that the computer can't easily see.
Not that I suggest you fix this, but it's food for thought. Though
strictly speaking this is a different kind of correlation than what
you're looking at.

> 2) I really don't think we should collect stats for all combinations of
>columns of a table - I do like the Oracle-like approach where a DBA
>has to enable cross-column stats using an ALTER TABLE (for a
>particular list of columns).
> 
>The only exception might be columns from a multi-column index. It
>might be quite efficient I guess?

In the past it has been suggested to only do it for multi-column
indexes, but these days I find in some situations I prefer to
make individual indexes and let the bitmap scan code combine them. So
perhaps it would be best to let it be configured by the DBA.

> 3) There are independence tests for contingency tables (e.g. Pearson's
>Chi-squared test), so that it's easy to find out whether the columns
>are independent. In that case we can just throw away these stats and
>use the simple estimation.
> 
>http://mathworld.wolfram.com/Chi-SquaredTest.html

I think this would be good to include, if possible. 
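
For what it's worth, Pearson's statistic is simple enough to prototype
in plain SQL over a sample (hypothetical two-column table t(a, b); the
expected count of a cell is row total * column total / grand total):

    WITH cell AS (SELECT a, b, count(*)::float8 AS o FROM t GROUP BY a, b),
         ra AS (SELECT a, sum(o) AS ta FROM cell GROUP BY a),
         cb AS (SELECT b, sum(o) AS tb FROM cell GROUP BY b),
         n  AS (SELECT sum(o) AS tot FROM cell)
    SELECT sum((o - ta * tb / tot) ^ 2 / (ta * tb / tot)) AS chi_squared
    FROM cell JOIN ra USING (a) JOIN cb USING (b), n;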

Actually, I wonder if the existing stats collection code could be
altered to attempt to calculate the correlation between columns as part
of its other work.

> 4) Or we could store just those cells where expected and observed values
>differ significantly (may help if most of the values are independent,
>but there's a small glitch somewhere).

Compressing that grid would be useful, given that for many dimensions
most of the grid will be not interesting. In fact, storing the 20
largest values may be enough. Worth an experiment.

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first. 
>   - Charles de Gaulle




Re: [HACKERS] proposal: auxiliary functions for record type

2010-12-12 Thread Pavel Stehule
2010/12/12 Florian Pflug :
> On Dec12, 2010, at 00:19 , Pavel Stehule wrote:
>> I prefer a table based
>> solution, because I don't need an "unnest", but other preferences
>> are valid too.
> That's fine with me.
>
>> I'm dissatisfied with your design of specifying the explicit target type
>> via an unused value.  I think we don't have the infrastructure for it now
>> - from my view it is better to use a common type, which is text now. It's
>> nothing new - plpgsql uses it too.
> Sorry, I can't follow you here. Where does plpgsql use text as "common" type?

plpgsql uses only I/O casts. So inside an assign statement the target
type and the real type are checked. But this checking happens late! I did
a patch for early conversion to the target type (in the plan), but that
patch was rejected. So actually, information about the target type isn't
available in the expression - and probably never will be, for compatibility
reasons. For example: when the target variable is int, but you used a
numeric constant, then any assignment does an I/O cast from numeric to int.

>
>> I see one good design for an explicit target type, based on polymorphic
>> types, that respects PostgreSQL fmgr practice:
>>
>> We have to allow polymorphic functions without polymorphic
>> parameters. These functions should be designed to return a value in
>> "unknown" type format when the function has no outer information.
> I don't think "unknown" is the right type for that. As far as I know,
> "unknown" is still a textual type, used to have some type to assign to string 
> literals during parsing when no better type can be inferred.
>
>> This information can be passed in the function context. When the function
>> context isn't null, the function has to read the target type and should
>> return a value in the target type. Who can fill the function context? It is
>> a task for the executor. And when a CAST contains just a function call, we
>> can recheck whether the function is polymorphic, and if it is, set the
>> function context to the target type; then we don't need to call a
>> conversion function, because the polymorphic function must return data in
>> the correct format.
> The main difficulty is that currently types are assigned in a bottom-up 
> fashion as far as I know. To make functions with a polymorphic return value, 
> but without polymorphic arguments work, you need to assign the return type in 
> a top-down fashion (It depends on where the value *goes*, not where it *comes
> from*). That seems like a rather huge change and has the potential to 
> complicate quite a few other parts, most notably function lookup/resolution.

I don't think so:
a) the places where we don't know a target type are not limited to
the first outer cast
b) I didn't define polymorphic functions without polymorphic
parameters (PFWPP) as completely undescribed - such a function returns
"unknown" or "text" by default. There is no problem finding such a
function - and no problem with later processing when the function
returns "text", because the first outer cast ensures the
transformation to the correct type.
c) when the function is called without an outer cast it still runs -
there is just one more I/O cast.

Some alchemy with the function descriptor is used today too - when
default parameters are used.

>
> Plus, the general case where type information must bubble up more than one 
> level seems pretty much intractable, as it'd require a full-blown type 
> inference algorithm like ML's or Haskell's. Not a place where we want to go, I
> believe.
>
> The restricted case, on the other hand, brings very little benefit compared
> to the dummy-parameter approach. Yeah, "somefunc(...)::type" may
> look a bit cleaner than "somefunc(..., NULL::type)", but that's about
> it. It's only assignments in pl/pgsql which really benefit, since you'd be
> able to leave out the type completely, writing simply "v_value :=
> somefunc(...)". Does that really warrant the effort that'd be
> involved?
>
>> Without the described functionality we can design a non-polymorphic
>> function that returns the unknown type. When similar functionality
>> is implemented later, this function can be changed to a polymorphic
>> one, but from the user's perspective there is no change.

> I don't really understand why you resist the idea of a dummy parameter so 
> much. It might not be pretty, but is it bad enough to rectify putting in all 
> this work? Plus, the whole record-manipulation stuff isn't going to win a 
> beauty contest anytime soon. But it's better than nothing, so as long as it's 
> reasonably efficient I think one can live with a few warts on the API.

I wrote about it above: in this case you don't need to know a value,
you have to work with a type. So using a typed NULL isn't intuitive
and isn't nice - for me it is too ugly for a general-purpose module. I
know that PFWPP functions would need a lot of coding without a sure
result, and that's the reason why I didn't use them and why I use the
"text" type. And I have another reason: I expect it is more likely
that one will iterate over fields of different types, so coercion to
one target type isn't available in one path. Using a more pat

Re: [HACKERS] Wildcard search support for pg_trgm

2010-12-12 Thread Alexander Korotkov
I found another problem. A GIN index fails with "GIN indexes do not support
whole-index scans" when no trigram can be extracted from the pattern.
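To illustrate the failure mode with a toy model (plain C; real pg_trgm
additionally pads word boundaries with blanks, which this ignores):

#include <stdio.h>

/*
 * Toy model: count the trigrams extractable from the fixed fragments
 * of a LIKE pattern, i.e. the runs of characters between % and _
 * wildcards.  A fragment of length n yields n - 2 trigrams, so any
 * fragment shorter than 3 characters yields none.
 */
int
extractable_trigrams(const char *pattern)
{
    int count = 0;
    int fraglen = 0;
    const char *p;

    for (p = pattern; ; p++)
    {
        if (*p == '%' || *p == '_' || *p == '\0')
        {
            if (fraglen >= 3)
                count += fraglen - 2;
            fraglen = 0;
            if (*p == '\0')
                break;
        }
        else
            fraglen++;
    }
    return count;
}

int
main(void)
{
    const char *patterns[] = { "%foobar%", "%ab%", "%a_c%" };
    int i;

    for (i = 0; i < 3; i++)
    {
        int n = extractable_trigrams(patterns[i]);

        printf("%-10s -> %d trigram(s)%s\n", patterns[i], n,
               n == 0 ? "  (no keys: the index cannot be used)" : "");
    }
    return 0;
}

One possible direction might be a way for extractQuery to request a
match-everything scan with recheck, but GIN has no such search mode
today.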


With best regards,
Alexander Korotkov.


[HACKERS] Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records

2010-12-12 Thread Simon Riggs
On Sat, 2010-12-11 at 22:03 +0100, Heikki Linnakangas wrote:
> (Moving to pgsql-hackers)
> 
> On 10.12.2010 20:21, Tom Lane wrote:
> > Simon Riggs  writes:
> >> Reduce spurious Hot Standby conflicts from never-visible records.
> >> Hot Standby conflicts only with tuples that were visible at
> >> some point. So ignore tuples from aborted transactions or for
> >> tuples updated/deleted during the inserting transaction when
> >> generating the conflict transaction ids.
> >
> >> Following detailed analysis and test case by Noah Misch.
> >> Original report covered btree delete records, correctly observed
> >> by Heikki Linnakangas that this applies to other cases also.
> >> Fix covers all sources of cleanup records via common code.
> >> Includes additional fix compared to commit on HEAD
> >
> > ISTM HeapTupleHeaderAdvanceLatestRemovedXid is still pretty broken,
> > in that it's examining xmax without having checked that xmax is (a)
> > valid or (b) a lock rather than a deletion xmax.
> 
> In current use, it's only called for tuples that are known to be dead, 
> so either xmax is a valid deletion, or xmin didn't commit in which case 
> the function doesn't use xmax for anything. So I think it actually works 
> as it is.

Well, I think you're both right.

The function shouldn't be called in places where xmax is the wrong
flavour, but there should be specific safeguards in case of a mistake.

> I agree it doesn't look right, though. At the very least it needs 
> comments explaining that, but preferably it should do something sane 
> when faced with a tuple that's not dead after all. Perhaps throw an 
> error (though that would be bad during recovery), or an Assert, or just 
> refrain from advancing latestRemovedXid (or advance it, that would be 
> the conservative stance given the current use).

Yes

> Also, I'm not totally convinced it's correct when xmin > xmax, despite 
> Simon's follow-up commit to fix that. Shouldn't it advance 
> latestRemovedXid to xmin in that case? Or maybe it's ok as it is because 
> we know that xmax committed after xmin. The impression I get from the 
> comment above the function now is that it advances latestRemovedXid to 
> the highest XID present in the tuple, but that's not what it does in the 
> xmin > xmax case. That comment needs clarification.

Hmmm, my earlier code took xmax only if xmax > xmin. That was wrong;
what I have now is better, but your point is that there may be an even
better answer. I'll think on that a little more.

> While we're at it, perhaps it would be better to move this function to 
> tqual.c. And I feel that a more natural interface would be something like:
> 
> TransactionId
> HeapTupleHeaderGetLatestRemovedXid(HeapTupleHeader tuple);
> 
> IOW, instead of bumping up the passed-in latestRemovedXid value, return the
> highest XID on the tuple (if it was dead).
> 
> PS. it would be good to set hint bits in that function like in 
> HeapTupleSatisfies* functions.

I'm not that happy with refactoring inside a release, plus I'm not even
sure if that is the right way.

I suspect the best way would be to do this as a side-effect of
HeapTupleSatisfiesVacuum(), since this processing should only ever be done in
conjunction with that function.

Will respond later today on those thoughts.
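For the archives, a rough sketch of the interface Heikki suggests,
with the safeguards discussed above folded in (illustrative only: it
ignores multixact xmax, FrozenTransactionId and hint-bit setting, and
the xmin > xmax question is still open):

#include "postgres.h"
#include "access/htup.h"
#include "access/transam.h"

/*
 * Sketch: return the latest XID whose removal this known-dead tuple
 * implies, or InvalidTransactionId if the tuple was never visible.
 */
TransactionId
HeapTupleHeaderGetLatestRemovedXid(HeapTupleHeader tuple)
{
    TransactionId xmin = HeapTupleHeaderGetXmin(tuple);
    TransactionId xmax = HeapTupleHeaderGetXmax(tuple);

    /* Aborted xmin: the tuple was never visible, ignore xmax entirely. */
    if (!TransactionIdDidCommit(xmin))
        return InvalidTransactionId;

    /* Safeguard: xmax must be a valid deletion, not merely a row lock. */
    if ((tuple->t_infomask & HEAP_XMAX_INVALID) ||
        (tuple->t_infomask & HEAP_IS_LOCKED))
        return InvalidTransactionId;

    /* Return the highest XID present on the tuple. */
    return TransactionIdFollows(xmax, xmin) ? xmax : xmin;
}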

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Problem with pg_upgrade (8.4 -> 9.0) due to ALTER DATABASE SET ROLE

2010-12-12 Thread Florian Pflug
Hi

I've just run into a problem while upgrading from 8.4 to 9.0.

pg_upgrade aborted during the step "Adding support functions to new cluster"
with an "ERROR:  permission denied for language c" error. Unfortunately, the log
didn't include the name of the database where the error occurred, so it took me
a while to figure out that the culprit was an "ALTER DATABASE ... SET ROLE ..."
I had done on one of my databases, which effectively prevented
pg_upgrade from connecting with superuser privileges.

While one could argue that this behaviour is perfectly consistent, I believe 
most users will expect pg_upgrade (and to a lesser extent also pg_dump and 
pg_restore) to be unaffected by such settings.

Should we provide a way (for super-users only, of course) to skip 
per-database/per-role settings when connecting?

best regards
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: auxiliary functions for record type

2010-12-12 Thread Florian Pflug
On Dec12, 2010, at 00:19 , Pavel Stehule wrote:
> I prefer a table-based
> solution, because I don't need an extra "unnest", but other preferences
> are valid too.
That's fine with me.

> I'm dissatisfied with your design of an explicit target type
> via an unused value.  I think we don't have the infrastructure for it now
> - from my view it is better to use a common type, which is text now. It's
> nothing new - plpgsql uses it too.
Sorry, I can't follow you here. Where does plpgsql use text as "common" type?

> I see one good design for an explicit target type, based on polymorphic
> types, that respects PostgreSQL fmgr practice:
> 
> We have to allow polymorphic functions without polymorphic
> parameters. These functions should be designed to return a value in
> "unknown" type format when the function has no outer information.
I don't think "unknown" is the right type for that. As far as I know,
"unknown" is still a textual type, used to have some type to assign to string 
literals during parsing when no better type can be inferred.

> This information can be passed in the function context. When the function
> context isn't null, the function has to read the target type and should
> return a value in the target type. Who can fill the function context? It is
> a task for the executor. And when a CAST contains just a function call, we
> can recheck whether the function is polymorphic, and if it is, set the
> function context to the target type; then we don't need to call a
> conversion function, because the polymorphic function must return data in
> the correct format.
The main difficulty is that currently types are assigned in a bottom-up fashion 
as far as I know. To make functions with a polymorphic return value, but 
without polymorphic arguments work, you need to assign the return type in a 
top-down fashion (It depends on where the value *goes*, not where it *comes
from*). That seems like a rather huge change and has the potential to 
complicate quite a few other parts, most notably function lookup/resolution.

Plus, the general case where type information must bubble up more than one 
level seems pretty much intractable, as it'd require a full-blown type 
inference algorithm like ML or Haskell. Not a place where we want to go, I 
believe.

The restricted case, on the other hand, brings very little benefit compared to
the dummy-parameter approach. Yeah, "somefunc(...)::type" may look a
bit cleaner than "somefunc(..., NULL::type)", but that's about it. It's
only assignments in pl/pgsql which really benefit, since you'd be able to leave
out the type completely, writing simply "v_value := somefunc(...)".
Does that really warrant the effort that'd be involved?

> Without the described functionality we can design a non-polymorphic
> function that returns the unknown type. When similar functionality
> is implemented later, this function can be changed to a polymorphic
> one, but from the user's perspective there is no change.
I don't really understand why you resist the idea of a dummy parameter so much. 
It might not be pretty, but is it bad enough to justify putting in all this
work? Plus, the whole record-manipulation stuff isn't going to win a beauty 
contest anytime soon. But it's better than nothing, so as long as it's 
reasonably efficient I think one can live with a few warts on the API.
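To make the dummy-parameter idea concrete, a sketch of a C-language
function built around it (the function name "fieldvalue" and its SQL
signature are hypothetical; the point is that get_fn_expr_argtype()
lets the function discover the type the caller bound to the anyelement
dummy):

#include "postgres.h"
#include "fmgr.h"

/*
 * Hypothetical SQL-level declaration:
 *   CREATE FUNCTION fieldvalue(record, text, anyelement)
 *   RETURNS anyelement AS 'MODULE_PATHNAME' LANGUAGE C;
 * The third argument's value is never examined; callers pass e.g.
 * NULL::int purely to fix the result type.  (The function must not be
 * declared STRICT, or the NULL dummy would suppress the call.)
 */
PG_FUNCTION_INFO_V1(fieldvalue);

Datum
fieldvalue(PG_FUNCTION_ARGS)
{
    /* Resolve the concrete type bound to the anyelement argument. */
    Oid resulttype = get_fn_expr_argtype(fcinfo->flinfo, 2);

    if (!OidIsValid(resulttype))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("could not determine result type")));

    /*
     * ... fetch the named field from the record argument and coerce
     * it to resulttype; details omitted in this sketch ...
     */
    PG_RETURN_NULL();
}

A call would then look like "fieldvalue(rec, 'a', NULL::int)" and
return an int directly, with no textual round-trip; the dummy argument
exists purely to carry the type.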

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers