Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Heikki Linnakangas

On 22.10.2010 06:10, Tom Lane wrote:

Right at the moment I'm not seeing any way that the present
exec_eval_simple_expr approach can be fixed to work safely in the
presence of recursion.  What I think we might have to do is give up
on the idea of caching execution state trees across calls, instead
using them just for the duration of a single plpgsql function call.
I'm not sure what sort of runtime penalty might ensue.  The whole design
predates the plancache, and I think it was mostly intended to prevent
having to re-parse-and-plan simple expressions every time.  So a lot of
that overhead has gone away anyway given the plancache, and maybe we
shouldn't sweat too much about paying what remains.


We should test and measure that.


 (But on the third
hand, what are we gonna do for back-patching to versions without the
plancache?)


One simple idea is to keep a flag along with the executor state to 
indicate that the executor state is currently in use. Set it just before 
calling ExecEvalExpr, and reset it afterwards. If the flag is already set 
at the beginning of exec_eval_simple_expr, we have recursed and must 
create a new executor state.
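
Roughly like this, as a hand-wavy sketch (the expr_simple_in_use flag
field is invented here purely to illustrate the idea):

/* Sketch only: recursion guard around simple-expression evaluation */
if (expr->expr_simple_in_use)
{
    /* Cached tree is busy: we have recursed, build a throwaway one */
    estate = ExecInitExpr(expr->expr_simple_expr, NULL);
}
else
    estate = expr->expr_simple_state;

expr->expr_simple_in_use = true;
PG_TRY();
{
    result = ExecEvalExpr(estate, econtext, &isnull, NULL);
}
PG_CATCH();
{
    expr->expr_simple_in_use = false;
    PG_RE_THROW();
}
PG_END_TRY();
expr->expr_simple_in_use = false;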


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Fujii Masao
On Fri, Oct 22, 2010 at 11:03 AM, Josh Berkus  wrote:
>
>> I think it's pretty well explained in the fine manual.
>>
>> http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS
>
> Nope.  No relationship to checkpoint_segments is explained there.  Try
> again?

Please see
http://archives.postgresql.org/pgsql-docs/2010-10/msg00038.php

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Timeout and wait-forever in sync rep

2010-10-21 Thread Fujii Masao
On Fri, Oct 22, 2010 at 7:33 AM, Bruce Momjian  wrote:
> Fujii Masao wrote:
>> Hi,
>>
>> As the result of the discussion, I think that we need the following two
>> parameters for the case where the standby goes down.
>
> Can we have a parameter that calls an operating system command when a
> standby is declared dead, to notify the administrator?

For me, that command would be useful to STONITH the standby when the master
detects the disconnection. I agree with adding that parameter.
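
Something along these lines, perhaps (both parameter names are purely
hypothetical, to sketch the interface rather than propose actual GUC
names):

# postgresql.conf sketch -- these parameters do not exist
replication_timeout = 30s                     # declare the standby dead after this
standby_dead_command = '/usr/bin/stonith %s'  # %s = standby's application name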

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 10:24 PM, Tom Lane  wrote:
> Bruce Momjian  writes:
>> Greg Stark wrote:
>>> Did we have a solution for the problem that understanding which
>>> columns are timestamps requires having a tuple descriptor and parsing
>>> every tuple? That seems like it would a) be slow and b) require a
>>> lot of high-level code in the middle of a low-level codepath.
>
>> Yep, that's what it requires.  It would rewrite the page in the new format.
>
> In the case of the recent hstore fixes, we were able to put the burden
> on the hstore functions themselves to do any necessary conversion.
> I wonder if it'd be possible to do something similar here?  I haven't
> chased the bits in any detail, but I'm thinking that integer timestamps
> in a plausible range might all look like denormalized floats, and
> conversely plausible float timestamps would look like ridiculously large
> integer timestamps.  Would we be willing to make such assumptions to
> support in-place upgrade of timestamps?

This seems like it might not be entirely reliable, which would make me
disinclined to do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-21 Thread Greg Stark
On Thu, Oct 21, 2010 at 7:24 PM, Tom Lane  wrote:
>  Would we be willing to make such assumptions to
> support in-place upgrade of timestamps?
>

If something like that is true (I'm not sure it is) then we could
consider doing the equivalent of what we were talking about doing for
changes that require more space in the new version.

1. Backpatch a change that refuses to create new values of the prohibited type
2. Provide a tool which scans all the values in your old database and
ensures there are no values of the prohibited type
3. Only allow pg_migrator from the version that includes the backpatched check

We would also have to make sure the new version's integer timestamps
don't write out any instances that look like valid float timestamps. We
could have a GUC to disable this check if you have a tool that scans
all the pages and rewrites any old values.

I think #1 would be feasible if it's really as simple as checking the
high bit, which I imagine is what you're hoping it is. I suppose it
would require writing out a 0 or larger denormalized value, which would
mean we wouldn't be able to handle values close to the epoch properly.
I suppose for timestamps that doesn't really matter, since that's
precision we never really have anyway.
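
To make the shape of that check concrete, something like this (a
hand-wavy sketch; the cutoffs are invented for illustration and would
need real analysis before anyone trusted them):

#include <stdint.h>
#include <string.h>
#include <math.h>

/* ~36,000 years of microseconds on either side of the 2000-01-01 epoch */
#define PLAUSIBLE_USECS ((int64_t) 1 << 60)

/*
 * Guess whether an 8-byte timestamp datum was written as an int64
 * (microseconds) or a float8 (seconds).  Bit patterns of plausible
 * integer timestamps read back as denormal-tiny doubles (or NaNs, for
 * negative values); plausible float timestamps read back as integers
 * absurdly far from the epoch.
 */
static int
looks_like_integer_timestamp(const unsigned char *datum)
{
    int64_t as_int;
    double  as_float;

    memcpy(&as_int, datum, sizeof(as_int));
    memcpy(&as_float, datum, sizeof(as_float));

    if (as_int > -PLAUSIBLE_USECS && as_int < PLAUSIBLE_USECS)
        return 1;       /* in range for an integer timestamp */
    if (isfinite(as_float) && fabs(as_float) < 1.0e15)
        return 0;       /* in range for a float timestamp */
    return -1;          /* implausible either way */
}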

I'm not sure the tool to rewrite all existing values is so feasible,
though, considering that the values could be stuck in the middle of
arrays or records or even custom data types. There are also tintervals
and so on to worry about.

-- 
greg



Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-21 Thread Greg Stark
On Thu, Oct 21, 2010 at 7:13 PM, Tom Lane  wrote:
> ... and as I recall, we got rid of it principally because the temp
> tables weren't visible to ordinary catalog lookups, thus breaking
> all sorts of client-side logic.
>

Though that wouldn't be the case if the catalogs held a template.

Anyways I think this horse has been beaten to death. Whoever
implements it will look at the pros and cons and decide which way
they'll go.

-- 
greg



Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Tom Lane's message of Thu Oct 21 19:36:07 -0300 2010:
>> I don't believe that it's plancache's fault; the real problem is that
>> plpgsql is keeping "simple expression" execution trees around longer
>> than it should.  Your patch masks the problem by forcing those trees to
>> be rebuilt, but it's the execution trees not the plan trees that contain
>> stale data.

> Ahh, this probably explains why I wasn't able to reproduce the
> problem without involving subxacts or prepared plans, which seemed to
> follow mostly the same paths around plancache cleanup.

> It's also the likely cause that this hasn't been reported earlier.

I traced through the details and found that the proximate cause of the
crash is that this bit in fmgr_sql() gets confused:

/*
 * Convert params to appropriate format if starting a fresh execution. (If
 * continuing execution, we can re-use prior params.)
 */
if (es && es->status == F_EXEC_START)
postquel_sub_params(fcache, fcinfo);

After the error in the first subtransaction, the execution state tree
for the "public.dummy(p_name_table)" expression has a fn_extra link
that is pointing to a SQLFunctionCache that's in F_EXEC_RUN state.
So when the second call of broken() tries to re-use the state tree,
fmgr_sql() thinks it's continuing the execution of a set-returning
function, and doesn't bother to re-initialize its ParamListInfo
struct.  So it merrily tries to execute using a text datum that's
pointing at long-since-pfree'd storage for the original
'nonexistant.stuffs' argument string.

I had always felt a tad uncomfortable with the way that plpgsql re-uses
execution state trees for simple expressions; it seemed to me that it
was entirely unsafe in recursive usage.  But I'd never been able to
prove that it was broken.  Now that I've seen this example, I know how
to break it: recurse indirectly through a SQL function.  For instance,
this will dump core immediately:

create or replace function recurse(float8) returns float8 as
$$
begin
  raise notice 'recurse(%)', $1;
  if ($1 < 10) then
return sql_recurse($1 + 1);
  else
return $1;
  end if;
end
$$ language plpgsql;

-- "limit" is to prevent this from being inlined
create or replace function sql_recurse(float8) returns float8 as
$$ select recurse($1) limit 1; $$ language sql;

select recurse(0);

Notice the lack of any subtransaction or error condition.  The reason
this fails is *not* plancache misfeasance or failure to clean up after
error.  Rather, it's that the inner execution of recurse() is trying to
re-use an execution state tree that is pointing at an already-active
execution of sql_recurse().  In general, what plpgsql is doing is
entirely unsafe whenever a called function tries to keep changeable
execution state in storage pointed to by fn_extra.  We've managed to
miss the problem because plpgsql doesn't try to use this technique on
functions returning set (see exec_simple_check_node), and the vast
majority of non-SRF functions that use fn_extra at all use it to cache
catalog lookup results, which don't change from call to call.  But
there's no convention that says a function can't keep execution status
data in fn_extra --- in fact, there isn't anyplace else for it to keep
such data.

Right at the moment I'm not seeing any way that the present
exec_eval_simple_expr approach can be fixed to work safely in the
presence of recursion.  What I think we might have to do is give up
on the idea of caching execution state trees across calls, instead
using them just for the duration of a single plpgsql function call.
I'm not sure what sort of runtime penalty might ensue.  The whole design
predates the plancache, and I think it was mostly intended to prevent
having to re-parse-and-plan simple expressions every time.  So a lot of
that overhead has gone away anyway given the plancache, and maybe we
shouldn't sweat too much about paying what remains.  (But on the third
hand, what are we gonna do for back-patching to versions without the
plancache?)



Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Itagaki Takahiro
On Fri, Oct 22, 2010 at 1:31 AM, Dimitri Fontaine
 wrote:
> Of course, you know what that means? Yes, another version of the patch, that
> will build the control file out of the control.in at build time rather
> than install time, and that's back to using EXTVERSION both in the
> Makefile and in the .control.in file.

Here is a detailed report on the v9 patch.

* extension.v9.patch.gz seems to contain other changes from the repo.
Did you use an old master to get the diff?

* Typo in doc/xfunc.sgml. They should probably be "replaceable".
openjade:xfunc.sgml:2510:32:E: element "REPLACABLE" undefined
openjade:xfunc.sgml:2523:46:E: element "REPLACABLE" undefined

* There are some inconsistencies between the extension names in the \dx+
view and the actual names used by CREATE EXTENSION.
  - auto_username => insert_username
  - intarray => _int
  - xml2 => pgxml
We might need to rename them, or add 'installer'/'uninstaller' entries
to the control files to support extension names that differ from the .so name.

* pg_execute_from_file() and encoding
It expects the file to be in the server encoding, but that is not always
true, because we support multiple encodings in the same installation.
How about adding an encoding parameter to the function?
  => pg_execute_from_file( file, encoding )
CREATE EXTENSION could have optional ENCODING option.
  => CREATE EXTENSION name [ ENCODING 'which' ]

I strongly hope for multi-encoding support for my Japanese text search
extension. Comments in the extension are written in UTF-8, but both
UTF-8 and EUC_JP are commonly used as database encodings.

* Error messages in pg_execute_from_file()
- "must be superuser to get file information" would be
  "must be superuser to execute file" .
- "File '%s' could not be executed" would be
  "could not execute file: '%s'". Our message style guide is here:
  http://www.postgresql.org/docs/9.0/static/error-style-guide.html
Many messages in extension.c are also to be adjusted.

commands/extension.c needs to be cleaned up a bit more:
* fsize in read_extension_control_file() is not used.
* The ferror() test just after AllocateFile() is not needed;
  NULL checking is enough.
* malloc() in add_extension_custom_variable_classes():
  I think the README says nothing about malloc() except for assign_hook
  cases; palloc would be better here.


BTW, did you register your patch in the next commitfest?
It would be better to do so, for tracking purposes.
https://commitfest.postgresql.org/action/commitfest_view?id=8

-- 
Itagaki Takahiro



Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-21 Thread Tom Lane
Bruce Momjian  writes:
> Greg Stark wrote:
>> Did we have a solution for the problem that understanding which
>> columns are timestamps requires having a tuple descriptor and parsing
>> every tuple? That seems like it would a) be slow and b) require a
>> lot of high-level code in the middle of a low-level codepath.

> Yep, that's what it requires.  It would rewrite the page in the new format.

In the case of the recent hstore fixes, we were able to put the burden
on the hstore functions themselves to do any necessary conversion.
I wonder if it'd be possible to do something similar here?  I haven't
chased the bits in any detail, but I'm thinking that integer timestamps
in a plausible range might all look like denormalized floats, and
conversely plausible float timestamps would look like ridiculously large
integer timestamps.  Would we be willing to make such assumptions to
support in-place upgrade of timestamps?

regards, tom lane



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 10:03 PM, Josh Berkus  wrote:
>
>> I think it's pretty well explained in the fine manual.
>>
>> http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS
>
> Nope.  No relationship to checkpoint_segments is explained there.  Try
> again?

Well, it says "This sets only the minimum number of segments retained
in pg_xlog; the system might need to retain more segments for WAL
archival or to recover from a checkpoint."  So in other words, the
relationship with checkpoint segments is that whichever one currently
requires retaining a larger number of segments applies.  That's all
the relationship there is.  I'm not sure I understand the question.
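
Concretely, if I remember the formulas in that chapter right: with
checkpoint_segments = 3 and wal_keep_segments = 32, the streaming bound
(roughly checkpoint_segments + wal_keep_segments + 1 = 36 segments,
~576 MB at 16 MB each) dominates the checkpoint bound (roughly
(2 + checkpoint_completion_target) * checkpoint_segments + 1, about 8
segments); drop wal_keep_segments to 2 and the checkpoint bound wins
instead.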

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-21 Thread Tom Lane
Bruce Momjian  writes:
> Greg Stark wrote:
>> It seems to me simpler and more direct to just nail relcache
>> entries for these objects into memory and manipulate them directly.
>> They can be constructed from the global catalog tables and then
>> tweaked to point to the backend local temporary tables.

> Funny, but that is how I implemented temporary tables in 1999, and it
> lasted until 2002 when schema support was added.  It actually worked
> because all the lookups go through the syscache.

... and as I recall, we got rid of it principally because the temp
tables weren't visible to ordinary catalog lookups, thus breaking
all sorts of client-side logic.

regards, tom lane



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Josh Berkus

> I think it's pretty well explained in the fine manual.
> 
> http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS

Nope.  No relationship to checkpoint_segments is explained there.  Try
again?

>> If checkpoint_segments were a hard limit, then we could let admins set
>> wal_keep_segments to -1, knowing that they'd set checkpoint_segments to
>> the max space they had available.
> 
> This assumes that more checkpoint segments is always better, which
> isn't true.  I might have 100 GB of disk space free, but not want to
> replay WAL for 4 days if I have a crash.

No, it assumes no such thing.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 9:09 PM, Josh Berkus  wrote:
>
>>> Very true.  But the lack of a -1 setting for wal_keep_segments means
>>> that if you would like to take a backup without archiving, you must
>>> set wal_keep_segments to a value greater than or equal to the rate at
>>> which you generate WAL segments multiplied by the time it takes you to
>>> run a backup.  If that doesn't qualify as requiring arcane knowledge,
>>> I'm mystified as to what ever could.
>
> Speaking of which, what's the relationship between checkpoint_segments
> and wal_keep_segments?  PG seems perfectly willing to let me set the
> latter higher than the former, and it's not documented.

I think it's pretty well explained in the fine manual.

http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS

> If checkpoint_segments were a hard limit, then we could let admins set
> wal_keep_segments to -1, knowing that they'd set checkpoint_segments to
> the max space they had available.

This assumes that more checkpoint segments is always better, which
isn't true.  I might have 100 GB of disk space free, but not want to
replay WAL for 4 days if I have a crash.

I do think that the current default of checkpoint_segments=3 is
pathologically insane, but that's another can of worms.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-21 Thread Bruce Momjian
Greg Stark wrote:
> On Thu, Oct 21, 2010 at 4:49 PM, Bruce Momjian  wrote:
> > One thing we have talked about is converting the page on read-in by
> > the backend.  Since the timestamps are the same size as float or
> > integer, that might be possible.
> 
> Did we have a solution for the problem that understanding which
> columns are timestamps requires having a tuple descriptor and parsing
> every tuple? That seems like it would a) be slow and b) require a
> lot of high-level code in the middle of a low-level codepath.

Yep, that's what it requires.  It would rewrite the page in the new format.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Josh Berkus

>> Very true.  But the lack of a -1 setting for wal_keep_segments means
>> that if you would like to take a backup without archiving, you must
>> set wal_keep_segments to a value greater than or equal to the rate at
>> which you generate WAL segments multiplied by the time it takes you to
>> run a backup.  If that doesn't qualify as requiring arcane knowledge,
>> I'm mystified as to what ever could.

Speaking of which, what's the relationship between checkpoint_segments
and wal_keep_segments?  PG seems perfectly willing to let me set the
latter higher than the former, and it's not documented.

If checkpoint_segments were a hard limit, then we could let admins set
wal_keep_segments to -1, knowing that they'd set checkpoint_segments to
the max space they had available.

Although we might want to rename those.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] pg_rawdump

2010-10-21 Thread Stephen R. van den Berg
Bruce Momjian wrote:
>Greg Stark wrote:
>> On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg  
>> wrote:
>> > In order to simplify recovery at this point (enormously), it would
>> > have been very helpful (at almost negligible cost), to have the name
>> > of the table, the name of the columns, and the types of the
>> > columns available.

>> > Why don't we insert that data into the first page of a regular table
>> > file in the special data area?

>I was thinking we could dump a flat file every 15 minutes into each
>database directory that had recovery-useful information.  It wouldn't be
>perfect, but would probably be sufficient for most forensics.

It would definitely be better than the current state.
But it still disconnects the information from the files it belongs to (a bit).
From a cost/benefit point of view, I'd still prefer to interlace
the information into the tablefiles (which also scales better in case of
numerous tables).
-- 
Stephen.



Re: [HACKERS] pg_rawdump

2010-10-21 Thread Stephen R. van den Berg
Greg Stark wrote:
>On Thu, Oct 21, 2010 at 11:30 AM, Stephen R. van den Berg  wrote:
>> For the recovery information I'd like to reserve:
>> identifier: 00: table OID
>> ? ? ? ? ? ?01: table layout

>So here's a proposal for something that could maybe be implemented. I
>think I'm leaning against this currently as there are just too many
>caveats even for this limited functionality. I'm more inclined to go
>with the idea someone else proposed of dumping an audit log of all DDL
>on the table, or, after any DDL, dumping the create table statements as
>pg_dump would generate them to a separate fork.

Yes, but that has two distinct downsides I'd like to avoid:
- It can grow uncontrollably if someone uses ALTER TABLE on a (too)
  regular basis.
- It separates the data from the tablefile it pertains to (which could
  complicate recovery (a lot)).

>In this meta data object put:

>table oid
>table name
>number of columns
>array of typlen for those columns (-1 for varlena and size for fixed length)

[...]

>Including the type brings in a new set of complications. Even if you
>assume the built-in typoids never change there are going to be typoids
>that correspond to user defined types. Without the names of those
>types the oids are pretty useless.

>Just the typoids would put you over 8k in the worst case, and the names
>would put you into the realm of needing arbitrary numbers of blocks
>for even average cases. Similarly, including the column names would
>require potentially many blocks.

All valid points/concerns.
But, let's approach this from the side of the forensic analyst instead,
and see what information typically would really be needed (for argument's
sake, let's call the unfortunate sod that lost the catalog of his database
"the customer"):

The customer usually still has access to the developer, or to some
developer documentation which describes which columns are used for what.
It would most likely document most columns (especially the column names;
to a lesser extent, the column types), but might be missing some of the
more recent changes which (unfortunately) were done on the live
database using ALTER TABLE and hadn't been documented properly yet
(I'm talking about typical real-world cases I've encountered).

Which means that the analyst would primarily be concerned with getting
back the column types and the column names.  The next
step would be to know about slight deviations from the documented
columns (columns dropped or added since the last revision of the docs).

Next are custom types.  Custom types are likely to be better documented,
hence the column name would usually be enough to recover the definition
of a custom type from the docs.

Assuming the above holds in the common case, I would propose
implementing something along the following lines:

Entries for the special area in tables:

0007 00 tableoid                To be sprinkled in every
                                megabyte or so.

<len> 01 00 cc tablename...     cc = number of columns

<len> 01 01 <ww> <ww> ...       <ww> are column widths,
                                <0> for varlena widths
<len> 01 02 <tt> <tt> ...       <tt> are typeoids per
                                column (is 16 bits wide enough
                                for the common cases?  If not,
                                disregard my suggestion and make
                                these 32 bits wide each)
<len> 01 03 nn col1name nn col2name ...
                                nn = length of the column name
                                that follows
                                End of filled special area, no
                                need to parse beyond here in the
                                current page.

The 01 special-area entries could be present in any number of pages.
If they are present at all, they start at the first page of the file
and may continue in the following page(s), until a page is
encountered without them.

Multiple occurrences of 01 01, 01 02, or 01 03 shall be concatenated
to form the complete information set.
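
As C structs, the entry framing might look roughly like this (a sketch
of the proposal only; the struct name and field widths are invented
here for illustration):

#include <stdint.h>

/* Illustrative framing for one recovery-hint entry in the special area */
typedef struct RecoveryHintEntry
{
    uint16_t len;   /* total entry length, e.g. the 0007 above */
    uint8_t  kind;  /* 00 = table OID, 01 = table layout */
    /*
     * kind 00: followed by a 4-byte table oid
     * kind 01: followed by a subtype byte:
     *   00 = column count (cc), then table name
     *   01 = one int16 width per column (-1 for varlena)
     *   02 = one uint16 (or uint32) type oid per column
     *   03 = repeated (1-byte length, column name) pairs
     */
} RecoveryHintEntry;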
-- 
Stephen.



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Bruce Momjian
Robert Haas wrote:
> > However the user-unfriendliness isn't the fact that administrators
> > need to determine how much disk they're willing to dedicate to
> > Postgres. The user-unfriendliness is that they then have to specify
> > this in terms of WAL log files and also have to know that we sometimes
> > keep more than that and so on.
> >
> > We've done a good job in the past of converting GUC variables to
> > meaningful units for administrators and users but it's an ongoing
> > effort. If we need a GUC to control the amount of disk space we use it
> > should be in units of MB/GB/TB. If we need a GUC for controlling how
> > much WAL history to keep for recovering standbys or replicas then it
> > should be specified in units of time.
> >
> > Units like "number of wal files" or worse in the case of
> > checkpoint_segments "number of wal files / 2 - 1" or something like
> > that are terrible. They require arcane knowledge for the
> > administrator to have a clue how to set.
> 
> Very true.  But the lack of a -1 setting for wal_keep_segments means
> that if you would like to take a backup without archiving, you must
> set wal_keep_segments to a value greater than or equal to the rate at
> which you generate WAL segments multiplied by the time it takes you to
> run a backup.  If that doesn't qualify as requiring arcane knowledge,
> I'm mystified as to what ever could.

LOL.  Time machine required (both forward and backward time options).

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] psql autocompletion for \z and \dg

2010-10-21 Thread Josh Kupershmidt
On Thu, Oct 21, 2010 at 8:45 PM, Robert Haas  wrote:
> Please add this in the usual spot:
> https://commitfest.postgresql.org/action/commitfest_view/open

Aye sir, added.



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 8:52 PM, Greg Stark  wrote:
> On Thu, Oct 21, 2010 at 5:46 PM, Josh Berkus  wrote:
>>
>>> Agreed, but as a reality check:  when I proposed that wal_keep_segments
>>> = -1 would keep all WAL segments (for use while the file system was
>>> being backed up), I was told administrators shoud compute how much free
>>> disk space they had.  Obviously easy of use is not our #1 priority.
>>
>> Depends.  Running out of disk space isn't exactly user-friendly either.
>>  And detecting how much free space is available would be a painful bit
>> of platform-dependant code ...
>
> Nor can we assume we're the only thing using disk space.
>
> However the user-unfriendliness isn't the fact that administrators
> need to determine how much disk they're willing to dedicate to
> Postgres. The user-unfriendliness is that they then have to specify
> this in terms of WAL log files and also have to know that we sometimes
> keep more than that and so on.
>
> We've done a good job in the past of converting GUC variables to
> meaningful units for administrators and users but it's an ongoing
> effort. If we need a GUC to control the amount of disk space we use it
> should be in units of MB/GB/TB. If we need a GUC for controlling how
> much WAL history to keep for recovering standbys or replicas then it
> should be specified in units of time.
>
> Units like "number of wal files" or worse in the case of
> checkpoint_segments "number of wal files / 2 - 1" or something like
> that are terrible. They require arcane knowledge for the
> administrator to have a clue how to set.

Very true.  But the lack of a -1 setting for wal_keep_segments means
that if you would like to take a backup without archiving, you must
set wal_keep_segments to a value greater than or equal to the rate at
which you generate WAL segments multiplied by the time it takes you to
run a backup.  If that doesn't qualify as requiring arcane knowledge,
I'm mystified as to what ever could.
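
For example: a system writing one 16 MB WAL segment per minute whose
base backup takes two hours needs wal_keep_segments >= 60 * 2 = 120
(roughly 2 GB of pg_xlog) just to survive the backup window -- and you
have to know both the generation rate and the backup duration in
advance to compute that.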

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_rawdump

2010-10-21 Thread Bruce Momjian
Greg Stark wrote:
> On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg  wrote:
> > In order to simplify recovery at this point (enormously), it would
> > have been very helpful (at almost negligible cost), to have the name
> > of the table, the name of the columns, and the types of the
> > columns available.
> >
> > Why don't we insert that data into the first page of a regular table
> > file in the special data area?
> >
> > I'd be willing to create a patch for that (should be pretty easy),
> > if nobody considers it to be a bad idea.
> 
> There isn't necessarily one value for these attributes.  You can
> rename columns, and that rename may succeed and commit or fail and
> roll back. You can drop or add columns and some rows will have or not
> have the added columns at all. You could even add a column, insert
> some rows, then abort -- all in a transaction. So some (aborted) rows
> will have extra columns that aren't even present in the current table
> definition.
> 
> All this isn't to say the idea you're presenting is impossible or a
> bad idea. If this meta information was only a hint for forensic
> purposes and you take into account these caveats it might still be
> useful. But I'm not sure how useful. I mean, you can't really decipher
> everything properly without the data in the catalog -- and you have to
> premise this on the idea that you've lost everything in the catalog
> but not the data in other tables. Which seems like a narrow use case.

I was thinking we could dump a flat file every 15 minutes into each
database directory that had recovery-useful information.  It wouldn't be
perfect, but would probably be sufficient for most forensics.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Greg Stark
On Thu, Oct 21, 2010 at 5:46 PM, Josh Berkus  wrote:
>
>> Agreed, but as a reality check:  when I proposed that wal_keep_segments
>> = -1 would keep all WAL segments (for use while the file system was
>> being backed up), I was told administrators should compute how much free
>> disk space they had.  Obviously ease of use is not our #1 priority.
>
> Depends.  Running out of disk space isn't exactly user-friendly either.
>  And detecting how much free space is available would be a painful bit
> of platform-dependent code ...

Nor can we assume we're the only thing using disk space.

However the user-unfriendliness isn't the fact that administrators
need to determine how much disk they're willing to dedicate to
Postgres. The user-unfriendliness is that they then have to specify
this in terms of WAL log files and also have to know that we sometimes
keep more than that and so on.

We've done a good job in the past of converting GUC variables to
meaningful units for administrators and users but it's an ongoing
effort. If we need a GUC to control the amount of disk space we use it
should be in units of MB/GB/TB. If we need a GUC for controlling how
much WAL history to keep for recovering standbys or replicas then it
should be specified in units of time.

Units like "number of wal files" or worse in the case of
checkpoint_segments "number of wal files / 2 - 1" or something like
that are terrible. They require arcane knowledge for the
administrator to have a clue how to set.


-- 
greg



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Josh Berkus

> Agreed, but as a reality check:  when I proposed that wal_keep_segments
> = -1 would keep all WAL segments (for use while the file system was
> being backed up), I was told administrators should compute how much free
> disk space they had.  Obviously ease of use is not our #1 priority.

Depends.  Running out of disk space isn't exactly user-friendly either.
And detecting how much free space is available would be a painful bit
of platform-dependent code ...

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] psql autocompletion for \z and \dg

2010-10-21 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:56 PM, Josh Kupershmidt  wrote:
> It looks like psql's tab completion for the \z and \dg commands is
> missing. I couldn't see a reason for this, so the attached patch
> fixes it.
>
> Also, this patch proposes to change psql's "\?" help text to say that
> \dg and \du are the same, since AFAICT they do exactly the same thing.

Please add this in the usual spot:
https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 8:22 PM, Bruce Momjian  wrote:
> Josh Berkus wrote:
>> Greg,
>>
>> > The way things stand you *always* need archived logs. Even if you have
>> > streaming set up it might try to use archived logs if it falls too far
>> > behind.
>>
>> Actually, you don't.  If you're willing to accept possible
>> desynchronization and recloning of the standbys, then you can skip the
>> archive logs.
>
> Agreed, but as a reality check:  when I proposed that wal_keep_segments
> = -1 would keep all WAL segments (for use while the file system was
> being backed up), I was told administrators should compute how much free
> disk space they had.  Obviously ease of use is not our #1 priority.

Amen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] max_wal_senders must die

2010-10-21 Thread Bruce Momjian
Robert Haas wrote:
> On Thu, Oct 21, 2010 at 4:21 PM, Josh Berkus  wrote:
> > On 10/20/10 6:54 PM, Robert Haas wrote:
> >> I find it impossible to believe that's
> >> a good decision, and IMHO we should be focusing on how to make the
> >> parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us
> >> most of the same benefits without throwing away hard-won performance.
> >
> > I'd be happy to accept that.  Is it possible, though?
> 
> I sketched an outline of the problem AIUI here:
> 
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg01348.php
> 
> I think it's possible; I'm not quite sure how hard it is.
> Unfortunately, I've not had as much PG-hacking time lately as I'd
> like...

Have we documented these TODOs?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] max_wal_senders must die

2010-10-21 Thread Bruce Momjian
Robert Haas wrote:
> On Wed, Oct 20, 2010 at 3:40 PM, Greg Stark  wrote:
> > On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas  wrote:
> >> Exactly.  It doesn't take many 3-7% slowdowns to add up to being 50%
> >> or 100% slower, and that sucks.  In fact, I'm still not convinced that
> >> we were wise to boost default_statistics_target as much as we did.  I
> >> argued for a smaller boost at the time.
> >
> > Well we don't want to let ourselves be paralyzed by FUD so it was
> > important to identify specific concerns and then tackle those
> > concerns. Once we identified the worst-case planning cases we profiled
> > them and found that the inflection point of the curve was fairly
> > clearly above 100 but that there were cases where values below 1,000
> > caused problems. So I'm pretty happy with the evidence-based approach.
> 
> The inflection point of the curve was certainly a good thing for us to
> look at but the fact remains that we took a hit on a trivial
> benchmark, and we can't afford to take too many of those.

Agreed.  If people start wondering if our new major releases are perhaps
_slower_ than previous ones, we have lost a huge amount of momentum.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Thu Oct 21 19:36:07 -0300 2010:

> I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
> correctly in this example, but that seems to be where to look.

I think the reason is that one econtext is pushed for function
execution, and another one for blocks that contain exceptions.  The
example function does not contain exceptions -- the savepoints are
handled by the external SQL code.

I'll have a closer look tomorrow.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Bruce Momjian
Josh Berkus wrote:
> Greg,
> 
> > The way things stand you *always* need archived logs. Even if you have
> > streaming set up it might try to use archived logs if it falls too far
> > behind.
> 
> Actually, you don't.  If you're willing to accept possible
> desynchronization and recloning of the standbys, then you can skip the
> archive logs.

Agreed, but as a reality check:  when I proposed that wal_keep_segments
= -1 would keep all WAL segments (for use while the file system was
being backed up), I was told administrators should compute how much free
disk space they had.  Obviously ease of use is not our #1 priority.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-21 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian  writes:
> > Currently it isn't possible to create temporary tables on read-only
> > standby servers, and I don't see it listed on the TODO list.  Can I add
> > it?
> 
> Not unless you have some credible concept for how it might ever be
> implemented.  You can't create temp tables because you can't modify
> system catalogs, and if you did somehow create them you couldn't put
> anything in them because you can't generate XIDs on a slave ... much
> less commit them.  We have talked about ways that temp tables might be
> created without touching the "real" system catalogs, but the XID issue
> seems a complete showstopper.

So, this is one of those odd cases where we know people are going to ask
for a feature (temp tables on slaves), but we are not ready to put it on
our TODO list.  Where do we document that this isn't going to happen? 
In "Features we don't want"?  That title doesn't really match. 
"Features we don't know how to do" doesn't sound good.  ;-)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_rawdump

2010-10-21 Thread Stephen R. van den Berg
Kevin Grittner wrote:
>Tom Lane  wrote:
>> Most of the data-loss reports I've seen appeared to come from
>> people who wouldn't be capable of doing such recovery work even if
>> better tools were available.

>No doubt; but the recovery work often winds up in the hands of
>people with more skills than those responsible for the loss. 
>Whoever comes in to help with recovery is looking for every bit of
>traction they can get.  You'd be amazed at some of the odd straws
>people can grasp to help recover data.

And, obviously, the amount of time spent (by the professional) to
recover the data goes down a lot when more information is available,
which reduces cost and shortens the downtime for the sad sod that
lost the data in the first place.
-- 
Stephen.



Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-21 Thread Bruce Momjian
Greg Stark wrote:
> On Tue, Oct 19, 2010 at 12:03 PM, Robert Haas  wrote:
> > The trick is that it would require us to have two pg_class tables, two
> > pg_attribute tables, two pg_attrdef tables, etc.: in each case, one
> > permanent and one temporary.  I am not sure how complex that will turn
> > out to be.
> 
> Tom suggested using inheritance for this.
> 
> I find it strange to try constructing catalog tables to represent
> these local definitions which never need to be read by any other
> backend and in any case are 1:1 copies of the global catalog entries.
> 
> It seems to me simpler and more direct to just nail relcache
> entries for these objects into memory and manipulate them directly.
> They can be constructed from the global catalog tables and then
> tweaked to point to the backend local temporary tables.

Funny, but that is how I implemented temporary tables in 1999, and it
lasted until 2002 when schema support was added.  It actually worked
because all the lookups go through the syscache.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-21 Thread Greg Stark
On Thu, Oct 21, 2010 at 4:49 PM, Bruce Momjian  wrote:
> One thing we have talked about is converting the page on read-in by
> the backend.  Since the timestamps are the same size as float or
> integer, that might be possible.

Did we have a solution for the problem that understanding which
columns are timestamps requires having a tuple descriptor and parsing
every tuple? That seems like it would a) be slow and b) require a
lot of high-level code in the middle of a low-level codepath.


-- 
greg



Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-21 Thread Bruce Momjian
Robert Haas wrote:
> On Mon, Oct 18, 2010 at 2:29 PM, Jeff Davis  wrote:
> > A reasonable conversion path might be to offer integer timestamps using
> > a different type name (e.g. inttimestamp) that always means integer
> > timestamps. Then, they could convert using ALTER TABLE, then do an
> > in-place upgrade. We could even make pg_upgrade optionally convert
> > inttimestamp to timestamp in O(1) on an integer-timestamps build.
> 
> I think in retrospect it would certainly have been better to make
> integer timestamps and float timestamps two separate data types,
> rather than two versions of the same data type.  Whether it's worth
> providing that now after the fact is not clear to me.  I'd be inclined
> to wait and see whether we get many complaints...
> 
> One problem with changing types in pg_upgrade is that type OIDs can
> get embedded in the on-disk representation - I believe that this
> happens for arrays, for instance.  So I think it's practical for
> pg_upgrade to change type names during a version upgrade, but not type
> OIDs.

One thing we have talked about is converting the page on read-in by
the backend.  Since the timestamps are the same size as float or
integer, that might be possible.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Serializable snapshot isolation patch

2010-10-21 Thread Kevin Grittner
Jeff Davis  wrote:
 
> When using locks in an unconventional way, it would be helpful to
> describe the invalid schedules that you're preventing. Perhaps an
> example if you think it would be reasonably simple? Also some
> indication of how another process is intended to modify the list
> without walking it.
 
I've just pushed some comment changes intended to address this.  Did
I hit the mark?
 
-Kevin
 
P.S.  Sorry for the delay in responding to such simple requests --
I've been tied up with a family medical crisis; I hope to crank
through much of what you've raised this weekend.



Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Thu Oct 21 19:36:07 -0300 2010:
> Alvaro Herrera  writes:
> > A customer was hitting some misbehavior in one of their internal tests and
> > I tracked it down to plancache not behaving properly with
> > subtransactions: in particular, a plan is not being marked "dead" when
> > the subtransaction on which it is planned rolls back.
> 
> I don't believe that it's plancache's fault; the real problem is that
> plpgsql is keeping "simple expression" execution trees around longer
> than it should.  Your patch masks the problem by forcing those trees to
> be rebuilt, but it's the execution trees not the plan trees that contain
> stale data.

Ahh, this probably explains why I wasn't able to reproduce the
problem without involving subxacts or prepared plans, which seemed to
follow mostly the same paths around plancache cleanup.

It's also the likely cause that this hasn't been reported earlier.

> I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
> correctly in this example, but that seems to be where to look.

Will take a look ... if the girls let me ...

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_rawdump

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 5:21 PM, Kevin Grittner
 wrote:
> Tom Lane  wrote:
>
>> Most of the data-loss reports I've seen appeared to come from
>> people who wouldn't be capable of doing such recovery work even if
>> better tools were available.
>
> No doubt; but the recovery work often winds up in the hands of
> people with more skills than those responsible for the loss.
> Whoever comes in to help with recovery is looking for every bit of
> traction they can get.  You'd be amazed at some of the odd straws
> people can grasp to help recover data.
>
> I don't know how big the special area is, but if enough room could
> be carved out to include even the relation ID or filename, it might
> be a big help to someone.  I'm pretty skeptical about including
> information about attributes, though.

Unfortunately, the use case for the relation ID or filename is much
thinner than the case for the column definitions.  You're less likely
to lose that information, and if you do lose it you can probably guess
by file size or by running strings on the data files.  The really hard
thing is to remember just exactly what columns you had in there, in
what order... and oh yeah there was that column we dropped.  But like
you, I'm pretty dubious about trying to store all that stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Simplifying replication

2010-10-21 Thread Mark Kirkwood

On 19/10/10 13:16, Josh Berkus wrote:

Robert asked me to write this up, so here it is.

It is critical that we make replication easier to set up, administrate 
and monitor than it currently is.  In my conversations with people, 
this is more important to our users and the adoption of PostgreSQL 
than synchronous replication is.


First, I'm finding myself constantly needing to tutor people on how to 
set up replication.  The mere fact that it requires a minimum 1-hour 
class to explain how to use it, or a 10-page tutorial, tells us it's 
too complex.  As further evidence, Bruce and I explained binary 
replication to several MySQL geeks at OpenSQLCamp last weekend, and 
they were horrified at the number and complexity of the steps 
required.  As it currently is, binary replication is not going to win 
us a lot of new users from the web development or virtualization world.




+1

I've been having the same experience - how to set this up and do 
failover and failback etc. occupies quite a bit of time in courses I've 
been teaching here in NZ and Australia. Making this whole replication 
business much simpler is definitely the way to go.


A good example of how simple it can be is mongodb, where it is 
essentially one command to set up a two-replica system with a voting arbiter:


$ mongo
> rs.initiate(
  {
_id : "replication_set0",
members : [
 { _id  : 0, host : "192.168.2.100" },
 { _id  : 1, host : "192.168.2.101" },
 { _id  : 2, host : "192.168.2.103", arbiterOnly : true }
]
  }
)





Re: [HACKERS] docs on contrib modules that can't pg_upgrade?

2010-10-21 Thread Bruce Momjian
Robert Treat wrote:
> Howdy folks,
> 
> Was wondering if there are any docs on which contrib modules don't work
> with pg_upgrade? I seem to remember discussion on this during the 9.0
> cycle, but couldn't find it in the mail archive, and don't see anything
> in the wiki.  What brings this up is I'm currently working on an 8.3
> upgrade and it has pg_freespacemap, which breaks things; I think that's
> easy enough to work around in this case, but I am sure for other
> contribs, or for folks with a lot of machinery built on top of a
> contrib, that won't always be the case. If something like this doesn't
> exist, I'll start a wiki page on it, but thought I'd ask first.

I don't know of any /contrib modules that will not upgrade; if we had
any, they would be mentioned in the pg_upgrade docs and checked for in
the source code.

We had a possible issue with hstore, but that was changed in a
backward-compatible way for 9.0.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Tom Lane
Alvaro Herrera  writes:
> A customer was hitting some misbehavior in one of their internal tests and
> I tracked it down to plancache not behaving properly with
> subtransactions: in particular, a plan is not being marked "dead" when
> the subtransaction on which it is planned rolls back.

I don't believe that it's plancache's fault; the real problem is that
plpgsql is keeping "simple expression" execution trees around longer
than it should.  Your patch masks the problem by forcing those trees to
be rebuilt, but it's the execution trees not the plan trees that contain
stale data.

I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
correctly in this example, but that seems to be where to look.

regards, tom lane



Re: [HACKERS] Timeout and wait-forever in sync rep

2010-10-21 Thread Bruce Momjian
Fujii Masao wrote:
> Hi,
> 
> As the result of the discussion, I think that we need the following two
> parameters for the case where the standby goes down.

Can we have a parameter that calls an operating system command when a
standby is declared dead, to notify the administrator?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] [GENERAL] pg_filedump binary for CentOS

2010-10-21 Thread Bruce Momjian
Greg Stark wrote:
> On Thu, Oct 14, 2010 at 2:53 PM, Tom Lane  wrote:
> > Bruce Momjian  writes:
> >> Should we consider moving pg_filedump into our /contrib?
> >
> > Can't: it's GPL.
> >
> 
> I don't particularly see a problem with having GPL'd contrib modules.
> It would mean any users hoping to redistribute the package couldn't
> include those modules except under the GPL. But most repackagers don't
> include the contrib modules anyways. Even ones that do and want to
> include those modules would only have to include the source to that
> module.
> 
> I can see not wanting to let that camel's nose in for fear of having
> packagers always be uncertain about the status of each contrib module
> though.

I think we should just link to the tool from our docs so there is no
license complexity.  Where do we add it?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UNION ALL has higher cost than inheritance

2010-10-21 Thread David E. Wheeler
On Oct 21, 2010, at 2:17 PM, Tom Lane wrote:

> The oversight here is that we don't use appendrel planning for
> a top-level UNION ALL construct.  That didn't use to matter,
> because you always got the same stupid Append plan either way.
> Now it seems like we ought to have some more intelligence for the
> top-level SetOp case.  I smell some code refactoring coming up.

Does it smell like chicken?

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we have a database specification in .pgpass?

2010-10-21 Thread Bruce Momjian
Peter Eisentraut wrote:
> On ons, 2010-10-13 at 14:32 -0400, Bruce Momjian wrote:
> > We have a database specification in .pgpass:
> > 
> > hostname:port:database:username:password
> > 
> > What is the purpose of 'database' since username/password combinations
> > are global, not per database?  I would like to document its purpose.
> 
> As a side note, the thing at the other end of a connection is not
> necessarily a PostgreSQL server.  It could be a connection pool proxy.
> I don't know if any implementation could make use of the database
> field at the moment, but it should be kept in mind.
> 
> That said, it would probably be good to document that the database field
> is currently only useful in certain limited circumstances.

Agreed, done.
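
For illustration, one way a .pgpass entry sidesteps the field entirely is
by wildcarding it (a sketch; '*' matches any value in that position):

# hostname:port:database:username:password
localhost:5432:*:appuser:secret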

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 70d9202..a911c50 100644
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*** myEventProc(PGEventId evtId, void *evtIn
*** 6331,6336 
--- 6331,6338 
 or the default socket directory) connections coming from the local
 machine. In a standby server, a database name of replication
 matches streaming replication connections made to the master server.
+The database field is of limited usefulness because
+users have the same password for all databases in the same cluster.

  


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Kevin Grittner
Tom Lane  wrote:
 
> Most of the data-loss reports I've seen appeared to come from
> people who wouldn't be capable of doing such recovery work even if
> better tools were available.
 
No doubt; but the recovery work often winds up in the hands of
people with more skills than those responsible for the loss. 
Whoever comes in to help with recovery is looking for every bit of
traction they can get.  You'd be amazed at some of the odd straws
people can grasp to help recover data.
 
I don't know how big the special area is, but if enough room could
be carved out to include even the relation ID or filename, it might
be a big help to someone.  I'm pretty skeptical about including
information about attributes, though.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UNION ALL has higher cost than inheritance

2010-10-21 Thread Tom Lane
Greg Stark  writes:
> On Thu, Oct 21, 2010 at 6:57 AM, Tom Lane  wrote:
>>> Thanks. It also explains my another question why Merge Append cannot
>>> be used for UNION ALL plans.

>> Hmm, seems like the example you show ought to work.  I wonder if there
>> was an oversight in that patch...

> Huh, that definitely worked in the earlier versions of the patch (as
> much as it "worked" at all)

Actually, it works as long as the UNION is in a subquery:

regression=# EXPLAIN select * from (
(SELECT * FROM ONLY parent ORDER BY i) UNION ALL
(SELECT * FROM child ORDER BY i)) ss ORDER BY i LIMIT 10;
                                     QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=168.76..169.13 rows=10 width=4)
   ->  Result  (cost=168.76..294.51 rows=3400 width=4)
         ->  Merge Append  (cost=168.76..294.51 rows=3400 width=4)
               Sort Key: parent.i
               ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
                     Sort Key: parent.i
                     ->  Seq Scan on parent  (cost=0.00..34.00 rows=2400 width=4)
               ->  Index Scan using child_i_idx on child  (cost=0.00..43.25 rows=1000 width=4)
(8 rows)

The oversight here is that we don't use appendrel planning for
a top-level UNION ALL construct.  That didn't use to matter,
because you always got the same stupid Append plan either way.
Now it seems like we ought to have some more intelligence for the
top-level SetOp case.  I smell some code refactoring coming up.
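
For contrast, a sketch of the same query written at top level -- per the
above it would not go through appendrel planning, so one would expect a
plain Append plus a full sort instead of a Merge Append:

(SELECT * FROM ONLY parent ORDER BY i) UNION ALL
(SELECT * FROM child ORDER BY i)
ORDER BY i LIMIT 10;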

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] find -path isn't portable

2010-10-21 Thread Alvaro Herrera
Excerpts from Tom Lane's message of jue oct 21 17:48:18 -0300 2010:
> buildfarm member koi, having recently been rescued from git purgatory,
> is failing like this:
> 
> configure: using CFLAGS=-O -Kinline
> configure: using CPPFLAGS= -I/usr/local/include/libxml2  -I/usr/local/include
> configure: using LDFLAGS= -L/usr/local/lib  -L/usr/local/lib
> preparing build tree... UX:find: ERROR: Illegal option -- -path
> UX:find: TO FIX: Usage: find [path-list] [predicate-list]
> UX:ln: ERROR: Cannot create 
> /home/ohp/pgfarmbuild/HEAD/pgsql.10464//config/Makefile: No such file or 
> directory
> configure: error: failed
> 
> which indicates it can't cope with your recent patch to prep_buildtree.
> I was afraid that might be a problem.  Can we please stick to the
> find(1) options defined by the Single Unix Spec?
> http://www.opengroup.org/onlinepubs/007908799/xcu/find.html

Ugh.  I'll look into this.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Alvaro Herrera
Hi,

A customer was hitting some misbehavior in one of their internal tests and
I tracked it down to plancache not behaving properly with
subtransactions: in particular, a plan is not being marked "dead" when
the subtransaction on which it is planned rolls back.  It was reported
in 8.4, but I can reproduce the problem on 9.0 too with this small
script:

drop schema alvherre cascade;
drop schema test cascade;
create schema test;
create schema alvherre;
set search_path = 'alvherre';

create or replace function dummy(text) returns text language sql
as $$ SELECT relname::text FROM pg_class c WHERE c.oid = $1::regclass 
$$;

create or replace function broken(p_name_table text) returns void
language plpgsql as $$
declare
v_table_full text := alvherre.dummy(p_name_table);
begin
return;
end;
$$;

BEGIN;
 create table test.stuffs (stuff text);
 SAVEPOINT a;
 select broken('nonexistant.stuffs');

 ROLLBACK TO a;
 select broken('test.stuffs');

rollback;


The symptom is that the second call to broken() fails with this error
message:

ERROR:  relation "test.stuffs" does not exist
CONTEXT:  SQL function "dummy" statement 1
PL/pgSQL function "broken" line 3 during statement block local variable 
initialization

Note that this is totally bogus, because the relation being referenced
does indeed exist.  In fact, if you commit the transaction and call the
function again, it works.

Also, the state after the first call is a bit bogus: if you repeat the
whole sequence starting at the BEGIN line, it causes a crash on 8.4.

I hacked up plancache a bit so that it marks plans as dead when the
subtransaction resource owner releases it.  It adds a new arg to
ReleaseCachedPlan(); if true, the plan is marked dead.  All current
callers, except the one in ResourceOwnerReleaseInternal(), use false
thus preserving the current behavior.  resowner sets this as true when
aborting a (sub)transaction.

I have to admit that it seems somewhat the wrong API, but I don't see a
better way.  (I thought about relcache or syscache inval, but as far as
I can tell there isn't any here.)  I'm open to suggestions.

Patch attached.

-- 
Álvaro Herrera 


0001-Mark-a-cache-plan-as-dead-when-aborting-its-creating.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Tom Lane
"A.M."  writes:
> On Oct 21, 2010, at 4:19 PM, Robert Haas wrote:
>> I think adding this to pg_config is sensible.  Sure, the user could
>> have moved the socket directory.  But it's a place to start looking.
>> So why not?

> Because pg_config is supposed to return the current state of a cluster?

pg_config is not supposed to do any such thing.  It exists specifically
and solely to tell you about build options that were baked into the
compiled code.

Actually, the only reason this is even up for discussion is that there's
no configure option to set DEFAULT_PGSOCKET_DIR.  If there were, and
debian were using it, then pg_config --configure would tell what I wish
to know.  I thought for a bit about proposing we add such an option,
but given the current state of play it might be more misleading than
helpful: as long as distros are accustomed to changing this setting via
a patch, you couldn't trust pg_config --configure to tell you what a
given installation actually has compiled into it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] find -path isn't portable

2010-10-21 Thread Tom Lane
buildfarm member koi, having recently been rescued from git purgatory,
is failing like this:

configure: using CFLAGS=-O -Kinline
configure: using CPPFLAGS= -I/usr/local/include/libxml2  -I/usr/local/include
configure: using LDFLAGS= -L/usr/local/lib  -L/usr/local/lib
preparing build tree... UX:find: ERROR: Illegal option -- -path
UX:find: TO FIX: Usage: find [path-list] [predicate-list]
UX:ln: ERROR: Cannot create 
/home/ohp/pgfarmbuild/HEAD/pgsql.10464//config/Makefile: No such file or 
directory
configure: error: failed

which indicates it can't cope with your recent patch to prep_buildtree.
I was afraid that might be a problem.  Can we please stick to the
find(1) options defined by the Single Unix Spec?
http://www.opengroup.org/onlinepubs/007908799/xcu/find.html

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Josh Berkus

> I think that that would probably involve a whole lot more notational
> busywork than just replacing typmod with something more complicated.
> However, we're talking about breaking vast amounts of code in either
> case, so maybe making it even vaster isn't a real consideration.

Gods, yes.  Please let's not extend typmod any further without an overhaul.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Tom Lane
Robert Haas  writes:
> On Thu, Oct 21, 2010 at 4:28 PM, Tom Lane  wrote:
>> TypeName per se is completely inappropriate for use beyond the first
>> stage of parsing, because it requires catalog lookups to make any sense
>> of.  I think the post-parsing representation should still start with a
>> type OID.  I can agree with replacing typmod with a struct, though.

> I think we should have both the type OID and the typmod in the struct.
>  Carrying the type OID separately from the typmod has caused us enough
> heartache already.  No?

I think that that would probably involve a whole lot more notational
busywork than just replacing typmod with something more complicated.
However, we're talking about breaking vast amounts of code in either
case, so maybe making it even vaster isn't a real consideration.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread A.M.

On Oct 21, 2010, at 4:19 PM, Robert Haas wrote:

> 2010/10/21 Tom Lane :
>> Alvaro Herrera  writes:
>>> Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010:
 I agree this is interesting information to get, but wonder how
 pg_config can know that and it looks to me that this information as
 nothing to do in pg_config
 
 pg_config is all about installation, socket_dir is a postgresql.conf 
 setting.
>> 
>>> Yeah -- how is pg_config to know?  All it can tell you is what was the
>>> compiled-in default.
>> 
>> That's what I wanted, actually.  If you've set a non-default value in
>> postgresql.conf, SHOW will tell you about that, but it fails to expose
>> the default value.
>> 
>>> Maybe you should go the SHOW route.  The user could connect via TCP and
>>> find out the socket directory that way.
>> 
>> Yeah, the SHOW case is not useless by any means.
> 
> I think adding this to pg_config is sensible.  Sure, the user could
> have moved the socket directory.  But it's a place to start looking.
> So why not?


Because pg_config is supposed to return the current state of a cluster?
Because it might indicate a connection to the wrong server?

Cheers,
M
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 4:28 PM, Tom Lane  wrote:
> Peter Eisentraut  writes:
>> We already have TypeName as a structure that contains type and typmod
>> (and collation, in my patch).  We could make that a primnode instead of
>> a parsenode, and use it in more places, or we could make a new leaner
>> structure that only contains the numeric info.
>
> TypeName per se is completely inappropriate for use beyond the first
> stage of parsing, because it requires catalog lookups to make any sense
> of.  I think the post-parsing representation should still start with a
> type OID.  I can agree with replacing typmod with a struct, though.

I think we should have both the type OID and the typmod in the struct.
 Carrying the type OID separately from the typmod has caused us enough
heartache already.  No?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Tom Lane
Peter Eisentraut  writes:
> We already have TypeName as a structure that contains type and typmod
> (and collation, in my patch).  We could make that a primnode instead of
> a parsenode, and use it in more places, or we could make a new leaner
> structure that only contains the numeric info.

TypeName per se is completely inappropriate for use beyond the first
stage of parsing, because it requires catalog lookups to make any sense
of.  I think the post-parsing representation should still start with a
type OID.  I can agree with replacing typmod with a struct, though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 4:21 PM, Josh Berkus  wrote:
> On 10/20/10 6:54 PM, Robert Haas wrote:
>> I find it impossible to believe that's
>> a good decision, and IMHO we should be focusing on how to make the
>> parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us
>> most of the same benefits without throwing away hard-won performance.
>
> I'd be happy to accept that.  Is it possible, though?

I sketched an outline of the problem AIUI here:

http://archives.postgresql.org/pgsql-hackers/2010-10/msg01348.php

I think it's possible; I'm not quite sure how hard it is.
Unfortunately, I've not had as much PG-hacking time lately as I'd
like...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Tom Lane
"Stephen R. van den Berg"  writes:
> Tom Lane wrote:
>> There are way too many scenarios where you'll have no hope of doing
>> any such manual recovery anyway.

> True.  It's all a matter of statistics.  Judging by the number of reports
> I find by googling net-history, I'd have to conclude that the proposed
> extra information would have helped more than half of them.

Uh, no, it would have helped whatever minuscule fraction of them had the
tools and the expertise to make use of the information.  This should not
be confused with a magic fix-it button.  If you've lost your system
catalogs you're facing a whole lot of work that will at best recover an
untrustworthy version of some of your data.  Most of the data-loss
reports I've seen appeared to come from people who wouldn't be capable
of doing such recovery work even if better tools were available.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-21 Thread David Fetter
On Thu, Oct 21, 2010 at 12:10:22PM -0700, David Gould wrote:
> On Thu, Oct 21, 2010 at 08:16:27AM -0700, Mark Wong wrote:
> > On Tue, Oct 19, 2010 at 8:30 PM, daveg  wrote:
> > > On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
> > >> On 20/10/10 16:05, Mark Kirkwood wrote:
> > >> >
> > >> >
> > >> >shmget and friends are hugetlbpage  aware, so it seems it should 'just
> > >> >work'.
> > >> >
> > >>
> > >> Heh - provided you specify
> > >>
> > >> SHM_HUGETLB
> > >>
> > >>
> > >> in the relevant call that is :-)
> > >
> > > I had a patch for this against 8.3 that I could update if there is any
> > > interest. I suspect it is helpful.
> > 
> > Oh, probably better than me digging up my broken one.  Send it out as
> > is if you don't want to update it. :)
> 
> I'll update it and see if I can get a largish machine to test, at least with
> pgbench on. But not today alas.

If you'd be so kind as to update it, others can probably find the
aforementioned largish machine to test it on :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-21 Thread Josh Berkus
On 10/20/10 6:54 PM, Robert Haas wrote:
> I find it impossible to believe that's
> a good decision, and IMHO we should be focusing on how to make the
> parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us
> most of the same benefits without throwing away hard-won performance.

I'd be happy to accept that.  Is it possible, though?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Robert Haas
2010/10/21 Tom Lane :
> Alvaro Herrera  writes:
>> Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010:
>>> I agree this is interesting information to get, but I wonder how
>>> pg_config can know that; it looks to me that this information has
>>> no place in pg_config.
>>>
>>> pg_config is all about the installation; socket_dir is a
>>> postgresql.conf setting.
>
>> Yeah -- how is pg_config to know?  All it can tell you is what was the
>> compiled-in default.
>
> That's what I wanted, actually.  If you've set a non-default value in
> postgresql.conf, SHOW will tell you about that, but it fails to expose
> the default value.
>
>> Maybe you should go the SHOW route.  The user could connect via TCP and
>> find out the socket directory that way.
>
> Yeah, the SHOW case is not useless by any means.

I think adding this to pg_config is sensible.  Sure, the user could
have moved the socket directory.  But it's a place to start looking.
So why not?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010:
>> I agree this is interesting information to get, but I wonder how
>> pg_config can know that; it looks to me that this information has
>> no place in pg_config.
>> 
>> pg_config is all about the installation; socket_dir is a postgresql.conf setting.

> Yeah -- how is pg_config to know?  All it can tell you is what was the
> compiled-in default.

That's what I wanted, actually.  If you've set a non-default value in
postgresql.conf, SHOW will tell you about that, but it fails to expose
the default value.

> Maybe you should go the SHOW route.  The user could connect via TCP and
> find out the socket directory that way.

Yeah, the SHOW case is not useless by any means.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Greg Stark
On Thu, Oct 21, 2010 at 11:30 AM, Stephen R. van den Berg  wrote:
> For the recovery information I'd like to reserve:
> identifier: 00: table OID
>            01: table layout
>

So here's a proposal for something that could maybe be implemented. I
think I'm leaning against this currently as there are just too many
caveats even for this limited functionality. I'm more inclined to go
with the idea someone else proposed: dump an audit log of all DDL on
the table, or after any DDL dump the create table statements as
pg_dump would generate them, to a separate fork.

But here goes just to give you an idea what I think is doable and how
limited it would be:

In this meta data object put:

table oid
table name
number of columns
array of typlen for those columns (-1 for varlena and size for fixed length)

That would have a maximum size of just over 6k, which is too large for
a BLCKSZ 4k database but fits on default 8k databases. It would have
enough information to be able to find all the table columns, but not
how to interpret their contents: neither their names nor their types.
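
For a rough worst case -- assuming a 4-byte OID, a 64-byte name
(NAMEDATALEN), a 2-byte attribute count, and 1600 attributes with 4-byte
entries each:

SELECT 4 + 64 + 2 + 1600 * 4 AS worst_case_bytes;  -- 6470, i.e. "just over 6k"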

Including the type brings in a new set of complications. Even if you
assume the built-in typoids never change, there are going to be typoids
that correspond to user-defined types. Without the names of those
types the oids are pretty useless.

Just the typoids would put you over 8k in the worst case and the names
would put you into the realm of needing arbitrary numbers of blocks
for even average cases. Similarly, including the column names would
require potentially many blocks.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Dimitri Fontaine
Tom Lane  writes:
> One possible response would be to add an item to what pg_config knows
> about, eg "pg_config --socketdir".  This doesn't answer every possible
> use-case either, but it would be helpful for some scenarios.
>
> Thoughts?

Following some links one can find out
  http://packages.debian.org/source/unstable/postgresql-9.0
  
http://ftp.de.debian.org/debian/pool/main/p/postgresql-9.0/postgresql-9.0_9.0.1-1.debian.tar.gz

And check that debian package is patching src/include/pg_config_manual.h
and not using the other facility proposed in the comments:

 * here's where to twiddle it.  You can also override this at runtime
 * with the postmaster's -k switch.

-#define DEFAULT_PGSOCKET_DIR  "/tmp"
+#define DEFAULT_PGSOCKET_DIR  "/var/run/postgresql"

But still, I wonder how this -k switch fits in here; pg_control
certainly won't know about it. I suppose it's worse to give a wrong
value than none, but that's easy to fix by having a good label
for the line.

All in all it would be good to have that in pg_control.
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Alvaro Herrera
Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010:

> I agree this is interesting information to get, but I wonder how
> pg_config can know that; it looks to me that this information has
> no place in pg_config.
> 
> pg_config is all about the installation; socket_dir is a postgresql.conf setting.

Yeah -- how is pg_config to know?  All it can tell you is what was the
compiled-in default.

pg_ctl would be nice, but we'd have to make it parse the config file
(there has been talk about that).  In any case, if you don't know where
the socket is, presumably you don't know where the config file is,
either.  You've just moved the problem.

Maybe you should go the SHOW route.  The user could connect via TCP and
find out the socket directory that way.
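
For example, assuming SHOW were fixed to report the effective value
instead of an empty string, it might look like this (hypothetical output):

$ psql -h 127.0.0.1 -c "SHOW unix_socket_directory"
 unix_socket_directory 
-----------------------
 /var/run/postgresql
(1 row)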

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-21 Thread daveg
On Thu, Oct 21, 2010 at 08:16:27AM -0700, Mark Wong wrote:
> On Tue, Oct 19, 2010 at 8:30 PM, daveg  wrote:
> > On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
> >> On 20/10/10 16:05, Mark Kirkwood wrote:
> >> >
> >> >
> >> >shmget and friends are hugetlbpage  aware, so it seems it should 'just
> >> >work'.
> >> >
> >>
> >> Heh - provided you specify
> >>
> >> SHM_HUGETLB
> >>
> >>
> >> in the relevant call that is :-)
> >
> > I had a patch for this against 8.3 that I could update if there is any
> > interest. I suspect it is helpful.
> 
> Oh, probably better than me digging up my broken one.  Send it out as
> is if you don't want to update it. :)

I'll update it and see if I can get a largish machine to test, at least with
pgbench on. But not today alas.

-dg
 
-- 
David Gould   da...@sonic.net  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 2:44 PM, Peter Eisentraut  wrote:
> On tor, 2010-10-14 at 22:54 -0400, Robert Haas wrote:
>> and maybe not that bad, but I wonder if there is some preparatory
>> refactoring that could be done to trim it down a bit.  I notice, for
>> example, that a lot of places that looked at <opfamily, strategy, nulls
>> first/last> now look at <opfamily, collation, strategy, nulls first/last>.  In
>> particular, all the pathkey stuff is like this.  And similarly places
>> that used to care about <type, typmod> now have to care about <type,
>> typmod, collation>.  There might be ways to restructure some of this
>> code so that these things can be changed without having to touch quite
>> so many places.
>
> Yeah, I think that's what I'll try to do next.
>
> We already have TypeName as a structure that contains type and typmod
> (and collation, in my patch).  We could make that a primnode instead of
> a parsenode, and use it in more places, or we could make a new leaner
> structure that only contains the numeric info.
>
> We could then, for example, change things like this:
>
> typedef struct Var
> {
>    Expr        xpr;
>    ...
>    Oid         vartype;
>    int32       vartypmod;
>    ...
> }
>
> into this
>
> typedef struct Var
> {
>    Expr        xpr;
>    ...
>    TypeName/TypeFoo vartype;
>    ...
> }
>
> This would save boatloads of duplicate code.

I think that the idea of having a node that represents "a type in all
its glory" is a very good one.  I'm somewhat inclined not to reuse
TypeName, because I think we'll end up wanting to use this in places
where "names" and "location" are not available.  In fact, judging by
some of the logic in LookupTypeNames(), we have some cases like that
already, which might be worth trying to clean up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Cédric Villemain
2010/10/21 Tom Lane :
> I just noticed that there doesn't seem to be any good way of finding
> out what a postmaster's default value of unix_socket_directory is.
> If you try to SHOW it you just get an empty string.  We could probably
> fix things so that SHOW exposes the actual setting, but (1) there might
> be security arguments against that, and (2) if your problem is that you
> would like to find out the value so's you can connect to said
> postmaster, SHOW isn't going to help you.
>
> One possible response would be to add an item to what pg_config knows
> about, eg "pg_config --socketdir".  This doesn't answer every possible
> use-case either, but it would be helpful for some scenarios.
>
> Thoughts?

I agree this is interesting information to get, but I wonder how
pg_config can know that; it looks to me that this information has
no place in pg_config.

pg_config is all about the installation; socket_dir is a postgresql.conf setting.

I'd prefer a 'pg_ctl status' output.

>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in plpython's Python Generators

2010-10-21 Thread Alvaro Herrera
Excerpts from Alvaro Herrera's message of jue oct 21 15:32:53 -0300 2010:
> Excerpts from Jean-Baptiste Quenot's message of jue oct 21 09:20:16 -0300 
> 2010:
> 
> > I get this error when calling the function:
> > 
> > test=# select foobar();
> > ERROR:  error fetching next item from iterator
> 
> I can reproduce this here.  The first bug to solve is, I think, getting
> a more meaningful error report.

Something like this.  Somebody that really knows their way around Python
has to clean this up.

alvherre=# select * from foobar();
ERROR:  error fetching next item from iterator
CONTEXT:  SPI_execute failed: SPI_ERROR_UNCONNECTED
PL/Python function "foobar"

I think all error cases in plpython need some improvement so that they
show the error message from Python.  Right now they are ignored.

... and presumably somebody can fix the real bug that Jean-Baptiste hit,
too.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


plpy.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Peter Eisentraut
On tor, 2010-10-14 at 22:54 -0400, Robert Haas wrote:
> and maybe not that bad, but I wonder if there is some preparatory
> refactoring that could be done to trim it down a bit.  I notice, for
> example, that a lot of places that looked at <opfamily, strategy, nulls
> first/last> now look at <opfamily, collation, strategy, nulls first/last>.  In
> particular, all the pathkey stuff is like this.  And similarly places
> that used to care about <type, typmod> now have to care about <type,
> typmod, collation>.  There might be ways to restructure some of this
> code so that these things can be changed without having to touch quite
> so many places.

Yeah, I think that's what I'll try to do next.

We already have TypeName as a structure that contains type and typmod
(and collation, in my patch).  We could make that a primnode instead of
a parsenode, and use it in more places, or we could make a new leaner
structure that only contains the numeric info.

We could then, for example, change things like this:

typedef struct Var
{
    Expr        xpr;
    ...
    Oid         vartype;
    int32       vartypmod;
    ...
}

into this

typedef struct Var
{
    Expr        xpr;
    ...
    TypeName/TypeFoo vartype;
    ...
}

This would save boatloads of duplicate code.

> It looks like you've define collations as objects that exist within
> particular namespaces, but there's no CREATE COLLATION statement, so I
> don't see what purpose this serves.  I suppose we could leave that to
> be added later, but is there actually a use case for having collations
> in individual schemas, or should we treat them more like we do casts -
> i.e. as database-global objects?

The SQL standard defines it that way, and there should be a CREATE
COLLATION statement later.  Application-specific collation sequences
might not be unreasonable in the future.
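
Something along these lines, eventually (hypothetical syntax, not part of
this patch):

CREATE COLLATION german (LOCALE = 'de_DE.utf8');
CREATE TABLE names (n text COLLATE german);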

> Why does the executor ever need to see collate clauses?

Hmm, maybe not.  I think it did in an earlier working draft.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-21 Thread Kevin Grittner
Jeff Davis  wrote:
 
> in this case we do clearly have a problem, because the result is
> not equal to the serial execution of the transactions in either
> order.
 
Yeah, you're right.  I misread that example -- newbie with the
PERIOD type.
 
> So the question is: at what point is the logic wrong? It's either:
>   1. PreCommit_CheckForSerializationFailure() is missing a failure
> case.
>   2. The state prior to entering that function (which I believe I
> sufficiently described) is wrong.
 
It's (2), for the reasons I described in my previous email.  Even
though I misread the specifics of your example, I was close enough to
see accurately where the problem was.  :-/
 
> If it's (2), then what should the state look like, and how is the
> GiST code supposed to result in that state?
 
The second insert should create conflicts similar to what the first
did, but in the other direction -- simple write skew.  How GiST is
supposed to catch this is the big question.  My logic that a
conflicting insert will modify a page read by the other transaction
only holds until someone inserts a conflicting entry.  That's why it
wasn't reliably failing until you had an example where both
transactions were accessing the same leaf page.
 
In your example, session 1's insert creates the leaf entry and
propagates entries up to the root.  When session 2 inserts, it
can just modify the leaf, so the conflict is missed.  As I said, the
most obvious way to fix this is to look for conflicts while
descending to the leaf for an insert.  I'm almost sure we can do
better than that, but I haven't finished thinking it through.  A
rough idea might be that when we find a conflict on an insert, we
acquire additional predicate locks on everything between the lowest
point of conflict and the leaf; the rest of the logic would remain
as-is.  I haven't finished mulling that over, but it seems likely to
work.  If we did that, session 2 would detect the conflict on the
insert to the leaf, and all would be well.
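
For readers following along, the generic write-skew shape (a minimal
sketch, not the PERIOD example from this thread) looks like this:

CREATE TABLE t (id int PRIMARY KEY, v int);
INSERT INTO t VALUES (1, 10), (2, 10);

-- Session 1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT v FROM t WHERE id = 2;
-- Session 2:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT v FROM t WHERE id = 1;
UPDATE t SET v = 0 WHERE id = 2;
COMMIT;
-- Session 1 again:
UPDATE t SET v = 0 WHERE id = 1;
COMMIT;  -- each read what the other wrote, so no serial order matches;
         -- a correct SSI implementation must roll one transaction back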
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ask for review of MERGE

2010-10-21 Thread Greg Smith

Robert Haas wrote:

I think the right way to write UPSERT is something
along the lines of:

MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
s.item_id = t.item_id ...
  


That led in the right direction; after a bit more fiddling I was finally 
able to get something that does what I wanted: a single-table UPSERT 
implemented with this MERGE implementation.  Here's a log of a test 
session, suitable for eventual inclusion in the regression tests:


CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock ORDER BY item_id;

 item_id | balance 
---------+---------
      10 |    2200
      20 |    1900

MERGE INTO Stock t
USING (VALUES(10,100)) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;

MERGE 1

SELECT * FROM Stock ORDER BY item_id;
 item_id | balance 
---------+---------
      10 |    2300
      20 |    1900

MERGE INTO Stock t
USING (VALUES(30,2000)) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;

MERGE 1
SELECT * FROM Stock ORDER BY item_id;
 item_id | balance 
---------+---------
      10 |    2300
      20 |    1900
      30 |    2000

I'm still a little uncertain as to whether any of my other examples 
should have worked under the spec but just didn't work here, but I'll 
worry about that later.


Here's what the query plan looks like on a MATCH:

Merge  (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166 rows=0 loops=1)
  Action 1: Update When Matched
  Action 2: Insert When Not Matched
  MainPlan:
  ->  Nested Loop Left Join  (cost=0.00..8.29 rows=1 width=22) (actual time=0.050..0.061 rows=1 loops=1)
        ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=1)
        ->  Index Scan using stock_item_id_key on stock t  (cost=0.00..8.27 rows=1 width=14) (actual time=0.026..0.030 rows=1 loops=1)
              Index Cond: ("*VALUES*".column1 = item_id)
Total runtime: 0.370 ms


And here's a miss:

Merge  (cost=0.00..8.29 rows=1 width=22) (actual time=0.145..0.145 rows=0 loops=1)
  Action 1: Update When Matched
  Action 2: Insert When Not Matched
  MainPlan:
  ->  Nested Loop Left Join  (cost=0.00..8.29 rows=1 width=22) (actual time=0.028..0.033 rows=1 loops=1)
        ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=1)
        ->  Index Scan using stock_item_id_key on stock t  (cost=0.00..8.27 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
              Index Cond: ("*VALUES*".column1 = item_id)
Total runtime: 0.255 ms

Next steps here:
1) Performance/concurrency tests against the trigger-based UPSERT approach 
(see the sketch below).
2) Finish bit rot cleanup against HEAD.
3) Work out more complicated test cases to try and find more unexpected 
behavior edge cases and general bugs.
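
For reference, the approach being compared against in point 1 is
presumably the usual loop-until-success function (a sketch along the
lines of the merge_db example in the docs, not Greg's actual test
harness):

CREATE FUNCTION upsert_stock(k int, delta int) RETURNS void AS $$
BEGIN
    LOOP
        -- first try to update an existing row
        UPDATE Stock SET balance = balance + delta WHERE item_id = k;
        IF found THEN
            RETURN;
        END IF;
        -- not there: try to insert, and retry on a concurrent insert
        BEGIN
            INSERT INTO Stock VALUES (k, delta);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- loop back and try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;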


--
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Stephen R. van den Berg
Tom Lane wrote:
>"Stephen R. van den Berg"  writes:
>> If it's inserted in the "special" area, it will not break any
>> compatibility.

>I'll tell you what I really don't like about this proposal: we discuss
>some scheme or other for taking over the "special space" in heap pages
>at least once a year.  None of them have been particularly compelling
>so far, but one may come along that is; especially given that we're now
>trying to maintain on-disk compatibility across versions.  So I think
>the opportunity cost of assigning a use to that space is mighty high.
>I don't find this idea important enough to justify foreclosing future
>uses for the special space.

Well, I had (of course) thought of that, and the classical solution to
this is to specify a certain attribute-based format in order not to
cannibalise the space and block it from other future uses.

I.e. in the special area, we could start using something like:
2-byte field length (including the length field), 1-byte identifier,
field content.

For the recovery information I'd like to reserve:
identifier: 00: table OID
            01: table layout

>The real bottom line is this: if you care enough about your data to
>be willing to expend a large amount of effort on manual recovery
>attempts, why didn't you have a decent backup scheme in place?

Two obvious valid answers would be: Stupidity and/or ignorance,
sometimes a strain of bad luck.
I know it is a sad state of affairs, but not all users of PostgreSQL
are equally knowledgeable/intelligent/responsible.

>There are way too many scenarios where you'll have no hope of doing
>any such manual recovery anyway.

True.  It's all a matter of statistics.  Judging by the number of reports
I find by googling net-history, I'd have to conclude that the proposed
extra information would have helped more than half of them.
-- 
Stephen.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in plpython's Python Generators

2010-10-21 Thread Alvaro Herrera
Excerpts from Jean-Baptiste Quenot's message of jue oct 21 09:20:16 -0300 2010:

> I get this error when calling the function:
> 
> test=# select foobar();
> ERROR:  error fetching next item from iterator

I can reproduce this here.  The first bug to solve is, I think, getting
a more meaningful error report.
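
The function itself wasn't quoted here, but a hypothetical reconstruction
that exercises SPI from inside a generator -- the pattern at issue --
would be something like:

CREATE FUNCTION foobar() RETURNS SETOF text AS $$
def producer():
    # hypothetical: the SPI call runs only once the iterator is advanced
    yield plpy.execute("SELECT 'hello'::text AS t")[0]["t"]
return producer()
$$ LANGUAGE plpythonu;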

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UNION ALL has higher cost than inheritance

2010-10-21 Thread Greg Stark
On Thu, Oct 21, 2010 at 6:57 AM, Tom Lane  wrote:
>> Thanks. It also explains my another question why Merge Append cannot
>> be used for UNION ALL plans.
>
> Hmm, seems like the example you show ought to work.  I wonder if there
> was an oversight in that patch...
>

Huh, that definitely worked in the earlier versions of the patch (as
much as it "worked" at all)

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Tom Lane
I just noticed that there doesn't seem to be any good way of finding
out what a postmaster's default value of unix_socket_directory is.
If you try to SHOW it you just get an empty string.  We could probably
fix things so that SHOW exposes the actual setting, but (1) there might
be security arguments against that, and (2) if your problem is that you
would like to find out the value so's you can connect to said
postmaster, SHOW isn't going to help you.

One possible response would be to add an item to what pg_config knows
about, eg "pg_config --socketdir".  This doesn't answer every possible
use-case either, but it would be helpful for some scenarios.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-21 Thread Bruce Momjian
Kevin Grittner wrote:
> Bruce Momjian  wrote:
>  
> > why would we do this client-side rather than server-side?
>  
> Because the timeout is supposed to be a limit on the time allowed
> for specific Java methods to complete, which might be running a
> large number of SQL statements within one invocation, and which may
> include significant network latency.  It's a lot of work to get
> "pretty close" on the server side, and you can never really
> implement exactly what the JDBC API is requesting.
>  
> What if you have an app which can draw data from any of a number of
> remote databases, and you want to use this limit so if one becomes
> unavailable for some reason you can re-run the request on another
> within a reasonable time?  The network connection goes down after
> you submit your request, you've got a period of minutes or hours
> until TCP gives up, and the user expects a response within a few
> seconds...
>  
> If you implement something with server-side semantics, there's
> nothing to prevent an application which is PostgreSQL-aware from
> accessing it through JDBC, of course.  statement_timeout and other
> GUCs can be set locally to your heart's content.

OK, thanks.  Just had to ask.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-21 Thread Kevin Grittner
Bruce Momjian  wrote:
 
> why would we do this client-side rather than server-side?
 
Because the timeout is supposed to be a limit on the time allowed
for specific Java methods to complete, which might be running a
large number of SQL statements within one invocation, and which may
include significant network latency.  It's a lot of work to get
"pretty close" on the server side, and you can never really
implement exactly what the JDBC API is requesting.
 
What if you have an app which can draw data from any of a number of
remote databases, and you want to use this limit so if one becomes
unavailable for some reason you can re-run the request on another
within a reasonable time?  The network connection goes down after
you submit your request, you've got a period of minutes or hours
until TCP gives up, and the user expects a response within a few
seconds...
 
If you implement something with server-side semantics, there's
nothing to prevent an application which is PostgreSQL-aware from
accessing it through JDBC, of course.  statement_timeout and other
GUCs can be set locally to your heart's content.
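
For instance, a PostgreSQL-aware client can always request a
server-enforced cap itself:

SET statement_timeout = '30s';        -- for the session
SET LOCAL statement_timeout = '30s';  -- for the current transaction only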
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-21 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian  writes:
> >> Tom Lane  wrote:
> >>> I'm all for doing this client-side.
> 
> > Uh, why would we do this client-side rather than server-side?  If we do
> > it server-side, all interfaces get it.
> 
> 1. The API that's being implemented is JDBC-specific.
> 
> 2. Even if you want to argue that it would be generally useful to have
> such a feature, it would certainly require additional client-side
> programming to make each client interface make use of it.

Wouldn't it be simpler to code this in the server and have the client
use that facility, rather than have each client implement it?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-21 Thread Tom Lane
Bruce Momjian  writes:
>> Tom Lane  wrote:
>>> I'm all for doing this client-side.

> Uh, why would we do this client-side rather than server-side?  If we do
> it server-side, all interfaces get it.

1. The API that's being implemented is JDBC-specific.

2. Even if you want to argue that it would be generally useful to have
such a feature, it would certainly require additional client-side
programming to make each client interface make use of it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-21 Thread Kevin Grittner
Tom Lane  wrote:
 
> you are confusing an array over a domain type with a domain over
> an array type.
 
Yes I was.  Sorry.  Argument withdrawn.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-21 Thread Bruce Momjian
Kevin Grittner wrote:
> Tom Lane  wrote:
>  
> >  I'm all for doing this client-side.
>  
> Well, that makes a big difference.  Unless someone can make a
> convincing argument for why we should modify the server side to
> support this, I think we should just focus on this one client-side
> patch.
>  
> I'd be happy to give it a closer look, but I may not be able to do so
> for a few weeks, and won't complain if someone beats me to it.

Uh, why would we do this client-side rather than server-side?  If we do
it server-side, all interfaces get it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-21 Thread Tom Lane
"Kevin Grittner"  writes:
> Tom Lane  wrote:
>> So we will downcast myia to int[], or at least one might assume
>> that's what's happening.  But actually it's worse than that: the
>> result of this operation is thought to be myia not int[], because
>> myia itself is taken as matching ANYARRAY, and the operator result
>> is the same ANYARRAY type.
 
> That is actually what I would want and expect.  Let's say I have an
> array of attorney bar numbers, and I add one more as a literal. 
> While an argument could be made that the integer should be cast to
> a bar number before being added to the array, we don't require that
> for an assignment to a simple variable in the domain, so it would be
> surprising to require a cast here, and even more surprising for the
> concatenation to result in an array of primitive integers rather
> than an array of attorney bar numbers.

I disagree with that argument: you are confusing an array over a domain
type with a domain over an array type.  In the latter case, the domain
could have additional constraints (such as the length constraint in my
other example), and there's no reason to assume that || or other array
operators would preserve those constraints.

A perhaps comparable example is

create domain verysmallint as int check (value < 10);

select 9::verysmallint + 1;

The result of the addition is int, not verysmallint, which is why you
don't get an error.
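
Whereas casting the result back into the domain re-applies the check, so
(a sketch, with the error text approximate):

select (9::verysmallint + 1)::verysmallint;
ERROR:  value for domain verysmallint violates check constraint "verysmallint_check"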

From an abstract-data-type point of view, the fact that any of these
operations are even allowed without an explicit downcast is a bit
uncool: it exposes the implementation of the domain type, which one
could argue shouldn't be allowed, at least not without some notational
marker showing you know what you're doing.  But the SQL committee
seems to have decided to ignore that tradition and allow auto-downcasts.
Nonetheless, when a domain type is fed to an operator that works on its
base type, it has to be clearly understood that there *is* an implied
downcast, and whatever special properties the domain may have had will
be lost.  As far as the operator and its result are concerned, the
domain is just its base type.

I'm not against fixing these cases so that auto downcasts happen, I'm
just saying that it's outside the scope of what I'm going to do in
response to the current bug.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-21 Thread Kevin Grittner
Tom Lane  wrote:
 
> regression=# select array[1,2] || 3::myi;
> ERROR:  operator does not exist: integer[] || myi
> 
> In this case, one might expect myi to be automatically downcast to
> int so that it could be matched up with the int array, but that's
> not happening.
 
I guess it should allow that, although for my uses of domains it's
hard to see a reasonable use case for it, so that part doesn't
bother me too much.
 
> regression=# create domain myia as int[];
> CREATE DOMAIN
> regression=# select array[1,2]::myia || 3;
>  ?column? 
> ----------
>  {1,2,3}
> (1 row)
> 
> So we will downcast myia to int[], or at least one might assume
> that's what's happening.  But actually it's worse than that: the
> result of this operation is thought to be myia not int[], because
> myia itself is taken as matching ANYARRAY, and the operator result
> is the same ANYARRAY type.
 
That is actually what I would want and expect.  Let's say I have an
array of attorney bar numbers, and I add one more as a literal. 
While an argument could be made that the integer should be cast to
a bar number before being added to the array, we don't require that
for an assignment to a simple variable in the domain, so it would be
surprising to require a cast here, and even more surprising for the
concatenation to result in an array of primitive integers rather
than an array of attorney bar numbers.
 
> regression=# create domain myia2 as int[]
> check(array_length(value,1) = 2);
> CREATE DOMAIN
 
> regression=# select array[1,2]::myia2 || 3;
>  ?column? 
> ----------
>  {1,2,3}
> (1 row)
 
> So we have a value that's claimed to belong to the domain, but it
> doesn't meet the domain's constraints.
 
Yeah, that's obviously wrong.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-21 Thread Jeff Davis
On Thu, 2010-10-21 at 10:29 -0500, Kevin Grittner wrote:
> Basically, when we already have a pivot, but no transaction has yet
> committed, we wait to see if TN commits first.  If so, we have a
> problem; if not, we don't.  There's probably some room for improving
> performance by cancelling T0 or T1 instead of TN, at least some of
> the time; but in this pass we are always cancelling the transaction
> in whose process we detect the need to cancel something.

Well, in this case we do clearly have a problem, because the result is
not equal to the serial execution of the transactions in either order.
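
For concreteness, the shape is the classic write-skew; a hedged sketch
with a hypothetical table t(k int), statements interleaved in time:

-- session A:
begin isolation level serializable;
select count(*) from t where k = 2;    -- sees nothing from B
-- session B:
begin isolation level serializable;
select count(*) from t where k = 1;    -- sees nothing from A
-- session A:
insert into t values (1);
commit;
-- session B:
insert into t values (2);
commit;    -- if this also succeeds, the outcome matches neither
           -- A-then-B nor B-then-A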

So the question is: at what point is the logic wrong? It's either:
  1. PreCommit_CheckForSerializationFailure() is missing a failure case.
  2. The state prior to entering that function (which I believe I
sufficiently described) is wrong.

If it's (2), then what should the state look like, and how is the GiST
code supposed to result in that state?

I know some of these questions are answered in the relevant research,
but I'd like to discuss this concrete example specifically.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-21 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 20, 2010 at 10:03 AM, Tom Lane  wrote:
>> My point is that anyplace that is relying on the surface typelem,
>> without drilling down to see what the base type is, is wrong.
>> So yeah, those lookups are (will be) necessary.

> OK.  In that case, +1 from me.

I've come across another interesting definitional issue, which is what
properties should domains have with respect to matching to polymorphic
arguments.  Currently, the polymorphic matching functions take domains
at face value (ie, without noticing their relationships to their base
types), with one exception: because they use get_element_type() to
decide if a type matches ANYARRAY, domains over arrays will be
considered to match ANYARRAY.  This leads to some weird inconsistencies
and at least one genuine bug.  Observe (this is with 9.0.x HEAD):

regression=# create domain myi as int;
CREATE DOMAIN
regression=# select array[1,2] || 3;
 ?column? 
--
 {1,2,3}
(1 row)

regression=# select array[1,2] || 3::myi;
ERROR:  operator does not exist: integer[] || myi

In this case, one might expect myi to be automatically downcast to int
so that it could be matched up with the int array, but that's not
happening.  However:

regression=# create domain myia as int[];
CREATE DOMAIN
regression=# select array[1,2]::myia || 3;
 ?column? 
--
 {1,2,3}
(1 row)

So we will downcast myia to int[], or at least one might assume that's
what's happening.  But actually it's worse than that: the result of this
operation is thought to be myia not int[], because myia itself is taken
as matching ANYARRAY, and the operator result is the same ANYARRAY type.
Thus, this case goes off the rails completely:

regression=# create domain myia2 as int[] check(array_length(value,1) = 2);
CREATE DOMAIN
regression=# select array[1,2]::myia2;
 array 
---
 {1,2}
(1 row)

regression=# select array[1,2,3]::myia2;
ERROR:  value for domain myia2 violates check constraint "myia2_check"
regression=# select array[1,2]::myia2 || 3;
 ?column? 
--
 {1,2,3}
(1 row)

The result of the || is considered to be myia2, as can be seen for
example this way:

regression=# create view vvv as select array[1,2]::myia2 || 3 as x;
CREATE VIEW
regression=# \d vvv
 View "public.vvv"
 Column | Type  | Modifiers 
+---+---
 x  | myia2 | 

So we have a value that's claimed to belong to the domain, but it
doesn't meet the domain's constraints.

What I am intending to do about this in the short run is to leave the
anyarray-ness tests in the polymorphic-compatibility-checking functions
as-is.  That will mean (with the change in typelem for domains) that a
domain over array doesn't match ANYARRAY unless you explicitly downcast
it.  I argue that this is consistent with the current behavior of not
auto-downcasting domains to match the element type of an array.  We
could go back and change it later, but if we do, we should try to make
both cases provide auto-downcast-when-needed behavior.  I have not dug
into just what code changes would be needed for that.  Auto-downcast
wouldn't be exactly compatible with the current behavior anyway, since
it would result in a different claimed type for the operator result.
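
To make the short-run behavior concrete, an untested sketch using the
myia2 domain from above:

select array[1,2]::myia2 || 3;            -- would now fail to match ANYARRAY
select (array[1,2]::myia2)::int[] || 3;   -- explicit downcast works, and the
                                          -- result is honestly int[], not myia2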

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 11:12 AM, Dimitri Fontaine
 wrote:
> "David E. Wheeler"  writes:
>> Sure. The reason to do it, though, is so that extension authors can create
>> just one metadata file, instead of two (or three, if one must also put such
>> data into the Makefile).
>
> That's a good idea, but my guess is that the implementation cost of
> supporting the control format in your perl infrastructure is at least an
> order of magnitude lower than the cost for me to support your current
> JSON file format, so I lean towards you having an automated way to fill
> in the json file from the control one...

Ah, truth to power.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Alvaro Herrera
Excerpts from Dimitri Fontaine's message of Thu Oct 21 12:53:18 -0300 2010:

> This part of the problem hasn't received much thought yet, and it
> shows. The rest of the patch has been in my head for months, so I
> expect fewer problems there...

Keep on it.  You're doing a terrific job.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Dimitri Fontaine
"David E. Wheeler"  writes:
> Be aware that PGXS sets a $(VERSION) variable already, so you'll need
> to use another name, I think, to guard from conflicts. This is in
> Makefile.global:

Of course that's not a problem for contribs, and I used EXTVERSION in a
previous version of the patch. I guess I will go back to using
$(EXTVERSION) in the Makefile the next time I have to produce a patch.

This part of the problem hasn't received much thought yet, and it shows.
The rest of the patch has been in my head for months, so I expect fewer
problems there...

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread David E. Wheeler
On Oct 21, 2010, at 8:12 AM, Dimitri Fontaine wrote:

> That's a good idea, but my guess is that the implementation cost of
> supporting the control format in your perl infrastructure is at least an
> order of magnitude lower than the cost for me to support your current
> JSON file format, so I lean towards you having an automated way to fill
> in the json file from the control one...

Well, it *will* be easier. Eventually. Right now, the file has to be edited by 
hand. Which I can tell you from experience is rather…error-prone.

Anyway, I wouldn't push for a JSON file format until a parser was just there 
for you to use without too much trouble.

> The Makefile supports $(VERSION) because chances are it's already there
> (think packaging or tarball release targets). Having yet another place
> to manually maintain a version number ain't appealing.

Be aware that PGXS sets a $(VERSION) variable already, so you'll need to use 
another name, I think, to guard from conflicts. This is in Makefile.global:

VERSION = 9.0.1
MAJORVERSION = 9.0

Maybe use EXTVERSION? You don't want to overwrite the core version because a 
makefile author could use it to change the build (pgTAP does this, for example).

> In the latest patch, though, the only other thing you find in the
> Makefile about the extension is its basename, which must match that of
> both the .control and the .sql files. And it's possible for $(EXTENSION)
> to be a list of them, too, because of contrib/spi.

Right, that makes sense.

Best,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-21 Thread Kevin Grittner
Jeff Davis  wrote:
 
> That looks like a reasonable state to me, but I'm not sure exactly
> what the design calls for. I am guessing that the real problem is
> in PreCommit_CheckForSerializationFailure(), where there are 6
> conditions that must be met for an error to be thrown. T2 falls
> out right away at condition 1. T1 falls out on condition 4. I
> don't really understand condition 4 at all -- can you explain it?
> And can you explain conditions 5 and 6 too?
 
Since most transactions are rolled back when a conflict is detected
during a read or write attempt, there are only a few very specific
conditions which can "slip through" to where they need to be
detected on commit.  Here's the code with the six conditions:
 
if (MySerializableXact->inConflict != InvalidSerializableXact
  && MySerializableXact->inConflict != MySerializableXact
  && !(MySerializableXact->inConflict->rolledBack)
  && MySerializableXact->inConflict->inConflict != InvalidSerializableXact
  && !SxactIsCommitted(MySerializableXact->inConflict)
  && !SxactIsCommitted(MySerializableXact->inConflict->inConflict))
 
Condition 4 is testing whether MySerializableXact is on the "out"
side of a pivot -- in the parlance of most examples, is
MySerializableXact TN?
 
Conditions 5 and 6 confirm that neither T0 nor T1 has committed
first; we can only have a problem if TN commits first.
 
Basically, when we already have a pivot, but no transaction has yet
committed, we wait to see if TN commits first.  If so, we have a
problem; if not, we don't.  There's probably some room for improving
performance by cancelling T0 or T1 instead of TN, at least some of
the time; but in this pass we are always cancelling the transaction
in whose process we detect the need to cancel something.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-21 Thread Mark Wong
On Tue, Oct 19, 2010 at 8:30 PM, daveg  wrote:
> On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
>> On 20/10/10 16:05, Mark Kirkwood wrote:
>> >
>> >
>> >shmget and friends are hugetlbpage aware, so it seems it should 'just
>> >work'.
>> >
>>
>> Heh - provided you specify
>>
>> SHM_HUGETLB
>>
>>
>> in the relevant call that is :-)
>
> I had a patch for this against 8.3 that I could update if there is any
> interest. I suspect it is helpful.

Oh, probably better than me digging up my broken one.  Send it out as
is if you don't want to update it. :)

Regards,
Mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Dimitri Fontaine
"David E. Wheeler"  writes:
> Sure. The reason to do it, though, is so that extension authors can create
> just one metadata file, instead of two (or three, if one must also put such
> data into the Makefile).

That's a good idea, but my guess is that the implementation cost of
supporting the control format in your perl infrastructure is at least an
order of magnitude lower than the cost for me to support your current
JSON file format, so I lean towards you having an automated way to fill
in the json file from the control one...

The Makefile supports $(VERSION) because chances are it's already there
(think packaging or tarball release targets). Having yet another place
to manually maintain a version number ain't appealing.

In the latest patch, though, the only other thing you find in the
Makefile about the extension is its basename, which must match that of
both the .control and the .sql files. And it's possible for $(EXTENSION)
to be a list of them, too, because of contrib/spi.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-21 Thread Mark Wong
On Wed, Oct 20, 2010 at 1:13 PM, Robert Haas  wrote:
> On Wed, Oct 20, 2010 at 3:47 PM, daveg  wrote:
>> On Wed, Oct 20, 2010 at 12:28:25PM -0700, Greg Stark wrote:
>>> On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark  wrote:
>>> > I don't think it's a big cost once all the processes
>>> > have been forked if you're reusing them beyond perhaps slightly more
>>> > efficient cache usage.
>>>
>>> Hm, this site claims to get a 13% win just from the reduced tlb misses
>>> using a preload hack with Pg 8.2. That would be pretty substantial.
>>>
>>> http://oss.linbit.com/hugetlb/
>>
>> That was my motivation in trying a patch. TLB misses can be a substantial
>> overhead. I'm not current on the state of play, but working at Sun's
>> benchmark lab on a DB TPC-B benchmark for the first generation of MP
>> systems, something like 30% of all bus traffic was TLB misses. The
>> next iteration of the hardware had a much larger TLB.
>>
>> I have a client with 512GB memory systems, currently with 128GB configured
>> as postgresql buffer cache. Which is 32M TLB entries trying to fit in the
>> few dozen CPU TLB slots. I suspect there may be some contention.
>>
>> I'll benchmark of course.
>
> Do you mean 128GB shared buffers, or shared buffers + OS cache?  I
> think that the general wisdom is that performance tails off beyond
> 8-10GB of shared buffers anyway, so a performance improvement on 128GB
> shared buffers might not mean much unless you can also show that 128GB
> shared buffers actually performs better than some smaller amount.

I'm sure someone will correct me if I'm wrong, but when I looked at
this a couple years ago I believe a side effect of using hugetlbs is
that these segments are never swapped out.

I made a weak attempt to patch postgres to use hugetlbs when
allocating shared memory.  If I can find that patch I'll send it out...

Mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread David E. Wheeler
On Oct 21, 2010, at 12:33 AM, Dimitri Fontaine wrote:

> I don't see what it buys us in this very context. The main thing here to
> realize is that I wrote about no code to parse the control file. I don't
> think the extension patch should depend on the JSON-in-core patch.
> 
> Now, once we have JSON and before the release, I guess given a good
> reason to have much more complex configuration files that don't look at
> all like postgresql.conf, we could revisit.

Sure. The reason to do it, though, is so that extension authors can create just 
one metadata file, instead of two (or three, if one must also put such data 
into the Makefile).

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Dimitri Fontaine
Tom Lane  writes:
> ... well, that's just a bug in hstore.  *All* the contrib modules
> should be using PGXS, unless they have a damn good reason not to;
> which is not apparent for hstore.

Here's a patch.
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/contrib/hstore/Makefile
--- b/contrib/hstore/Makefile
***
*** 1,9 
  # contrib/hstore/Makefile
  
- subdir = contrib/hstore
- top_builddir = ../..
- include $(top_builddir)/src/Makefile.global
- 
  MODULE_big = hstore
  OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
  	crc32.o
--- 1,5 
***
*** 12,15  DATA_built = hstore.sql
--- 8,21 
  DATA = uninstall_hstore.sql
  REGRESS = hstore
  
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/hstore
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
  include $(top_srcdir)/contrib/contrib-global.mk
+ endif
+ 
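
(With this in place, building against an installed server should work the
usual PGXS way, e.g. "make USE_PGXS=1 install", while in-tree builds keep
including src/Makefile.global as before.)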

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-21 Thread Kevin Grittner
Jeff Davis  wrote:
 
>> Also, it appears to be non-deterministic, to a degree at least,
>> so you may not observe the problem in the exact way that I do.
> The SELECTs only look at the root and the predicate doesn't match.
> So each SELECT sets an SIReadLock on block 0 and exits the search.
> Looks good so far.
> 
> T1 then inserts, and it has to modify page 0, so it does
> FlagRWConflict(). That sets writer->inConflict = reader and
> reader->outConflict = writer (where writer is T1 and reader is
> T2); and T1->outConflict and T2->inConflict remain NULL.
> 
> Then T2 inserts, and I didn't catch that part in as much detail in
> gdb, but it apparently has no effect on that state, so we still
> have T1->inConflict = T2, T1->outConflict = NULL, T2->inConflict =
> NULL, and T2->outConflict = T1.
 
I now see where the wheels fall off.  The GiST query initially stops
at a high level, so predicate locks only go that deep, and the
*first* insert of a conflicting row must ripple up and modify a
locked page; but *subsequent* inserts may only need to modify the
leaf level.  Even though your particular example doesn't involve a
cycle and therefore doesn't require a rollback for correctness
(although it might tend to generate a false positive if index page
locks were working correctly), you've exposed a flaw in the
GiST AM implementation of predicate locks.
 
On a first glance, it appears that we would need to check for
conflicts as we move down through the index to find the right spot
for an insert, not as we modify pages for the insert.  I hope
there's some more subtle technique or some way to qualify it;
otherwise a search which stops at the root page would generate a
conflict out to just about any index insertion from a concurrent
transaction.
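
A hedged sketch of that failure shape, following Jeff's trace
(hypothetical schema, untested):

create table boxes (b box);
create index boxes_gist on boxes using gist (b);
-- T1 and T2, both serializable, each run a SELECT whose scan stops at
-- the GiST root, leaving an SIReadLock on block 0 only.  Then:
-- T1: insert into boxes values ('((1,1),(2,2))');
--     -- ripples up to the locked root page, so the conflict with
--     -- T2's read is flagged
-- T2: insert into boxes values ('((3,3),(4,4))');
--     -- needs only a leaf page, never touches block 0, so the
--     -- conflict with T1's read is silently missed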
 
I will add this to my list of issues to fix based on your review,
unless it's something you would like to tackle -- I'm not going to
chase away anyone who wants to help with this.  :-)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Tom Lane
Dimitri Fontaine  writes:
> Itagaki Takahiro  writes:
>> Why does only hstore have version '9.1'? Any other modules have
>> '9.1devel'.

> It's the only contrib that's not using PGXS but instead directly
> includes $(top_builddir)/src/Makefile.global,

... well, that's just a bug in hstore.  *All* the contrib modules
should be using PGXS, unless they have a damn good reason not to;
which is not apparent for hstore.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Tom Lane
"Stephen R. van den Berg"  writes:
> Robert Haas wrote:
>> and break on-disk compatibility just to make it easier to

> If it's inserted in the "special" area, it will not break any
> compatibility.

I'll tell you what I really don't like about this proposal: we discuss
some scheme or other for taking over the "special space" in heap pages
at least once a year.  None of them have been particularly compelling
so far, but one may come along that is; especially given that we're now
trying to maintain on-disk compatibility across versions.  So I think
the opportunity cost of assigning a use to that space is mighty high.
I don't find this idea important enough to justify foreclosing future
uses for the special space.

The real bottom line is this: if you care enough about your data to
be willing to expend a large amount of effort on manual recovery
attempts, why didn't you have a decent backup scheme in place?
There are way too many scenarios where you'll have no hope of doing
any such manual recovery anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UNION ALL has higher cost than inheritance

2010-10-21 Thread Tom Lane
Itagaki Takahiro  writes:
> On Thu, Oct 21, 2010 at 2:18 PM, Tom Lane  wrote:
>> The plan for UNION initially involves a couple of SubqueryScan nodes,
>> which impose an extra cost of cpu_tuple_cost per tuple.  Those later
>> get optimized away, but we don't try to readjust the cost estimates
>> for that.

> Thanks. It also explains my another question why Merge Append cannot
> be used for UNION ALL plans.

Hmm, seems like the example you show ought to work.  I wonder if there
was an oversight in that patch...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Fix snapshot taking inconsistencies

2010-10-21 Thread Marko Tiikkaja

Hi,

Here's an updated patch.

I'm still not too fond of the logic in spi.c, but I can't see a better 
way to do this.  If someone sees a better way, I'm not going to object.


I also made some changes to the SQL functions now that we have a 
different API.  The previous code pushed and popped snapshots quite heavily.


I'd also like to see more regression tests for SQL functions, but I'm 
going to submit my suggestions as a separate patch.
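
For readers following along, the sort of case this snapshot handling
matters for is a multi-statement SQL function; a hedged sketch with a
hypothetical table t:

create table t (i int);

create function f() returns bigint as $$
  insert into t values (1);
  select count(*) from t;  -- each statement of a volatile SQL function
                           -- should get a fresh snapshot and so see the
                           -- insert above
$$ language sql;

select f();  -- expected result: 1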



Regards,
Marko Tiikkaja
*** a/src/backend/catalog/pg_proc.c
--- b/src/backend/catalog/pg_proc.c
***
*** 755,761  fmgr_sql_validator(PG_FUNCTION_ARGS)
--- 755,763 
Oid funcoid = PG_GETARG_OID(0);
HeapTuple   tuple;
Form_pg_proc proc;
+   List   *raw_parsetree_list;
List   *querytree_list;
+   ListCell   *list_item;
bool        isnull;
Datum   tmp;
char   *prosrc;
***
*** 828,836  fmgr_sql_validator(PG_FUNCTION_ARGS)
 */
if (!haspolyarg)
{
!   querytree_list = pg_parse_and_rewrite(prosrc,
!                                         proc->proargtypes.values,
!                                         proc->pronargs);
    (void) check_sql_fn_retval(funcoid, proc->prorettype,
                               querytree_list,
                               NULL, NULL);
--- 830,854 
 */
if (!haspolyarg)
{
!   /*
!    * Parse and rewrite the queries in the function text.
!    *
!    * Even though check_sql_fn_retval is only interested in the last
!    * query, we analyze all of them here to check for any errors.
!    */
!   raw_parsetree_list = pg_parse_query(prosrc);
!
!   querytree_list = NIL;
!   foreach(list_item, raw_parsetree_list)
!   {
!       Node *parsetree = (Node *) lfirst(list_item);
!
!       querytree_list = pg_analyze_and_rewrite(parsetree, prosrc,
!                                               proc->proargtypes.values,
!                                               proc->pronargs);
!   }
!
!   Assert(querytree_list != NIL);
!
    (void) check_sql_fn_retval(funcoid, proc->prorettype,
                               querytree_list,
                               NULL, NULL);
*** a/src/backend/executor/functions.c
--- b/src/backend/executor/functions.c
***
*** 84,89  typedef struct
--- 84,90 
    bool        returnsSet;     /* true if returning multiple rows */
    bool        returnsTuple;   /* true if returning whole tuple result */
    bool        shutdown_reg;   /* true if registered shutdown callback */
+   bool        snapshot;       /* true if pushed an active snapshot */
    bool        readonly_func;  /* true to run in "read only" mode */
    bool        lazyEval;       /* true if using lazyEval for result query */
  
***
*** 93,107  typedef struct
  
    JunkFilter *junkFilter;     /* will be NULL if function returns VOID */
  
!   /* head of linked list of execution_state records */
!   execution_state *func_state;
  } SQLFunctionCache;
  
  typedef SQLFunctionCache *SQLFunctionCachePtr;
  
  
  /* non-export function prototypes */
! static execution_state *init_execution_state(List *queryTree_list,
                          SQLFunctionCachePtr fcache,
                          bool lazyEvalOK);
  static void init_sql_fcache(FmgrInfo *finfo, bool lazyEvalOK);
--- 94,107 
  
    JunkFilter *junkFilter;     /* will be NULL if function returns VOID */
  
!   List *func_state;
  } SQLFunctionCache;
  
  typedef SQLFunctionCache *SQLFunctionCachePtr;
  
  
  /* non-export function prototypes */
! static List *init_execution_state(List *queryTree_list,
                          SQLFunctionCachePtr fcache,
                          bool lazyEvalOK);
  static void init_sql_fcache(FmgrInfo *finfo, bool lazyEvalOK);
***
*** 123,183  static void sqlfunction_destroy(DestReceiver *self);
  
  
  /* Set up the list of per-query execution_state records for a SQL function */
! static execution_state *
  init_execution_state(List *que
