[HACKERS] Pruning the TODO list
On 15 June 2012 03:10, Robert Haas robertmh...@gmail.com wrote: On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila amit.kap...@huawei.com wrote: I am planning to work on the below Todo list item... ... Suggest me if my understanding is correct? I guess my first question is: why do we need this? There are lots of things in the TODO list that someone wanted once upon a time, but they're not all actually important. Do you have reason to believe that this one is? It's been six years since that email, so it's worth asking if this is actually relevant. The current TODO list is close to worthless and needs substantial pruning or even complete truncation. Or massive redesign. The above shows a recent attempt to fix one of the items on the TODO list, which many people have then spent time questioning. That is just a waste of time for *all* concerned. I see many items on there that have been there for eight years and longer. The situation is now so bad that many experienced developers ignore the list completely. Robert's answer is the final nail in that coffin; I agree with him. ISTM that we should prune the list right down to nothing, or very, very few entries. We must have a TODO list that we can trust to save us time. I don't want to see any more people waste their time on issues that aren't really wanted. At the moment we have lots of false positives and so waste time and bring the TODO list into disrepute. It would be better to have a compact list where every item was reasonably accepted, so we can begin to trust it again. Trusting the TODO list is what brought me my first patch, and I think it could and should be the same with others. Can Tom go through the list and archive items no longer considered likely to fly? A simple triage is all that is needed here. Or would you like me or another to do this? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
Amit Kapila amit.kap...@huawei.com writes: The reason I'm concerned about selecting a next-LSN that's certainly beyond every LSN in the database is that not doing so could result in introducing further corruption, which would be entirely avoidable with more care in choosing the next-LSN. The further corruption can only be possible when we replay some wrong WAL by selecting wrong LSN. No, this is mistaken. Pages in the database that have LSN ahead of where the server thinks the end of WAL is cause lots of problems unrelated to replay; for example, inability to complete a checkpoint. That might not directly lead to additional corruption, but consider the case where such a page gets further modified, and the server decides it doesn't need to create a full-page image because the LSN is ahead of where the last checkpoint was. A crash or two later, you have new problems. (Admittedly, once you've run pg_resetxlog you're best advised to just be trying to dump what you've got, and not modify it more. But sometimes you have to hack the data just to get pg_dump to complete.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
On 14.06.2012 01:56, Alexander Korotkov wrote: Hackers, attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing.

@@ -788,7 +774,7 @@ range_super_union(TypeCacheEntry *typcache, RangeType *r1, RangeType *r2)
  * part of the relcache entry for the index, typically) this essentially
  * eliminates lookup overhead during operations on a GiST range index.
  */
-static Datum
+Datum
 TrickFunctionCall2(PGFunction proc, FmgrInfo *flinfo, Datum arg1, Datum arg2)
 {
 	FunctionCallInfoData fcinfo;

I don't think we want to expose TrickFunctionCall2(). Not with that name, anyway. Perhaps we should refactor the functions called this way, range_adjacent, range_overlaps etc., to have internal counterparts that can be called without FunctionCall(). Like:

*** 692,697 ****
--- 692,708 ----
 {
 	RangeType  *r1 = PG_GETARG_RANGE(0);
 	RangeType  *r2 = PG_GETARG_RANGE(1);
+	TypeCacheEntry *typcache;
+
+	typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1));
+
+	PG_RETURN_BOOL(range_adjacent_internal(r1, r2, typcache));
+ }
+
+ bool
+ range_adjacent_internal(RangeType *r1, RangeType *r2, TypeCacheEntry *typcache)
+ {
 	RangeBound	lower1, lower2;

The GiST and SP-GiST consistent functions could call the internal function directly. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pruning the TODO list
On tor, 2012-06-21 at 15:01 +0800, Simon Riggs wrote: ISTM that we should prune the list right down to nothing, or very, very few entries. We must have a TODO list that we can trust to save us time. I don't want to see any more people waste their time on issues that aren't really wanted. At the moment we have lots of false positives and so waste time and bring the TODO list into disrepute. It would be better to have a compact list where every item was reasonably accepted, so we can begin to trust it again. Trusting the TODO list is what brought me my first patch, and I think it could and should be the same with others. Discussing the merits and design of a feature is half the work, so one shouldn't expect all the things on the TODO list to be completely discussed. Otherwise someone would have typed in the code already. So we just need to clarify to readers that the list contains items worth thinking about, not items that merely need someone to type in some code. The archive links that should accompany most items should contain hints about how far discussion has progressed. Nonetheless, it would be a good idea to prune the TODO list regularly, such as after a release. We used to do that a bit, not so much lately, perhaps. But everyone is invited to contribute to that. Note, however, that even in well-maintained bug tracking systems, garbage and obsolete items accumulate over time, so some maintenance of this kind would still be necessary. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On Jun20, 2012, at 19:38 , Peter Geoghegan wrote: On 20 June 2012 17:41, Tom Lane t...@sss.pgh.pa.us wrote: In any case, if you have to redefine the meaning of equality in order to justify a performance patch, I'm prepared to walk away at the start. The advantage of my proposed implementation is precisely that I won't have to redefine the meaning of equality, and that only the text datatype will have to care about equivalency, so you can just skip over an explanation of equivalency for most audiences. Ok, so what exactly is it that you're proposing then? AFAICS, you're proposing to make the less-than operator rely solely on strcoll(). If you don't also redefine the equality operator to match, you'll end up with cases where !(a < b) and !(b < a), yet also !(a == b). This breaks the trichotomy law, no? Which in turn breaks merge-joins - I believe we'd output the cartesian product {potty, potyty} x {potyty, potty} when merge-joining {potty, potyty} with itself, unless the comparison operator contains a tie-breaker. Which is obviously wrong unless you decree that 'potty' = 'potyty'. I do agree that there's a use-case for having a textual type which treats equivalent strings as equal (and which should then also treat equivalent Unicode representations of the same character as equal). But it should be a separate type, not bolted onto the existing textual types. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
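[For reference, the usual way to restore trichotomy is a binary tie-break after strcoll(), along these lines (a sketch; PostgreSQL's own varstr_cmp() applies the same kind of strcmp() fallback for exactly this reason):

    #include <string.h>

    /*
     * Compare two NUL-terminated strings with strcoll(); when the collation
     * claims the strings are "equal" but the bytes differ, fall back to a
     * plain byte comparison.  This keeps the comparison trichotomous:
     * exactly one of a < b, a == b, b < a holds.
     */
    static int
    text_cmp_tiebreak(const char *a, const char *b)
    {
        int result = strcoll(a, b);

        if (result == 0)
            result = strcmp(a, b);  /* tie-break on raw bytes */
        return result;
    }
]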
Re: [HACKERS] Too frequent message of pgbench -i?
On Wed, Jun 20, 2012 at 4:04 AM, Tatsuo Ishii is...@postgresql.org wrote: Currently pgbench -i prints the following message every 10k tuples created: fprintf(stderr, "%d tuples done.\n", j); I think the frequency was decided long ago, when it seemed appropriate; computers have gotten so fast these days that a message every 10k tuples seems too frequent to me. Can we change the frequency from 10k to 100k? +1 for doing it that way. I have an old patch lying around for that. It's one line, which seems to be about as much work as the problem justifies. +1. Your patch looks good. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
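[For the archives, the change under discussion would presumably look something like this inside pgbench's row-generation loop (a sketch, not the actual patch):

    /* report progress every 100k tuples instead of every 10k */
    if (j % 100000 == 0)
        fprintf(stderr, "%d tuples done.\n", j);
]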
Re: [HACKERS] Pruning the TODO list
On 21 June 2012 08:30, Peter Eisentraut pete...@gmx.net wrote: Nonetheless, it would be a good idea to prune the TODO list regularly, such as after a release. We used to do that a bit, not so much lately, perhaps. But everyone is invited to contribute to that. The idea is to remove contentious issues from the list, to avoid the waste of time. I don't believe everyone can do that, not even close. That's why I raise it here, rather than just doing it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
Hi Steve, On Thursday, June 21, 2012 02:16:57 AM Steve Singer wrote: On 12-06-13 07:28 AM, Andres Freund wrote: From: Andres Freund and...@anarazel.de The individual changes need to be identified by an xid. The xid can be a subtransaction or a toplevel one; at commit those can be reintegrated by doing a k-way mergesort between the individual transactions. Callbacks for apply_begin, apply_change and apply_commit are provided to retrieve complete transactions. Missing: - spill-to-disk - correct subtransaction merge, current behaviour is simple/wrong - DDL handling (?) - resource usage controls Here is an initial review of the ApplyCache patch. Thanks! This patch provides a module that takes actions from the WAL stream, groups the actions by transaction, and then passes these change records to a set of plugin functions. For each transaction it encounters it keeps a list of the actions in that transaction. The ilist included in an earlier patch is used; changes resulting from that patch review would affect the code here, but not in a way that changes the design. When the module sees a commit for a transaction it calls the apply_change callback for each change. I can think of three ways that a replication system like this could try to apply transactions. 1) Each time it sees a new transaction it could open up a new transaction on the replica and make that change. It leaves the transaction open and goes on applying the next change (which might be for the current transaction or might be for another one). When it comes across a commit record it would then commit the transaction. If 100 concurrent transactions were open on the origin then 100 concurrent transactions will be open on the replica. 2) Determine the commit order of the transactions, group all the changes for a particular transaction together and apply them in that order for the transaction that committed first, commit that transaction and then move onto the transaction that committed second. 3) Group the transactions in a way that you move the replica from one consistent snapshot to another. This is what Slony and Londiste do because they don't have the commit order or commit timestamps. Built-in replication can do better. This patch implements option (2). If we had a way of implementing option (1) efficiently, would we be better off? Option (2) requires us to put unparsed WAL data (HeapTuples) in the apply cache. You can't translate this to an independent LCR until you call the apply_change callback (which happens once the commit is encountered). The reason is that some of the changes might be DDL (or things generated by a DDL trigger) that will change the translation catalog, so you can't translate the heap data to LCRs until you're at a stage where you can update the translation catalog. In both cases you might need to see later WAL records before you can convert an earlier one into an LCR (i.e. TOAST). Very good analysis, thanks! Another reason why we cannot easily do 1) is that subtransactions aren't discernible from top-level transactions before the top-level commit happens; we can only properly merge in the right order (by sorting via lsn) once we have seen the commit record, which includes a list of all committed subtransactions. I also don't think 1) would be particularly welcome by people trying to replicate into foreign systems. Some of my concerns with the apply cache: big transactions (bulk loads, mass updates) will be cached in the apply cache until the commit comes along.
One issue Slony has to do with bulk operations is that the replicas can't start processing the bulk INSERT until after it has committed. If it takes 10 hours to load the data on the master it will take another 10 hours (at best) to load the data into the replica (20 hours after you start the process). With binary streaming replication your replica is done processing the bulk update shortly after the master is. One reason why we have the low-level apply stuff planned is that that way the apply on the standby actually takes less time than on the master. If you do it right there is significantly lower overhead there. Still, a 10h bulk load will definitely cause problems. I don't think there is a way around that for now. Long-running transactions can sit in the cache for a long time. When you spill to disk we would want the long-running but inactive ones spilled to disk first. This is solvable but adds to the complexity of this module; how were you planning on managing which items of the list get spilled to disk? I planned to have some cutoff 'max_changes_in_memory_per_txn' value. If it has been reached for one transaction all existing changes are spilled to disk. New changes again can be kept in memory until it's reached again. We need to support serializing the cache for crash recovery + shutdown of the receiving side as well. Depending on how
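[To illustrate the cutoff Andres describes, a minimal sketch; the function and field names here are assumptions for illustration, not actual patch code:

    /*
     * Add a change to a transaction's in-memory list; once the transaction
     * has accumulated max_changes_in_memory_per_txn changes, spill what has
     * been collected so far to disk and start filling memory again.
     */
    static void
    applycache_add_change(ApplyCache *cache, ApplyCacheTXN *txn,
                          ApplyCacheChange *change)
    {
        dlist_push_tail(&txn->changes, &change->node);  /* ilist from the earlier patch */
        txn->nentries_mem++;

        if (txn->nentries_mem >= max_changes_in_memory_per_txn)
        {
            applycache_spill_txn(cache, txn);   /* hypothetical spill routine */
            txn->nentries_mem = 0;
        }
    }
]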
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
The reason I'm concerned about selecting a next-LSN that's certainly beyond every LSN in the database is that not doing so could result in introducing further corruption, which would be entirely avoidable with more care in choosing the next-LSN. The further corruption can only be possible when we replay some wrong WAL by selecting wrong LSN. No, this is mistaken. Pages in the database that have LSN ahead of where the server thinks the end of WAL is cause lots of problems unrelated to replay; for example, inability to complete a checkpoint. That might not directly lead to additional corruption, but consider the case where such a page gets further modified, and the server decides it doesn't need to create a full-page image because the LSN is ahead of where the last checkpoint was. A crash or two later, you have new problems. In case any modification happens to the database after it has started, even if the next-LSN is the max LSN of pages, the modification can create a problem because the database will be in an inconsistent state. Please correct me if I am wrong in assuming that the next-LSN having its value as the max LSN of pages: 1. has nothing to do with replay (we should still reset the WAL so that no replay happens); 2. is there to avoid some further disasters. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that it's C++, but that allowed me to write the program quickly, with no dependencies for anyone who cares to try this out, other than a C++ compiler and the standard library. Uh, are you sure the results aren't swapped? string_wrapper takes a parameter called use_strcoll, and it indeed uses strcoll() if that parameter is true, and strxfrm() otherwise. But then it passes *false* to determine the speed of strcoll(), and *true* to determine the speed of strxfrm()... Also, place_random_string() needs to insert a terminating zero byte, otherwise set_test segfaults, at least on my OSX machine. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On 21 June 2012 10:24, Florian Pflug f...@phlo.org wrote: On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that it's C++, but that allowed me to write the program quickly, with no dependencies for anyone who cares to try this out, other than a C++ compiler and the standard library. Uh, are you sure the results aren't swapped? string_wrapper takes a parameter called use_strcoll, and it indeed uses strcoll() if that parameter is true, and strxfrm() otherwise. But then it passes *false* to determine the speed of strcoll(), and *true* to determine the speed of strxfrm()... Egads, you're right. Apparently in my haste I gave the wrong boolean argument to the class constructor in each case. I am completely wrong about this. I apologise for the careless mistake. At least I advanced the debate, though - we don't want to do any ad-hoc generation of strxfrm() output within comparators, even when one side can have a cached value. You're right about merge-joins; I cannot answer that without arbitrarily making a convenient exception about the conditions that they join on, which is itself unacceptable. I cannot change the semantics of equality to do something with strxfrm() either, since that would have unacceptable performance implications, as is evident from my test-case. I wish someone had a better idea, but I suppose at this point the idea of concatenating strxfrm() output with the original string begins to look like the least-worst option. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On Jun21, 2012, at 11:55 , Peter Geoghegan wrote: On 21 June 2012 10:24, Florian Pflug f...@phlo.org wrote: On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that it's C++, but that allowed me to write the program quickly, with no dependencies for anyone who cares to try this out, other than a C++ compiler and the standard library. Uh, are you sure the results aren't swapped? string_wrapper takes a parameter called use_strcoll, and it indeed uses strcoll() if that parameter is true, and strxfrm() otherwise. But then it passes *false* to determine the speed of strcoll(), and *true* to determine the speed of strxfrm()... Egads, you're right. Apparently in my haste I gave the wrong boolean argument to the class constructor in each case. I am completely wrong about this. I apologise for the careless mistake. At least I advanced the debate, though - we don't want to do any ad-hoc generation of strxfrm() output within comparators, even when one side can have a cached value. FWIW, I've played around a bit with a fixed version of your set_test.cpp. I made it use the cached sort key of the RHS also, made it preserve sort keys during copy construction, even used a boost::shared_ptr to avoid actually copying the cached sort key. The strxfrm() version still performed about 30% worse than the strcoll() version. From that, I figured that tree inserts don't trigger enough comparisons to make strxfrm() worthwhile. So I mangled set_test.cpp a bit more and instead of a std::set, I created a (C-style) array of string_wrapper instances, and sorted them with std::sort(). Which made strcoll() twice as fast as strxfrm()... At this point, my theory is that your choice of random strings prevents strxfrm() from ever winning over strcoll(). The reason being that you pick each letter uniformly distributed from a-z, resulting in a probability of two strings differing in the first character of 1 - 1/26 =~ 96%. Thus, even for extremely complex collation rules, strcoll() probably only needs to compare a few characters to determine the order. Whereas strxfrm() has to compute the whole sort key, no matter what. The question is thus, how good a model are your random strings for the input of a typical sorting step in postgres? My guess is, a quite good one actually, since people probably don't deal with a lot of very similar strings very often. Which makes me wonder if using strxfrm() during sorting wouldn't be a net loss, all things considered… My modified set_test.cpp (which should now be called array_test.cpp) is attached. best regards, Florian Pflug set_test.cpp Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On 21 June 2012 11:40, Florian Pflug f...@phlo.org wrote: At this point, my theory is that your choice of random strings prevents strxfrm() from ever winning over strcoll(). The reason being that you pick each letter uniformly distributed from a-z, resulting in a probability of two strings differing in the first character of 1 - 1/26 =~ 96%. Thus, even for extremely complex collation rules, strcoll() probably only needs to compare a few characters to determine the order. Whereas strxfrm() has to compute the whole sort key, no matter what. Good point. The question is thus, how good a model are your random strings for the input of a typical sorting step in postgres? My guess is, a quite good one actually, since people probably don't deal with a lot of very similar strings very often. Which makes me wonder if using strxfrm() during sorting wouldn't be a net loss, all things considered… Well, I think the answer to that has to be no. I posted a simple SQL-callable wrapper function (postgres text in, strxfrm() blob out as bytea) a few days ago that clearly wins by quite a bit on some sample queries against the dellstore sample database, which has a representative sample set. Completely uniformly-distributed data actually isn't representative of the real world at all. Normal distributions abound. This C++ program was never intended to justify the general utility of using strxfrm() for sorting, which I don't believe is in question. I just wanted to get some idea about the performance characteristics of using strxfrm() to traverse an index. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
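[For anyone who wants to reproduce the comparison, the standard strxfrm() sorting pattern is to pay the collation cost once per string up front and then compare the transformed blobs with plain byte comparisons (a self-contained sketch):

    #include <stdlib.h>
    #include <string.h>

    typedef struct SortItem
    {
        char   *str;    /* original string */
        char   *key;    /* strxfrm() blob; ordered by strcmp() */
    } SortItem;

    static void
    make_key(SortItem *it)
    {
        /* first call with size 0 reports the required buffer length */
        size_t len = strxfrm(NULL, it->str, 0);

        it->key = malloc(len + 1);
        strxfrm(it->key, it->str, len + 1);
    }

    static int
    cmp_items(const void *a, const void *b)
    {
        /* each of the O(n log n) comparisons is now a cheap byte compare */
        return strcmp(((const SortItem *) a)->key,
                      ((const SortItem *) b)->key);
    }

After calling make_key() on every element, qsort(items, n, sizeof(SortItem), cmp_items) completes without ever calling strcoll().]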
Re: [HACKERS] not null validation option in contrib/file_fdw
I rebased the patch to current head. Attached is an updated version of the patch. Best regards, Etsuro Fujita -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Etsuro Fujita Sent: Tuesday, April 17, 2012 2:40 PM To: 'Andrew Dunstan'; 'Shigeru HANADA' Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] not null validation option in contrib/file_fdw I updated the patch. Attached is an updated version of the patch. Changes: * fix a bug in fileGetOptions() * rename the validation option and its code to validate_data_file * clean up Best regards, Etsuro Fujita -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Etsuro Fujita Sent: Monday, April 16, 2012 4:09 PM To: 'Andrew Dunstan'; 'Shigeru HANADA' Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] not null validation option in contrib/file_fdw Thank you for the review. -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andrew Dunstan Sent: Friday, April 13, 2012 9:16 PM To: Shigeru HANADA Cc: Etsuro Fujita; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] not null validation option in contrib/file_fdw On 04/13/2012 07:21 AM, Shigeru HANADA wrote: (2012/04/13 16:59), Etsuro Fujita wrote: I updated the patch added to CF 2012-Next [1]. Attached is the updated version of the patch. I applied the patch and ran regression tests of file_fdw, and I got SIGSEGV X-( The failure occurs in fileGetOptions, and it is caused by list_delete_cell used in a foreach loop; the ListCell pointing to the delete target has been freed by list_delete_cell, but foreach accesses it to get the next element. Some backend functions which use list_delete_cell in a loop use a for loop instead of foreach, and other functions exit the loop after calling list_delete_cell. Since we can't stop searching for non-COPY options until meeting the end of the options list, we would need to choose the former (for loop), or create another list which contains only valid COPY options and return it via the other_options parameter. Yes, the code in fileGetOptions() appears to be bogus. Sorry, I will fix it. Also, validate is a terrible name for the option (and in the code) IMNSHO. It's far too generic. validate_not_null or some such would surely be better. I thought it would be used for not only NOT NULL but also CHECK and foreign key constraints. That is, when a user sets the option to 'true', file_fdw verifies that each tuple meets all kinds of constraints. So, how about validate_data_file or simply validate_file? Best regards, Etsuro Fujita cheers andrew file_fdw_notnull_v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Hello, Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr and SvPVUTF8() when turning a perl string into a cstring. Right. I spent a bit more time catching up on pl/perl and now understand what the problem is... So I played a bit with this patch, and touched it a bit mainly just to add some more comments; and while at it I noticed that some of the functions in Util.xs might leak some memory, so I made an attempt to plug them, as in the attached patch (which supersedes yours). OK. Is it ok to first look into the newer patch, including the fix for the leaks? -- Coding and styles. This also seems to have polished the previous one in some code, style and comment details, which generally look reasonable. And the patch style was corrected to a unified diff. -- Functions It seems to work properly on databases whose encodings are SQL_ASCII and UTF8 (and EUC-JP), as below, = = create or replace function foo(text) returns text language plperlu as $$ $a = shift; return "BOO!" if ($a ne "a\x80cあ"); return $a; $$; SQL_ASCII= select foo(E'a\200cあ') = E'a\200cあ'; ?column? -- t UTF8= select foo(E'a\200cあ'); ERROR: invalid byte sequence for encoding UTF8: 0x80 UTF8= select foo(E'a\302\200cあ') = E'a\u0080cあ'; ?column? -- t = This looks quite valid according to the definition of the encodings and perl's nature as far as I can see. -- The others Variable naming in util_quote_*() seems a bit confusing, text *arg = sv2text(sv); text *ret = DatumGetTextP(..., PointerGetDatum(arg)); char *str; pfree(arg); str = text_to_cstring(ret); RETVAL = cstr2sv(str); pfree(str); Renaming ret to quoted and str to ret, as in the attached patch, might make it more readable. Now, with my version of the patch applied and using a SQL_ASCII database to test the problem in the original report, I notice that we now have a regression failure: snip. I'm not really sure what to do here -- maybe having a second expected file for that test is a good enough answer? Or should I just take the test out? Opinions please. The attached ugly patch does it. It seems we should put NO_LOCALE=1 on the 'make check' command line for encodings not compatible with the environment's locale, although it looks to work. # UtfToLocal() seems to have a bug in that it always reports the illegal encoding as UTF8 regardless of the real encoding. But plperl_lc_*.(sql|out) increases if the bug is fixed. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/pl/plperl/Util.xs b/src/pl/plperl/Util.xs
index 7d0102b..4b4b680 100644
--- a/src/pl/plperl/Util.xs
+++ b/src/pl/plperl/Util.xs
@@ -67,8 +67,11 @@
 static text *
 sv2text(SV *sv)
 {
 	char *str = sv2cstr(sv);
+	text *text;

-	return cstring_to_text(str);
+	text = cstring_to_text(str);
+	pfree(str);
+	return text;
 }

 MODULE = PostgreSQL::InServer::Util PREFIX = util_

@@ -113,10 +116,12 @@ util_quote_literal(sv)
 	}
 	else
 	{
 		text *arg = sv2text(sv);
-		text *ret = DatumGetTextP(DirectFunctionCall1(quote_literal, PointerGetDatum(arg)));
-		char *str = text_to_cstring(ret);
-		RETVAL = cstr2sv(str);
-		pfree(str);
+		text *quoted = DatumGetTextP(DirectFunctionCall1(quote_literal, PointerGetDatum(arg)));
+		char *ret;
+		pfree(arg);
+		ret = text_to_cstring(quoted);
+		RETVAL = cstr2sv(ret);
+		pfree(ret);
 	}
 	OUTPUT:
 	RETVAL

@@ -132,10 +137,12 @@ util_quote_nullable(sv)
 	else
 	{
 		text *arg = sv2text(sv);
-		text *ret = DatumGetTextP(DirectFunctionCall1(quote_nullable, PointerGetDatum(arg)));
-		char *str = text_to_cstring(ret);
-		RETVAL = cstr2sv(str);
-		pfree(str);
+		text *quoted = DatumGetTextP(DirectFunctionCall1(quote_nullable, PointerGetDatum(arg)));
+		char *ret;
+		pfree(arg);
+		ret = text_to_cstring(quoted);
+		RETVAL = cstr2sv(ret);
+		pfree(ret);
 	}
 	OUTPUT:
 	RETVAL

@@ -145,14 +152,15 @@ util_quote_ident(sv)
 	SV *sv
 	PREINIT:
 		text *arg;
-		text *ret;
-		char *str;
+		text *quoted;
+		char *ret;
 	CODE:
 		arg = sv2text(sv);
-		ret = DatumGetTextP(DirectFunctionCall1(quote_ident, PointerGetDatum(arg)));
-		str = text_to_cstring(ret);
-		RETVAL = cstr2sv(str);
-		pfree(str);
+		quoted = DatumGetTextP(DirectFunctionCall1(quote_ident, PointerGetDatum(arg)));
+		pfree(arg);
+		ret = text_to_cstring(quoted);
+		RETVAL = cstr2sv(ret);
+		pfree(ret);
 	OUTPUT:
 	RETVAL

diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index 1b6648b..ed99194 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -3,21 +3,29 @@
 /*
  * convert from utf8 to database encoding
+ *
+ * Returns a palloc'ed copy of the original string
  */
 static inline char *
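[The last hunk above is cut off mid-declaration; for context, the function being introduced presumably continues roughly like this - a reconstruction based on the comment, not the actual patch text:

    static inline char *
    utf_u2e(const char *utf8_str, size_t len)
    {
        /* convert from UTF-8 to the database encoding */
        char *ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str,
                                                       len, PG_UTF8,
                                                       GetDatabaseEncoding());

        /* ensure the caller always gets a palloc'ed copy it may pfree */
        if (ret == utf8_str)
            ret = pnstrdup(ret, len);

        return ret;
    }
]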
[HACKERS] Catalog/Metadata consistency during changeset extraction from wal
Hi Robert, Hi all, Robert and I talked quite a bit about different methods of providing enough information to extract tuples from WAL. I don't think either of us is yet really convinced of any individual method, so I want to recap our discussion in one email so others can chime in without reading the already huge thread. I hope I am not misrepresenting Robert's opinion here, but I am sure he will correct me if I do ;) To satisfy the different needs people have for changeset extraction we currently think that for each individual tuple extracted from WAL we need to provide the following information: a) one or more pieces of tuple data (HeapTupleData structs) * INSERT: full new tuple * UPDATE: full new tuple, old pkey (optionally the full old tuple) * DELETE: old pkey (optionally the full old tuple) b) the action performed (INSERT|UPDATE|DELETE) c) the table on which the action was performed d) access to the table structure (names, column types, ...) and procs (*_out functions for the individual columns) The problem with getting that data is that at the point where we're decoding the WAL the catalog may have evolved significantly from the state it was in when the tuple was put into the WAL. We can extract a) and b) without any problems (let's not talk about it here) but we don't necessarily know how to make sense of the data because a HeapTuple cannot be properly interpreted without the knowledge of c) and d). I am of the opinion that solving c) is basically equivalent to solving d) because the WAL only contains the tuple (pg_database.oid, pg_tablespace.oid, pg_class.relfilenode) of the table and not the 'pg_class.oid'. The relfilenode is changed by operations that rewrite the table like ALTER TABLE ADD COLUMN ... DEFAULT ...; TRUNCATE; CLUSTER and some others. A single transaction can contain tuples for different relfilenodes and with different columns:

CREATE TABLE foo(id serial primary key, data text);
BEGIN;
INSERT INTO foo ...;
TRUNCATE foo;
INSERT INTO foo ...;    -- same structure, different relfilenode
ALTER TABLE foo ADD COLUMN bar text;
INSERT INTO foo ...;    -- same relfilenode, different table structure
ALTER TABLE foo ADD COLUMN zaphod text DEFAULT '';
INSERT INTO foo ...;    -- different relfilenode, different table structure
COMMIT;

There are several additional complex scenarios. In http://archives.postgresql.org/message-id/201206192023.20589.and...@2ndquadrant.com I listed which options I see for reaching that goal. A refined version of that list: 1.) Decode on a different, possibly catalog-only, pg instance kept in sync using the command trigger infrastructure (but not necessarily user-level defined command triggers) If the command/event trigger infrastructure logs into a system-catalog table, keeping the catalog in the correct state is relatively easy. When replaying/converting a reassembled transaction, every time an INSERT into that system table happens the contained DDL gets performed. The locking on the generating side takes care of the concurrency aspects. Advantages: * minimal overhead (space, performance) * allows additional tables/indexes/triggers if you take care with oid allocation * easy transactionally correct catalog behaviour * the decoding instance can be used to store all data in a highly efficient manner (no decoding, no full detoasting, ...)
* the decoding instance is fully writable without problems if you don't generate conflicts (separate tables, non-overlapping writes, whatever) * implementable in a pretty unintrusive way Disadvantages: * the table structure of replicated tables needs to be exactly the same * the type definition + support procs need to be similar enough to read the data * error checking of the above isn't easy but is probably possible * full version/architecture compatibility required * a proxy instance is required even if you want to replicate into some other system/architecture/version 2.) Keep the decoding site up2date by replicating the catalog via normal HS recovery mechanisms. Advantages: * most of the technology is already there * minimal overhead (space, performance) * no danger of out-of-sync catalogs * no support for command triggers required that can keep a catalog in sync, including oids Disadvantages: * driving the catalog recovery that way requires some somewhat intricate code as it needs to be done in lockstep with decoding the WAL stream * requires an additional feature to guarantee HS always has enough information to be queryable after a crash/shutdown * some complex logic/low-level fudging required to keep the transactional behaviour sensible when querying the catalog * full version/architecture compatibility required * the decoding site will only ever be readable 3) Multi-Versioned catalog Below are two possible implementation strategies for that concept. Advantages: * Decoding is done on the master in an asynchronous fashion * low overhead during normal DML execution, not much additional
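[To make the a)-c) requirements from the top of this mail concrete, a change record would need to carry roughly the following (a hypothetical sketch, not code from the patchset):

    typedef enum LCRAction
    {
        LCR_INSERT,
        LCR_UPDATE,
        LCR_DELETE
    } LCRAction;

    typedef struct LogicalChangeRecord
    {
        LCRAction   action;        /* b) what was done */

        /*
         * c) table identity as it appears in WAL: the relfilenode triple,
         * not pg_class.oid.
         */
        Oid         dboid;         /* pg_database.oid */
        Oid         spcoid;        /* pg_tablespace.oid */
        Oid         relfilenode;   /* pg_class.relfilenode */

        /* a) tuple data; which fields are valid depends on the action */
        HeapTuple   newtuple;      /* INSERT/UPDATE: full new tuple */
        HeapTuple   oldtuple;      /* UPDATE/DELETE: old pkey, or full old tuple */
    } LogicalChangeRecord;

Requirement d) - being able to interpret those HeapTuples - is what the catalog options above are all trying to solve.]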
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Ouch! # UtfToLocal() seems to have a bug in that it always reports the illegal encoding as UTF8 regardless of the real encoding. But plperl_lc_*.(sql|out) increases if the bug is fixed. This is not a bug. The error message invalid byte sequence for encoding UTF8 meant an invalid INPUT byte sequence in encoding UTF8, not the encoding for output. So the regtest patch covers all of the possible patterns - UTF8 and SQL_ASCII. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Resource Owner reassign Locks
On 18.06.2012 13:59, Heikki Linnakangas wrote: On 10.06.2012 23:39, Jeff Janes wrote: I found the interface between resowner.c and lock.c a bit confusing. resowner.c would sometimes call LockReassignCurrentOwner() to reassign all the locks, and sometimes it would call LockReassignCurrentOwner() on each individual lock, with the net effect that's the same as calling LockReleaseCurrentOwner(). And it doesn't seem right for ResourceOwnerRemember/ForgetLock to have to accept a NULL owner. I rearranged that so that there's just a single LockReassignCurrentOwner() function, like before this patch. But it takes as an optional argument a list of locks held by the current resource owner. If the caller knows it, it can pass them to make the call faster, but if it doesn't it can just pass NULL and the function will traverse the hash table the old-fashioned way. I think that's a better API. Please take a look to see if I broke something. I hear no complaints, so committed. Thanks! -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Jun21, 2012, at 13:41 , Andres Freund wrote: 3b) Ensure that enough information in the catalog remains by fudging the xmin horizon. Then reassemble an appropriate snapshot to read the catalog as the tuple in question has seen it. The ComboCID machinery makes that quite a bit harder, I fear. If a tuple is updated multiple times by the same transaction, you cannot decide whether a tuple was visible in a certain snapshot unless you have access to the updating backend's ComboCID hash. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pruning the TODO list
Simon Riggs si...@2ndquadrant.com writes: On 21 June 2012 08:30, Peter Eisentraut pete...@gmx.net wrote: Nonetheless, it would be a good idea to prune the TODO list regularly, such as after a release. We used to do that a bit, not so much lately, perhaps. But everyone is invited to contribute to that. The idea is to remove contentious issues from the list, to avoid the waste of time. The thing is, a lot of stuff gets punted to the TODO list *because* it's contentious, ie there's not consensus on what to do. If there were consensus we might've just done it already. I'm not sure we want to remove such entries, though perhaps somehow marking them as debatable would be a good thing. There may well be stuff on the list that is no longer very relevant in today's world, but somebody would have to go through it item by item to decide which ones those are. I'm not volunteering. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Jun21, 2012, at 13:41 , Andres Freund wrote: 5.) The actually good idea. Yours? What about a mixture of (3b) and (4), which writes the data not to the WAL but to a separate logical replication log. More specifically: There's a per-backend queue of change notifications. Whenever a non-catalog tuple is modified, we queue a TUPLE_MODIFIED record containing (xid, databaseoid, tableoid, old xmin, old ctid, new ctid). Whenever a table (or something that a table depends on) is modified, we wait until all references to that table's oid have vanished from the queue, then queue a DDL record containing (xid, databaseoid, tableoid, text). Other backends cannot concurrently add further TUPLE_MODIFIED records since we already hold an exclusive lock on the table at that point. A background process continually processes these queues. If the front of the queue is a TUPLE_MODIFIED record, it fetches the old and the new tuple based on their ctids and writes the old tuple's PK and the full new tuple to the logical replication log. Since table modifications always wait for all previously queued TUPLE_MODIFIED records referencing that table to be processed *before* altering the catalog, tuples can always be interpreted according to the current (SnapshotNow) catalog contents. Upon transaction COMMIT and ROLLBACK, we queue COMMIT and ROLLBACK records, which are also written to the log by the background process. The background process may decide to wait until a backend commits before processing that backend's queue. In that case, rolled-back transactions don't leave a trace in the logical replication log. Should a backend, however, issue a DDL statement, the background process *must* process that backend's queue immediately, since otherwise there's a deadlock. The background process also maintains a value in shared memory which contains the oldest value in any of the queues' xid or old xmin fields. VACUUM and the like must not remove tuples whose xmin is >= that value. Hint bits *may* be set for newer tuples though, provided that the background process ignores hint bits when fetching the old and new tuples. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
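[In code terms, the queue entries described above might look like this (a sketch of the proposal, not existing code):

    typedef enum QueuedRecKind
    {
        REC_TUPLE_MODIFIED,
        REC_DDL,
        REC_COMMIT,
        REC_ROLLBACK
    } QueuedRecKind;

    typedef struct QueuedRec
    {
        QueuedRecKind kind;
        TransactionId xid;
        Oid           databaseoid;
        Oid           tableoid;

        /* REC_TUPLE_MODIFIED: enough to re-fetch both tuple versions later */
        TransactionId old_xmin;
        ItemPointerData old_ctid;
        ItemPointerData new_ctid;

        /* REC_DDL: the statement text */
        char         *ddl_text;
    } QueuedRec;

The background process resolves old_ctid/new_ctid back to tuples (ignoring hint bits, per the last paragraph) and writes the old tuple's PK plus the full new tuple to the logical replication log.]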
[HACKERS] Reseting undo/redo logs
I have this issue on Greenplum, which is an MPP hybrid built from postgres 8.2, and the issue I am seeing is 100% from pg code. One of the Greenplum segments went down and it cannot recover: PANIC XX000 invalid redo/undo record in shutdown checkpoint (xlog.c:6576). I am posting this question here because most casual users of Postgres/Greenplum are telling me that the database is hosed, but I think that with pg_resetxlog (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) and some data loss I could at least hack the database to come back up. What I am asking for help with here is calculating the reset values - where to find the most recent valid values and how to *specifically* calculate the new ones. Please advise, Edmon

2012-06-12 13:16:18.614912 EDT p14611 th802662304 0 seg-1 LOG 0 mirror transition, primary address(port) 'boxgp10a(41001)' mirror address(port) 'boxgp02a(51001)' mirroring role 'primary role' mirroring state 'change tracking' segment state 'not initialized' process name(pid) 'filerep main process(14611)' filerep state 'not initialized' 0 cdbfilerep.c 3371
2012-06-12 13:16:18.617047 EDT p14612 th802662304 0 seg-1 LOG 0 CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToInsync() found insync_transition_completed:'false' full resync:'false' 0 cdbresynchronizechangetracking.c 2522
2012-06-12 13:16:18.617113 EDT p14612 th802662304 0 seg-1 LOG 0 CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToResync() found resync_transition_completed:'false' full resync:'false' 0 cdbresynchronizechangetracking.c 2559
2012-06-12 13:16:18.746870 EDT p14612 th802662304 0 seg-1 LOG 0 searching for last checkpoint location for creating the initial resynchronize changetracking 0 xlog.c 10836
2012-06-12 13:16:18.747318 EDT p14612 th802662304 0 seg-1 LOG 0 record with zero length at 14/4870 0 xlog.c 4182
2012-06-12 13:16:18.747491 EDT p14612 th802662304 0 seg-1 LOG 0 scanned through 1 initial xlog records since last checkpoint for writing into the resynchronize change log 0 cdbresynchronizechangetracking.c 206
2012-06-12 13:16:18.750830 EDT p14624 th802662304 0 seg-1 LOG 0 database system was shut down at 2012-06-12 11:00:13 EDT 0 xlog.c 6326
2012-06-12 13:16:18.750987 EDT p14624 th802662304 0 seg-1 LOG 0 checkpoint record is at 14/4820 0 xlog.c 6425
2012-06-12 13:16:18.751016 EDT p14624 th802662304 0 seg-1 LOG 0 redo record is at 14/4820; undo record is at 14/42AC2118; shutdown TRUE 0 xlog.c 6534
2012-06-12 13:16:18.751041 EDT p14624 th802662304 0 seg-1 LOG 0 next transaction ID: 0/4553423; next OID: 241771 0 xlog.c 6538
2012-06-12 13:16:18.751065 EDT p14624 th802662304 0 seg-1 LOG 0 next MultiXactId: 271; next MultiXactOffset: 549 0 xlog.c 6541
2012-06-12 13:16:18.796637 EDT p14624 th802662304 0 seg-1 PANIC XX000 invalid redo/undo record in shutdown checkpoint
Re: [HACKERS] Release versioning inconsistency
On Wed, Jun 20, 2012 at 5:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On ons, 2012-06-20 at 13:26 +0200, Magnus Hagander wrote: That might actually be a good idea. We can't really change the way we named the betas, but it's not too late to consider naming the actual release as 9.2.0... The final release was always going to be called 9.2.0, but naming the beta 9.2.0betaX is wrong. There was a previous discussion about that particular point. Yes. There is no reason to change the naming scheme we have been using for years now (at least since version_stamp.pl was invented for 7.4). The only problem is that somebody got the name of the directory wrong on the FTP server. If that wasn't clear, then yes, that was me. I don't recall the reason why using 9.2.0betaX was actually wrong - I realize that's not the name of the version, so thereby the directory was wrong. But in what way would it be wrong to call the version that? Given that it would help with sorting. (And yes, this is very much a forward-looking question, more about 9.3, since we can't really go back and change the current filename.) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Release versioning inconsistency
On Wed, Jun 20, 2012 at 1:35 PM, Dickson S. Guedes lis...@guedesoft.net wrote: 2012/6/20 Magnus Hagander mag...@hagander.net: On Wed, Jun 20, 2012 at 11:23 AM, Marti Raudsepp ma...@juffo.org wrote: On Wed, Jun 20, 2012 at 12:18 PM, Magnus Hagander mag...@hagander.net wrote: (I do believe that using the v9.2.0beta marker is *better*, because then it sorts properly. But likely not enough much better to be inconsistent with previous versions) Good point. Maybe that's a reason to change the versioning scheme and stick with 9.2.0betaX everywhere. Including calling the final release 9.2.0 instead of simply 9.2? That might actually be a good idea. We can't really change the way we named the betas, but it's not too late to consider naming the actual release as 9.2.0... May be a symlink could be created just do fit the same pattern that other versions do and keeps the actual links (for beta) working. That we can do - in fact, done. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pruning the TODO list
On 21 June 2012 15:00, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 21 June 2012 08:30, Peter Eisentraut pete...@gmx.net wrote: Nonetheless, it would be a good idea to prune the TODO list regularly, such as after a release. We used to do that a bit, not so much lately, perhaps. But everyone is invited to contribute to that. The idea is to remove contentious issues from the list, to avoid the waste of time. The thing is, a lot of stuff gets punted to the TODO list *because* it's contentious, ie there's not consensus on what to do. If there were consensus we might've just done it already. I'm not sure we want to remove such entries, though perhaps somehow marking them as debatable would be a good thing. There may well be stuff on the list that is no longer very relevant in today's world, but somebody would have to go through it item by item to decide which ones those are. I'm not volunteering. *smiles* Understood. I'll have a play. Maybe I should just go with the idea of Simon's TODO List - stuff I personally think is worth working on, and leave it at that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
On Wed, Jun 20, 2012 at 1:15 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012: Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr and SvPVUTF8() when turning a perl string into a cstring. Right. So I played a bit with this patch, and touched it a bit mainly just to add some more comments; and while at it I noticed that some of the functions in Util.xs might leak some memory, so I made an attempt to plug them, as in the attached patch (which supersedes yours). I think most of these leaks go back to 9.0. Dunno if it's worth backpatching them... to test the problem in the original report, I notice that we now have a regression failure: I'm not really sure what to do here -- maybe having a second expected file for that test is a good enough answer? Or should I just take the test out? Opinions please. I think we have broken that check twice, so it seems like it would be nice to keep it. But I don't feel *too* strongly about it. The comment and cleanups all look good to me. Thanks! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On 21 June 2012 12:41, Andres Freund and...@2ndquadrant.com wrote: 3) Multi-Versioned catalog Below are two possible implementation strategies for that concept. Advantages: * Decoding is done on the master in an asynchronous fashion * low overhead during normal DML execution, not much additional code in that path * can be very efficient if architecture/version are the same * version/architecture compatibility can be done transparently by falling back to textual versions on mismatch Disadvantages: * decoding probably has to happen on the master which might not be what people want performancewise 3a) Change the system catalogs to be versioned Advantages: * catalog access is easy * might be interesting for other users Disadvantages: * catalog versioning is complex to implement * space overhead for all users, even without using logical replication * I can't see -hackers signing off Hmm, there's all sorts of stuff mixed up there in your description. ISTM we should maintain a lookup table on target system that has the minimal required information on it. There is no need to version the whole catalog. (Complete overkill - I would oppose it ;-) If we keep the lookup table on the target as a normal table, we can insert new rows into it as changes occur. If we need to perform recovery then the earlier version rows will still be there and we just use those. Versioning is easy to implement, just use LSN as an additional key in the table. Then look up based on key and LSN. If a transaction that makes DDL changes aborts, then the changes will be automatically backed out. Only keep the lookup table if using logical replication, so zero overhead otherwise. We just need to set up the initial state carefully, so it matches what's in the database, but that sounds OK. So I don't see any of the disadvantages you have there. It's just darn simple, and hence will probably work. It's also a very similar solution to the other lookups required in memory by the apply process. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
Hi, On Thursday, June 21, 2012 04:39:21 PM Simon Riggs wrote: On 21 June 2012 12:41, Andres Freund and...@2ndquadrant.com wrote: 3) Multi-Versioned catalog Below are two possible implementation strategies for that concept. Advantages: * Decoding is done on the master in an asynchronous fashion * low overhead during normal DML execution, not much additional code in that path * can be very efficient if architecture/version are the same * version/architecture compatibility can be done transparently by falling back to textual versions on mismatch Disadvantages: * decoding probably has to happen on the master which might not be what people want performancewise 3a) Change the system catalogs to be versioned Advantages: * catalog access is easy * might be interesting for other users Disadvantages: * catalog versioning is complex to implement * space overhead for all users, even without using logical replication * I can't see -hackers signing off Hmm, there's all sorts of stuff mixed up there in your description. Sure, I tried to compress a complex topic discussed in a long thread ;) ISTM we should maintain a lookup table on target system that has the minimal required information on it. You need just about the whole catalog because the *_out procs might need to look up types, operators and such again. Unless you want to rewrite those functions you need to provide a normal execution environment. I don't see how your idea works because of that? Am I missing something? Yes, that would be easier if we didn't want to support conversion to text and similar, but I don't see that flying. And even if it would be acceptable you would need to have enough information to construct a btree ScanKey, which means you already need a lot of the catalogs. There is no need to version the whole catalog. (Complete overkill - I would oppose it ;-) Hey, that originally was your idea :P. But I definitely agree, it's not a good idea. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
On Thu, Jun 21, 2012 at 5:22 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote:

So I played a bit with this patch, and touched it a bit mainly [...] functions in Util.xs might leak some memory, so I made an attempt to

OK. Is it OK to look into the newer patch, including the fix for the leaks, first?

Yeah :-).

Variable naming in util_quote_*() seems a bit confusing. Renaming ret to quoted and str to ret as in the attached patch might make it more readable.

Ok.

[ regression failure on a SQL_ASCII database ] I'm not really sure what to do here -- maybe a second expected file for that test is a good enough answer? Or should I just take the test out? Opinions please.

The attached ugly patch does it. It seems we should put NO_LOCALE=1 on the 'make check' command line for encodings not compatible with the environment locale, although it seems to work.

+REGRESS_LC0 = $(subst .sql,,$(shell cd sql; ls plperl_lc_$(shell echo $(ENCODING) | tr A-Z- a-z_).sql 2>/dev/null))
+REGRESS_LC = $(if $(REGRESS_LC0),$(REGRESS_LC0),plperl_lc)
+REGRESS = plperl $(REGRESS_LC) plperl_trigger plperl_shared plperl_elog plperl_util plperl_init plperlu plperl_array

Hrm, that's quite cute. I dunno if there is a more canonical way of doing the above -- but it seems to work. I'm not sure this regression test is worth it. I'm thinking maybe we should just remove the regression test instead.

There is a minor issue with the patch where sql/plperl_lc_sql_ascii.sql contains the text plperl_lc.sql. After copying sql/plperl_lc.sql to sql/plperl_lc_sql_ascii.sql everything worked as described.

Thanks for the review!

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On 21 June 2012 15:53, Andres Freund and...@2ndquadrant.com wrote: ISTM we should maintain a lookup table on the target system that has the minimal required information on it. You need just about the whole catalog because the *_out procs might need to look up types, operators and such again. Unless you want to rewrite those functions you need to provide a normal execution environment.

OK, so it's more tables than I first thought, but it's not all rows and columns of all catalog tables.

I don't see how your idea works because of that? Am I missing something?

Why does the number/size of the tables required make that not work?

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Thursday, June 21, 2012 05:05:04 PM Simon Riggs wrote: On 21 June 2012 15:53, Andres Freund and...@2ndquadrant.com wrote: ISTM we should maintain a lookup table on the target system that has the minimal required information on it. You need just about the whole catalog because the *_out procs might need to look up types, operators and such again. Unless you want to rewrite those functions you need to provide a normal execution environment. OK, so it's more tables than I first thought, but it's not all rows and columns of all catalog tables.

Sure, there are a few you probably can leave out (pg_database, pg_auth*, pg_*acl, pg_(sh)?depend, pg_tablespace, ...) but it's not many.

I don't see how your idea works because of that? Am I missing something? Why does the number/size of the tables required make that not work?

The number of tables itself isn't a fundamental problem although it would make stuff harder. The problem is that the out functions expect a normal operating environment and might e.g. do catalog lookups themselves. I don't see how we can do anything here without providing a (nearly) full catalog.

Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On 21 June 2012 16:13, Andres Freund and...@2ndquadrant.com wrote: On Thursday, June 21, 2012 05:05:04 PM Simon Riggs wrote: On 21 June 2012 15:53, Andres Freund and...@2ndquadrant.com wrote: ISTM we should maintain a lookup table on the target system that has the minimal required information on it. You need just about the whole catalog because the *_out procs might need to look up types, operators and such again. Unless you want to rewrite those functions you need to provide a normal execution environment. OK, so it's more tables than I first thought, but it's not all rows and columns of all catalog tables. Sure, there are a few you probably can leave out (pg_database, pg_auth*, pg_*acl, pg_(sh)?depend, pg_tablespace, ...) but it's not many.

That's a start. Leaving out the shared catalogs makes me smile already.

I don't see how your idea works because of that? Am I missing something? Why does the number/size of the tables required make that not work? The number of tables itself isn't a fundamental problem although it would make stuff harder. The problem is that the out functions expect a normal operating environment and might e.g. do catalog lookups themselves. I don't see how we can do anything here without providing a (nearly) full catalog.

I accept that there could be pathological functions in there. We're not trying to make it work with any conceivable datatype/operator, so forcing logical replication to follow sensible rules makes sense. Are there any out functions that anybody uses that do that?

It's too much change to actually version the main catalog. Keeping a separate copy of a versioned catalog for use by replication sounds much more likely to fly.

In any case, I think we'll have to go back through the list and do more work on evaluation. When the options look like that, it's typical to have ruled out the final winner early on, but that doesn't mean it isn't in there somewhere.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Thursday, June 21, 2012 05:25:41 PM Simon Riggs wrote: On 21 June 2012 16:13, Andres Freund and...@2ndquadrant.com wrote: On Thursday, June 21, 2012 05:05:04 PM Simon Riggs wrote: On 21 June 2012 15:53, Andres Freund and...@2ndquadrant.com wrote: ISTM we should maintain a lookup table on the target system that has the minimal required information on it. You need just about the whole catalog because the *_out procs might need to look up types, operators and such again. Unless you want to rewrite those functions you need to provide a normal execution environment. OK, so it's more tables than I first thought, but it's not all rows and columns of all catalog tables. Sure, there are a few you probably can leave out (pg_database, pg_auth*, pg_*acl, pg_(sh)?depend, pg_tablespace, ...) but it's not many. That's a start. Leaving out the shared catalogs makes me smile already. I don't see how your idea works because of that? Am I missing something? Why does the number/size of the tables required make that not work? The number of tables itself isn't a fundamental problem although it would make stuff harder. The problem is that the out functions expect a normal operating environment and might e.g. do catalog lookups themselves. I don't see how we can do anything here without providing a (nearly) full catalog. I accept that there could be pathological functions in there. We're not trying to make it work with any conceivable datatype/operator, so forcing logical replication to follow sensible rules makes sense. Are there any out functions that anybody uses that do that?

Loads. enum_out, record_out, array_out are examples I can think of without even looking. I am pretty sure there are more. But imo this list already shows it's prohibitive.

It's too much change to actually version the main catalog. Keeping a separate copy of a versioned catalog for use by replication sounds much more likely to fly.

I don't yet see how that should work given oids and everything are quite possibly hardcoded in those functions. You could start switching out the catalogs on a lower level but I think at that point it's getting too ugly.

In any case, I think we'll have to go back through the list and do more work on evaluation. When the options look like that, it's typical to have ruled out the final winner early on, but that doesn't mean it isn't in there somewhere.

I hope we have but I am not convinced that there is an elegant solution...

Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
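Andres's point is easy to see with enum_out alone: the datum stored in a tuple is just an OID, and producing the textual label requires a catalog lookup. A self-contained illustration:

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TABLE person (name text, current_mood mood);
INSERT INTO person VALUES ('Alice', 'happy');

-- The stored datum for current_mood is only the enum value's OID;
-- enum_out must consult pg_enum to turn it back into 'happy'.
SELECT enumlabel
  FROM pg_enum
 WHERE enumtypid = 'mood'::regtype;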
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Thursday, June 21, 2012 03:56:54 PM Florian Pflug wrote: On Jun21, 2012, at 13:41 , Andres Freund wrote: 3b) Ensure that enough information in the catalog remains by fudging the xmin horizon. Then reassemble an appropriate snapshot to read the catalog as the tuple in question has seen it. The ComboCID machinery makes that quite a bit harder, I fear. If a tuple is updated multiple times by the same transaction, you cannot decide whether a tuple was visible in a certain snapshot unless you have access to the updating backend's ComboCID hash.

That's a very good point. Not sure how I forgot that. I think it might be possible to reconstruct a sensible ComboCID mapping from the WAL stream. Let me think about it for a while...

Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] COMMUTATOR doesn't seem to work
Maybe I am using it wrong but I get no error message when I use it. I have a type called chkpass (a version is in the additional supplied modules) and I create the equality operator like this:

CREATE OPERATOR = (
    PROCEDURE = eq,
    LEFTARG = chkpass,
    RIGHTARG = text,
    COMMUTATOR = =,
    NEGATOR = <>
);

This works:

cosmostest=# SELECT 'aaa'::chkpass = 'aaa';
 ?column?
----------
 t
(1 row)

But...

cosmostest=# SELECT 'aaa' = 'aaa'::chkpass;
ERROR:  operator is only a shell: text = chkpass
LINE 1: SELECT 'aaa' = 'aaa'::chkpass;

When I look at the operators I see why, sort of...

 public | = | chkpass | text    | boolean |
 public | = | text    | chkpass | -       |

So while it created the operator it didn't set a return type. I don't know if this is a new issue or I simply got lucky and never tried the opposite test before, but I see this in 9.0.4 and 9.1.3. Am I using the command improperly?

-- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. IM: da...@vex.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Thursday, June 21, 2012 04:05:54 PM Florian Pflug wrote: On Jun21, 2012, at 13:41 , Andres Freund wrote: 5.) The actually good idea. Yours?

What about a mixture of (3b) and (4), which writes the data not to the WAL but to a separate logical replication log. More specifically:

There's a per-backend queue of change notifications. Whenever a non-catalog tuple is modified, we queue a TUPLE_MODIFIED record containing (xid, databaseoid, tableoid, old xmin, old ctid, new ctid). Whenever a table (or something that a table depends on) is modified we wait until all references to that table's oid have vanished from the queue, then queue a DDL record containing (xid, databaseoid, tableoid, text). Other backends cannot concurrently add further TUPLE_MODIFIED records since we already hold an exclusive lock on the table at that point.

A background process continually processes these queues. If the front of the queue is a TUPLE_MODIFIED record, it fetches the old and the new tuple based on their ctids and writes the old tuple's PK and the full new tuple to the logical replication log. Since table modifications always wait for all previously queued TUPLE_MODIFIED records referencing that table to be processed *before* altering the catalog, tuples can always be interpreted according to the current (SnapshotNow) catalog contents.

Upon transaction COMMIT and ROLLBACK, we queue COMMIT and ROLLBACK records, which are also written to the log by the background process. The background process may decide to wait until a backend commits before processing that backend's log. In that case, rolled-back transactions don't leave a trace in the logical replication log. Should a backend, however, issue a DDL statement, the background process *must* process that backend's queue immediately, since otherwise there's a deadlock.

The background process also maintains a value in shared memory which contains the oldest value in any of the queues' xid or old xmin fields. VACUUM and the like must not remove tuples whose xmin is >= that value. Hint bits *may* be set for newer tuples though, provided that the background process ignores hint bits when fetching the old and new tuples.

I think that's too complicated to fly. Getting that to recover cleanly in case of crash would mean you'd need another WAL. I think if it comes to that, going for 1) is more realistic...

Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reseting undo/redo logs
Edmon Begoli ebeg...@gmail.com writes: One of the Greenplum segments went down and it cannot recover because PANIC XX000 invalid redo/undo record in shutdown checkpoint (xlog.c:6576) I am posting this question here because most casual users of Postgres/Greenplum are telling me that the database is hosed, but I think that with pg_resetxlog and some (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) data loss I could at least hack the database to come back up. What I am asking for help here is to help me calculate the reset values - where to find the most recent valid one and how to *specifically* calculate the reset ones.

pg_controldata should give you useful starting points. I don't think we can offer any more help than what is on the pg_resetxlog reference page as to what to do with them. (Though you might try reading the more recent releases' versions of that page to see if anything's been clarified.)

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reseting undo/redo logs
Thanks. I was going down this route, so just your confirmation that this is the right path is helpful. Edmon On Thu, Jun 21, 2012 at 11:58 AM, Tom Lane t...@sss.pgh.pa.us wrote: Edmon Begoli ebeg...@gmail.com writes: One of the Greenplum segments went down and it cannot recover because PANIC XX000 invalid redo/undo record in shutdown checkpoint (xlog.c:6576) I am posting this question here because most casual users of Postgres/Greenplum are telling me that database is hosed, but I think that with pg_resetxlog and some (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) data loss I could at least hack database to come back up. What I am asking for help here is to help me calculate the reset values - where to find the most recent valid one and how to *specifically* calculate the reset ones. pg_controldata should give you useful starting points. I don't think we can offer any more help than what is on the pg_resetxlog reference page as to what to do with them. (Though you might try reading the more recent releases' versions of that page to see if anything's been clarified.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
D'Arcy Cain da...@druid.net writes: Maybe I am using it wrong but I get no error message when I use it. I have a type called chkpass (a version is in the additional supplied modules) and I create the equality operator like this: CREATE OPERATOR = ( PROCEDURE = eq, LEFTARG = chkpass, RIGHTARG = text, COMMUTATOR = =, NEGATOR = ); Did you actually create a text = chkpass function and operator? This declaration merely promises that you will provide one eventually. The system does not have the ability to make one for you. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
On Jun21, 2012, at 17:46 , D'Arcy Cain wrote: Maybe I am using it wrong but I get no error message when I use it. I have a type called chkpass (a version is in the additional supplied modules) and I create the equality operator like this: ... So while it created the operator it didn't set a return type. I don't know if this is a new issue or I simply got lucky and never tried the opposite test before but I see this in 9.0.4 and 9.1.3. Am I using the command improperly?

COMMUTATOR (and also NEGATOR) only inform the planner/optimizer about the relationship between operators, but you still have to create all of them manually. What you see is the placeholder for that to-be-created operator that CREATE OPERATOR fabricated. It does that because if the COMMUTATOR or NEGATOR were required to already exist, how would you ever be able to create a pair of commuting operators? If you later use CREATE OPERATOR to actually create the COMMUTATOR or NEGATOR you specified previously, it'll simply complete the previously created placeholder.

Long story short, you need another CREATE OPERATOR command for the COMMUTATOR (with the argument types swapped), and it should in turn name the original operator as its COMMUTATOR.

best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
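To complete D'Arcy's example along the lines Tom and Florian describe, a second function and operator with the argument types swapped are needed; a sketch follows (the eq_tc function name and its SQL body are assumptions, not part of the chkpass module):

-- Commuted comparison, defined here in terms of the existing
-- chkpass = text operator:
CREATE FUNCTION eq_tc(text, chkpass) RETURNS boolean
    AS 'SELECT $2 = $1'
    LANGUAGE sql IMMUTABLE STRICT;

-- This CREATE OPERATOR fills in the shell left by the first one,
-- and each operator names the other as its COMMUTATOR.
CREATE OPERATOR = (
    PROCEDURE = eq_tc,
    LEFTARG   = text,
    RIGHTARG  = chkpass,
    COMMUTATOR = =,
    NEGATOR    = <>
);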
Re: [HACKERS] Release versioning inconsistency
2012/6/21 Magnus Hagander mag...@hagander.net: On Wed, Jun 20, 2012 at 1:35 PM, Dickson S. Guedes lis...@guedesoft.net wrote: 2012/6/20 Magnus Hagander mag...@hagander.net: On Wed, Jun 20, 2012 at 11:23 AM, Marti Raudsepp ma...@juffo.org wrote: On Wed, Jun 20, 2012 at 12:18 PM, Magnus Hagander mag...@hagander.net wrote: (I do believe that using the v9.2.0beta marker is *better*, because then it sorts properly. But likely not enough better to justify being inconsistent with previous versions) Good point. Maybe that's a reason to change the versioning scheme and stick with 9.2.0betaX everywhere. Including calling the final release 9.2.0 instead of simply 9.2? That might actually be a good idea. We can't really change the way we named the betas, but it's not too late to consider naming the actual release as 9.2.0... Maybe a symlink could be created just to fit the same pattern that other versions do and keep the actual links (for beta) working. That we can do - in fact, done.

It works fine here, thanks!

[]s -- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Release versioning inconsistency
On tor, 2012-06-21 at 16:19 +0200, Magnus Hagander wrote: On Wed, Jun 20, 2012 at 1:35 PM, Dickson S. Guedes lis...@guedesoft.net wrote: 2012/6/20 Magnus Hagander mag...@hagander.net: On Wed, Jun 20, 2012 at 11:23 AM, Marti Raudsepp ma...@juffo.org wrote: On Wed, Jun 20, 2012 at 12:18 PM, Magnus Hagander mag...@hagander.net wrote: (I do believe that using the v9.2.0beta marker is *better*, because then it sorts properly. But likely not enough better to justify being inconsistent with previous versions) Good point. Maybe that's a reason to change the versioning scheme and stick with 9.2.0betaX everywhere. Including calling the final release 9.2.0 instead of simply 9.2? That might actually be a good idea. We can't really change the way we named the betas, but it's not too late to consider naming the actual release as 9.2.0... Maybe a symlink could be created just to fit the same pattern that other versions do and keep the actual links (for beta) working. That we can do - in fact, done.

Why not just move the whole thing and not increase the confusion? The press releases don't refer to the directory directly.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Release versioning inconsistency
On tor, 2012-06-21 at 16:17 +0200, Magnus Hagander wrote: I don't recall the reason why using 9.2.0betax was actually wrong - I realize that's not the name of the version, so thereby the directory was wrong. But in what way would it be wrong to call the version that?

It's not the beta for 9.2.0, it's the beta for the 9.2 series. There is no 9.2.1betaX, after all.

Given that it would help with sorting.

How does it help? 9.2.0, 9.2.0betaX, 9.2.1, ... ???

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
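For what it's worth, the byte-wise ordering of the two schemes (roughly what a directory listing produces) can be checked directly:

SELECT v FROM unnest(ARRAY['9.2', '9.2.1', '9.2beta1']) AS v
ORDER BY v COLLATE "C";
-- 9.2, 9.2.1, 9.2beta1       (betas sort after every release)

SELECT v FROM unnest(ARRAY['9.2.0', '9.2.0beta1', '9.2.1']) AS v
ORDER BY v COLLATE "C";
-- 9.2.0, 9.2.0beta1, 9.2.1   (betas sort between .0 and .1)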
Re: [HACKERS] Btree or not btree? That is the question
ERROR: index pg_class_oid_index is not a btree That means you got bogus data while reading the metapage. I'm beginning to wonder about the hardware on this server ...

This happened again, and this time I went back through the logs and found that it is always the exact same query causing the issue. I also found it occurring on different servers, which rules out RAM anyway (still shared disk, so those are suspect). This query also sometimes gives errors like this:

ERROR: could not read block 3 of relation 1663/1554846571/3925298284: read only 0 of 8192 bytes

However, the final number changes: these are invariably temporary relations. The query itself is a GROUP BY over a large view and the explain plan is 107 rows, with nothing esoteric about it. Most of the tables used are fairly common ones. I'm trying to duplicate on a non-production box, without success so far, and I'm loath to run it on production as it sometimes causes multiple backends to freeze up and requires a forceful restart.

Any ideas on how to carefully debug this? There are a couple of quicksorts when I explain analyze on a non-prod system, which I am guessing is where the temp tables come from (work_mem is 24MB). I'm not sure I understand what could be causing both the 'read 0' and btree errors for the same query - bad blocks on disk for one of the underlying tables? I'll work next on checking each of the tables the view is using.

-- Greg Sabino Mullane g...@endpoint.com End Point Corporation PGP Key: 0x14964AC8
[HACKERS] pg_dump and dependencies and --section ... it's a mess
Don't know if everybody on this list has been paying attention to the pgsql-bugs thread about bug #6699. The shortest example of the problem is

create table t1 (f1 int primary key, f2 text);
create view v1 as select f1, f2 from t1 group by f1;

The view's query is legal only because of the primary key on f1, else the reference to f2 would be considered inadequately grouped. So when we create the view we mark it as dependent on that primary key (or more accurately, the view's _RETURN rule is so marked). This all works fine so far as the backend is concerned: you can't drop the PK without dropping or redefining the view. But it gives pg_dump indigestion. If you do pg_dump -Fc | pg_restore -l -v you see (relevant entries only):

5; 2615 2200 SCHEMA - public postgres                   PRE_DATA
168; 1259 41968 TABLE public t1 postgres                PRE_DATA
;    depends on: 5
1921; 2606 41975 CONSTRAINT public t1_pkey postgres     POST_DATA
;    depends on: 168 168
169; 1259 41976 VIEW public v1 postgres                 PRE_DATA
;    depends on: 1919 5
1922; 0 41968 TABLE DATA public t1 postgres             DATA
;    depends on: 168

Now, there are two not-nice things about this. First, the view is shown as depending on object 1919, which is pg_dump's internal DumpId for the view's _RETURN rule --- but that's nowhere to be seen in the dump, so the fact that it's dependent on the t1_pkey constraint is not visible in the dump. This puts parallel pg_restore at risk of doing the wrong thing. Second, because view definitions are preferentially dumped in the PRE_DATA section, the t1_pkey constraint has been hoisted up to come before the table data. That means that in an ordinary serial restore, the index will be created before the table's data is loaded, which is undesirable for efficiency reasons.

It gets worse though. I've labeled the above archive items with their SECTION codes (which for some reason pg_restore -l -v doesn't print) so that you can see that we've got a POST_DATA item that must come before a PRE_DATA item. This wreaks havoc with the entire concept of splitting dump files into three sections. When I first realized that, I was thinking that we would have to revert the --section flag we added to pg_dump/pg_restore in 9.2, for lack of any way to guarantee that the items can actually be restored if they are split according to sections.

I think that we can fix it though. There are provisions in pg_dump already for splitting a view apart from its _RETURN rule --- and if the rule comes out as a separate object, it's POST_DATA. So at least in principle, we could fix things by dumping this scenario this way:

SCHEMA public        PRE_DATA
TABLE t1             PRE_DATA
TABLE v1             PRE_DATA   (at this point it's a table not a view)
TABLE DATA t1        DATA
CONSTRAINT t1_pkey   POST_DATA
RULE for v1          POST_DATA  (adding the rule turns v1 into a view)

The problem is how to persuade pg_dump to do that; as the code stands, it will only break a view apart from its rule if that's necessary to break a circular dependency loop, and there is none in this example.

Another point worth thinking about is that if --section is to be trusted at all, we need some way of guaranteeing that the dependency-sorting code won't happily create other similar cases. There is nothing in there right now that would think anything is wrong with an ordering that breaks the section division. I believe the right fix for both of these issues is to add knowledge of the section concept to the topological sort logic, so that an ordering that puts POST_DATA before DATA or PRE_DATA after DATA is considered to be a dependency-ordering violation.
One way to do that is to add dummy fencepost objects to the sort, representing the start and end of the DATA section. However, these objects would need explicit dependency links to every other DumpableObject, so that doesn't sound very good from a performance standpoint. What I'm going to go look at is whether we can mark DumpableObjects with their SECTION codes at creation time (rather than adding that information at ArchiveEntry() time) and then have the topo sort logic take that marking into account in addition to the explicit dependency links.

Assuming we can make that work, how far should it be back-patched? The --section issue is new in 9.2, but this would also take care of the restore problems for views with constraint dependencies, which are allowed as of 9.1, so I'm thinking we'd better put it into 9.1.

The other problem is the bogus dependency IDs that pg_dump emits by virtue of not caring whether the dependency links to an object it's actually dumped. I posted a patch for that in the pgsql-bugs thread but pointed out that it would not work before 9.2 without additional surgery. If we fix the view vs constraint issue by adding section knowledge to the sort, I think we can maybe get away with not fixing the dependency IDs in back branches.

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
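The dependency Tom describes is easy to reproduce from psql (error texts abbreviated):

CREATE TABLE t1 (f1 int, f2 text);
CREATE VIEW v1 AS SELECT f1, f2 FROM t1 GROUP BY f1;
-- ERROR:  column "t1.f2" must appear in the GROUP BY clause
--         or be used in an aggregate function

ALTER TABLE t1 ADD PRIMARY KEY (f1);
CREATE VIEW v1 AS SELECT f1, f2 FROM t1 GROUP BY f1;  -- now legal (9.1+)

ALTER TABLE t1 DROP CONSTRAINT t1_pkey;
-- ERROR:  cannot drop constraint t1_pkey on table t1
--         because other objects depend on it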
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
On Sun, Jun 10, 2012 at 4:15 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2012/6/8 Simon Riggs si...@2ndquadrant.com: I have a prototype that has some of these characteristics, so I see our work as complementary. At present, I don't think this patch would be committable in CF1, but I'd like to make faster progress with it than that. Do you want to work on this more, or would you like me to merge our prototypes into a more likely candidate? I'm not in favor of duplicating similar efforts. If available, could you merge some ideas from my patch into your prototypes?

So, are we waiting for a new patch? Is it coming from Simon or Kohei?

-- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Btree or not btree? That is the question
Greg Sabino Mullane g...@endpoint.com writes: ERROR: index pg_class_oid_index is not a btree That means you got bogus data while reading the metapage. I'm beginning to wonder about the hardware on this server ... This happened again, and this time I went back through the logs and found that it is always the exact same query causing the issue. I also found it occuring on different servers, which rules out RAM anyway (still shared disk, so those are suspect). This query also sometimes gives errors like this: ERROR: could not read block 3 of relation 1663/1554846571/3925298284: read only 0 of 8192 bytes However, the final number changes: these are invariably temporary relations. Oh really ... okay, maybe it is a software problem then. Any ideas on how to carefully debug this? There are a couple of quicksorts when I explain analyze on a non-prod system, which I am guessing where the temp tables come from (work_mem is 24MB). No, that error message is complaining about an attempt to read an actual, named, relation (could be temp, no way to be sure from this info). A sort might generate temp files but those don't have that kind of name. I'm not sure I understand what could be causing both the 'read 0' and btree errors for the same query - bad blocks on disk for one of the underlying tables? The could not read thing looks more like an attempt to fetch an invalid TID; you could get such an error for instance if you had an index that referenced a tuple in block 3, but the table on disk isn't that long. So one possible theory for what's happening here is that once in a while we get confused about which shared buffer holds which disk block, and either find the wrong block entirely when looking for pg_class_oid_index (the first case) or grab a page of the wrong index in the second case. This theory would be more plausible if you're wrong about the second-case tables being temp, though, because if they are temp then their indexes would be kept in local buffers not shared buffers, making it much harder to believe in a single bug causing both symptoms. One possible mechanism for confusion of that sort would be if the spinlock code wasn't quite right, or the compiler was incorrectly moving loads/stores into or out of locked sections. So it might be time to ask exactly what kind of hardware this is, which compiler PG was built with, etc. On the other hand, if the issue were of that sort then it ought to affect all buffers more or less at random; so if you're consistently seeing exactly these symptoms (in particular, if it's always pg_class_oid_index that's complained of), then I'm not sure I believe this theory either. Which PG version again? Are you in the habit of doing VACUUM FULLs on system catalogs, and if so do these glitches correlate at all with such activities? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
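One way to start testing Tom's theory is to map the numbers in the error message back to objects; the path in the message is tablespace/database/relfilenode (1663 being the default tablespace), so, as a sketch:

SELECT datname FROM pg_database WHERE oid = 1554846571;

-- Does anything still own that file node?  No row back suggests the
-- relation is already gone, e.g. a dropped temp table.
SELECT oid::regclass AS rel, relkind
  FROM pg_class
 WHERE relfilenode = 3925298284;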
Re: [HACKERS] Incorrect behaviour when using a GiST index on points
On Wed, Feb 22, 2012 at 5:56 PM, Alexander Korotkov aekorot...@gmail.com wrote: Attached patch fixes GiST behaviour without altering operators behaviour.

I think we definitely should apply this patch before the 9.2 release, because it is a bug fix. Otherwise people will continue to produce incorrect GiST indexes with in-core geometrical opclasses until 9.3. The patch is very simple and only changes a few lines of code. Any thoughts?

-- With best regards, Alexander Korotkov.

gistproc_fix.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Excerpts from Kyotaro HORIGUCHI's message of jue jun 21 08:02:58 -0400 2012: Ouch! # UtfToLocal() seems to have a bug that always reports the illegal # encoding as UTF8 regardless of the real encoding. But # plperl_lc_*.(sql|out) increases if the bug is fixed. This is not a bug. The error message "invalid byte sequence for encoding UTF8" meant an invalid INPUT byte sequence in encoding UTF8, not the encoding for output. So the regtest patch covers all of the possible patterns - UTF8 and SQL_ASCII.

Right, that confused me too for a while.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Excerpts from Kyotaro HORIGUCHI's message of jue jun 21 07:22:43 -0400 2012:

Hello, Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr and SvPVUTF8() when turning a perl string into a cstring. Right. I spent a bit longer time catching up on pl/perl and now understand what the problem is...

Hi, Thanks for the review.

So I played a bit with this patch, and touched it a bit mainly just to add some more comments; and while at it I noticed that some of the functions in Util.xs might leak some memory, so I made an attempt to plug them, as in the attached patch (which supersedes yours). OK. Is it OK to look into the newer patch, including the fix for the leaks, first?

Yeah, I'm going to have to backpatch the whole of it so having a full review is good. Thanks.

-- The others

Variable naming in util_quote_*() seems a bit confusing,

text *arg = sv2text(sv);
text *ret = DatumGetTextP(..., PointerGetDatum(arg)));
char *str;
pfree(arg);
str = text_to_cstring(ret);
RETVAL = cstr2sv(str);
pfree(str);

Renaming ret to quoted and str to ret as in the attached patch might make it more readable.

I think I'm going to refrain from this because it will be more painful to backpatch.

Now, with my version of the patch applied and using a SQL_ASCII database to test the problem in the original report, I notice that we now have a regression failure: [snip] I'm not really sure what to do here -- maybe a second expected file for that test is a good enough answer? Or should I just take the test out? Opinions please. The attached ugly patch does it. It seems we should put NO_LOCALE=1 on the 'make check' command line for encodings not compatible with the environment locale, although it seems to work.

The idea of separating the test into its own file has its merit; but instead of having two different tests, I'm going to have a single test and two expected files. That seems simpler than messing around in the makefile.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Btree or not btree? That is the question
I dug through the logs and found some other occurrences of the could not read block errors. Some on dirt-simple SELECT queries. Nothing else has generated the btree error yet. About 35 found in the last month.

This theory would be more plausible if you're wrong about the second-case tables being temp, though, because if they are temp then their indexes would be kept in local buffers not shared buffers, making it much harder to believe in a single bug causing both symptoms.

I grepped the last month of logs and found about 20 instances of that error: none of the relfilenodes given shows up in pg_class, even for that dirt-simple SELECT.

One possible mechanism for confusion of that sort would be if the spinlock code wasn't quite right, or the compiler was incorrectly moving loads/stores into or out of locked sections. So it might be time to ask exactly what kind of hardware this is, which compiler PG was built with, etc.

Quad core AMD Opteron. RHEL. Compiled with gcc with all the options (basically the standard compilation e.g. --build=x86_64-redhat-linux-gnu). I can give you more details offlist if it will help.

On the other hand, if the issue were of that sort then it ought to affect all buffers more or less at random; so if you're consistently seeing exactly these symptoms (in particular, if it's always pg_class_oid_index that's complained of), then I'm not sure I believe this theory either.

I've never seen any other index for the btree error, but it has only happened a grand total of 3 times ever. The other error appears to be fairly random, except that the one particular query that gives the btree error always seems to give one version or the other.

Which PG version again? Are you in the habit of doing VACUUM FULLs on system catalogs, and if so do these glitches correlate at all with such activities?

Heh. 8.3.18. Yes, very heavy vac fulls (and reindexes) of the system catalogs. Cron-driven, and depends on the time of day and if any DDL is running (if so, it does not run), but probably on average pg_class is vacfulled and reindexed twice an hour during the times this happens (which is, during normal business hours). There is a lot of churn in pg_class, pg_attribute, and pg_depend in particular from all the temp stuff being created and torn down all day, as well as some Bucardo pg_class updating.

-- Greg Sabino Mullane g...@endpoint.com End Point Corporation PGP Key: 0x14964AC8
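For reference, the cron-driven maintenance Greg describes amounts to something like the following, run against the busiest catalogs (pre-9.0 VACUUM FULL semantics; the exact list of catalogs is an assumption):

VACUUM FULL pg_catalog.pg_class;
REINDEX TABLE pg_catalog.pg_class;
VACUUM FULL pg_catalog.pg_attribute;
REINDEX TABLE pg_catalog.pg_attribute;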
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Excerpts from Alex Hunsaker's message of jue jun 21 10:27:41 -0400 2012: On Wed, Jun 20, 2012 at 1:15 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012: Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr and SvPVUTF8() when turning a perl string into a cstring. Right. So I played a bit with this patch, and touched it a bit mainly just to add some more comments; and while at it I noticed that some of the functions in Util.xs might leak some memory, so I made an attempt to plug them, as in the attached patch (which supersedes yours). I think most of these leaks go back to 9.0. Dunno if it's worth backpatching them...

Well, nobody has ever complained about them so maybe they're just not worth the effort.

to test the problem in the original report, I notice that we now have a regression failure: I'm not really sure what to do here -- maybe a second expected file for that test is a good enough answer? Or should I just take the test out? Opinions please. I think we have broken that check twice so it seems like it would be nice to keep it. But I don't feel *too* strongly about it.

Hmm, if we've broken it then it's probably best to keep it.

The comments and cleanups all look good to me.

Thanks for the review.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
On Thu, Jun 21, 2012 at 11:12 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 14.06.2012 01:56, Alexander Korotkov wrote: Hackers, attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing.

@@ -788,7 +774,7 @@ range_super_union(TypeCacheEntry *typcache, RangeType * r1, RangeType * r2)
 * part of the relcache entry for the index, typically) this essentially
 * eliminates lookup overhead during operations on a GiST range index.
 */
-static Datum
+Datum
 TrickFunctionCall2(PGFunction proc, FmgrInfo *flinfo, Datum arg1, Datum arg2)
 {
     FunctionCallInfoData fcinfo;

I don't think we want to expose TrickFunctionCall2(). Not with that name, anyway. Perhaps we should refactor the functions called this way, range_adjacent, range_overlaps etc., to have internal counterparts that can be called without FunctionCall(). Like:

***
*** 692,697 ----
--- 692,708 ----
  {
      RangeType *r1 = PG_GETARG_RANGE(0);
      RangeType *r2 = PG_GETARG_RANGE(1);
+
+     typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1));
+
+     PG_RETURN_BOOL(range_adjacent_internal(r1, r2, typcache));
+ }
+
+ bool
+ range_adjacent_internal(RangeType *r1, RangeType *r2, TypeCacheEntry *typcache)
+ {
      RangeBound lower1, lower2;

The gist and SP-gist consistent functions could call the internal function directly.

I like the idea of replacing TrickFunctionCall2 with an internal function. Do you think I should post a separate patch for the existing GiST code?

-- With best regards, Alexander Korotkov.
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
On Thu, Jun 14, 2012 at 2:56 AM, Alexander Korotkov aekorot...@gmail.com wrote: attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing. Index creation is slightly slower. Probably, it needs some investigation. Search queries on SP-GiST use many more pages. However this comparison can be not really correct, because SP-GiST can pin the same buffer several times during one scan. In CPU terms, search queries on SP-GiST seem to be slightly faster. The dramatic difference in the "column <@ const" query is thanks to 2d-mapping.

Patch with another SP-GiST implementation for ranges is attached. It uses a k-d tree instead of a quad-tree. I would like to leave only one implementation of SP-GiST for ranges. I'm going to do as comprehensive testing as I can for it.

-- With best regards, Alexander Korotkov.

range_spgist_kd-0.1.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
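Usage-wise the quad-tree and k-d tree opclasses would look identical; a sketch of what either patch enables, using 9.2 range syntax:

CREATE TABLE reservations (during int4range);
CREATE INDEX reservations_during_idx
    ON reservations USING spgist (during);

-- Queries the opclass is meant to accelerate:
SELECT * FROM reservations WHERE during && int4range(100, 200);  -- overlaps
SELECT * FROM reservations WHERE during <@ int4range(0, 1000);   -- contained in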
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
Alexander Korotkov aekorot...@gmail.com writes: On Thu, Jun 21, 2012 at 11:12 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I don't think we want to expose TrickFunctionCall2(). Not with that name, anyway. Perhaps we should refactor the functions called this way, range_adjacent, range_overlaps etc., to have internal counterparts that can be called without FunctionCall(). Like: I like idea of replacing TrickFunctionCall2 with internal function. Do you think I should post a separate patch for existing GiST code? +1 ... that was a pretty grotty hack, so let's get rid of it if we can. It's still going to require some documentation though I think. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
On 21 June 2012 19:13, Jaime Casanova ja...@2ndquadrant.com wrote: On Sun, Jun 10, 2012 at 4:15 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2012/6/8 Simon Riggs si...@2ndquadrant.com: I have a prototype that has some of these characteristics, so I see our work as complementary. At present, I don't think this patch would be committable in CF1, but I'd like to make faster progress with it than that. Do you want to work on this more, or would you like me to merge our prototypes into a more likely candidate? I'm not in favor of duplicating similar efforts. If available, could you merge some ideas from my patch into your prototypes? So, are we waiting for a new patch? Is it coming from Simon or Kohei?

There is an updated patch coming from me. I thought I would focus on review of other work first.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
On 06/21/2012 02:13 PM, Tom Lane wrote: Don't know if everybody on this list has been paying attention to the pgsql-bugs thread about bug #6699. [...] Assuming we can make that work, how far should it be back-patched? The --section issue is new in 9.2, but this would also take care of the restore problems for views with constraint dependencies, which are allowed as of 9.1, so I'm thinking we'd better put it into 9.1. [...]

If I'm understanding you correctly, fixing the bogus dependency thing is more an insurance policy than fixing a case (other than the constraint dependency) that is known to be broken.

(There's another bug to do with parallel pg_restore and clustering that Andrew Hammond raised back in January, that I want to fix when I get some time.)

cheers andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On 20 June 2012 14:38, Andres Freund and...@2ndquadrant.com wrote: It incurs a rather high performance overhead due to added memory allocations and added pointer indirections. That's fine for most of the current users of the List interface, but certainly not for all. In other places you cannot even have memory allocations because the list lives in shared memory.

Yes, in general lists interact horribly with the memory hierarchy. I think I pointed out to you once a rant of mine on -hackers a while back in which I made various points about just how badly they do these days. On modern architectures, with many layers of cache, the cost of the linear search to get an insertion point is very large. So this:

/*
 * removes a node from a list
 * Attention: O(n)
 */
static inline void
ilist_s_remove(ilist_s_head *head, ilist_s_node *node)

is actually even worse than you might suspect.

E.g. in the ApplyCache, where I use the submitted ilist.h stuff, when reconstructing transactions you add to a potentially really long linked list of individual changes for every interesting wal record. Before I prevented memory allocations in that path it took about 12-14% of the time when applying changes in the same backend. Afterwards it wasn't visible in the profile anymore.

I find that very easy to believe. Several of the pieces of code I pointed out in a previous email use an open-coded list implementation exactly to prevent those problems.

Interesting. So, it seems like this list implementation could be described as a minimal embeddable list implementation that requires the user to do all the memory allocation, and offers a doubly-linked list too. Not an unreasonable idea. I do think that the constraints you have are not well served by any existing implementation, including List and Dllist. Are you planning on just overhauling the Dllist interface in your next iteration?

As to the question of inlining, the C99 standard (where inlining is standardised by ANSI, but inspired by earlier extensions to C), unlike the C89 standard, seems to be well respected by vendors as far as it goes, with some compilers going to pains to implement it correctly, like ICC and Clang. We can't really switch to C99, because MSVC doesn't support it, and it is patently obvious that Microsoft have zero interest in it. Funnily enough, if anyone takes C89 as a standard seriously still, it's Microsoft, if only due to indifference to later standards. This hack exists purely for the benefit of their strict interpretation of C89, I think:

/* Define to `__inline__' or `__inline' if that's what the C compiler
   calls it, or to nothing if 'inline' is not supported under any name. */
#ifndef __cplusplus
/* #undef inline */
#endif

If anyone today is using PostgreSQL binaries in production that were built with a compiler that does not USE_INLINE, I would be very surprised indeed. The idea that anyone intends to build 9.3 with a compiler that doesn't support inline functions is very difficult to believe. Other C open source projects like Linux freely use inline functions. Now, granted, it was only possible to build the kernel for a long time using gcc, but inline had nothing to do with the problem of building the kernel. My point is that broadly it makes more practical sense to talk about GNU C as a standard than C89, and GNU C supports inline functions (C99 is a different matter, but that isn't going to happen in the foreseeable future). Don't believe me?
This is from our configure script:

# Check if it's Intel's compiler, which (usually) pretends to be gcc,
# but has idiosyncrasies of its own.  We assume icc will define
# __INTEL_COMPILER regardless of CFLAGS.

All of the less popular compilers we support we support precisely because they pretend to be GCC, with the sole exception, as always, of the Microsoft product, in this case MSVC. So my position is that I'm in broad agreement that we should freely allow the use of inline without macro hacks, since we generally resist using macro hacks if that makes code ugly, which USE_INLINE certainly does, and for a benefit that is indistinguishable from zero, at least to me.

Why are you using the stdlib's assert.h? Why have you used the NDEBUG macro rather than USE_ASSERT_CHECKING? This might make sense if the header was intended to live in port, but it isn't, right?

Why have you done this:

#ifdef __GNUC__
#define unused_attr __attribute__((unused))
#else
#define unused_attr
#endif

and then gone on to use this unused_attr macro all over the place? Firstly, that isn't going to suppress the warnings on many platforms that we support, and we do make an effort to build warning-free on at least 3 compilers these days - GCC, Clang and MSVC. Secondly, compilers give these warnings because it doesn't make any sense to have an unused parameter - so why have you used one? At the very least, if you require this exact interface, use compatibility macros. I can't imagine why
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
Andrew Dunstan and...@dunslane.net writes: If I'm understanding you correctly, fixing the bogus dependency thing is more an insurance policy than fixing a case (other than the constraint dependency) that is known to be broken. Right. That's the only *known* broken case, and it does seem like we'd have heard by now about others. Also, what I have in mind will cause at least HEAD, and however far we back-patch it, to actively complain if it runs into a case where the sections can't be separated, rather than silently outputting items in a funny order as now. So if there are any more cases lurking I think we'll hear about them quickly, and then we can evaluate whether further backpatching is required. (There's another bug to do with parallel pg_restore and clustering that Andrew Hammond raised back in January, that I want to fix when I get some time.) Hm, I guess I've forgotten that one? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
On 06/21/2012 06:25 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: (There's another bug to do with parallel pg_restore and clustering that Andrew Hammond raised back in January, that I want to fix when I get some time.) Hm, I guess I've forgotten that one?

See http://archives.postgresql.org/pgsql-hackers/2012-01/msg00561.php

cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
Andrew Dunstan and...@dunslane.net writes: On 06/21/2012 06:25 PM, Tom Lane wrote: Hm, I guess I've forgotten that one? See http://archives.postgresql.org/pgsql-hackers/2012-01/msg00561.php I didn't understand that then, and I still don't. The ALTER TABLE CLUSTER might need exclusive lock, but it's not going to hold the lock long enough to be an issue. I could see that there's a problem with identify_locking_dependencies believing that two CONSTRAINT items conflict (do they really?) but not convinced the CLUSTER aspect has anything to do with it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
Peter Geoghegan pe...@2ndquadrant.com writes: All of the less popular compilers we support we support precisely because they pretend to be GCC, with the sole exception, as always, of the Microsoft product, in this case MSVC. This is nonsense. There are at least three buildfarm machines running compilers that do not pretend to be gcc (at least, configure recognizes them as not gcc) and are not MSVC either. We ought to have more IMO, because software monocultures are dangerous. Of those three, two pass the quiet inline test and one --- the newest of the three if I guess correctly --- does not. So it is not the case that !USE_INLINE is dead code, even if you adopt the position that we don't care about any compiler not represented in the buildfarm. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On 22 June 2012 01:04, Tom Lane t...@sss.pgh.pa.us wrote: This is nonsense. There are at least three buildfarm machines running compilers that do not pretend to be gcc (at least, configure recognizes them as not gcc) and are not MSVC either. So those three don't have medium to high degrees of compatibility with GCC? We ought to have more IMO, because software monocultures are dangerous. Of those three, two pass the quiet inline test and one --- the newest of the three if I guess correctly --- does not. So it is not the case that !USE_INLINE is dead code, even if you adopt the position that we don't care about any compiler not represented in the buildfarm. I note that you said that it doesn't pass the quiet inline test, and not that it doesn't support inline functions. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
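For readers following the USE_INLINE dispute: the symbol guards a fallback of roughly this shape, sketched here with an illustrative ilist_is_empty(); the real headers and configure machinery differ in detail. Compilers that pass the quiet-inline configure test get the definition in the header, inlined at each call site; the rest get a plain extern declaration, with the body compiled once, out of line, in some .c file:

    #include <stdbool.h>

    typedef struct ilist_head
    {
        struct ilist_head *next;    /* circular: empty list points at itself */
    } ilist_head;

    #ifdef USE_INLINE
    /* Quiet-inline compilers: define in the header. */
    static inline bool
    ilist_is_empty(const ilist_head *head)
    {
        return head->next == head;
    }
    #else
    /* Everyone else: declare here, define once in a .c file. */
    extern bool ilist_is_empty(const ilist_head *head);
    #endif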
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
On 06/21/2012 07:43 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 06/21/2012 06:25 PM, Tom Lane wrote: Hm, I guess I've forgotten that one? See http://archives.postgresql.org/pgsql-hackers/2012-01/msg00561.php I didn't understand that then, and I still don't. The ALTER TABLE CLUSTER might need exclusive lock, but it's not going to hold the lock long enough to be an issue. I could see that there's a problem with identify_locking_dependencies believing that two CONSTRAINT items conflict (do they really?) but not convinced the CLUSTER aspect has anything to do with it. If something else holds a lock on the table (e.g. another CREATE INDEX), the ALTER TABLE will block until it's done, waiting for an ACCESS EXCLUSIVE lock. The whole method of operation of parallel restore is that we are not supposed to start items that might be blocked by currently running operations. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
Andrew Dunstan and...@dunslane.net writes: On 06/21/2012 07:43 PM, Tom Lane wrote: I didn't understand that then, and I still don't. If something else holds a lock on the table (e.g. another CREATE INDEX) the ALTER TABLE will block until it's done, waiting for an ACCESS EXCLUSIVE lock. The whole method of operation of parallel restore is that we are not supposed to start items that might be blocked by currently running operations. Oh, I see --- Andrew Hammond's complaint involves *three* index creations on the same table (one running; a second one completed except for the ALTER TABLE CLUSTER step attached to it, which is blocked behind the running index creation; and a third blocked behind the ALTER TABLE CLUSTER). I had misread it to imply that only two indexes were enough to manifest the problem. Yeah, we should break out the ALTER TABLE CLUSTER step as a separate TOC item to fix this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
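To make the scheduling constraint concrete, here is a rough C sketch; the TocEntry layout and the helper are hypothetical simplifications, not pg_restore's actual structures. The point of breaking ALTER TABLE CLUSTER out into its own TOC item is that its exclusive-lock requirement then shows up in a check of this kind, so the dispatcher defers the item rather than letting a worker block inside the server:

    #include <stdbool.h>

    /* Hypothetical, simplified TOC entry: only what the check needs. */
    typedef struct TocEntry
    {
        int         dumpId;
        int         nLockDeps;  /* tables this item must lock exclusively */
        const int  *lockDeps;   /* dump IDs of those tables */
    } TocEntry;

    /*
     * Parallel restore's rule of operation: never dispatch an item whose
     * exclusive-lock requirements overlap those of any running worker.
     */
    static bool
    can_start_item(const TocEntry *candidate,
                   const TocEntry *const *running, int nRunning)
    {
        for (int i = 0; i < nRunning; i++)
            for (int j = 0; j < candidate->nLockDeps; j++)
                for (int k = 0; k < running[i]->nLockDeps; k++)
                    if (candidate->lockDeps[j] == running[i]->lockDeps[k])
                        return false;   /* potential conflict: defer */
        return true;
    }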
Re: [HACKERS] [PATCH] Allow breaking out of hung connection attempts
(2012/05/01 0:30), Ryan Kelly wrote: On Mon, Apr 30, 2012 at 09:02:33AM -0400, Alvaro Herrera wrote: Well, do *you* want it? Of course. That way I can stop patching my psql and go back to using the one that came with my release :) Here is the result of my review of the v4 patch.

Submission
- The patch is in context diff format.
- It needs a rebase for HEAD, but the patch command takes care of that automatically.
- Make finishes cleanly, and all regression tests pass.

Functionality
After applying this patch, I could cancel a connection attempt at start-up by pressing Ctrl+C on the terminal just after invoking psql with an unused IP address. Without this patch, such an attempt ends up with an error such as "No route to host" after an uninterruptible few seconds (the duration until the error probably depends on the environment). A connection attempt made with the \connect command could also be canceled by pressing Ctrl+C at the psql prompt. In addition, I tried setting PGCONNECT_TIMEOUT to 0 (infinite), but psql gave up after a few seconds, for both start-up and re-connect. Is this intentional behavior?

Detail of changes
Basically this patch consists of three changes: 1) defer setup of the cancel handler until the start-up connection has been established; 2) a new libpq API, PQconnectTimeout, which returns the connect_timeout value of the current session; 3) use of the asynchronous connection API in the \connect psql command, which requires the API added by 2). These seem reasonable for canceling a connection attempt at start-up and at \connect, but, as Ryan says, the code added to do_connect might need more simplification. I have some random comments:
- Checking the status by calling PQstatus just after PQconnectStartParams is necessary.
- Copying only the select() part of pqSocketPoll seems insufficient; should we use poll(2) where it is supported?
- Don't we need to clear the error message stored in PGconn after PQconnectPoll returns an OK status, like connectDBComplete does?

Regards, -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
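As a reference point for the asynchronous-connection approach under review, here is a self-contained sketch of an interruptible connect loop built on the documented libpq polling API. It uses PQconnectStart with a conninfo string for brevity where the patch uses PQconnectStartParams; the one-second select() timeout used to notice Ctrl+C, and the host string, are illustrative choices, and psql's actual do_connect code differs:

    #include <signal.h>
    #include <stdio.h>
    #include <sys/select.h>
    #include <libpq-fe.h>

    static volatile sig_atomic_t cancel_pressed = 0;

    static void
    handle_sigint(int signo)
    {
        (void) signo;
        cancel_pressed = 1;
    }

    /*
     * Drive the connection with PQconnectPoll() so control returns to us
     * between network waits; Ctrl+C then abandons the attempt promptly
     * instead of blocking inside connect().
     */
    static PGconn *
    interruptible_connect(const char *conninfo)
    {
        PGconn     *conn = PQconnectStart(conninfo);
        PostgresPollingStatusType status = PGRES_POLLING_WRITING;

        if (conn == NULL)
            return NULL;
        if (PQstatus(conn) == CONNECTION_BAD)
        {
            PQfinish(conn);
            return NULL;
        }

        while (status != PGRES_POLLING_OK)
        {
            int         sock = PQsocket(conn);
            fd_set      rmask, wmask;
            struct timeval timeout = {1, 0};    /* wake up to check Ctrl+C */

            FD_ZERO(&rmask);
            FD_ZERO(&wmask);
            if (status == PGRES_POLLING_READING)
                FD_SET(sock, &rmask);
            else
                FD_SET(sock, &wmask);
            /* Simplification: on timeout we just poll again. */
            select(sock + 1, &rmask, &wmask, NULL, &timeout);

            if (cancel_pressed)
            {
                PQfinish(conn);
                return NULL;        /* user gave up */
            }

            status = PQconnectPoll(conn);
            if (status == PGRES_POLLING_FAILED)
            {
                PQfinish(conn);
                return NULL;
            }
        }
        return conn;
    }

    int
    main(void)
    {
        PGconn     *conn;

        signal(SIGINT, handle_sigint);
        conn = interruptible_connect("host=192.0.2.1 dbname=test");
        if (conn == NULL)
        {
            fprintf(stderr, "connection attempt cancelled or failed\n");
            return 1;
        }
        /* ... use the connection ... */
        PQfinish(conn);
        return 0;
    }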
Re: [HACKERS] COMMUTATOR doesn't seem to work
On 12-06-21 12:18 PM, Tom Lane wrote: Did you actually create a text = chkpass function and operator? This declaration merely promises that you will provide one eventually. The system does not have the ability to make one for you. I guess I am missing the point of COMMUTATOR then. The docs say "When you are defining a self-commutative operator, you just do it." It seems you need to do more than just do it. As far as I can tell I will need to add another C function and another CREATE OPERATOR. Then I can create an operator that takes args (text, chkpass) instead of (chkpass, text). What is the COMMUTATOR for then? Is it just a hint to the planner? -- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. IM: da...@vex.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip checkpoint on promoting from streaming replication
Hello, Is it guaranteed that all the files (e.g., the latest timeline history file) required for such crash recovery exist in pg_xlog? If yes, your approach might work well. Regarding promotion in particular, the files required are the history file of the latest timeline, the WAL file containing the redo location of the latest restartpoint, and all WAL files after that one, each of the appropriate timeline. In the current (9.2/9.3dev) implementation, as far as I know, archive recovery and streaming replication create the regular WAL files required during the recovery sequence in the slave's pg_xlog directory, and only a restartpoint removes those older than the one on which the restartpoint takes place. If so, all the required files mentioned above should be in the pg_xlog directory. Is there something I've forgotten?

However, it would be more robust if we could check that all the required files are available at promotion. I can see two approaches that might accomplish that.

1. Record the ID of any WAL segment that is read but is not present in pg_xlog as a regular WAL file. For example, if we modify archive recovery to keep working WAL files out of pg_xlog, or to give them a special name that crash recovery cannot refer to afterward, record the ID of the segment. The shutdown checkpoint on promotion or at end of recovery cannot be skipped if this recorded segment ID is equal to or larger than the redo point of the latest checkpoint. This approach of course gives less chance to skip the shutdown checkpoint than forcing all required files to be copied into pg_xlog, but it still seems effective for the most common cases, say promoting long enough after WAL streaming started that a restartpoint has taken place on a WAL file in pg_xlog. I hope this is promising. A temporary WAL file for streaming? That seems to me to make the shutdown checkpoint mandatory, since no WAL file from before promotion would be accessible at that moment. On the other hand, somehow preserving the WAL after the latest restartpoint seems to make no significant difference in disk consumption compared with the current way.

2. Check for all required WAL files at promotion or end of recovery. We could check the existence of all required files at promotion by scanning in a manner similar to recovery. But this requires adding code similar to what already exists, or the labor of weaving a new function into the existing code. Furthermore, it seems likely to take a certain amount of time at promotion (or end of recovery). The discussion about temporary WAL files is the same as for 1.

regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
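A rough sketch of what approach 2's existence check might look like, assuming the segment file naming of that era (three 8-hex-digit fields: timeline, log, seg) and deliberately ignoring timeline switches, which a real check would have to handle via the timeline history file; the function and its parameters are illustrative, not server code:

    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/stat.h>

    /*
     * Walk the segment range from the latest restartpoint's redo segment
     * up to the current one and verify each file exists in pg_xlog.
     */
    static bool
    all_segments_present(const char *xlogdir, unsigned int tli,
                         unsigned int log, unsigned int seg,
                         unsigned int endlog, unsigned int endseg,
                         unsigned int segs_per_log)
    {
        for (;;)
        {
            char        path[1024];
            struct stat st;

            snprintf(path, sizeof(path), "%s/%08X%08X%08X",
                     xlogdir, tli, log, seg);
            if (stat(path, &st) != 0)
                return false;           /* a required segment is missing */
            if (log == endlog && seg == endseg)
                return true;            /* reached the end of the range */
            if (++seg >= segs_per_log)  /* advance to the next segment */
            {
                seg = 0;
                log++;
            }
        }
    }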
Re: [HACKERS] COMMUTATOR doesn't seem to work
D'Arcy Cain da...@druid.net writes: On 12-06-21 12:18 PM, Tom Lane wrote: Did you actually create a text = chkpass function and operator? This declaration merely promises that you will provide one eventually. The system does not have the ability to make one for you. I guess I am missing the point of COMMUTATOR then. The docs say When you are defining a self-commutative operator, you just do it. It seems you need to do more than just do it. Um, an operator with different types on left and right cannot be its own commutator. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
On 12-06-22 12:22 AM, Tom Lane wrote: Um, an operator with different types on left and right cannot be its own commutator. Understood. I completely misunderstood the purpose of COMMUTATOR. I thought it was telling the system that the procedure could be called with the arguments reversed if it could not find a specific procedure. I see now what it is for. So I have my type working now, but I had to create a new C function that takes the opposite argument order. That seems redundant, but I could not see a better way. -- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. IM: da...@vex.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
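The redundancy can at least be limited to a thin wrapper. Here is a sketch of the arrangement implied, with a hypothetical chkpass_eq_internal() holding the single copy of the comparison logic; the function and operator names are illustrative, not the actual chkpass module:

    #include "postgres.h"
    #include "fmgr.h"

    /* Hypothetical helper holding the real comparison logic. */
    extern bool chkpass_eq_internal(Datum password, Datum plaintext);

    PG_FUNCTION_INFO_V1(chkpass_eq);
    Datum
    chkpass_eq(PG_FUNCTION_ARGS)        /* chkpass = text */
    {
        PG_RETURN_BOOL(chkpass_eq_internal(PG_GETARG_DATUM(0),
                                           PG_GETARG_DATUM(1)));
    }

    PG_FUNCTION_INFO_V1(text_eq_chkpass);
    Datum
    text_eq_chkpass(PG_FUNCTION_ARGS)   /* text = chkpass: args swapped */
    {
        PG_RETURN_BOOL(chkpass_eq_internal(PG_GETARG_DATUM(1),
                                           PG_GETARG_DATUM(0)));
    }

    /*
     * The two operators then declare each other as COMMUTATOR, e.g.:
     *   CREATE OPERATOR = (leftarg = chkpass, rightarg = text,
     *                      procedure = chkpass_eq, commutator = =);
     *   CREATE OPERATOR = (leftarg = text, rightarg = chkpass,
     *                      procedure = text_eq_chkpass, commutator = =);
     */

With both operators in place, COMMUTATOR is indeed just metadata for the planner: it says which operator to substitute when an expression's operands are swapped, but it never synthesizes the reversed operator for you.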